WeSpell - A Python .po spellchecker

Elvish_Hunter
Posts: 1575
Joined: September 4th, 2009, 2:39 pm
Location: Lintanir Forest...

WeSpell - A Python .po spellchecker

Post by Elvish_Hunter »

As some of you may already know, I'm also registered on the Wesnoth Italian Forum. There, RockScorpion recently finished translating After the Storm and asked for help catching spelling errors. (If you know Italian, or want to take a look with Google Translate, here is the topic.) Clearly, checking the .po file of such a campaign by hand is no easy task, so I looked for an existing spellchecker for .po files, and to my great surprise I was unable to find one.
So I started writing one in Python, relying on the Enchant library. Unfortunately, my first version printed its output to the command line, and it quickly became clear that the Windows Command Prompt is unable to handle Unicode, except perhaps by using some black magic.
So I had to implement a GUI sooner than I expected, and as usual my toolkit of choice is Tkinter with the ttk library, which by the way handles Unicode like a charm.
Finally, a first version of my .po spellchecker is here. To run it, you'll need to install Python 2.7, Enchant, and the dictionary for your language, as explained on Enchant's website.
Its current features are:
  • support for Unicode
  • support for ignore files: these are simple text files with one word on each line. If one is supplied, it is scanned and all its words are added to Enchant's ignore list for the current session
  • support for copying the output to the clipboard
  • support for saving the output as a text, HTML or PDF file, as well as printing
  • released under GPL v3 or any later version
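The ignore-file feature in the list above could be sketched roughly as follows. This is not the code shipped in the zip: the function names are made up for illustration, and the use of PyEnchant's `Dict.add_to_session()` is an assumption about which Enchant function does the job. The `checker` argument can be any object with that method.

```python
import io


def load_ignore_words(path):
    """Read an ignore file: plain text, one word per line."""
    with io.open(path, encoding="utf-8") as handle:
        # Strip surrounding whitespace and skip blank lines.
        return [line.strip() for line in handle if line.strip()]


def apply_ignore_words(checker, words):
    """Make the checker accept each word for the current session only.

    `checker` is assumed to expose add_to_session(word), as a
    PyEnchant dictionary such as enchant.Dict("it_IT") does.
    """
    for word in words:
        checker.add_to_session(word)
```

Since the words are only added to the session, nothing is written to the user's personal word list, which matches the "for the current session" behaviour described above.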
What I added for version 0.2:
  • got rid of add_to_ignore function, now it relies on an Enchant function
  • about button
  • title for the main window and the output window
  • a better regexp to split strings, and some preliminary character substitutions
  • a better formatted title in the main screen
  • menu bar, with Preferences option disabled
  • the output window locks the main window until it is destroyed
  • added a sizegrip in the output window
  • moved buttonbox to top in output window
  • Text, Entry and Combobox widgets now resize when the main window is resized
  • on Linux ttk's Clam style is used
  • added Separators and padding for a better layout
  • moved buttonbox in main window to its own frame
  • added GPL v3 logo (in the main screen) and text (in the zip file)
  • added imagePath function, so that the program can find its images relative to where the script is placed
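For readers curious about the imagePath idea in the list above, a minimal sketch of the technique might look like this. The name `image_path`, its parameters and the `images` folder name are assumptions for illustration, not the function from the zip.

```python
import os


def image_path(script_file, filename, folder="images"):
    """Build an absolute path to an image stored next to the script.

    `script_file` is normally the module's own __file__; resolving it
    first makes the result independent of the current working directory.
    """
    base = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(base, folder, filename)
```

A script would typically call it as `image_path(__file__, "gpl.png")`, so the images are found even when the program is started from another directory.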
What I added for version 0.3:
  • added Bluecurve icons
  • added Bluecurve CC-BY-SA copyright note
  • added Help buttons
  • added a persistence file that keeps track of the last used dictionary, po file and ignore file
  • better styling on Linux
  • added support for xgettext
  • added a Choose Language dialog
  • added support for multiline msgstr; however, the price to pay was losing the line counter
What I added for version 0.4:
  • Pango markup, macros and variable names are removed from spellchecking
  • added back line numbering in output
What I added for version 1.0:
  • Use of polib and PySide libraries
  • Crystal, Gnome and Oxygen icons
  • Output as HTML, PDF and on paper
Attachments
Wespell.zip
Version 1.0
(956.53 KiB) Downloaded 376 times
po spellchecker 4.zip
Version 4
(48.16 KiB) Downloaded 440 times
po spellchecker 3.zip
Version 3
(47.74 KiB) Downloaded 366 times
po spellchecker 2.zip
Version 2
(15.64 KiB) Downloaded 463 times
po spellchecker 1.zip
(2.12 KiB) Downloaded 439 times
Last edited by Elvish_Hunter on August 28th, 2013, 8:52 am, edited 6 times in total.
Reason: Uploaded version 1.0
Current maintainer of these add-ons, all on 1.16:
The Sojournings of Grog, Children of Dragons, A Rough Life, Wesnoth Lua Pack, The White Troll (co-author)
mich
Translator
Posts: 134
Joined: November 11th, 2008, 8:54 am
Location: Italy

Re: A Python .po spellchecker

Post by mich »

Hi Elvish_Hunter, and thanks for this good tool. I find it really useful (I tend to make a lot of mistakes when writing at a decent speed, damn notebook keyboard...).

I want to add that it doesn't seem to work with Python 2.6, while all is OK with 2.7. The problem seems to be that re.split() doesn't support "flags=" in 2.6 (and if I remove it, accented characters cause problems). So you probably need at least Python 2.7 to make it work.
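To illustrate the point about flags and accents: one way to get Unicode-aware splitting without passing "flags=" to re.split() is to compile the pattern first, since re.compile() has accepted flags for much longer. This is only a sketch; the pattern below is an assumption and not the regexp the program actually uses.

```python
import re

# \W+ matches runs of non-word characters. With re.UNICODE, accented
# letters count as word characters, so a word like "perché" is not cut
# at the accent. (On Python 3 this is already the default for text.)
WORD_SEPARATOR = re.compile(r"\W+", re.UNICODE)


def split_words(text):
    """Split text into words, dropping the empty strings at the edges."""
    return [word for word in WORD_SEPARATOR.split(text) if word]
```

Note that this simple pattern also splits on apostrophes, so an elided Italian form such as "l'elfo" would come out as two pieces.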
Add a preferences file?
Yes, this would be useful for keeping the correct dictionary and exception file selected.

It would be really useful if you could also add a way to scan the add-on folder (or that of a mainline campaign) for unit names and exclude them automatically. This would drastically reduce the false positives.

Keep up the good work.
User avatar
Elvish_Hunter
Posts: 1575
Joined: September 4th, 2009, 2:39 pm
Location: Lintanir Forest...

Re: A Python .po spellchecker

Post by Elvish_Hunter »

mich wrote:Hi Elvish_Hunter, and thanks for this good tool. I find it really useful (I tend to make a lot of mistakes when writing at a decent speed, damn notebook keyboard...).
No, thank you for using it so early. :)
mich wrote:I want to add that it doesn't seem to work with Python 2.6, while all is OK with 2.7. The problem seems to be that re.split() doesn't support "flags=" in 2.6 (and if I remove it, accented characters cause problems). So you probably need at least Python 2.7 to make it work.
Right. I corrected my initial post.
mich wrote:Yes, this would be useful for keeping the correct dictionary and exception file selected.
There is a library for this (ConfigParser), although this isn't a high-priority modification.
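A preferences file along these lines could be sketched with ConfigParser as below. The section name, the option names and both function names are assumptions chosen for illustration; the import fallback is there because the module is called configparser on Python 3.

```python
try:
    import configparser  # Python 3 name
except ImportError:
    import ConfigParser as configparser  # Python 2 name

SECTION = "last_used"  # hypothetical section name


def save_preferences(path, dictionary, po_file, ignore_file):
    """Write the last used dictionary, po file and ignore file to disk."""
    parser = configparser.RawConfigParser()
    parser.add_section(SECTION)
    parser.set(SECTION, "dictionary", dictionary)
    parser.set(SECTION, "po_file", po_file)
    parser.set(SECTION, "ignore_file", ignore_file)
    with open(path, "w") as handle:
        parser.write(handle)


def load_preferences(path):
    """Return the saved values as a dict, or {} if nothing was saved yet."""
    parser = configparser.RawConfigParser()
    parser.read(path)  # a missing file is silently skipped
    if not parser.has_section(SECTION):
        return {}
    return dict(parser.items(SECTION))
```

RawConfigParser is used here so that a "%" in a file path is stored as-is instead of being treated as interpolation syntax.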
mich wrote:It would be really useful if you could also add a way to scan the add-on folder (or that of a mainline campaign) for unit names and exclude them automatically.
I'm not so sure that collecting a bunch of untranslated unit names would be that useful: for those, there is already wmllint's spellchecker.
mich wrote:Keep up the good work.
I will! 8)
Elvish_Hunter
Posts: 1575
Joined: September 4th, 2009, 2:39 pm
Location: Lintanir Forest...

Re: A Python .po spellchecker

Post by Elvish_Hunter »

Version 0.2 of my spellchecker is available for download in the first post. I switched to the GPL v3 license, as it grants better protection than GPL v2. A full list of changes is in the first post, which also still contains the first version: I'll keep it for historical/backup purposes.
Elvish_Hunter
Posts: 1575
Joined: September 4th, 2009, 2:39 pm
Location: Lintanir Forest...

Re: A Python .po spellchecker

Post by Elvish_Hunter »

I just made version 0.3 of my .po spellchecker available for download in the first post. I finally managed to add support for multiline msgstr; unfortunately, reporting the line number of each spelling mistake proved to be quite complex, so I had to drop that feature. After all, not even GNU msgexec yields line numbers in its output.
The program can now be translated, thanks to Python's gettext library, and I added some .po files for this purpose.
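For anyone who has not used Python's gettext library, loading a translation generally looks something like the sketch below. The domain name "wespell", the "locale" folder and the function name are assumptions for illustration, not the names used in the zip.

```python
import gettext


def load_translator(language, localedir="locale", domain="wespell"):
    """Return a function that translates interface strings.

    gettext looks for <localedir>/<language>/LC_MESSAGES/<domain>.mo,
    so each .po file has to be compiled to a .mo file first (for
    example with msgfmt). With fallback=True, a missing catalog simply
    leaves the strings untranslated instead of raising an error.
    """
    translation = gettext.translation(
        domain, localedir=localedir, languages=[language], fallback=True
    )
    return translation.gettext
```

The returned function is conventionally bound to `_`, for example `_ = load_translator("it")`, and every user-visible string is then wrapped as `_("Check spelling")`.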
I added some icons taken from the Bluecurve set. According to this page, attribution is required, so I also added the copyright notes and everything should be fine.
I'll try to add back support for line numbering, should I find a good solution for it.
Elvish_Hunter
Posts: 1575
Joined: September 4th, 2009, 2:39 pm
Location: Lintanir Forest...

Re: A Python .po spellchecker

Post by Elvish_Hunter »

I just added version 0.4 to the first post. This time, the main changes are that Pango markup, variable names and macro calls are no longer spellchecked; I also added line numbering back to the output.
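Skipping Pango markup, variable names and macro calls can be approximated with a few regular expressions. The patterns below are my own guesses at reasonable rules for Wesnoth strings (tags like <b>...</b>, variables like $unit.name|, macros like {MACRO arg}); they are a sketch, not the expressions used in the program.

```python
import re

PANGO_TAG = re.compile(r"</?[A-Za-z][^<>]*>")       # <b>, </span>, <span color='red'>
MACRO_CALL = re.compile(r"\{[^{}]*\}")                # innermost {MACRO args}
VARIABLE = re.compile(r"\$[A-Za-z_][\w.\[\]]*\|?")   # $unit.name| or $list[0].id


def strip_non_prose(text):
    """Replace markup, macro calls and variables with spaces."""
    # Macros can be nested, so remove the innermost call repeatedly
    # until nothing changes.
    previous = None
    while previous != text:
        previous = text
        text = MACRO_CALL.sub(" ", text)
    text = PANGO_TAG.sub(" ", text)
    return VARIABLE.sub(" ", text)
```

Replacing with a space rather than an empty string keeps the words on either side of a tag from being glued together before they are checked.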
I tried packaging the program with py2exe, but the result was a monster of about 17 MB :shock: . So I decided to scrap the idea...
Anyway, the application is pretty much finished, and its future versions will probably receive only translations and bugfixes.
Elvish_Hunter
Posts: 1575
Joined: September 4th, 2009, 2:39 pm
Location: Lintanir Forest...

WeSpell - A Python .po spellchecker

Post by Elvish_Hunter »

First of all, excuse me for my sudden disappearance from both the forums and IRC, but real life has been pretty hectic over the last month.
Anyway, I'm announcing the release of a new version of my spellchecker (which I decided to call WeSpell, a pun on Wesnoth and spell). You can find it in the first post.
So many changes were implemented that I decided to mark this release as 1.0: almost all of the old code was discarded.
The first change you'll notice is that I abandoned Tkinter as the GUI toolkit and replaced it with Qt/PySide. The downside of this change is that a plain Python installation isn't enough to run the program: you'll need to install PySide as well. On the other hand, this toolkit is much faster than Tk and has a lot of features that Tk lacks: printing, direct support for several image formats, a lot of widgets, sound support... For example, moving to Qt allowed me to implement printing to paper or PDF.
The second notable change is that I removed my own parser and replaced it with polib. You'll need to install this library as well to run the program, but it works much better than my old parser. However, it doesn't support line numbers, so the spellcheck output now points to msgids, which, as I remember caslav.ilic saying once, are the only way to point to a certain entry in a .po file.
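Reporting mistakes by msgid instead of by line number can be sketched like this. The function name and the word pattern are assumptions for illustration; `entries` is anything whose items have `msgid` and `msgstr` attributes (the entries of a file opened with `polib.pofile()` do), and `is_correct` is any function that returns True for a correctly spelled word, such as the `check` method of a PyEnchant dictionary.

```python
import re

WORD = re.compile(r"\w+", re.UNICODE)  # simplistic word pattern (assumption)


def misspellings_by_msgid(entries, is_correct):
    """Return a list of (msgid, [misspelled words]) pairs."""
    report = []
    for entry in entries:
        if not entry.msgstr:
            continue  # untranslated entry: nothing to check
        bad = [word for word in WORD.findall(entry.msgstr)
               if not is_correct(word)]
        if bad:
            report.append((entry.msgid, bad))
    return report


# Possible wiring with the real libraries (not executed here):
#   import polib, enchant
#   report = misspellings_by_msgid(polib.pofile("it.po"),
#                                  enchant.Dict("it_IT").check)
```

Keying the report on the msgid means the result stays valid even if the .po file is reformatted or rewrapped afterwards.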
Speaking of output: after spellchecking a file, you can now save the output as text, HTML, PDF, or print it on paper; you can also change font and paper size.
Another interesting change is that, instead of placing the whole program in a single file, I packaged it in several modules: each window now has its own module.
You can also choose among four icon sets: Bluecurve, Crystal, Gnome and Oxygen; every set is available in five sizes (16, 24, 32, 48, 64), so you can pick the one that best suits your display. All of them are correctly credited in the About section. Note that the GPL v3 license applies only to my own code: every other component, library or artwork is still released under its own license.

And now, time for some questions:

How do I run it?
First of all, you need to install Python. This program should run on both 2.7 and 3.3, so you can pick whichever you prefer.
Next, you need to install PySide (http://qt-project.org/wiki/Category:Lan ... :Downloads), PyEnchant (http://pythonhosted.org/pyenchant/download.html) and polib (https://pypi.python.org/pypi/polib, read the installation guide at http://polib.readthedocs.org/en/latest/ ... stallation).
Finally, extract the content of the zip file and double click on "Po spellchecker Qt.py".

Why didn't you package it as a .exe file?
Oh, I tried, using cx_Freeze (py2exe wasn't a viable option, because it's not available for Python 3). Let's just say that PyEnchant refused to be packaged properly no matter what, so I had to scrap the idea.

How do I install a dictionary?
You'll need a dictionary in MySpell/Hunspell format. Two good sources are OpenOffice.org's archive (http://extensions.openoffice.org/ or http://wiki.openoffice.org/w/index.php? ... did=229123) and LibreOffice (http://extensions.libreoffice.org/extension-center; alternatively http://download.documentfoundation.org/libreoffice/src/, select the latest version's directory and download the file named libreoffice-dictionaries-X.X.X.X.tar.xz; you need 7-Zip to open it).
Once you have extracted the archive and got your dictionary, put its .aff and .dic files into the dictionaries folder that you can find inside this program's directory. Restart the program if it was open. That's it.
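If a dictionary doesn't show up after copying it, a quick check is whether both of its files actually landed in the folder, since a MySpell/Hunspell dictionary consists of an .aff file and a .dic file with the same base name. The helper below is a hypothetical sketch for that check and is not part of the program.

```python
import os


def available_dictionaries(folder):
    """List languages that have both an .aff and a .dic file in `folder`."""
    names = os.listdir(folder)
    aff = {os.path.splitext(name)[0] for name in names if name.endswith(".aff")}
    dic = {os.path.splitext(name)[0] for name in names if name.endswith(".dic")}
    # A dictionary is only usable when both halves are present.
    return sorted(aff & dic)
```

For example, a folder containing it_IT.aff and it_IT.dic plus a stray en_US.dic would yield only it_IT.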