[project] wmlxgettext - python3 version


Nobun
Code Contributor
Posts: 129
Joined: May 4th, 2009, 9:28 pm
Location: Italy

[project] wmlxgettext - python3 version

Post by Nobun »

Hi. I'm trying to write, from scratch, a python3 version of wmlxgettext, based on line-by-line regexp parsing like the current perl version of wmlxgettext.
This new wmlxgettext should be 100% backward compatible with the current wmlxgettext syntax and output:
1) it will support the current command-line syntax (with output on stdout), adding some new features (like an optional -o option to write the output to a file instead of the default stdout)
2) it should create the same output that the perl script would generate
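
As a rough illustration of point 1, the optional -o switch could be handled with argparse along these lines. This is only a sketch: the positional `files` argument and the names `parse_args` and `open_output` are placeholders, and the options of the existing perl script are not reproduced here.

```python
import argparse
import sys


def parse_args(argv=None):
    # Hypothetical minimal command line: only the new -o option is shown.
    parser = argparse.ArgumentParser(prog="wmlxgettext")
    parser.add_argument("files", nargs="*", help="WML files to scan")
    parser.add_argument(
        "-o",
        dest="outfile",
        default=None,
        help="write the pot output to this file instead of stdout",
    )
    return parser.parse_args(argv)


def open_output(outfile):
    # Without -o, keep the current behaviour and print to stdout.
    if outfile is None:
        return sys.stdout
    return open(outfile, "w", encoding="utf-8")
```

Keeping stdout as the default means existing invocations that redirect the output keep working unchanged.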
But now I need your opinion about error/warning handling, which could behave very differently from the perl wmlxgettext.
My starting questions are:
- Should I really raise a warning, and not an error, when I encounter an invalid WML file?
- When exactly should I consider a WML file invalid, and when should I raise a warning rather than an error?

The questions relate to these scenarios:
Scenario 1: unbalanced/malformed tags encountered
Scenario 2: a string where the "campaign" maintainer probably forgot a closing quote
Last thing: should I keep or drop the translatable strings found in a "non-valid-WML" file?

-------------------------------------------
Scenario 1 can be explained easily:

For example, I find a closing [/wrong_foo] tag when [/foo] was expected (or a closing tag when none is expected at all, because no tag is currently open).
Or the WML file reaches its end while some tags are still unclosed (for example, a missing [/scenario] or [/multiplayer] closing tag).
It could also happen when we find a [/foo] closing tag where #enddef was expected (because my wmlxgettext will treat #define/#enddef as WML nodes, as if they were tags).

How I plan to handle this scenario:
- if I find a [/wrong_foo] tag instead of [/foo], generate a warning, but close the tag anyway and continue parsing the file
- if I find a [/wrong_foo] tag when NO CLOSING TAG AT ALL was expected, generate an error and stop the program (malformed WML tags)
- if I find a [/wrong_foo] tag when #enddef was expected, DON'T close the current node (the #define root body) until an #enddef is found (generate a warning)
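
The three rules above can be sketched with a simple stack of open nodes. This is a minimal illustration, not the actual parser: the names `check_balance` and `MalformedWML` are hypothetical, tags are assumed to sit at the start of a line, and the way #enddef unwinds the stack is an assumption. What to do with nodes still open at end of file is left to the caller.

```python
import re

OPEN_TAG = re.compile(r"\[(\w+)\]")
CLOSE_TAG = re.compile(r"\[/(\w+)\]")


class MalformedWML(Exception):
    """Raised when a closing tag appears while no node is open."""


def check_balance(lines):
    stack = []     # open nodes: tag names, or "#define" for a macro body
    warnings = []
    for lineno, line in enumerate(lines, start=1):
        text = line.strip()
        if text.startswith("#define"):
            stack.append("#define")
            continue
        if text.startswith("#enddef"):
            # Assumption: #enddef unwinds to the nearest open #define.
            while stack and stack.pop() != "#define":
                pass
            continue
        opened = OPEN_TAG.match(text)
        if opened:
            stack.append(opened.group(1))
            continue
        closed = CLOSE_TAG.match(text)
        if not closed:
            continue
        name = closed.group(1)
        if not stack:
            # Rule 2: no closing tag was expected at all.
            raise MalformedWML("line {}: unexpected [/{}]".format(lineno, name))
        expected = stack[-1]
        if expected == "#define":
            # Rule 3: keep the macro body open until #enddef.
            warnings.append("line {}: [/{}] found, #enddef expected".format(lineno, name))
        elif expected != name:
            # Rule 1: warn, but close the open tag anyway.
            warnings.append("line {}: [/{}] found, [/{}] expected".format(lineno, name, expected))
            stack.pop()
        else:
            stack.pop()
    return warnings, stack
```

The returned stack holds whatever is still open at the end of the file, so the caller can decide whether that case is a warning or an error.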

I'd like to hear your opinion about this possible solution, and about other possible solutions...
... the goal is to find the "best" solution possible

---------------------------
Scenario 2:
Perhaps you forgot a closing quote (") in the WML?
This could be assumed when, inside a multiline translatable string, a new line matches one of these cases:
1) it contains a #define, #enddef, or #textdomain preprocessor directive
2) it looks like " something = something else "
3) it contains only [something] or [/something] (probably an opening/closing tag was meant here, not a string)
As far as I can understand, the current perl wmlxgettext checks case 2 and, when it is encountered, treats the WML file as invalid (I don't know whether it checks cases 1 and 3).
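
To make the three cases concrete, here is a hedged sketch of how such a check might look with regular expressions. The patterns and the name `suspect_reason` are illustrative guesses, not what the perl script actually does.

```python
import re

# One pattern per case listed above; the exact expressions are assumptions.
SUSPECT_PATTERNS = [
    ("preprocessor directive", re.compile(r"^\s*#(define|enddef|textdomain)\b")),
    ("key = value", re.compile(r"^\s*\w+\s*=\s*\S")),
    ("tag on its own line", re.compile(r"^\s*\[/?\w+\]\s*$")),
]


def suspect_reason(line):
    """Return why a line inside an open string looks suspicious, or None."""
    for reason, pattern in SUSPECT_PATTERNS:
        if pattern.search(line):
            return reason
    return None
```

Note that a legitimate line of dialogue such as `x = 3 is the clue` is flagged by the second pattern, which is exactly the kind of false positive that makes case 2 questionable.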

But I am not sure it is really a good idea to keep this feature, for several reasons:

- performance: this kind of check could significantly reduce wmlxgettext performance for no real benefit. In fact:
1) you cannot be sure that the "campaign" maintainer actually made a mistake, especially in case 2.
It could be, for example, a riddle where "something = something else" lines are the clues you need to solve the riddle
2) before making a translation you would probably have run your add-on and checked whether there were WML errors.
If you truncate the translatable string at a different point (for example, wmlxgettext decides the string ended on the line before the [tag] it encountered) you will never translate the "actual" string
- management: if I add this kind of check, it is probably more convenient to raise an error and not a warning. I think that producing an "incomplete" pot file (for example, excluding the wrong WML file) or a pot file with missing sentences (if we don't add the sentence to the pot dictionary at all, or if we add the "truncated" string) could be a problem for a "campaign" developer who tries to update translations with wrong or missing strings
- exceptions: if you really want to write a translatable string (for example a message with a riddle) containing a line that is normally considered "suspect: you probably forgot a closing quote on the string" (like "something = something else") you are stuck and, in all cases, you will never be able to have that string translated.
This could force us to introduce another special comment (like # ignore-wmlxgettext-warns) to define an exception that ignores the warning and captures the entire translatable string, but this solution, imho, makes no sense.
It would be a very unintuitive requirement for a "campaign" maintainer who actually wants to write such a string

Please... let me know your point of view and propose anything you want to share.

---------------------------------
Last thing: should I keep or drop the translatable strings found in a "non-valid-WML" file?

When you encounter a WML file that you consider invalid (this will surely happen in scenario 1... not sure whether scenario 2 will be implemented), what is the best thing to do?

A) ignore (totally or partially) the strings encountered in that file -->
so... the effect would be (almost) to ignore that WML file.
But, imo, this is not a good solution --> the pot file will lack strings, and this could have a negative impact on translation updates

B) include the strings anyway -->
in this case we warn the "campaign" maintainer about the problems but still create a "complete" pot file.
The downside is that we may include sentences that shouldn't exist.
Again... this solution could have a negative impact on translation updates

C) generate an error and DON'T create the pot -->
this is the solution I would suggest and the one I think I would choose, but I'd like to hear your opinion.
Instead of generating a pot file whenever possible (the current behaviour of wmlxgettext, if I understood correctly), I would suggest not generating the pot file when a malformed WML file is found.
This way the "campaign" maintainer will know what needs fixing and will get a translation file only when all WML files are "valid"... so translation updates will be produced exactly as expected, without "hidden" problems.
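
Option C amounts to collecting problems from every file first and writing the pot only if the list is empty. A minimal sketch follows, assuming a hypothetical `emit_pot` function and a deliberately simplified pot entry format (no header, no multiline handling):

```python
import sys


def emit_pot(messages, problems, out=sys.stdout, err=sys.stderr):
    """Write pot entries only if no file was reported as malformed.

    messages: list of (source_reference, msgid) pairs
    problems: list of human-readable error strings
    Returns a process exit status: 0 on success, 1 on failure.
    """
    if problems:
        for problem in problems:
            print("error: " + problem, file=err)
        return 1  # option C: no pot output at all
    for ref, msgid in messages:
        # Escape backslashes and quotes so the msgid stays a valid string.
        escaped = msgid.replace("\\", "\\\\").replace('"', '\\"')
        out.write('#: {}\nmsgid "{}"\nmsgstr ""\n\n'.format(ref, escaped))
    return 0
```

Because nothing is written before the check, a run that fails never leaves a partial pot behind.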
tamanegi
Posts: 161
Joined: August 25th, 2014, 11:38 am
Location: Japan

Re: [project] wmlxgettext - python3 version

Post by tamanegi »

It's a great project! :D

IMO, the second rule of scenario 1 (raise an error on a closing tag when there is nothing to close) sounds too strict. Such a strict rule may be reasonable for scenario WML files, but less so for macro definition files. Unbalanced but still valid WML tags can appear in macros. (Such macros are not really preferable, though.) For example, the ABILITY_FEEDING macro in the core package's abilities.cfg (core/macros/abilities.cfg) would be rejected by that rule if the directive (# wmlxgettext: [abilities]) did not exist. Forcing WML authors to add such directives in those cases is a little bit annoying. I guess most authors (including me) don't know those directives well.

For the "Last thing", I basically agree with your opinion. That strict rule may be the desirable default. But output of type A and/or B may be useful in some cases. Providing options for such uses would not be much of a disadvantage.

# I cannot imagine some situations. :augh: Can you provide sample or pseudo code? It must be helpful...

In addition, I'd like you not to raise an error when encountering an empty string (_"") in your new wmlxgettext. I don't know why the current perl implementation raises an error. Just showing a warning message, like xgettext does, may be enough, IMO.
Discord: @tamanegi
It is true that we cannot be free from bugs, but at least let our bugs not always the same...
A Group in a War: my first campaign, An Independence War: and the sequel
Nobun
Code Contributor
Posts: 129
Joined: May 4th, 2009, 9:28 pm
Location: Italy

Re: [project] wmlxgettext - python3 version

Post by Nobun »

About the _"" case:
I think there is a reason why wmlxgettext currently returns an error... we could certainly consider ignoring that string, generating a warning, and proceeding with pot generation... but we would need to know (perhaps from a developer) why the script currently returns a critical error in this case.

About unbalanced but still valid WML:
this situation will have a dedicated solution, the one already used by the perl wmlxgettext script.
And the solution is... the special comment # wmlxgettext:

If you check ABILITY_FEEDING you will notice it uses this special comment, which works this way:

you have a closing tag [/abilities] that is unbalanced on purpose.
But, on the previous line, you can see a # wmlxgettext: [abilities]
wmlxgettext will treat whatever follows "wmlxgettext:" as if it were valid WML code, even though it is not actual WML code.
In this way we tell wmlxgettext "hey... open a dummy abilities node", so it can close that abilities node successfully, without breaking wmlxgettext's internal rules, while allowing the developer to use unbalanced tags on purpose.
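
In a line-by-line parser this can be as small as rewriting the line before it reaches the tag handling. The sketch below is an assumption about how the python3 version might do it; the name `effective_wml` and the tolerance for optional spaces after `#` are hypothetical.

```python
import re

# Matches the special comment and captures whatever follows the colon.
DIRECTIVE = re.compile(r"^\s*#\s*wmlxgettext:\s*(.*)$")


def effective_wml(line):
    """Return the text the parser should treat as WML for this line."""
    match = DIRECTIVE.match(line)
    if match:
        # The comment body is parsed as if it were real WML.
        return match.group(1).strip()
    return line
```

With this in place, `# wmlxgettext: [abilities]` reaches the tag handling as `[abilities]`, so the following `[/abilities]` closes a node that the parser believes is open.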

EDIT:
"I cannot imagine some situations. :augh: Can you provide sample or pseudo code? It must be helpful..."
I'm not sure I understood your question. Do you mean pseudo-WML-code examples for scenario 2?