[mainline] there is a need for a en_US translation

Jarom
Posts: 110
Joined: January 4th, 2015, 8:23 pm
Location: Green Isle, Irdya or Poland, Earth - I'm not quite sure

Re: [mainline] there is a need for a en_US translation

Post by Jarom »

As a person who had been involved in Polish translation at some point, I can support the claim about minor changes breaking translations. I admit that I sometimes only skimmed texts marked as fuzzy and accepted the translation without changes, after several previous changes that amounted to no more than adding a comma, an article or a preposition, possibly missing an actually important addition like the aforementioned "Shorbear dwarves", just because it was unreasonable to scan the whole text looking for a single comma.
I'll go out on a limb and say that the majority of the current dev team, myself included, have approximately 0% understanding of what goes into translating. So, yes, if they're frustrated about something then they do need to communicate that. If all that happens is translation updates are wordlessly sent in, my assumption is simply "well, I guess it's not that bad".
Well, I'll voice my complaint while we're at it: campaigns get reinvented every major version, and until someone spends several hours, or even days, translating them again from scratch, there will be no translation. I don't think anything can or should be done about it though, it's part of the progress.
I'm not so much concerned about the double work (although it is more work for writers and editors, as I noted above) as I am about the fact that there will be two different English texts with neither one clearly and unambiguously the actual source text. I think this will make things more difficult for translators rather than less. For example, consider the French translation team. Previously, they really only needed to worry about maintaining the fr.po file. Now, there will likely be cases where the English strings in the fr.po file are out of date (as I explained above), so the translators will have to inspect 3 different files - their own fr.po file, the en_US.po file, and the .wml source file - to try to reconcile the differences between the three texts (the French text, the en_US text, and the simplified English text in the WML).
You're taking it wrong. There is usually only one file to inspect: your own .po, because all source strings are already there, and they are the actual source text you have to translate, taken directly from the wml and lua files. You only look at the wml files when you're unsure what the context is (i.e. you haven't played this campaign or haven't encountered this particular string). In this case adding en_US would only provide an alternative way to look at the text, possibly providing some disambiguation, not an actual source you need to check. As a bonus I'll mention the probably well-known fact that various programs often don't have the actual strings in their translation files like Wesnoth does, only a translation key, an obscure label serving as an id, so you have to look for the English text in a different file. I wouldn't opt for such a change here though.
Man, you're the native speaker here. When I say The meaning of the sentence hasn't changed, you think this is not an accurate statement?
All the break-down that follows just shows a change of meaning of the writer, but the sentence meaning hasn't changed to me. All the complex references to "townsfolk", "yokels" is probably lost to a large number of translators, thus unlikely to be found in any translation.
Actually I'd say that while the literal meaning remains the same, the overall meaning does change and could be translated differently in many cases, reflecting the circumstantial differences of the speakers and their personalities. The practicality of such a solution is disputable though, as putting work into making sentences more obscure to speakers of the common variant of the language (I mean, not some countryside lingo) is like putting the cart before the horse.
Yes there's some small issues on the dev side with forgetting to update the WML file when necessary but on the flip side this would be really useful for me. I've actually hesitated to add more "flavor" to the text in some areas like Liberty because I was worried people would be completely unable to translate it. With this, I could go crazy with the en_US translation while leaving a more basic broadly understandable version in the WML file.
I just want to remind everyone once again, that missing translations get replaced by source text. This is probably the most important reason to create en_US for "flavored" text. For just the comma issue, it would be much better to have someone check incoming changes in .pot source, and somehow un-fuzzy insignificant changes in all translations, not to violate DRY and maintain two different files. Also languages that have little or no translations at all might have speakers knowing some degree of English, but not all the intricacies and lingo.
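The fallback rule mentioned above can be illustrated with a small sketch. This is not Wesnoth's engine code: `displayed_text` and the sample catalogues are hypothetical, and the only assumption is the usual gettext convention that a missing or empty translation falls back to the source string.

```python
def displayed_text(catalogue, msgid):
    """Return the translation if one exists, otherwise the source string."""
    translated = catalogue.get(msgid, "")
    # An empty msgstr counts as "not translated", so the msgid is shown.
    return translated if translated else msgid


# Hypothetical catalogues: en_US carries the "flavored" wording,
# while an incomplete translation has no entry for this string.
source = "I cannot believe you let that necromancer corrupt you."
en_us = {source: "I can't believe ya let that necromancer corrupt you."}
incomplete = {}

flavored = displayed_text(en_us, source)
fallback = displayed_text(incomplete, source)
```

Under this rule, players of a language without a translation see the plain source text rather than the flavored en_US wording.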
demario
Posts: 131
Joined: July 3rd, 2019, 1:05 pm

Re: [mainline] there is a need for a en_US translation

Post by demario »

Jarom wrote: April 9th, 2022, 7:47 pm As a person who had been involved in Polish translation at some point, I can support the claim about minor changes breaking translations.
[...]

I just want to remind everyone once again, that missing translations get replaced by source text. This is probably the most important reason to create en_US for "flavored" text. For just the comma issue, it would be much better to have someone check incoming changes in .pot source, and somehow un-fuzzy insignificant changes in all translations, not to violate DRY and maintain two different files. Also languages that have little or no translations at all might have speakers knowing some degree of English, but not all the intricacies and lingo.
Much thanks Jarom, nothing better than the hard earned experience of a translator. :eng:


Let me summarise where we are now:
  • Development team has been informed of the original problem
  • Problem was confirmed by one independent translator
  • Three ways of fixing the problem are considered:
    • Improve communication to translation-teams to inform of nature of strings changes (but we must maintain such a list independently)
    • Build a tool to automatically (or someone to) clear before translation the "fuzzy" flag of strings that are changed to improve text flavor in US English
    • Use a en_US translation
  • It is time for action...
A en_US translation for 1.16.3 version Sceptre of Fire:

As octalot has pointed out, there are many changes to the text from Sceptre of Fire in the pipe that would repeat this problem.

I have created a en_US.po file for wesnoth-sof domain that I am putting in attachment. This is the translation of strings corresponding to 1.16.2 version of BFW (git revision a623cc25) where the en_US translation corresponds to the 1.16.3 text (current rev 6a37f9f). The strings that have been changed are still in fuzzy state to highlight the change.

Using it as-is is a bit of trouble: you need to clear the fuzzy flags, change po/LINGUAS to define the en_US language, generate an en_US.po for each text domain (they can be blank), and then run the compilation chain again (to generate the .mo file that BFW needs).
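As a rough illustration of the first step only, here is a minimal sketch that strips the fuzzy flag from the text of a .po file. It assumes the standard PO layout where flags sit on lines starting with `#,`; the function name `clear_fuzzy` is made up for this example, and it deliberately leaves any `#|` previous-msgid comments untouched.

```python
def clear_fuzzy(po_text):
    """Remove the 'fuzzy' flag from every '#,' flag line of a PO file."""
    out = []
    for line in po_text.splitlines():
        if line.startswith("#,"):
            flags = [f.strip() for f in line[2:].split(",")]
            flags = [f for f in flags if f and f != "fuzzy"]
            if not flags:
                # 'fuzzy' was the only flag, so drop the whole line.
                continue
            line = "#, " + ", ".join(flags)
        out.append(line)
    # Preserve the trailing newline if the input had one.
    return "\n".join(out) + ("\n" if po_text.endswith("\n") else "")
```

Other flags on the same line (for example `c-format`) are kept, since only the fuzzy state is meant to change.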

Alternatively you can use your current copy of BFW 1.16.2 (or go back to rev a623cc25 on the BFW 1.16 branch), unzip the attached wesnoth-sof.mo file and copy it to a new directory translations/en_US/LC_MESSAGES/, and you should be good to go. The fuzzy strings have already been cleared, and on the first dialog of the campaign you should see "Ay, the Sceptre of Fire. The Sceptre [...]" instead of the original "Ay, the Sceptre of Fire. The sceptre [...]". Yep, that's the kind of change we are talking about. :whistle:

If it works for you, you can play the campaign with the most updated US English, without feeling any guilt that what you are enjoying is making the experience miserable for non-native US English wesnoth players. :twisted:

Benefits of a en_US translation:
  • All the infrastructure already exists
  • It can be implemented on a text-domain basis (for example on [some] campaigns only)
  • Could potentially make the translation easier by using more simple text
  • It can be tested right on
Attachments
wesnoth-sof.mo.gz
Compiled en_US translation for 1.16.3 version Sceptre of Fire
(47.18 KiB) Downloaded 136 times
Pentarctagon
Project Manager
Posts: 5561
Joined: March 22nd, 2009, 10:50 pm
Location: Earth (occasionally)

Re: [mainline] there is a need for a en_US translation

Post by Pentarctagon »

To be clear, I think more communication from the translation teams is desirable entirely independently of anything else that happens.
99 little bugs in the code, 99 little bugs
take one down, patch it around
-2,147,483,648 little bugs in the code
octalot
General Code Maintainer
Posts: 786
Joined: July 17th, 2010, 7:40 pm
Location: Austria

Re: [mainline] there is a need for a en_US translation

Post by octalot »

Jarom wrote: April 9th, 2022, 7:47 pm For just the comma issue, it would be much better to have someone check incoming changes in .pot source, and somehow un-fuzzy insignificant changes in all translations, not to violate DRY and maintain two different files.
I've just realised it's not enough to just blindly unfuzzy them; that works if the file was 100% translated before, but if it wasn't then the logic ought to preserve the fuzzy status of the old string. So my plan to send out an email saying "here's the 4 significant changes in wesnoth-sof, 2 more are insignificant changes but the system thinks they're new strings, and for the other 94 you can simply unset the flag" doesn't work; and yes, while I was thinking it would work that way I added another 18 insignificant changes to the batch.
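The "preserve the fuzzy status of the old string" logic could look roughly like the sketch below. It is only an illustration: `Entry`, `carry_over` and the `renames` mapping (new msgid to old msgid, for changes judged insignificant) are hypothetical names, not part of any existing Wesnoth tool.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    msgstr: str
    fuzzy: bool = False


def carry_over(old, renames, new_msgids):
    """Build the new catalogue, reusing old translations where possible.

    old:        dict of old msgid -> Entry
    renames:    dict of new msgid -> old msgid (insignificant changes)
    new_msgids: msgids present in the updated .pot
    """
    result = {}
    for msgid in new_msgids:
        if msgid in old:
            result[msgid] = old[msgid]
        elif renames.get(msgid) in old:
            prev = old[renames[msgid]]
            # Copy the old state as-is: a fuzzy or empty entry stays
            # fuzzy or empty instead of being blindly marked as done.
            result[msgid] = Entry(prev.msgstr, prev.fuzzy)
        else:
            result[msgid] = Entry("", False)
    return result
```

The point is that an entry which was already fuzzy (or untranslated) before the insignificant change keeps that state afterwards.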
demario wrote: April 10th, 2022, 10:41 am I have created a en_US.po file for wesnoth-sof domain that I am putting in attachment.
The game engine can load .po files for mainline campaigns too, so it doesn't need the .mo file to be generated, and it doesn't need the LINGUAS file to be edited. However, it doesn't (yet) have an option to load the .po files from the directories where they're kept in source control.
  • Find the directory called translations, under which the .mo files are stored. For example, there's a translations/fr/LC_MESSAGES/wesnoth-sof.mo file.
  • If there's already a translations/en_US/LC_MESSAGES/wesnoth-sof.mo file, delete it. Otherwise the engine will load the .mo instead (I think, haven't tested that).
  • Create a directory translations/wesnoth-sof. Yes, translations should now contain one folder named after a campaign and about fifty named after languages.
  • Unzip demario's file and rename it to translations/wesnoth-sof/en_US.po.
  • Start Wesnoth with the command-line argument --language en_US.
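The file shuffling in the steps above can be scripted. The following is a minimal sketch, assuming you pass in the path to your own translations directory and to the already unzipped .po file; `install_po` is a hypothetical helper name, and the default textdomain and locale simply mirror the example in this thread.

```python
import shutil
from pathlib import Path


def install_po(translations_dir, po_file, textdomain="wesnoth-sof", locale="en_US"):
    """Place a .po file where the steps above say the engine looks for it."""
    translations_dir = Path(translations_dir)

    # Delete an existing compiled catalogue so it cannot shadow the .po.
    stale_mo = translations_dir / locale / "LC_MESSAGES" / f"{textdomain}.mo"
    if stale_mo.exists():
        stale_mo.unlink()

    # translations/<textdomain>/<locale>.po
    target_dir = translations_dir / textdomain
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{locale}.po"
    shutil.copyfile(po_file, target)
    return target
```

After that, Wesnoth still has to be started with --language en_US as described in the last step.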
Soliton
Site Administrator
Posts: 1681
Joined: April 5th, 2005, 3:25 pm
Location: #wesnoth-mp

Re: [mainline] there is a need for a en_US translation

Post by Soliton »

gnombat wrote: April 8th, 2022, 2:28 pm I.e., why is there so much churn occurring in the translatable strings? (Why are there 80 changes to a single campaign in a bugfix release of the stable branch?)
That is indeed a good question. Either way though if those text changes do not actually change the meaning the pofix tool should be used to adjust all translations so that there is no need for translators to do anything. There is no new technical solution needed.
"If gameplay requires it, they can be made to live on Venus." -- scott
octalot
General Code Maintainer
Posts: 786
Joined: July 17th, 2010, 7:40 pm
Location: Austria

Re: [mainline] there is a need for a en_US translation

Post by octalot »

The pofix tool hasn't been part of the workflow for a long time (I'm saying that based just on the set of strings that are in it). If pofix is the answer, please could you walk us through how it should be used for the scale of changes that are happening?

I can see it working well on the conversion of ASCII apostrophes to their typographical versions, which will reduce some of the burden. However, handling commas and many of the capitalisation changes looks like it's going to need line-by-line cutting and pasting from a diff of the .pot files. The tool doesn't unwrap the word-wrapped lines in .pot files, which is why the cut&paste would need to be from a diff of the .pot files rather than the diffs of the .cfg files; either way, it's a lot of work.
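For the word-wrapping part, here is a small sketch of how the wrapped lines of one msgid could be joined back into a single string before comparing against the .cfg text. It is not part of pofix; `unwrap_po_string` is a hypothetical helper, and it leaves escape sequences such as `\n` or `\"` exactly as they appear in the file.

```python
import re


def unwrap_po_string(lines):
    """Join the quoted pieces of one wrapped msgid/msgstr into one string."""
    parts = []
    for line in lines:
        # Take everything between the first and the last double quote.
        match = re.search(r'"(.*)"\s*$', line)
        if match:
            parts.append(match.group(1))
    return "".join(parts)


wrapped = [
    'msgid ""',
    '"Ay, the Sceptre of Fire. "',
    '"The Sceptre [...]"',
]
joined = unwrap_po_string(wrapped)
```

This only removes the wrapping; any cutting and pasting of the old and new strings into a fix list would still be manual.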

Handling Nemaara's conversion of Liberty to rural slang seems completely outside the scope of pofix.
demario
Posts: 131
Joined: July 3rd, 2019, 1:05 pm

Re: [mainline] there is a need for a en_US translation

Post by demario »

octalot wrote: April 10th, 2022, 6:00 pm [...]
  • Unzip demario's file and rename it to translations/wesnoth-sof/en_US.po.
  • Start Wesnoth with the command-line argument --language en_US.
Interesting undocumented feature. It would work only after the fuzzy flags are cleared though.
Soliton wrote: April 10th, 2022, 7:14 pm
gnombat wrote: April 8th, 2022, 2:28 pm Why are there 80 changes to a single campaign in a bugfix release of the stable branch?
That is indeed a good question.
Oh you're a non-native speaker that doesn't consider improving grammar in the US English text as a bug fix, right? :lol: That's why it is done.
There are benefits of seeing US English as just another language. The speakers of this language have a right to the best reading experience too. :mrgreen:

Only the new translations should be put in an update of a stable release. So that the experience for players in all languages improves from fix releases. :doh:
And that's why these fixes should be done as part of a en_US translation update instead. :eng:
loonycyborg
Windows Packager
Posts: 295
Joined: April 1st, 2008, 4:45 pm
Location: Russia/Moscow

Re: [mainline] there is a need for a en_US translation

Post by loonycyborg »

Having a separate textdomain for text cleanup sounds nice, yes, though still a bit hackish. Just how in sync with the original text strings is it supposed to be? Will it be synced before a new stable release or on some other schedule?
"meh." - zookeeper
octalot
General Code Maintainer
Posts: 786
Joined: July 17th, 2010, 7:40 pm
Location: Austria

Re: [mainline] there is a need for a en_US translation

Post by octalot »

The aim would be strings that only change if the meaning changes. So they wouldn't be synced up with the en_US translation.
Pentarctagon
Project Manager
Posts: 5561
Joined: March 22nd, 2009, 10:50 pm
Location: Earth (occasionally)

Re: [mainline] there is a need for a en_US translation

Post by Pentarctagon »

So if:
  1. Yumi (SP content maintainer) thinks having an en_US translation would be beneficial.
  2. Nobody else making significant text contributions to mainline has objections to an en_US translation.
  3. Translators think having an en_US translation would be beneficial.
then it sounds like that's the way to go.

That said though, I'm not too clear on where #3 stands, or what would even be a good way to try and get a better answer for it.
loonycyborg
Windows Packager
Posts: 295
Joined: April 1st, 2008, 4:45 pm
Location: Russia/Moscow

Re: [mainline] there is a need for a en_US translation

Post by loonycyborg »

octalot wrote: April 11th, 2022, 7:48 pm The aim would be strings that only change if the meaning changes. So they wouldn't be synced up with the en_US translation.
What if someone changes a base string with the intent to change its meaning, as part of adding features and the like? Will they be expected to update the en_US translation on their own too?
octalot
General Code Maintainer
Posts: 786
Joined: July 17th, 2010, 7:40 pm
Location: Austria

Re: [mainline] there is a need for a en_US translation

Post by octalot »

Excluding the special case of wanting to do a regional accent, I think people only need to change en_US.po if that exact base string already exists in en_US.po:
en_US_flowchart.png
Graphviz source for image:
demario
Posts: 131
Joined: July 3rd, 2019, 1:05 pm

Re: [mainline] there is a need for a en_US translation

Post by demario »

octalot wrote: April 12th, 2022, 12:39 am [stuff]
Wow, pretty awesome octalot.

A couple of comments:
  • "Is it meant to be a dialect(Texan drawl, etc)?"
    I think this case in your diagram refers to a new string (as a simple English needs to be put in the WML).
    I am not sure we should go to that level of cleanness. I think at the first version of a campaign, writers should be able to input any text that meets the standard for content in wesnoth. If it is in dialect/slang/... I am sure the translators will do all the necessary research/guesses/shortcuts to come up with a translation. The goal here is for this translation to stay valid after the original text is "improved".
  • "Is it a new string?"
    Right, when should that process be applied? For me, any change can be made to text in WML until the text enters the first string-freeze for the first stable release. From then on, only qualified (see later) changes can be made to text in WML (including in the development branch). Other changes go in po files.
  • "Has the text changed meaning?"
    That is the biggest problem here. As long as we can find native speakers who see difference of meaning between:

    «By all rights, I should have you executed on the spot, Malin. I cannot believe you let that necromancer corrupt you.»
    «By all rights, Malin, I should have ya kill’d on tha spot. I can’t believe ya let that necromancer corrupt you.»
    we will always have the same kind of problem. We should avoid arguments about how changing ", " to ". ", ". " to "! ", ". " to "...", "sceptre" to "Sceptre" are changes of meaning (hence changing the WML, thus breaking the translations). So we need a harder check to decide if the changes are qualified to be done to WML in .cfg files.

    I would go for something like "Is the change of text reported in a github issue? Was it accepted?".
    If it is not worth reporting a bug, it is definitely not worth breaking translations.
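To make the "harder check" idea concrete, here is a rough sketch that flags changes consisting only of punctuation, letter case or whitespace differences. The function names are hypothetical and the rule is an assumption for illustration; whether such changes really leave the meaning intact is disputed in this thread, so the result is best treated as a list of candidates for human review.

```python
import unicodedata


def normalise(text):
    """Lower-case the text and drop all punctuation and extra whitespace."""
    kept = [
        ch.lower()
        for ch in text
        if not unicodedata.category(ch).startswith("P")
    ]
    return " ".join("".join(kept).split())


def is_cosmetic_change(old, new):
    """True if old and new differ only in punctuation, case or spacing."""
    return old != new and normalise(old) == normalise(new)


case_only = is_cosmetic_change(
    "Ay, the Sceptre of Fire. The sceptre",
    "Ay, the Sceptre of Fire. The Sceptre",
)
dialect = is_cosmetic_change(
    "I should have you executed on the spot, Malin.",
    "I should have ya kill'd on tha spot.",
)
```

With this rule the "sceptre" to "Sceptre" change counts as cosmetic, while the dialect rewrite does not.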
[edit]
OMG I made a diagram too
en_US.dot.png
[/edit]
Last edited by demario on April 12th, 2022, 11:07 am, edited 1 time in total.
octalot
General Code Maintainer
Posts: 786
Joined: July 17th, 2010, 7:40 pm
Location: Austria

Re: [mainline] there is a need for a en_US translation

Post by octalot »

The tools for building .po files seem to have an unexpected feature - when building en_US.po or en_AU.po, they automatically fill in the translated text. The same doesn't happen for fr_AU.po. Once the .po file has been edited, they don't overwrite translations, it's just the initial creation step which is surprising, and I don't see options to control it.

Poedit warns that the source and destination languages are the same when editing an en_US.po, but not an en_AU.po. No idea where Poedit is getting "the source is en_US" from, it doesn't seem to be in the file itself.

These aren't insurmountable problems, but they suggest there may be more surprises ahead.
Celtic_Minstrel
Developer
Posts: 2194
Joined: August 3rd, 2012, 11:26 pm
Location: Canada

Re: [mainline] there is a need for a en_US translation

Post by Celtic_Minstrel »

demario wrote: April 8th, 2022, 11:44 pm All the complex references to "townsfolk", "yokels" is probably lost to a large number of translators, thus unlikely to be found in any translation.
That might be true, but that makes the resulting translation a poor translation. That nuance may be subtle but it does mean something, and a good translation would alter the translated text to give a similar nuance. For example, dialectal language is commonly translated to a dialect of the target language that has similar connotations to speakers of that language.

In order to do a good job, especially on story prose, a translator needs to have a certain level of fluency in both the source and target languages. If Wesnoth's translators can't speak English fluently, then they're making things unnecessarily difficult for themselves. If they still want to translate and do a good job, they should spend some time studying English to improve their fluency. (That said, there is something to be said for having a basic translation as well; at least, it's usually better than no translation as long as it's been proofread by a native speaker of the target language.)

This isn't such a big deal for user interface text. If there are multiple translators and some are less fluent in English, I'd recommend the less fluent ones focus on the shorter user interface strings.

Note that translation is not a science. It's an art in and of itself. The translator needs to rewrite the entire text in their target language, and do so in a way that gives speakers of the target language as similar an impression as possible as speakers of the source language would get from the source text.
demario wrote: April 10th, 2022, 10:41 am Use a en_US translation
First of all, I don't think this is the correct approach. In order to get an accurate translation, all translators should work off the source text, which in Wesnoth's case is the en_US text. If you instead set simple English as the source with an en_US translation, then translators are translating the wrong source and will produce a less accurate translation. All that flavour that people spent time putting into the English text would just be missing from the translated texts… and you might even find non-English players complaining that the writing is boring.
demario wrote: April 10th, 2022, 10:41 am Build a tool to automatically (or someone to) clear before translation the "fuzzy" flag of strings that are changed to improve text flavor in US English
This tool already exists to avoid fuzzying strings that contain only typo fixes and other changes that do nothing to the meaning (such as the letter case change you highlighted in SoF). However, it would be incorrect to use it on a string where flavour was added, for example by using dialectal vocabulary. It would also be incorrect to use it on most punctuation changes, as those usually alter the meaning subtly as well. So, I'm afraid this solution can't actually solve the problem of string churn. It can only alleviate it a little.

There may be another tool-based approach that could help, though. I think the idea that fuzzy strings are simply not shown (falling back to the source text) is not necessarily the correct choice with a complex prose-based game like Wesnoth. We could have some tool that defuzzes strings satisfying certain criteria as part of the release process, so that the translator will receive the properly-fuzzied strings, but some of those fuzzy strings will actually be shown in the release even if the translator does not update them. This might be harder than it sounds, as those "certain criteria" could end up being quite complex, and indeed I'm not sure the approach is viable (perhaps those criteria are just too complex to automate it). But it could be something to think about.
demario wrote: April 10th, 2022, 10:41 am Improve communication to translation-teams to inform of nature of strings changes (but we must maintain such a list independently)
So ultimately I think the only thing we can really do is produce a "translations changelog" for each release. I don't know what exactly this would look like, but the purpose would be to communicate to the translators which fuzzy strings they should prioritize for updates and which ones would not lose much if they did not update them (merely clearing the fuzzy status).
demario wrote: April 10th, 2022, 10:41 am Benefits of a en_US translation:
  • All the infrastructure already exists
  • It can be implemented on a text-domain basis (for example on [some] campaigns only)
  • Could potentially make the translation easier by using more simple text
  • It can be tested right on
So basically, #4 is not a benefit. It is a net negative, which will actually serve to make the translations worse.
octalot wrote: April 11th, 2022, 1:08 am The tool doesn't unwrap the word-wrapped lines in .pot files, which is why the cut&paste would need to be from a diff of the .pot files rather than the diffs of the .cfg files; either way, it's a lot of work.
Maybe it's worth updating the pofix tool to fix this?
octalot wrote: April 11th, 2022, 1:08 am Handling Nemaara's conversion of Liberty to rural slang seems completely outside the scope of pofix.
Yeah, that's definitely outside the scope of pofix.
Author of The Black Cross of Aleron campaign and Default++ era.
Former maintainer of Steelhive.