Machine translation in wesnoth using gplv3 apertium software

demario
Posts: 130
Joined: July 3rd, 2019, 1:05 pm

Post by demario »

I want to introduce another set of tools that can provide alternative Machine Translation (MT) services based on apertium.

First, this is a GPLv3 tool that can be installed locally and for which you get all the code. In practice, you get all the rules and dictionaries used during MT, and you're free to edit them! Second, the project is a spin-off of academic work with the explicit goal of focusing on less widely spoken languages. Finally, this is a multi-language project that doesn't revolve around US English.

People involved in translation have noticed how much MT has improved recently. Some wesnoth translation teams list DeepL, Google Translate and other MT tools as part of their current process. This gives good results because:
  • they translate from wesnoth original US English which is the best supported language
  • they often translate into other widely used languages
  • they do a human proof-reading of the MT result with manual edits when required
  • with experience, they can avoid translatable strings that are badly translated by MT (races, unit names...)
So, how can apertium be useful for wesnoth translation? It could help with:
  • MT from existing full translations in other widely used western languages
  • MT to western languages with fewer speakers (based on language proximity)
  • less need for human proof-reading, as MT is done between similar languages
  • the possibility of improving the MT rules and dictionaries, thanks to the GPLv3 license
Thanks to apertium's multi-language design, we can spread the contagion to other languages with fewer speakers: for example from cs to sk, from de to da, or from ru to uk, with no fallback to English. The theory is that MT could do a better job between two closely related languages than from US English. That could reduce the need for the final human check, which is difficult to get in languages with fewer speakers (see my failed attempt at getting a translation review).

The last problem with MT is that most tools are web-based and require copy-pasting the translatable strings one by one. That is another benefit of apertium: the local install integrates with pology, which can apply various processing steps to po files (the format used in wesnoth), including feeding them to apertium.

A full MT pass over a po file using pology/apertium basically looks like:

Code: Select all

pomtrans apertium -s ita -t srd -d /usr/local/share/apertium/apertium-sr-ita -p srd.:it. po/wesnoth-units/srd.po
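To run the same command over several domains, a small wrapper can help. The following is only a sketch: `build_pomtrans_cmd` and the `domains` list are hypothetical names introduced for illustration, and the flags (`-s`, `-t`, `-d`, `-p`) and their values are copied unchanged from the command above rather than checked against the pology documentation.

```python
import shlex


def build_pomtrans_cmd(po_file, source="ita", target="srd",
                       data_dir="/usr/local/share/apertium/apertium-sr-ita",
                       p_option="srd.:it."):
    """Build the argument list for one pomtrans run.

    All flags and default values are copied from the command in the post;
    this helper itself is hypothetical.
    """
    return [
        "pomtrans", "apertium",
        "-s", source,      # source language
        "-t", target,      # target language
        "-d", data_dir,    # apertium data directory, as given in the post
        "-p", p_option,    # value of -p, copied as-is from the post
        po_file,
    ]


# Domains mentioned in this thread; the po/<domain>/srd.po layout follows
# the path used in the original command.
domains = ["wesnoth-units", "wesnoth-help", "wesnoth-lib"]
commands = [build_pomtrans_cmd(f"po/{d}/srd.po") for d in domains]

for cmd in commands:
    # Print the shell command; to actually run it, pass cmd to
    # subprocess.run(cmd, check=True) on a machine with pology installed.
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to check them before letting anything touch the po files.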
[edit] Removed a link to a post on the wesnoth forum. The link was attached to "Some teams list DeepL, ..." and not to "they translate from ...".[/edit]
Last edited by demario on June 17th, 2023, 7:47 am, edited 3 times in total.
demario
Posts: 130
Joined: July 3rd, 2019, 1:05 pm

Post by demario »

When discussing apertium MT in wesnoth, I will work from the translation data for 1.14 and apply any MT to the wesnoth strings of that version.

The reason is twofold:
- as the MT output will most probably not be reviewed by a native speaker, I would like people who are interested in testing the MT results to take a deliberate action (like downloading the add-on containing the core 1.14), so that they use them in full knowledge of the limitations.
- out of respect for the work of translators in these languages, I want to give them a head start so that their translations are appreciated in their own right.

Several factors can affect the usefulness of apertium MT in wesnoth:
  • the maturity of the translation between a pair of languages. The apertium project maintains several maturity levels (trunk, staging, nursery, incubator), and each language pair is assigned to one of them. Translation between two languages in trunk will be better than between two languages in nursery. Most pairs in incubator were last updated long ago and sometimes have few commits. Activity around apertium seems to have taken a hit around 2021.
  • one of the two languages of the pair needs to be actively translated for BfW version 1.14. Besides English, wesnoth 1.14 is more or less fully translated into several European languages(*): cs, de, fr, it, es, ru.
  • for the other language of the pair, it is better to have some translation available for the domains that contain the generic game strings (#wesnoth, #wesnoth-units, #wesnoth-lib...).


As I said before, I will focus on pairs of languages that belong to the same (or a close) linguistic family. Using wikipedia as the reference for European languages, the family tree looks like:
Indo-European linguistic families
European languages available for translation: af bg ca da el et fi ga gd gl he hr hu is la lt lv mk nl nb_NO pl pt ro sk sl sr sv uk
IndoEuropeanTree.png
The European languages are grouped as follows:
wesnoth linguistic groups ("full" translation shown in bold)
Celtic insular: Irish, Scottish Gaelic, Breton, Welsh
Romance: Sardinian, Romanian
Italo-Dalmatian: Corsican, Italian
Iberian: Spanish, Galician, Portuguese
Gallic: Arpitan, French, Catalan, Occitan
Germanic: Icelandic, Norwegian, Danish, Swedish
Central-Upper German: German, Luxembourgish
Low Franconian: Dutch, Afrikaans
Anglo-Frisian: English, Scots
Balto-Slavic: Latvian, Lithuanian
East Slavic: Russian, Ukrainian
West Slavic: Polish, Czech, Slovak
South Slavic: Slovene, Croatian, Serbian
Eastern South Slavic: Bulgarian, Macedonian
Uralic: Hungarian, Finnish, Estonian
Standalone: Albanian, Armenian, Greek, Basque
So, taking the apertium language pairs into account, we have the following paths for spreading translations in wesnoth:
apertium translation paths

### Direct paths
spa - cat (trunk)
- arg (Aragonese)
- mlt (Maltese) (incubator)
(surprisingly no pair spa - por)

fra - cat (trunk)
- oci (trunk)
- frp (Arpitan)
- por (staging)
- ron (Romanian) (incubator)

ces - pol (staging)
- slk (incubator)

ita - srd (trunk)
- cos (Corsican) (incubator)
- slv (Slovenian) (incubator)

rus - ukr (trunk)
- kaz (trunk)
- tat (nursery)
- hbs (Serbo-Croatian) (nursery)
- fin (incubator)

tur - uzb (nursery 2021)
- kir (nursery 2021)
- aze (nursery 2021)
- uig (incubator)
- tuk

eng - cym (Welsh) (trunk)
- gle (Irish) (incubator)
- sco (incubator)

deu - nld (incubator)
- dan (incubator)
- swe (incubator)
- ltz (incubator)

### 2-step paths
(this lists the output of the direct path that can be used to extend to a third language)
slv - hbs

pol - ukr
- lvs
- lt

swe - fin
- nor
- isl

cat - por (trunk)
bul - mkd (Macedonian)
bre - cym (incubator)
ga (Irish) - gd (Scottish Gaelic) (nursery)
nld - afr

### No path
ar eo eu id he ko mr my tl vi
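The path listing above is effectively a small directed graph, so the direct and 2-step targets can be computed rather than read off by hand. Here is a rough sketch: `PAIRS` holds only a subset of the pairs listed above, `reachable` is a hypothetical helper, and it assumes each pair is usable in the direction it is listed (source first), which is not verified against the actual apertium packages.

```python
from collections import deque

# Subset of the pairs from the listing above: source -> direct targets.
# Direction is assumed to be "as listed"; real apertium pairs may be
# one-way or two-way.
PAIRS = {
    "ces": ["pol", "slk"],
    "pol": ["ukr", "lvs", "lt"],
    "ita": ["srd", "cos", "slv"],
    "slv": ["hbs"],
    "deu": ["nld", "dan", "swe", "ltz"],
    "swe": ["fin", "nor", "isl"],
    "nld": ["afr"],
    "fra": ["cat", "oci", "frp", "por", "ron"],
    "cat": ["por"],
}


def reachable(source, max_hops=2):
    """Return {target: hops} for languages reachable within max_hops.

    Breadth-first search, so each target gets its shortest hop count.
    """
    hops = {source: 0}
    queue = deque([source])
    while queue:
        lang = queue.popleft()
        if hops[lang] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in PAIRS.get(lang, []):
            if nxt not in hops:
                hops[nxt] = hops[lang] + 1
                queue.append(nxt)
    del hops[source]  # the source itself is not a target
    return hops


for src in ("ces", "ita", "deu", "fra"):
    print(src, sorted(reachable(src).items()))
```

Every extra hop is another MT pass stacked on the previous one, so a hop count of 2 should be read as "possible", not as "good quality".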
Finally, taking the apertium maturity levels into account, we end up with two kinds of pairs that could be useful:
  • translation into a language currently not available in wesnoth:
    French - Arpitan ; French - Occitan ; Italian - Sardinian ; Russian - Kazakh ; English - Welsh ; Spanish - Aragonese

    As the target language is not present in wesnoth, all wesnoth-specific words will remain untranslated (i.e. identical to the string in the source language), so either the word must be close enough in the target language or the target audience needs to be bilingual.
    The second problem is that the language is not defined in wesnoth, so even if the translation is generated, it can't be selected in the language selection dialog.
  • translation into languages currently present in wesnoth:
    (in parentheses, translation percentage for mainline core domains in BfW 1.14)
    French/Spanish - Catalan (98.98%) ; Czech - Polish (93.74%) ; Russian - Ukrainian (93.07%) ; French - Portuguese (87.66%)
--
(*) the same reasoning can be applied from tr to various Central Asian languages.
Last edited by demario on September 10th, 2023, 11:48 pm, edited 10 times in total.
octalot
General Code Maintainer
Posts: 777
Joined: July 17th, 2010, 7:40 pm
Location: Austria

Post by octalot »

demario wrote: June 4th, 2023, 12:55 pm Some wesnoth translation teams list DeepL, Google Translate and other MT tools as part of their current process. This gives good results because:
  • they translate from wesnoth original US English which is the best supported language
  • they often translate into other widely used languages
  • they do a human proof-reading of the MT result with manual edits when required
  • with experience, they can avoid translatable strings that are badly translated by MT (races, unit names...)
I get a different interpretation from the post that you've linked to. I think Michal- is doing a human translation, and then sometimes using MT as a sanity-check to compare to, rather than using the MT as the actual translation.
Lord-Knightmare
Discord Moderator
Posts: 2337
Joined: May 24th, 2010, 5:26 pm
Location: Somewhere in the depths of Irdya, gathering my army to eventually destroy the known world.

Post by Lord-Knightmare »

I wanted to use this to help my effort to translate the game into BN, but I see that it's not in the supported languages list, so I will use one where it's supported (shown as an option at least).
demario
Posts: 130
Joined: July 3rd, 2019, 1:05 pm

Post by demario »

demario wrote: we end up with two kinds of pairs that could be useful:
  • translation into a language currently not available in wesnoth:
    Italian - Sardinian ; ...
So I have put up an experimental Sardinian translation of wesnoth in the core 1.14 ("Bienvenue"). It is machine-translated with the apertium platform from the Italian text. The four campaigns in "Bienvenue à Wesnoth ! (Welcome to Wesnoth)" are also available in that language if you download the add-on and load the core "Bienvenue (1.14)". The translation has not been reviewed.
To see the Sardinian translation, you will have to select the Burmese (mranmabhasa) language, which stands in for it, after starting BfW with the option --all-translations (or edit the file data/languages/my_MM.cfg to raise percent=0 above 80).

I have attached the BfW 1.14 #wesnoth-help domain in Sardinian (all strings MTed; no fuzzy, no 'mtrans' marker) for those who want to check it out without the boilerplate.
demario wrote:
  • translation into languages currently present in wesnoth:
    (in parentheses, translation percentage for mainline core domains in BfW 1.14)
    French/Spanish - Catalan (98.98%) ; ...
I have used apertium to "complete" the Catalan translation of the #wesnoth-help domain from BfW 1.14 (out of personal convenience, I worked from French). I picked this domain because it is somewhat less specific to wesnoth, so the vanilla apertium fra-cat pair has a better chance of giving usable results. From checking the results, I can see some words left untranslated from French (copiage, collage...) that I'll need to fix later, but it still looks like a foreign language to me lol.
You can check it out in the attached Catalan translation (the 42 MTed strings are marked as fuzzy, since the original 'mtrans' marker doesn't show up).
Attachments
wesnoth-1.14.17.po.wesnoth-help.ca.po.gz
BfW 1.14 wesnoth-help m-translated in Catalan (from French)
(110.11 KiB) Downloaded 35 times
wesnoth-1.14.17.po.wesnoth-help.srd.po.gz
BfW 1.14 wesnoth-help m-translated in Sardinian
(101.57 KiB) Downloaded 31 times
Last edited by demario on June 17th, 2023, 8:08 am, edited 1 time in total.
Michal-
Posts: 5
Joined: January 18th, 2021, 10:16 pm
Location: Czechia

Post by Michal- »

octalot wrote: June 5th, 2023, 10:53 am I get a different interpretation from the post that you've linked to. I think Michal- is doing a human translation, and then sometimes using MT as a sanity-check to compare to, rather than using the MT as the actual translation.
You are right.

For example, I discovered many pop culture references in the DiD achievements that were previously unknown to me, and to translate them accurately I had to look them up and watch the movies in Czech. Using MT first in this case would probably have hidden some of them.

But I have tried a different approach recently: automatically converting untranslated messages to fuzzies using msgattrib, potrans (DeepL) and msgmerge, and then translating these DeepL fuzzies, which is much quicker and very tempting.
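The idea behind that workflow can be illustrated in memory, without the real tools. This is only a sketch of the principle, not of the actual msgattrib/potrans/msgmerge commands: `prefill_with_mt`, `fake_mt` and the dictionary layout of the entries are all hypothetical, and a real setup would call an MT service instead of `fake_mt`.

```python
def prefill_with_mt(entries, translate):
    """Fill untranslated entries with MT output and flag them fuzzy.

    Entries that already have a translation are left untouched, so human
    work is never overwritten.
    """
    for entry in entries:
        if entry["msgstr"]:
            continue  # already translated: keep as-is
        entry["msgstr"] = translate(entry["msgid"])
        if "fuzzy" not in entry["flags"]:
            entry["flags"].append("fuzzy")  # mark for human review
    return entries


def fake_mt(text):
    # Placeholder standing in for a real MT call (hypothetical).
    return f"[MT] {text}"


# Tiny hand-made catalog: one translated entry, one untranslated.
catalog = [
    {"msgid": "Attack", "msgstr": "Útok", "flags": []},
    {"msgid": "Recruit", "msgstr": "", "flags": []},
]

prefill_with_mt(catalog, fake_mt)
for entry in catalog:
    print(entry["msgid"], "->", entry["msgstr"], entry["flags"])
```

Because the MT output is only ever stored as fuzzy, it stays a suggestion until the translator confirms or rewrites it.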