[accepted, implemented] Context-free Grammar for unit names

skeptical_troll
Posts: 406
Joined: August 31st, 2015, 11:06 pm

Re: How about using Context-free Grammar to generate unit na

Post by skeptical_troll » April 8th, 2016, 7:05 pm

Just out of curiosity, is the current algorithm based on MCMC on letters, not syllables or blocks? If the length is the problem, isn't it possible to include it in the likelihood by hand, so that the probability of long names goes down?

Spixi
Posts: 52
Joined: August 23rd, 2010, 7:22 pm

Re: How about using Context-free Grammar to generate unit na

Post by Spixi » April 8th, 2016, 7:17 pm

skeptical_troll wrote: Just out of curiosity, is the current algorithm based on MCMC on letters, not syllables or blocks? If the length is the problem, isn't it possible to include it in the likelihood by hand, so that the probability of long names goes down?
You can find the current implementation in /src/race.cpp.

I wonder if there is already an existing library which does exactly the opposite of what GNU flex does.

Dugi
Posts: 4843
Joined: July 22nd, 2010, 10:29 am
Location: Carpathian Mountains

Re: How about using Context-free Grammar to generate unit na

Post by Dugi » April 8th, 2016, 7:49 pm

skeptical_troll wrote: Just out of curiosity, is the current algorithm based on MCMC on letters, not syllables or blocks? If the length is the problem, isn't it possible to include it in the likelihood by hand, so that the probability of long names goes down?
If you are too lazy to look it up: it uses the last pair of letters to pick what follows. It can be configured to use the last triplet or more, but that is not used anywhere as far as I know.
I have tried the algorithm with the next letter determined from the single previous letter, but it sucked.
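
For readers who do not want to open race.cpp, the idea described above can be sketched roughly like this in Python. This is an illustration only, not the Wesnoth code: the function names, the padding markers, the `max_len` cap and the training names are all made up for the example.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # hypothetical padding markers, not taken from Wesnoth

def train(names, order=2):
    """Map each run of `order` symbols to the symbols seen right after it."""
    table = defaultdict(list)
    for name in names:
        padded = START * order + name + END
        for i in range(len(padded) - order):
            table[padded[i:i + order]].append(padded[i + order])
    return table

def generate(table, order=2, max_len=12, rng=random):
    """Walk the table from the start context until END or max_len letters."""
    context, out = START * order, ""
    while len(out) < max_len:
        letter = rng.choice(table[context])
        if letter == END:
            break
        out += letter
        context = (context + letter)[-order:]  # keep only the last `order` symbols
    return out

table = train(["Lila", "Anne", "Alena"])
print(generate(table))
```

Setting `order=1` gives the single-letter variant mentioned above; `max_len` is a blunt way of cutting off overly long names.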

EDIT:
Here are some improved grammars. Their average number of recruits until a pair of namesakes appears is significantly better than before (no experiment was run because it can be estimated mathematically; I am adding rough estimates).
Male elves (around 70)
Female elves (around 70)
Male humans (around 80)
Female humans (around 80)
Orcs (around 90)
Other grammars were already better than the current Markov generator.
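
An estimate of this kind can be sketched with the classic birthday-problem calculation. The Python below assumes that every name is equally likely, which a real grammar does not guarantee (uneven grammars tend to produce namesakes sooner), so it is an idealised figure and not a reproduction of the numbers quoted here. The function names are made up for the illustration.

```python
import math

def expected_recruits_until_namesake(n_names):
    """Exact expected number of draws until the first repeated name,
    assuming n_names equally likely names (an assumption of this sketch)."""
    expected, p_all_distinct = 0.0, 1.0
    for k in range(n_names + 1):
        expected += p_all_distinct           # P(first k draws are all distinct)
        p_all_distinct *= 1.0 - k / n_names  # extend to k + 1 distinct draws
    return expected

def approx_recruits_until_namesake(n_names):
    """Standard asymptotic approximation: sqrt(pi * N / 2) + 2/3."""
    return math.sqrt(math.pi * n_names / 2) + 2 / 3

print(expected_recruits_until_namesake(3000))
print(approx_recruits_until_namesake(3000))
```

Under that uniform assumption, an average of about 70 recruits corresponds to a pool on the order of three thousand names.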

Some further improvements were made, and the pull request was accepted and will be a part of Wesnoth 1.13.5 and later.

GunChleoc
Translator
Posts: 440
Joined: September 28th, 2012, 7:35 am

Re: [accepted, implemented] Context-free Grammar for unit na

Post by GunChleoc » January 27th, 2017, 9:43 am

I have finally been playing with translating the name generation. I am running into a few difficulties with the town names:
  1. I have prefixed the base names with "XXX", but I get names with "XX", "XXXX" or "XXXXXX" in them. This should not happen - I should see "XXX" only here.
  2. I have prefixed the rule-generated base names with "NOCOM". I don't see any of those at all.
  3. Segmentation of base names is broken. It seems that blank space is used in addition to the comma (,) for parsing the names into a list, resulting in nonsense names.
  4. There are town names that consist of base names only. Since my base names need to be in the genitive case for the composition rules, I need to get rid of pure base town names without any prefixes. I haven't found a way to do that.
I am attaching the current state of my translation file for the wesnoth textdomain, gd locale.
Attachments
wesnoth.zip
(173.6 KiB)

Dugi
Posts: 4843
Joined: July 22nd, 2010, 10:29 am
Location: Carpathian Mountains

Re: [accepted, implemented] Context-free Grammar for unit na

Post by Dugi » January 29th, 2017, 10:50 am

Hello,

All mentions that I have contributed something at all were removed from my forum profile, so this is not my responsibility any more.

However, because it's you, I will give you some advice. The implementation tries to find the name generator. If it fails to find it, it falls back to the old Markov chains. That is how Markov-chain-generated names behave: if you add XXX before all base names, the result may contain a random number of X letters. You have prefixed the context-free-grammar-generated village names with NOCOM, but the code is broken (the second line is missing a newline), so the parsing fails and it falls back to the old method. I do not know how the old name system is implemented, so I have no idea what could be broken in it and cause the segmentation issue.
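
The fallback behaviour described above amounts to something like the following Python sketch. It is not the Wesnoth implementation: `parse_grammar`, `markov_name`, `village_name` and the `name=option|option` line format are stand-ins invented for the example.

```python
import random

def parse_grammar(text):
    """Parse lines of the form `name=option|option`; raise on anything else."""
    rules = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, sep, body = line.partition("=")
        if not sep or "=" in body:  # e.g. two rules glued onto one line
            raise ValueError(f"malformed rule: {line!r}")
        rules[name.strip()] = body.split("|")
    return rules

def markov_name(rng):
    """Placeholder for the old Markov generator (hypothetical)."""
    return "markov-" + rng.choice(["a", "b", "c"])

def village_name(grammar_text, rng=random):
    """Prefer the grammar; fall back to Markov if parsing fails or nothing comes out."""
    try:
        rules = parse_grammar(grammar_text)
        result = rng.choice(rules["main"])
    except (ValueError, KeyError):
        result = ""
    return result or markov_name(rng)

print(village_name("main=NOCOMford|NOCOMton\n"))    # grammar is used
print(village_name("main=NOCOMfordbase=x|y\n"))     # missing newline, falls back
```

Because the fallback is silent, a broken grammar shows up only as names that lack the expected marker.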

HTH,
-Dugi

GunChleoc
Translator
Posts: 440
Joined: September 28th, 2012, 7:35 am

Re: [accepted, implemented] Context-free Grammar for unit na

Post by GunChleoc » January 29th, 2017, 4:16 pm

Thanks, Dugi. Seems like that's not the only thing that's broken in my code though, so I need to find a way to really debug this thing.

Dugi
Posts: 4843
Joined: July 22nd, 2010, 10:29 am
Location: Carpathian Mountains

Re: [accepted, implemented] Context-free Grammar for unit na

Post by Dugi » January 29th, 2017, 8:25 pm

You can use my website to debug it (most of the links I have posted point to it). I have expanded its functionality since then, but you probably won't come across any of the new syntactic features.

You may need to add that \n at the end of each line. The code needs the newlines to be there, but I am not sure how the translation files deal with newlines.
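
In gettext PO files, a long msgstr is written as several quoted fragments that are simply concatenated, so a line break exists only where an explicit \n is written. Python joins adjacent string literals the same way, which makes the effect easy to sketch. The grammar lines and the helper `rule_lines` are invented placeholders.

```python
# Adjacent string literals are concatenated with nothing in between,
# just like the quoted fragments of a PO msgstr.
without_newlines = (
    "main={prefix}{suffix}"
    "prefix=Al|Ed"
)
with_newlines = (
    "main={prefix}{suffix}\n"
    "prefix=Al|Ed\n"
)

def rule_lines(text):
    """Split a grammar string into its non-empty lines."""
    return [line for line in text.splitlines() if line]

print(rule_lines(without_newlines))  # one glued-together line
print(rule_lines(with_newlines))     # one line per rule
```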

GunChleoc
Translator
Posts: 440
Joined: September 28th, 2012, 7:35 am

Re: [accepted, implemented] Context-free Grammar for unit na

Post by GunChleoc » January 30th, 2017, 12:07 pm

Thanks, that is a very helpful tool! It's working via the website now, but not in Wesnoth.

I compiled Wesnoth on my Linux box and added some debug output. I spent a few hours digging into the code and it seems like calling generate() for "main" always returns an empty string, even for English. This means that the $base variable is then filled by the Markov generator.

So, this is definitely a bug in the context-free generator in Wesnoth, but I have no idea what's wrong with it yet.

GunChleoc
Translator
Posts: 440
Joined: September 28th, 2012, 7:35 am

Re: [accepted, implemented] Context-free Grammar for unit na

Post by GunChleoc » January 31st, 2017, 11:53 am

I found the bug :)

https://github.com/wesnoth/wesnoth/pull/921

I am still getting pure base names though.

Wussel
Posts: 592
Joined: July 28th, 2012, 5:58 am

Re: How about using Context-free Grammar to generate unit na

Post by Wussel » September 1st, 2018, 11:23 am

Spixi wrote:
April 8th, 2016, 6:37 pm
The problem with Markov chains is that there may be loops or dead ends which can cause very long or very short names.

This small example shows what I mean:

Take the following names:
LILA
ANNE
ALENA

This produces the following Markov chain:
<start> -> { A, A, L }
A -> { <end>, <end>, L, N }
E -> { <end>, N }
I -> { L }
L -> { A, E, I }
N -> { A, E, N }

The probability of generating the name "A" is 2/3 * 1/2 = 1/3, because 2/3 of all names start with A and half of the transitions out of A lead to <end>.
Once a name reaches an N, the likelihood that this run of Ns grows to at least three in a row is (1/3)^2 = 1/9, so names like "ANNNA" are far from rare.
If a name contains an I, it will contain at least four characters, because it has to contain the path L -> I -> L -> {A, E, I}.
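
For anyone who wants to check figures like these, here is a small Python sketch that rebuilds the first-order chain from the three names and multiplies the transition probabilities along a path. The helper names are made up for the illustration.

```python
from collections import defaultdict
from fractions import Fraction

START, END = "<start>", "<end>"

def build_chain(names):
    """Collect, for every symbol, the list of symbols that follow it."""
    chain = defaultdict(list)
    for name in names:
        symbols = [START, *name, END]
        for current, following in zip(symbols, symbols[1:]):
            chain[current].append(following)
    return chain

def name_probability(chain, name):
    """Probability that a random walk over the chain spells exactly `name`."""
    prob = Fraction(1)
    symbols = [START, *name, END]
    for current, following in zip(symbols, symbols[1:]):
        options = chain.get(current, [])
        if not options:
            return Fraction(0)
        prob *= Fraction(options.count(following), len(options))
    return prob

chain = build_chain(["LILA", "ANNE", "ALENA"])
for symbol, followers in chain.items():
    print(symbol, "->", followers)
print(name_probability(chain, "A"))
```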



We conclude that names usually do not follow Markov chains. Many names are based on context-free grammars, however. This example shows a simple grammar for old German names:

NAME = {PREFIX} + {SUFFIX}
PREFIX = "A", "Al", "Bal", "Ed", "Eg", "Frie", "Gott", "Hein", "Hin", "Rein", "Sig", "Ul", "Wil", "Win", "Wal", "Wol"
SUFFIX = "bert", "dolf", "drich", "dulin", "dur", "fried", "helm", "hold", "lieb", "ram", "rich", "win"

Example names are: Edwin, Reinhold, Friedrich and Winfried.
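
As a rough illustration, the grammar above can be expanded with a few lines of Python. The dictionary encoding, the `{SYMBOL}` placeholder syntax and the function names are choices made for this sketch, not the format Wesnoth ended up using.

```python
import random
import re

GRAMMAR = {
    "NAME": ["{PREFIX}{SUFFIX}"],
    "PREFIX": ["A", "Al", "Bal", "Ed", "Eg", "Frie", "Gott", "Hein", "Hin",
               "Rein", "Sig", "Ul", "Wil", "Win", "Wal", "Wol"],
    "SUFFIX": ["bert", "dolf", "drich", "dulin", "dur", "fried", "helm",
               "hold", "lieb", "ram", "rich", "win"],
}

def expand(symbol, grammar=GRAMMAR, rng=random):
    """Pick one production for `symbol` and expand every {REFERENCE} in it."""
    production = rng.choice(grammar[symbol])
    return re.sub(r"\{(\w+)\}",
                  lambda m: expand(m.group(1), grammar, rng),
                  production)

def all_names(grammar=GRAMMAR):
    """Enumerate every PREFIX + SUFFIX combination of this small grammar."""
    return [p + s for p in grammar["PREFIX"] for s in grammar["SUFFIX"]]

print(expand("NAME"))
print(len(set(all_names())))
```

With 16 prefixes and 12 suffixes the grammar yields 16 * 12 = 192 distinct names, each exactly two parts long.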

As you see, this would generate names of better quality than the current implementation.
That would be exactly how it should be. I remember this being used for pen-and-paper RPGs in the late '80s: making lists in Excel and using them for NPCs. How many more years will it take for Wesnoth to catch up?

Tad_Carlucci
Developer
Posts: 309
Joined: April 24th, 2016, 4:18 pm

Re: [accepted, implemented] Context-free Grammar for unit names

Post by Tad_Carlucci » September 1st, 2018, 2:41 pm

alalalalalalalalalalalalalalalalalalalalalalalal

Good recognizer, lousy generator. That probably answers why we're not "with it" .. we like junk to work and not cause infinite loops or other silliness.
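
One possible reading of this objection: as soon as a grammar rule may refer to itself, nothing in the grammar bounds the output length, so the generator has to impose its own limit. A small Python sketch with a made-up recursive rule and a hypothetical `max_depth` cap:

```python
import random

def repeat_al(rng=random, max_depth=None):
    """Expand NAME -> "al" NAME | "al", each alternative with probability 1/2."""
    name, depth = "al", 1
    while rng.random() < 0.5:          # chose the recursive alternative
        if max_depth is not None and depth >= max_depth:
            break                      # the cap cuts the recursion short
        name += "al"
        depth += 1
    return name

print(repeat_al(max_depth=5))
```

Without the cap the loop still ends with probability 1, but there is no upper bound on how long a single name can get.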
I forked real life and now I'm getting merge conflicts.
