[accepted, implemented] Context-free Grammar for unit names

Dugi
Posts: 4867
Joined: July 22nd, 2010, 10:29 am
Location: Carpathian Mountains

[accepted, implemented] Context-free Grammar for unit names

Post by Dugi » February 7th, 2016, 7:17 pm

The game's name generator is quite good, but I think that in some cases it produces quite ridiculous names: either extremely long ones, or ones containing improbable groups of letters or syllables that make them sound alien.

To fix the issue, I've come up with an idea:
How about using a context-free grammar to describe the names? It would just randomly pick one of the possible derivation rules and end when there's nothing else to expand. That would allow making a completely customisable generator of names where every strange group of letters can be eliminated.
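
For readers who want to see the idea concretely, here is a minimal sketch of such a generator. It is not the code from the pull request; the type names, the `expand` function and the toy grammar are all invented for illustration. Each nonterminal has a list of alternative rules, one is picked uniformly at random, and expansion stops when only terminals are left. The sketch assumes the grammar cannot recurse forever.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// A grammar maps each nonterminal to its alternative expansions.
// A symbol that is not a key of the map is treated as a terminal
// and copied to the output verbatim.
using Expansion = std::vector<std::string>;
using Grammar = std::map<std::string, std::vector<Expansion>>;

// Expand `symbol` by picking one of its rules uniformly at random,
// recursively, until only terminals remain.
std::string expand(const Grammar& grammar, const std::string& symbol,
                   std::mt19937& rng) {
    auto rule = grammar.find(symbol);
    if (rule == grammar.end()) return symbol;  // terminal
    const std::vector<Expansion>& alternatives = rule->second;
    assert(!alternatives.empty());
    std::uniform_int_distribution<std::size_t> pick(0, alternatives.size() - 1);
    std::string result;
    for (const std::string& part : alternatives[pick(rng)]) {
        result += expand(grammar, part, rng);
    }
    return result;
}

// Hypothetical toy grammar, made up for this example only.
Grammar toyGrammar() {
    return {
        {"name",  {{"onset", "vowel", "coda"},
                   {"onset", "vowel", "onset", "vowel", "coda"}}},
        {"onset", {{"b"}, {"d"}, {"g"}, {"r"}, {"th"}}},
        {"vowel", {{"a"}, {"e"}, {"o"}}},
        {"coda",  {{"n"}, {"r"}, {"ld"}}},
    };
}
```

Calling `expand(toyGrammar(), "name", rng)` with a seeded `std::mt19937 rng` yields one name per call. Because every letter group comes from a hand-written rule, an unwanted cluster can be removed by deleting the rule that produces it.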

To test it out, I have coded it and put it online (it's in C++, who needs JavaScript?, so it could also be taken directly into Wesnoth). An example of its application to generating human male names can be seen here.
Last edited by Dugi on April 11th, 2016, 9:53 pm, edited 1 time in total.

GunChleoc
Translator
Posts: 448
Joined: September 28th, 2012, 7:35 am

Re: How about using Context-free Grammar to generate unit na

Post by GunChleoc » February 8th, 2016, 9:27 am

It's a good idea :)

Some thoughts on localization:

Maybe localizers could provide possible syllables in their language somehow? E.g. for my language, if we assume the possibility of more than 2 syllables, I would need the following groups:

Start - slender
Start - broad

Middle - broad-broad
Middle - broad-slender
Middle - slender-broad
Middle - slender-slender

End - slender
End - broad

A Start - slender syllable can then be linked with a Middle slender-X syllable or End - slender syllable.
A Start - broad syllable can then be linked with a Middle broad-X syllable or End - broad syllable.

A Middle Y-slender syllable can be linked with a Middle slender-X syllable or End - slender syllable.
A Middle Y-broad syllable can be linked with a Middle broad-X syllable or End - broad syllable.

Other languages will probably have different rules.
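
The linking rule above boils down to "the right edge of one syllable must match the left edge of the next". A minimal sketch of that check follows; the `Syllable` struct and `linksCorrectly` function are invented for illustration. In a context-free grammar the same constraint could be expressed by having one nonterminal per edge quality.

```cpp
#include <string>
#include <vector>

enum class Quality { Broad, Slender };

// A syllable with the quality it needs on its left edge and offers on its
// right edge. A start syllable has no left edge and an end syllable has no
// right edge; the unused field is simply ignored for those.
struct Syllable {
    std::string text;
    Quality left;
    Quality right;
};

// True if start + middles + end obey the rule that each syllable's right
// edge has the same quality as the next syllable's left edge.
bool linksCorrectly(const Syllable& start,
                    const std::vector<Syllable>& middles,
                    const Syllable& end) {
    Quality current = start.right;
    for (const Syllable& middle : middles) {
        if (middle.left != current) return false;
        current = middle.right;
    }
    return end.left == current;
}
```

With this, a "Start - broad" syllable followed by a "Middle - broad-slender" one is accepted only if the name then continues with a slender-X middle or a slender end.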

Deciton_Reven
Posts: 93
Joined: August 6th, 2012, 4:49 pm

Re: How about using Context-free Grammar to generate unit na

Post by Deciton_Reven » February 8th, 2016, 3:51 pm

Peanut Gallery here: I think the incredibly long names and very weird names are fine. It goes to show the people of Wesnoth have their own culture and language, and we're just watching from a translator's viewpoint. There are just some things that even translating can't get perfectly.

It at least doesn't deduct from the immersion, and can be good for a what the heck is this kind of laugh.

beetlenaut
Developer
Posts: 2255
Joined: December 8th, 2007, 3:21 am
Location: Washington State

Re: How about using Context-free Grammar to generate unit na

Post by beetlenaut » February 8th, 2016, 7:28 pm

Deciton_Reven wrote: It at least doesn't deduct from the immersion, and can be good for a what the heck is this kind of laugh.
I think the point is that a "what the heck is this" kind of laugh does detract from the immersion. At least it does for me. IMHO the names that Dugi's site creates are better than Wesnoth's.
Campaigns: Dead Water,
The Founding of Borstep,
Secrets of the Ancients,
and WML Guide

Dugi
Posts: 4867
Joined: July 22nd, 2010, 10:29 am
Location: Carpathian Mountains

Re: How about using Context-free Grammar to generate unit na

Post by Dugi » April 7th, 2016, 7:52 am

It's written, pull request is being considered: https://github.com/wesnoth/wesnoth/pull/644

More examples:
Dwarvish names
Khalifate names
Lizard names
Wose names

Eagle_11
Posts: 757
Joined: November 20th, 2013, 12:20 pm

Re: How about using Context-free Grammar to generate unit na

Post by Eagle_11 » April 7th, 2016, 12:52 pm

AFAIK, we don't do surnames in Wesnoth, so the dwarven "somethingson" family names would be excessive.

beetlenaut
Developer
Posts: 2255
Joined: December 8th, 2007, 3:21 am
Location: Washington State

Re: How about using Context-free Grammar to generate unit na

Post by beetlenaut » April 7th, 2016, 5:08 pm

No, surnames are a good idea. It seems a little strange that no tribes, clans, families, or individuals in all of Wesnoth want a second name. It would make more sense to imagine that it would be important for some people. Also, there are already some UMC characters that have two names.

Dugi
Posts: 4867
Joined: July 22nd, 2010, 10:29 am
Location: Carpathian Mountains

Re: How about using Context-free Grammar to generate unit na

Post by Dugi » April 7th, 2016, 7:24 pm

Actually, Drakes and Khalifate can already have names made of several words. For the Khalifate, the names on the list have forms like Arif al-Makhmud, but I am not sure what came out when they were passed through the Markov chain generator. All I did with the Khalifate was give it stricter rules.

For dwarves, they were not meant as last names but as patronymics (like Icelanders and Russians have; other languages probably used them too, but they eventually became last names).

Gyra_Solune
Posts: 263
Joined: July 29th, 2015, 5:23 am

Re: How about using Context-free Grammar to generate unit na

Post by Gyra_Solune » April 7th, 2016, 8:31 pm

I actually like a lot of these names quite a lot! They have roughly the same 'feel' to them, but they sort of parse a lot better, and there's a lot more possibility, as opposed to currently where you definitely see names repeated quite a lot. It's sort of a minor thing but I'm definitely in support of this, including the last-name part, I like the idea of 'clan' names for the Khalifate.

Dugi
Posts: 4867
Joined: July 22nd, 2010, 10:29 am
Location: Carpathian Mountains

Re: How about using Context-free Grammar to generate unit na

Post by Dugi » April 7th, 2016, 9:15 pm

Nobody has tested how repetitive the names created by the current generator are. I plan to make a little test of it.

Dugi
Posts: 4867
Joined: July 22nd, 2010, 10:29 am
Location: Carpathian Mountains

Re: How about using Context-free Grammar to generate unit na

Post by Dugi » April 8th, 2016, 8:30 am

The current generator is no wonder, it seems.
I made Wesnoth generate 100 000 names for each race and ran some analysis on them. The results were quite surprising:

Most common name of the khalifate is Abdul-Din with frequency 1.542%
Most common name of gryphon males is Korro with frequency 9.025%
Most common name of nagas is Skepz with frequency 1.532%
Most common name of drake males is Gar with frequency 1.344%
Most common name of human females is Mer with frequency 1.736%
Most common name of gryphon females is Keyya with frequency 20.315%
Most common name of dwarves is Tril with frequency 2.051%
Most common name of mermaids is Rân with frequency 3.121%
Most common name of orcs is Prag with frequency 1.042%
Most common name of trolls is Urg with frequency 2.521%
Most common name of nagini is Skepz with frequency 1.511%
Most common name of drake females is Orra with frequency 6.078%
Most common name of ogres is Kar with frequency 2.105%
Most common name of saurians is Xir with frequency 1.352%
Most common name of villages is Bre with frequency 6.68%
Most common name of human males is Addyn with frequency 1.339%
Most common name of elven females is Cel with frequency 3.226%
Most common name of mermen is Tan with frequency 1.597%
Most common name of elven males is Cel with frequency 1.336%
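
A figure like the ones above can be reproduced from any list of generated names by counting occurrences. The following is only a sketch of that counting step, not the code from the attached archive; `mostCommonName` is a name made up here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Returns the most frequent name in the sample together with its share of
// the sample in percent. If several names tie, any one of them is returned.
std::pair<std::string, double> mostCommonName(
        const std::vector<std::string>& names) {
    assert(!names.empty());
    std::unordered_map<std::string, std::size_t> counts;
    for (const std::string& name : names) ++counts[name];

    std::string best;
    std::size_t bestCount = 0;
    for (const auto& entry : counts) {
        if (entry.second > bestCount) {
            best = entry.first;
            bestCount = entry.second;
        }
    }
    return {best, 100.0 * bestCount / names.size()};
}
```

Fed with the 100 000 names generated for one race, the second member of the result corresponds to the "frequency" column in the list.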

name_analysis.tar.bz2
Code and data used
(5.6 MiB) Downloaded 115 times

skeptical_troll
Posts: 413
Joined: August 31st, 2015, 11:06 pm

Re: How about using Context-free Grammar to generate unit na

Post by skeptical_troll » April 8th, 2016, 10:35 am

If I think of most common names in reality, those occurrences are actually not too high. Except of course for Gryphons, but I suspect the reason there is the small basis of syllables?

Btw, I think your new generator works really well!

Eagle_11
Posts: 757
Joined: November 20th, 2013, 12:20 pm

Re: How about using Context-free Grammar to generate unit na

Post by Eagle_11 » April 8th, 2016, 11:33 am

TIL there are female drakes.

Dugi
Posts: 4867
Joined: July 22nd, 2010, 10:29 am
Location: Carpathian Mountains

Re: How about using Context-free Grammar to generate unit na

Post by Dugi » April 8th, 2016, 6:31 pm

I have done a bit more research, trying to find the number of names that the new generator has to be able to generate so that the odds of a unit getting a namesake are the same or lower.

When making a ladder of most common names, the probabilities drop quite sharply:
Spoiler:
Long spoiler short, the Khalifate have the best variety in names by far (without Abdul-Din, it would be significantly better). For the other races, the 20 most common names are used by 10% of the population or more. The method of calculating the average number of units to recruit until there are namesakes was not very precise: it simply went through the 100 000 names, counting until it found a pair of namesakes, then started counting again, and averaged the counts. The result was that the race with the best randomness in names, the Khalifate, needs 53 units on average. Elven males needed 40, human males 35, and the others 30 or fewer (gryphon females are an extreme, needing only 2.2).
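
The counting method described here can be sketched as follows. This is not the attached main.cpp; the function name is invented, and it assumes that the unit which turns out to be a namesake is included in the count and that an unfinished run at the end of the list is discarded.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Walks through the names in order, counting units until one repeats a name
// already seen in the current run, then starts a new run. Returns the
// average run length, or 0 if no namesake was ever found.
double averageUnitsUntilNamesake(const std::vector<std::string>& names) {
    std::unordered_set<std::string> seen;
    std::size_t runs = 0;
    std::size_t totalUnits = 0;
    std::size_t unitsInRun = 0;
    for (const std::string& name : names) {
        ++unitsInRun;
        if (!seen.insert(name).second) {  // insert failed: a namesake
            totalUnits += unitsInRun;
            ++runs;
            unitsInRun = 0;
            seen.clear();
        }
    }
    return runs == 0 ? 0.0 : static_cast<double>(totalUnits) / runs;
}
```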

Because of this, for the races with better names the Markov generator is equivalent to a list of about 1000-2000 equiprobable names. For my context-free grammars, this was usually around 300-400, so I will need to improve them.
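
One way to turn an average like "53 units until a namesake" into an equivalent list size is the birthday-problem expectation: with n equally likely names, the expected number of draws until the first repeat is the sum over k of the probability that the first k draws are all distinct. The sketch below is an assumption about how such an estimate could be made, not the method actually used for the figures above; both function names are invented, and the repeating draw is counted.

```cpp
#include <cassert>

// Expected number of draws from n equally likely names until some name
// repeats, counting the repeating draw itself.
double expectedDrawsUntilRepeat(int n) {
    assert(n > 0);
    double expected = 0.0;
    double allDistinct = 1.0;  // P(first k draws are all distinct), k = 0
    for (int k = 0; k <= n; ++k) {
        expected += allDistinct;
        allDistinct *= 1.0 - static_cast<double>(k) / n;
    }
    return expected;
}

// Smallest list size whose expected draws-until-repeat reaches the
// observed average. A simple linear search, fine for a few thousand names.
int equivalentListSize(double observedAverage) {
    int n = 1;
    while (expectedDrawsUntilRepeat(n) < observedAverage) ++n;
    return n;
}
```

Under these assumptions, `equivalentListSize(40)` and `equivalentListSize(53)` land in the same ballpark as the 1000-2000 range quoted above.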
main.cpp
The program used to calculate the statistics.
(2.03 KiB) Downloaded 112 times
Just a curiosity, some particularly rare names (probabilities below 1:10000):
Aliabrasilia (nagini)
Oshtinashirt (drake female)
Amadieliolia (elf, her father was probably Mozart's fan)
Amphelataria (mermaid, probably amphetamine addict)
Kzuuuuuuuuuu (gryphon male)
Aigaithdrala (dwarf, male!)
Amprarexirax (saurian)
Abrakpsekps (naga)
Baragarbagor (orc)
Addraddrodd (human)
Ruk Ruk Ruk (troll)
(as you probably guessed, I just read a few from the beginning of an alphabetically ordered list)

I have also learned that gryphon females have only 5 possible names.

EDIT: Improved the information, you may want to read the text again.
Last edited by Dugi on April 8th, 2016, 7:51 pm, edited 4 times in total.

Spixi
Posts: 53
Joined: August 23rd, 2010, 7:22 pm

Re: How about using Context-free Grammar to generate unit na

Post by Spixi » April 8th, 2016, 6:37 pm

The problem with Markov chains is that there may be loops or dead ends which can cause very long or very short names.

This small example shows what I mean:

Given are the following names:
LILA
ANNE
ALENA

This produces the following Markov chain:
<start> -> { A, A, L }
A -> { <end>, <end>, L, N }
E -> { <end>, N }
I -> { L }
L -> { A, E, I }
N -> { A, E, N }

The probability of generating the name "A" is 2/3 * 1/2 = 1/3, because 2/3 of all names start with A and half of the transitions out of A lead to <end>.
Once a name contains an N, each following letter is another N with probability 1/3, so that run grows to at least three Ns in a row with probability (1/3)^2 = 1/9, which makes names like "ANNNA" fairly common.
If a name contains an I, it will contain at least four characters, because it has to contain the path L -> I -> L -> {A, E, I}
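
To check figures like these without doing the arithmetic by hand, the chain can be built from the example names and the transition probabilities multiplied along a name. This is a sketch of the first-order, letter-by-letter chain used in this toy example only, not of Wesnoth's generator; `Chain`, `buildChain` and `nameProbability` are invented names, and '^' and '$' are assumed not to occur in names.

```cpp
#include <map>
#include <string>
#include <vector>

// Transition counts: chain[from][to]. '^' marks the start of a name and
// '$' marks its end.
using Chain = std::map<char, std::map<char, int>>;

Chain buildChain(const std::vector<std::string>& names) {
    Chain counts;
    for (const std::string& name : names) {
        char previous = '^';
        for (char letter : name) {
            ++counts[previous][letter];
            previous = letter;
        }
        ++counts[previous]['$'];
    }
    return counts;
}

// Probability that the chain produces exactly `name`: the product of the
// probabilities of each transition, including the final one to '$'.
double nameProbability(const Chain& chain, const std::string& name) {
    double probability = 1.0;
    char previous = '^';
    for (char next : name + '$') {
        auto row = chain.find(previous);
        if (row == chain.end()) return 0.0;
        int total = 0;
        for (const auto& entry : row->second) total += entry.second;
        auto cell = row->second.find(next);
        if (cell == row->second.end()) return 0.0;
        probability *= static_cast<double>(cell->second) / total;
        previous = next;
    }
    return probability;
}
```

For instance, `nameProbability(buildChain({"LILA", "ANNE", "ALENA"}), "A")` gives the chance of the one-letter name discussed above.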



We conclude that names usually do not follow Markov chains. Many names are based on context-free grammars, however. This example shows a simple grammar for old German names:

NAME = {PREFIX} + {SUFFIX}
PREFIX = "A", "Al", "Bal", "Ed", "Eg", "Frie", "Gott", "Hein", "Hin", "Rein", "Sig", "Ul", "Wil", "Win", "Wal", "Wol"
SUFFIX = "bert", "dolf", "drich", "dulin", "dur", "fried", "helm", "hold", "lieb", "ram", "rich", "win"

Example names are: Edwin, Reinhold, Friedrich and Winfried.
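
Since this grammar has a single rule with two independent choices, every possible name can simply be enumerated. The sketch below does that with the prefix and suffix lists copied from above; the function and constant names are invented for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

const std::vector<std::string> kPrefixes = {
    "A", "Al", "Bal", "Ed", "Eg", "Frie", "Gott", "Hein",
    "Hin", "Rein", "Sig", "Ul", "Wil", "Win", "Wal", "Wol"};
const std::vector<std::string> kSuffixes = {
    "bert", "dolf", "drich", "dulin", "dur", "fried",
    "helm", "hold", "lieb", "ram", "rich", "win"};

// Every name derivable from NAME = PREFIX + SUFFIX.
std::vector<std::string> allNames() {
    std::vector<std::string> names;
    for (const std::string& prefix : kPrefixes) {
        for (const std::string& suffix : kSuffixes) {
            names.push_back(prefix + suffix);
        }
    }
    return names;
}

// True if the grammar can produce `name`.
bool canGenerate(const std::string& name) {
    const std::vector<std::string> names = allNames();
    return std::find(names.begin(), names.end(), name) != names.end();
}
```

With 16 prefixes and 12 suffixes this gives 16 * 12 = 192 names, so a grammar this small trades variety for quality; more rules or more syllables would be needed to match the variety discussed earlier in the thread.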

As you can see, this would generate names of better quality than the current implementation.
