Encodings && font problems

Discuss and coordinate development of mainline and user-made content translations.

Sofronius
Posts: 50
Joined: November 6th, 2003, 11:14 am
Location: Czech Republic

Encodings && font problems

Post by Sofronius »

OK, Dave. I downloaded the latest version from CVS and added the line
encoding=UTF-8
to my file czech.cfg.

It's better than it was before, but there are squares instead of some characters. That's not because of a bug in your code, though; it's because the Unicode font you ship with BfW, vera.ttf, is very small. It is only about 70 KB, so nobody should expect it to cover every character that Unicode supports. Alas, it lacks, for example, some characters that are important for my language. I deleted vera.ttf and renamed the font file cyberbit.ttf (about 13 MB; it is one of the fonts that support nearly all languages) to vera.ttf, so the game loaded this font instead of Vera.

The result was:
1) Unexpected slowdown. Maybe you are reopening the font file and refreshing its image in memory too often, instead of using it in some more efficient way?
2) I saw that all Czech characters can be used in the game if a proper font is supplied. That's good.
3) I noticed that if a phrase that replaces a weapon name in the unit window on the right ends with non-English characters, these "foreign" characters aren't displayed. That is probably a bug in the BfW code, I think.

I understand that it's hard for English-speaking developers to support unfamiliar encodings, because it's hard for them to test their work. Well, I think I can spend some time testing. :)
Dave
Founding Developer
Posts: 7071
Joined: August 17th, 2003, 5:07 am
Location: Seattle

Re: Encodings && font problems

Post by Dave »

Sofronius wrote: The result was:
1) Unexpected slowdown. Maybe you are reopening the font file and refreshing its image in memory too often, instead of using it in some more efficient way?
No, I'm loading the font file once and caching it in memory until the game exits. I'm not sure why the slowdown would be.
Sofronius wrote: 3) I noticed that if a phrase that replaces a weapon name in the unit window on the right ends with non-English characters, these "foreign" characters aren't displayed. That is probably a bug in the BfW code, I think.
Okay, well send me the translation file, and a link to where I can download your font file, and an example of a string that doesn't display properly, and I'll look into it.

David
Sofronius
Posts: 50
Joined: November 6th, 2003, 11:14 am
Location: Czech Republic

Post by Sofronius »

Hi, Dave
Dave wrote:
No, I'm loading the font file once and caching it in memory until the game exits. I'm not sure why the slowdown would be.
Hmm. I noticed the slowdown with big font files only when starting the game or starting a scenario. If I start the game and the tutorial, I see the message:
Opening font file:/fonts/Vera.ttf
about six times. So I thought that.... Chm. Maybe that's some feature of SDL :?

(Later: Okay, I searched the sources and I understand why the message is printed six times. But I still don't understand the slowdown :( )
Dave wrote: Okay, well send me the translation file, and a link to where I can download your font file, and an example of a string that doesn't display properly, and I'll look into it.

David
Honestly, I tried, but I haven't found any usable font that can be downloaded freely and supports my language :cry:
On Linux, I am using some TrueType fonts from Windows and some fonts that were once free to download (i.e. at no cost) but can no longer be officially downloaded for free. The fonts from
http://savannah.nongnu.org/projects/freefont/
have a line-spacing bug at the moment, so they are unusable. I hope they will repair it. :wink:

So I tried some bug tracking on my own, as it's far more interesting than browsing in vain for free fonts.
The unwanted stripping was found in the file config.cpp, function strip, in the line

str.erase(std::find_if(str.rbegin(),str.rend(),isgraph).base(),str.end());

Changing isgraph to notspace helped me (a change similar to the last one mentioned in CVS, just five lines down). I would be very glad if a change of this sort could be made in CVS. It's still the same issue: UTF-8 uses bytes with values over 127, so they should not be stripped away.
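
A minimal sketch of why the predicate matters, assuming the default "C" locale. This is not the actual config.cpp code: the function names `strip_end_isgraph` and `strip_end_notspace`, the helper `graphic`, and the body of `notspace` are reconstructions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// True for bytes that isgraph() accepts. In the default "C" locale this is
// only 0x21..0x7E, so every byte of a multi-byte UTF-8 character fails it.
bool graphic(char c) { return std::isgraph(static_cast<unsigned char>(c)) != 0; }

// Hypothetical stand-in for the notspace predicate: true for any byte that
// is not whitespace, including bytes above 127.
bool notspace(char c) { return std::isspace(static_cast<unsigned char>(c)) == 0; }

// Old behaviour: erase everything after the last "graphic" byte.
std::string strip_end_isgraph(std::string str) {
    str.erase(std::find_if(str.rbegin(), str.rend(), graphic).base(), str.end());
    return str;
}

// Changed behaviour: erase only trailing whitespace.
std::string strip_end_notspace(std::string str) {
    str.erase(std::find_if(str.rbegin(), str.rend(), notspace).base(), str.end());
    return str;
}

// "me\xC4\x8D" is the UTF-8 encoding of the Czech word "meč" (sword).
void check_strip() {
    const std::string in = "me\xC4\x8D  ";
    assert(strip_end_isgraph(in) == "me");            // trailing "č" is lost
    assert(strip_end_notspace(in) == "me\xC4\x8D");   // trailing "č" survives
}
```

The casts to unsigned char are there because passing a negative char value to the <cctype> functions is undefined behaviour.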

I think two more things should be done for full UTF-8 support:
1) All file preprocessing code should be rewritten so it can handle even unusual UTF-8 characters, for example ones consisting of two bytes with the second one being ".
I am afraid that such multi-byte characters are used in UTF-8, so there is a chance we will meet them in some translation one day. You should be aware of it.
But this is low priority, as no one has encountered this problem yet, and it may be that all the problematic characters are from Unicode subsets (Braille, math symbols, etc.) that will never be used with this game (we can hope).

2) Because there are no free (as in freedom) fonts supporting most languages, it's clear that we cannot even think about distributing fonts for all translations. It would be strange if we did so, anyway.
But we should give users a chance to use some of the fonts they probably already have (for example, with a localized Windows). I think there should be an option in the config file, "Path_to_font" or something similar.
And a new tag for translation files, something like:
SpecialFontRequired=Yes/No
When Yes is written in the language file, then after this language is selected a dialog box will appear with a message like
"Dear user, unfortunately we are not able to supply you with a proper font for this translation. If you want the translation to be displayed correctly, please enter the full, correct path to one of your localized TrueType files (*.ttf) in the option Path_to_font in file ....blabla"

It's less dirty than renaming some font to Vera.ttf.
And I hope it's easy to add.
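
The selection logic proposed above could be sketched roughly like this. Everything here is hypothetical: `FontSettings`, `FontChoice` and `choose_font` are illustrative names, and neither Path_to_font nor SpecialFontRequired exists in the game's config format at this point.

```cpp
#include <cassert>
#include <string>

// Hypothetical settings mirroring the two proposed options.
struct FontSettings {
    bool special_font_required;  // proposed SpecialFontRequired=Yes/No tag
    std::string path_to_font;    // proposed Path_to_font option, empty if unset
};

struct FontChoice {
    std::string path;  // font file the game should open
    bool warn_user;    // whether to show the "please supply a font" dialog
};

FontChoice choose_font(const FontSettings& s, const std::string& default_font) {
    if (!s.path_to_font.empty()) {
        return FontChoice{s.path_to_font, false};  // a user-supplied font wins
    }
    // No user font: fall back to the bundled one, and warn only when the
    // translation declares that the bundled font is not sufficient.
    return FontChoice{default_font, s.special_font_required};
}

void check_choose_font() {
    const FontChoice c = choose_font(FontSettings{true, ""}, "fonts/Vera.ttf");
    assert(c.path == "fonts/Vera.ttf");
    assert(c.warn_user);
}
```

Keeping the warning separate from the path means the game can still start with the bundled font and show squares, rather than refusing to run.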

Thanks for the support and all the work you do.
Dave
Founding Developer
Posts: 7071
Joined: August 17th, 2003, 5:07 am
Location: Seattle

Post by Dave »

Sofronius wrote: Hmm. I noticed the slowdown with big font files only when starting the game or starting a scenario. If I start the game and the tutorial, I see the message:
Opening font file:/fonts/Vera.ttf
about six times. So I thought that.... Chm. Maybe that's some feature of SDL :?
Well, SDL (through SDL_ttf) requires loading the font file for each different font size used. The function is:

TTF_Font* TTF_OpenFont(const char* file, int ptsize);
So, it has to be called once for every different font size. That's why it gets loaded a number of times.

And that's probably the slowdown: it's cached in memory, and so there'll be like 6 x 13 megs in memory (about 78 MB) = a lot of memory usage.
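
A rough sketch of the load-once-per-size caching described above. `FontCache`, `FontHandle` and the injected loader are hypothetical names chosen so the example compiles without SDL; in the game the loader would presumably be a call to `TTF_OpenFont` and the handle a `TTF_Font*`. The memory figure is only an upper-bound estimate, assuming each size really does keep its own copy of the file in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Stand-in for a loaded font (a TTF_Font* in real SDL_ttf code).
struct FontHandle { int ptsize; };

class FontCache {
public:
    explicit FontCache(std::function<FontHandle(int)> loader)
        : loader_(std::move(loader)) {}

    // Returns the cached font for this size, loading it on first use only.
    const FontHandle& get(int ptsize) {
        auto it = fonts_.find(ptsize);
        if (it == fonts_.end()) {
            it = fonts_.emplace(ptsize, loader_(ptsize)).first;
        }
        return it->second;
    }

    std::size_t loaded_sizes() const { return fonts_.size(); }

private:
    std::function<FontHandle(int)> loader_;
    std::map<int, FontHandle> fonts_;
};

// Upper bound if every loaded size holds a full copy of the font file.
std::size_t estimated_megabytes(std::size_t sizes, std::size_t file_megabytes) {
    return sizes * file_megabytes;
}

void check_font_cache() {
    int loads = 0;
    FontCache cache([&loads](int size) -> FontHandle { ++loads; return FontHandle{size}; });
    cache.get(12);
    cache.get(14);
    cache.get(12);  // already cached, so the loader is not called again
    assert(loads == 2);
    assert(estimated_megabytes(6, 13) == 78);  // six sizes of a 13 MB font
}
```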
Sofronius wrote: So I tried some bug tracking on my own, as it's far more interesting than browsing in vain for free fonts.
The unwanted stripping was found in the file config.cpp, function strip, in the line

str.erase(std::find_if(str.rbegin(),str.rend(),isgraph).base(),str.end());

Changing isgraph to notspace helped me (a change similar to the last one mentioned in CVS, just five lines down). I would be very glad if a change of this sort could be made in CVS. It's still the same issue: UTF-8 uses bytes with values over 127, so they should not be stripped away.
Oops. I fixed the bug where it was stripping characters at the front of the string a while ago, but missed fixing the same bug at the end of the string. I have committed a fix for this to CVS.
Sofronius wrote: I think two more things should be done for full UTF-8 support:
1) All file preprocessing code should be rewritten so it can handle even unusual UTF-8 characters, for example ones consisting of two bytes with the second one being ".
I am afraid that such multi-byte characters are used in UTF-8, so there is a chance we will meet them in some translation one day. You should be aware of it.
But this is low priority, as no one has encountered this problem yet, and it may be that all the problematic characters are from Unicode subsets (Braille, math symbols, etc.) that will never be used with this game (we can hope).
Yes, this is a problem that I have considered, but I am still working on a solution. There could also be character sequences that contain '\n', null, or whitespace (that gets stripped).

I'm not sure how we're going to solve this problem. I'm still thinking about it.
Sofronius wrote: 2) Because there are no free (as in freedom) fonts supporting most languages, it's clear that we cannot even think about distributing fonts for all translations. It would be strange if we did so, anyway.
But we should give users a chance to use some of the fonts they probably already have (for example, with a localized Windows). I think there should be an option in the config file, "Path_to_font" or something similar.
And a new tag for translation files, something like:
SpecialFontRequired=Yes/No
When Yes is written in the language file, then after this language is selected a dialog box will appear with a message like
"Dear user, unfortunately we are not able to supply you with a proper font for this translation. If you want the translation to be displayed correctly, please enter the full, correct path to one of your localized TrueType files (*.ttf) in the option Path_to_font in file ....blabla"

It's less dirty than renaming some font to Vera.ttf.
And I hope it's easy to add.
Yes, I think something like this is the best solution. I'll look into it.

David
Sofronius
Posts: 50
Joined: November 6th, 2003, 11:14 am
Location: Czech Republic

Post by Sofronius »

For anyone reading this thread in the future:

We have discovered that UTF-8 encoding is no problem for the current parsing code, because the byte sequences we had been afraid of are not used in UTF-8; see for example
http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2279.html
especially the end of page 2.

So UTF-8 encoding is fully supported :D
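
The property being relied on is that every byte of a multi-byte UTF-8 sequence has its high bit set, so ASCII bytes such as ", newline or NUL can only ever stand for themselves. A minimal sketch that checks this for code points up to U+FFFF follows; `encode_utf8` and `multibyte_is_ascii_free` are illustrative helpers, not code from the game.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encodes one code point as UTF-8. Limited to U+0000..U+FFFF for brevity;
// surrogates are not rejected because only the byte ranges matter here.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 0xxxxxxx
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));   // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    }
    return out;
}

// True if no byte of any multi-byte sequence in U+0080..U+FFFF is below 0x80.
bool multibyte_is_ascii_free() {
    for (std::uint32_t cp = 0x80; cp <= 0xFFFF; ++cp) {
        for (char c : encode_utf8(cp)) {
            if (static_cast<unsigned char>(c) < 0x80) return false;
        }
    }
    return true;
}

void check_utf8() {
    assert(encode_utf8(0x010D) == "\xC4\x8D");  // Czech "č"
    assert(multibyte_is_ascii_free());
}
```

This is why a parser that only looks for ASCII delimiters can treat UTF-8 text as opaque bytes.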

In less technical terms: we are (with very high probability) able to display all the characters your language needs, if a proper font is supplied.

And you are encouraged to translate the game into one of the not yet supported languages :wink:

Don't be afraid if you have never heard of UTF-8, or of encodings at all. We can help, so your only task will be to translate the texts from English into your language using your favourite text editor on your favourite operating system.
Arndt

Re: Encodings && font problems

Post by Arndt »

Dave wrote:
Sofronius wrote: 3) I noticed that if a phrase that replaces a weapon name in the unit window on the right ends with non-English characters, these "foreign" characters aren't displayed. That is probably a bug in the BfW code, I think.
Okay, well send me the translation file, and a link to where I can download your font file, and an example of a string that doesn't display properly, and I'll look into it.

David
This is something I also encountered when using the standard font and a word ending with 'ß'. The 'ß' is not displayed. I will check again whether that is "standard" behaviour, and under what circumstances if not.
Sofronius
Posts: 50
Joined: November 6th, 2003, 11:14 am
Location: Czech Republic

Re: Encodings && font problems

Post by Sofronius »

Arndt wrote:
This is something I also encountered when using the standard font and a word ending with 'ß'. The 'ß' is not displayed. I will check again whether that is "standard" behaviour, and under what circumstances if not.
You can just send me your translation so I can search for this bug, if you want.... :wink:
Dave
Founding Developer
Posts: 7071
Joined: August 17th, 2003, 5:07 am
Location: Seattle

Re: Encodings && font problems

Post by Dave »

Arndt wrote: This is something I also encountered when using the standard font and a word ending with 'ß'. The 'ß' is not displayed. I will check again whether that is "standard" behaviour, and under what circumstances if not.
Have you tried on a very recent CVS? This should hopefully be fixed now.

David
Viliam
Translator
Posts: 1341
Joined: January 30th, 2004, 11:07 am
Location: Bratislava, Slovakia

characters required

Post by Viliam »

:arrow: Hi! I have finished a Slovak translation (of version 0.6), but 6 letters (each in upper and lower case) were missing from the Vera font:

0x010e, 0x010f: D/d + caron (Ď ď)
0x0139, 0x013a: L/l + acute (Ĺ ĺ)
0x013d, 0x013e: L/l + caron (Ľ ľ)
0x0147, 0x0148: N/n + caron (Ň ň)
0x0154, 0x0155: R/r + acute (Ŕ ŕ)
0x0164, 0x0165: T/t + caron (Ť ť)

So, unless someone has already added these characters in 0.6.99.3 (which I have not checked yet; I am waiting for the Windows binaries), would you please add these characters to the font? I hope there will be no copyright problems.

Also, please add the following characters for Esperanto (the translation will be done in a few weeks):
0x0108, 0x0109: C/c + circumflex (Ĉ ĉ)
0x011c, 0x011d: G/g + circumflex (Ĝ ĝ)
0x0124, 0x0125: H/h + circumflex (Ĥ ĥ)
0x0134, 0x0135: J/j + circumflex (Ĵ ĵ)
0x015c, 0x015d: S/s + circumflex (Ŝ ŝ)
0x016c, 0x016d: U/u + breve (Ŭ ŭ)

:arrow: If the whole font must be loaded into memory, it could be good in the future to use different font files for different groups of characters, e.g. one for Latin characters, another for Cyrillic characters, and one more for Chinese/Japanese characters (this one would eat a lot of memory anyway). The only place in the game where all characters are needed is the "choose language" dialog... perhaps bitmaps could be used there.
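
One way the per-group idea could look, as a sketch only: pick a font group from the Unicode block a code point falls in, and load the matching font file only when a character from that group is actually needed. `FontGroup` and `font_group_for` are hypothetical names, and the ranges are a coarse selection of Unicode blocks rather than a complete mapping.

```cpp
#include <cassert>
#include <cstdint>

enum class FontGroup { Latin, Cyrillic, Cjk, Other };

// Coarse mapping from a code point to a script group by Unicode block.
FontGroup font_group_for(std::uint32_t cp) {
    if (cp <= 0x024F) return FontGroup::Latin;       // Basic Latin .. Latin Extended-B
    if (cp >= 0x0400 && cp <= 0x04FF) return FontGroup::Cyrillic;
    if ((cp >= 0x3040 && cp <= 0x30FF) ||            // Hiragana, Katakana
        (cp >= 0x4E00 && cp <= 0x9FFF)) {            // CJK Unified Ideographs
        return FontGroup::Cjk;
    }
    return FontGroup::Other;
}

void check_font_groups() {
    assert(font_group_for(0x010E) == FontGroup::Latin);     // Ď from the Slovak list
    assert(font_group_for(0x0416) == FontGroup::Cyrillic);  // Ж
    assert(font_group_for(0x4E2D) == FontGroup::Cjk);       // 中
}
```

All the Slovak and Esperanto characters listed above fall in Latin Extended-A, so they would land in the Latin group.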