Idea to remove the limits of the current terrain system

SkeletonCrew
Inactive Developer
Posts: 787
Joined: March 31st, 2006, 6:55 am

Post by SkeletonCrew »

This one just crossed my last post.
Viliam wrote:After reading this thread again, and some thinking...

I see no need to assign any special numbers to terrains in C code (as was suggested at the beginning of this thread). Currently all terrains are loaded from the file "terrain.cfg", and the file "terrain-graphics.cfg" specifies how they look. By adding similar code to user campaign, it is possible to create additional terrains, with almost the same functionality. (It is not possible to create additional 'basic' terrains in user campaigns, which is IMHO not necessary.)
Yes, and the idea is to replace the char field with a number field. That's why the C code needs to know the mapping: it maps, say, the old letter "a" to the number 1234. This way it's also possible to add new terrains to mainline.
Viliam wrote:In C code, the terrains can just be loaded into one big array, and internally referred to simply as "terrain #1", "terrain #2", etc. With regard to WML, there are two important WML attributes. Identifier "id" is a string which allows this terrain to be referred from "units.cfg" -- this is how unit movement points and defenses are mapped to basic terrains. Attribute "char" is one (Unicode) character which represents the terrain in maps, saved games, and for other purposes e.g. in terrain graphics algorithms. (At least WML language allows to use any Unicode character. Maybe C code does not support this yet... IMHO it should.)


And... the longer I think about it, the more I fail to see where is the problem. ;-)
Seriously... with Unicode letters, saved as UTF-8, we have backward compatibility, and almost unlimited supply of available letters for user campaigns. I just hope it is relatively easy to tweak the C code to support Unicode terrain characters.
I'll look into UTF-8 later, but from what I remember every part of the code assumes a 1-byte char as input.
Viliam wrote:The other thing is that the current terrain system is somewhat ugly, and deserves to be redesigned. However, "ugly (but functional) terrain system" and "running out of letters for terrains" are not related. We can think about redesigning the terrain system, but we can continue with the existing one without problems, if the C code will support Unicode terrain characters.
Not entirely true: it gets uglier as more symbols are added. The * is used in terrain graphics and is also a valid terrain char. But my proposal was to fix the running-out-of-letters part, not to redesign the terrain system.

Regardless of which way we go, translation or UTF-8, the C code must no longer assume the data it reads is a char. So I'll start with that part, and in the meantime I'll take a look at the possibility of UTF-8.
torangan
Retired Developer
Posts: 1365
Joined: March 27th, 2004, 12:25 am
Location: Germany

Post by torangan »

Have a look at the t_string class. Translations are using UTF-8 so some parts of the code know how to work with it and there are SDL Methods to work with it as well IIRC. Using Unicode in UTF-8 looks like the best option without an extensive engine rewrite. People will just have to use the graphical editor or a properly configured UTF-8 aware text editor. Of course, it'll get harder to remember each character (as in glyph) and its terrain but that'll happen with every extension of the number of terrains.
WesCamp-i18n - Translations for User Campaigns:
http://www.wesnoth.org/wiki/WesCamp

Translators for all languages required: contact me. No geek skills required!
Darth Fool
Retired Developer
Posts: 2633
Joined: March 22nd, 2004, 11:22 pm
Location: An Earl's Roadstead

Post by Darth Fool »

torangan wrote:Have a look at the t_string class. Translations are using UTF-8 so some parts of the code know how to work with it and there are SDL Methods to work with it as well IIRC. Using Unicode in UTF-8 looks like the best option without an extensive engine rewrite. People will just have to use the graphical editor or a properly configured UTF-8 aware text editor. Of course, it'll get harder to remember each character (as in glyph) and its terrain but that'll happen with every extension of the number of terrains.
No. Using Unicode and UTF-8 will also require a major rewrite. The current code implicitly assumes all over the place that terrains are defined by 8-bit chars, and that C++ basic strings can be used as vectors of terrains where each char in the vector is exactly one terrain. It is very ugly. Changing to Unicode or UTF-8 will therefore, from a coding perspective, be at least as difficult as moving to an integer. Once SC completes the transition from the current char to a different base type (in this case int, but I would suggest making it a typedef so that it will be easy to change later), other modifications should be relatively simple, whether it is an aliasing system like he plans on doing, or adding a comma-separated map that uses the raw numbers.
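
To make the "one char is exactly one terrain" assumption concrete, here is a minimal C++ sketch. The helper name count_code_points is made up for illustration and is not part of Wesnoth. It shows that once a map row holds UTF-8, the byte count reported by std::string no longer equals the number of terrains.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not Wesnoth code: counts Unicode code points in a
// UTF-8 string by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes the input is well-formed UTF-8.
std::size_t count_code_points(const std::string& row)
{
	std::size_t n = 0;
	for (unsigned char c : row) {
		if ((c & 0xC0) != 0x80) {
			++n; // lead byte or plain ASCII byte starts a new code point
		}
	}
	assert(n <= row.size()); // a code point never takes less than one byte
	return n;
}
```

For a pure ASCII row such as "gggg" both counts are 4, but for "g\xC3\xA9g" (g, é, g) size() reports 4 bytes while there are only 3 terrains, so any code that indexes the string by tile position would read the wrong byte.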
SkeletonCrew
Inactive Developer
Posts: 787
Joined: March 31st, 2006, 6:55 am

Post by SkeletonCrew »

Darth Fool wrote:
torangan wrote:Have a look at the t_string class. Translations are using UTF-8 so some parts of the code know how to work with it and there are SDL Methods to work with it as well IIRC. Using Unicode in UTF-8 looks like the best option without an extensive engine rewrite. People will just have to use the graphical editor or a properly configured UTF-8 aware text editor. Of course, it'll get harder to remember each character (as in glyph) and its terrain but that'll happen with every extension of the number of terrains.
No. Using Unicode and UTF-8 will also require a major rewrite. The current code implicitly assumes all over the place that terrains are defined by 8-bit chars, and that C++ basic strings can be used as vectors of terrains where each char in the vector is exactly one terrain. It is very ugly. Changing to Unicode or UTF-8 will therefore, from a coding perspective, be at least as difficult as moving to an integer. Once SC completes the transition from the current char to a different base type (in this case int, but I would suggest making it a typedef so that it will be easy to change later), other modifications should be relatively simple, whether it is an aliasing system like he plans on doing, or adding a comma-separated map that uses the raw numbers.
Indeed, both solutions first need the engine to be changed, but _if_ UTF-8 could work it has even fewer restrictions than my original proposal: not only an almost unlimited number of tiles, but also the entire range available in every map. And since it seems Wesnoth already has some UTF-8 capabilities it might be the best solution, but I want to look at it first.

Indeed a lot of code assumes a char, and the basic definition was in map.hpp so terrain.hpp didn't even use the TERRAIN typedef :shock:
I made two typedefs, one for the WML data type and one for the internal data type, since I planned to work with a translation system. I'm keeping this; the separation can't hurt. The problem at the moment is that things are going too well: unsigned long and char are compatible in the currently used range. So I'm now busy changing the range and following the compiler warnings. I still hope to have things working a little bit tomorrow so I can send a first patch. Let's see what happens when all the read chars, after conversion, are multiplied by 256.
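
A rough sketch of what the two typedefs and the multiply-by-256 experiment could look like. All names here (wml_terrain, internal_terrain, TERRAIN_SCALE, to_internal, to_wml) are assumptions for illustration, not the names used in the actual patch. Presumably the point of the scaling is that an internal value can no longer be mistaken for the original char, so code that skips the conversion misbehaves visibly.

```cpp
#include <cassert>

// Assumed names, not the ones in the real patch.
typedef char          wml_terrain;      // what is read from a map or WML file
typedef unsigned long internal_terrain; // what the engine works with

// Moves internal values out of the 0..255 range that a char can hold.
const internal_terrain TERRAIN_SCALE = 256;

// Convert a terrain letter read from a file to its internal number.
internal_terrain to_internal(wml_terrain c)
{
	return static_cast<internal_terrain>(static_cast<unsigned char>(c)) * TERRAIN_SCALE;
}

// Convert back; only valid for values that came from to_internal().
wml_terrain to_wml(internal_terrain t)
{
	assert(t % TERRAIN_SCALE == 0 && t / TERRAIN_SCALE <= 0xFF);
	return static_cast<wml_terrain>(static_cast<unsigned char>(t / TERRAIN_SCALE));
}
```

With this in place to_internal('g') is 26368 rather than 103, so a comparison such as terrain == 'g' that bypasses the conversion simply stops matching instead of silently working.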

@torangan: thanks for the info.
Viliam
Translator
Posts: 1341
Joined: January 30th, 2004, 11:07 am
Location: Bratislava, Slovakia

Post by Viliam »

Having a new C type "terrain_t" will be useful -- it gives some level of abstraction. If it were a Unicode character, then the conversions from and to WML would be simple. Technically, it should be a 32-bit integer, because Unicode code points range from 0 to 10FFFF (hexadecimal). (I do not know if C supports 32-bit integers on all platforms we use. Otherwise we could use an array of 4 bytes or something like this. Anyway, the majority of the program should use just a "terrain_t" black box.)


Algorithm to convert Unicode code point "u" to string:

Code: Select all

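if (u <= 0x7f) {                // 1 byte:  0xxxxxxx
	print u;
} else if (u <= 0x07ff) {       // 2 bytes: 110xxxxx 10xxxxxx
	print 0xC0 | (u >> 6);
	print 0x80 | (u & 0x3f);
} else if (u <= 0xffff) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
	print 0xE0 | (u >> 12);
	print 0x80 | ((u >> 6) & 0x3f);
	print 0x80 | (u & 0x3f);
} else if (u <= 0x10ffff) {     // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
	print 0xF0 | (u >> 18);
	print 0x80 | ((u >> 12) & 0x3f);
	print 0x80 | ((u >> 6) & 0x3f);
	print 0x80 | (u & 0x3f);
}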
Algorithm to convert first chars in string "s" to Unicode code point:

Code: Select all

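// s[0] is the lead byte and carries the most significant bits;
// each following byte contributes its low 6 bits.
if (0x00 == (s[0] & 0x80)) {
	u = s[0];
} else if (0xc0 == (s[0] & 0xe0)) {
	u = ((s[0] & 0x1f) << 6) + (s[1] & 0x3f);
} else if (0xe0 == (s[0] & 0xf0)) {
	u = ((s[0] & 0x0f) << 12) + ((s[1] & 0x3f) << 6) + (s[2] & 0x3f);
} else if (0xf0 == (s[0] & 0xf8)) {
	u = ((s[0] & 0x07) << 18) + ((s[1] & 0x3f) << 12) + ((s[2] & 0x3f) << 6) + (s[3] & 0x3f);
}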
I hope the algorithms are correct. Please check them.
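
For checking, here is a compilable C++ sketch of the two conversions. UTF-8 continuation bytes carry six payload bits each, so the shifts are multiples of 6 and the mask is 0x3f, and the first byte of a sequence holds the most significant bits. The names terrain_t, utf8_encode and utf8_decode_first are assumptions for illustration; the decoder does no validation of overlong forms, surrogates, or malformed continuation bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

typedef std::uint32_t terrain_t; // assumed: one Unicode code point per terrain

// Encode one code point (0 .. 0x10FFFF) as a UTF-8 byte sequence.
std::string utf8_encode(terrain_t u)
{
	assert(u <= 0x10FFFF);
	std::string out;
	if (u <= 0x7F) {
		out += static_cast<char>(u);
	} else if (u <= 0x7FF) {
		out += static_cast<char>(0xC0 | (u >> 6));
		out += static_cast<char>(0x80 | (u & 0x3F));
	} else if (u <= 0xFFFF) {
		out += static_cast<char>(0xE0 | (u >> 12));
		out += static_cast<char>(0x80 | ((u >> 6) & 0x3F));
		out += static_cast<char>(0x80 | (u & 0x3F));
	} else {
		out += static_cast<char>(0xF0 | (u >> 18));
		out += static_cast<char>(0x80 | ((u >> 12) & 0x3F));
		out += static_cast<char>(0x80 | ((u >> 6) & 0x3F));
		out += static_cast<char>(0x80 | (u & 0x3F));
	}
	return out;
}

// Low six bits of byte i, or 0 if the string is too short.
static unsigned cont(const std::string& s, std::size_t i)
{
	return i < s.size() ? (static_cast<unsigned char>(s[i]) & 0x3Fu) : 0u;
}

// Decode the first code point of s. len receives the number of bytes the
// lead byte announces, or 0 if s is empty or starts with an invalid byte.
terrain_t utf8_decode_first(const std::string& s, std::size_t& len)
{
	len = 0;
	if (s.empty()) return 0;
	const unsigned b0 = static_cast<unsigned char>(s[0]);
	if ((b0 & 0x80) == 0x00) { len = 1; return b0; }
	if ((b0 & 0xE0) == 0xC0) { len = 2; return ((b0 & 0x1F) << 6) | cont(s, 1); }
	if ((b0 & 0xF0) == 0xE0) { len = 3; return ((b0 & 0x0F) << 12) | (cont(s, 1) << 6) | cont(s, 2); }
	if ((b0 & 0xF8) == 0xF0) {
		len = 4;
		return ((b0 & 0x07) << 18) | (cont(s, 1) << 12) | (cont(s, 2) << 6) | cont(s, 3);
	}
	return 0; // stray continuation byte or invalid lead byte
}
```

As a spot check, U+0414 (Д) encodes to the two bytes D0 94 and decodes back to 0x0414 with a length of 2.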
SkeletonCrew
Inactive Developer
Posts: 787
Joined: March 31st, 2006, 6:55 am

Post by SkeletonCrew »

A 32-bit storage shouldn't be a problem.
Thanks for the algorithm, but I do hope that SDL has a library function to do this conversion.

*goes back to coding*
zaimoni
Posts: 281
Joined: January 27th, 2005, 7:00 am
Location: Linn Valley, KS U.S.A.

Post by zaimoni »

libiconv has the functions to do that conversion, and is already part of the Wesnoth project (as a prerequisite for gettext).

As for SDL...no, it doesn't. (I've looked at creating a thin wrapper, and it's not there to be wrapped).
SkeletonCrew
Inactive Developer
Posts: 787
Joined: March 31st, 2006, 6:55 am

Post by SkeletonCrew »

@zaimoni: torangan mentioned there might be something in SDL. What I wanted to say was that I thought there would be something in Wesnoth which could decode UTF-8, and I'm not keen to reinvent the wheel.
:oops: Of course gettext or one of its dependencies would be the logical place. Thanks for the info. As for my promise to look into UTF-8, I haven't found the time yet. I'm still busy with changing the internals of Wesnoth.

*goes back to look whether the compiler aborted or finished...*
SkeletonCrew
Inactive Developer
Posts: 787
Joined: March 31st, 2006, 6:55 am

patch

Post by SkeletonCrew »

I hoped to be able to send a patch to PWO today but it's not really worth it. For those who are really interested I'll post the patch here.

The good: it compiles
The bad: everything that gives trouble is commented out (including the editor)
The ugly: the patch :oops:

I'm still working on it but the terrain.cpp and terrain.hpp are finished (for now).
As soon as I've a patch for PWO this one will be removed, it's not worth the storage space.
For those not faint of heart, you could compile and try to run it, but I haven't tried that.

EDIT: removed patch.
Last edited by SkeletonCrew on October 10th, 2006, 7:23 pm, edited 1 time in total.
mog
Inactive Developer
Posts: 190
Joined: March 16th, 2006, 2:07 pm
Location: Germany

Post by mog »

I only have to deal with terrain letters in terrain-graphics.cfg et al, but both proposals (the map-chars-to-integers and UTF8) would be a major pain to work with.

Having to use (potentially large) numbers in terrain-matchers would destroy any remaining pretense of readability of these files.

Code: Select all

{SWAMPADJSINGLE Yw CnQoKNhaHAmb& !CnQoKNhaHAmb& 100 reed-castle}
is bad enough, but imagine that one with each letter replaced by a number+delimiter.

The UTF8 solution would only give us a handful of easily memorable/typable letters (at best), and make editing WML much more difficult for additional chars (what was the code for Д again?). I don't see a real advantage of UTF8 over a multiletter system. Typing AltGr+0764 is not faster/easier to remember than e.g. "Vxy". The internal changes would be comparable, only that Unicode has more pitfalls for the developers.

Myself, I would prefer a multiletter system with some sort of prefix-matching, so "Vxy" could stand for a specific village type, but could be matched in WML by "Vxy", "Vx" or "V". If one assigns the codes cleverly, this could even make the WML code more readable, as you could match multiple terrains with one char.

The system suggested by SkeletonCrew could be combined with this system. You would just use strings instead of integers as terrain ids.
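
A minimal C++ sketch of the prefix-matching idea, assuming terrain codes are plain strings. The function name terrain_matches is hypothetical and not an existing Wesnoth function.

```cpp
#include <cassert>
#include <string>

// Hypothetical matcher: a pattern matches a terrain code when it is a
// non-empty prefix of that code, so "V" and "Vx" both match "Vxy".
bool terrain_matches(const std::string& pattern, const std::string& code)
{
	if (pattern.empty() || pattern.size() > code.size()) {
		return false;
	}
	const bool match = code.compare(0, pattern.size(), pattern) == 0;
	assert(!match || code[0] == pattern[0]); // a match always shares the first letter
	return match;
}
```

With codes assigned by family, a terrain-graphics rule could then list "V" once instead of enumerating every village letter.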
Aurë entuluva!
Darth Fool
Retired Developer
Posts: 2633
Joined: March 22nd, 2004, 11:22 pm
Location: An Earl's Roadstead

Post by Darth Fool »

mog wrote:I only have to deal with terrain letters in terrain-graphics.cfg et al, but both proposals (the map-chars-to-integers and UTF8) would be a major pain to work with.

Having to use (potentially large) numbers in terrain-matchers would destroy any remaining pretense of readability of these files.

Code: Select all

{SWAMPADJSINGLE Yw CnQoKNhaHAmb& !CnQoKNhaHAmb& 100 reed-castle}
is bad enough, but imagine that one with each letter replaced by a number+delimiter.

The UTF8 solution would only give us a handful of easily memorable/typable letters (at best), and make editing WML much more difficult for additional chars (what was the code for Д again?). I don't see a real advantage of UTF8 over a multiletter system. Typing AltGr+0764 is not faster/easier to remember than e.g. "Vxy". The internal changes would be comparable, only that Unicode has more pitfalls for the developers.

Myself, I would prefer a multiletter system with some sort of prefix-matching, so "Vxy" could stand for a specific village type, but could be matched in WML by "Vxy", "Vx" or "V". If one assigns the codes cleverly, this could even make the WML code more readable, as you could match multiple terrains with one char.

The system suggested by SkeletonCrew could be combined with this system. You would just use strings instead of integers as terrain ids.
The work that SkeletonCrew is doing right now is really the prerequisite to any expanded system, not just his proposed aliasing. It is very similar to what I started a long time ago in trying to get a comma-delimited list, but I got bogged down in the details and RL. If he gets this base system up and running, it should be straightforward to implement other methods, be it multiletter or comma-separated, depending on the general consensus of what would be easiest.
SkeletonCrew
Inactive Developer
Posts: 787
Joined: March 31st, 2006, 6:55 am

Post by SkeletonCrew »

mog wrote:Having to use (potentially large) numbers in terrain-matchers would destroy any remaining pretense of readability of these files.

Code: Select all

{SWAMPADJSINGLE Yw CnQoKNhaHAmb& !CnQoKNhaHAmb& 100 reed-castle}
is bad enough, but imagine that one with each letter replaced by a number+delimiter.
That's my proposal for that specific part, yes. That's why I also want to use ranges: 1000-1999 are villages, 3000-3999 are water. So it's a little bit easier to remember.
mog wrote:Myself, I would prefer a multiletter system with some sort of prefix-matching, so "Vxy" could stand for a specific village type, but could be matched in WML by "Vxy", "Vx" or "V". If one assigns the codes cleverly, this could even make the WML code more readable, as you could match multiple terrains with one char.
Does this also mean you want a variable number of characters for one terrain? If so a separator is almost required, otherwise the decoder can become pretty complex. Like I said before maybe there should also be a terrain transition editor so the artists involved don't need to type this horrible code.
mog wrote:The system suggested by SkeletonCrew could be combined with this system. You would just use strings instead of integers as terrain ids.
If that's easier to remember it's also possible.

Like Darth Fool said, I'm busy changing the internal system so it supports an unlimited number of terrains. At first it seemed to go easily, too easily, and now I know why Darth Fool said it was going to be difficult... It'll take longer than expected but I still want to finish it. Once that's finished I'll start to change the WML part.

@mog BTW I really like the new swamp
Viliam
Translator
Posts: 1341
Joined: January 30th, 2004, 11:07 am
Location: Bratislava, Slovakia

Post by Viliam »

mog wrote:I don't see a real advantage of UTF8 over a multiletter system. Typing AltGr+0764 is not faster/easier to remember than e.g. "Vxy". The internal changes would be comparable, only that Unicode has more pitfalls for the developers.
Unicode is already supported, but not consistently. The WML files are in UTF-8 format, but it seems not all parts of the program agree on this.

Advantage is: backward compatibility with existing maps and WML scripts.
Jetrel
Posts: 7242
Joined: February 23rd, 2004, 3:36 am
Location: Midwest US

Post by Jetrel »

SkeletonCrew wrote:Does this also mean you want a variable number of characters for one terrain? If so a separator is almost required, otherwise the decoder can become pretty complex. Like I said before maybe there should also be a terrain transition editor so the artists involved don't need to type this horrible code.
A separator, such as a comma, would be no problem.

G2,G2,G2,G2,M1
G2,G1,G1,G1,M1
G2,G1,V1,H1,M2
G1,G1,W1,H2,H1

Still very easy to read.
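
A small C++ sketch of how one such row could be split. The function name split_row is made up, and the sketch assumes a new map format in which spaces and tabs are only padding and no longer terrain characters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical parser for one map row such as "G2,G1,V1,H1,M2".
// Cells are separated by commas; spaces and tabs are dropped so rows can
// be padded by hand to keep the columns aligned. An empty row yields a
// single empty cell, which a caller would have to reject.
std::vector<std::string> split_row(const std::string& row)
{
	std::vector<std::string> cells;
	std::string current;
	for (char c : row) {
		if (c == ',') {
			cells.push_back(current);
			current.clear();
		} else if (c != ' ' && c != '\t') {
			current += c;
		}
	}
	cells.push_back(current); // last cell has no trailing comma
	assert(!cells.empty());
	return cells;
}
```

Because the separator marks the cell boundaries, codes of different lengths such as "G2" and "Vhs" can be mixed in one row without any length prefix.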


In terms of sanity, I support the aliasing system which uses integers internally, and defines them all in a WML file. As far as the specific aliases, we could in fact not only add additional letter representations, but also preserve backwards compatibility, if we allowed both:
• Single or Multi-Letter aliases
• Multiple aliases for a given terrain

Thus you might have:

Letter  Alias  Number  Group    Description
A       Vhs    1000    Village  human (snow) hill village
a       Vhh    1001    Village  human hill village
B       Vda    1002    Village  desert village (adobe)



One niggling issue is that if we had different lengths to the representations, we might run into misalignment in the visual appearance of the map file - something which helps when people look at it, since under the current system a map is a square grid in the text.

However, if we collapse "space" characters (not all whitespace, since we still need "return/EOL"), then people could correct for this by hand, and we could further negate this by making all of the standard terrains a nice size with room to grow, such as 3 letters.
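
A sketch of the alias idea in C++: several aliases, single-letter or multi-letter, map to the same internal number. The names terrain_number, build_alias_table and lookup are assumptions, and the table is hard-coded here only for illustration; in the proposal it would be defined in a WML file.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef unsigned long terrain_number; // assumed internal integer type

// Builds the example table from the post: old single letters and new
// multi-letter codes are both aliases for the same number.
std::map<std::string, terrain_number> build_alias_table()
{
	std::map<std::string, terrain_number> table;
	table["A"] = 1000; table["Vhs"] = 1000; // human (snow) hill village
	table["a"] = 1001; table["Vhh"] = 1001; // human hill village
	table["B"] = 1002; table["Vda"] = 1002; // desert village (adobe)
	assert(table.size() == 6);
	return table;
}

// Returns the terrain number for an alias, or 0 for an unknown alias
// (this sketch assumes 0 is never used as a real terrain number).
terrain_number lookup(const std::map<std::string, terrain_number>& table,
                      const std::string& alias)
{
	std::map<std::string, terrain_number>::const_iterator it = table.find(alias);
	return it == table.end() ? 0 : it->second;
}
```

Old maps that use "A" and new maps that use "Vhs" then resolve to the same terrain 1000, which is what keeps existing maps loadable.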
SkeletonCrew
Inactive Developer
Posts: 787
Joined: March 31st, 2006, 6:55 am

Post by SkeletonCrew »

Jetryl wrote:
SkeletonCrew wrote:Does this also mean you want a variable number of characters for one terrain? If so a separator is almost required, otherwise the decoder can become pretty complex. Like I said before maybe there should also be a terrain transition editor so the artists involved don't need to type this horrible code.
A separator, such as a comma, would be no problem.
I agree but
mog wrote:is bad enough, but imagine that one with each letter replaced by a number+delimiter.
I think mog wants a string like XVaVbVc where the X means we have a three-character string. That would be horrible to decode (and do proper error handling). I also doubt it would be human readable. If it becomes Vaa,Vb,Vc it would be easier to decode and read.
Jetryl wrote:In terms of sanity, I support the aliasing system which uses integers internally, and defines them all in a WML file. As far as the specific aliases, we could in fact not only add additional letter representations, but also preserve backwards compatibility, if we allowed both:
• Single or Multi-Letter aliases
• Multiple aliases for a given terrain

Thus you might have:

Letter  Alias  Number  Group    Description
A       Vhs    1000    Village  human (snow) hill village
a       Vhh    1001    Village  human hill village
B       Vda    1002    Village  desert village (adobe)



One niggling issue is that if we had different lengths to the representations, we might run into misalignment in the visual appearance of the map file - something which helps when people look at it, since under the current system a map is a square grid in the text.

However, if we collapse "space" characters (not all whitespace, since we still need "return/EOL"), then people could correct for this by hand, and we could further negate this by making all of the standard terrains a nice size with room to grow, such as 3 letters.
In that case the first letter could also be a UTF-8 glyph (I think). If people like multiple letters better than a number, the number column could disappear. Internally we could use the UTF-8 value (which is unique per glyph) as the internal integer. That is if UTF-8 is possible, but I guess it is. If we do it this way the entire first proposal of a lookup table system would no longer be required.
If we're going to change the map file format, I want to use spaces/tabs to keep the alignment, yes. The only thing is, a space character is a valid map character at the moment.