precisions on WML grammar

Post by **spir** » October 15th, 2009, 3:49 pm

@ silene

silene wrote:The translation underscore also works when there is no space before it.

Well, I thought so, but parser.cpp states:

static char const *TranslatableAttributePrefix = "_ \"";

So, I rather followed this rule.

silene wrote:Braces are preprocessor symbols. Formulas use parentheses.

Ooops!

silene wrote:spir wrote:
Code:
value : formula_eval / variable_eval / translation_text / text / base_value
This is hardly complete, as evaluations can be part of strings and strings themselves can be concatenated. In fact, concatenation happens before evaluation, so adding spurious splits is a good way to obfuscate scenarios. Quiz: what is the value of $(2*2+1) in WML?

Well, I forgot concat totally.
Also, didn't know eval can happen inside strings. At least, if I understand you correctly, there seems to be no mutual recursion here. Is it all I missed about values?
Two precisions to be really sure:
* Is eval inside string valid for both variable and formula?
* Is it valid inside both quoted text and "free" string (meaning unquoted whatever)?

Post by **silene** » October 15th, 2009, 4:01 pm

spir wrote:Well, I thought so, but parser.cpp states:
Code:
static char const *TranslatableAttributePrefix = "_ \"";
So, I rather followed this rule.

You are confusing the pretty printer with the parser. When Wesnoth outputs WML (a savegame, for instance), it puts a space; but the space is not required in input. As a matter of fact, it is often avoided in input, since it makes passing translatable strings as macro parameters painful.
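
To make the distinction concrete, here is a minimal Python sketch of the two sides. It is not Wesnoth's parser: the names `parse_translatable` and `print_translatable` and the regular expression are assumptions made up for illustration. The input rule accepts the underscore with or without whitespace before the quote, while the output side always writes the space.

```python
import re

# Hypothetical input rule: underscore, optional whitespace, then quoted text.
_TRANSLATABLE = re.compile(r'^_\s*"(.*)"$')


def parse_translatable(value):
    """Return the quoted text if the value is marked translatable, else None."""
    match = _TRANSLATABLE.match(value.strip())
    return match.group(1) if match else None


def print_translatable(text):
    """Pretty-printer side: always emits the '_ "' prefix, space included."""
    return '_ "' + text + '"'
```

Under these assumptions, both `_"Hello"` and `_ "Hello"` are read as the same translatable text, and writing it back out always yields the spaced form.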

spir wrote:Also, didn't know eval can happen inside strings. At least, if I understand you correctly, there seems to be no mutual recursion here.

It is worse than that (take a look at the quiz again!). Formulas and variable substitution are not part of WML syntax. For instance, both values below are valid:

Code:

"$(2*" + "2) is 4"
"$un" + "it." + "hitpoi" + "nts"

spir wrote:Two precisions to be really sure:
* Is eval inside string valid for both variable and formula?
* Is it valid inside both quoted text and "free" string (meaning unquoted whatever)?

Yes and yes. Keep in mind that WML is completely untyped: there are no such things as strings (or integers or formulas or whatever), just sequences of characters that are interpreted on the fly.

Post by **Anonymissimus** » October 15th, 2009, 8:31 pm

silene wrote:Quiz: what is the value of $(2*2+1) in WML?

I'm taking a wild guess here that the value of this is a missing closing sign somewhere in (of course) core/..., since I had the impression these formulas only work when enclosed in quotes.

Post by **silene** » October 15th, 2009, 8:53 pm

Anonymissimus wrote:I'm taking a wild guess here that the value of this is a missing closing sign somewhere in (of course) core/..., since I had the impression these formulas only work when enclosed in quotes.

No, no error at all; the scenario will load just fine and the value will evaluate to an integer when used. But once you know what the integer is, you will find that your impression wasn't that far from the truth.

Post by **Gambit** » October 15th, 2009, 9:38 pm

5

So your quiz wasn't a trick question?

Now I'm even more confused than when I thought you were trying to prove some point...

Post by **solsword** » October 15th, 2009, 9:38 pm

It's 42, isn't it. Because Wesnoth sees $(2*2+1) and goes: "Oh, there's a +, let me just concatenate these strings, and then I can evaluate this" and then goes "Huh, what was $(2*21) again? Oh right, 42."

"$(2*2+1)", on the other hand, should be 5.

Post by **Gambit** » October 15th, 2009, 9:40 pm

Yeah I just tried it again without my habitual quotes. You're right solsword.

So not putting it in quotes causes it to do string operations rather than math?
And putting it in quotes causes it to do math instead of string operations?

Does the $ work differently in and out of quotes?
I understand what it's doing now but not why.

Post by **solsword** » October 15th, 2009, 9:56 pm

Basically, there are two phases of interpretation. In the first, string operations happen. If you've got quotes, then there's nothing to do in this phase (because it recognizes stuff within quotes as one big string already). If you don't, the addition is interpreted as a string operation and performed.

Then, in the second phase, things like variable substitution and FAI evaluation are performed. They are run on the results of the string operations.

Effectively, putting things in quotes shields them from being interpreted as string operations. Ultimately, the $ doesn't know about quotes, because it's interpreted after the quotes have been taken out of the picture (the string operations phase doesn't pass the quote characters on to the next phase, obviously). So if I have "$" + "unit" + ".id" as my value, the string operation phase reduces that to $unit.id, and the variable substitution phase works with that, putting in the appropriate value.
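
The two phases described above can be sketched in Python roughly as follows. This is a toy model, not Wesnoth's code: the function names `concatenate` and `evaluate` are made up, the formula evaluator only understands integer `+`, `-` and `*` as a stand-in for the real formula language, and running variable substitution before formula evaluation is an assumption.

```python
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def concatenate(raw):
    """Phase 1: join quoted and unquoted pieces; '+' outside quotes vanishes."""
    pieces, i, n = [], 0, len(raw)
    while i < n:
        if raw[i] == '"':
            end = raw.index('"', i + 1)      # quoted text is taken verbatim
            pieces.append(raw[i + 1:end])
            i = end + 1
        elif raw[i] == '+':
            i += 1                           # treated as a string operator
        else:
            j = i
            while j < n and raw[j] not in '"+':
                j += 1
            pieces.append(raw[i:j].strip())  # unquoted chunk
            i = j
    return "".join(pieces)


def _arith(node):
    """Evaluate a tiny integer expression tree (toy formula language)."""
    if isinstance(node, ast.Expression):
        return _arith(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_arith(node.left), _arith(node.right))
    raise ValueError("unsupported formula")


def evaluate(text, variables):
    """Phase 2: substitute $name.path, then evaluate $(...) formulas."""
    text = re.sub(r"\$([A-Za-z_]\w*(?:\.\w+)*)",
                  lambda m: str(variables[m.group(1)]), text)
    return re.sub(r"\$\(([^()]*)\)",
                  lambda m: str(_arith(ast.parse(m.group(1), mode="eval"))),
                  text)
```

In this model, `concatenate('$(2*2+1)')` gives `$(2*21)`, which phase 2 turns into 42, while the quoted form reaches phase 2 intact and gives 5. Likewise `"$" + "unit" + ".id"` collapses to `$unit.id` before any lookup happens.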

Post by **spir** » October 15th, 2009, 11:53 pm

solsword wrote:Basically, there are two phases of interpretation. In the first, string operations happen. If you've got quotes, then there's nothing to do in this phase (because it recognizes stuff within quotes as one big string already). If you don't, the addition is interpreted as a string operation and performed.

Then, in the second phase, things like variable substitution and FAI evaluation are performed. They are run on the results of the string operations.

Effectively, putting things in quotes shields them from being interpreted as string operations. Ultimately, the $ doesn't know about quotes, because it's interpreted after the quotes have been taken out of the picture (the string operations phase doesn't pass the quote characters on to the next phase, obviously). So if I have "$" + "unit" + ".id" as my value, the string operation phase reduces that to $unit.id, and the variable substitution phase works with that, putting in the appropriate value.

!Hombre, it's crazy!
Is there something to do for the evaluation happen ~ like what we may expect, meaning transformations apply uniformly (protecting from further transformations with quotes would not be possible for messages often embed $vars).

Think I've read somewhere WML is truly an easy language for newcomers.

I guess, but may be wrong, that all of that belongs to preprocessing, between macro subst. & file inclusion on one hand, and parsing properly speaking on the other. Would you outline a rough but complete schedule from plain source text to actual game engine actions? All the interpretation phases, I mean.
Also, is there a uniform representation of WML code (parse tree reprocessed, nested objects, or whatever)? Does the representation mirror the WML code structure?

Post by **solsword** » October 16th, 2009, 1:45 am

spir wrote:!Hombre, it's crazy!
Is there something to do for the evaluation happen ~ like what we may expect, meaning transformations apply uniformly (protecting from further transformations with quotes would not be possible for messages often embed $vars).

Not quite sure what you're trying to ask here... can you rephrase this?

spir wrote:I guess, but may be wrong, that all of that belongs to preprocessing, between macro subst. & file inclusion on one hand, and parsing properly speaking on the other. Would you outline a rough but complete schedule from plain source text to actual game engine actions? All the interpretation phases, I mean.

Well... I'm no expert, so don't rely on this, but here's what I'm pretty sure goes on:

1. The preprocessor runs. This step converts a single file (the data/_main.cfg file) into a giant blob of text with files from all over the place. Depending on whether you're playing a campaign, playing multiplayer, running the editor, etc, different things can get included here (all according to #ifdef statements in the code... it's not like there's magic at this stage beyond the special values like MULTIPLAYER and EDITOR that are used). These various definitions are the reason that there's a loading screen whenever you transition between the various modes, like when you start the editor or start a campaign. Wesnoth has to add the special additional constant to the preprocessor (like EDITOR or whatever the campaign said to define) and re-parse all of the WML, everywhere. When you see the loading page with the blue bar, this is what is happening.

2. Now you have a giant blob of text without any macro stuff in it. At this point, I think, the WML is converted to a tree-like structure, with text at each of the nodes. The nodes have names (what kind of tag they are) and can have any number of values, some of which can be other nodes, and some of which are just plain text. This structure is defined by the "config" class in config.[ch]pp in the source code. Each 'config' object just has a group of children (other 'config' objects, indexed by strings) and a group of attributes (string objects, indexed by strings). If I had to guess, I'd say that by this point, the attribute values have been reduced to simple strings. That is, an entry like "$uni" + "t" appears as the string $unit in the 'config' object, but I don't actually know whether this is the case.

3. Wesnoth runs. Various parts of the C++ code use various different config objects, pass them around, read and write their attributes and children, and generally wreak havoc. In most cases, when a config object's attribute strings are used, they get expanded first, at which point variable substitution occurs (see the vconfig object defined in variable.[ch]pp in the source, as well as the interpolate_variables_into_string function in formula_string_utils.[ch]pp). The game then acts based on these variables-expanded versions of the preprocessor-processed strings.

Of course, additional complexities are introduced by things like the [insert_wml] tag. And there are some tricky bits, like the delayed_variable_substitution flag for nested events.
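
As a rough picture of steps 2 and 3, here is a small Python sketch of a tree of nodes with string attributes and named children, where variables are only substituted when an attribute is read. It is a toy model under my own assumptions, not the real `config` or `vconfig` classes: the `Config` class, its `add_child` and `expanded` methods, and the choice to turn unknown variables into an empty string are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Config:
    """Toy stand-in for a WML node: string attributes plus named children."""
    attributes: Dict[str, str] = field(default_factory=dict)
    children: Dict[str, List["Config"]] = field(default_factory=dict)

    def add_child(self, tag: str) -> "Config":
        """Append a new empty child under the given tag name and return it."""
        child = Config()
        self.children.setdefault(tag, []).append(child)
        return child

    def expanded(self, key: str, variables: Dict[str, str]) -> str:
        """Substitute $name.path references at read time, not at parse time."""
        raw = self.attributes[key]
        return re.sub(r"\$([A-Za-z_]\w*(?:\.\w+)*)",
                      lambda m: variables.get(m.group(1), ""), raw)


# Build the tree for: [event] name=start [message] message="HP: $unit.hitpoints"
root = Config()
event = root.add_child("event")
event.attributes["name"] = "start"
message = event.add_child("message")
message.attributes["message"] = "HP: $unit.hitpoints"
```

The stored attribute keeps the literal `$unit.hitpoints`; only a call such as `message.expanded("message", {"unit.hitpoints": "38"})` produces the substituted text, which mirrors the idea that expansion happens when the game uses the value.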

spir wrote:Also, is there a uniform representation of WML code (parse tree reprocessed, nested objects, or whatever)? Does the representation mirror the WML code structure?

As far as I understand the question and the code, yes. The root config object, along with all of the contained objects and attributes, is the data that corresponds to raw WML. And yes, it pretty much follows it in terms of structure (I think/assume).

Post by **Sapient** » October 16th, 2009, 4:20 am

I think the choice of '+' as a string concatenation operator was probably a very unfortunate choice, as demonstrated by that example of silene's.

However, keep in mind that FormulaAI is an entirely different language than WML.

So if you are debating whether or not WML is a simple language, you should probably not base that evaluation on either Lua or FormulaAI, even though they may both be used within WML for convenience of those who understand them.

For math operations, WML is an awkward language indeed.
But the event code for a typical Wesnoth scenario involves very little math, usually none at all.

Post by **silene** » October 16th, 2009, 5:12 am

Sapient wrote:I think the choice of '+' as a string concatenation operator was probably a very unfortunate choice, as demonstrated by that example of silene's.

I disagree, '+' is perfectly fine as a string concatenation operator. What is unfortunate is that it didn't occur to FormulaAI developers that it would be a one-liner to avoid this whole issue in the parser (that is, don't ignore '+' if the previous token is not a quoted string, for instance).
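
A hedged Python sketch of that rule, for illustration only: `concatenate_strict` is a made-up name, and the real parser works on tokens rather than on raw characters like this. The only point is that '+' is dropped when it directly follows a quoted string, and kept as an ordinary character everywhere else.

```python
def concatenate_strict(raw):
    """Join pieces, treating '+' as an operator only after a quoted string."""
    pieces, i, n = [], 0, len(raw)
    while i < n:
        if raw[i] == '"':
            end = raw.index('"', i + 1)
            pieces.append(raw[i + 1:end])   # quoted text, taken verbatim
            i = end + 1
            # Look ahead: a '+' right after a quoted string is concatenation.
            j = i
            while j < n and raw[j].isspace():
                j += 1
            if j < n and raw[j] == '+':
                j += 1
                while j < n and raw[j].isspace():
                    j += 1
                i = j                        # skip the operator and its spacing
        else:
            pieces.append(raw[i])            # unquoted text keeps its '+'
            i += 1
    return "".join(pieces).strip()
```

With this variant, the unquoted `$(2*2+1)` survives unchanged, while `"$" + "unit" + ".id"` still collapses to `$unit.id`.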

Sapient wrote:However, keep in mind that FormulaAI is an entirely different language than WML.

Right, that was the whole point of the quiz: to show that the grammar of WML only cares about strings; formulas are not part of it.

solsword wrote:If I had to guess, I'd say that by this point, the attribute values have been reduced to simple strings. That is, an entry like "$uni" + "t" appears as the string $unit in the 'config' object, but I don't actually know whether this is the case.

This is the case. The string is not simple though, since it also remembers which parts of it are translatable and in which text domains these parts have to be looked for.
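
One way to picture such a string, as a Python sketch under my own assumptions (the `Segment` type, the `render` function and the `wesnoth-demo` text domain are invented for illustration and are not the engine's actual data structure): the value is a list of segments, each remembering whether it is translatable and in which text domain to look it up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Segment:
    """One piece of a concatenated value."""
    text: str
    textdomain: Optional[str] = None  # None means "not translatable"


def render(segments: List[Segment], catalogs: Dict[str, Dict[str, str]]) -> str:
    """Translate each translatable segment in its own domain, then join."""
    out = []
    for seg in segments:
        if seg.textdomain is None:
            out.append(seg.text)
        else:
            # Fall back to the original text when no translation is known.
            out.append(catalogs.get(seg.textdomain, {}).get(seg.text, seg.text))
    return "".join(out)


# Models a value like: _ "Hello, " + "$unit.name" in a hypothetical domain.
value = [Segment("Hello, ", "wesnoth-demo"), Segment("$unit.name")]
french = {"wesnoth-demo": {"Hello, ": "Bonjour, "}}
```

Here `render(value, french)` translates only the first segment and leaves `$unit.name` in place for later variable substitution.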

Post by **spir** » October 16th, 2009, 10:28 am

This looks like a kid's joke in which 1+1=11. I propose the next version of WML be called "Alice" (in Wonderland) (playing with language).

Two comments on all of this.

First, I have the impression it's not possible at all to define a grammar for the value part of configs. There should instead be separate grammars for each phase. The reason, IIUC, is that an expression that is perfectly valid at a later stage can become meaningless because of a transformation applied earlier (e.g. $(1+$var) would become $(1$var)?), and conversely (e.g. "$("+1+"+$var)" would become $(1+$var)?). Well, actually, I guess the first example would still pass and return e.g. 12 if $var evaluates to 2, lol!

Second, if you ever intend to seriously modify the syntax/parsing/evaluation of attribute values one day in the far future, here are some proposals (some may be stupid because I do not understand everything yet, but here they are):

* Text is defined as a (conceptual, semantic) "kind" of value. ("Text" is here to be understood as a collective noun, as in "a bit of text".) It applies to all kinds of text, mainly user-displayed, including names. In fact, AFAIK all textual attribute values can potentially be displayed; or am I wrong on this, and are there "real" texts that never get displayed? (*) To be very clear: text is not a data type. But it may be used to check value format validity (semantic validation) for attributes that expect text.
* Text is always enclosed in quotes, and quotes can only enclose text. As a consequence, a text value made of a single bit of text is always entirely enclosed in quotes, and conversely an entirely enclosed value field can only represent a text.
* To further clarify the distinction with ids, use a real, strict id format to point to identified things such as unit types and event types (I mean an [event] "name" attribute). (**)
* Text is always translatable by default. The reason for this is that most if not all texts are user-texts, so it's stupid to have a special idiom for what is actually the common case. AFAIK, again, all textual attribute types (e.g. unit name) are translatable; only instances (e.g. "Kalenz") can be non-translatable. We can leave the decision not to translate to the translator teams (they are not stupid: they will probably translate "Lord Thunder" and probably not "Kalenz", but this can also depend on the specific language/culture); or define a special syntax (e.g. ""Kalenz"" or 'Kalenz'), or a pseudo-comment, to allow game designers to specify non-translation. It's a kind of metadata, anyway.
* Concatenation applies first, like now, meaning before evaluation of $ thingies.
* $(...) defines a snippet protected from concatenation being performed inside it, exactly like "..." (and possibly also from other text operations I do not know of).
* Option: Concatenation applies only between text and to-be-evaluated bits (vars, formulas). This would require a kind of tokenization before string operations, to distinguish texts, to-be-evaluated parts *outside* text (there can be evaluations inside, too), and a kind of yet undefined value bits (mainly ids, files/dirs, numbers, logical values and other codes such as "human"). The latter would not be considered for concatenation. E.g. "foo"+yes+$baz, which is meaningless anyway, would stay as is (except that "foo" would be unquoted).
* Option: formulas can only be stand-alone. Meaning the result must first be assigned to a variable before being used inside a more complex value, e.g. inside a text: x=$(...), and only then {DEBUG_MSG "x: $x"}. This would be much KISSier, I guess, but would break the parallel between vars and formulas.
* Option: inside WML proper, I would happily replace AI formulas by single/simple and stand-alone operations in value fields, like "new_hitpoints=$unit.hitpoints+bonus". Simple operations (only one operator (***)) to avoid precedence issues. I guess this would both be KISSy enough and fulfill user expectations and needs.

(*) WML internal ids are not text; filenames are not text either in this very sense: they are rather kinds of ids used to point to the definition of another game element, exactly like a unit type id.

(**) As a side note, elements' own ids are not attributes in WML, whatever their status on the C++ side, and should not appear as such, while the id of another element can indeed be an attribute value. So, there should be no 'id' key, or synonyms; instead, the id may appear in the start tag, which would also be a great help for clarity, lookup, and code review:

Code:

[unit Kalenz]
	name="Kalenz"
	type=...
	...
[/unit]

Syntax highlighting may even highlight what would then clearly be section ids (game element names).

(***) Unary '-' maybe excepted, as well as 'not' if logical operations are introduced. Both are interpreted as "sticky" (apply to next item only).

The Battle for Wesnoth Forums
