WML semantic structure -- parsing

Post by **spir** » October 7th, 2009, 10:06 am

While designing a personal WML parser (see post to come about "statistics ... autoplays"), I stumbled on an (IMO interesting) aspect of such a language. It may be trivial for you; anyway, if you have hints or pointers on the topic...

The overall WML grammar is extremely simple and easy to parse. But grammatically valid WML code is not necessarily valid WML code, so parsing it from the grammatical point of view is only part of the job.
What I mean is that

Code: Select all

    [event]
        speaker=leader1
        text="Hello, world!"
        ...
    [/event]

is grammatically correct but is not valid WML nonetheless.
So, am I talking about semantics? I once wrote the following and was unable to find the error, even after consulting the matching wiki page several times.

Code: Select all

    [event]
        type=moveto
        ...
    [/event]

This is semantically sensible, I guess, but still not valid WML.
Actually, the point is very similar to natural languages. The set of grammatically correct sentences is infinitely bigger than the set of semantically sensible ones; and the latter is far bigger than the set of sentences that sound right to a native speaker's ear. (As an example, my writing in English may look weird, if not gibberish, to you.) Also think of the famous (and, for foreigners, infamous) English idioms.

WML's specification must include a semantic structure level. At first view, I think this needs to define:

The keyword identifying possible/expected elements (key or tag) in a defined section (type).
Whether each element is itself defined by a simple key=value property, or a sub-section of its own.
Whether it is a unique, optional, or repeated element.
In case of a (sub-)section, the nested definition of this section.

This, for each possible section, starting from the top-level ones. Am I right on this?

As for parsing, I first tried to combine the grammatical and semantic levels, meaning that match funcs must take additional parameter(s) such as a keyword. Then I realised that this leads to a fixed order (parsing grammars do not allow specifying unordered items; match funcs can usually express sequences, not sets) -- which is wrong for WML.

So, I'm thinking of simply adding a semantic validation phase. Now there remains the question of how to specify the expected semantic structure to such a validation func. I thought of

a very complicated tuple parameter sequence that tells all of that.
a mini-language, itself to be parsed.

A typical parameter would look as follows, for the case of an [about] section (simplified and maybe wrong):

Code: Select all

("about", OPTION, SECTION, [("title", UNIQUE, PROPERTY), ("text", OPTION, PROPERTY), ("entry", ONEORMORE, SECTION, [("name", UNIQUE, PROPERTY), ("comment", OPTION, PROPERTY)]), ("end_text", OPTION, PROPERTY)])

I prefer to use a custom format:

Code: Select all

about ? s
	title 1 p
	text ? p
	entry + s
		name 1 p
		comment ? p
	end_text ? p

Much nicer to my eyes, but it needs to be parsed (and anyway has to produce a param structure analogous to the one above). [Note: transforming an indented structure into a delimited one is no issue, I have a custom func that does that -- and the reverse, too]
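
For illustration only, here is a minimal Python sketch of how the indented format could be turned into the nested tuple structure shown earlier. It assumes tab indentation and exactly three fields per line (name, count code, kind code); `parse_schema` and the two lookup tables are hypothetical names, and the constants are plain strings standing in for OPTION, UNIQUE and so on.

```python
# Plain strings stand in for the constants used in the tuple form.
OPTION, UNIQUE, ONEORMORE = "OPTION", "UNIQUE", "ONEORMORE"
SECTION, PROPERTY = "SECTION", "PROPERTY"

COUNTS = {"?": OPTION, "1": UNIQUE, "+": ONEORMORE}
KINDS = {"s": SECTION, "p": PROPERTY}

SAMPLE = (
    "about ? s\n"
    "\ttitle 1 p\n"
    "\ttext ? p\n"
    "\tentry + s\n"
    "\t\tname 1 p\n"
    "\t\tcomment ? p\n"
    "\tend_text ? p\n"
)


def parse_schema(text):
    """Turn tab-indented 'name count kind' lines into nested tuples."""
    root = []             # items found at indentation level 0
    stack = [(-1, root)]  # (indent level, child list) of the open sections
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip("\t"))
        name, count, kind = line.split()
        # Close every open section that sits at this depth or deeper.
        while stack[-1][0] >= indent:
            stack.pop()
        siblings = stack[-1][1]
        if KINDS[kind] == SECTION:
            children = []
            siblings.append((name, COUNTS[count], SECTION, children))
            stack.append((indent, children))
        else:
            siblings.append((name, COUNTS[count], PROPERTY))
    return root


schema = parse_schema(SAMPLE)
```

Here `schema[0]` is the tuple for the about section, and its fourth field is the list of nested item tuples.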

Also, sub-sections may be defined apart, but this makes use much more complicated, probably.

I'm really curious of your comments about all of this.

Post by **Soliton** » October 7th, 2009, 12:15 pm

Check out data/tools/wesnoth/wmlgrammar.py; there is also a wml parser in the same dir.

Post by AI » October 8th, 2009, 7:09 am

Wmlgrammar is unfinished though, in need of an overhaul. When I have some time, I'll start with cloning frogatto's schemata, then add functionality as needed.

Post by **solsword** » October 8th, 2009, 5:46 pm

spir wrote: WML's specification must include a semantic structure level. At first view, I think this needs to define:

The keyword identifying possible/expected elements (key or tag) in a defined section (type).

Whether each element is itself defined by a simple key=value property, or a sub-section of its own.

Whether it is a unique, optional, or repeated element.

In case of a (sub-)section, the nested definition of this section.
This, for each possible section, starting from the top-level ones. Am I right on this?

Correct me if I'm wrong here (and I don't know the codebase, so I may well be) but I don't think that there's any explicit semantic structure specification for Wesnoth. "Which keys are used by the program at run time" certainly forms an implicit semantic specification: i.e. valid semantics are semantics which don't break the game (and/or which don't provide unnecessary tags). But do you really need semantic validation? Without one, errors about missing keys and such are pushed to run-time (i.e. when your statistics bot can't find a key it needs, it will complain), but trying to catch them at parse time seems like a lot of extra work for very little gain (well, I guess it would speed up debugging of certain kinds of mistakes, but with a bit of caution, those mistakes are rare in any case). Also, note that you can't be strict about this, because Wesnoth allows arbitrary extra keys in addition to whatever is used by the engine. As a concrete example, I use the 'translation_note' key in several places, as a simple way to insert a translatable string that isn't used for anything, but that will be visible to anyone translating my campaign. Since it's not a key defined by Wesnoth, it gets ignored by the engine, but when translatable strings are extracted, because I give it a value starting with _", it gets bundled into the translation files (at least, this is what I assume and what I've been told).

As nice as it would be for your script to be assured of valid WML coming in, it's almost certainly less work to use exceptions at runtime to handle invalid WML input than to validate all WML against a standard that doesn't exist explicitly. Of course, if I'm wrong about that explicitness...

Post by AI » October 8th, 2009, 9:55 pm

You can use # po: comments for that.

Anyway, semantic validation has its uses. wmlgrammar is far from finished, but I found quite a bit of dead/malfunctioning WML while writing it.

Post by **spir** » October 10th, 2009, 7:45 pm

Right, I've done it anyway, at least a draft version.
There are two modules, one for grammatical parsing, one for semantic validation. It's rather extensively commented and there are test examples in each file. Instead of once more writing about it, here are the top file doc texts for each.

grammatical parsing:

Code: Select all

''' WML grammatical parser -- object
	
	purpose:
		Use this parser to read documents grammatically structured like WML.
	
		Only the plain syntactic format is parsed here. Morphology may be
		later introduced (esp. for value) if ever I get better specification.
		Also maybe more elements -- don't know exactly what's actually
		defined in WML, need to get better info. Presently, only the basic
		section/attribute syntax is parsed. But is there more in WML?
		
		The actual *content* of the file, reflecting the meaning,
		is not considered at all. In other words, tag names, attribute keys
		and values may be whatever you want. The parser can thus process
		any kind of WML-like document, but it tells nothing about
		what it represents: may be a picture, an email, a table or...
		a scenario config ;-)

		Another phase is semantic validation:
		The validator will actually check that appropriate attributes and
		sub-sections are present in each section, according to a reference
		called semantic schema.
		See the module WML_semantic_validation.
	
	use:
		from WML_grammatical_parser import parser
		tree = parser.parseTree(source_doc)
		print tree
		print tree.treeView()	# nicer!
	
		See tests for concrete example.
		
	output:
		You'll get a really nice parse result tree (I guess ;-).
		The tree's root node is a Section, actually mirroring
		the document's top section.
		In a section, any element definition may be either a simple
		Attribute or a sub-Section. Both sections and attributes have a name.
		An attribute holds a value, a section the list of its elements.
		
		See class documentations and comments for details.
		
		text = parser.parseTree(my_text)
		(title,body,footer) = text.elements
		paragraphs = body.elements
		print paragraphs
		# ==> [paragraph:foo\n, paragraph:bar\n, paragraph:baz\n]
		print paragraphs[1].value
		# ==> bar
		# Section also builds "under the hood" an element dict "register"
		# to give access by name...
		print text.register["title"]		# ==> "an example text"
		# and even by pseudo-attribute through overloading of __getattr__
		print text.title					# ==> "an example text"
		
		You may also have a look at WML_semantic_validation for
		another example of tree/node use and (recursive) walk.
	'''

semantic validation:

So, if anyone is interested, I can post or send the files. Anyway, I'll put it online when it's a bit more polished.

About the purpose of validation, well, what's the purpose of XML validation? I think the job done here has to be done anyway by the engine if there isn't a specific phase for it. Also, the engine can be confident, no additional checking is necessary. But much more can be integrated there, namely post-parse action to build something closer to a semantic tree than a plain parse tree -- but this is another story and IMO belongs to yet another text processing phase.
You may have a look at pijnu.

A further _huge_ advantage of such a separate phase is the ability to let the language evolve freely and easily: as long as the plain, basic, syntactic structure (in fact this is the simple bit) doesn't change, one only needs to change the semantic schema. If the validator further reorganises data, you may even get the same resulting parse tree as before.
With an approach such as the one Wesnoth seems to use, where semantics is maybe implicitly hard-coded in the engine, any slight change may lead to loads of unpredictable consequences. Well, that's my opinion.

Post by **solsword** » October 10th, 2009, 8:44 pm

Awesome.

I guess my previous post was a bit discouraging, but if someone *else* is going to do the work of writing out the syntax requirements, I'm not going to stop them; I'll cheer them on! I just thought that writing out the rules would be a big mess.

However, if it works, a semantic validator would be super-useful. Who else here has ever written:

Code: Select all

    [clear_variable]
        variable=not_going_to_be_cleared_no_sir-ee
    [/clear_variable]

This should be mainlined as a useful debugging tool.

Post by **spir** » October 11th, 2009, 3:02 pm

Holà,

I have stepped to the next stage, namely the study of a "meta-language" to express WML's semantic schema -- but this time "meta-using" WML itself. Below a summary of my thoughts on the topic.

thanks for reading, denis

WML semantic schema language

Study to define a language able to clearly specify a semantic schema for documents formatted according to a WML-like syntax.

~ introduction ~

Semantic validation is needed because the very general syntax used in WML gives a document only its basic structure. This means that a document can be grammatically correct whatever its actual content and meaning, if any.
A semantic schema will further define what can be found where and how. Precisely, the possible content of each possible section is specified. This mainly means describing the use of a precise vocabulary of tags, keys, values. Obviously, this schema is in direct relationship with the sense carried by documents, the reason why it is called semantic schema.

A semantic schema document is itself a text written using a grammar such as the one studied here. The document is parsed into an object (parse tree or convenient data structure) representing the schema. This object allows creating a specific validator able to check documents of this kind, id est which are supposed to comply with the schema. The validator can then be fed with parse trees of such documents, and return validation outcome or report.
Another point of view is that the validator checks the expectations of the program that will further use the data extracted from a document: in the case of BfW, the game engine. From a human standpoint instead, a document that passes the semantic test is sensible, meaningful.

WML-like syntax, as said above, mainly means that documents are structured in sections having a name (tag name in WML) and a body which itself is a collection of nested definitions (I will call them "items"); items can themselves be either attributes (assignments of parameters) or sub-sections.
But the actual structure can well have another form than WML's tag-based one; it can well use delimiter tokens (eg "{...}" in C), or indentation (like python). What a semantic validator actually processes is an abstract representation of the source doc, which should be identical in any case.
As a consequence, when grammar and semantic levels are clearly separated, and both processed before further use of the data, it is well possible
* to modify the syntax without affecting the semantics (only the purely grammatical parser has to comply)
* to change, extend, or restrict the semantics without any change to the syntax (only the semantic validator must cope with the modification, meaning developers only need to edit the schema document)

Note that semantic schemas allow describing any language / type of document using this kind of syntax. This could well be a format for vector drawings, email posts, or styled texts. We will indeed mainly concentrate on proper WML as used to describe BfW games.

The semantic schema language will itself use a WML-like syntax.

~~~~~~~~~~~~~~~~~~~~~~~

~ base schema ~

A WML semantic schema is mainly a list of section schemas, each including individual elements. For many document types, this would be enough. Well, I have defined a base format for these simple and most common cases. Here is an example to introduce it concretely.
Imagine a game in which units are defined as follows:

example unit definition:

The matching schema for unit sections would look like this:

unit definition schema:

I guess you can more or less understand how this works at first sight. Some comment, anyway:
* Sections each have their own separate schema. Nesting would mirror a document structure, but would soon build rather complicated schemas. On the other hand, separate schemas imply constant navigation in the schema document itself.
* Sections appear twice: once in their own schema, once referenced as item of the super-section -- except for the top section.
* Section schema titles are the section names, instead of, for instance, [section] with a name attribute. This is intended to make them easier to find. On the other hand, there can only be section schemas at the top level of a schema document.
* Attributes are not treated equally, in that they do not have their own separate schema. The reason is they cannot be further nested, so that the definition ends there. In other words, attributes are terminal document elements, meaning leaves in the parse tree, while branches are sections. (And the root node of the parse tree represents the top section.)
* Attribute schemas have an additional attribute, namely "value". May be used or not.
* There may be more additional attributes for attribute or section schemas, so as to perform other kinds of checks; or to set additional attributes (flags, counts, computations); or even to launch actions defined elsewhere (external funcs).
* Items can be repeated or optional. Both are common in BfW.

When the validator walks a document parse tree, it first steps on a top section. There, it checks that all required (non optional) items are here as expected. Then, for each item, it will perform whatever controls are specified, such as about an attribute value.
When a section holds sub-sections, the validator will recursively check the said sub-sections.
As result, a validator may basically return a success/failure outcome; and stop at first error. It may also go on checking anyway. It may write and/or output a validation report at each level and step, or only on issues, or whatever.
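
The walk described above could be sketched roughly as follows in Python. Everything in it is an assumption made for illustration: the `Section` and `Attribute` tuples merely stand in for the parser's node classes, the `unit` and `attack` schemas are invented, the count codes reuse '1', '?' and '+' from the earlier post, and items not listed in a schema are simply ignored. This version collects all errors instead of stopping at the first one.

```python
from collections import Counter, namedtuple

# Hypothetical stand-ins for the parse tree nodes.
Attribute = namedtuple("Attribute", "name value")
Section = namedtuple("Section", "name elements")

# One flat schema per section name: item name -> (count code, kind).
# Count codes: "1" exactly once, "?" optional, "+" one or more.
SCHEMAS = {
    "unit": {"name": ("1", "attribute"),
             "hitpoints": ("1", "attribute"),
             "attack": ("+", "section")},
    "attack": {"name": ("1", "attribute"),
               "damage": ("1", "attribute"),
               "range": ("?", "attribute")},
}


def validate(section, schemas, path=""):
    """Return a list of error messages for `section` and its sub-sections."""
    here = f"{path}[{section.name}]"
    schema = schemas.get(section.name)
    if schema is None:
        return [f"{here}: no schema for this section"]
    errors = []
    counts = Counter(item.name for item in section.elements)
    # First check that required items are present and unique ones not repeated.
    for name, (count, kind) in schema.items():
        found = counts[name]
        if count in ("1", "+") and found == 0:
            errors.append(f"{here}: missing required {kind} '{name}'")
        if count in ("1", "?") and found > 1:
            errors.append(f"{here}: '{name}' appears {found} times")
    # Then recurse into the sub-sections that the schema knows about.
    for item in section.elements:
        if isinstance(item, Section) and item.name in schema:
            errors.extend(validate(item, schemas, here))
    return errors
```

An empty list means the section passed; a stricter validator could return at the first error or write a report at each level instead.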

~~~~~~~~~~~~~~~~~~~~~

There are numerous choices and issues when introducing further features into the schema language. Here is a kind of catalog.

-1- contextual section names

It often happens that distinct section types have the same name, while appearing inside different super-sections and having both different content and meaning. This is indeed a language flaw (for the user). Think of [unit] for a rather extreme example.
But we can jump on this opportunity to introduce namespaces in the schema language, so as to contextualise section names, eg [scenario.side.unit].

When stepping on a [unit] section inside a side, itself indeed inside a [scenario], the validator will first look for a [scenario.side.unit] section schema. If not found, it will then look for [unit]. Is it clear?
A major advantage of this is that when "parsing" the set of section schemas we immediately know where to look for their context in the schema itself, and where each matching actual section takes place in a document.

On the other hand, contextual names must not be required, for there may well be identical sections, both in content and meaning, appearing in various super-sections. This is another big advantage of separate section schemas: identical section types (both in content & meaning) need to be defined only once.
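
As a rough sketch of that lookup order (contextual name first, bare name as fallback), assuming section schemas are stored in a dict keyed by name; `find_schema` and the placeholder schema values are hypothetical:

```python
# Placeholder schemas: the values would be real section schemas.
SCHEMAS = {
    "scenario.side.unit": "schema for a unit placed on a side",
    "unit": "schema used for any other [unit]",
}


def find_schema(path, schemas):
    """Look up the schema for the section at `path` (outermost tag first).

    Returns (schema name, schema), or (None, None) when nothing matches.
    """
    contextual = ".".join(path)
    if contextual in schemas:
        return contextual, schemas[contextual]
    bare = path[-1]
    if bare in schemas:
        return bare, schemas[bare]
    return None, None
```

With this, `find_schema(["scenario", "side", "unit"], SCHEMAS)` picks the contextual schema, while a [unit] met anywhere else falls back to the plain "unit" entry.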

-2- free items

A language may allow undefined items to appear in sections. This is the case of WML, in which custom attributes are simply stored, then ignored by the game engine, but further accessible in WML code; extremely handy, sure! In other cases, it happens that the language cannot predefine what kinds of items are to be expected.
So, as a default, the validator would not care about additional items, meaning not check that all items names are listed in the schema. There may be a config parameter to set this behaviour.
For WML code, ignoring undefined items may be a "primitive" rough solution for issues such as variable setting (a whole bunch of operations can be performed) in [variable], or for standard filters (loads of possible criteria) inside various sections. But being able to specify a choice is indeed better.

-3- choice

A simple solution to cope with choices, meaning one of several kinds of items may appear in a defined context, is to have a [choice] section refer to a separate schema, like for section schemas:

choice schema:

Comments:

"operation" is a name chosen by the author of the schema document to cope with the present issue, it does not belong to WML.
The whole choice may be optional or repeatable; but this is defined at the super-section level, where the choice is referenced.
Choices may contain both attributes and sections, even a mix of them.

An issue is that choices appear in the global schema at the same level as section schemas. It's rather confusing, not to mention name conflicts. This may be addressed by an additional level of structure for schema documents -- see below.

-4- sequence

WML for BfW game scripting does not hold sequences AFAIK, meaning cases where the order of items is meaningful, for instance the structure of a wiki article -- or of a game map.

We may introduce a [sequence] tag, similar to choice, to cope with such cases. But I guess it would be nice to let unorder (*) be the default, so that we do not need an additional [unorder] tag for that. Even better, there may be a config parameter for the validator to say whether order or unorder is the default; while the default value for this parameter itself would be unorder.

(*) We cannot use the term "set" here, as opposed to "sequence", for a set (in the technical sense of the word) is supposed to be a collection of all different items, while here there can be repetitions. I don't know actually how to call what a WML section holds; I know of no name for such collections in maths or in the programming field. Please tell me if you know one (or one good candidate) -- I need it for another topic. I think at using "register".

-5- top section / root node

It would be really helpful in numerous cases to tell the validator what a top section, actually the root node of the parse tree, should be. There are several reasons for that:
* Simply check it like other sections. (eg check the top section is [campaign])
* Allow having multiple top sections of the same kind (eg several [unit] type defs in the same doc, or if we simply want to validate several unit type defs at once). In this case the parse result is actually a "forest" (this is actually the technical word, see eg http://en.wikipedia.org/wiki/Forest_(graph_theory)).
* Allow several top level section types (eg [scenario] vs [multiplayer]).

But an even handier feature for the developer is the ability to tell the validator which sections are to be considered as top-level ones, to search for them in the parse tree, and to start validating there only. It does not seem a difficult feature to implement; but what a help! This allows writing schemas for section types independently, then trying them by simply feeding the validator with any config file...
I'm thinking of a section such as:

indication of root node(s):

If no search is required, indeed the validator considers the actual root node as a top-section.
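
A possible sketch of that search, again assuming hypothetical `Section`/`Attribute` stand-ins for the parse tree nodes; `find_roots` is an invented name, and stopping the search inside a match is a design assumption (the validator would recurse into it anyway):

```python
from collections import namedtuple

# Hypothetical stand-ins for the parse tree nodes.
Attribute = namedtuple("Attribute", "name value")
Section = namedtuple("Section", "name elements")


def find_roots(section, root_names):
    """Yield the outermost sections named in `root_names`, in document order."""
    if section.name in root_names:
        yield section
        return  # do not search inside a root; validation recurses from there
    for element in section.elements:
        if isinstance(element, Section):
            yield from find_roots(element, root_names)


# Toy tree for illustration only, not a real config layout.
doc = Section("game", [
    Attribute("id", "demo"),
    Section("scenario", [Section("side", [Section("unit", [])])]),
    Section("unit", [Attribute("id", "spare")]),
])
roots = list(find_roots(doc, {"unit"}))
```

Each section in `roots` can then be handed to the validator as if it were a top section.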

-6- validator config

There may be config parameters to drive the validator, such as:

behaviour when error found
verbosity of report
refuse/check/inform about free items
perform some optional checks or not
perform additional actions or not

-7- schema superstructure

An additional layer of structure for the semantic schemas would provide higher clarity when several of the above features are used. For instance:

schema super-structure:

(This would also remove the need for a "type" attribute (=section/choice/sequence) inside schemas.)

Well, I have no example illustrating all of this yet; but tried to be as clear as possible.
Comments?

[edited details -- oct 12]

Post by **spir** » October 11th, 2009, 4:26 pm

Hello again,

I realize now that when replying to questions about the purpose of semantic schema & validation, I simply forgot to mention what's probably the most important: a reference.

I guess WML cruelly lacks any kind of reference manual or more technical specification to serve as a "bible" for users, designers, developers and all such weird people... who sometimes would like to know how what they try to play with actually works (I mean here WML for game design), and how they could manage to play better.
This is at least as important as feeding the game engine with reliable, preprocessed, meaningful, data, so as to allow it concentrating on its real job(s).

In other words, I think helping the machine is great, helping humans has no price.

It's not astonishing that many wiki pages on WML are so messy, or even terrible (IMO). There is no reference to help the authors do a better job, or to serve as an incentive for others to improve our documentation. And it's not astonishing there is no reference guide, for there is no specification. Writing a reference is a job nobody wants to do, probably, because in the absence of a specification, or more precisely a semantic schema, it's a huge, endless, painful job for such a machine as WML.
In the absence of such a document or a semantic validation phase, the only reference is the game engine, meaning that both the sense of a tag and how it should look are only determined by the way the engine copes with the data the tag generates.
I would enjoy taking part in this myself -- while waiting for the day when it's possible, I'll go on playing on this semantic track.

Post by **spir** » October 11th, 2009, 10:01 pm

Holà again,

My brain is exploding these days

The previous post has let me further think about this question of lack of real reference for WML. And the result is:

let's generate automatically draft semantic schemas from typical code
and
let's generate reference guide templates from semantic schemas

-1- semantic schema

The main issue for it (probably the main reason why we have no reference yet) is that the amount of unpleasant work is huge. But (I guess) most of the information we need is implicitly present in actual config files. If we parse a random [side] section and generate a schema from that, it will be wrong, mainly incomplete, but still a relevant first step. At first sight, the parser I already have only needs a few improvements and add-ons to serve as a good base. And the text generation would be quite mechanical, if not simple reformatting.

Now, instead of using data from real games, we could write (or edit existing ones) a typical config file, or rather a false one, according to such criteria:
* Insert all possible items in sections -- including ones that should be a choice (eg value/add/sub... inside [variable]).
* Append a matching code ('?' & '+') to optional or repeatable tag and attribute names.
* Take the opportunity to add some information about attribute values, such as a type or default.

From this, we would generate a much more accurate and complete semantic schema, what do you think? (See previous posts for details about semantic schemas.)
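
To make the idea concrete, here is a small sketch of how such an annotated pseudo-config, once parsed, could be turned into draft section schemas. The `Section`/`Attribute` tuples are hypothetical stand-ins for the parse tree nodes, the item names are invented, and the output reuses the '1', '?', '+' and 's'/'p' codes from the custom format above. Choices are not handled.

```python
from collections import namedtuple

# Hypothetical stand-ins for the parse tree nodes.
Attribute = namedtuple("Attribute", "name value")
Section = namedtuple("Section", "name elements")


def split_code(raw_name):
    """Split 'gold?' into ('gold', '?'); unmarked names count as unique ('1')."""
    if raw_name.endswith(("?", "+")):
        return raw_name[:-1], raw_name[-1]
    return raw_name, "1"


def draft_schemas(section, schemas=None):
    """Collect one flat draft schema per section name found under `section`."""
    if schemas is None:
        schemas = {}
    section_name, _ = split_code(section.name)
    items = schemas.setdefault(section_name, {})
    for element in section.elements:
        name, code = split_code(element.name)
        kind = "s" if isinstance(element, Section) else "p"
        items[name] = (code, kind)
        if kind == "s":
            draft_schemas(element, schemas)
    return schemas


# Invented pseudo-config: 'gold' is optional, 'unit' is repeatable.
pseudo = Section("army", [
    Attribute("leader", "..."),
    Attribute("gold?", "..."),
    Section("unit+", [Attribute("type", "...")]),
])
drafts = draft_schemas(pseudo)
```

The resulting dict has one entry per section type, which matches the "separate schema per section" layout discussed earlier and could be written out as a schema document for manual review.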
There would probably be remaining issues, but then we would have all the base material available, which is much nicer to start working. The main issue is indeed the question of choices:
* Either we find a trick to insert hints about that in the source doc, without complicating too much further parsing and processing.
* Or we do it manually by copy-pasting the relevant bits of the schema inside appropriate choice schemas.

Moreover, with a bit more work (and imagination), we can really insert valuable meta-data in the pseudo source docs, rather to be used in the reference guide. I'm thinking, eg for values, of valid ranges, lists of possible values, such stuff... The issue is that the parser must then parse a more complex doc, ignore the additional stuff but store it somehow for further use at the text generation phase.

-2- reference guide

We could certainly, using a similar process, generate draft reference templates, but certainly it would be better to start from reviewed, complete, correct semantic schemas.
The idea as I see it now is to produce standard (and well-thought-out) forms with the basic data already inserted. Then would remain the job of filling in all appropriate informative fields, both "pedagogical" (thinking of the target audience) and technical. Things such as the meaning of the attr/tag, purpose, context, actual use, example(s)...
As for parsing the semantic schemas, that's precisely the first job of a semantic validator, indeed, and I already have a first draft version.

I imagine that if we provide the community with such accurate and pre-filled templates, many would feel like contributing to the effort.

[I can cope with the coding part, maybe with some help by people familiar both with python and text processing.
I can also take part in the first effort of writing pseudo-configs, but my knowledge of WML is rather limited, so this would be to deliver first versions for "experts" to review...]

There are probably other problems I'm not aware of now; still, I'm keen to go for this adventure

Post by **solsword** » October 11th, 2009, 10:55 pm

...the problem that I see with such a 'canonical' file is that it requires additional maintenance overhead (i.e. someone has to run your parser every time there's an update, and then new templates have to be made, etc.), and it's still distant from the code (i.e. the people who will be updating it won't necessarily be the author(s) of the code change that necessitated the update).

What I'd like to see is something that parses code and comments and generates documentation, and then a requirement that when the code is changed, the coder(s) change the comments to match. I think that that's the best way to keep the documentation up-to-date and accurate.

/me starts working on a script to generate minimal documentation on WML functions from the code.

Post by AI » October 11th, 2009, 11:36 pm

'The code' as in the C++? All you can do there is get the ActionWML and most of its attributes.
Generating schemas from existing WML is a bad idea, as there was invalid WML in core and there probably still is.

On the other hand, if we do write nicely formatted schemata (you should really take a look at frogatto's version, it's missing some features, but it's a good place to start), that would be the perfect place to put extra information to automatically generate documentation from.

Post by **solsword** » October 12th, 2009, 1:14 am

Well... don't know if this will go anywhere or not, but here's a partial proof-of-concept:

Code: Select all

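#!/usr/bin/env python

import re

# Path to a local checkout of the Wesnoth source.
SOURCE = "/home/pmawhorter/programming/wesnoth/wesnoth-trunk/src/game_events.cpp"

with open(SOURCE) as fin:
    data = fin.read()

# Matches "WML_HANDLER_FUNCTION(tag, ...)" followed by a function body that
# ends at the first closing brace in column 0; captures the tag and the body.
tag_func_re = re.compile(
    r"WML_HANDLER_FUNCTION\(([^\n,]+),[^\n]*\)\n\W*\{(.*?)^}$",
    re.DOTALL | re.MULTILINE)
# Matches attribute reads of the form cfg["key"] and captures the key.
used_re = re.compile(r'cfg\["(.*?)"\]')

wml_tags = tag_func_re.findall(data)

for tname, fbody in wml_tags:
    usedvars = used_re.findall(fbody)
    print('[' + tname + ']')
    # set() drops duplicate keys; the listing order is arbitrary
    for v in set(usedvars):
        print("    " + v)
    print('[/' + tname + ']')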

with output that looks like this:

Code: Select all

[foo]
    y
    x
    logger
    message
[/foo]
[lua]
[/lua]
[remove_shroud]
    clear_shroud
    ignore_passability
    animate
[/remove_shroud]
[unpetrify]
    blue
    repeat
    name
    recruit
    green
    type
    side
    red
[/unpetrify]
[delay]
    time
[/delay]
[scroll]
    y
    x
[/scroll]
[scroll_to]
    check_fogged
[/scroll_to]
[scroll_to_unit]
    variable
    turn
    amount
    side
    check_fogged
[/scroll_to_unit]
[modify_ai]
    user_team_name
    hidden
    name
    gold
    shroud_data
    add
    recruit
    value
    share_view
    current
    shroud
    controller
    fog
    variable
    village_gold
    income
    switch_ai
    team_name
    share_maps
    side
[/modify_ai]
[move_unit_fake]
    rand
    modulo
    divide
    to_variable
    image
    random
    team_name
    visible_in_fog
    sub
    ipart
    note
    add
    role
    defeat_string
    type
    round
    string_length
    format
    variation
    multiply
    fpart
    halo
    silent
    victory_string
    name
    gender
    value
    summary
    mode
    time
    y
    x
    side
[/move_unit_fake]
[sound_source]
[/sound_source]
[remove_sound_source]
    id
[/remove_sound_source]
[terrain]
    layer
    replace_if_failed
    terrain
[/terrain]
[terrain_mask]
    to_variable
    mask
    y
    x
    animate
    border
[/terrain_mask]
[recall]
    cannot_use_message
    description
    show
    image
    silent
    duration
    id
    name
[/recall]
[print]
    blue
    fire_event
    name
    variable
    text
    image
    description
    needs_select
    y
    green
    duration
    x
    animate
    id
    red
    size
[/print]
[store_unit_type]
    variable
    type
[/store_unit_type]
[store_unit]
    blue
    advance
    find_vacant
    name
    text
    kill
    mode
    variable
    green
    owner_side
    side
    red
[/store_unit]
[endlevel]
    next_scenario
    carryover_percentage
    end_text
    bonus
    carryover_add
    carryover_report
    music
    result
    linger_mode
    save
    end_text_duration
[/endlevel]
[redraw]
    side
[/redraw]
[animate_unit]
[/animate_unit]
[label]
[/label]
[heal_unit]
    animate
    amount
[/heal_unit]
[command]
[/command]
[allow_undo]
[/allow_undo]
[if]
[/if]
[while]
[/while]
[switch]
    variable
[/switch]
[message]
    sound
    map
    delayed_variable_substitution
    remove
    shrink
    speaker
    side_for
    message
    id
    expand
[/message]

(Yes I know that there are some errors in this output; the script does need to be more sophisticated)

A slightly more sophisticated script, run on all of the source .cpp files, ought to be able to produce a list of all possible attributes for each tag, along with the list of tag types, which is a starting place for documentation. The advantage of this over a separate schema is that it's never out-of-sync with the code. Of course, to get real documentation, you'd need some way of adding comments to the source that the script can extract that say something about the various variables (and perhaps indicate whether they're optional, etc.). I'm imagining something like the frogatto stuff that you posted, but included as comments in the source code, to be parsed out later.

I guess it's ultimately debatable whether the schema should be in a separate file (nice separation and clarity) or within the source code (don't have to duplicate structure, plus potentially less likely to be forgotten when the code is changed). I just wanted to point out that it wouldn't be that difficult to extract documentation from the source, given a little help in terms of well-formatted comments. (Yes, I realize that there are issues that need to be dealt with: macros could cause problems, the regex that I'm using is probably missing some stuff, and tuning it to be 100% accurate would be some work; ultimately it's difficult, but not *that* difficult.)
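To make the extraction idea concrete, here is a minimal Python sketch of that kind of scan. It is not the script that produced the listing above. It assumes that handlers are introduced by `WML_HANDLER_FUNCTION(tag, ...)` and that attributes are read with a `["key"]` lookup; both patterns, and the name `extract_tags`, are assumptions made for illustration.

```python
import re

# Assumed shape of a handler definition: WML_HANDLER_FUNCTION(tag, ...)
HANDLER_RE = re.compile(r'WML_HANDLER_FUNCTION\s*\(\s*(\w+)')
# Assumed shape of an attribute lookup: something["key"]
ATTR_RE = re.compile(r'\[\s*"(\w+)"\s*\]')


def extract_tags(source):
    """Map each handler's tag name to the attribute keys found after it.

    Everything between one handler and the next is credited to the first
    handler, so keys read by unrelated code in between are misattributed.
    """
    tags = {}
    matches = list(HANDLER_RE.finditer(source))
    for i, match in enumerate(matches):
        # The slice ends where the next handler starts (or at end of file).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(source)
        keys = tags.setdefault(match.group(1), [])
        for key in ATTR_RE.findall(source[match.end():end]):
            if key not in keys:
                keys.append(key)
    return tags


if __name__ == "__main__":
    sample = '''
    WML_HANDLER_FUNCTION(heal_unit, event_info, cfg)
    {
        bool animate = cfg["animate"];
        int amount = cfg["amount"];
    }
    WML_HANDLER_FUNCTION(allow_undo, event_info, cfg)
    {
    }
    '''
    for tag, keys in extract_tags(sample).items():
        print("[%s]" % tag)
        for key in keys:
            print("    " + key)
        print("[/%s]" % tag)
```

A scan this naive cannot tell a handler body from helper code that follows it, which is one plausible source of stray attributes in a listing like the one above.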

spir · Post by **spir** » October 12th, 2009, 11:31 am

Waow! I'm surprised this can be done so simply. Let me check whether I really understand what your script does:
It looks for funcs whose name starts with WML_HANDLER_FUNCTION, then reads & outputs the rest of the funcname and the list of args. Correct?
Then we have a list of tags (in the sense of types of sections), each with parameters. This means that there is a kind of layer on the C++ side between parsed WML and the engine. Now, I would enjoy knowing what this layer actually performs (have a look in the source, maybe); but I can imagine it may well be kind of similar to semantic validation... if the parser proper only deals with grammatical structure.

Now, I haven't checked precisely, but how do you filter out parameters that are not WML attributes? For instance, IIRC [store_unit] only defines filter, variable, kill and a weird "mode", while here there are numerous other parameters.
Also, this does not seem able to tell us about the semantic structure, I mean which sub-sections go where. I don't even see any mention of sub-sections at all, only attributes and additional params. E.g. [message] has no [option].

Well, there may be ways I cannot figure out right now to enhance this first result.
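For illustration, a schema that does record sub-sections could look like the Python sketch below. Each section type lists its required and optional keys and, for every allowed sub-section, a nested schema with minimum and maximum counts. All names here (`SectionSchema`, `Node`, `validate`) are hypothetical, and the [message]/[option] entries are simplified examples, not the real WML definition.

```python
from dataclasses import dataclass, field


@dataclass
class SectionSchema:
    """Hypothetical schema entry for one section type."""
    required_keys: set = field(default_factory=set)
    optional_keys: set = field(default_factory=set)
    # child tag -> (schema, min count, max count or None for unbounded)
    children: dict = field(default_factory=dict)


@dataclass
class Node:
    """A parsed section: tag name, key=value attributes, nested sections."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def validate(node, schema, path=""):
    """Return a list of problems; an empty list means the node fits."""
    here = "%s[%s]" % (path, node.tag)
    problems = []
    present = set(node.attrs)
    for key in sorted(schema.required_keys - present):
        problems.append("%s: missing key '%s'" % (here, key))
    for key in sorted(present - schema.required_keys - schema.optional_keys):
        problems.append("%s: unknown key '%s'" % (here, key))
    counts = {}
    for child in node.children:  # order of children is deliberately ignored
        counts[child.tag] = counts.get(child.tag, 0) + 1
        if child.tag not in schema.children:
            problems.append("%s: unexpected sub-section [%s]" % (here, child.tag))
        else:
            problems.extend(validate(child, schema.children[child.tag][0], here))
    for tag, (_, low, high) in schema.children.items():
        n = counts.get(tag, 0)
        if n < low or (high is not None and n > high):
            problems.append("%s: [%s] appears %d times" % (here, tag, n))
    return problems


# Simplified, illustrative entries only (not the actual WML specification).
option_schema = SectionSchema(optional_keys={"message"})
message_schema = SectionSchema(
    optional_keys={"speaker", "message"},
    children={"option": (option_schema, 0, None)},
)
```

Because keys and sub-sections are checked as sets and counts rather than as a sequence, the elements of a section may appear in any order.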

Upon pseudo config files used to generate schemas and/or reference doc:

solsword wrote: ...the problem that I see with such a 'canonical' file is that it requires additional maintenance overhead

Well, sure! But the overhead is the same wherever and by whomever it is done (correct? my english is unstable). What do you think? To get the identical result in the final doc, we need the same raw material in source, be it C++ or WML. The only difference is that, according to your proposal, the dev team must do the job, while if we do it from WML the whole community of UMC designers can take part in the effort; not only this, but they may be more motivated because they are precisely the first ones who need it. I give you less than 1% CtH on this

(and much more CtbH, lol!)
What I imagine is (1) we will not put the same degree of requirements (about quantity and quality of metadata) on the backs of devs (2) in the best case, we will have to rework all of the output... manually.
I will never spit on developers, rather the contrary. It's simply not their job. I would rather let developers do their job and simply inform us of changes. We can have a regular contact in the dev team to make things smoother. They *must* inform us anyway, they do already, otherwise how do we know about game & WML changes?

Still, I may be wrong. (Also, sorry for demotivating, I'd rather support your efforts; but in this case, I simply cannot for it would be false.)

Now, I imagine that this could be a relevant additional source of information, or a checklist to avoid omitting attributes (if we find a way to sort out false ones), or more. But I find it unrealistic to ask for and rely on metadata in the source (coders usually don't do it for their peers -- not even for themselves).

But your idea to insert metadata as comments is great! I thought of doing it as pseudo-values, but your way is much easier and cleaner.
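As a rough idea of what metadata-as-comments could mean in practice, here is a small Python sketch. The annotation format `// @wml tag.key optional|required: description` is invented purely for this example, as are the function name `extract_docs` and the sample descriptions; nothing like it is claimed to exist in the actual source.

```python
import re

# Hypothetical annotation format, invented for this sketch:
#   // @wml <tag>.<key> <optional|required>: <description>
DOC_RE = re.compile(
    r'//\s*@wml\s+(\w+)\.(\w+)\s+(optional|required)\s*:\s*(.*)'
)


def extract_docs(source):
    """Collect annotated attributes as {tag: {key: (required, description)}}."""
    docs = {}
    for line in source.splitlines():
        match = DOC_RE.search(line)
        if match:
            tag, key, kind, text = match.groups()
            docs.setdefault(tag, {})[key] = (kind == "required", text.strip())
    return docs


if __name__ == "__main__":
    # Made-up sample input; the descriptions are placeholders.
    sample = '''
    // @wml heal_unit.amount optional: example description
    int amount = cfg["amount"];
    // an ordinary comment, ignored by the extractor
    '''
    print(extract_docs(sample))
```

The same table could then feed both a schema and a reference-guide template, whichever side ends up maintaining the annotations.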

AI wrote: (you should really take a look at frogatto's version, it's missing some features, but it's a good place to start)

I had a look, thank you very much! Well, that's what I imagine in a very simplified form. (Frogatto's WML is a hundred times smaller than BfW's and seems to have a single level of complexity. Not to mention choices, name conflicts & namespaces, separate files & macros, and such niceties.)
But this can give a good idea of a kind of prototype result for the semantic schema, and also of "homomorphisms" between schema and user reference. That is what I have in mind.

Well, I'm gonna work on this on my side today and tell you about results, if any. In the best case, I may have a proof of concept similar to solsword's.

SP @ solsword: had a look at your website and find it really pleasant and interesting. I like your sculptures, esp. in copper.

Post by AI » October 12th, 2009, 2:22 pm

spir wrote: It looks for funcs whose name starts with WML_HANDLER_FUNCTION, then reads & outputs the rest of the funcname and the list of args. Correct?
Then we have a list of tags (in the sense of types of sections), each with parameters. This means that there is a kind of layer on the C++ side between parsed WML and the engine. Now, I would enjoy knowing what this layer actually performs (have a look in the source, maybe); but I can imagine it may well be kind of similar to semantic validation... if the parser proper only deals with grammatical structure.

WML_HANDLER_FUNCTION *only* works for ActionWML, and it does not even cover all of the attributes/elements involved.

There is not one place where WML is parsed. It is simply loaded into a large structure that is interpreted when needed. Figuring out WML specification from that is akin to solving the Halting Problem.

So, while that script works for the basic semantics of ActionWML, that's also the only thing that can reasonably be parsed.
