A Flawless[1] Fix for the current broken preprocessor

Invisible Philosopher · July 12th, 2005, 4:07 pm

The current preprocessor is very unreliable. With the previous, old one, at least I knew about what it could do, though it was not very flexible. With the current one, there are many situations in which it fails for no reason I can figure out (My preprocessor implementation finds no problem there). Here is a list of some situations where I know it may fail:

Things in comments (when they contain macros?)
macros within macro names, textdomain names, and file inclusions
treating macro arguments inconsistently to ordinary macros -- for example they require parentheses if passed to another macro
#textdomain appears to be completely ignored inside macros
Using Elvish Pillager's pseudorandom seemed to give errors with the current preprocessor (but mine found no errors, and I do not believe there were any, based on how the preprocessor should obviously work).

Also, though these are not precisely bugs,

not allowing the definition of definitions
really textually long inclusion chains in error messages, which are hard to follow
a lot of the time, giving no error message for errors, or if I'm lucky I'll find an obscure hint to what's wrong somewhere in the flood of messages generated by --log-info.

My preprocessor implementation has months of thought behind it and solves all the problems listed above. It has found real bugs in both my campaigns, Wesvoid and the Campaign Template, one of which would have taken me much longer to find otherwise, and the other of which I would likely never have found at all, leaving a subtle and mysterious bug forever in Wesvoid. That was where I included {./Ritual/mage_event_ai} where I should have included {./Ritual/mage_event_ai.cfg}. The code only tried to manipulate the AI, and had no directly observable effect. The Campaign Template is currently quite inconvenient, forcing the user to change the campaign's name in many places, but my preprocessor allows the user of the Campaign Template to only change two things: the name of the directory, and the thing that Campaign_Name is defined to.

My implementation is also well-optimized (at least compared to how it was before I implemented some optimizations!), preprocessing the basic data/game.cfg and what it includes fairly quickly:
Optimization: GCC -O2
Computer: 350 MHz
Time: 2-3 seconds.
I suspect much of the time is taken going through the very large number of unit files. When preprocessing a huge file that doesn't contain stuff the preprocessor cares about other than quotes, it takes next to no time. It takes my computer about 1 second to additionally preprocess all the Wesvoid campaign stuff (inside the #ifdef) (which uses the preprocessor extensively). Merely preprocessing a campaign.cfg file (where the campaign is not defined) takes very little time.

As an extensive user of the Wesnoth preprocessor, I strongly wish that my preprocessor implementation be accepted into Wesnoth, because I know it works (and if it just so happens to contain a bug, I will certainly try to fix it!). I have put my considerable C++ design skills into this. There are a few less than fortunate aspects, though:

I use boost::shared_ptr extensively, although I have taken care to avoid using any other Boost libraries. The boost::shared_ptr implementation is header-only, so it should not be difficult to start using.
My implementation contains a considerably larger amount of code than either of the previous ones. This is, I believe, an artifact of the good C++ design that must be put into what is essentially a programming language implementation (the Wesnoth preprocessor is (an important part of) the programming language of Wesnoth). C++ is a wonderful language, but it does not lend itself well to designs that are both incredibly concise and not being hacky to match. I had tried to implement in one source-file a preprocessor implementation with features like the one I offer now and eventually found it too ill-designed to work with. The lessons I learned from that failed attempt were applied to this one, making it much more flexible (including much more possible to optimize without compromising the design).

I have also found a few deficiencies in other parts of the Wesnoth code that interfere with this implementation:

Wesnoth's exception hierarchy is not organized as one in a good C++ program should be. In particular, I have inherited a ODR[2]-violating definition of config::error from the previous preprocessor implementation. IMO, something should be done to make the declaration of a config exception not so costly to include, it should follow the standard exception hierarchy like any exception that has a message, and the derivation should go this way:
std::exception > std::runtime_error > config_error /*whatever it gets called*/ > preprocess::error > preprocess::partial_error. My current workaround throws an exception derived from both config::error and preprocess::partial_error (std::runtime_error > preprocess::error > preprocess::partial_error) to allow the catching of either one when there is an error in the input.
Exception safety: the usage of scoped_istream does not make returning a raw pointer safe; std::auto_ptr<std::istream> is the correct return type for preprocess_file(). Here is the relevant part of a CVS log message explaining the introduction of scoped_istream:
"Also introduce a scoped_istream to prevent leaking descriptors in case of exceptions." (silene)
Did you think of std::auto_ptr? It is designed for returning dynamically allocated objects that the caller destroys, among other things. I consider a function that new's or deletes parameters or return values unevenly without very clearly saying so a memory leak: if not now, then sometime later. For example the old preprocessor code was NOT exception-safe either -- if an exception was thrown from the middle of it (and that's always a possibility in C++), the std::stringstream would leak, having been new'd and not deleted. That is also why to use std::auto_ptr for such things. Is there some reason I am not aware of that Wesnoth does not use it, or is it just an instance of someone not realizing the possibility? My implementation uses std::auto_ptr<std::istream> and my patch modifies filesystem.hpp replacing class scoped_istream with std::auto_ptr<std::istream>, and the few other things that go along with it. Testing shows that Wesnoth works just fine with that change. I would suggest to actually textually replace all instances of scoped_* with std::auto_ptr<std::*> if my change is accepted.
(Although this is not related to the particular preprocessor implementation,) unless it has been fixed already, the parser should ignore the newline that terminates a \376textdomain that is sent to it. I am willing to attempt to implement that if no-one else is motivated to do so. Although a better solution is to also end \376textdomain and \376line with a \376 instead of a newline, because (1) using \376 like this already relies on it not appearing in the input, and (2) it is possible for a macro or filename to contain a newline, and that would break the current scheme. There are WML-preprocessor techniques that I wish to use that may involve having a newline in a macro name.
directory_name() in Wesnoth returns the name with a / on the end. This is not the behavior I believe it should have, it caused a problem in my preprocessor implementation, and I do not believe changing it would break any non-preprocessor piece of Wesnoth because none of them even call the function! I use a mostly-copied function preprocessor::directory_name__fixed() currently. I would prefer to modify Wesnoth's directory_name().
get_user_data_dir() takes a quite significant amount of time unnecessarily. I cache its return value since I need to use it so often, but get_user_data_dir() should be the one doing that -- after all, the location of one user's data directory can't change! And neither can you correctly think that it provides safety to create the directories if necessary every single time it is called, because not only is it unnecessary to create them at all in most cases, there is also the possibility of the directories being removed by someone else(some other process) before the return value is used.
There are some inclusions of nonexistent files and directories in data/game.cfg and data/units.cfg. My patch removes those errors, because my preprocessor has the sense to recognize them as such.
I may not have gotten the copyright attributions at the beginning of the C++ files quite right, so feel free to tell me what I did wrong, if anything.

This is how I am providing my code:

A cvs diff for all the little modifications to Wesnoth code needed for this to work.
A zip of my code plus some Wesnoth code that this depends on, slightly modified to not require SDL or gettext, and a main.cpp, which compiles to a command-line executable useful for testing and trying the preprocessor (since I currently use OSX, it contains the XCode project I used to compile it). To add to Wesnoth, replace its serialization/preprocessor.*pp with mine, and add my serialization/preprocessor directory to Wesnoth's serialization directory.
A link to get boost::shared_ptr from.

[1] only almost flawless, obviously

[2] One Definition Rule

silene · Post by **silene** » July 12th, 2005, 7:49 pm

First, I just want to make sure you are aware that Wesnoth has been in code freeze for 5 weeks now in order to release 1.0, and that a 150KB preprocessor (hell, it's 6 times bigger than the current one) has no way to ever be accepted at this stage.

Second, as we are currently preparing a stable release, any bug you know about is worth reporting. So far, the only bug I know about concerning the current preprocessor is the problem when there was no character after a #endif at the end of a file (because the STL was not setting the eof flag in this case), and ott committed a patch to fix it. So, as far as I know, the current preprocessor is bug-free, and nothing in Savannah indicates me it could not be the case. Please be a bit more precise than "may fail", "appears", "seemed".

Invisible Philosopher wrote:treating macro arguments inconsistently to ordinary macros -- for example they require parentheses if passed to another macro

Concerning this one, I would just like to mention that your example is nothing new. It was already the same behavior with the old preprocessor. So it doesn't really qualify as a bug.

Invisible Philosopher wrote:Did you think of std::auto_ptr?

Yes I did. But auto_ptr didn't allow me to test stream flags on construction and destruction, this is why I didn't use it.

Invisible Philosopher wrote:the parser should ignore the newline that terminates a \376textdomain that is sent to it

Okay, I see what you mean. The tokenizer modifies the line number at the end of a \376textdomain directive, when it parses it right in the middle of a WML concatenated string. I'm committing a one-line fix.

Invisible Philosopher · July 13th, 2005, 9:40 am

Oh well, I figured out how to use it anyway. Unfortunately I have no real data about when the current preprocessor fails.