compressed saves, idle musings

iceiceice
Posts: 1056
Joined: August 23rd, 2013, 2:10 am

compressed saves, idle musings

Post by iceiceice »

Dugi wrote:
artisticdude wrote: One thing that does strike me is that organizing WML into multiple files without any real reason to do so would mean the game has to read more files from disk, and disk reading from multiple separate files can be a very expensive operation in terms of performance. But then, I have a limited (and possibly outdated now) understanding of how Wesnoth handles these operations, so I can't comment on the matter with any real authority.
This is perfectly right. Accessing many files was always a problem, because it requires asking the OS for access to each file; the OS checks whether the program may access it, and only then is the file attached to the program as a file stream and opened. The number of files Wesnoth loads is probably the main reason why it takes so much time to load (I have seen an XML parser that could parse all the Wesnoth code in less than a second if it were in XML and could be read quickly enough, so parsing time probably isn't the reason, and normal disks can be read at a speed of about 40 megabytes per second). Most modern games solve this by creating archives where all the small files are stored (and archives like .zip are read and decompressed far faster than many small files).
Reading Dugi's remarks here reminded me of another issue I always hoped to ask a dev or other knowledgeable person about.

When I first started playing Wesnoth I didn't know anything about WML or the save game format, so I decided to turn off compressed saves since I had a lot of hard drive space and I thought it might improve performance slightly. After 6 months, and fooling around with SXRPG with autosaves on, I discovered I had between 6 and 8 gigabytes of Wesnoth save games on my disk (!!!). More than all other movies and media combined. I knew then that I had made a big mistake and turned compressed saves back on; it completely blew my mind that I had this much save data.

After this I learned a bit about the save game format and all that. Obviously it is very verbose, and it seemed odd at first that Wesnoth stores all of the info like unit stats and such in the save game, but I have come to see that this makes a lot of sense: store everything you might need in the save file in the most convenient format and just rely on gzip / bzip2 to make it small.

However, as I understand what gzip and bzip2 do, they can only take advantage of redundancy within a single file. So if there is a long sequence that occurs once in every single file, for instance the [unit] tag defining the Elvish Fighter, then that redundancy across files is basically invisible to the current system, and the sequence should show up with only minimal compression in every campaign save file. And so on for all similar info.

A different strategy would be to put all the files in a tarball or something like that; then you would only need these definitions once, and most likely the entire "default era" would get compressed down to a few symbols in every file. I don't know how easy it is to grab one file out of a large tarball, though, so this might seriously hurt load times. Maybe a better strategy would be that all the autosaves for the scenario you are playing go together in a tarball specific to that scenario, while different scenarios stay separate. For short scenarios this might not make a big difference, but if you have something like a 100-turn scenario it might matter a lot. Another idea that just randomly occurred to me is that you could store diffs of files, like git does, instead of separate copies of every autosave; keeping full copies seems doomed to scale up badly, roughly n^2 in total size for an n-turn scenario. I have a feeling that this would potentially make a big difference for scenarios like SXRPG.

This is all pretty obvious and I'm sure I haven't thought it through completely, so I was just curious what the other reasons for choosing the current system are.

Edit: Also, does the claim that loading many files is less efficient for the OS than loading one big file factor into this?
Pentarctagon
Project Manager
Posts: 5564
Joined: March 22nd, 2009, 10:50 pm
Location: Earth (occasionally)

Re: compressed saves, idle musings

Post by Pentarctagon »

iceiceice wrote:Edit: Also, does the claim that loading many files is less efficient for the OS than loading one big file factor into this?
Loading many small files is slower than loading one large file because it requires accessing the disk multiple times to fetch them. The slowest part of a computer is the disk, so reducing N disk accesses to 1 and then letting the CPU parse through the data could end up being faster, depending on a variety of factors (CPU speed, HD/SSD speed, compression used, etc.). I believe Wesnoth's cache does this: extracting an 11 KB cache file resulted in a 145 KB text file with all the code of the Colosseum add-on concatenated together.
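
A rough way to see this for yourself is to time reading the same data as many separate files versus one pre-concatenated file. The sketch below is only illustrative: the wml_parts directory and wml_all.cfg file are made-up names, and the measured difference depends heavily on OS caching and whether you are on an HD or SSD.

Code:
# Sketch: compare reading N small files against one concatenated file.
# "wml_parts/" and "wml_all.cfg" are hypothetical paths for illustration.
import time
from pathlib import Path

def read_many(directory):
    """Open and read every .cfg file in the directory separately."""
    total = 0
    for path in sorted(Path(directory).glob("*.cfg")):
        total += len(path.read_bytes())   # one open/read/close per file
    return total

def read_one(big_file):
    """Read the same data from a single concatenated file."""
    return len(Path(big_file).read_bytes())  # a single open/read/close

for label, fn, arg in [("many small files", read_many, "wml_parts"),
                       ("one big file", read_one, "wml_all.cfg")]:
    start = time.perf_counter()
    size = fn(arg)
    print(f"{label}: {size} bytes in {time.perf_counter() - start:.4f}s")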
99 little bugs in the code, 99 little bugs
take one down, patch it around
-2,147,483,648 little bugs in the code
Dugi
Posts: 4961
Joined: July 22nd, 2010, 10:29 am
Location: Carpathian Mountains

Re: compressed saves, idle musings

Post by Dugi »

iceiceice wrote:However as I understand what gzip and bzip do, it seems that they can only take advantage of redundancy within a file. So if there is a long sequence that occurs once in every single file, for instance, the [unit] tag defining the elf fighter, then this is basically uncompressible with the current system, and should occur with only minimal compression in every campaign save file. And so on for all similar info.
This is not quite right. My campaign once had bloat in its save file size related to walking corpses on the recall list. The campaign was long, plague was quite widely available, and nobody ever bothered to recall the unused walking corpses, so they piled up on the recall list; by the end, more than half of the save file was a long list of variations of walking corpses (the walking corpse and soulless have the longest unit codes because of their variations). So I decided to add a victory event that kills all walking corpses with 0 XP. The size of uncompressed save files more than halved, and the size of compressed save files decreased quite significantly (though it wasn't halved).
Macro bloat is also a massive expander of save file size, and it is caused by repeating the same thing over and over. Replacing popular, repetitive macros with fire_event usage or variables can cut save files from 2 megabytes (compressed) down to 200 kilobytes or even less.
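
For illustration only (the WML-like text below is invented and the exact numbers will vary), here is a small Python sketch of why this happens: gzip can shorten near-identical repeated expansions, but it still pays a cost per copy, whereas one shared definition plus short references stays small both raw and compressed.

Code:
# Sketch: repeated macro expansions vs. one definition plus short references.
# The WML-like snippets are made up; only the size comparison matters.
import gzip

BODY = ("[event]\n  name=attack\n  [filter]\n    x,y={X},{Y}\n  [/filter]\n"
        + "  [message] speaker=narrator message=_\"...\" [/message]\n" * 8
        + "[/event]\n")

# (a) the macro expanded in place 500 times, with varying arguments
expanded = "".join(BODY.format(X=i, Y=i * 2) for i in range(500))

# (b) one definition plus 500 short fire_event-style references
shared = BODY.format(X="$x1", Y="$y1") + "".join(
    f"[fire_event] name=attack x={i} y={i * 2} [/fire_event]\n"
    for i in range(500))

for label, text in [("expanded copies", expanded), ("single definition", shared)]:
    raw = text.encode("utf-8")
    print(f"{label}: raw {len(raw)} bytes, gzip {len(gzip.compress(raw))} bytes")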
iceiceice wrote:Another idea that just randomly occurred to me is that you could store diffs of files like git does instead of separate copies of every autosave -- it seems like that is doomed to scale up badly, like n^2 for a n turn scenario. I have a feeling that this would potentially make a big difference for scenarios like SXRPG.
The problem with this is that you would have to keep all the autosaves, or all the following ones would become useless. Autosaves are usually deleted when no longer needed. The main source of save file size is, from my observation, the replay history and the positions of units, and that doesn't depend much on the add-on type, just on the sheer length of the scenario (or rather the number of actions that occurred in it). Because you have only one unit in SXRPG, the save files shouldn't be too large. If they are, there is probably macro bloat, and therefore it's a fault in the design of SXRPG and not Wesnoth.
Iris
Site Administrator
Posts: 6798
Joined: November 14th, 2006, 5:54 pm
Location: Chile

Re: compressed saves, idle musings

Post by Iris »

I’ll preface this post with the disclaimer that I do not understand compression algorithms in as much depth as some other people do. I was not around when saved games were first implemented either (I’m guessing that was approximately 10 years ago), not that it’s too relevant given how much Wesnoth has changed and how many developers have modified or replaced Dave’s code since then. For starters, Wesnoth didn’t even support user-made campaigns in the beginning.
iceiceice wrote:However as I understand what gzip and bzip do, it seems that they can only take advantage of redundancy within a file. So if there is a long sequence that occurs once in every single file, for instance, the [unit] tag defining the elf fighter, then this is basically uncompressible with the current system, and should occur with only minimal compression in every campaign save file. And so on for all similar info.
I just took an Elvish Fighter [unit] node and its subnodes from a saved game and stored it in a separate file. The result’s size is 6177 bytes (6.03 KiB) in 290 lines. Operating under the assumption that it’s its only appearance in the saved game (which is 124 KiB compressed with gzip, 1.26 MiB otherwise), it’s hardly a significant portion of the saved game’s memory footprint.

But this isn't really that relevant, considering that neither the gzip nor bzip2 algorithms specifically know about WML. It's all bytes to them, and it's extremely likely that a varying number of WML subtrees (more than one) end up compressed within the same block of every save.
iceiceice wrote:A different strategy would be to put all the files in a tarball or something like this, then you would only need these definitions once; like, most likely the entire "default era" would get compressed down to a few symbols in every file.
Only unit types that are present in recall lists, the game map, or previous replay steps (or WML variables, of course) are recorded in saved games. Unless your scenario is some kind of unit zoo, you generally won’t have all unit type definitions lingering around in [unit] form in saved games.

A possible reason for the current “store everything” approach that’s not obvious at first glance is that unit type-derived attributes may get changed by traits, objects, and advancements when a unit is created or manipulated later in the game. There is some data (image paths at least) that appears to be ignored in favor of the unit type or trait/object/advancement (in [unit]) definitions, but WML can directly modify other attributes (statuses, description, name, stats, etc.) and this will work as intended by the scenario coder.
iceiceice wrote:I don't know how easy it is to grab one file out of a large tarball though, this might seriously hurt load run times? Maybe a better strategy would be that if you are playing a scenario, all the autosaves go together in a tarball specific to that scenario, but otherwise the scenarios are separate? For short scenarios this might not make a big difference but if you have like a 100 turn scenario it might make a big difference. Another idea that just randomly occurred to me is that you could store diffs of files like git does instead of separate copies of every autosave -- it seems like that is doomed to scale up badly, like n^2 for a n turn scenario. I have a feeling that this would potentially make a big difference for scenarios like SXRPG.
With the default configuration, autosaves for a scenario can and will get pruned after a few turns. Given {S1, S2, S3, S4, S5} autosaves under an autosave limit of 5, once the player starts turn 6, if S1 is the “master” autosave with the full information, the game will need to merge S2 and S1 together before deleting S1. So that’s two additional file open operations for writing S6, and the differencing operation may be considerably more computing-intensive than just dumping the complete WML tree every time. And then an outside agent (such as the user) may delete any of the intermediate autosaves or even the first one (which would instantly break all its successors).
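
To sketch roughly what such a diff-chain scheme would have to do (hypothetical Python, not how Wesnoth actually stores autosaves; the dict-based "saves" and the merge-on-prune step are purely illustrative):

Code:
# Sketch of a diff-chain autosave scheme (hypothetical; Wesnoth stores full
# snapshots instead). Saves are dicts; a diff holds only changed/added keys
# (removed keys are ignored for brevity).
def make_diff(old, new):
    return {k: v for k, v in new.items() if old.get(k) != v}

def apply_diff(base, diff):
    merged = dict(base)
    merged.update(diff)
    return merged

class AutosaveChain:
    def __init__(self, limit=5):
        self.limit = limit
        self.base = None      # full snapshot for the oldest kept autosave
        self.diffs = []       # one diff per later autosave

    def save(self, state):
        if self.base is None:
            self.base = dict(state)
        else:
            prev = self.reconstruct(len(self.diffs))
            self.diffs.append(make_diff(prev, state))
        # Pruning: fold the oldest diff into the base so later diffs survive.
        while 1 + len(self.diffs) > self.limit:
            self.base = apply_diff(self.base, self.diffs.pop(0))

    def reconstruct(self, index):
        """Rebuild the full state after applying the first `index` diffs."""
        state = dict(self.base)
        for diff in self.diffs[:index]:
            state = apply_diff(state, diff)
        return state

The merge inside save() is the extra work described above, and if any stored diff is deleted from outside, every later autosave in the chain becomes unrecoverable.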

For long games, I’d really expect the replay portion of every autosave to have the largest disk footprint. Although it should be noted in general that Wesnoth isn’t really designed to cater to large-scale gameplay (over 100 turns, over 500 on-map units, over 100x100 hexes, etc.), even though it is supported to some ill-defined extent.
Author of the unofficial UtBS sequels Invasion from the Unknown and After the Storm.
iceiceice
Posts: 1056
Joined: August 23rd, 2013, 2:10 am

Re: compressed saves, idle musings

Post by iceiceice »

Thanks for the responses.

I read a little bit more about compression in the days after I wrote this post. Solid compression is, I guess, the term for the kind of savings I was proposing might come from archiving many save games together.

Most likely my question of how it was decided what strategy to use is silly. Probably somebody just tried it and compared the results.

I decided just now to do such a test. I have roughly 100 assorted v1.10 replays from the replay servers, from when I was testing my replay description script, so I gathered them up and also put them into a single .tar.gz archive.

Uncompressed: 144 files, 89.7 MB
Individual .gz files: 7.1 MB
One .tar.gz: 7.1 MB
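
For what it's worth, the same comparison is easy to reproduce with Python's standard library (the "replays" directory name below is just a placeholder for wherever the uncompressed replays live):

Code:
# Sketch: compare per-file gzip against a single .tar.gz of the same replays.
# "replays/" is a placeholder directory containing uncompressed save files.
import gzip
import io
import tarfile
from pathlib import Path

files = sorted(Path("replays").glob("*"))

# Each file compressed on its own, like Wesnoth's compressed saves.
individual = sum(len(gzip.compress(p.read_bytes())) for p in files)

# All files packed into one tar stream, then gzipped together ("solid").
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for p in files:
        tar.add(str(p), arcname=p.name)
solid = len(buf.getvalue())

print(f"{len(files)} files, individually gzipped: {individual} bytes")
print(f"one .tar.gz of everything: {solid} bytes")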

So it looks like shadowm's intuition, that the [replay] section of the files would be the bulk of the footprint, and that this is unlikely to be similar to [replay] sections of other files, is dead on.
Dugi wrote:This is not quite right. My campaign once had bloat in its save file size related to walking corpses on the recall list. The campaign was long, plague was quite widely available, and nobody ever bothered to recall the unused walking corpses, so they piled up on the recall list; by the end, more than half of the save file was a long list of variations of walking corpses (the walking corpse and soulless have the longest unit codes because of their variations). So I decided to add a victory event that kills all walking corpses with 0 XP. The size of uncompressed save files more than halved, and the size of compressed save files decreased quite significantly (though it wasn't halved).
Macro bloat is also a massive expander of save file size, and it is caused by repeating the same thing over and over. Replacing popular, repetitive macros with fire_event usage or variables can cut save files from 2 megabytes (compressed) down to 200 kilobytes or even less.
Yeah, so I guess that's not too surprising. Compression algorithms can only be so good: they will notice "redundancy" in the sense of words or phrases appearing over and over again, but they can't do the kind of symbolic analysis needed to notice patterns when two different uses of a particular macro get expanded with different arguments.
Dugi wrote:The problem with this is that you would have to keep all the autosaves, or all the following ones would become useless. Autosaves are usually deleted when no longer needed. The main source of save file size is, from my observation, the replay history and the positions of units, and that doesn't depend much on the add-on type, just on the sheer length of the scenario (or rather the number of actions that occurred in it). Because you have only one unit in SXRPG, the save files shouldn't be too large. If they are, there is probably macro bloat, and therefore it's a fault in the design of SXRPG and not Wesnoth.
I imagine SXRPG replays are particularly large because there are usually a few thousand units in them and the games usually go on for about a hundred turns. Also, the stats of the units get quite large; you often have berserkers with 20 strikes, so the replay will contain many more coin flips than a usual Wesnoth replay. But I haven't examined them carefully.
Dugi wrote:
iceiceice wrote: I discovered I had between 6 and 8 gigabytes of Wesnoth save games on my disk (!!!). More than all other movies and media combined.
What? I have 20 gigabytes of music on my laptop, two weeks after accidentally formatting my disk.
Hehe, I have an external hard drive with probably about that much music from when I was in college, but I don't download nearly as much as I used to; I guess I'm more likely to listen to a YouTube stream or Pandora nowadays.
roidanton
Posts: 90
Joined: September 7th, 2012, 10:41 pm

Re: compressed saves, idle musings

Post by roidanton »

Modern compression algorithms can do so much more than simple redundancy removal.

For instance, one very simple way of compressing a file that I still remember from some early university classes is using a variable-length encoding. Today's operating systems store text files in a format that takes one byte per character. Each byte can hold 256 different values, but in plain English text we only actually use about 75-90 different characters. Some are used very frequently, for instance vowels, and others are not used so often, like some consonants, numbers, or special characters. And we only use upper-case letters at the beginning of a sentence and for names.

One very simple compression method is to use 6 bits per character: use the values 0-62 for the 63 most frequently used characters and reserve 63 as a special marker meaning "read the next 8 bits". For instance, the letters a and e might become "000000" and "000001", while something rare like an upper-case Q would become "111111 xxxxxxxx", consuming 14 bits. This would already give you some ~10%-15% compression. It can be improved significantly by using even shorter bit sequences and computing the code lengths dynamically based on your input data. Modern compression algorithms can do much more than that, so you always get quite a decent compression ratio for human-readable files.
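
As a toy illustration of that scheme (a rough Python sketch, nothing like what real codecs actually do), the 63 most common byte values in the input get 6-bit codes and everything else is escaped as 6 + 8 bits; a real format would also have to store the code table somewhere:

Code:
# Toy sketch of the fixed-6-bit-plus-escape scheme described above.
# The output is a string of '0'/'1' characters rather than packed bits.
from collections import Counter

ESCAPE = 63  # 6-bit value reserved to mean "an 8-bit literal follows"

def encode(data: bytes) -> str:
    # Give the 63 most frequent byte values the codes 0..62.
    common = [b for b, _ in Counter(data).most_common(ESCAPE)]
    table = {b: i for i, b in enumerate(common)}
    bits = []
    for b in data:
        if b in table:
            bits.append(format(table[b], "06b"))                   # 6 bits
        else:
            bits.append(format(ESCAPE, "06b") + format(b, "08b"))  # 14 bits
    return "".join(bits)

text = b"the quick brown fox jumps over the lazy dog; The End."
print(f"{len(text) * 8} bits raw -> {len(encode(text))} bits encoded")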
roidanton
Posts: 90
Joined: September 7th, 2012, 10:41 pm

Re: compressed saves, idle musings

Post by roidanton »

Speaking of auto-saves, I would like to make two suggestions.

First of all, please provide a way of choosing the name for the saves and of choosing a different folder. The default file name should also be changed to something that includes the MP game title, the MP game host, the current date, or something along those lines.

At the moment, it uses the campaign / scenario name as the file name, for instance "SXRPG_CastleOfCantar-Auto-Save118". The problem with this is that it overwrites your previous auto-saves when you play the same game multiple times. That's very annoying, especially when you attempt to play the same game for a few turns multiple times, trying different strategies each time.

It's even more annoying that when you join someone's MP game, it will overwrite your own previous auto-saves from that same scenario.

For instance, you could add some "use this location for all saves from this scenario" option to the "Save game" dialog and allow selecting a folder in addition to just a file name.

My second suggestion is to provide an option to save "everything from this campaign / MP game" in a single file. Not so much for performance reasons, but to make it easier to archive and organize things. Maybe also put replays into a different folder by default.

The "Load game" dialog should also group related things better. At least provide an option to only show replays and manual saves, not auto-saves. At the moment, this dialog provides me with a list of ~100 saves from Cantar, followed by another ~100 saves of Temple of Bones, then another ~100 saves of Wizard of War, etc. And the replay that I was looking for was somewhere in this big mess.
GunChleoc
Translator
Posts: 506
Joined: September 28th, 2012, 7:35 am

Re: compressed saves, idle musings

Post by GunChleoc »

roidanton wrote:For instance, one very simple way of compressing a file that I still remember from some early university classes is using a variable-length encoding. Today's operating systems store text files in a format that takes one byte per character. Each byte can hold 256 different values, but in plain English text we only actually use about 75-90 different characters. Some are used very frequently, for instance vowels, and others are not used so often, like some consonants, numbers, or special characters. And we only use upper-case letters at the beginning of a sentence and for names.
Do we need to watch out for non-English unit names here? Because that could get into multibyte character range pretty fast.
iceiceice
Posts: 1056
Joined: August 23rd, 2013, 2:10 am

Re: compressed saves, idle musings

Post by iceiceice »

roidanton wrote:Modern compression algorithms can do so much more than simple redundancy removal.
It sounds like you are referring to Huffman coding. It is one of the most established compression techniques, used in WinZip, gzip, etc. You are right that it is not "simple" depending on your point of view, but it is so well known that it is what I meant by "simple redundancy removal". Then again, I don't know what "simple redundancy removal" means to you; maybe it means exactly this, which is a fair reading, so I was probably unclear before.

An example of a kind of compression that does not fall in this category:
Every WML tag "[tag]" needs to be closed later by a matching "[/tag]". So WML could be compressed by changing all "[/tag]" to "[/]"; the tag which is being closed is always obvious from context, in syntactically correct WML. If you use something like LZ77 (used in gzip), which reads through the document and inserts "notes" that some portion of text matches some earlier portion of text, then you will get some compression in these closing tags anyway, but not as much.
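
To make that concrete, here is a rough Python sketch of such a grammar-aware rewrite (purely illustrative; Wesnoth's save code does nothing of the sort): closing tags are collapsed to [/] before compression and restored afterwards with a stack.

Code:
# Sketch: drop the redundant tag name from WML closing tags, then restore it.
# Assumes well-formed WML with one tag per line; purely illustrative.
import re

def shorten(wml: str) -> str:
    # "[/some_tag]" carries nothing a parser can't infer, so drop the name.
    return re.sub(r"\[/[^\]\n]+\]", "[/]", wml)

def restore(wml: str) -> str:
    stack, out = [], []
    for line in wml.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("[/"):
            out.append(line.replace("[/]", f"[/{stack.pop()}]"))
        elif stripped.startswith("[") and stripped.endswith("]"):
            stack.append(stripped[1:-1])   # remember which tag is open
            out.append(line)
        else:
            out.append(line)               # attribute lines pass through
    return "".join(out)

sample = "[unit]\n  [modifications]\n  [/modifications]\n[/unit]\n"
assert restore(shorten(sample)) == sample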

There is no good way in general to discover such rules in an arbitrary document which happens to obey some rules like this. You would have to do some kind of exhaustive symbolic analysis.
GunChleoc wrote: Do we need to watch out for non-English unit names here? Because that could get into multibyte character range pretty fast.
Not to worry, we don't really need to think about this because gzip already does it for us.
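
(If anyone wants to convince themselves: to gzip, a UTF-8 name is just bytes, whatever characters it contains, and it round-trips unchanged. The sample name below is made up.)

Code:
# Non-ASCII unit names are just extra bytes to gzip; they round-trip fine.
import gzip

name = 'name="Gràinne Ní Mháille"'
packed = gzip.compress(name.encode("utf-8"))
assert gzip.decompress(packed).decode("utf-8") == name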