GSoC Ideas for Memory Management

Discussion among members of the development team.

Moderator: Forum Moderators

Post Reply
chewryan
Posts: 5
Joined: March 24th, 2009, 6:10 pm

GSoC Ideas for Memory Management

Post by chewryan »

Hey guys,

I decided to post this on the forums and get input from everyone.
I will be applying for GSoC under Wesnoth this year and would like to flesh out my ideas before submitting my proposal. :)
There are two issues I wish to discuss:

1) WML Memory Optimisation
After reading Dave's post about WML memory optimization, I feel there might be some concerns if we were to shift away from an std::map to a custom container class.
Granted that std::map might incur unnecessary overhead in the multitude of attributes that a WML document has to store, it provides a standard, well-known interface that any fairly seasoned developer can understand without too much hassle. This makes it easier to maintain especially for new developers to the project and deviating away from it could introduce additional complications to the usage of a custom container class.

Without a doubt, implementing a custom container class solution specifically for the problem would definitely cut down on the memory footprint but I feel there might be other avenues to which we can look at to solve the problem:

Change the std::map key type to a C-style string instead of using std::strings
My first instinct was to revert from using std::strings to const char * as key values.
Since the key values shouldn't change and should remain static, I don't see the benefit from using an std::string if string manipulation isn't performed on them.
The memory savings are quantifiable by X amount of bytes reduced for every attribute that exists in the WML document (I haven't had the time to profile this yet to get an exact amount).

Store map values as pointers (or smart pointers) to strings instead of string objects, allocate memory for string values in blocks to reduce fragmentation
I'm not entirely sure if WML documents suffer from heavy fragmentation but separating the memory allocations for string values in the map itself could allow us to better organise how memory is allocated to the WML document.

Use Google's sparse_hash_map
I came across this while looking for an efficient std::map replacement.
http://google-sparsehash.googlecode.com ... index.html
I'm not sure if anyone has contemplated using this before but I think it's worth a good look, particularly the sparse_hash_map.
From what I have seen so far, Google's version of maps and sets are very efficient for memory or speed constraints (sparse or dense), plus they mirror the classes in STL implementations.
This could be a very viable replacement but I would have to do more research into how it works and how much more efficient it is compared to a regular map.

2) Wesnoth Memory Management
I understand that there is an existing basic memory manager for Wesnoth but it is not in use at the moment. I thought it might be a good idea to look into implementing a custom memory manager for the game now, seeing as there is talk (and plans?) to port the game to smaller, memory-constricted platforms (iPhone anyone? :) ).
The memory manager could also provide valuable debugging and memory leak information to give a more detailed breakdown on memory hogs.

I have not done much research on the possibilities yet but I believe that we can take ideas from implementations used by game consoles where memory is budgeted, rare commodity. A good book that covers a bit of this is 3D Game Engine Design 2nd Edition by David H. Eberly.

EDIT:
I forgot to add that separate memory budgets can be allocated for different areas of the game, such as WML storage/parsing.

--------------------------------------------------

Feedback on these ideas is greatly appreciated! :)

Regards,
Ryan
silene
Posts: 1109
Joined: August 28th, 2004, 10:02 pm

Re: GSoC Ideas for Memory Management

Post by silene »

chewryan wrote:After reading Dave's post about WML memory optimization, I feel there might be some concerns if we were to shift away from an std::map to a custom container class.
Granted that std::map might incur unnecessary overhead in the multitude of attributes that a WML document has to store, it provides a standard, well-known interface that any fairly seasoned developer can understand without too much hassle. This makes it easier to maintain especially for new developers to the project and deviating away from it could introduce additional complications to the usage of a custom container class.
This is C++ we are talking about. Just because we are not using a std::map doesn't mean we have to change the interface. As a matter of fact, I have already cleaned the code so that other files can no longer notice whether the implementation uses std::map or not.
chewryan wrote:Change the std::map key type to a C-style string instead of using std::strings
My first instinct was to revert from using std::strings to const char * as key values.
Since the key values shouldn't change and should remain static, I don't see the benefit from using an std::string if string manipulation isn't performed on them.
It is possible to go a lot further. Sure you don't need std::string as the key, but you don't need pointers either. Just take an opaque type that performs hash-consing on the identifiers. It will be transparent from the users of attributes. It may be a bit slower at first (and I'm not even sure it will be). But once the engine has most of its keys statically hash-consed, the code will be much faster and with a lower memory footprint.
chewryan wrote:Store map values as pointers (or smart pointers) to strings instead of string objects, allocate memory for string values in blocks to reduce fragmentation
I'm not entirely sure if WML documents suffer from heavy fragmentation but separating the memory allocations for string values in the map itself could allow us to better organise how memory is allocated to the WML document.
Again, it is possible to do much better. Why bother with strings only? If the value is an integer, just store the integer, no need to do a memory allocation; same things if it is a boolean or a floating-point value. If it is a short string (e.g., a 7-char string), you don't need memory allocation either. Once all these allocations have been removed, the situation will look a lot more sane, and then we can think of how to store the remaining strings.
chewryan wrote:2) Wesnoth Memory Management
I understand that there is an existing basic memory manager for Wesnoth but it is not in use at the moment. I thought it might be a good idea to look into implementing a custom memory manager for the game now, seeing as there is talk (and plans?) to port the game to smaller, memory-constricted platforms (iPhone anyone? :) ).
The memory manager could also provide valuable debugging and memory leak information to give a more detailed breakdown on memory hogs.
Unless you think you can do better than valgrind, there is not much point in an allocator dedicated to debugging and detecting memory leaks. Note also that the default memory allocator is already optimized depending on the target architecture, so it will be hard to do better.
chewryan
Posts: 5
Joined: March 24th, 2009, 6:10 pm

Re: GSoC Ideas for Memory Management

Post by chewryan »

silene wrote:This is C++ we are talking about. Just because we are not using a std::map doesn't mean we have to change the interface. As a matter of fact, I have already cleaned the code so that other files can no longer notice whether the implementation uses std::map or not.
Point taken :)
silene wrote:It is possible to go a lot further. Sure you don't need std::string as the key, but you don't need pointers either. Just take an opaque type that performs hash-consing on the identifiers. It will be transparent from the users of attributes. It may be a bit slower at first (and I'm not even sure it will be). But once the engine has most of its keys statically hash-consed, the code will be much faster and with a lower memory footprint.
I had originally discussed on IRC about the idea of using a hashed integer instead of storing strings but someone mentioned that these keys might need to be saved as the string into WML documents like save and replay files. In this case, a reverse-hash method could be implemented for instances where we need to extract the hashed strings. But it depends on whether the key values really need to be stored as string types. If they do, then the idea of using Google's sparse_hash_map might be a better solution as it retains the key value in its original form.
silene wrote:Again, it is possible to do much better. Why bother with strings only? If the value is an integer, just store the integer, no need to do a memory allocation; same things if it is a boolean or a floating-point value. If it is a short string (e.g., a 7-char string), you don't need memory allocation either. Once all these allocations have been removed, the situation will look a lot more sane, and then we can think of how to store the remaining strings.
I'm guessing that a proxy class would be used to store these values into the map? But then that might bring about possible complications when type-casting these values back and forth. Or we I suppose we could maintain a separate map for each integral type since none of the data stored are complex objects.
silene wrote:Unless you think you can do better than valgrind, there is not much point in an allocator dedicated to debugging and detecting memory leaks. Note also that the default memory allocator is already optimized depending on the target architecture, so it will be hard to do better.
To my understanding, Valgrind only works for Linux and I believe it foolish to say that leak-free in Linux means leak free on all other platforms. The problem with standard malloc and free is that you really -don't know- what they're doing.
If you write your own memory manager, you can add debug code, generate statistics, and add advanced features that might not be found in the default memory manager — features like handles, heap compaction, and virtual memory.
This allows customisation of the memory manager for platforms where the default memory manager may not be so efficient for a game like Wesnoth.
Customising a memory manager is definitely not an easy task, but I believe that if done right, the advantages would prove to be beneficial to the project for the reasons above.
silene
Posts: 1109
Joined: August 28th, 2004, 10:02 pm

Re: GSoC Ideas for Memory Management

Post by silene »

chewryan wrote:I had originally discussed on IRC about the idea of using a hashed integer instead of storing strings but someone mentioned that these keys might need to be saved as the string into WML documents like save and replay files. In this case, a reverse-hash method could be implemented for instances where we need to extract the hashed strings. But it depends on whether the key values really need to be stored as string types. If they do, then the idea of using Google's sparse_hash_map might be a better solution as it retains the key value in its original form.
Yes, the original strings have to be kept around so that they can be serialized. But that's hardly a memory issue since they have to be stored anyway for catching hashing collisions (even if you rely on sparse_hash_map). So you need a reverse-hash method, but it can be made as trivial as you want; instead of a "random" hash number, just use something akin to the pointer to the string in the opaque type. The reason why a hash_map (be it Google's one or the C++ Standard's one) is a bad idea is that it doesn't allow you to precompute hashes for extra efficiency. In Wesnoth's code, almost all the keys are already known at program startup, there is no need to recompute a hash value whenever they are queried.
chewryan wrote:I'm guessing that a proxy class would be used to store these values into the map? But then that might bring about possible complications when type-casting these values back and forth. Or we I suppose we could maintain a separate map for each integral type since none of the data stored are complex objects.
There is already a proxy class for storing values into the map ready to be committed (patch #1139), so it wouldn't be much of change. I'm not sure why you mention about type-casting, since users of attributes should never have to perform any typecast. If a user asks for a boolean, it should get a boolean; if the attribute does not contain a boolean or something that looks like a boolean, then either a default value or an exception should be returned, depending on how the user asked for it. Also, why would you want several maps? As far as I can tell, it would just increase the memory footprint without bringing any advantage.
chewryan wrote:To my understanding, Valgrind only works for Linux and I believe it foolish to say that leak-free in Linux means leak free on all other platforms.
Come on... And you think that implementing a brand new allocator for a hypothetical memory bug that would only occur on a specific platform is not foolish? Unless someone can show me that such a bug exists (or has existed) and it cannot be caught by the debug allocators that already exist on this specific platform, I will consider that this part of the project is just a waste of time and, more importantly, a way to add some vicious new bugs to the code.
chewryan
Posts: 5
Joined: March 24th, 2009, 6:10 pm

Re: GSoC Ideas for Memory Management

Post by chewryan »

silene wrote:just use something akin to the pointer to the string in the opaque type.
I'm not entirely sure what you mean by this... Could you explain a little further please?
silene wrote:There is already a proxy class for storing values into the map ready to be committed (patch #1139), so it wouldn't be much of change. I'm not sure why you mention about type-casting, since users of attributes should never have to perform any typecast. If a user asks for a boolean, it should get a boolean; if the attribute does not contain a boolean or something that looks like a boolean, then either a default value or an exception should be returned, depending on how the user asked for it. Also, why would you want several maps? As far as I can tell, it would just increase the memory footprint without bringing any advantage.
Sorry, I wasn't thinking about exceptions heh.
silene wrote:Come on... And you think that implementing a brand new allocator for a hypothetical memory bug that would only occur on a specific platform is not foolish? Unless someone can show me that such a bug exists (or has existed) and it cannot be caught by the debug allocators that already exist on this specific platform, I will consider that this part of the project is just a waste of time and, more importantly, a way to add some vicious new bugs to the code.
In that case, what about looking at existing, proven memory managers, such as the Fluid Studios Memory Manager (http://www.paulnettle.com/pub/FluidStud ... anager.zip)? This would certainly reduce the effort required to implement a full memory manager and not introduce vicious new bugs :)
silene
Posts: 1109
Joined: August 28th, 2004, 10:02 pm

Re: GSoC Ideas for Memory Management

Post by silene »

chewryan wrote:
silene wrote:just use something akin to the pointer to the string in the opaque type.
I'm not entirely sure what you mean by this... Could you explain a little further please?
Something like the following (untested, unoptimized) code:

Code: Select all

struct hashed_key {
  const std::string *key;
  hashed_key(const std::string &s) {
    static std::tr1::unordered_set<std::string> cache;
    key = &*cache.insert(s).first;
  }
};
Two different hashed_key objects initialized with equal strings contain the exact same pointer. If you want the actual key string, just follow the pointer.
chewryan
Posts: 5
Joined: March 24th, 2009, 6:10 pm

Re: GSoC Ideas for Memory Management

Post by chewryan »

Okay I understand what you mean now, thanks :)

Now, I realise that hash_map (and Google's version) supports the ability to specify the hashing and hash comparison functions to be used. In that case, it should be trivial to implement a simple hashing method, like the one you described, with caching capabilities to curb performance issues when querying the map.

On a separate note, I think that switching straight from a std::string to a C-style string would immediately cut down on two integers required by std::string to store the size and capacity. As far as I can tell, I don't see any drawbacks to the switch at the moment.
Hashing is a good idea but I think it requires more testing to figure out an optimal solution.
silene
Posts: 1109
Joined: August 28th, 2004, 10:02 pm

Re: GSoC Ideas for Memory Management

Post by silene »

chewryan wrote:Now, I realise that hash_map (and Google's version) supports the ability to specify the hashing and hash comparison functions to be used. In that case, it should be trivial to implement a simple hashing method, like the one you described, with caching capabilities to curb performance issues when querying the map.
I'm not sure, but I think you missed the point. I will try to be clearer.

The hashed_key structure cannot be used as a hash key, since it doesn't have the properties an unordered container relies on (in particular, uniform distribution of the hashes). So you would have to hash it again, which is overkill. Moreover, hash_map have a non-negligible overhead, since you want to have as many buckets as there are items, for efficiency reasons. The thing is, you no longer need any hash_map at this point, you can directly use a simpler container now that handling the key no longer induces a memory allocation. For instance, the attributes of the config class would become

Code: Select all

class config {
  std::map<hashed_key, t_string> values;
  ...
};
In the code above, I'm using std::map for illustrative purpose, but obviously one would want to use a more compact structure there, e.g., a dumb ordered vector.
chewryan
Posts: 5
Joined: March 24th, 2009, 6:10 pm

Re: GSoC Ideas for Memory Management

Post by chewryan »

Ah okay I get it now.
So whenever a hash_key is requested, it would search the cache for an existing string, otherwise it would be inserted.

I'll work on an implementation that would be as memory efficient as possible for storing the string cache.

Thanks Silene :)
Post Reply