Proposal: Code and Data Tarball Split

warren
Posts: 12
Joined: April 19th, 2008, 10:24 pm

Proposal: Code and Data Tarball Split

Post by warren » April 19th, 2008, 11:40 pm

Hi folks,

We ship wesnoth in Fedora repositories for four architectures (i386, x86_64, ppc, ppc64). We currently ship wesnoth in three Fedora distributions (F-7, F-8 and F-9) and two EPEL distributions (EPEL-4 and EPEL-5). In each of them, wesnoth uses ~722MB of mirror space:
wesnoth*.i386.rpm 145MB
wesnoth*.x86_64.rpm 145MB
wesnoth*.ppc.rpm 145MB
wesnoth*.ppc64.rpm 145MB
wesnoth*.src.rpm 142MB

~722MB * 5 distributions = ~3.6GB of mirror space
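The arithmetic behind these figures can be checked with a short Python sketch. The package sizes are the ones listed above. The 1 GB = 1000 MB conversion is an assumption, chosen because it matches the ~3.6GB figure.

```python
# Package sizes (MB) for one distribution, as listed in the post.
PACKAGE_SIZES_MB = {
    "wesnoth*.i386.rpm": 145,
    "wesnoth*.x86_64.rpm": 145,
    "wesnoth*.ppc.rpm": 145,
    "wesnoth*.ppc64.rpm": 145,
    "wesnoth*.src.rpm": 142,
}
DISTRIBUTIONS = ["F-7", "F-8", "F-9", "EPEL-4", "EPEL-5"]

def per_distribution_mb(sizes: dict) -> int:
    """Mirror space used by wesnoth in a single distribution."""
    return sum(sizes.values())

def total_mirror_mb(sizes: dict, distributions: list) -> int:
    """Every distribution carries its own full copy of every package."""
    return per_distribution_mb(sizes) * len(distributions)

if __name__ == "__main__":
    per_dist = per_distribution_mb(PACKAGE_SIZES_MB)
    total = total_mirror_mb(PACKAGE_SIZES_MB, DISTRIBUTIONS)
    # Assumes 1 GB = 1000 MB.
    print(f"{per_dist} MB per distribution, {total} MB total (~{total / 1000:.1f} GB)")
```

Running it prints `722 MB per distribution, 3610 MB total (~3.6 GB)`.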
I am hoping the wesnoth project is willing to consider a small change to how the source tarballs are released in order to enable more efficient distribution for binary distributors like us.

When wesnoth releases new tarballs, split the code and data into two separate tarballs. Both wesnoth and wesnoth-gamedata (or wesnoth-data, or whatever you want to call it) would have their own Makefiles, and make install would install each payload into the appropriate location. You could require that wesnoth be installed before wesnoth-gamedata, or vice versa (whichever you choose). Example:
wesnoth-1.4.2.tar.bz2
wesnoth*.src.rpm
wesnoth*.i386.rpm
wesnoth*.x86_64.rpm
wesnoth*.ppc.rpm
wesnoth*.ppc64.rpm
These RPMs are very small because they contain only code.

wesnoth-gamedata-1.4.2.tar.bz2
wesnoth-gamedata-1.4.2.src.rpm
wesnoth-gamedata-1.4.2-1.noarch.rpm
This RPM is very big. It contains the architecture-independent game data that can be used by any of the above binary RPMs. Since it is identical across the four architectures and five distributions, we can hardlink it in the mirror tree. This change would enable us to save over 3.2GB of space on hundreds of our public mirrors. Much less data to push to mirrors means we can more easily push updated versions of wesnoth to users.
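Hardlinking identical packages in a mirror tree can be sketched in Python as below. This is only an illustration under assumptions: the `*.noarch.rpm` pattern and the function names are hypothetical, it is not Fedora's actual mirror tooling, and all directories must live on the same filesystem for `os.link` to work.

```python
import hashlib
import os
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large RPMs are not loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def hardlink_duplicates(root: Path, pattern: str = "*.noarch.rpm") -> int:
    """Replace byte-identical files matching `pattern` under `root` with hard links
    to the first copy found (in sorted path order). Returns how many were relinked."""
    first_seen = {}  # content digest -> first path seen with that content
    relinked = 0
    for path in sorted(root.rglob(pattern)):
        if path.is_symlink() or not path.is_file():
            continue
        digest = file_digest(path)
        original = first_seen.setdefault(digest, path)
        if original == path or os.path.samefile(original, path):
            continue  # first copy, or already a hard link to it
        tmp = path.with_name(path.name + ".tmp-link")
        os.link(original, tmp)
        os.replace(tmp, path)  # swap the duplicate for the link in one rename
        relinked += 1
    return relinked
```

Matching on content hashes rather than file names means only bit-for-bit identical packages get linked. That is why the data package has to be built once rather than rebuilt in every distribution.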
Doesn't this introduce a risk of the user's wesnoth version not matching their wesnoth-gamedata version?
No. Binary packages of wesnoth can require a matching version of wesnoth-gamedata.

What if another binary distributor doesn't want to split the binaries and data?
Nobody is forced to change their binary distribution if they don't want to. They can build both source tarballs into their existing RPM/DEB/EXE binary packages and not change anything about how they distribute Wesnoth.

Thoughts?

tsr
Posts: 790
Joined: May 24th, 2006, 1:05 pm

Re: Proposal: Code and Data Tarball Split

Post by tsr » April 20th, 2008, 6:47 am

Sounds reasonable. This would also make it easier to release 'patches' between minor versions, since you could ship one tarball for the executables and one for the gamedata, where the second contains just the changes.

I don't know if the current code structure is up for it, but I'm sure the devs would be happy to receive a patch :)

/tsr

ivanovic
Lord of Translations
Posts: 1146
Joined: September 28th, 2004, 10:10 pm
Location: Germany

Re: Proposal: Code and Data Tarball Split

Post by ivanovic » April 20th, 2008, 8:26 am

wtogami wrote: What if another binary distributor doesn't want to split the binaries and data?
Nobody is forced to change their binary distribution if they don't want to. They can build both source tarballs into their existing RPM/DEB/EXE binary packages and not change anything about how they distribute Wesnoth.
Strange, the Debian packager already manages to offer a split build. There is no need for you to follow the structure of the tarball for the binary packages. Just split the stuff; it is only a matter of sorting by folder. Just have a look at how Debian splits it, there it *does* work nicely:
http://packages.debian.net/search?keywo ... ection=all

And yes, each target arch has just a different binary, the rest is shared between all archs.

I don't understand why we have to split the main tarball to allow you, as a packager, to do so. Or do you have to strictly follow the tarball when creating the RPM packages? That would really surprise me...

warren
Posts: 12
Joined: April 19th, 2008, 10:24 pm

Re: Proposal: Code and Data Tarball Split

Post by warren » April 20th, 2008, 1:48 pm

ivanovic wrote: Strange, the Debian packager already manages to offer a split build. There is no need for you to follow the structure of the tarball for the binary packages. Just split the stuff; it is only a matter of sorting by folder. Just have a look at how Debian splits it, there it *does* work nicely:

And yes, each target arch has just a different binary, the rest is shared between all archs.

I don't understand why we have to split the main tarball to allow you, as a packager, to do so. Or do you have to strictly follow the tarball when creating the RPM packages? That would really surprise me...
I would need to build the wesnoth-gamedata from a separate .src.rpm if I want that to be bit-for-bit identical and hardlinked between multiple distributions on our mirror. That is where most of the potential space savings above are from.

Currently if I wanted to do this, I would have to either:
1) Include the 142MB source tarball twice, once in wesnoth and again in wesnoth-gamedata.
2) Manually rip apart the wesnoth source tarball and ship a modified one without data. I would rather not do this.

Again, this proposal makes it far easier for us to distribute without needlessly changing how others distribute Wesnoth.

Dave
Founding Developer
Posts: 7071
Joined: August 17th, 2003, 5:07 am
Location: Seattle

Re: Proposal: Code and Data Tarball Split

Post by Dave » April 20th, 2008, 9:26 pm

I don't think that downloading multiple tarballs is as convenient for users.
“At Gambling, the deadly sin is to mistake bad play for bad luck.” -- Ian Fleming

Noyga
Inactive Developer
Posts: 1790
Joined: September 26th, 2005, 5:56 pm
Location: France

Re: Proposal: Code and Data Tarball Split

Post by Noyga » April 21st, 2008, 12:13 am

If I understood the problem correctly:
The architecture-independent data is indeed something like 99% of our tarball, and it is something you are unlikely to repackage differently across distributions (especially since they are similar).
- At the moment you don't split the packages, so you have 25 copies of the data: (4 architecture RPMs + 1 source RPM) * 5 distributions. A big waste of space.
- If you split the data within each distribution, it falls to 10 copies: (1 noarch RPM + 1 source RPM) * 5 distributions. Much better, but still a lot.
- What you want is a single package that is independent of both architecture and distribution: 2 copies of the data (1 noarch RPM and 1 source RPM).
However, if the binary packages are still built from the same tarball, that adds 5 extra copies (1 source RPM per distribution).
If you manage to make a single source RPM that works for all your distributions (for both the noarch and the binary packages), you'll have only two copies of the data. But that is probably not what you want, since it's harder to maintain.
So splitting the data from the rest would be a great feature for the packagers.
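The copy counts above can be reproduced with a small Python sketch. `DATA_MB` is an assumption (one copy of the data taken to be roughly the 142MB src.rpm size), so the megabyte figures are estimates, not measurements.

```python
ARCHES = 4         # i386, x86_64, ppc, ppc64
DISTRIBUTIONS = 5  # F-7, F-8, F-9, EPEL-4, EPEL-5
DATA_MB = 142      # assumption: one copy of the data ~ the size of the src.rpm

def copies_unsplit() -> int:
    # Every arch RPM and the src.rpm carry the full data, in every distribution.
    return (ARCHES + 1) * DISTRIBUTIONS

def copies_noarch_per_distribution() -> int:
    # One noarch data RPM plus one src.rpm, repeated in every distribution.
    return (1 + 1) * DISTRIBUTIONS

def copies_shared_data_package() -> int:
    # One noarch RPM and one src.rpm for the data, shared by all distributions.
    return 1 + 1

def copies_shared_but_code_from_full_tarball() -> int:
    # As above, but each distribution's code src.rpm still embeds the full tarball.
    return copies_shared_data_package() + DISTRIBUTIONS

if __name__ == "__main__":
    for label, copies in [
        ("unsplit", copies_unsplit()),
        ("noarch per distribution", copies_noarch_per_distribution()),
        ("shared data package", copies_shared_data_package()),
        ("shared data, code from full tarball", copies_shared_but_code_from_full_tarball()),
    ]:
        print(f"{label}: {copies} copies, ~{copies * DATA_MB} MB")
```

Under these assumptions the four scenarios come out at 25, 10, 2 and 7 copies, roughly 3550, 1420, 284 and 994 MB.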

On the other hand:
- it's quite trivial to split the package yourself (maybe we could provide a make target or something like that to ease the process)
- it is more convenient for some users to have everything in a single tarball
- you need the data (for a few configure checks, IIRC) to build the binary, so distributing a wesnoth-nodata.tar.gz might be confusing, since it can't be built on its own
- the Wesnoth development team usually doesn't want to deal with packaging issues
"Ooh, man, my mage had a 30% chance to miss, but he still managed to hit! Awesome!" ;) -- xtifr

warren
Posts: 12
Joined: April 19th, 2008, 10:24 pm

Re: Proposal: Code and Data Tarball Split

Post by warren » April 21st, 2008, 1:38 am

Noyga wrote: If I understood the problem correctly:
The architecture-independent data is indeed something like 99% of our tarball, and it is something you are unlikely to repackage differently across distributions (especially since they are similar).
Thank you, you understood all of it.
- it's quite trivial to split the package yourself (maybe we could provide a make target or something like that to ease the process)
That would be quite sufficient. Just add a standard make target that spits out a nodata tarball. You don't need to distribute that tarball itself (and it would indeed be confusing), but please make sure that the Makefile builds and installs without error in the absence of data.
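A nodata split like the one requested here could look roughly like the Python sketch below, which uses the standard `tarfile` module. It is not the project's build system. `DATA_DIRS` is a hypothetical list of top-level data directories that would have to be checked against the real tarball, and `split_tarball` is an illustrative name.

```python
import tarfile
from pathlib import PurePosixPath

# Hypothetical list of top-level directories treated as game data; check it
# against the actual contents of the release tarball before using it.
DATA_DIRS = {"data", "images", "sounds", "music", "fonts"}

def is_data_member(name: str) -> bool:
    """Member names look like 'wesnoth-1.4.2/<top-level dir>/...'."""
    parts = PurePosixPath(name).parts
    return len(parts) >= 2 and parts[1] in DATA_DIRS

def split_tarball(src: str, code_out: str, data_out: str) -> tuple:
    """Copy every member of `src` into either the code or the data tarball.

    Returns (code_members, data_members).
    """
    code_count = data_count = 0
    with tarfile.open(src, "r:*") as tin, \
            tarfile.open(code_out, "w:bz2") as tcode, \
            tarfile.open(data_out, "w:bz2") as tdata:
        for member in tin:
            # Only regular files carry a payload; directories and links are header-only.
            payload = tin.extractfile(member) if member.isfile() else None
            if is_data_member(member.name):
                tdata.addfile(member, payload)
                data_count += 1
            else:
                tcode.addfile(member, payload)
                code_count += 1
    return code_count, data_count
```

For example, `split_tarball("wesnoth-1.4.2.tar.bz2", "wesnoth-nodata-1.4.2.tar.bz2", "wesnoth-gamedata-1.4.2.tar.bz2")`. The data members keep their `wesnoth-1.4.2/` prefix; a real make target might rename it.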

warren
Posts: 12
Joined: April 19th, 2008, 10:24 pm

Re: Proposal: Code and Data Tarball Split

Post by warren » April 21st, 2008, 2:53 am

please make sure that the Makefile builds and installs without error in the absence of data.

A simpler way to implement this might be to require a non-default variable before install is allowed to succeed without data. Something like:

make install NODATA=yes
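The proposed NODATA=yes behaviour can be sketched as a hypothetical Python install routine. It refuses to install when the data directory is missing unless code-only installs are explicitly allowed. The file names (`wesnoth`, `data/`), the destination layout (`bin/`, `share/wesnoth/data`), and reading `NODATA` from the environment are all assumptions for illustration, not the real Makefile's behaviour.

```python
import os
import shutil
from pathlib import Path
from typing import Optional

class MissingDataError(RuntimeError):
    """Raised when the data directory is absent and NODATA was not requested."""

def install(src_root: Path, dest_root: Path, nodata: Optional[bool] = None) -> list:
    """Install the binary and, if present, the game data.

    Code-only installs are refused unless `nodata` is true (or, when `nodata`
    is None, the environment variable NODATA is set to "yes").
    """
    if nodata is None:
        nodata = os.environ.get("NODATA") == "yes"
    data_dir = src_root / "data"
    if not data_dir.is_dir() and not nodata:
        # Fail before copying anything, so a tree is never half-installed.
        raise MissingDataError(f"{data_dir} not found; rerun with NODATA=yes for a code-only install")
    bin_dir = dest_root / "bin"
    bin_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src_root / "wesnoth", bin_dir / "wesnoth")
    installed = ["bin/wesnoth"]
    if data_dir.is_dir():
        shutil.copytree(data_dir, dest_root / "share" / "wesnoth" / "data", dirs_exist_ok=True)
        installed.append("share/wesnoth/data")
    return installed
```

Making the code-only install opt-in keeps the default behaviour strict, so a user who accidentally builds from an incomplete tree still gets an error.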

Rhonda
Site Administrator
Posts: 47
Joined: January 26th, 2008, 9:13 pm
Location: Vienna, Austria, Europe, Earth, Milky Way

Re: Proposal: Code and Data Tarball Split

Post by Rhonda » April 21st, 2008, 8:32 am

wtogami wrote: We ship wesnoth in Fedora repositories for four architectures (i386, x86_64, ppc, ppc64). We currently ship wesnoth in three current Fedora distributions (F-7, F-8 and F-9) and two EPEL distributions (EPEL-4 and EPEL-5). Each uses ~722MB of mirror space for wesnoth.
wesnoth*.i386.rpm 145MB
wesnoth*.x86_64.rpm 145MB
wesnoth*.ppc.rpm 145MB
wesnoth*.ppc64.rpm 145MB
wesnoth*.src.rpm 142MB

~722MB * 5 distributions = ~3.6GB of mirror space
It really makes me wonder: is the deb format really so superior to RPM that RPM can't produce, from a single tarball, several binary packages that are partly architecture-dependent and partly not? That is quite hard to believe, and to be honest, wesnoth would be the last package I'd expect to have that problem. What about OpenOffice? It also has huge chunks of architecture-independent files; are those also put into architecture-dependent RPMs because RPM can't cope with them properly?

Are you sure you aren't doing something wrong here? It's a bit hard to believe that RPM can't do that. If you aren't doing it wrong, then I can only pity RPM users...

That said, there is a real reason to request a change and distribute at least the music separately: it wouldn't have to be "updated" with every new release, but could be released on its own. Often enough the music doesn't change at all, especially not across stable releases. Having to offer "updated" music packages when nothing in them has changed is just annoying, both for the packagers and for the users, who have to download exactly the same data again and again. And it's not a minor part of the packages that has to be updated, it's the major part.
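Whether the music actually changed between two releases could be checked mechanically before deciding to push a new music package. This is a minimal Python sketch, assuming each release is unpacked into a directory with a `music/` subdirectory (a hypothetical layout). It reads each file fully into memory, which is acceptable for a sketch but not tuned for very large trees.

```python
import hashlib
from pathlib import Path

def tree_digest(root: Path) -> str:
    """Digest over the relative path and content hash of every file under root."""
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix().encode()
        # Per-file digests have a fixed length, so path/content boundaries stay unambiguous.
        h.update(rel + b"\0" + hashlib.sha256(path.read_bytes()).digest())
    return h.hexdigest()

def music_changed(old_release: Path, new_release: Path, subdir: str = "music") -> bool:
    """True if the files under `subdir` differ between two unpacked releases."""
    return tree_digest(old_release / subdir) != tree_digest(new_release / subdir)
```

If `music_changed` returns False, the packager could keep shipping the previous music package unchanged.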

warren
Posts: 12
Joined: April 19th, 2008, 10:24 pm

Re: Proposal: Code and Data Tarball Split

Post by warren » April 21st, 2008, 11:51 am

It really makes me wonder, is the deb format really that superior to RPM that RPM isn't able to produce from a single tarball different binary packages that are partly arch dependent and partly not?
You missed a point. I can make the data noarch, but that alone is insufficient if I want to share the data RPM between not just all architectures but also all of our distributions. We can save an additional ~600MB or so if the data is built from an independent package.

You make a good point about the music; it is ~50% of the installed size. However, it sounds like the developers really don't want to ship separate tarballs, so this isn't possible.

torangan
Retired Developer
Posts: 1365
Joined: March 27th, 2004, 12:25 am
Location: Germany

Re: Proposal: Code and Data Tarball Split

Post by torangan » April 21st, 2008, 3:00 pm

Why isn't it possible for you to repackage the tarball yourself? I see no need to use the officially released one if the source of the split ones is trustworthy...
WesCamp-i18n - Translations for User Campaigns:
http://www.wesnoth.org/wiki/WesCamp

Translators for all languages required: contact me. No geek skills required!

Rhonda
Site Administrator
Posts: 47
Joined: January 26th, 2008, 9:13 pm
Location: Vienna, Austria, Europe, Earth, Milky Way

Re: Proposal: Code and Data Tarball Split

Post by Rhonda » April 21st, 2008, 4:18 pm

wtogami wrote:
It really makes me wonder, is the deb format really that superior to RPM that RPM isn't able to produce from a single tarball different binary packages that are partly arch dependent and partly not?
You missed a point. I can make the data noarch, but that alone is insufficient if I want to share the data RPM between not just all architectures but also all of our distributions. We can save an additional ~600MB or so if the data is built from an independent package.
I really do seem to have missed your point, because how would you share the data between all distributions and not only architectures? If only ...

Hmm, Debian implemented package pools for that years ago: a package with the same version (that is, not recompiled at all between releases) exists only once in the pool, and the Packages index files of the different releases reference that same file. Shouldn't it be possible to implement something like this for Fedora, too? I would truly hope so, because it can save you quite a lot of space.
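The package pool idea can be illustrated with a toy Python model: releases only reference packages, and each distinct (name, version, arch) is stored once. This is a simplification of the concept, not Debian's archive tooling. The package names, versions, and sizes in the usage example are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Package:
    name: str
    version: str
    arch: str
    size_mb: int

def storage_without_pool(releases: dict) -> int:
    """Every release keeps its own copy of every package it ships."""
    return sum(pkg.size_mb for pkgs in releases.values() for pkg in pkgs)

def storage_with_pool(releases: dict) -> int:
    """Each distinct (name, version, arch) is stored once; releases only reference it."""
    pool = {}
    for pkgs in releases.values():
        for pkg in pkgs:
            pool.setdefault((pkg.name, pkg.version, pkg.arch), pkg)
    return sum(pkg.size_mb for pkg in pool.values())

if __name__ == "__main__":
    # Hypothetical numbers: one shared data package, one per-release code package.
    data = Package("wesnoth-data", "1.4.2-1", "noarch", 142)
    releases = {
        rel: [data, Package("wesnoth", f"1.4.2-1.{rel}", "i386", 3)]
        for rel in ("fc7", "fc8", "fc9", "el4", "el5")
    }
    print(storage_without_pool(releases), storage_with_pool(releases))
```

With these made-up sizes the output is `725 157`: the pool only helps because the data package has the same version in every release, while the per-release code packages are still stored five times.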

Your point sounds a bit as if you already have package pools; otherwise I don't see how you could share the data across distributions if it's separate, but not if it comes from the same tarball. It doesn't really make sense to me.

warren
Posts: 12
Joined: April 19th, 2008, 10:24 pm

Re: Proposal: Code and Data Tarball Split

Post by warren » April 21st, 2008, 7:18 pm

I have no idea what your "package pool" is.

It is possible to have the same package in multiple Fedora distributions if we manually "tag" it in our database and hardlink the packages across the directories. We do this very rarely because it isn't worth it except in a few cases, like huge game data packages. The only feasible way for me to do this is if the data package is built separately from the code package.

Yes, I can make my own source tarball to do this. But an idea above was to provide a make target that does it automatically. That is a fine idea that wouldn't affect anybody else. I hope we can have that.

torangan
Retired Developer
Posts: 1365
Joined: March 27th, 2004, 12:25 am
Location: Germany

Re: Proposal: Code and Data Tarball Split

Post by torangan » April 22nd, 2008, 12:27 am

Provide a patch and you'll most likely see it in the next release if it just adds a new target. You know, autotools are on their way out in the development branch, as we no longer have people with the skill and motivation to maintain them...

warren
Posts: 12
Joined: April 19th, 2008, 10:24 pm

Re: Proposal: Code and Data Tarball Split

Post by warren » April 22nd, 2008, 1:27 am

What are you using instead of autotools?
