Proposal: Code and Data Tarball Split
Proposal: Code and Data Tarball Split
Hi folks,
We ship wesnoth in Fedora repositories for four architectures (i386, x86_64, ppc, ppc64). We currently ship wesnoth in three current Fedora distributions (F-7, F-8 and F-9) and two EPEL distributions (EPEL-4 and EPEL-5). Each uses ~722MB of mirror space for wesnoth.
wesnoth*.i386.rpm 145MB
wesnoth*.x86_64.rpm 145MB
wesnoth*.ppc.rpm 145MB
wesnoth*.ppc64.rpm 145MB
wesnoth*.src.rpm 142MB
~722MB * 5 distributions = ~3.6GB of mirror space
I am hoping the wesnoth project is willing to consider a small change to how the source tarballs are released in order to enable more efficient distribution for binary distributors like us.
When wesnoth releases new tarballs, split the code and data into two separate tarballs. Both wesnoth and wesnoth-gamedata (or wesnoth-data or whatever name you want to call it) have their own Makefiles. make install can be used to install their payload into the appropriate location. You could require that wesnoth install happens before wesnoth-gamedata or vice versa (whatever you choose). Example:
wesnoth-1.4.2.tar.bz2
wesnoth*.src.rpm
wesnoth*.i386.rpm
wesnoth*.x86_64.rpm
wesnoth*.ppc.rpm
wesnoth*.ppc64.rpm
These RPMs are very small because they contain only code.
wesnoth-gamedata-1.4.2.tar.bz2
wesnoth-gamedata-1.4.2.src.rpm
wesnoth-gamedata-1.4.2-1.noarch.rpm
This RPM is very big. It contains the architecture-independent game data that can be used by any of the above binary RPMs. Since it is identical across the four architectures and five distributions, we can hardlink it in the mirror tree. This change would enable us to save over 3.2GB of space on hundreds of our public mirrors. With much less data to push to mirrors, we can more easily push updated versions of wesnoth to users.
Doesn't this introduce a risk of the user's wesnoth version mismatching their wesnoth-gamedata version?
No. Binary packages of wesnoth can require a matching version of wesnoth-gamedata.
What if another binary distributor doesn't want to split the binaries and data?
Nobody is forced to change their binary distribution if they don't want to. They can build both source tarballs into their existing RPM/DEB/EXE binary packages and not change anything about how they distribute Wesnoth.
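The mirror-space figures in this post can be checked with a few lines of arithmetic. The sketch below uses the package sizes listed above; the sizes passed to `split_footprint_mb` (a 140MB data RPM and 3MB code RPMs) are illustrative assumptions, not measurements from the thread.

```python
# Package sizes in MB, as listed in the post (per distribution).
BINARY_RPM_MB = 145
SOURCE_RPM_MB = 142
ARCHES = ["i386", "x86_64", "ppc", "ppc64"]
DISTRIBUTIONS = ["F-7", "F-8", "F-9", "EPEL-4", "EPEL-5"]


def current_footprint_mb():
    """Mirror space used today: every distribution carries 4 binary RPMs and 1 source RPM."""
    per_distribution = BINARY_RPM_MB * len(ARCHES) + SOURCE_RPM_MB
    return per_distribution * len(DISTRIBUTIONS)


def split_footprint_mb(data_rpm_mb, code_rpm_mb):
    """Estimated space after the split.

    Assumes one noarch data RPM and one data source RPM, each stored once
    (hardlinked across all distributions), plus a small code RPM for each
    architecture and a code source RPM in every distribution.
    """
    data = 2 * data_rpm_mb
    code = code_rpm_mb * (len(ARCHES) + 1) * len(DISTRIBUTIONS)
    return data + code


if __name__ == "__main__":
    before = current_footprint_mb()
    after = split_footprint_mb(data_rpm_mb=140, code_rpm_mb=3)  # assumed sizes
    print(f"before: {before} MB, after: {after} MB, saved: {before - after} MB")
```

With these assumed sizes the saving comes out a little above 3.2GB, which is consistent with the estimate in the post.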
Thoughts?
Re: Proposal: Code and Data Tarball Split
Sounds reasonable. This would also make it easier to release 'patches' between minor versions, since you could ship one tarball for the executables and one for the game data, where the second contains just the changes.
I don't know if the current code structure is up to it, but I'm sure the devs would be happy to receive a patch.
/tsr
Re: Proposal: Code and Data Tarball Split
wtogami wrote: What if another binary distributor doesn't want to split the binaries and data? Nobody is forced to change their binary distribution if they don't want to. They can build both source tarballs into their existing RPM/DEB/EXE binary packages and not change anything about how they distribute Wesnoth.
Strange, the Debian packager already manages to offer a split build. There is no need for you to follow the structure of the tarball exactly for the binary packages. Just split the stuff yourself; it is only "per folder" work. Have a look at how Debian splits it, where it *does* work nicely:
http://packages.debian.net/search?keywo ... ection=all
And yes, each target arch has just a different binary; the rest is shared between all archs.
I don't understand why we have to split the main tarball to allow you, as a packager, to do so. Or do you have to follow the tarball exactly when creating the RPM packages? That would really surprise me...
Re: Proposal: Code and Data Tarball Split
Quote: Strange, the Debian packager already manages to offer split build. There is no need for you to explicitly follow the structure of the tarball for the binary. Just do split the stuff, it is only a "per folder work". Just have a look at how Debian splits the stuff, there it *does* work nicely: And yes, each target arch has just a different binary, the rest is shared between all archs. I don't understand why we have to split the main tarball to allow you as packager doing so. Or do you have to explicitly follow the tarball when creating the RPM packages? This would really surprise me...
I would need to build wesnoth-gamedata from a separate .src.rpm if I want it to be bit-for-bit identical and hardlinked between multiple distributions on our mirror. That is where most of the potential space savings above come from.
Currently if I wanted to do this, I would have to either:
1) Include the 142MB source tarball twice, once in wesnoth and again in wesnoth-gamedata.
2) Manually rip apart the wesnoth source tarball and ship a modified one without data. I would rather not do this.
Again, this proposal makes it far easier for us to distribute without needlessly changing how others distribute Wesnoth.
Re: Proposal: Code and Data Tarball Split
I don't think that downloading multiple tarballs is as convenient for users.
“At Gambling, the deadly sin is to mistake bad play for bad luck.” -- Ian Fleming
Re: Proposal: Code and Data Tarball Split
If I understood the problem correctly:
The architecture-independent data is something like 99% of our tarball, and it is something that you are unlikely to repackage differently for another distribution (especially since they are similar).
- For the moment you don't split the packages, so you have 25 copies of the data, (4 architecture RPMs + 1 source RPM) * 5 distributions: a big waste of space.
- You can split the data within a distribution; that brings it down to 10 copies of the data, (1 noarch RPM + 1 source RPM) * 5 distributions, which is much better but still a lot.
- What you want is 1 architecture- and distribution-independent package: 2 copies of the data (1 noarch and 1 source RPM).
However, if the binary packages are still built from the same tarball, that results in 5 extra copies (1 source RPM per distribution).
If you manage to make a single source RPM that works for all your distributions (for both the noarch and binary packages), you'll have only two instances of the data. But that's probably not what you want, since it's harder to maintain.
So splitting the data from the rest would be a great feature for the packagers.
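The copy counts for the three layouts above can be written down as a small function. This is only a sketch of the counting argument in this post; the function and parameter names are made up for illustration.

```python
ARCHES = 4          # i386, x86_64, ppc, ppc64
DISTRIBUTIONS = 5   # F-7, F-8, F-9, EPEL-4, EPEL-5


def data_copies(binary_pkgs_with_data, source_pkgs_with_data, shared_across_distributions):
    """Count how many packages on a mirror carry a full copy of the game data."""
    per_set = binary_pkgs_with_data + source_pkgs_with_data
    if shared_across_distributions:
        # One set of packages serves every distribution.
        return per_set
    return per_set * DISTRIBUTIONS


# No split: every arch RPM and the source RPM contain the data.
unsplit = data_copies(ARCHES, 1, shared_across_distributions=False)
# noarch data subpackage, but rebuilt separately for each distribution.
noarch_per_distribution = data_copies(1, 1, shared_across_distributions=False)
# Independent data package shared by all distributions.
shared = data_copies(1, 1, shared_across_distributions=True)
# Shared data package, but the code source RPMs still embed the full tarball.
shared_with_full_source = shared + DISTRIBUTIONS

if __name__ == "__main__":
    print(unsplit, noarch_per_distribution, shared, shared_with_full_source)
```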
On the other hand:
- it's quite trivial to split the package yourself (maybe we could provide a make target or something like that to ease the process)
- it's more convenient for some users to have everything in a single tar
- you need the data (a few configure checks, IIRC) to build the binary, so distributing a wesnoth-nodata.tar.gz might be confusing since it can't build alone
- the Wesnoth development team usually doesn't want to deal with packaging issues
"Ooh, man, my mage had a 30% chance to miss, but he still managed to hit! Awesome!" -- xtifr
Re: Proposal: Code and Data Tarball Split
Noyga wrote: If i understood correctly the problem: The architecture independent data is indeed something like 99% of our tarball and it is something that you'll unlikely repackage differently in another distributions (especially since they are similar).
Thank you, you understood all of it.
Quote: - it's quite trivial to split the package yourself (maybe we could provide a make target or something like that to ease the process)
This would be plenty sufficient. Just make a standard make target that will spit out a nodata tarball. You don't need to distribute that tarball itself (and it would indeed be confusing) but please make sure that the Makefile builds and installs without error in the absence of data.
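As an illustration of the "per folder" split discussed in this thread, here is a minimal Python sketch that divides one release tarball into a code tarball and a data tarball. It is not the make target being requested, and the directory names in `DATA_DIRS` are assumptions; the real Wesnoth source tree may be laid out differently. As noted earlier in the thread, the code-only tarball may not build on its own because some configure checks need the data.

```python
import tarfile

# Hypothetical top-level folders treated as game data (an assumption,
# not taken from the actual Wesnoth source tree).
DATA_DIRS = ("data", "images", "sounds", "music")


def is_data_member(name, data_dirs=DATA_DIRS):
    """Return True if an archive path such as 'wesnoth-1.4.2/data/x.cfg' is game data."""
    parts = name.split("/")
    # parts[0] is the release directory; parts[1] is the folder inside it.
    return len(parts) > 1 and parts[1] in data_dirs


def split_tarball(source_path, code_path, data_path, data_dirs=DATA_DIRS):
    """Copy every member of source_path into either code_path or data_path."""
    with tarfile.open(source_path, "r:*") as src, \
            tarfile.open(code_path, "w:bz2") as code, \
            tarfile.open(data_path, "w:bz2") as data:
        for member in src:
            target = data if is_data_member(member.name, data_dirs) else code
            # Only regular files carry a payload; directories and links do not.
            fileobj = src.extractfile(member) if member.isfile() else None
            target.addfile(member, fileobj)
```

A call such as `split_tarball("wesnoth-1.4.2.tar.bz2", "wesnoth-nodata-1.4.2.tar.bz2", "wesnoth-gamedata-1.4.2.tar.bz2")` would then produce the two archives; the output file names here are only examples.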
Re: Proposal: Code and Data Tarball Split
Quote: please make sure that the Makefile builds and installs without error in the absence of data.
A simpler way to implement this might be to require a non-default variable to allow install to succeed without data. Something like:
make install NODATA=yes
- Rhonda
- Site Administrator
- Posts: 47
- Joined: January 26th, 2008, 9:13 pm
- Location: Vienna, Austria, Europe, Earth, Milky Way
Re: Proposal: Code and Data Tarball Split
wtogami wrote: We ship wesnoth in Fedora repositories for four architectures (i386, x86_64, ppc, ppc64). We currently ship wesnoth in three current Fedora distributions (F-7, F-8 and F-9) and two EPEL distributions (EPEL-4 and EPEL-5). Each uses ~722MB of mirror space for wesnoth.
wesnoth*.i386.rpm 145MB
wesnoth*.x86_64.rpm 145MB
wesnoth*.ppc.rpm 145MB
wesnoth*.ppc64.rpm 145MB
wesnoth*.src.rpm 142MB
~722MB * 5 distributions = ~3.6GB of mirror space
It really makes me wonder, is the deb format really that superior to RPM that RPM isn't able to produce from a single tarball different binary packages that are partly arch dependent and partly not? This is something that is quite hard to believe, and to be honest, wesnoth would be the last package that comes to my mind that would have that problem. How about openoffice? There are also huge chunks of arch-independent files in there; are they also put into arch-dependent RPM files because RPM can't cope with them properly?
Are you sure you aren't doing something wrong here? It's a bit hard to believe that RPM can't do that. If you aren't doing it wrong, then I can only pity RPM users...
Though there is a real reason to request a change and distribute at least the music separately: it wouldn't need to be "updated" with every new release but could be kept separate from it. Often enough the music doesn't change at all, especially not across stable releases. Having to offer "updated" packages for the music (when nothing in it has changed) is just annoying, both to the packagers and to the users, who have to download exactly the same data again and again. And it's not a minor part of the packages to update; it's the major part.
Re: Proposal: Code and Data Tarball Split
Quote: It really makes me wonder, is the deb format really that superior to RPM that RPM isn't able to produce from a single tarball different binary packages that are partly arch dependent and partly not?
You missed a point. I can make the data noarch, but that alone is insufficient if I want to share the data RPM between not just all architectures but also all of our distributions. We can save an additional ~600MB or so if the data is built from an independent package.
You make a good point about the music; it is ~50% of the installed size. However, it sounds like the developers really don't want to ship separate tarballs, so this isn't possible.
Re: Proposal: Code and Data Tarball Split
Why isn't it possible for you to repackage the tarball yourself? I see no need to use the officially released one if the source of the split ones is trustworthy...
WesCamp-i18n - Translations for User Campaigns:
http://www.wesnoth.org/wiki/WesCamp
Translators for all languages required: contact me. No geek skills required!
- Rhonda
Re: Proposal: Code and Data Tarball Split
wtogami wrote: "It really makes me wonder, is the deb format really that superior to RPM that RPM isn't able to produce from a single tarball different binary packages that are partly arch dependent and partly not?" You missed a point. I can make the data noarch, but that alone is insufficient if I want to share the data RPM between not just all architectures but also all of our distributions. We can save an additional ~600MB or so if the data is built from an independent package.
I seem to have really missed your point, because how would you share the data between all distributions and not only architectures? If only ...
Hmm, Debian implemented package pools for that years ago: a package with the same version (thus not recompiled at all between releases) exists only once in the pool, and the Packages files of the different releases reference the same package. Shouldn't it be possible to get something like this implemented for Fedora, too? I would truly hope so, because it can save you quite some space.
Your point sounds a bit like you already have package pools; otherwise I don't see how you would be able to share it across distributions if it's separate but not if it comes from the same tarball. It doesn't really make sense to me.
Re: Proposal: Code and Data Tarball Split
I have no idea what your "package pool" is.
It is possible to have the same package across multiple Fedora distributions if we do a manual "tagging" in our database and hardlink the packages across the directories. We do this very rarely because it isn't worth it except in a few cases like huge game data packages. The only way I can feasibly do this is if the data package is built separately from the code package.
Yes, I can make my own source tarball to do this. But an idea above was to provide a make target that will do it automatically. That is a fine idea that wouldn't affect anybody else. I hope we can have that.
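To show why the data package has to be bit-for-bit identical before it can be hardlinked, here is a minimal Python sketch that replaces byte-identical files with hard links to a single copy. It is not Fedora's mirror tooling (the manual database "tagging" step is not modelled), the function names are invented for illustration, and it assumes all paths live on one filesystem that supports hard links.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hardlink_identical(paths):
    """Replace byte-identical files among paths with hard links to the first copy.

    Returns the number of files that were relinked.
    """
    first_seen = {}
    relinked = 0
    for path in paths:
        original = first_seen.setdefault(file_digest(path), path)
        if original == path or os.path.samefile(original, path):
            continue  # first copy, or already linked to it
        temporary = path + ".tmp-link"
        os.link(original, temporary)   # create the new link next to the target
        os.replace(temporary, path)    # then swap it into place
        relinked += 1
    return relinked
```

Two builds of the same data from different source packages would usually differ in at least some bytes (for example, embedded build metadata), so they would hash differently and could not be linked this way.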
Re: Proposal: Code and Data Tarball Split
Provide a patch and you'll most likely see it for the next release if it's just adding a new target. You know, autotools are on their way out in the development branch as we don't have people with the skill and motivation to maintain them anymore...
WesCamp-i18n - Translations for User Campaigns:
http://www.wesnoth.org/wiki/WesCamp
Translators for all languages required: contact me. No geek skills required!
Re: Proposal: Code and Data Tarball Split
What are you using instead of autotools?