Heavy repository

Freem
Posts: 23
Joined: July 12th, 2009, 7:58 pm

Heavy repository

Post by Freem »

Hello.

I was curious about some parts of wesnoth's codebase and wanted to take a look. So I did as usual: git clone. And it took me around 2 days.

It took that long for these reasons:

* wesnoth codebase... hum, no, in fact, wesnoth's repository is huge. More than 3GiB if I can trust the "du" command.
* git clone can not be interrupted. If it is, you need to restart.
* my connection had problems, or I needed it for something else at some point.

Now, the reason the repo is so large seems obvious: game data is included. Here is what "du --max-depth=1 -h" shows in the local clone I (finally) have:

Code:

12K	./graphincludes
32K	./misc
72K	./scons
92K	./cmake
108K	./icons
612K	./sounds
644K	./packaging
2,8M	./projectfiles
4,8M	./utils
5,4M	./attic
6,9M	./fonts
7,8M	./images
7,9M	./doc
12M	./src
279M	./po
369M	./data
2,5G	./.git
3,1G	.
What I would suggest is to make cloning easier by separating the actual program code from the other resources. Well, I think saying that 1 repo for program's code and another one for game data would only be a solution for coders, not for everyone. I think a better solution might be to split the repos like Debian did with the wesnoth-* packages:

Code:

~% apt-cache pkgnames |grep wesnoth
wesnoth-1.12-trow
wesnoth-1.12-l
wesnoth-1.10-aoi
wesnoth-1.12-tools
wesnoth-1.10-music
wesnoth-1.12-httt
wesnoth-1.10-thot
wesnoth-1.10-trow
wesnoth-1.10-dbg
wesnoth-1.12-utbs
wesnoth-1.12-low
wesnoth-1.10-httt
wesnoth-1.10
wesnoth-1.12
wesnoth-1.12-music
wesnoth-music
wesnoth-1.12-data
wesnoth-1.10-did
wesnoth-1.10-tsg
wesnoth-1.10-ttb
wesnoth-1.12-aoi
wesnoth-1.12-dm
wesnoth-1.12-dw
wesnoth-1.10-utbs
wesnoth-1.12-ei
wesnoth-1.10-l
wesnoth-1.10-sotbe
wesnoth-1.10-sof
wesnoth-1.12-server
wesnoth-1.10-data
wesnoth-1.12-dbg
wesnoth-1.12-nr
wesnoth-1.10-dm
wesnoth-1.10-dw
wesnoth-1.12-core
wesnoth-1.10-ei
wesnoth-1.12-did
wesnoth-1.12-sotbe
wesnoth-1.12-tsg
wesnoth-1.10-server
wesnoth-1.12-ttb
wesnoth-1.10-nr
wesnoth-1.10-core
wesnoth-1.12-sof
wesnoth-core
wesnoth-1.10-tools
wesnoth-1.10-low
wesnoth
wesnoth-1.12-thot
This way, someone wanting to play with a specific part of wesnoth would not have to wait several hours just to get the clone.

PS: also, I've read in the guidelines that C++11 is not used because of the lack of supporting compilers. I think this is no longer true.
PPS: I was just curious to see if I could tinker with a few things that annoy me; there is no guarantee I will actually fix them. If I do, I'll obviously send patches, but nobody knows when, least of all me :)
fabi
Inactive Developer
Posts: 1260
Joined: March 21st, 2004, 2:42 pm
Location: Germany

Re: Heavy repository

Post by fabi »

I have asked for a split of the repository ever since it was converted into a git one.

I hope you have more luck than me :-)
"Wesnoth has many strong points but team and users management are certainly not in them." -- pyrophorus
Freem
Posts: 23
Joined: July 12th, 2009, 7:58 pm

Re: Heavy repository

Post by Freem »

Well, to be honest, I have wanted to dig into wesnoth's codebase for years, but never really had the will to use and understand mercurial. I was surprised to see that the repo had moved to git, so I gave it a try. I have no job atm, so I actually had time to download that code base despite several connection failures (and to be fair, I went to sleep with computers running/downloading, which is not something I really like to do, but more than 2 hours had already been spent downloading...).
That won't be the case for everyone.

So, when I read that:
Sadly, a hard truth must be faced: Wesnoth, as a project, is understaffed. At this time, there are fewer than half a dozen developers working on each new version of the game, and even fewer of them are able to work on the engine itself. We do not collectively have the time or skills to fix bugs as quickly as we should, or implement features as rapidly as we would like. The game itself suffers from an aging codebase and old software.
I simply think that when a git clone takes more than 2 hours to reach 15% progress, you won't have a lot of people trying to dig into the source out of curiosity, and even fewer trying to do that *and* provide some returns[1].
This is true for the C++ codebase, of course, but it is probably even worse for art resources: I think coders are more likely to download something for hours (but, well, 8 hours is still by far too much) than someone just wanting to work on the art, because coders get a real benefit from repositories (time machines are great!).
This is probably untrue for artists (they send png or ogg it seems, which don't take advantage of the time machine of repos. And I doubt you will move to svg/midi :lol:).

Now... you are involved in the project. I am not. You will prefer to do things together with everyone, and I do not mind. If I want to fix that, I can do it myself and take only the mainline patches that interest me. Then I would simply send patches.
My point is simply this: I doubt that spending 8+ hours cloning a repo, hoping the connection doesn't fail, will help attract new contributors.
There are other things to say, but they are other subjects, and I need to read more of the code to be sure it's not only because I didn't start by opening the right file.

1: examples of returns
* peer reviews
* patches to add assertions, like in src/actions/move.cpp:1048 => "*(full_end_-1)" would lead to a crash if "full_end_ == route.begin()". Adding an assert() would at least make that clear: it would detect the bug in debug mode and change nothing for releases.
* code de-duplication, like in undo_list::redo() and undo_list::undo()
* refactoring: I have noticed tons of singletons and several shared_ptr; I am not sure it really was the best way to go, and someone might think like me *and* do something to prove it ;)
* quick bug fixes (no example yet, I'm just reading random bits of code for now ;) )

Yes, you might notice I'm trying to figure out how the stuff related to planning mode works. It should give me some fair hints about how the global HIM works.
Of course, the things I used as examples are only the results of a quick code reading; I am not sure they are really pertinent.
gfgtdf
Developer
Posts: 1432
Joined: February 10th, 2013, 2:25 pm

Re: Heavy repository

Post by gfgtdf »

I agree that it would make sense to move the translations ( ./po ), and maybe even the campaigns, to a separate repo. One of the main problems I see, however, is that even when we remove those files from the repo, they'll still be in the main repo's history (which, at 2.5 GB, is the main part of the size). Also, I don't know exactly how translations work. Actually, I never tried to touch the ./po directory because, afaik, it had and still has a 'don't change po files via pull requests or git commits' policy.
Scenario with Robots SP scenario (1.11/1.12), allows you to build your units with components, PYR No preperation turn 1.12 mp-mod that allows you to select your units immediately after the game begins.
zookeeper
WML Wizard
Posts: 9742
Joined: September 11th, 2004, 10:40 pm
Location: Finland

Re: Heavy repository

Post by zookeeper »

It'd be super great if the repo size could be decreased, but I have no idea how possible or feasible something like that is. I know little about git, but I can imagine how it could be a nightmarish undertaking to split an existing repo while correctly retaining >10 years of history containing countless moves and renames and the whole directory structure having changed substantially over time.
Freem wrote:Well, I think saying that 1 repo for program's code and another one for game data would only be a solution for coders, not for everyone.
How is it a solution for coders, though? Sure, you could clone the source, do a change, and compile, but you still need the latest game data in order to test it. I guess you could get the data repo as a shallow clone then?
gfgtdf
Developer
Posts: 1432
Joined: February 10th, 2013, 2:25 pm

Re: Heavy repository

Post by gfgtdf »

I actually think that the translation files were always in the ./po directory, and from what I saw in the git documentation it may also not be too hard to move the po files to a separate directory using "git filter-branch". A disadvantage would be that this would require a force push to master, which would make it harder to merge working branches that were branched before the force push.
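
For reference, a filter-branch run of that kind might look roughly like the sketch below when tried on a throwaway clone. The helper name extract_subdir is made up for illustration, and nothing here has been run against the real Wesnoth repository. The rewritten commits get new IDs, which is why publishing the result would need the force push.

```shell
# Sketch only: rewrite a THROWAWAY clone so that the given directory becomes
# the repository root and commits that never touched it are dropped.
# extract_subdir is a hypothetical helper name, not part of git.
extract_subdir() {
    # $1 = path to the throwaway clone, $2 = directory to keep (e.g. po)
    # FILTER_BRANCH_SQUELCH_WARNING silences the notice newer git versions print.
    FILTER_BRANCH_SQUELCH_WARNING=1 \
        git -C "$1" filter-branch --subdirectory-filter "$2" -- --all
}

# Usage (not run here):
#   git clone https://github.com/wesnoth/wesnoth po-only
#   extract_subdir po-only po
```

After such a run the clone contains only the contents of that one directory, with the history of the commits that touched it.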
Freem
Posts: 23
Joined: July 12th, 2009, 7:58 pm

Re: Heavy repository

Post by Freem »

I think back in the day translations were managed through another tool/website. I do not know if they still are.

About stripping data... I don't know. It seems flare did it. How... I guess one should take a look in git help.

zookeeper wrote:How is it a solution for coders, though? Sure, you could clone the source, do a change, and compile, but you still need the latest game data in order to test it. I guess you could get the data repo as a shallow clone then?
Coders can use the data from official releases.

[edit]
I mean random coders like me, of course. If what someone wants to code depends on stuff that is only in the repo, obviously it won't help.
Dugi
Posts: 4961
Joined: July 22nd, 2010, 10:29 am
Location: Carpathian Mountains

Re: Heavy repository

Post by Dugi »

zookeeper wrote:It'd be super great if the repo size could be decreased, but I have no idea how possible or feasible something like that is. I know little about git, but I can imagine how it could be a nightmarish undertaking to split an existing repo while correctly retaining >10 years of history containing countless moves and renames and the whole directory structure having changed substantially over time.
How about renaming wesnoth to wesnoth-old and uploading all the stuff into a new repository, wesnoth, that would be clean of the old stuff?

Is it possible to duplicate it, so that squashing all edits before 1.12 would not alter the old repo?
Freem wrote:PS: also, I've read in the guidelines that C++11 is not used because of the lack of supporting compilers. I think this is no longer true.
I have added a note about this into the page's discussion, but it appears that nobody noticed it. Wesnoth's source code actually uses C++11, so it's not that the rules are outdated, just the stuff on the wiki is.
gfgtdf
Developer
Posts: 1432
Joined: February 10th, 2013, 2:25 pm

Re: Heavy repository

Post by gfgtdf »

Dugi wrote:I have added a note about this into the page's discussion, but it appears that nobody noticed it. Wesnoth's source code actually uses C++11, so it's not that the rules are outdated, just the stuff on the wiki is.
Hmm, OK, I updated that page.
Iris
Site Administrator
Posts: 6798
Joined: November 14th, 2006, 5:54 pm
Location: Chile

Re: Heavy repository

Post by Iris »

Freem wrote:* git clone can not be interrupted. If it is, you need to restart.
* my connection had problems, or I needed it for something else at some point.
Not saying that your complaints aren’t legitimate, and I can understand your ordeal as a mobile broadband user myself, but if you had asked around first, perhaps someone could have pointed you towards this post:
shadowm wrote:[...] you should get familiarized with Git and clone the Wesnoth source code repository, which contains all contents and history of mainline Wesnoth, including the game engine itself. Note that it’s a large download (approximately 2 GiB). If you require the ability to pause and resume your download, you should ask shadowm (either on IRC or via forum PM) for help with obtaining access to our downloadable Git snapshots.
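
The post does not say what format those snapshots use. Assuming they are something like a git bundle (a single file holding the repository's history, which a resumable downloader such as `wget -c` can fetch), the workflow could be sketched as follows. The function names, the bundle file name and the example.org URL are all hypothetical.

```shell
# Sketch under an assumption: the "snapshot" is a git bundle file.
# make_snapshot / clone_from_snapshot are illustrative names, not project tools.
make_snapshot() {
    # $1 = existing repository, $2 = bundle file to write (absolute path)
    git -C "$1" bundle create "$2" --all
}

clone_from_snapshot() {
    # $1 = downloaded bundle file, $2 = target directory
    git clone "$1" "$2"
}

# Usage (not run here; the URL is a placeholder):
#   wget -c https://example.org/wesnoth.bundle   # -c resumes a partial download
#   clone_from_snapshot wesnoth.bundle wesnoth
#   git -C wesnoth remote set-url origin https://github.com/wesnoth/wesnoth
#   git -C wesnoth fetch origin                  # only fetches what is newer
```

The point of going through a plain file is that an interrupted HTTP download can be resumed, while the later fetch only has to transfer commits made after the snapshot was taken.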
Dugi wrote:
Freem wrote:PS: also, I've read in the guidelines that C++11 is not used because of the lack of supporting compilers. I think this is no longer true.
I have added a note about this into the page's discussion, but it appears that nobody noticed it. Wesnoth's source code actually uses C++11, so it's not that the rules are outdated, just the stuff on the wiki is.
Only by approximately one month, which is actually far better than most of our documentation.
Author of the unofficial UtBS sequels Invasion from the Unknown and After the Storm.
Freem
Posts: 23
Joined: July 12th, 2009, 7:58 pm

Re: Heavy repository

Post by Freem »

shadowm wrote:
Freem wrote:* git clone can not be interrupted. If it is, you need to restart.
* my connection had problems, or I needed it for something else at some point.
Not saying that your complaints aren’t legitimate, and I can understand your ordeal as a mobile broadband user myself, but if you had asked around first, perhaps someone could have pointed you towards this post:
I see.

Well, I must admit I have only read stuff from the wiki, several links given by [this page](https://wiki.wesnoth.org/DevelopersHome). Now, to be honest, if I had read the post you point to, I doubt I would have tried to clone the repo. That may seem strange, but to me it means you know about the problem and don't fix it (for lack of will, or of power?).
When I want to code or read code from an open-source project for fun, it's usually at a moment t, and if I can't play with the code (relatively) quickly but have to ask someone, then I will probably lose motivation by the time there is a reply. I don't even understand how I found the motivation to restart it several times; that's really unusual for me.
Also, I usually read the wiki, the README and the source code in the repo *before* going to read posts in a forum. I usually go to the forum when I have a question or suggestion which should not go into a bugtracker, not for semi-technical issues, which I am used to solving by myself (not every project gets more than 100 reads in less than 24h for a single forum thread; that's quite rare from my point of view). Fact is, I didn't even think about asking around for an issue caused by my side of the Internet. I should have, obviously.
And finally, I would probably not have made an account to talk about the problem (especially knowing the problem comes from me/my hardware/my whatever). And it's even less likely that I would have made one to ask for links to archives. I can remember having done that once. I only remember that I forgot everything about it except the fact that I had to wait to contribute (plus, the forum required admin confirmation, so imagine :D).

I was not even aware that git could not resume a download after a connection failure, since it's the first time a clone has taken me more than 1 hour. I have learned something about git this week (not that hard, I'm no git expert).

Anyway, from what I can read on this thread, I must admit I am surprised.
I thought that wesnoth, being an old project which runs with really few bugs (ok, there are some oos errors with special stuff sometimes, and planning mode is not really efficient, but it's far better than when it was introduced, and things work more than correctly when it's disabled), would have reliable information in the wiki (not complete, because I know how boring it is to write docs, but reliable and in sync for the parts that exist). I guess your lack of manpower is worse than I thought. I can only suggest that you (the project, not you personally), even if you don't fix this, at least point out in the wiki the forum links where the instructions for coders are.
This would avoid wasting the time of several people :) (like you, who could reply to more important stuff instead of quoting posts)
gfgtdf
Developer
Posts: 1432
Joined: February 10th, 2013, 2:25 pm

Re: Heavy repository

Post by gfgtdf »

I just gave the idea of moving translations and music to separate repos some thought, and I think that the main problem with removing those things from the repo's history is that it would then also remove them from the 1.12 and older git tags, which is not something I think we should do. So if we do that, we should most likely move those old tags to a different repo.

So it seems to me that if we want to decrease the repo's size, regardless of how exactly we do it (whether we move some parts to a different repo or cut the history below a certain date like Dugi suggested), we will need to create a wesnoth-old repo for the old stuff, which is then not affected by those changes.
Celtic_Minstrel
Developer
Posts: 2207
Joined: August 3rd, 2012, 11:26 pm
Location: Canada

Re: Heavy repository

Post by Celtic_Minstrel »

While I agree that the size of the wesnoth repo (including history) is kind of insane, I want to point out that new contributors could actually download the zip from github rather than cloning (or make a shallow clone); either method would work around the issue of the insane size since the majority of that size lies in the repository history.
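
The shallow-clone route could look like the sketch below: fetch only the newest commit first, then optionally pull in older history in smaller steps, each of which can be retried on its own if the connection drops. The function names and the step size of 500 are illustrative assumptions, and the amount of data a given depth transfers for the Wesnoth repository has not been measured here.

```shell
# Sketch: start with a one-commit clone, then deepen it step by step.
# shallow_clone / deepen_history are hypothetical helper names.
shallow_clone() {
    # $1 = repository URL, $2 = target directory
    git clone --depth 1 "$1" "$2"
}

deepen_history() {
    # $1 = clone directory, $2 = number of extra commits to fetch per step
    git -C "$1" fetch --deepen="$2"
}

# Usage (not run here):
#   shallow_clone https://github.com/wesnoth/wesnoth wesnoth
#   deepen_history wesnoth 500          # repeat as often as the connection allows
#   git -C wesnoth fetch --unshallow    # or fetch everything that is left in one go
```

A shallow clone is enough to build and test the current code; the older history only matters for things like blame or bisecting.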

I'm dubious about whether moving translations or music is actually worth the effort. I think I'd want to somehow figure out how much they really take up, history included. I don't actually know any way to do that though. Moving the essential resources (ie, graphics) is in my opinion not an option, since it means that a simple "git clone" suddenly doesn't produce enough to build a functioning copy of the game (the move would probably involve using submodules), and furthermore doesn't actually solve the issue (the full history still gets downloaded, just now in two steps instead of one). Moving campaigns may or may not be worthwhile, depending on the extent to which graphics dominate the campaign sizes, but I doubt it would make a significant dent in the overall repository size.
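
One possible way to get a rough figure, history included, is to list every object reachable from commits that touched a path and add up their on-disk sizes. The sketch below is only an approximation: the helper name history_size is made up, the on-disk size depends on how git happened to pack and delta-compress the objects, and a few shared objects (such as top-level trees) are counted too.

```shell
# Sketch: approximate the number of bytes a path occupies across all history.
# history_size is a hypothetical helper name; run it inside a full clone.
history_size() {
    # $1 = path inside the repository, e.g. po or data/campaigns
    git rev-list --objects --all -- "$1" |
        cut -d' ' -f1 |                                     # keep only the object IDs
        git cat-file --batch-check='%(objectsize:disk)' |   # on-disk size of each object
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Usage (not run here):
#   cd wesnoth
#   history_size po
#   history_size data/campaigns
```

Comparing the numbers for a few directories against the total size of .git should at least show which ones dominate.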

In summary, I acknowledge that this is a problem but can't see any solution that doesn't just create more problems.
Author of The Black Cross of Aleron campaign and Default++ era.
Former maintainer of Steelhive.
Freem
Posts: 23
Joined: July 12th, 2009, 7:58 pm

Re: Heavy repository

Post by Freem »

Just for information, I made a quick attempt at giving translations and campaigns their own repository (without removing anything for now). It seems the procedure is quite easy (and faster than I thought).

In short, it consists of using "git subtree" to create a branch containing stuff from only one folder, then create a new repo and pull in the newly created branch.

For example:

Code:

# Work on a fresh clone so the experiment cannot damage a real checkout.
git clone https://github.com/wesnoth/wesnoth ~/splitting-wesnoth
cd ~/splitting-wesnoth
# Create one branch per folder, each holding only that folder's history.
git subtree split -P data/campaigns -b wesnoth-campaigns
git subtree split -P po -b wesnoth-translations
git subtree split -P music -b wesnoth-musics
mkdir ~/splitted-wesnoths
cd ~/splitted-wesnoths
# Turn each split branch into its own repository.
for split in musics translations campaigns
do
  mkdir "$split"
  cd "$split"
  git init
  git pull ~/splitting-wesnoth "wesnoth-$split"
  cd ~/splitted-wesnoths
done
Resulting sizes for me (since I haven't pulled since I started this thread, your sizes might be different):

wesnoth-translations: 880M
wesnoth-campaigns: 487M
wesnoth-musics: 297M

The "original" repo (~/splitting-wesnoth here) remains untouched, except for the creation of branches, so the repo still weighs a lot, but the single branches can also be pulled, which would make the clone easier for those 3 folders.
Of course, I started this for the source code and didn't apply the trick to src, but that is because a commit would first be needed to move the source-code-related stuff into a single folder before creating the branch, which would in turn mean adapting the project files (cmake, scons, VC9 too if I trust 'find -name "*.cpp"', and probably other stuff around?).
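
Pulling just one of those branches could be sketched like this. The helper name and the target directory are hypothetical, and the example assumes the split branches exist in the repository being cloned from (here the local ~/splitting-wesnoth; they have not been pushed anywhere).

```shell
# Sketch: clone a single branch instead of the whole repository.
# clone_one_branch is a hypothetical helper name.
clone_one_branch() {
    # $1 = repository URL or path, $2 = branch name, $3 = target directory
    git clone --single-branch --branch "$2" "$1" "$3"
}

# Usage (not run here):
#   clone_one_branch ~/splitting-wesnoth wesnoth-translations ~/translations-only
```

Over a network, a single-branch clone only has to transfer the history reachable from that branch, which is what would make the three folders cheaper to get.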

Anyway, this might be a starting point from a technical point of view? At least, it makes me think it is doable, though it might mean modifying some project files here and there. Since this is based on branches, it may be possible to have a transition period and still keep things in sync without too much burden?
Celtic_Minstrel
Developer
Posts: 2207
Joined: August 3rd, 2012, 11:26 pm
Location: Canada

Re: Heavy repository

Post by Celtic_Minstrel »

Hmm, most of the talk until now revolved around the use of "git submodule". I just took a look at the "git subtree" man-page, and it does look like that might work better than a submodule. The question though is, does it actually solve the problem? Whether it's a submodule or a subtree, the full history is still there, right? Will the subtree history be automatically downloaded when you clone? Do you need extra steps to download the subtree when you clone?