Relatively large binaries in repo history - also Makefile thoughts #153

Closed
davidjoffe opened this issue Nov 21, 2022 · 10 comments
@davidjoffe
Owner

davidjoffe commented Nov 21, 2022

EDIT 22 NOV 2022: I HAVE DONE THIS - APOLOGIES FOR ANY INCONVENIENCE. You will probably have to re-clone to set up your working trees again. I have ALSO renamed the main branch. This is a force-rewrite of the history. Your contributions should be intact in the history. But I won't do this again.

[low prio] FYI: if one clean clones the repo it's about 24MB. Relatively tiny by most standards BUT most of that space is files that shouldn't be in the history, which bothers me slightly (e.g. "datasrc/*.psd", which are now in a separate repo, where they should be, and "data/*.tga" and "data/*.lev" etc., which are also in a separate repo, as they should be).

No Intellectual Property issues, that's not the issue, just the space. (These are all files that are not meant to be present directly if you git clone the source in your working tree - it's just that in the beginning I just had one repo, I hadn't separated the data and datasrc out .. only later decided to split into 3 repos (src, datasrc for things like .psd, data))
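For anyone who wants to see where the space goes, here is a minimal sketch that lists the largest blobs anywhere in a repo's history. The function name `largest_blobs` is made up for illustration; it only uses standard `git rev-list` and `git cat-file` options, and paths are reported as they appear in history.

```sh
#!/bin/sh
# Hypothetical helper: print the N largest blobs in the whole history
# (all branches and tags) as "size-in-bytes blob path", largest first.
largest_blobs() {
	n=${1:-10}
	git rev-list --objects --all |
		git cat-file --batch-check='%(objectsize) %(objecttype) %(rest)' |
		awk '$2 == "blob"' |
		sort -rn |
		head -n "$n"
}
```

Run from inside a clone, e.g. `largest_blobs 20`. Files that were deleted long ago still show up, because the listing walks history rather than the working tree.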

If I do a test locally and trim that clutter out of the git repo history and crunch it (filter-data + reflog-expire + prune etc.), the entire source repo of the dave_gnukem source code becomes a nice attractive tiny less than 2MB, much nippier to work with. But if I ever had to crunch that I'd have to do a 'force push' to rewrite the history, which I NEVER want to do because existing forks and other contributions may be messed with - so I will leave this repo 'as is'. HOWEVER, this is just a small thing that nags me somehow. What I'm thinking is that maybe at some point I'll TRY to create a new parallel 'crunched' repo and call it 'dave_gnukem2' or something (it would start out having the exact same history as this one and the same working tree as this one at that point in time, EXCEPT with the "datasrc/" and "data/" stuff removed from old history; e.g. contributions like Matteo's should remain present and linked correctly, I think - if not I won't do it, because I think the accreditation of contributors is really important) - and perhaps then ports and forks thereafter could start being based off that - but leave this dave_gnukem repo as is so it doesn't mess with anything existing. But I want to make sure everybody who deserves credit is still in the repo history in the correct way - so will do tests to check before I do anything on this, if I even change it.
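A rough sketch of what such a trim-and-crunch could look like, for trying out on a throwaway clone. This is not necessarily the exact procedure used here: it assumes `git filter-branch` as the rewriting tool, the function names are invented for illustration, and the paths passed in must not contain spaces.

```sh
#!/bin/sh
# Sketch only: run this in a disposable clone, never in your only copy.

# Hypothetical helper: remove the given paths from every commit on every
# branch and tag of the repository in the current directory.
strip_paths_from_history() {
	FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --prune-empty \
		--index-filter "git rm -r --cached --ignore-unmatch --quiet $*" \
		--tag-name-filter cat -- --all
}

# Hypothetical helper: drop the backup refs and reflog entries that still
# point at the old objects, then repack so the space is actually freed.
crunch_repo() {
	git for-each-ref --format='%(refname)' refs/original/ |
		while read -r ref; do git update-ref -d "$ref"; done
	git reflog expire --expire=now --all
	git gc --prune=now --aggressive --quiet
}
```

Usage would be along the lines of `strip_paths_from_history data datasrc && crunch_repo`, followed by `du -sh .git` to compare sizes. Every rewritten commit gets a new SHA, which is why publishing the result needs a force push.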

At that point might be a good time to switch to better e.g. autoconf or cmake tools or something to help manage dependencies etc. for more platforms than the ancient Makefile - that way port maintainers who rely on that stable crappy old literally-1998-based Makefile (this Makefile is the same age as my girlfriend) could keep using it if they want but those who want to switch to newer better build system could. That new repo could also be used for more serious refactoring to make the engine more flexible perhaps.
OR: I might rename existing repo to eg dave_gnukem_old and leave the new one under same name - I'm just doing tests now - history looks fair/fine

Or is 24MB of clutter in source repo history nothing to worry about? Maybe. I just worry 'every repo ever created from this one ever' would have that

(ALSO FYI master branch will probably be renamed to main soon)

Thoughts?

I just want people to be aware of this also, so if/when I do try do this, you can be prepared and know why. But will test thoroughly there's nothing negative, especially in terms of user contribution accreditation in the history of eg merged pull requests, I want those to hopefully still be clear in the history - will try make sure

davidjoffe changed the title from "Relatively large binaries in repo history" to "Relatively large binaries in repo history - also Makefile thoughts" Nov 21, 2022
@davidjoffe
Owner Author

davidjoffe commented Nov 21, 2022

FYI I'm doing some tests in some clone repos (to be deleted) and so far the history looks 'pretty good' except the pull requests don't show exactly right (but it does most importantly show as being merged from the actual contributing user) .. tags seem to transfer, but not release files .. issues I think are movable .. hmm, if the history looks fine maybe will indeed try do a 'push force' with a rewritten history .. at that point all SHAs change though, so it would require anyone with a cloned repo to re-get it from scratch .. anyway will think about it and decide.
Apologies in advance for any inconvenience this might cause for forks etc. if I do that
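One way to test that contributor credit survives a rewrite is to compare the set of commit authors in the original clone against the rewritten one. A minimal sketch follows; the function names and the two directory arguments are placeholders, not anything that exists in this repo.

```sh
#!/bin/sh
# Hypothetical helper: unique "Name <email>" commit authors across all refs.
list_authors() {
	git -C "$1" log --all --format='%aN <%aE>' | sort -u
}

# Hypothetical helper: succeeds only if both repos credit the same authors.
same_authors() {
	a=$(list_authors "$1") || return 1
	b=$(list_authors "$2") || return 1
	[ "$a" = "$b" ]
}
```

For example, `same_authors ./dave_gnukem_orig ./dave_gnukem_crunched && echo "authors match"`. This only checks author identities on commits; it says nothing about how the GitHub pull request pages render afterwards.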

@davidjoffe
Owner Author

(Something also for me to think about is that conceptually parts of the code are also meant to be 'generic' 'engine-y' while the other parts are game-specific - theoretically another 2-repo split if I ever were to worry seriously about that - but there are so many better larger game engines it's unclear what the value proposition would be for yet another one and I certainly don't have spare time to do a larger more generic 'game engine' component)

@davidjoffe
Owner Author

davidjoffe commented Nov 22, 2022

I have thought about it and decided I am going to do this, sorry for any inconvenience this causes downstream - you'll probably have to re-get/re-clone to re-setup your working tree (MAKE BACKUPS of your working trees before sorting this out! If you have lots of local changes in your work tree could be a pain)

@andreaspeters
Collaborator

How about if you include the two data source git repositories as "git submodules"?

@davidjoffe
Owner Author

Hmm, "datasrc" is maybe a special case as very few people would need that - e.g. artists etc. who literally use Photoshop to edit sprites (it's not intended for general distribution in e.g. a Debian package say, because most normal users who just want to play the game wouldn't really be wanting or needing to try to edit the files in Photoshop?)

data is really necessary to the gameplay .. so maybe a more compelling use case for submodules .. I don't know if there may be negative effects to using submodules.
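For reference, a sketch of what wiring the data repo in as a submodule might look like. This is only an illustration of the suggestion, not something this repo does; the function name `add_data_submodule` is invented, and the URL in the usage line is the data repo mentioned elsewhere in this thread.

```sh
#!/bin/sh
# Hypothetical helper: register a repo as a submodule of the repository in
# the current directory and commit the resulting .gitmodules entry.
add_data_submodule() {
	url=$1
	dir=${2:-data}
	git submodule add "$url" "$dir" &&
		git commit -q -m "Add $dir as a submodule"
}
```

Usage: `add_data_submodule https://github.com/davidjoffe/gnukem_data data`. People cloning afterwards would need `git clone --recurse-submodules`, or `git submodule update --init` in an existing clone, otherwise the `data` folder stays empty.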

I also want to have the freedom to potentially in future be able to use different data folders - for example, if we extend the core 'engine' to handle more games (either some other unrelated game, say, or a hypothetical 'Dave Gnukem version 2') .. then I'd still want the core game source to support the 'version 1 Dave Gnukem' game and gameplay and game data and the set of levels etc. that were regarded as 'version 1', but potentially it would then be 'detached' from that specific data folder to have some other data folder entirely ... e.g. if someone wants to use the 'engine' to build some entirely different game. Maybe not likely but I don't know.

I was thinking of adding some small helper scripts to maybe more easily git clone the data subfolder.

In future we may also have some hypothetical scenario where e.g. maybe the main source tree contains some bleeding edge breaking stuff but the main data is the stable data (or vice versa) ...
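A sketch of how a helper could cover both cases, i.e. a different data repo and a data version that is pinned independently of the source tree. The function name `get_data` and its arguments are assumptions made up for this example.

```sh
#!/bin/sh
# Hypothetical helper: get_data <url> <dir> [tag-or-commit]
# Clones <url> into <dir> if it is not there yet. With a third argument the
# data is pinned to that tag or commit; without it, the checked-out branch
# is fast-forwarded to the latest upstream state.
get_data() {
	url=$1
	dir=$2
	ref=${3:-}
	if [ ! -d "$dir/.git" ]; then
		git clone --quiet "$url" "$dir" || return 1
	fi
	if [ -n "$ref" ]; then
		git -C "$dir" fetch --quiet --tags origin &&
			git -C "$dir" checkout --quiet "$ref"
	else
		# Note: this fails if <dir> was previously pinned (detached HEAD).
		git -C "$dir" pull --quiet --ff-only
	fi
}
```

So bleeding-edge source with stable data could be something like `get_data https://github.com/davidjoffe/gnukem_data data <some-tag>`, where the tag name is whatever the data repo actually uses.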

Not sure if anyone else has some thoughts? Would submodules affect things like Debian packaging or not really? I mean of course it's simple enough to 'ignore' a submodule if it's present .. to be honest I haven't worked that much with submodules, so it's a bit unfamiliar to me, that's also why I haven't

@davidjoffe
Owner Author

Of course for now the focus is mainly on consolidating and stabilizing the main 'Dave Gnukem version 1' and helping get more ports building reliably etc. (And I don't want to suddenly change sprite data or level data drastically right now because that's essentially the 'official' version 1 we released - though we could add more separate 'missions' and/or some new levels as 'bonus levels' or something in a hypothetical version 1.2 or something.)

@davidjoffe
Owner Author

Anyone else have thoughts for/against submodules?

@andreaspeters
Collaborator

Yeah, keep it as simple as possible. If you haven't worked with submodules before, leave it and save your time for more important things. :-) You could at least add the "git clone" commands to the Makefile to save some extra steps. :-)

@davidjoffe
Owner Author

davidjoffe commented Nov 23, 2022

I locally started creating a little separate helper script (see getdatafolder.sh below).

If we integrate it into the Makefile, though, we should be careful, as it could cause issues with automated downstream build systems that maybe apply patches etc. to this Makefile, or maybe packaging systems where it's done a different way, e.g. Debian .. but I was also thinking maybe the build scripts could help make it easier .. but hadn't decided ..

getdatafolder.sh:

#!/bin/sh
# dj2022 small helper script to get or update data subfolder (you need git installed for this)

djDATADIR="data"
djDATA_URL="https://github.com/davidjoffe/gnukem_data"

if [ -d "$djDATADIR" ]; then
	echo "Updating data folder ${djDATADIR} ..."
	echo "cd ${djDATADIR}"
	# stop if we cannot enter the folder, so 'git pull' never runs in the wrong repo
	cd "${djDATADIR}" || exit 1
	# show current folder
	pwd
	echo "git pull"
	git pull
	echo "cd .."
	cd ..
else
	echo "Cloning data folder ..."
	git clone "${djDATA_URL}" "${djDATADIR}"
fi

@davidjoffe
Owner Author

FYI have also now added this new helper script to the repo:

https://github.com/davidjoffe/dave_gnukem/blob/main/get_datafolders.sh

@davidjoffe davidjoffe unpinned this issue Nov 15, 2023