Pipenv git repository too large due to stored wheels #3680
Comments
Creating a separate repo for them would also be very useful for other projects, as it would become a canonical set of test data for all the strange packages, etc. that are possible. Cf. sarugaku/requirementslib#145
I'd be interested in feedback about how people think this should be handled. Will LFS actually help with wheels? I don't believe it will do much for tarballs.
Short answer: LFS will help, especially in the long run. The problem with wheels (and any non-text data) stored directly in a git repo is this: Git cannot diff them meaningfully. Whenever a binary file changes by just one single bit, git will treat it as a completely different file and store both versions, old and new, of the whole file in its entirety, not just the changes. Meaning if you have a 25 MB wheel in your repo and you commit a new version of the same wheel that has 26 MB, the repo now holds 51 MB of wheel data, even though little actually changed between the two versions of the wheel. That's why the pipenv repo is currently 562 MB in size, even though all wheels combined in the latest commit are just 214 MB. The difference is older or deleted wheels in historic commits. Git with LFS stores just links to the files and fetches them as necessary. The links are tiny. The problem is, you're stuck now with the historic commits, because you can't (shouldn't) rewrite commit history. LFS will just prevent things from getting worse and make it possible to "delete" historic files that are no longer needed, without them still cluttering the repo's history and making it bigger and bigger as time progresses. NEVER commit binary data to git repositories, kids!
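To give a feel for how small those LFS "links" are, here is a minimal sketch of the pointer text that Git LFS commits in place of a large file. The three-line layout (`version`, `oid sha256:`, `size`) follows the published Git LFS pointer format; the function name `lfs_pointer` and the 1 MB stand-in for a wheel are assumptions made up for this illustration, not anything from the pipenv repo.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the pointer text Git LFS would commit instead of `content`.

    Hypothetical helper for illustration; the real pointer is written by
    the git-lfs client when a tracked file is staged.
    """
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Stand-in for a wheel: 1 MB of zero bytes (assumed data, not a real wheel).
fake_wheel = bytes(1_000_000)
pointer = lfs_pointer(fake_wheel)

print(pointer)
print("pointer bytes:", len(pointer.encode("utf-8")))
print("file bytes:   ", len(fake_wheel))
```

Only the pointer ends up in the commit history, so replacing or deleting a wheel later adds a few hundred bytes to the repo rather than another full copy of the file.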
@con-f-use thanks for the info, that's actually super useful. I have had limited success wiping artifacts from the tree in the past and I am always loath to do that kind of sweeping history modification (although it is admittedly necessary). So let's make a path forward here -- let's say we create a separate repo, and let's say we turn on LFS properly etc., what are our options for scrubbing / shrinking the size of this repository? Again, I have some experience doing that but with limited success, and I'd be hesitant about pushing that back up to the remote after destroying the reflog / history.
We are currently using submodules to store PyPI artifacts, so I think this issue can be closed now.
I noticed the pipenv git repository takes a long time to clone because of its size. This is because a number of wheels (binary data) are stored directly in the repo rather than using git LFS or other means.
The problem will only grow with time as different versions of the wheels get committed, because the old ones will still be part of the repo and git cannot make smart diffs with binary data as it can with text.
Please find another solution to storing wheels for tests.
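As a rough illustration of why old wheels stay in the repo, the sketch below computes git-style blob IDs for two byte strings that differ in a single byte. The ID scheme (SHA-1 over `blob <size>\0` plus the content) is how git names blobs in SHA-1 repositories; the helper name `git_blob_id` and the 1 KB stand-in "wheels" are assumptions for this example. Git's packfiles can delta-compress similar blobs, but wheels are zip archives, so a small source change tends to alter much of the compressed output and the saving is usually small.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID git assigns to a blob with this content.

    Hypothetical helper; mirrors what `git hash-object` computes in a
    SHA-1 repository.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two stand-in "wheels" that differ only in their last byte (assumed data).
old_wheel = bytes(1024)
new_wheel = bytes(1023) + b"\x01"

old_id = git_blob_id(old_wheel)
new_id = git_blob_id(new_wheel)

print("old blob:", old_id)
print("new blob:", new_id)
print("stored as separate objects:", old_id != new_id)
```

Because the two IDs differ, both blobs remain reachable from history once each version has been committed, and every clone has to download both.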