Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plumbing: format/packfile, prevent large objects from being read into memory completely #303

Merged
merged 5 commits into from
May 12, 2021

Conversation

zeripath
Copy link
Contributor

This PR adds code to prevent large objects from being read into memory from packfiles or the filesystem.

Objects greater than 1Mb are now no longer directly stored in the cache
or read completely into memory.

Signed-off-by: Andrew Thornton art27@cantab.net

This PR adds code to prevent large objects from being read into memory from packfiles or the filesystem.

Objects greater than 1Mb are now no longer directly stored in the cache
or read completely into memory.

Signed-off-by: Andrew Thornton <art27@cantab.net>
@mcuadros mcuadros requested a review from jfontan April 18, 2021 23:38
Signed-off-by: Andrew Thornton <art27@cantab.net>
@jfontan
Copy link
Member

jfontan commented Apr 22, 2021

@zeripath Thanks for the PR. I like the changes but I'm a little worried of using a goroutine for the reader. Do you think it can be done without using it?

Signed-off-by: Andrew Thornton <art27@cantab.net>
Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath
Copy link
Contributor Author

So before you merge this, there are a couple of questions that we should think about:

  1. Should we really be caching these incompletely read objects? - Could the object files be changed under our feet by a repack etc? Could you end up with a broken object that it would not be easy to get a gogit repository to clean out from the cache? I don't know go-git's architecture completely so I'm not certain if that's possible.
  2. Is the 1Mb limit too low for the delta'd objects? It feels appropriate for the raw object db cases as those are simple file reads - whereas a delta'd object is a little more involved and perhaps the limit should be raised to account for the complexity?

Many thanks for your work on this

@zeripath
Copy link
Contributor Author

Sorry I've taken so long to answer my own questions here.

  1. Should we really be caching these incompletely read objects? - Could the object files be changed under our feet by a repack etc? Could you end up with a broken object that it would not be easy to get a gogit repository to clean out from the cache? I don't know go-git's architecture completely so I'm not certain if that's possible.

This is fine - it's only when you get the reader that you actually do the reading.

  1. Is the 1Mb limit too low for the delta'd objects? It feels appropriate for the raw object db cases as those are simple file reads
  • whereas a delta'd object is a little more involved and perhaps the limit should be raised to account for the complexity?

This is also probably fine.

@mcuadros mcuadros changed the title Prevent large objects from being read into memory completely plumbing: format/packfile, prevent large objects from being read into memory completely May 12, 2021
@mcuadros mcuadros merged commit 720c192 into go-git:master May 12, 2021
@zeripath zeripath deleted the no-large-packread branch May 12, 2021 21:05
zeripath added a commit to zeripath/go-git that referenced this pull request May 27, 2021
Unfortunately the code in go-git#303 was incorrect for large objects in OFS deltas.

The underlying fault is that the seeker in the packfile was seeking away whilst the
delta was being read.

This PR simply reads the delta into a buffer.

Fix go-git#323

Signed-off-by: Andrew Thornton <art27@cantab.net>
mcuadros pushed a commit that referenced this pull request Jun 2, 2021
…ead into memory completely (#303)" (#329)

This reverts commit 720c192.
traidare pushed a commit to traidare/go-git that referenced this pull request Oct 26, 2024
… memory completely (go-git#303)

This PR adds code to prevent large objects from being read into memory from packfiles or the filesystem.

Objects greater than 1Mb are now no longer directly stored in the cache
or read completely into memory.

Signed-off-by: Andrew Thornton <art27@cantab.net>
traidare pushed a commit to traidare/go-git that referenced this pull request Oct 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants