Skip to content
This repository has been archived by the owner on Sep 11, 2020. It is now read-only.

plumbing: packfile, apply small object reading optimization also for delta objects #1121

Merged
merged 12 commits into from
Apr 24, 2019

Conversation

filipnavara
Copy link
Contributor

In #963 I addressed issue with reading many small objects from a pack file. I originally opted not to apply the optimization to delta objects. It turns out that when traversing a commit history the tree objects are often small and delta encoded. This extends the original optimization to handle this case too. Tested on https://github.com/rails/rails repository and it improves the complete history traversal time by ~20%.

…delta objects

Signed-off-by: Filip Navara <navara@emclient.com>
Signed-off-by: Filip Navara <navara@emclient.com>
…ting new one

Signed-off-by: Filip Navara <filip.navara@gmail.com>
@jfontan
Copy link
Contributor

jfontan commented Apr 22, 2019

I'm investigating a decrease in speed when using these changes it gitbase.

When iterating all trees with a cold cache it's faster with these changes (35s vs 25s). We keep the cache and in a second run the previous go-git code is faster (16s vs 25s). It seems the cache is not being filled correctly and in one profile I can see that Packfile.objectAtOffset uses 3.13s an with these changes it uses 11.08s, most of it in recreating again the objects that should be in cache.

@filipnavara
Copy link
Contributor Author

There's couple of flaws I didn't realize when coding this.

Firstly, it slightly regresses scenario where someone was accessing the small delta objects only to check the header. Now it falls back to decoding the content which can be comparatively expensive. This notably regresses packfile.GetByType iterations and by proxy the high level methods like CommitObjects. Not only it will read more data from the file it can end up loading enough to cause cache spills.

Secondly, I assumed that if the delta size is small and the expanded size is small then it should not be a big issue to load the base object and just apply the delta. The flaw is that the delta object can reference huge base object but only reuse a small part of it.

I have ideas on how to fix both of the issues without introducing new regressions. It will require significant reworking of the logic and a lot of testing though.

… plumbing.AnyObject

Signed-off-by: Filip Navara <filip.navara@gmail.com>
@filipnavara filipnavara changed the title plumbing: packfile, apply small object reading optimization also for delta objects WIP: plumbing: packfile, apply small object reading optimization also for delta objects Apr 23, 2019
Signed-off-by: Filip Navara <filip.navara@gmail.com>
plumbing/format/packfile/packfile.go Outdated Show resolved Hide resolved
plumbing/format/packfile/packfile.go Outdated Show resolved Hide resolved
Signed-off-by: Filip Navara <filip.navara@gmail.com>
@filipnavara filipnavara changed the title WIP: plumbing: packfile, apply small object reading optimization also for delta objects plumbing: packfile, apply small object reading optimization also for delta objects Apr 23, 2019
@jfontan
Copy link
Contributor

jfontan commented Apr 23, 2019

After more investigation I've found that the iterator is not using the cache to retrieve objects so delta objects are regenerated even when they are already in the cache. After enabling it the performance increases but triggers a bug I am investigating.

@filipnavara
Copy link
Contributor Author

After more investigation I've found that the iterator is not using the cache to retrieve objects so delta objects are regenerated even when they are already in the cache.

I forgot to fix the "cold" code path in the iterator to try the cache.

Note that if offsetToType is kept around (ie. packfiles are reused) it will hit the cache properly because it will always find the offset in this map and then either skip over the object or load it through GetByOffset which is checking the cache.

Signed-off-by: Filip Navara <navara@emclient.com>
Signed-off-by: Filip Navara <navara@emclient.com>
@filipnavara
Copy link
Contributor Author

filipnavara commented Apr 23, 2019

After enabling it the performance increases but triggers a bug I am investigating.

I see... I will look into it tomorrow :-)

Some of the tests don't handle errors properly. Thus the underlying error ("invalid delta") crashes them hard. For some reason already undeltified data get to PatchDelta as input.

@filipnavara
Copy link
Contributor Author

I think I figured out what's wrong... getObjectType in the iterator code can seek and thus the next getNextObject call reads wrong data.

…osition used by getNextObject in the subsequent statement

Signed-off-by: Filip Navara <navara@emclient.com>
… in FSObject

Signed-off-by: Filip Navara <navara@emclient.com>
Signed-off-by: Filip Navara <navara@emclient.com>
…objects

Signed-off-by: Filip Navara <navara@emclient.com>
@jfontan
Copy link
Contributor

jfontan commented Apr 24, 2019

Thanks!

@mcuadros Now this patch results in better performance in all my tests.

@mcuadros mcuadros merged commit 44a20de into src-d:master Apr 24, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants