lazily decode cache files for checking invalidation #7516
Conversation
Thanks!
The failure is because of the old binary version bundled with GHC 7.6, cf: https://downloads.haskell.org/~ghc/7.6.3/docs/html/libraries/binary-0.5.1.1/Data-Binary-Get.html Any suggestions on the best way to deal with this? |
Oh, hrm. It has an incremental function |
You could move structuredDecodeTriple to cabal-install, which does not support ghc 7.6 |
Great suggestion @fgaz, done! |
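For readers unfamiliar with the incremental interface mentioned above: later versions of binary expose `runGetIncremental` in `Data.Binary.Get`, which lets you feed input chunk by chunk and stop as soon as a prefix has been parsed. A minimal sketch (not cabal's actual code; the single-`Word32` "header" is purely illustrative):

```haskell
import Data.Binary (encode)
import Data.Binary.Get (Decoder (..), getWord32be, runGetIncremental)
import qualified Data.ByteString.Lazy as BSL
import Data.Word (Word32)

-- Parse only a (hypothetical) header word from a lazy ByteString,
-- feeding the incremental decoder one strict chunk at a time and
-- stopping as soon as the header is complete. The remainder of the
-- input is never forced.
decodeHeader :: BSL.ByteString -> Either String Word32
decodeHeader input = go (runGetIncremental getWord32be) (BSL.toChunks input)
  where
    go (Done _ _ v) _     = Right v           -- header parsed; stop here
    go (Fail _ _ e) _     = Left e            -- malformed input
    go (Partial k) (c:cs) = go (k (Just c)) cs -- feed the next chunk
    go (Partial k) []     = go (k Nothing) []  -- signal end of input
```

This is the interface that was missing from binary-0.5.1.1 (the version shipped with GHC 7.6), hence the suggestion to move the code to cabal-install, which does not support that compiler.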
I wonder if writing out the cache files could be done on a separate (low-priority) thread to speed things up. |
I've bitched about whitespace, but otherwise it looks great.
I thought about that too. We don't really use separate threads for IO ops elsewhere in cabal afaik, so I was hesitant to set a precedent. I'd think we'd need some slightly careful architecture to make sure that reading and writing didn't step on one another, etc., so I'm not sure about the tradeoff between added complexity and a constant-factor win in some large cases. |
Yes, also, a user trying to exit before writing is finished, while cabal otherwise seems to have done its job and is just hanging, would deserve a warning and a confirmation, which again adds to the complexity of the solution. If cabal always keeps its caches in memory and so never reads caches that it writes in the same session, that would simplify things. OTOH, this makes cabal use more memory than it would otherwise. BTW, the CI broke on some timeout, so I'd ignore it. (edit: and merge) |
@Mergifyio backport 3.6 |
…7537)
* lazily decode cache files for checking invalidation (cherry picked from commit 3dcfe27)
* Update Structured.hs (cherry picked from commit 5a4290c)
* move structuredDecodeTriple to cabal-install (cherry picked from commit c1d5d4f)
* fix type signatures (cherry picked from commit 7e30fd9)
  # Conflicts:
  #   cabal-install/src/Distribution/Client/FileMonitor.hs
* fix whitespace (cherry picked from commit f8bdd7f)
* Update FileMonitor.hs

Co-authored-by: Gershom Bazerman <gershom@arista.com>
Co-authored-by: gbaz <gershomb@gmail.com>
This yields a significant (roughly 15%) speedup on rebuilding build plans for projects with lots of individual cabal packages (as in https://github.com/peterbecich/cabal-resolver-issue, which is a repro for #7466). In such cases the cache files can grow quite large (up to approx. 30 MB).
The way the old filemonitor / caching system worked, it would first read and fully parse each cache file, and then check it to see if it was invalidated (an operation that only involved the header portion at the beginning of the file). When files are small, the full parse isn't noticeable. But as files grow large, this parse can become quite expensive.
This changes the deserialization to go in two steps -- first parse the header info, and check for invalidation. Then, only if the cache is valid, proceed to deserialize the remaining (potentially large) serialized value. Testing reveals that this removes deserialization as a noticeable cost center immediately -- dropping it to about 3% of total time.
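The two-step scheme described above can be sketched as follows. Names and types here are illustrative, not cabal's actual API (the real code lives in `Distribution.Client.FileMonitor` and uses cabal's structured decoding helpers):

```haskell
import Data.Binary (Binary, decodeOrFail, encode)
import qualified Data.ByteString.Lazy as BSL

-- Decode only the header, run the invalidation check on it, and force
-- the (potentially large) trailing value only when the cache is still
-- valid. 'Right Nothing' means the cache was invalidated without ever
-- paying for the second decode; since the input ByteString is lazy,
-- the tail of the file may never even be read from disk.
decodeTwoStep
  :: (Binary hdr, Binary val)
  => (hdr -> Bool)      -- invalidation check on the header alone
  -> BSL.ByteString     -- header followed by the serialized value
  -> Either String (Maybe val)
decodeTwoStep stillValid input =
  case decodeOrFail input of
    Left (_, _, err) -> Left err
    Right (rest, _, hdr)
      | not (stillValid hdr) -> Right Nothing   -- skip the big decode
      | otherwise ->
          case decodeOrFail rest of
            Left (_, _, err)  -> Left err
            Right (_, _, val) -> Right (Just val)
```

The key point is that the expensive decode of the value is guarded by the header check, whereas the old code parsed the whole file unconditionally before looking at the header.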
(Note: writing out the cache files is still about 10% of total time, but that seems fairly unavoidable.)