Store traces in a tree form rather than a list so span can be calcula… #658

Open
wants to merge 1 commit into
base: master


@spall spall commented Mar 19, 2019

I added the TTree, TForest, PtTree, and PtForest data types to preserve the ordering and relationships of the traced commands, so that span can be calculated accurately. The original list is still maintained, although it may be better to compute it only when needed.

In addition to calculating span, I think this could be useful for future build system analysis.

I look forward to your comments on whether this is worthwhile, or whether there is a better way to implement it.

Note: this passed the tests on my Ubuntu machine.
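To illustrate why a tree preserves what a flat list loses, here is a minimal sketch in TypeScript. The `Trace` type and its `cmd`/`seq`/`par` cases are hypothetical and are not the definitions of TTree or TForest in this PR; the sketch only assumes that a tree records which traced commands ran in sequence and which ran concurrently.

```typescript
// Hypothetical trace shapes; these are NOT the PR's TTree/TForest definitions.
type Trace =
  | { kind: "cmd"; name: string; seconds: number } // one traced command
  | { kind: "seq"; children: Trace[] }             // children ran one after another
  | { kind: "par"; children: Trace[] };            // children ran concurrently

// Work: total time if every command ran on a single core.
function work(t: Trace): number {
  if (t.kind === "cmd") return t.seconds;
  return t.children.reduce((acc, c) => acc + work(c), 0);
}

// Span: time with unlimited cores, i.e. the critical path through the tree.
function span(t: Trace): number {
  switch (t.kind) {
    case "cmd":
      return t.seconds;
    case "seq":
      return t.children.reduce((acc, c) => acc + span(c), 0);
    case "par":
      return t.children.reduce((acc, c) => Math.max(acc, span(c)), 0);
  }
}

// Flattening recovers a plain list of commands, discarding the seq/par structure.
function flatten(t: Trace): string[] {
  if (t.kind === "cmd") return [t.name];
  return t.children.reduce<string[]>((acc, c) => acc.concat(flatten(c)), []);
}

// Made-up example: configure, then two compiles in parallel, then link.
const example: Trace = {
  kind: "seq",
  children: [
    { kind: "cmd", name: "configure", seconds: 2 },
    {
      kind: "par",
      children: [
        { kind: "cmd", name: "cc a.c", seconds: 3 },
        { kind: "cmd", name: "cc b.c", seconds: 5 },
      ],
    },
    { kind: "cmd", name: "link", seconds: 1 },
  ],
};
```

For this made-up example the work is 11 seconds and the span is 8 seconds. The flattened list alone cannot distinguish the two, because it no longer says that the two compiles overlapped.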

@ndmitchell
Owner

Thanks for the PR @spall. What's the desire behind this PR? What is the advantage of storing traces in a tree? What can we do now?

@spall
Author

spall commented Mar 19, 2019 via email

@ndmitchell
Owner

Cool project. Yes, I've been investigating things, writeup at https://neilmitchell.blogspot.com/2019/03/ghc-rebuild-times-shake-profiling.html, and everything that was used to do that is entirely public. The actual code that does it is in html/ts - in particular https://github.com/ndmitchell/shake/blob/master/html/ts/reports/summary.tsx#L44-L136.

My guess is that most build systems don't call parallel, e.g. it's entirely absent in Hadrian, so it won't have a significant effect. As a result, I'm not sure that storing the detail at that level of granularity helps. I also based my calculations on the depends field and the execution time, assuming (as a slight simplification) that the build proceeds as a series of need calls followed by the bulk of the execution. To get my precise calculation I did need to extend depends from being a set of dependencies to a list of sets of dependencies, as it really is in the core of Shake. Previously profiling blurred that information away.
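The calculation described above can be sketched roughly as follows. This is not the code in summary.tsx; the `Rule` interface, its field names, and the `criticalPath` and `blurGroups` helpers are assumptions made for illustration. It applies the stated simplification: each need group waits for its slowest member, the groups run in order, and the rule's own execution follows. It assumes the dependency graph is acyclic.

```typescript
// Hypothetical profile entry; field names are assumptions, not Shake's profile schema.
interface Rule {
  execution: number;   // seconds spent running the rule itself
  depends: string[][]; // ordered groups; each group models one parallel need
}

type Profile = Map<string, Rule>;

// Critical path ending at `name`, memoised so shared dependencies are visited once.
function criticalPath(
  profile: Profile,
  name: string,
  memo: Map<string, number> = new Map()
): number {
  const cached = memo.get(name);
  if (cached !== undefined) return cached;
  const rule = profile.get(name);
  if (rule === undefined) throw new Error("unknown rule: " + name);
  let total = rule.execution;
  for (const group of rule.depends) {
    // Members of one group build in parallel, so the group costs its slowest member.
    let slowest = 0;
    for (const dep of group) {
      slowest = Math.max(slowest, criticalPath(profile, dep, memo));
    }
    total += slowest; // groups are sequential, so their costs add up
  }
  memo.set(name, total);
  return total;
}

// Blur each rule's groups into a single set, as a flat depends field would.
function blurGroups(profile: Profile): Profile {
  const out: Profile = new Map();
  profile.forEach((rule, name) => {
    const all: string[] = [];
    for (const group of rule.depends) all.push(...group);
    out.set(name, {
      execution: rule.execution,
      depends: all.length > 0 ? [all] : [],
    });
  });
  return out;
}

// Made-up example: a.o needs config first, and only then needs gen.h.
const profile: Profile = new Map([
  ["config", { execution: 2, depends: [] }],
  ["gen.h", { execution: 4, depends: [] }],
  ["a.o", { execution: 3, depends: [["config"], ["gen.h"]] }],
]);

const precise = criticalPath(profile, "a.o");             // 2 + 4 + 3
const blurred = criticalPath(blurGroups(profile), "a.o"); // max(2, 4) + 3
```

With the groups kept in order the critical path for a.o is 9 seconds; once they are blurred into one set it drops to 7, which is why the list-of-sets form matters for a precise figure.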

Is this project meant to be a significant part of your PhD? Or more just a side project? Is the Shake aspect an important one? If it's going to be a big part, it may be worth us having a video-chat to figure out what you're trying to do and seeing how it all fits together. I've just finished a big bit of work on profiling, but I'm not particularly planning to do much work on it for a while now.

@spall
Author

spall commented Mar 20, 2019 via email

@ndmitchell
Owner

Shake has need, which demands a number of dependencies in parallel. It encodes that in the database, with the dependencies stored as an ordered list of sets. As a result, very few people use explicit parallel.
