-
Notifications
You must be signed in to change notification settings - Fork 1k
[WIP] VerifyDepTree and DigestFromDirectory use godirwalk #1084
Conversation
internal/gps/pkgtree/digest.go
Outdated
case modeType&os.ModeNamedPipe != 0: | ||
return nil // nothing more to do for this type | ||
case modeType&os.ModeSocket != 0: | ||
return nil // nothing more to do for this type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this block of cases be simplified a bit to something like this:
case modeType&(os.ModeDir|os.ModeDevice|os.ModeCharDevice|os.ModeNamedPipe|os.ModeSocket) != 0:
return nil // nothing more to do for these types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close:
case modeType&(os.ModeDir|os.ModeDevice|os.ModeCharDevice|os.ModeNamedPipe|os.ModeSocket) != 0:
Pushing update, and thanks!
I'd like to hold off on merging this just yet. |
ready for review now, though?
…On 08/30 19:33, Karrick McDermott wrote:
I'd like to hold off on merging this just yet.
--
You are receiving this because your review was requested.
Reply to this email directly or view it on GitHub:
#1084 (comment)
|
It looks like the speedup on Mac is significantly more than on linux. Here, walking a large directory tree is as fast as doing so with hostname:walk-fast karrick$ du -sh ~/src
26G /Users/karrick/src
hostname:walk-fast karrick$ for i in 1 2 3 ; do time walk-fast ~/src | wc -l ; done
542946
real 0m6.048s
user 0m1.279s
sys 0m2.667s
542946
real 0m6.022s
user 0m1.267s
sys 0m2.633s
542946
real 0m6.079s
user 0m1.278s
sys 0m2.682s
hostname:walk-fast karrick$ for i in 1 2 3 ; do time walk-stdlib ~/src | wc -l ; done
542946
real 0m29.369s
user 0m1.994s
sys 0m24.758s
542946
real 0m29.139s
user 0m1.987s
sys 0m25.127s
542946
real 0m29.092s
user 0m2.010s
sys 0m24.917s
hostname:walk-fast karrick$ for i in 1 2 3 ; do time find ~/src | wc -l ; done
542946
real 0m5.979s
user 0m0.310s
sys 0m2.724s
542946
real 0m5.823s
user 0m0.313s
sys 0m2.578s
542946
real 0m5.723s
user 0m0.307s
sys 0m2.564s |
When compared on Linux: [karrick@linuxBox walk-fast]$ du -sh ~/src
36G /home/karrick/src
[karrick@linuxBox walk-fast]$ for i in 1 2 3 ; do time walk-fast ~/src | wc -l ; done
829481
real 0m5.443s
user 0m4.246s
sys 0m3.698s
829481
real 0m5.415s
user 0m4.006s
sys 0m3.932s
829481
real 0m5.433s
user 0m4.418s
sys 0m3.529s
[karrick@linuxBox walk-fast]$ for i in 1 2 3 ; do time walk-stdlib ~/src | wc -l ; done
829481
real 0m8.298s
user 0m4.789s
sys 0m6.034s
829481
real 0m8.442s
user 0m4.835s
sys 0m6.144s
829481
real 0m8.352s
user 0m4.759s
sys 0m6.084s
[karrick@linuxBox walk-fast]$ for i in 1 2 3 ; do time find ~/src | wc -l ; done
829481
real 0m1.492s
user 0m0.669s
sys 0m0.921s
829481
real 0m1.474s
user 0m0.698s
sys 0m0.873s
829481
real 0m1.495s
user 0m0.636s
sys 0m0.949s |
The changes to Earlier today I was running some benchmarks, and I had found a bug in I would appreciate any thoughts on |
On Windows, walk-fast runs in half the time that walk-stdlib runs, by the way. |
I have fixed the bug, but would appreciate any extra eyeballs on the code, and thoughts folks might have. |
It's interesting to note that while However, using
|
Okay, the previous benchmark was created by traversing the As expected, when I benchmark using On RHEL 6.x
On macOS Sierra
|
* found by megacheck :)
Rebased my fork due to upstream changes... |
gonna try to get on reviewing this a bit later this week - sorry, taken up with case sensitivity stuff rn. |
Case insensitivity... Not fun. I'm happy to help out on this project still. And looking forward to getting the digest sum added to the |
It looks like the BoltDB tests are failing, causing the builds to not pass. |
The tests that use the bolt db cache are passing on localhost, so it might be a race condition in |
I have performed a fresh checkout on upstream on my Windows box, and running the bolt db tests, and sometimes they pass, and sometimes they fail. |
Restarted jobs |
* The setup and validation for all the importer testcases are the same, they just need to handle calling the importer properly. * Flag importer tests to run in parallel now that they've been cleaned up.
The closure created by t.Run wasn't copying the testcase value, so the test was being run on the wrong values.
When a lock hint is empty, nothing should be added to the lock
Fixes an error when the git user is not set.
Use the test helper so that it can check for any problems up-front.
Disable until we can figure out why this is causing the following error on the AppVeyor(Windows) builds: "remote repository at https://github.com/carolynvs/deptest-importers does not exist, or is inaccessible"
So there's good news and bad news. 👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there. 😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that they're okay with their commits being contributed to this project. Please have them confirm that here in the pull request. Note to project maintainer: This is a terminal state, meaning the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just some small things.
are there any notable differences between the previous internal implementation and what's in godirwalk? i don't think there are, and i'd assume you'd have mentioned some, but making sure 😄
} | ||
} | ||
|
||
const flameIterations = 100000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you explain the benefit of having a fixed number of iterations, rather than relying (as BenchmarkDigestFromDirectory()
does) on the testing system to either a) empirically determine an appropriate threshold and/or b) allow the user to control it via CLI args?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When comparing flame graphs, I want to compare how many cycles were spent in the corresponding functions for the same input. Those numbers are skewed if we run version A 500 times, and version B only 100 times. Therefore, forcing the iteration count to a particular number ensures an apples to apples comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha, wfm 👍
@@ -0,0 +1,103 @@ | |||
package pkgtree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need a copyright header (ugh, this should've caused our travis tests to fail...).
also, for consistency, filename should be snaky, not camel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually prefer snaky filenames, or even better with hyphens, but most other Go projects use camel case for file names. Is there something I don't know about, or just a convention localized to dep
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's the uniform approach in both stdlib and the toolchain, afaik. that's the convention we're aiming to follow.
Somehow rebasing from upstream/master and pushing to my fork causes some problems. I will take a look at this later. |
do they really? huh. that's not been my subjective experience.
only standard I'm really following here is stdlib/the toolchain.
…On September 15, 2017 6:53:34 PM EDT, Karrick McDermott ***@***.***> wrote:
karrick commented on this pull request.
> @@ -0,0 +1,103 @@
+package pkgtree
I actually prefer snaky filenames, or even better with hyphens, but
most other Go projects use camel case for file names. Is there
something I don't know about, or just a convention localized to `dep`?
--
You are receiving this because you modified the open/close state.
Reply to this email directly or view it on GitHub:
#1084 (comment)
|
I don't mind renaming the files, however I've been on business trip this past week and procrastinating the experience of understanding what I did wrong when I attempted my last git rebase. Thanks for your patience. |
yeah, seems like we've got some rebase weirdness. i'm happy to help you sort out a workable process - here, or pinging me on slack works, as well 😄 |
would it be easier to close this out and open a fresh PR? |
i'm going to close this out and open a fresh PR once i have a chance to get back to this. |
What does this do / why do we need it?
Alters digest code to use godirwalk project. godirwalk handles symlinked directories on Windows correctly, and is faster than filepath.Walk on both unix and Windows.
What should your reviewer look out for in this PR?
Do you need help or clarification on anything?
I recall @sdboyer mentioning that there were some tests in the other portions of dep that were not working on Windows. I wonder whether those are related to the errant handling of symlinks to directories that happens with filepath on Windows, and whether godirwalk could address that.
Which issue(s) does this PR fix?
Related to #121.