The new implementation is a bit slower than the old #73
Comments
Seeing the same as described.
Using 1.1.2:
$ time tartufo --exclude-paths .tartufo/exclude_paths.txt .
Results have been saved in /var/folders/w5/dkpm7qm95s3gbtffn66fwxb00000gn/T/tmp7sodkfkj
tartufo --exclude-paths .tartufo/exclude_paths.txt . 16.10s user 11.53s system 97% cpu 28.389 total
Using 2.0.0:
$ time tartufo scan-local-repo .
tartufo scan-local-repo . 266.86s user 206.12s system 99% cpu 7:55.45 total |
@bsarrazin I hate to ask you to spend more time running these, but do you know if those timings are consistent, with minimal potential outside influences? (Other heavy processes running on the system) |
I tried this against the tartufo repo itself on my (physical) Debian system, just for grins. Basically the same behavior, and the 2.0.1 times look pretty stable:
1.1.2:
$ time tartufo
[...spewage clipped...]
real 0m8.190s
user 0m4.614s
sys 0m1.756s
2.0.1:
$ time tartufo scan-local-repo .
real 0m14.562s
user 0m7.017s
sys 0m4.012s
$ time tartufo scan-local-repo .
real 0m15.262s
user 0m7.182s
sys 0m3.919s
$ time tartufo scan-local-repo .
real 0m15.074s
user 0m7.151s
sys 0m3.983s |
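To put a number on "pretty stable", here is a minimal sketch that summarizes the wall-clock ("real") figures quoted above. The variable names are illustrative, and a single 1.1.2 run is a thin baseline, so treat the ratio as a rough estimate only.

```python
from statistics import mean, stdev

# "real" wall-clock seconds copied from the runs quoted above.
v1_real = 8.190                      # single 1.1.2 run
v2_real = [14.562, 15.262, 15.074]   # three 2.0.1 runs

v2_mean = mean(v2_real)
v2_spread = stdev(v2_real)           # sample standard deviation
slowdown = v2_mean / v1_real

print(f"2.0.1 mean: {v2_mean:.3f}s, stdev: {v2_spread:.3f}s")
print(f"slowdown vs 1.1.2: {slowdown:.2f}x")
```

A small spread relative to the mean suggests the difference is not just noise from other processes on the machine.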
Well I'm glad to see that you're experiencing similar deltas to what I was, @rscottbailey. But I'm very concerned by the deltas that @bsarrazin is seeing. I would love to try to isolate what exactly is causing this, and why it would be so much worse on a different repo. I haven't had much experience myself with Python profiling tools, but I will try to take some time to dig in. I know |
Hmm. I don't know if he's looking at a different (larger) repository or running on a dog-slow system, but I'm still seeing pretty much the 2x elapsed/user/system bloat that was originally reported. I don't know if this is valuable or not (clipped after the first few dozen lines):
|
One thing I'm seeing here and from my own profiling is that a lot of this appears to be in the
1.x has:
[[package]]
category = "main"
description = "Python Git Library"
name = "gitpython"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*"
version = "2.1.15"
[package.dependencies]
gitdb2 = ">=2,<3"
While 2.x has:
[[package]]
name = "gitpython"
version = "3.1.9"
description = "Python Git Library"
category = "main"
optional = false
python-versions = ">=3.4"
[package.dependencies]
gitdb = ">=4.0.1,<5"
Is it possible that the performance changes come from those libraries? (He says, hopefully) If so, perhaps it's time to look into #66. |
Well, I built fresh venvs for both versions just now, sitting on top of Linux with Python 3.8.6, so I have no library skew (pip reports the same versions of everything except tartufo itself) and I'm not worrying about Windows. For rough comparison with the above, here's a 1.1.2 profile:
|
One thing that strikes me is that v2 is doing ~2.4M primitive calls vs v1's ~1.9M calls, even though v1 is doing much more work to actually spew all of the findings to stdout |
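For anyone wanting to reproduce the call-count comparison, here is a rough sketch using the standard library's `cProfile` and `pstats`. The `workload` function is only a stand-in, since the actual tartufo entry point is not shown in this thread; swap in whatever callable you want to measure.

```python
import cProfile
import pstats


def workload(n):
    """Stand-in for a scan; NOT tartufo code, just something to profile."""
    return sum(len(str(i)) for i in range(n))


def count_calls(func, *args):
    """Profile one call and return (total_calls, primitive_calls)."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    # prim_calls excludes recursive re-entries; total_calls includes them.
    return stats.total_calls, stats.prim_calls


total, primitive = count_calls(workload, 10_000)
print(f"{total} calls, {primitive} primitive")
```

Running the same harness under both versions, against the same repository, would give comparable primitive-call totals.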
I am running this on an almost brand new MacBook Pro 16", fresh installation of macOS from 3 weeks ago, running using pyenv in Python 3.8.2. We're scanning the iOS repo for GoDaddy Conversations. I ran it again this morning after a reboot, with not much running except Mail, Safari, iTerm2, Slack, and an MxPx show:
tartufo 1.1.2
$ time tartufo --exclude-paths .tartufo/exclude_paths.txt .
15.42s user
11.90s system
97% cpu 27.949 total
tartufo 2.0.1
$ time tartufo --regex --entropy --exclude-paths .tartufo/exclude_paths.txt scan-local-repo .
242.55s user
184.42s system
99% cpu 7:10.09 total |
One thought that crossed my mind here -- can you retry the 1.1.2 run, adding |
@rscottbailey just tried with regex:
$ time tartufo --regex --exclude-paths .tartufo/exclude_paths.txt .
17.53s user
12.63s system
97% cpu 30.793 total |
@bsarrazin Can you share your repo URL? (Assuming it's on GHE) Hoping to poke at tartufo a bit more without deluging you with a bunch of "what about this?" variant requests... Send to me on slack (rbailey) if that's easier. |
URL sent by Slack |
Doing a little bit of shotgun debugging here and checking dependencies, there doesn't seem to be any big difference:
> poetry add "GitPython<3.0.0"
• Removing gitdb (4.0.5)
• Installing smmap2 (3.0.1)
• Installing gitdb2 (2.0.6)
• Updating gitpython (3.1.9 -> 2.1.15)
> time tartufo scan-local-repo . && time tartufo scan-local-repo . && time tartufo scan-local-repo . && time tartufo scan-local-repo . && time tartufo scan-local-repo .
tartufo scan-local-repo . 6.73s user 13.35s system 30% cpu 1:06.74 total
tartufo scan-local-repo . 6.72s user 13.69s system 24% cpu 1:21.76 total
tartufo scan-local-repo . 6.87s user 13.76s system 25% cpu 1:19.96 total
tartufo scan-local-repo . 6.88s user 13.85s system 25% cpu 1:21.10 total
tartufo scan-local-repo . 6.86s user 13.76s system 25% cpu 1:22.43 total
> poetry add "GitPython@latest"
• Removing gitdb2 (2.0.6)
• Removing smmap2 (3.0.1)
• Installing gitdb (4.0.5)
• Updating gitpython (2.1.15 -> 3.1.9)
> time tartufo scan-local-repo . && time tartufo scan-local-repo . && time tartufo scan-local-repo . && time tartufo scan-local-repo . && time tartufo scan-local-repo .
tartufo scan-local-repo . 7.84s user 16.72s system 29% cpu 1:22.54 total
tartufo scan-local-repo . 7.92s user 16.76s system 28% cpu 1:27.02 total
tartufo scan-local-repo . 7.93s user 16.78s system 28% cpu 1:25.29 total
tartufo scan-local-repo . 7.89s user 16.74s system 28% cpu 1:25.91 total
tartufo scan-local-repo . 7.90s user 16.77s system 29% cpu 1:23.60 total
Vs a baseline with v1.1.2:
> time tartufo --repo-path .
tartufo --repo-path . 4.57s user 7.54s system 55% cpu 21.670 total
This is a pretty significant increase I'm seeing against this project's own codebase now. One other thing I'm curious about: could this be related to the signature-based exclusions? We are now calculating a hash based on every match found and comparing that against a list of exclusions. While this is using BLAKE2, which should be quite fast, it is definitely a >0 cost. |
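To make the per-match cost concrete, here is a minimal sketch of signature-based exclusion as described above. The `signature` and `filter_excluded` helpers, the payload layout, and the sample findings are all assumptions for illustration; tartufo's real signature format is not shown in this thread. Only `hashlib.blake2s` is a real standard-library call.

```python
import hashlib


def signature(matched_string, file_path):
    """Hypothetical signature: BLAKE2s digest of the match plus its path."""
    payload = f"{matched_string}$${file_path}".encode("utf-8")
    return hashlib.blake2s(payload).hexdigest()


def filter_excluded(findings, excluded_signatures):
    """Yield findings whose signature is not in the exclusion set."""
    excluded = frozenset(excluded_signatures)  # constant-time membership checks
    for matched_string, file_path in findings:
        if signature(matched_string, file_path) not in excluded:
            yield matched_string, file_path


# Made-up findings, purely for demonstration.
findings = [("AKIAEXAMPLEKEY", "config.py"), ("hunter2", "README.md")]
excluded = [signature("hunter2", "README.md")]
remaining = list(filter_excluded(findings, excluded))
print(remaining)
```

With the exclusions held in a set, the cost per finding is one short hash plus one lookup, which would be consistent with the hashing being cheap relative to the rest of the scan.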
I commented out the calls to
> time tartufo scan-local-repo .
# Repeated by hand because issues pop up when signature checking is missing
tartufo scan-local-repo . 7.49s user 15.46s system 34% cpu 1:05.89 total
tartufo scan-local-repo . 7.78s user 16.51s system 29% cpu 1:22.78 total
tartufo scan-local-repo . 7.78s user 16.43s system 30% cpu 1:20.52 total
tartufo scan-local-repo . 7.81s user 16.41s system 31% cpu 1:18.10 total
tartufo scan-local-repo . 7.76s user 16.09s system 29% cpu 1:19.81 total |
I believe this is what's wrong: #119 |
Using strace on tartufo, filtered on "^open", shows tartufo (gitpython?) absolutely hammering the exact same files over and over again. In my testing with @djlarsu we found that the number of syscalls that tartufo makes in 2.x is massively higher than in 1.x. I wonder if we are accidentally recreating / reloading the references in the scan loop somehow. |
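A rough way to quantify "the exact same files over and over" is to tally the paths in the captured strace output. The sketch below assumes plain `open(...)` / `openat(...)` lines with no PID prefix (that is, strace run without `-f`), and the sample lines are invented for illustration rather than taken from a real trace.

```python
import re
from collections import Counter

# Matches the quoted path in lines such as:
#   open("/repo/.git/HEAD", O_RDONLY) = 3
#   openat(AT_FDCWD, "/repo/.git/HEAD", O_RDONLY|O_CLOEXEC) = 3
OPEN_CALL = re.compile(r'^open(?:at)?\((?:[^,"]+,\s*)?"([^"]+)"')


def count_opens(strace_lines):
    """Return a Counter mapping each opened path to its number of opens."""
    counts = Counter()
    for line in strace_lines:
        match = OPEN_CALL.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines, not real tartufo output.
sample = [
    'openat(AT_FDCWD, "/repo/.git/HEAD", O_RDONLY|O_CLOEXEC) = 3',
    'openat(AT_FDCWD, "/repo/.git/packed-refs", O_RDONLY) = 4',
    'openat(AT_FDCWD, "/repo/.git/HEAD", O_RDONLY|O_CLOEXEC) = 3',
    'read(3, "ref: refs/heads/main\\n", 4096) = 21',
]
opens = count_opens(sample)
print(opens.most_common(5))
```

Sorting by count should make any file that is reopened once per commit or per ref stand out immediately.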
Okay so here's the key data:
Tartufo v1.1.2 did 65,000 file opens to examine this repo. |
There are 1,532 files in this repo, and the size of the repo is 37MB. There are 1,308 revisions / commits on all branches.
|
Fixed in #233 |
🐛 Bug Report
When self-scanning tartufo, I found that it was taking a bit longer (using time tartufo to measure) to run a full scan. In general, it changed from about 13 seconds to about 20 seconds. I did some preliminary testing and wasn't able to find any obvious reasons why. It'd be good to go through at some point with a fine-toothed profiling comb and see what can be optimized in here.
On a fun note, I found that the @lru_cache decorators I used did shave about a second off the run time! Yay!
It's worth mentioning that I built the new implementation with the intention of optimizing for memory rather than speed, using things such as generators etc. I don't know if some of this ended up being detrimental to the speed. But may be worth investigating.
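As an illustration of the two techniques mentioned here, the following sketch combines `functools.lru_cache` with a generator. `compile_rule` and `scan_lines` are hypothetical names, not functions from tartufo, and the input lines are made up.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def compile_rule(pattern):
    """Hypothetical helper: the body runs once per distinct pattern."""
    return re.compile(pattern)


def scan_lines(lines, pattern):
    """Generator: yield matching lines lazily instead of building a list."""
    for line in lines:
        if compile_rule(pattern).search(line):
            yield line


hits = list(scan_lines(["token=abc123", "nothing here"], r"token=\w+"))
info = compile_rule.cache_info()
print(hits, info)
```

The generator keeps memory flat because only one line is held at a time, while the cache trades a little memory for fewer repeated computations; `cache_info()` shows how often the cache actually helped.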
To Reproduce
Run the following commands and you'll be able to see the discrepancy.
Expected Behavior
The new version should be a similar speed, if not faster. Certainly not nearly twice as slow.