

Duration seems to depend on the number of processes, not seconds. Or on threads... #33

Closed
mborus opened this issue Sep 12, 2018 · 4 comments


mborus commented Sep 12, 2018

I'm testing the flamegraph for a minute on PID 502 on a Linux CentOS 5 machine, calling

py-spy -p 502 -d 60 -f /tmp/flame

with the default sample rate of 1000. I get a progress bar that only moves slowly.

It seems that the duration in seconds doesn't account for cases where
fewer samples are collected than the defined maximum. Or see the second
theory below...

Output:
Sampling process 1000 times a second for 60 seconds
████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 60000/60000
Wrote flame graph '/tmp/flame'. Samples: 60000 Errors: 0

Also in this scenario the flamegraph might be wrong/confusing:
the generated SVG file has a first line

Function all: (255,466 samples 100%)

which is inconsistent with the 60000 collected samples.

The next line shows "/main.py:25" at 59,999 samples,
and the rest sits under "_bootstrap" (threading.py:884) at 195,457 samples,
so it's possible that the default 1000 samples per second are being taken,
but the samples from threads don't count towards the duration.

The job ran about 4 minutes, so that would work out.
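That back-of-envelope check can be written out explicitly (a quick sketch using only the numbers quoted from the output above; the ~10-sample gap to the SVG total presumably sits in other, smaller frames):

```python
main_thread = 59_999   # samples shown for "/main.py:25"
bootstrap   = 195_457  # samples shown for "_bootstrap" (threading.py:884)
svg_total   = 255_466  # "Function all" line in the generated SVG
rate        = 1000     # default sample rate (samples per second)

print(main_thread + bootstrap)  # 255456, close to the SVG total
print(svg_total / rate / 60)    # ~4.26 minutes, matching the ~4 minute runtime
```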

@mborus mborus changed the title Duration seems to depend on the number of processes, not seconds Duration seems to depend on the number of processes, not seconds. Or on threads... Sep 12, 2018

benfred commented Sep 14, 2018

Weird. Setting the duration flag seems to work appropriately for me:

Sampling process 1000 times a second for 10 seconds
█████████████████████████████████████████████████████████████████████ 10000/10000
Wrote flame graph 'd.svg'. Samples: 10000 Errors: 0

real	0m10.170s
user	0m1.172s
sys	0m1.729s

I'm wondering if what's happening here is that we're sampling slower than 1K times a second for some reason, and it isn't keeping up. I'm going to add some code to detect if this happens and write out an error message in this case.

Can you try again with setting a lower rate just to verify? Maybe --rate 100 or something like that?
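The lag-detection idea described above can be sketched like this (a minimal illustration in Python, not py-spy's actual Rust implementation; `sample_fn` is a hypothetical stand-in for whatever reads the target process's stacks):

```python
import time

def sample_loop(sample_fn, rate, duration):
    """Fixed-rate sampling loop that notices when it falls behind schedule.

    Returns how many seconds behind the intended schedule the loop ended up;
    a large value means the effective rate was lower than requested.
    """
    interval = 1.0 / rate
    n = int(rate * duration)
    start = time.monotonic()
    behind = 0.0
    for i in range(n):
        sample_fn()
        # The (i+1)-th sample should have completed by this deadline:
        deadline = start + (i + 1) * interval
        now = time.monotonic()
        if now < deadline:
            time.sleep(deadline - now)  # keeping up: wait for the next tick
        else:
            behind = now - deadline     # falling behind: record the lag
    return behind
```

If `behind` grows large, a warning like the one py-spy 0.1.6 prints ("Ns behind in sampling, results may be inaccurate") would be appropriate.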

benfred added a commit that referenced this issue Sep 15, 2018
At high rates, it's quite possible that we won't be able to sample the
python process fast enough. This could lead to inaccurate results,
and also might lead to issues like #33. Fix by warning if we aren't
keeping up in sampling.

Also tweak the default sampling rate to be lower (200). While I can sample
at around 10K samples a second on any machine I've tested, this should help
here too.

benfred commented Sep 15, 2018

The 0.1.6 release has code to warn if we aren't sampling fast enough, which I think was the issue here.


mborus commented Sep 15, 2018

I updated and now get the warning at sample rate 1000, like this:
"24.68s behind in sampling, results may be inaccurate. Try reducing the sampling rate."
At sample rate 100 there is no warning (and the speed is OK, too).


benfred commented Sep 16, 2018

Thanks for letting me know! It's weird that the max sampling rate is so low on your system (I have no problem up to 10K samples a second on most machines), but I'm glad the problem is sorted out.

@benfred benfred closed this as completed Sep 16, 2018