Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stackprof: weight on-cpu samples by period rather than timestamp delta #425

Merged
merged 5 commits into from
Jun 18, 2023

Conversation

manuelfelipe
Copy link
Contributor

This attempts to improve the quality of the on-CPU profiles stackprof provides. Rather than weighing samples by their timestamp deltas, which, in our opinion, are only valid in wall-clock mode, this weighs callchains by:

S = number of samples
P = sample period in nanoseconds

W = S * P

The difference after this change is quite substantial, specially in profiles that previously were showing up with heavy IO frames:

  • Total profile weight is almost down by 90%, which actually makes sense for an on-CPU profile if the app is relatively idle
  • Certain callchains that blocked in syscalls / IO are now much lower weight. This was what I was expecting to find.
    Here is an example of the latter point.

In delta mode, we see an io select taking a long time, it is a significant portion of the profile:

236936508-709bee01-d616-4246-ba74-ab004331dcd3

But in period scaling mode, it is only a couple of sample periods ultimately:

236936693-9d44304e-a1c2-4906-b3c8-50e19e6f9f27

@jlfwong
Copy link
Owner

jlfwong commented Jun 6, 2023

Hi @manuelfelipe!

This seems reasonable to me! Can you please include tests which would catch changes to this behaviour were it to break. I think this would mean including a new profile with the cpu mode set in https://github.com/jlfwong/speedscope/blob/main/src/import/stackprof.test.ts

@coveralls
Copy link

coveralls commented Jun 6, 2023

Coverage Status

coverage: 42.146% (-0.01%) from 42.16% when pulling 2b92d51 on dalehamel:stackprof-cpu-weight-fixes-upstream into e9133be on jlfwong:main.

@manuelfelipe
Copy link
Contributor Author

This seems reasonable to me! Can you please include tests which would catch changes to this behaviour were it to break. I think this would mean including a new profile with the cpu mode set in https://github.com/jlfwong/speedscope/blob/main/src/import/stackprof.test.ts

Alright, added a sample profile along with a test asserting for the desired behaviour in 7ad10a1

On master, it fails (as expected) with:

  ● importCpuProfileWithProperWeights › uses samples count for weight when importing cpu profile

    expect(received).toBe(expected) // Object.is equality

    Expected: 489000
    Received: 6650772

      28 |       if (profile) {
      29 |         expect(profile.getWeightUnit()).toBe('microseconds')
    > 30 |         expect(profile.getTotalWeight()).toBe(489000)
         |                                          ^
      31 |       }
      32 |     }
      33 |   })

Also added couple of changes to the simple ruby app to allow changing the mod via an arg value, and some sleep time to showcase how weight is influenced when capturing cpu profiles

Before this change:
Screenshot 2023-06-07 at 10 11 23 AM

After

Screenshot 2023-06-07 at 10 11 52 AM

@jlfwong
Copy link
Owner

jlfwong commented Jun 15, 2023

Thanks for adding the test and ensuring that it fails on the main branch!

Looks like there are failing test runs & a merge conflict that needs resolving before this can land.

@manuelfelipe manuelfelipe force-pushed the stackprof-cpu-weight-fixes-upstream branch from 7ad10a1 to 2b92d51 Compare June 15, 2023 14:17
@manuelfelipe
Copy link
Contributor Author

thanks @jlfwong. Rebased base on latest changes on main, and re-ran run jest -- -u in 2b92d51 to try and get CI happy.

Not sure I fully understand what is the actual thing failing failure in https://github.com/jlfwong/speedscope/actions/runs/5201005116/jobs/9543115054. Locally tests are passing, so hoping the thing that was causing the issue are the outdated snapshots.

@dalehamel
Copy link

FYI @jlfwong I enabled actions on my fork, Manuel pushed to it, and it passes there https://github.com/dalehamel/speedscope/actions/runs/5280170594 (since we cannot run workflows here, this seemed the most expedient way).

@jlfwong jlfwong merged commit 741fdeb into jlfwong:main Jun 18, 2023
@jlfwong
Copy link
Owner

jlfwong commented Jun 18, 2023

Thanks! This will go out with the next release

@jlfwong
Copy link
Owner

jlfwong commented Jun 22, 2023

This is now live on https://speedscope.app and published to npm as part of v1.15.2. Thanks for your contribution!

jackerghan pushed a commit to jackerghan/speedscope that referenced this pull request Jul 28, 2023
jlfwong#425)

This attempts to improve the quality of the on-CPU profiles stackprof provides. Rather than weighing samples by their timestamp deltas, which, in our opinion, are only valid in wall-clock mode, this weighs callchains by:


```
S = number of samples
P = sample period in nanoseconds

W = S * P
```

The difference after this change is quite substantial, specially in profiles that previously were showing up with heavy IO frames:  

* Total profile weight is almost down by 90%, which actually makes sense for an on-CPU profile if the app is relatively idle
* Certain callchains that blocked in syscalls / IO are now much lower weight. This was what I was expecting to find.
Here is an example of the latter point.

In delta mode, we see an io select taking a long time, it is a significant portion of the profile:

<img width="1100" alt="236936508-709bee01-d616-4246-ba74-ab004331dcd3" src="https://github.com/dalehamel/speedscope/assets/4398256/39140f1e-50a9-4f33-8a61-ec98b6273fd4">

But in period scaling mode, it is only a couple of sample periods ultimately:

<img width="206" alt="236936693-9d44304e-a1c2-4906-b3c8-50e19e6f9f27" src="https://github.com/dalehamel/speedscope/assets/4398256/7d19077f-ef25-4d79-980b-cfa1775d928d">
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants