Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

command/operator_debug: add pprof interval #11938

Merged
merged 10 commits into from
Apr 4, 2022

Conversation

danishprakash
Copy link
Contributor

Closes #11514

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @danishprakash! Overall this is looking pretty good. I've left some comments, but I'm also going to rope in @davemay99 for a review as he's kind of the internal "owner" of this feature.

command/operator_debug.go Outdated Show resolved Hide resolved
command/operator_debug.go Outdated Show resolved Hide resolved
command/operator_debug.go Outdated Show resolved Hide resolved
command/operator_debug.go Outdated Show resolved Hide resolved
command/operator_debug.go Outdated Show resolved Hide resolved
@tgross tgross requested a review from davemay99 January 26, 2022 13:54
@vercel vercel bot temporarily deployed to Preview – nomad February 5, 2022 13:30 Inactive
@vercel vercel bot temporarily deployed to Preview – nomad February 5, 2022 13:32 Inactive
@vercel vercel bot temporarily deployed to Preview – nomad February 5, 2022 14:05 Inactive
@tgross tgross added this to Needs Triage in Nomad - Community Issues Triage via automation Mar 7, 2022
@tgross tgross moved this from Needs Triage to In Progress in Nomad - Community Issues Triage Mar 7, 2022
@tgross tgross self-assigned this Mar 7, 2022
command/operator_debug.go Show resolved Hide resolved
@hashicorp-cla
Copy link

hashicorp-cla commented Mar 12, 2022

CLA assistant check
All committers have signed the CLA.

@Amier3
Copy link
Contributor

Amier3 commented Apr 1, 2022

Hey @danishprakash

Just checking in to see if you're still interested in working on this. We think this'd be a great change and would love to see this get over the hump 😄

@danishprakash
Copy link
Contributor Author

Hey @Amier3, I missed following up on this, I've pushed some updates, will try to close this soon :) Thanks for the heads up.

danishprakash and others added 4 commits April 4, 2022 09:54
Signed-off-by: danishprakash <grafitykoncept@gmail.com>
Signed-off-by: danishprakash <grafitykoncept@gmail.com>
Signed-off-by: danishprakash <grafitykoncept@gmail.com>
@vercel vercel bot temporarily deployed to Preview – nomad April 4, 2022 15:10 Inactive
@tgross tgross force-pushed the operator-debug-pprof-interval branch from 23f26fb to 69a473f Compare April 4, 2022 19:05
@vercel vercel bot temporarily deployed to Preview – nomad April 4, 2022 19:06 Inactive
@tgross tgross merged commit ff6ae5f into hashicorp:main Apr 4, 2022
Nomad - Community Issues Triage automation moved this from In Progress to Done Apr 4, 2022
tgross added a commit that referenced this pull request Apr 7, 2022
We introduced a `pprof-interval` argument to `operator debug` in #11938, and unfortunately this has resulted in a lot of test flakes. The actual command in use is mostly fine (although I've fixed some quirks here), so what's really happened is that the change has revealed some existing issues in the tests. Summary of changes:

* Make first pprof collection synchronous to preserve the existing
  behavior for the common case where the pprof interval matches the
  duration.

* Clamp `operator debug` pprof timing to that of the command. The
  `pprof-duration` should be no more than `duration` and the
  `pprof-interval` should be no more than `pprof-duration`. Clamp the
  values rather than throwing errors, which could change the commands
  that existing users might already have in debugging scripts

* Testing: remove test parallelism

  The `operator debug` tests that stand up servers can't be run in
  parallel, because we don't have a way of canceling the API calls for
  pprof. The agent will still be running the last pprof when we exit,
  and that breaks the next test that talks to that same agent.
  (Because you can only run one pprof at a time on any process!)

  We could split off each subtest into its own server, but this test
  suite is already very slow. In future work we should fix this "for
  real" by making the API call cancelable.


* Testing: assert against unexpected errors in `operator debug` tests.

  If we assert there are no unexpected error outputs, it's easier for
  the developer to debug when something is going wrong with the tests
  because the error output will be presented as a failing test, rather
  than just a failing exit code check. Or worse, no failing exit code
  check!

  This also forces us to be explicit about which tests will return 0
  exit codes but still emit (presumably ignorable) error outputs.

Additional minor bug fixes (mostly in tests) and test refactorings:

* Fix text alignment on pprof Duration in `operator debug` output

* Remove "done" channel from `operator debug` event stream test. The
  goroutine we're blocking for here already tells us it's done by
  sending a value, so block on that instead of an extraneous channel

* Event stream test timer should start at current time, not zero

* Remove noise from `operator debug` test log output. The `t.Logf`
  calls already are picked out from the rest of the test output by
  being prefixed with the filename.

* Remove explicit pprof args so we use the defaults clamped from
  duration/interval
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
Development

Successfully merging this pull request may close these issues.

cli: take multiple pprof snapshots from 'operator debug'
4 participants