debug: add LSM health explicitly to the debug zip #79518
Labels
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
E-quick-win
Likely to be a quick win for someone experienced.
E-starter
Might be suitable for a starter project for new employees or team members.
T-storage
Storage Team
Comments
nicktrav added the C-enhancement and T-storage labels on Apr 6, 2022
Manually synced with Jira
nicktrav added the E-starter and E-quick-win labels on Jun 10, 2022
This is under consideration for 23.2. Removing O-postmortem.
One possible impl:
anish-shanbhag added a commit to anish-shanbhag/cockroach that referenced this issue on Jun 18, 2024
Metrics from the storage engine are already exposed in the `/debug/lsm` HTTP endpoint. These can be useful when debugging storage issues, and so this change adds these metrics to the debug zip under `/nodes/$N/lsm.txt` in the same text format as the HTTP route. The previously unused `EngineStats` status endpoint was repurposed to serve these metrics from each node.

Fixes: cockroachdb#79518
Epic: none
Release note: none
craig bot pushed a commit that referenced this issue on Jun 24, 2024
125865: cli: add storage engine metrics to debug zip r=itsbilal a=anish-shanbhag

Metrics from the storage engine are already exposed in the `/debug/lsm` HTTP endpoint. These can be useful when debugging storage issues, and so this change adds these metrics to the debug zip under `/nodes/$N/lsm.txt` in the same text format as the HTTP route. The previously unused `EngineStats` status endpoint was repurposed to serve these metrics from each node.

Fixes: #79518
Epic: none
Release note: none

126087: roachtest: improve logging in gossip/chaos/nodes=9 further r=nvanbenschoten a=nvanbenschoten

Informs #124828.
Informs #126077.

Release note: None

Co-authored-by: Anish Shanbhag <anish.shanbhag@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
asg0451 pushed a commit to asg0451/cockroach that referenced this issue on Jun 25, 2024
Pebble exposes a view of the LSM, which we surface in the Cockroach console via the `/debug/lsm` endpoint. When debugging storage-related issues, we often want to know the state of the LSM at the time the debug.zip was taken. To find it, we typically dig through the logs for the last LSM printout, which could be up to 10 minutes old.

To speed up diagnosis of storage health issues, we should consider adding the LSM health to the debug zip as a dedicated file for each node. For instance, under `nodes/$N/lsm.txt` we could dump the same output that the `/debug/lsm` endpoint serves. For historical LSM metrics, the logs can still be used, along with the metrics captured in the tsdump (L0 size, L0 file count, etc.).
Jira issue: CRDB-14895