Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sanitize rapl zone name #2301

Closed

Conversation

sourikghosh
Copy link

hotfix for rapl collector crash sanitizing the rapl zone name for invalid metrics name.
closes #2299

Signed-off-by: Sourik Ghosh <sourikghosh31@gmail.com>
@sourikghosh sourikghosh force-pushed the fix/2299-sanitize-rapl-name branch from 5c269cf to 0cd26a7 Compare March 1, 2022 12:25
@baryluk
Copy link

baryluk commented Mar 1, 2022

Test?

@sourikghosh
Copy link
Author

yeah ok adding

Signed-off-by: Sourik Ghosh <sourikghosh31@gmail.com>
Signed-off-by: Sourik Ghosh <sourikghosh31@gmail.com>
Signed-off-by: Sourik Ghosh <sourikghosh31@gmail.com>
@SuperQ
Copy link
Member

SuperQ commented Mar 1, 2022

Hmm, maybe this is one of those that we should make a label instead of mashing the metric name.

@sourikghosh
Copy link
Author

@SuperQ any update on this ?? Do we need to fix this ??

@SuperQ
Copy link
Member

SuperQ commented Mar 2, 2022

@discordianfish What do you think? Should we move the rapl zone into a rapl_zone label?

@baryluk
Copy link

baryluk commented Mar 2, 2022

Here is the content of the relevant sysfs files from the machine that was a reason to fill the bug:

# for f in `find /sys/ | grep rapl | grep -E 'uj|name'`; do grep -H . $f; done
/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/energy_uj:9169789504
/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/energy_uj:36263586493
/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/name:core
/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/max_energy_range_uj:65532610987
/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/name:package-0-die-0
/sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/max_energy_range_uj:65532610987

CPU: Model name: AMD EPYC 7401P 24-Core Processor

Maybe it will help.

@sourikghosh
Copy link
Author

@discordianfish Will you share your kind opinion on this?

@discordianfish What do you think? Should we move the rapl zone into a rapl_zone label?

@discordianfish
Copy link
Member

@SuperQ Yes, that would be cleaner. It's a breaking change though so I'm fine either way.

@sourikghosh
Copy link
Author

@SuperQ @discordianfish Should I close this PR or we can go ahead with this?

@discordianfish
Copy link
Member

@sourikghosh Yes, let's use a label for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rapl collector crash: panic: "node_rapl_package-0-die-0_joules_total" is not a valid metric name
4 participants