Add collector for Linux EDAC #324

Merged
SuperQ merged 2 commits into master from superq/edac_mc
Jan 10, 2017

Member

@SuperQ SuperQ commented Oct 9, 2016

Collect "Error detection and correction" metrics from memory
controllers.

  • Supported on Linux only.
  • Add basic fixtures.
  • Enabled by default.

@SuperQ SuperQ changed the title Add collector for Linux EDAC [WIP] Add collector for Linux EDAC Oct 9, 2016
Contributor

@brian-brazil brian-brazil left a comment

Shouldn't there be a change in the end-to-end output too?

func NewEdacCollector() (Collector, error) {
	return &edacCollector{
		ceCount: prometheus.NewDesc(
			prometheus.BuildFQName(Namespace, edacSubsystem, "ce_count"),
Contributor

_count is the suffix for Summaries/Histograms. This is probably a counter, so should be _total

Member Author

I was translating the names directly without thinking about it too hard.

What about correctable_errors_total, and similar for the other names.

Contributor

sgtm

			[]string{"controller"}, nil,
		),
		ueCount: prometheus.NewDesc(
			prometheus.BuildFQName(Namespace, edacSubsystem, "ue_count"),
Contributor

spell out "uncorrectable"

Member Author

SuperQ commented Oct 9, 2016

Yes, it should be in the end-to-end output; I'm not sure why it isn't.

			[]string{"controller"}, nil,
		),
		ueNoinfoCount: prometheus.NewDesc(
			prometheus.BuildFQName(Namespace, edacSubsystem, "no_csrow_uncorrectable_errors_total"),
Contributor

Might we want to lump these into csrow_uncorrectable_errors_total with a label like "unknown"?

Member Author

Hmm. That depends on how well the kernel implements this data. In theory, unknown row + the csrow numbers should be possible to aggregate. Then we only need two metrics, one for correctable and one for uncorrectable.
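
To make the aggregation idea concrete, here is a minimal sketch of what folding the "no csrow" counts into the per-csrow metric would allow. The `sample` type and the numbers are invented for illustration; it assumes a label value of `"unknown"` for errors without row information, and it does not settle whether the kernel's per-controller totals actually equal this sum.

```go
package main

import "fmt"

// sample is a hypothetical in-memory stand-in for one labelled series of
// csrow_uncorrectable_errors_total.
type sample struct {
	controller string
	csrow      string // "0", "1", ... or "unknown" when the row is not known
	value      float64
}

// totalByController sums every csrow series, including "unknown",
// to obtain one total per memory controller.
func totalByController(samples []sample) map[string]float64 {
	totals := make(map[string]float64)
	for _, s := range samples {
		totals[s.controller] += s.value
	}
	return totals
}

func main() {
	// Made-up values, purely for illustration.
	samples := []sample{
		{controller: "0", csrow: "0", value: 3},
		{controller: "0", csrow: "1", value: 1},
		{controller: "0", csrow: "unknown", value: 2},
		{controller: "1", csrow: "unknown", value: 5},
	}
	fmt.Println(totalByController(samples))
}
```

If the sum does hold, the separate per-controller and "no csrow" metrics become redundant, leaving one metric for correctable and one for uncorrectable errors.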

@SuperQ SuperQ force-pushed the superq/edac_mc branch 2 times, most recently from cc3eb99 to dd3dc45 Compare December 13, 2016 11:29
@discordianfish
Member

@SuperQ What is the state of this? Is it ready to get reviewed/merged?

@SuperQ SuperQ force-pushed the superq/edac_mc branch 2 times, most recently from 19efeed to e8b92d3 Compare January 8, 2017 11:59
@SuperQ SuperQ requested a review from discordianfish January 8, 2017 12:06
@SuperQ SuperQ changed the title [WIP] Add collector for Linux EDAC Add collector for Linux EDAC Jan 8, 2017
Member Author

SuperQ commented Jan 8, 2017

@discordianfish Ok, finally fixed up the end-to-end test. This is ready to go.

@SuperQ SuperQ force-pushed the superq/edac_mc branch 2 times, most recently from 2b36015 to 374e060 Compare January 8, 2017 12:32
Member

@discordianfish discordianfish left a comment

Looks good besides the regex question. It also needs rebasing.

)

var (
	edacMemControllerRE = regexp.MustCompile(`.*devices/system/edac/mc/mc([0-9]*)`)
Member

Why are the regexes needed? The globbing should already limit the matches to the same files, no?

Member Author

The regexp is used to extract the controller number from the directory name.

Member

Ah, I see, that makes sense. In general I slightly prefer doing such basic parsing manually, but that is possibly just a personal preference. So fine with me!
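
For comparison, the two approaches discussed in this thread might look like the sketch below. The regular expression is the one from the diff; `controllerFromRegexp` and `controllerManual` are hypothetical helper names, and the manual variant is only one possible way to do the parsing by hand, not code from this pull request.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// Same pattern as in the diff under review.
var edacMemControllerRE = regexp.MustCompile(`.*devices/system/edac/mc/mc([0-9]*)`)

// controllerFromRegexp returns the controller number captured by the
// regexp. The pattern is not anchored at the end, so it also matches
// paths below the controller directory (for example .../mc0/csrow1).
func controllerFromRegexp(path string) (string, bool) {
	m := edacMemControllerRE.FindStringSubmatch(path)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// controllerManual (hypothetical) takes the last path element, checks
// for the "mc" prefix, and requires the remainder to be all digits.
func controllerManual(path string) (string, bool) {
	num := strings.TrimPrefix(filepath.Base(path), "mc")
	if num == "" || num == filepath.Base(path) {
		return "", false // no digits, or no "mc" prefix
	}
	for _, r := range num {
		if r < '0' || r > '9' {
			return "", false
		}
	}
	return num, true
}

func main() {
	path := "/sys/devices/system/edac/mc/mc0"
	viaRegexp, okRegexp := controllerFromRegexp(path)
	viaManual, okManual := controllerManual(path)
	fmt.Println(viaRegexp, okRegexp)
	fmt.Println(viaManual, okManual)
}
```

The manual version is stricter about what it accepts, while the regexp keeps the extraction to a single expression; both return the same controller number for a controller directory returned by the glob.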

Member

@discordianfish discordianfish left a comment

LGTM!

@SuperQ SuperQ requested a review from juliusv January 10, 2017 08:52
		),
		csrowUeCount: prometheus.NewDesc(
			prometheus.BuildFQName(Namespace, edacSubsystem, "csrow_uncorrectable_errors_total"),
			"Total correctable memory errors for this csrow.",
Member

correctable -> uncorrectable

type edacCollector struct {
	ceCount      *prometheus.Desc
	ueCount      *prometheus.Desc
	csrowCeCount *prometheus.Desc
Member

csrowCeCount -> csRowCECount

csrowUeCount -> csRowUECount

Member

juliusv commented Jan 10, 2017

👍 otherwise

@SuperQ SuperQ merged commit 12f8494 into master Jan 10, 2017
@SuperQ SuperQ deleted the superq/edac_mc branch January 10, 2017 09:42
@SuperQ SuperQ mentioned this pull request Jan 15, 2017
tamcore pushed a commit to gitgrave/node_exporter that referenced this pull request Oct 22, 2024
Signed-off-by: prombot <prometheus-team@googlegroups.com>