Add ABRT adaptor config #105

Merged
merged 1 commit into from
Mar 29, 2017

Conversation

Contributor

@juliusmilan juliusmilan commented Mar 23, 2017

This config allows NPD to catch problems detected by ABRT. ABRT can find various kinds of problems (see the link) and logs them to the systemd journal in the format described here:
https://github.com/abrt/abrt/wiki/systemd-journal-catalog-messages

Work sequence:

  1. ABRT processes a problem and logs it to the journal
  2. NPD catches the ABRT log entries and reports them to the upper layer
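
As a rough illustration of step 2, the sketch below shows how a rule could map a journal message to an event reason. It is written in Python purely for illustration (NPD itself is written in Go), and the pattern is an assumption, not copied from config/abrt-adaptor.json. The reason name and the sample message are the ones reported in the test further down this thread.

```python
import re

# Hypothetical rule table. The reason name comes from the event shown in this
# thread; the pattern is an assumption and may differ from the real config.
RULES = [
    {
        "reason": "UncaughtException",
        "pattern": r"Process \d+ \(\S+\) of user \d+ encountered an uncaught \S+ exception",
    },
]


def match_reason(message):
    """Return the reason of the first rule whose pattern matches, else None."""
    for rule in RULES:
        if re.search(rule["pattern"], message):
            return rule["reason"]
    return None


# Sample message taken from the journalctl line quoted in this thread.
line = ("Process 24711 (will_python_raise) of user 1000 "
        "encountered an uncaught ZeroDivisionError exception")
print(match_reason(line))
```

A message that matches no rule returns None, which corresponds to NPD simply ignoring that log entry.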


@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 23, 2017
@Random-Liu
Member

@juliusmilan This is cool! Have you verified this config?

@dchen1107

@juliusmilan
Contributor Author

@Random-Liu Thank you. Yes, I verified it for all kinds of ABRT messages present in the config.

@Random-Liu
Member

LGTM.

@juliusmilan So you ran NPD in a Kubernetes cluster and saw node events being generated, right?

@Random-Liu Random-Liu added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 27, 2017
@Random-Liu
Member

Will merge after your confirmation. :)

@dchen1107
Member

Thanks for the PR to extend the usability of NPD.

@juliusmilan
Contributor Author

@Random-Liu I tested it using netcat in the following manner.
I started netcat on localhost:
$ nc -l -k 127.0.0.1 5000
Then I created a crash for ABRT to catch, using will_python_raise, will_abort, ... (from the will_crash package: simple test programs, each of which just crashes in some way):
$ will_python_raise
After this, the following line appeared in journalctl:
abrt-notification[24992]: Process 24711 (will_python_raise) of user 1000 encountered an uncaught ZeroDivisionError exception
Then I ran NPD:
$ ./bin/node-problem-detector --apiserver-override=http://127.0.0.1:5000?inClusterConfig=false --system-log-monitors=config/abrt-adaptor.json

This is what netcat caught (so I knew the config works):

POST /api/v1/namespaces/default/events HTTP/1.1
Host: 127.0.0.1:5000
User-Agent: node-problem-detector/v1.4.0 (linux/amd64) kubernetes/$Format
Content-Length: 572
Accept: application/json, */*
Content-Type: application/json
Accept-Encoding: gzip

{"kind":"Event","apiVersion":"v1","metadata":{"name":"dhcp-24-175.brq.redhat.com.14aff755624bb494","namespace":"default","creationTimestamp":null},"involvedObject":{"kind":"Node","name":"dhcp-24-175.brq.redhat.com","uid":"dhcp-24-175.brq.redhat.com"},"reason":"UncaughtException","message":"Process 24711 (will_python_raise) of user 1000 encountered an uncaught ZeroDivisionError exception","source":{"component":"abrt-adaptor","host":"dhcp-24-175.brq.redhat.com"},"firstTimestamp":"2017-03-28T06:19:07Z","lastTimestamp":"2017-03-28T06:19:07Z","count":1,"type":"Warning"}
PATCH /api/v1/nodes/dhcp-24-175.brq.redhat.com/status HTTP/1.1
Host: 127.0.0.1:5000
User-Agent: node-problem-detector/v1.4.0 (linux/amd64) kubernetes/$Format
Content-Length: 28
Accept: application/json, */*
Content-Type: application/strategic-merge-patch+json
Accept-Encoding: gzip

{"status":{"conditions":[]}}

I ran a similar test for every kind of problem I added to config/abrt-adaptor.json.
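
For anyone repeating this check, the manual inspection of the netcat output could be scripted roughly as follows. This is only a sketch: the function name check_event is hypothetical, and the body is the event captured above, trimmed to the fields being checked.

```python
import json

# Event body from the netcat capture above, trimmed to the checked fields.
captured = (
    '{"kind":"Event",'
    '"involvedObject":{"kind":"Node","name":"dhcp-24-175.brq.redhat.com"},'
    '"reason":"UncaughtException",'
    '"source":{"component":"abrt-adaptor","host":"dhcp-24-175.brq.redhat.com"},'
    '"count":1,"type":"Warning"}'
)


def check_event(body, expected_reason, expected_component):
    """Return a list of mismatches found in the event body (empty if none)."""
    event = json.loads(body)
    problems = []
    if event.get("kind") != "Event":
        problems.append("kind is not Event")
    if event.get("involvedObject", {}).get("kind") != "Node":
        problems.append("involved object is not a Node")
    if event.get("reason") != expected_reason:
        problems.append("unexpected reason: %r" % event.get("reason"))
    if event.get("source", {}).get("component") != expected_component:
        problems.append("unexpected source component")
    return problems


# An empty list means the captured event has the expected shape.
print(check_event(captured, "UncaughtException", "abrt-adaptor"))
```

Swapping in a different expected reason makes it easy to repeat the check for each rule in the config.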

@Random-Liu
Member

Cool~ LGTM

@Random-Liu Random-Liu merged commit 9c29602 into kubernetes:master Mar 29, 2017
@Random-Liu
Member

@juliusmilan Merged! Thanks for the improvement!

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged.