NullPointerException Discovery WorkerThread Error 100% CPU #1014

Closed
Harmoney-RogerParkinson opened this issue Nov 15, 2017 · 2 comments

@Harmoney-RogerParkinson

Running Eureka 1.4.6 on an AWS m3.xlarge. It is packaged into a Docker image and run with ECS, though that is probably not relevant. Every few hours it throws this error:

```
java.lang.NullPointerException: null
	at com.netflix.eureka.util.batcher.TaskExecutors$TaskExecutorMetrics.registerExpiryTimes(TaskExecutors.java:135)
	at com.netflix.eureka.util.batcher.TaskExecutors$BatchWorkerRunnable.run(TaskExecutors.java:184)
	at java.lang.Thread.run(Thread.java:748)
WARN c.n.e.util.batcher.TaskExecutors Discovery WorkerThread error
```

At that point the CPU usage of the machine takes a step up, from about 2.5% to 10% or 25%. A few hours later (the interval seems to be random) it happens again, and eventually the machine is running at 100%. At that point we start seeing timeouts in our logs: heartbeat requests timing out.

We run two Eureka instances, each on a dedicated server, so no other services are running on those machines. The other services (on other servers) are reasonably happy: they get the odd timeout on their heartbeats but continue to handle and route requests, which is pretty cool. The errors and CPU saturation on the Eureka servers are a worry, though.

It looks like there's a loop in Eureka that, once it hits the NPE, keeps looping in a thread, hence the incremental jumps in CPU. Has anyone seen this before? I searched and it does not seem to be a known problem. Thanks for any help.
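For anyone trying to picture the behaviour, here is a minimal sketch of the pattern I suspect. The names are hypothetical, not Eureka's actual code: if the worker loop catches the recurring exception and retries immediately, with no sleep or backoff, each broken worker pins roughly one core, which would match the step-wise CPU jumps.

```java
// Hypothetical sketch of the suspected failure mode (not Eureka's code):
// once processNextBatch() starts throwing on every call, the catch block
// logs and the loop retries immediately, so the thread spins at full speed.
public class SpinningWorker implements Runnable {
    private volatile boolean shutdown = false;

    @Override
    public void run() {
        while (!shutdown) {
            try {
                processNextBatch();
            } catch (Exception e) {
                // No sleep/backoff here: a deterministic failure turns this
                // into a busy loop, pinning roughly one core per broken worker.
                System.err.println("Discovery WorkerThread error: " + e);
            }
        }
    }

    private void processNextBatch() {
        throw new NullPointerException(); // stands in for the recurring NPE
    }
}
```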

qiangdavidliu added a commit to qiangdavidliu/eureka that referenced this issue Feb 15, 2018
@qiangdavidliu (Contributor)
@Harmoney-RogerParkinson apologies for taking so long to take a look at this. I have submitted a PR with the fix: #1033.

From the stack trace I suspect it is a race condition around checking the isShutdown flag that results in a null task holder, which then NPEs when metrics are computed from it.
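Roughly, the shape of the race looks like the sketch below. The names follow the stack trace, but the bodies are assumptions for illustration, not the actual Eureka code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical reconstruction of the suspected race: the worker polls for
// work with a timeout while also honouring a shutdown flag. If the poll
// times out (or shutdown is flagged mid-wait), getWork() can return null,
// and passing that null straight into the metrics call throws the NPE seen
// at TaskExecutorMetrics.registerExpiryTimes.
class BatchWorker implements Runnable {
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
    private volatile boolean isShutdown = false;

    private Object getWork() throws InterruptedException {
        // Returns null when nothing arrives before the timeout.
        return queue.poll(1, TimeUnit.SECONDS);
    }

    @Override
    public void run() {
        while (!isShutdown) {
            try {
                Object holder = getWork();
                // Missing null check: if the shutdown/timeout race yields a
                // null holder, this call dereferences it and NPEs. The fix in
                // PR #1033 guards against this case before computing metrics.
                registerExpiryTimes(holder);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (Exception e) {
                System.err.println("Discovery WorkerThread error: " + e);
            }
        }
    }

    private void registerExpiryTimes(Object holder) {
        holder.hashCode(); // stands in for reading timestamps off the holder
    }
}
```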

qiangdavidliu added a commit that referenced this issue Feb 15, 2018
@qiangdavidliu (Contributor)

Merged and will be released in the next release (should be soon)
