NullPointerException Discovery WorkerThread Error 100% CPU #1014

Closed
Harmoney-RogerParkinson opened this issue Nov 15, 2017 · 2 comments

@Harmoney-RogerParkinson

Running Eureka 1.4.6 on an AWS m3.xlarge. It is packaged into a Docker image and run with ECS, though that is probably not relevant. Every few hours it throws this error:

```
java.lang.NullPointerException: null
	at com.netflix.eureka.util.batcher.TaskExecutors$TaskExecutorMetrics.registerExpiryTimes(TaskExecutors.java:135)
	at com.netflix.eureka.util.batcher.TaskExecutors$BatchWorkerRunnable.run(TaskExecutors.java:184)
	at java.lang.Thread.run(Thread.java:748)
WARN c.n.e.util.batcher.TaskExecutors Discovery WorkerThread error
```

At that point the CPU usage of the machine takes a step up, from about 2.5% to 10% or 25%. A few hours later (the interval seems to be random) it happens again, and eventually the machine is running at 100%. At that point we start seeing timeouts in our logs: heartbeat requests timing out.

We run two Eureka instances, each on a dedicated server, so no other services are running on those machines. The other services (on other servers) are reasonably happy: they get the odd timeout on their heartbeats but continue to handle and route requests, which is pretty cool. The errors and CPU saturation on the Eureka servers are a worry, though.

It looks like there's a loop in Eureka that, once it hits the NPE, keeps looping in a thread, hence the incremental jumps in CPU. Has anyone seen this before? I searched and it does not seem to be a known problem. Thanks for any help.
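For anyone trying to picture the behaviour, here is a minimal sketch of the pattern I suspect. The names are hypothetical, not Eureka's actual code: if the worker loop catches the recurring exception and retries immediately, with no sleep or backoff, each broken worker pins roughly one core, which would match the step-wise CPU jumps.

```java
// Hypothetical sketch of the suspected failure mode (not Eureka's code):
// once processNextBatch() starts throwing on every call, the catch block
// logs and the loop retries immediately, so the thread spins at full speed.
public class SpinningWorker implements Runnable {
    private volatile boolean shutdown = false;

    @Override
    public void run() {
        while (!shutdown) {
            try {
                processNextBatch();
            } catch (Exception e) {
                // No sleep/backoff here: a deterministic failure turns this
                // into a busy loop, pinning roughly one core per broken worker.
                System.err.println("Discovery WorkerThread error: " + e);
            }
        }
    }

    private void processNextBatch() {
        throw new NullPointerException(); // stands in for the recurring NPE
    }
}
```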

qiangdavidliu added a commit to qiangdavidliu/eureka that referenced this issue Feb 15, 2018
@qiangdavidliu (Contributor)
@Harmoney-RogerParkinson apologies for taking so long to take a look at this. I have submitted a PR with the fix: #1033.

From the stack trace I suspect it is a race condition around checking the isShutdown flag that results in a null task holder, which then NPEs when metrics are computed from it.
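Roughly, the shape of the race looks like the sketch below. The names follow the stack trace, but the bodies are assumptions for illustration, not the actual Eureka code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical reconstruction of the suspected race: the worker polls for
// work with a timeout while also honouring a shutdown flag. If the poll
// times out (or shutdown is flagged mid-wait), getWork() can return null,
// and passing that null straight into the metrics call throws the NPE seen
// at TaskExecutorMetrics.registerExpiryTimes.
class BatchWorker implements Runnable {
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
    private volatile boolean isShutdown = false;

    private Object getWork() throws InterruptedException {
        // Returns null when nothing arrives before the timeout.
        return queue.poll(1, TimeUnit.SECONDS);
    }

    @Override
    public void run() {
        while (!isShutdown) {
            try {
                Object holder = getWork();
                // Missing null check: if the shutdown/timeout race yields a
                // null holder, this call dereferences it and NPEs. The fix in
                // PR #1033 guards against this case before computing metrics.
                registerExpiryTimes(holder);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (Exception e) {
                System.err.println("Discovery WorkerThread error: " + e);
            }
        }
    }

    private void registerExpiryTimes(Object holder) {
        holder.hashCode(); // stands in for reading timestamps off the holder
    }
}
```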

qiangdavidliu added a commit that referenced this issue Feb 15, 2018
@qiangdavidliu (Contributor)

Merged and will be released in the next release (should be soon)
