Running Eureka 1.4.6 on an AWS m3.xlarge. It is packaged into a Docker image and run with ECS, though that is possibly not relevant. Every few hours it throws this error:
java.lang.NullPointerException: null
at com.netflix.eureka.util.batcher.TaskExecutors$TaskExecutorMetrics.registerExpiryTimes(TaskExecutors.java:135)
at com.netflix.eureka.util.batcher.TaskExecutors$BatchWorkerRunnable.run(TaskExecutors.java:184)
at java.lang.Thread.run(Thread.java:748)
WARN c.n.e.util.batcher.TaskExecutors Discovery WorkerThread error
At that point the machine's CPU usage steps up from about 2.5% to 10% or 25%. A few hours later (the interval seems to be random) it happens again, and eventually the machine is running at 100%. We then start seeing timeouts in our logs, with heartbeat requests timing out.
We run two Eureka instances, each on a dedicated server, so no other services are running on those servers. The other services (on other servers) are reasonably happy: they get the odd timeout on their heartbeats, but they continue to handle and route requests, which is pretty cool. The errors and CPU saturation on the Eureka servers are a worry, though.
It looks like there's a loop in Eureka that, once it hits the NPE, keeps looping in a thread, hence the incremental jumps in CPU. Has anyone seen this before? I searched, and it does not seem to be a known problem. Thanks for any help.
@Harmoney-RogerParkinson apologies for taking so long to look at this. I have submitted a PR for the fix: #1033.
From the stack trace, I suspect it is some race condition to do with checking the isShutdown flag that results in a null holder, which then NPEs when metrics are computed from it.
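To illustrate the kind of race I mean, here is a minimal sketch. It is not the code from the PR, and every class and method name in it is hypothetical. It assumes the worker polls a queue with a timeout and re-checks a shutdown flag; if the flag flips while the last poll has timed out, the raw result is null, so the sketch swaps in an empty list before anything iterates over it.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; none of these names come from the Eureka code base.
public class NullSafeWorkerSketch {

    // Polls until a batch arrives or shutdown is requested. If the flag flips
    // while the last poll timed out, the raw result is null, so an empty list
    // is returned instead of handing null to the caller.
    static <T> List<T> getWork(BlockingQueue<List<T>> queue, AtomicBoolean isShutdown) {
        List<T> result = null;
        try {
            do {
                result = queue.poll(10, TimeUnit.MILLISECONDS);
            } while (!isShutdown.get() && result == null);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return result == null ? Collections.<T>emptyList() : result;
    }

    // Stand-in for a metrics call that iterates over the batch;
    // it would throw an NPE if it were handed null.
    static <T> int countHolders(List<T> holders) {
        int n = 0;
        for (T ignored : holders) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        BlockingQueue<List<String>> queue = new LinkedBlockingQueue<>();
        AtomicBoolean isShutdown = new AtomicBoolean(true); // shutdown already requested
        List<String> batch = getWork(queue, isShutdown);    // empty queue, poll times out
        System.out.println(countHolders(batch));
    }
}
```

Without the null check on the return value, the same run would pass null into countHolders and fail in the same way as the trace above. Whether this matches what actually happens in TaskExecutors is my assumption from the stack trace only.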