Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NPE while recovering the Threadlocals in Reactor #898

Open
jensbaitingerbosch opened this issue Dec 3, 2024 · 9 comments
Open

NPE while recovering the Threadlocals in Reactor #898

jensbaitingerbosch opened this issue Dec 3, 2024 · 9 comments

Comments

@jensbaitingerbosch
Copy link

jensbaitingerbosch commented Dec 3, 2024

Hi,

we observe the following Error message in our Prod system, unfortunately without any hint where it is coming from. I don't know if it is the 'fault' of micrometer-core or micrometer-tracing.

On one hand you could argue that a KeyValue within a KeyValues should not never be null, on the other hand, the null vaulue must probably come from RevertingScope.matchingBaggageKeyValues()

Operator called default onErrorDropped java.lang.NullPointerException: Cannot invoke "io.micrometer.common.KeyValue.getKey()" because "o" is null
	at io.micrometer.common.KeyValue.compareTo(KeyValue.java:99)
	at io.micrometer.common.KeyValues.isSortedSet(KeyValues.java:76)
	at io.micrometer.common.KeyValues.make(KeyValues.java:92)
	at io.micrometer.common.KeyValues.of(KeyValues.java:365)
	at io.micrometer.observation.Observation$Context.getLowCardinalityKeyValues(Observation.java:1210)
	at io.micrometer.observation.Observation$Context.getAllKeyValues(Observation.java:1232)
	at io.micrometer.tracing.handler.RevertingScope.matchingBaggageKeyValues(RevertingScope.java:114)
	at io.micrometer.tracing.handler.RevertingScope.maybeWithBaggage(RevertingScope.java:81)
	at io.micrometer.tracing.handler.TracingObservationHandler.setMaybeScopeOnTracingContext(TracingObservationHandler.java:100)
	at io.micrometer.tracing.handler.TracingObservationHandler.onScopeOpened(TracingObservationHandler.java:82)
	at io.micrometer.observation.ObservationHandler$FirstMatchingCompositeObservationHandler.onScopeOpened(ObservationHandler.java:173)
	at io.micrometer.observation.SimpleObservation.notifyOnScopeOpened(SimpleObservation.java:235)
	at io.micrometer.observation.SimpleObservation.openScope(SimpleObservation.java:194)
	at io.micrometer.observation.contextpropagation.ObservationThreadLocalAccessor.setValue(ObservationThreadLocalAccessor.java:117)
	at io.micrometer.observation.contextpropagation.ObservationThreadLocalAccessor.setValue(ObservationThreadLocalAccessor.java:33)
	at io.micrometer.context.DefaultContextSnapshot.setThreadLocal(DefaultContextSnapshot.java:102)
	at io.micrometer.context.DefaultContextSnapshotFactory.setAllThreadLocalsFrom(DefaultContextSnapshotFactory.java:125)
	at io.micrometer.context.DefaultContextSnapshotFactory.setThreadLocalsFrom(DefaultContextSnapshotFactory.java:109)
	at reactor.core.publisher.ContextPropagation.setThreadLocals(ContextPropagation.java:85)
	at reactor.core.publisher.FluxContextWriteRestoringThreadLocals$ContextWriteRestoringThreadLocalsSubscriber.onComplete(FluxContextWriteRestoringThreadLocals.java:148)

The issue occurs since the update to Spring Boot 3.4.0, Springframework 6.2.0, Micrometer 1.14.1, Micrometer Tracing 1.4.0

@marcingrzejszczak
Copy link
Contributor

Summoning @chemicL

@chemicL
Copy link
Contributor

chemicL commented Dec 3, 2024

Can you please list the versions of the libraries? Did it start happening after an upgrade? Are you able to provide a reproducer?

@jensbaitingerbosch
Copy link
Author

I updated the main ticket with the versions. Unfortunately I could not reproduce it yet. I occurs about 1-3 times per hour while the particular application processes about 60-70 requests per seconds / 240k requests per hour, so it is pretty rare relative to the amount of requests.

I also saw that issue on our QA system, where we only have artificial traffic from our tests, so my current assumption is that the issue is not caused by some obscure data that only occurs on production. My plan is to start a debugging instance on the QA system and hope to catch the issue there, which me luck :-)

@chemicL
Copy link
Contributor

chemicL commented Dec 4, 2024

Thanks! So these are the latest versions it seems and reactor-core 3.7.0 would be used. Did this start appearing after a bump from some other Boot version? From the reactor perspective I can't point to any recent change that could have caused this and one that deals with KeyValues is not released yet ;)

@marcingrzejszczak perhaps there were some recent changes in Micrometer around this area?

@marcingrzejszczak
Copy link
Contributor

I haven't touched micrometer for the last couple of months cc @jonatan-ivanov @shakuzen

@shakuzen
Copy link
Member

shakuzen commented Dec 5, 2024

It looks similar to micrometer-metrics/micrometer#4356 which seems to be reported more frequently lately.

@chemicL
Copy link
Contributor

chemicL commented Dec 5, 2024

@shakuzen @marcingrzejszczak the related change that seems to have been introduced by 1.4.0 is #455 and in fact it touches the mentioned RevertingScope.matchingBaggageKeyValues() in this report. Perhaps there's a bug in there somewhere and some unit tests can be added for it?

@chemicL
Copy link
Contributor

chemicL commented Dec 5, 2024

@shakuzen @marcingrzejszczak @jonatan-ivanov I documented my investigations in micrometer-metrics/micrometer#4356 (comment) - hope this helps.

@shakuzen
Copy link
Member

We expect this should be fixed with this change in Micrometer, released in 1.14.2, which tracing 1.4.1 uses. Could people affected by this please try with the latest versions to confirm if this is still an issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants