Fix metric sdk when multiple readers are present #4436
Codecov Report
@@ Coverage Diff @@
## main #4436 +/- ##
============================================
- Coverage 90.20% 90.12% -0.08%
+ Complexity 5030 5002 -28
============================================
Files 572 569 -3
Lines 15513 15438 -75
Branches 1497 1488 -9
============================================
- Hits 13994 13914 -80
- Misses 1048 1061 +13
+ Partials 471 463 -8
Continue to review full report at Codecov.
@@ -748,22 +748,21 @@ void viewSdk_capturesBaggageFromContext() {
  }

  @Test
  void sdkMeterProvider_supportsMultipleCollectorsCumulative() {
These two tests were written in a way that allowed the bug to hide. Recording additional measurements with attributes reveals the true behavior. As currently written, these tests fail on main.
@@ -41,10 +35,8 @@ public final class SdkMeterProvider implements MeterProvider, Closeable {

  private final ComponentRegistry<SdkMeter> registry;
  private final MeterProviderSharedState sharedState;
  private final Map<CollectionHandle, CollectionInfo> collectionInfoMap;
CollectionHandle and CollectionInfo had overlapping responsibilities. I've replaced them with a single class whose name more clearly conveys its responsibilities: RegisteredReader.
  private final AtomicBoolean isClosed = new AtomicBoolean(false);
  private final AtomicLong lastCollectionTimestamp;
  private final long minimumCollectionIntervalNanos;
The minimumCollectionIntervalNanos we've previously talked about turned out to be central to the bug. Its true purpose appears to be ensuring that, when multiple readers are present, they each receive the same data if they collect within a narrow enough interval of time. However, I believe the mechanism is flawed, as it didn't account for correctness when the readers are on different schedules (a perfectly reasonable scenario).
The refactor removes the need for it and helps reduce complexity.
 * <p>This class is internal and is hence not for public use. Its APIs are unstable and can change
 * at any time.
 */
public class RegisteredReader {
A simple wrapper of MetricReader which is assigned a UUID to allow internal code to differentiate readers.
Later, we may choose to use this class to track when a reader has last collected, which would assist in solving #4400.
As an FYI - you could continue to use CollectionHandle if you wanted, which is a GUID (from the standpoint of the Metrics SDK), and performs similarly to using an integer ID.
They may be less needed given the overall scope here.
If I understand correctly, CollectionHandle and CollectionInfo are both somewhat involved in identifying a unique reader. I didn't see the need to have two classes for that concept, and figured renaming to RegisteredReader was more representative of what the class does. That is, it represents a unique registered reader. Internal components can rely on hashCode() and equals() that are unique among readers, and any metadata that needs to be stored with the reader can be associated with the RegisteredReader.
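To make the identity point concrete, here is a minimal sketch of a wrapper whose equals() and hashCode() depend only on a UUID assigned at registration. It is not the class from this PR: `RegisteredReaderSketch`, `Reader`, and `Registered` are hypothetical names, and `Reader` merely stands in for MetricReader.

```java
import java.util.UUID;

public class RegisteredReaderSketch {
  /** Hypothetical stand-in for MetricReader. */
  interface Reader {}

  /** Hypothetical wrapper: identity comes from a UUID assigned at registration. */
  static final class Registered {
    private final UUID id = UUID.randomUUID();
    private final Reader reader;

    Registered(Reader reader) {
      this.reader = reader;
    }

    Reader getReader() {
      return reader;
    }

    @Override
    public boolean equals(Object other) {
      // Two registrations are equal only if they carry the same UUID.
      return other instanceof Registered && id.equals(((Registered) other).id);
    }

    @Override
    public int hashCode() {
      return id.hashCode();
    }
  }

  public static void main(String[] args) {
    Reader reader = new Reader() {};
    Registered first = new Registered(reader);
    Registered second = new Registered(reader);
    // Same underlying reader, but each registration is a distinct map key.
    System.out.println("equal registrations: " + first.equals(second));
  }
}
```

Because the wrapper is a stable map key, per-reader metadata (for example a last-collection timestamp) could later be stored on it or keyed by it.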
    this.aggregationTemporality =
        registeredReader
            .getReader()
            .getAggregationTemporality(metricDescriptor.getSourceInstrument().getType());
A small optimization to calculate the temporality once instead of each time a collection occurs.
@@ -38,7 +38,7 @@ public class MetricStorageRegistry {
   private final Map<MetricDescriptor, MetricStorage> registry = new HashMap<>();

   /** Returns a {@link Collection} of the registered {@link MetricStorage}. */
-  public Collection<MetricStorage> getMetrics() {
+  public Collection<MetricStorage> getStorages() {
Rename for improved clarity.
Makes sense. It could even be something like "getAllStorages" to make it clear that all registered ones will be returned.
It may also make sense to change the JavaDocs, e.g. line 48 to "Registers the storage..." (I can't add a review comment to these lines directly).
   }

   public InstrumentDescriptor getInstrumentDescriptor() {
     return instrumentDescriptor;
   }

-  void invokeCallback() {
+  void invokeCallback(RegisteredReader reader) {
Is this method synchronized in some fashion?
At the moment, all collections across all readers are synchronized such that only one happens at a time. This is controlled in MeterSharedState, which obtains collectLock during collection.
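As a rough illustration of that locking pattern (not the actual MeterSharedState code), the sketch below runs every callback while holding a single lock, so two readers collecting at once are serialized. `CollectLockSketch`, `SharedState`, `registerCallback`, and `collectAll` are hypothetical names; only the `collectLock` name is taken from the comment above.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectLockSketch {
  /** Hypothetical stand-in for the shared state that owns the collect lock. */
  static final class SharedState {
    private final Object collectLock = new Object();
    private final List<Runnable> callbacks = new ArrayList<>();

    void registerCallback(Runnable callback) {
      synchronized (collectLock) {
        callbacks.add(callback);
      }
    }

    /** Runs every callback while holding the lock; returns how many ran. */
    int collectAll() {
      synchronized (collectLock) {
        for (Runnable callback : callbacks) {
          callback.run();
        }
        return callbacks.size();
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    SharedState state = new SharedState();
    int[] invocations = {0}; // only mutated inside callbacks, which run under collectLock
    state.registerCallback(() -> invocations[0]++);

    // Two readers collecting concurrently are serialized by the lock.
    Thread readerA = new Thread(state::collectAll);
    Thread readerB = new Thread(state::collectAll);
    readerA.start();
    readerB.start();
    readerA.join();
    readerB.join();
    System.out.println("callback invocations: " + invocations[0]);
  }
}
```

Under this assumption, a callback never runs for two readers at the same time, so it needs no synchronization of its own.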
...c/main/java/io/opentelemetry/sdk/metrics/internal/state/DefaultSynchronousMetricStorage.java
sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/internal/export/RegisteredReader.java
sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/SdkMeterProvider.java
Looks solid, some minor comments from my side.
…y-java into fix-multiple-readers
I’ve found a rather important conceptual flaw in the metrics SDK that prevents proper function when multiple metric readers are registered.
The root of the problem is that there is one metric storage per instrument and matching view. This single metric storage is shared among multiple metric readers, each of which resets the storage after collection. The result is that readers interfere with each other and each receives only partial measurements.
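The interference can be reproduced in a few lines. This is a simplified sketch and not SDK code: `SharedStorageDemo` and `Storage` are hypothetical, and the storage is reduced to a map of attribute strings to delta sums that is cleared on every collection.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedStorageDemo {
  /** Hypothetical delta storage: collecting returns the accumulated sums and resets them. */
  static final class Storage {
    private final Map<String, Long> sums = new HashMap<>();

    void record(String attributes, long value) {
      sums.merge(attributes, value, Long::sum);
    }

    Map<String, Long> collectAndReset() {
      Map<String, Long> snapshot = new HashMap<>(sums);
      sums.clear();
      return snapshot;
    }
  }

  public static void main(String[] args) {
    Storage shared = new Storage(); // one storage shared by readers A and B

    shared.record("color=red", 5);
    Map<String, Long> seenByA = shared.collectAndReset(); // A receives the 5 and resets

    shared.record("color=red", 3);
    Map<String, Long> seenByB = shared.collectAndReset(); // B receives only the 3

    // 8 was recorded in total, but B never observes the first 5 because A's
    // collection already reset the shared storage.
    System.out.println("reader A: " + seenByA);
    System.out.println("reader B: " + seenByB);
  }
}
```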
The solution in this PR is to create one metric storage per reader per instrument and matching view. When measurements are recorded to an instrument, they accumulate in each registered reader's storage. When a reader collects metrics, it reads and resets only the storages associated with it. Along the way, I was able to remove a fair amount of complexity.
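A minimal sketch of the per-reader layout follows. It assumes a single counter with delta sums and no thread safety, and the names (`PerReaderStorageDemo`, `Reader`, `Counter`) are hypothetical rather than the classes in this PR.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class PerReaderStorageDemo {
  /** Hypothetical stand-in for a registered reader; identity is plain object identity. */
  static final class Reader {
    final String name;

    Reader(String name) {
      this.name = name;
    }
  }

  /** Hypothetical instrument that keeps one delta storage per registered reader. */
  static final class Counter {
    private final Map<Reader, Map<String, Long>> storages = new LinkedHashMap<>();

    Counter(Reader... readers) {
      for (Reader reader : readers) {
        storages.put(reader, new HashMap<>());
      }
    }

    /** Every measurement accumulates in each reader's storage. */
    void add(String attributes, long value) {
      for (Map<String, Long> sums : storages.values()) {
        sums.merge(attributes, value, Long::sum);
      }
    }

    /** Reads and resets only the storage that belongs to the given reader. */
    Map<String, Long> collectAndReset(Reader reader) {
      Map<String, Long> sums = storages.get(reader);
      Map<String, Long> snapshot = new HashMap<>(sums);
      sums.clear();
      return snapshot;
    }
  }

  public static void main(String[] args) {
    Reader a = new Reader("a");
    Reader b = new Reader("b");
    Counter counter = new Counter(a, b);

    counter.add("color=red", 5);
    System.out.println("reader a: " + counter.collectAndReset(a)); // resets only a's storage
    counter.add("color=red", 3);
    System.out.println("reader b: " + counter.collectAndReset(b)); // b still sees both measurements
  }
}
```

The trade-off in this sketch is that recording cost and memory grow with the number of readers, in exchange for each reader seeing every measurement regardless of when the others collect.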
I discovered this issue while investigating adding support for allowing metric readers (and exporters) to specify their own default aggregation for each instrument type. This solution paves the way for that as well.