Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix metric sdk when multiple readers are present #4436

Merged
merged 4 commits into from
May 13, 2022

Conversation

jack-berg
Copy link
Member

I’ve found a rather important conceptual flaw in the metrics SDK that prevents proper function when multiple metric readers are registered.

The problem is rooted in that there is 1 metric storage per instrument and matching view. This single metric storage is shared among multiple metric readers which each reset the storage after collection. The result is readers interfering with each other and each receiving partial measurements.

The solution in this PR is to create 1 metric storage per reader per instrument and matching view. When measurements are recorded to an instrument, they accumulate to each registered readers storage. When a reader collects metrics, it reads and resets only from the storages associated with it. Along the way, I was able to remove a fair amount of complexity.

I discovered this issue while investigating adding support for allowing metric readers (and exporters) to specify their own default aggregation for each instrument type. This solution paves the way for that as well.

@jack-berg jack-berg requested a review from a user May 4, 2022 21:20
@jack-berg jack-berg requested a review from Oberon00 as a code owner May 4, 2022 21:20
@codecov
Copy link

codecov bot commented May 4, 2022

Codecov Report

Merging #4436 (ab11d74) into main (078d55a) will decrease coverage by 0.07%.
The diff coverage is 84.86%.

@@             Coverage Diff              @@
##               main    #4436      +/-   ##
============================================
- Coverage     90.20%   90.12%   -0.08%     
+ Complexity     5030     5002      -28     
============================================
  Files           572      569       -3     
  Lines         15513    15438      -75     
  Branches       1497     1488       -9     
============================================
- Hits          13994    13914      -80     
- Misses         1048     1061      +13     
+ Partials        471      463       -8     
Impacted Files Coverage Δ
...try/sdk/metrics/internal/SdkMeterProviderUtil.java 77.77% <ø> (+0.63%) ⬆️
...sdk/metrics/internal/state/EmptyMetricStorage.java 0.00% <0.00%> (-45.46%) ⬇️
...etry/sdk/metrics/internal/state/MetricStorage.java 0.00% <ø> (-100.00%) ⬇️
.../metrics/internal/state/MetricStorageRegistry.java 92.59% <ø> (ø)
.../sdk/metrics/internal/export/RegisteredReader.java 71.42% <71.42%> (ø)
...nternal/state/DefaultSynchronousMetricStorage.java 78.57% <81.39%> (-4.36%) ⬇️
...k/metrics/internal/state/CallbackRegistration.java 97.43% <93.75%> (-2.57%) ⬇️
...y/sdk/metrics/internal/state/MeterSharedState.java 97.80% <94.28%> (-1.01%) ⬇️
...in/java/io/opentelemetry/sdk/metrics/SdkMeter.java 100.00% <100.00%> (ø)
...io/opentelemetry/sdk/metrics/SdkMeterProvider.java 100.00% <100.00%> (+5.55%) ⬆️
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 078d55a...ab11d74. Read the comment docs.

@@ -748,22 +748,21 @@ void viewSdk_capturesBaggageFromContext() {
}

@Test
void sdkMeterProvider_supportsMultipleCollectorsCumulative() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two tests were written in a way that allowed the bug to hide. Recording additional measurements with attributes reveals the true behavior. These tests as currently written fail on main.

@@ -41,10 +35,8 @@ public final class SdkMeterProvider implements MeterProvider, Closeable {

private final ComponentRegistry<SdkMeter> registry;
private final MeterProviderSharedState sharedState;
private final Map<CollectionHandle, CollectionInfo> collectionInfoMap;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CollectionHandle and CollectionInfo had overlapping responsibilities. I've replaced them with a single class with a name which more clearly conveys its responsibilities: RegisteredReader.

private final AtomicBoolean isClosed = new AtomicBoolean(false);
private final AtomicLong lastCollectionTimestamp;
private final long minimumCollectionIntervalNanos;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The minimumCollectionIntervalNanos we've previously talked about turned out to be central to the bug. It's true purpose appears to be trying to ensure that when multiple readers are present they each receive the same data if they collect within a narrow enough interval of time. However, I believe the mechanism to be flawed as it didn't account for correctness when the readers are on different schedules (a perfectly reasonable scenario).

The refactor negates the need for it and assists in reducing complexity.

* <p>This class is internal and is hence not for public use. Its APIs are unstable and can change
* at any time.
*/
public class RegisteredReader {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple wrapper of MetricReader which is assigned a UUID to allows internal code to differentiate readers.

Later, we may choose to use this class to track when a reader has last collected, which would assist in solving #4400.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an FYI - you could continue to use CollectionHandle if you wanted, which is GUID (from the standpoint of the Metrics SDK), and performs similarly to using an integer ID.

They may be less needed given the overall scope here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, CollectionHandle and CollectionInfo are both somewhat involved in identifying a unique reader. I didn't see the need to have two classes for that concept, and figured renaming to RegisteredReader was more representative of what the class does. That is, it represents a unique registered reader. Internal components can rely on hashCode() and equals() that are unique among readers, and any meta data that needs to be stored with the reader can be associated with the RegisteredReader.

this.aggregationTemporality =
registeredReader
.getReader()
.getAggregationTemporality(metricDescriptor.getSourceInstrument().getType());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small optimization to calculate the temporality one instead of each time a collection occurs.

@@ -38,7 +38,7 @@ public class MetricStorageRegistry {
private final Map<MetricDescriptor, MetricStorage> registry = new HashMap<>();

/** Returns a {@link Collection} of the registered {@link MetricStorage}. */
public Collection<MetricStorage> getMetrics() {
public Collection<MetricStorage> getStorages() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename for improved clarify.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. It could be even something like "getAllStorages" to make it clear that all registered ones will be returned.
It may also make sense to change JavaDocs - eg line 48 to "Registers the storage..." (I can't add review comment to these lines directly).

@jack-berg jack-berg mentioned this pull request May 5, 2022
* <p>This class is internal and is hence not for public use. Its APIs are unstable and can change
* at any time.
*/
public class RegisteredReader {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an FYI - you could continue to use CollectionHandle if you wanted, which is GUID (from the standpoint of the Metrics SDK), and performs similarly to using an integer ID.

They may be less needed given the overall scope here.

}

public InstrumentDescriptor getInstrumentDescriptor() {
return instrumentDescriptor;
}

void invokeCallback() {
void invokeCallback(RegisteredReader reader) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this method synchronized in some fashion?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment all collections across all readers are synchronized such that only one happens at a time. This is controlled in MeterSharedState, which obtains collectLock during collection.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks solid, some minor comments from my side.

@jack-berg jack-berg merged commit 8659a82 into open-telemetry:main May 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants