Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Update OpenCensus version. #4494

Merged
merged 16 commits into from
Nov 10, 2018

Conversation

songy23
Copy link
Contributor

@songy23 songy23 commented May 22, 2018

Related to census-instrumentation/opencensus-java#1207.

OpenCensus has updated the RPC constants in Specs recently, and in OpenCensus-Java these new constants are added since v0.13.0. This PR updates gRPC to use the latest OpenCensus version, and record stats against the new RPC constants.

Note that gRPC also needs to record stats against the deprecated constants, to keep backwards compatibility.

/cc @bogdandrutu @dinooliva @sebright

@songy23 songy23 force-pushed the opencensus-rpc-constants branch 4 times, most recently from 5be264f to 636d393 Compare May 23, 2018 16:31
Copy link
Contributor

@sebright sebright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to update filtering for serialized tags?

@@ -1891,22 +1897,36 @@ private void assertServerStatsTrace(String method, Status.Code code,
}
}

private static void checkStartTags(MetricsRecord record, String methodName) {
private static void checkStartTags(MetricsRecord record, String methodName, boolean isClient) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Since there are many calls to checkStartTags and checkEndTags that are far from the definition, it might help to use an enum with CLIENT and SERVER values instead of a boolean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense to me, though there's other method in this class that already used a similar boolean check...

Copy link
Contributor Author

@songy23 songy23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have filtering for serialized tags before - I'm not sure if this would be a problem after we added the new tags (client_method, server_method, etc.). @zhangkun83 WDYT?

@@ -1891,22 +1897,36 @@ private void assertServerStatsTrace(String method, Status.Code code,
}
}

private static void checkStartTags(MetricsRecord record, String methodName) {
private static void checkStartTags(MetricsRecord record, String methodName, boolean isClient) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense to me, though there's other method in this class that already used a similar boolean check...

Copy link
Contributor

@zhangkun83 zhangkun83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have filtering for serialized tags before - I'm not sure if this would be a problem after we added the new tags (client_method, server_method, etc.). @zhangkun83 WDYT?

Sorry I don't understand your question. Can you elaborate the problem?

@@ -329,7 +330,9 @@ public void outboundMessage(int seqNo) {
this.parentCtx = checkNotNull(parentCtx);
this.startCtx =
module.tagger.toBuilder(parentCtx)
.put(RpcMeasureConstants.RPC_METHOD, TagValue.create(fullMethodName)).build();
.put(RpcMeasureConstants.RPC_METHOD, TagValue.create(fullMethodName))
.put(RpcMeasureConstants.GRPC_CLIENT_METHOD, TagValue.create(fullMethodName))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to create the TagValue once and re-use it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -410,6 +424,9 @@ void callEnded(Status status) {
.tagger
.toBuilder(startCtx)
.put(RpcMeasureConstants.RPC_STATUS, TagValue.create(status.getCode().toString()))
.put(
RpcMeasureConstants.GRPC_CLIENT_STATUS,
TagValue.create(status.getCode().toString()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -609,6 +634,9 @@ public void streamClosed(Status status) {
.tagger
.toBuilder(parentCtx)
.put(RpcMeasureConstants.RPC_STATUS, TagValue.create(status.getCode().toString()))
.put(
RpcMeasureConstants.GRPC_SERVER_STATUS,
TagValue.create(status.getCode().toString()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -211,9 +211,10 @@ private static EndSpanOptions createEndSpanOptions(
}

private static void recordNetworkEvent(
Span span, NetworkEvent.Type type,
Span span, io.opencensus.trace.NetworkEvent.Type type,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not the time yet to switch to the new class MessageEvent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to do that in a separate PR since it's unrelated to RPC constants. Also, I assume we will still need to record for both NetworkEvent and MessageEvent to be backwards compatible? @bogdandrutu @HailongWen

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API converts between the two classes, so only one should be recorded. I think that the only reason to not migrate to MessageEvent is to keep compatibility with older versions of opencensus-java.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sebright, do you mean old versions of opencensus-java implementations? So presumably, the new Census impl will treat NetworkEvent and MessageEvent as equivalent, then gRPC can switch to MessageEvent after all your customers has upgraded to the new impl.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sebright, do you mean old versions of opencensus-java implementations?

I meant old versions of opencensus-api. I don't think that the version of the OpenCensus implementation should matter to gRPC, as long as it is also compatible with the version of opencensus-api that gRPC uses.

The opencensus-java implementations still only reference NetworkEvent, and the API does the conversion.

@@ -389,6 +392,7 @@ void callEnded(Status status) {
tracer = BLANK_CLIENT_TRACER;
}
MeasureMap measureMap = module.statsRecorder.newMeasureMap()
// TODO(songya): remove the deprecated measure constants once they are completed removed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the migration handled on the consumer side in the Census implementation? For not to break the interoperability between gRPC and Census inside Google, it seems necessary for Census to read both the old and the new measurements, giving the new ones precedence. If that's the case, we could just remove the old measurements here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, inside Google the new measurements are not completely consistent with Census Java, and only the old measurements will be read at this moment. Even for open source, users could still export and read the old measurements so I think it's important to continue recording stats for the old ones.

There are some ongoing discussion about making the new measurements consistent with Census Java. /cc @dinooliva

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The artifact providing the gRPC views has separate methods for registering the old and new views, so users can choose when to upgrade them. It might be easier for users to upgrade if they could get data for both sets of views at the same time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to first change Census impls to convert the old measurements to the new ones, then switch gRPC to record the new measurements after all your customers have switched to the new impl and the new measurements? Then the migration of the measurements can go in line with the migration from NetworkEvent to MessageEvent (see my comment on that). Plus the gRPC code (esp. the tests) would be much cleaner if it ever has to deal with one API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but I thought that we wanted users to explicitly switch to the new views, since the change isn't transparent like the switch to MessageEvent. For example, the stats may be exported to a service that uses the view or measure names.

/cc @bogdandrutu

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, like @sebright said, OpenCensus users could be exporting the old RPC constants to other backends and using them for alerting rules. Since the new and old RPC constants have different names, they will be different metrics on other monitoring systems (Stackdriver, Prometheus). We would like to allow users to explicitly choose old or new RPC constants and update their monitoring accordingly, as in our APIs (registerAllGrpcViews vs registerAllViews). So I think it's import to have stats for both of them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On top of that, as I mentioned in the previous comments, currently it's particularly difficult to map the old RPC constants to the new ones inside Google. For example, there are no *_count measures in the new constants.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you don't have the "started count" measurements in the new constants, but it is a required metric inside Google.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We removed started_count, finished_count and error_count in the new specs. There are some rationale behind why error_count and finished_count gets removed, but I cannot remember why we removed started_count.

@bogdandrutu @Ramonza should we consider putting back the started_count measure since it's required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added *_started_rpcs constants to OpenCensus v0.14.0, and upgraded the version here. Now the started_rpcs are recorded. PTAL at the two latest commits.

Copy link
Contributor Author

@songy23 songy23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're adding more tags to the context, @sebright proposed maybe we need filtering on serialized tags when serializing contexts. We didn't do this before and wondered if the additional tags could be a problem.

@@ -329,7 +330,9 @@ public void outboundMessage(int seqNo) {
this.parentCtx = checkNotNull(parentCtx);
this.startCtx =
module.tagger.toBuilder(parentCtx)
.put(RpcMeasureConstants.RPC_METHOD, TagValue.create(fullMethodName)).build();
.put(RpcMeasureConstants.RPC_METHOD, TagValue.create(fullMethodName))
.put(RpcMeasureConstants.GRPC_CLIENT_METHOD, TagValue.create(fullMethodName))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -389,6 +392,7 @@ void callEnded(Status status) {
tracer = BLANK_CLIENT_TRACER;
}
MeasureMap measureMap = module.statsRecorder.newMeasureMap()
// TODO(songya): remove the deprecated measure constants once they are completed removed.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, inside Google the new measurements are not completely consistent with Census Java, and only the old measurements will be read at this moment. Even for open source, users could still export and read the old measurements so I think it's important to continue recording stats for the old ones.

There are some ongoing discussion about making the new measurements consistent with Census Java. /cc @dinooliva

@@ -410,6 +424,9 @@ void callEnded(Status status) {
.tagger
.toBuilder(startCtx)
.put(RpcMeasureConstants.RPC_STATUS, TagValue.create(status.getCode().toString()))
.put(
RpcMeasureConstants.GRPC_CLIENT_STATUS,
TagValue.create(status.getCode().toString()))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -609,6 +634,9 @@ public void streamClosed(Status status) {
.tagger
.toBuilder(parentCtx)
.put(RpcMeasureConstants.RPC_STATUS, TagValue.create(status.getCode().toString()))
.put(
RpcMeasureConstants.GRPC_SERVER_STATUS,
TagValue.create(status.getCode().toString()))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -211,9 +211,10 @@ private static EndSpanOptions createEndSpanOptions(
}

private static void recordNetworkEvent(
Span span, NetworkEvent.Type type,
Span span, io.opencensus.trace.NetworkEvent.Type type,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to do that in a separate PR since it's unrelated to RPC constants. Also, I assume we will still need to record for both NetworkEvent and MessageEvent to be backwards compatible? @bogdandrutu @HailongWen

@ericgribkoff ericgribkoff added the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label May 24, 2018
@kokoro-team kokoro-team removed the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label May 24, 2018
@songy23
Copy link
Contributor Author

songy23 commented May 24, 2018

Just found I forgot to update repositories.bzl. @ericgribkoff could you please run the kokoro build again?

@ericgribkoff ericgribkoff added the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label May 24, 2018
@kokoro-team kokoro-team removed the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label May 24, 2018
@ericgribkoff ericgribkoff added the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label May 25, 2018
@zhangkun83
Copy link
Contributor

Since we're adding more tags to the context, @sebright proposed maybe we need filtering on serialized tags when serializing contexts. We didn't do this before and wondered if the additional tags could be a problem.

Oh, I thought the method name and status tags were not propagated on the via in the first place. Is it not the case?

Copy link
Contributor Author

@songy23 songy23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I thought the method name and status tags were not propagated on the via in the first place. Is it not the case?

Server-method, server-status and client-status are non-propagating. The only thing we're not sure is client-method. From our interop testing it looks like client-method will be propagated to server.

@@ -389,6 +392,7 @@ void callEnded(Status status) {
tracer = BLANK_CLIENT_TRACER;
}
MeasureMap measureMap = module.statsRecorder.newMeasureMap()
// TODO(songya): remove the deprecated measure constants once they are completed removed.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, like @sebright said, OpenCensus users could be exporting the old RPC constants to other backends and using them for alerting rules. Since the new and old RPC constants have different names, they will be different metrics on other monitoring systems (Stackdriver, Prometheus). We would like to allow users to explicitly choose old or new RPC constants and update their monitoring accordingly, as in our APIs (registerAllGrpcViews vs registerAllViews). So I think it's import to have stats for both of them.

@songy23
Copy link
Contributor Author

songy23 commented May 29, 2018

Besides, it looks like the failed Kokoro Macos build is because the testing machine cannot connect to gradle host:

./gradlew clean -PtargetArch=x86_64 -Pcheckstyle.ignoreFailures=false -PfailOnWarnings=true -PerrorProne=true -Dorg.gradle.parallel=true

Downloading https://services.gradle.org/distributions/gradle-4.7-bin.zip

Exception in thread "main" java.net.NoRouteToHostException: No route to host (Host unreachable)
...

@ericgribkoff ericgribkoff added kokoro:force-run Add this label to a PR to tell Kokoro to re-run all tests. Not generally necessary and removed kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run labels May 29, 2018
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to a PR to tell Kokoro to re-run all tests. Not generally necessary label May 29, 2018
@zhangkun83
Copy link
Contributor

Server-method, server-status and client-status are non-propagating. The only thing we're not sure is client-method. From our interop testing it looks like client-method will be propagated to server.

What's the benefit of propagating client-method? Presumably it would be the same as server-method.

@songy23
Copy link
Contributor Author

songy23 commented May 30, 2018

What's the benefit of propagating client-method? Presumably it would be the same as server-method.

Oh sorry I got this wrong at first. Client-method is actually non-propagating, but server would have method in the context. From Dino's comments in census-ecosystem/opencensus-experiments#18 (comment):

The grpc client doesn't need to propagate the method tag but the grpc server does need to add it to the context where the rpc is being processed so there should be a method tag in scope on the server side.

Also, if the grpc server subsequently makes a call as a grpc client, then the method tag will be propagated since it is in scope (I don't believe gRPC has any logic for explicitly removing the method tag).

And we're wondering if ServerTracerFactory.filterContext() needs to be updated after using the new TagKeys.

@zhangkun83
Copy link
Contributor

Oh sorry I got this wrong at first. Client-method is actually non-propagating, but server would have method in the context. From Dino's comments in census-ecosystem/opencensus-experiments#18 (comment):

The grpc client doesn't need to propagate the method tag but the grpc server does need to add it to the context where the rpc is being processed so there should be a method tag in scope on the server side.

Also, if the grpc server subsequently makes a call as a grpc client, then the method tag will be propagated since it is in scope (I don't believe gRPC has any logic for explicitly removing the method tag).

And we're wondering if ServerTracerFactory.filterContext() needs to be updated after using the new TagKeys.

gRPC puts RPC_METHOD (also GRPC_SERVER_METHOD as added in this PR) in the parentCtx in CensusStatsModule.ServerTracerFactory.newServerStreamTracer. This is already doing what @dinooliva was describing. filterContext() doesn't need to be changed.

@songy23
Copy link
Contributor Author

songy23 commented Jun 4, 2018

#4530 has updated gRPC to use MessageEvent instead of NetworkEvent, and has been submitted. Rebased against that PR (upgrade from 0.12.3 to 0.13.2).

Please hold off this PR until I add started_rpcs to OC Java (census-instrumentation/opencensus-java#1230) and make a 0.14.0 release. I'll update this PR after that.

@sebright
Copy link
Contributor

sebright commented Jun 5, 2018

LGTM

@ericgribkoff ericgribkoff added the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Sep 1, 2018
@kokoro-team kokoro-team removed the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Sep 1, 2018
@songy23
Copy link
Contributor Author

songy23 commented Nov 9, 2018

So sorry I didn't update this thread for a long time. The most recent benchmarks still showed performance regression if we were to record both the new and old constants. For now I remove the new constants from this PR and just update the OpenCensus version (and fix deprecation warnings). Please help me submit this PR first to avoid blocking census-instrumentation/opencensus-java#1563. @zhangkun83

Moving forward about the new constants, I think at some point we should just switch to the new ones and completely get rid of the old constants, both internally and in open source. That will be a breaking change, but even with our initial plan (which is to record both new and old constants for some time and then remove the old constants) there will be a point when we need to make the same breaking change.

@songy23 songy23 changed the title core: Update OpenCensus version and record against the new RPC constants. core: Update OpenCensus version. Nov 9, 2018
@@ -1918,22 +1926,27 @@ private void assertServerStatsTrace(String method, Status.Code code,
}
}

private static void checkStartTags(MetricsRecord record, String methodName) {
private static void checkStartTags(MetricsRecord record, String methodName, Type type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type is used, can you remove it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, removed.

}

private static void checkEndTags(
MetricsRecord record, String methodName, Status.Code status) {
MetricsRecord record, String methodName, Status.Code status, Type type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@zhangkun83 zhangkun83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zhangkun83 zhangkun83 added the kokoro:force-run Add this label to a PR to tell Kokoro to re-run all tests. Not generally necessary label Nov 10, 2018
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to a PR to tell Kokoro to re-run all tests. Not generally necessary label Nov 10, 2018
@zhangkun83 zhangkun83 merged commit 09b13fe into grpc:master Nov 10, 2018
@songy23 songy23 deleted the opencensus-rpc-constants branch November 10, 2018 01:07
@songy23
Copy link
Contributor Author

songy23 commented Nov 10, 2018

Thanks!

@lock lock bot locked as resolved and limited conversation to collaborators Feb 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants