-
Notifications
You must be signed in to change notification settings - Fork 635
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/tracecontext integration test #228
Feature/tracecontext integration test #228
Conversation
there are two outstanding issues (in addition to the ows failing test case that I believed needs to be fixed upstream):
I am working toward fixing these issues. |
@@ -44,60 +44,6 @@ def test_no_traceparent_header(self): | |||
span_context = FORMAT.extract(get_as_list, output) | |||
self.assertTrue(isinstance(span_context, trace.SpanContext)) | |||
|
|||
def test_from_headers_tracestate_entry_limit(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these tests were invalid behaviors in reference to the spec. I removed them as the integration test will validate it is working as intended.
@@ -304,6 +305,31 @@ def format_span_id(span_id: int) -> str: | |||
return "0x{:016x}".format(span_id) | |||
|
|||
|
|||
def generate_span_id() -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the w3c tracecontext spec calls for generating valid span contexts in the case where one cannot parse tracecontext headers. As such it was necessary to lift this code into the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting changes because I think we need to think the change with generate_spancontext through. I think we should rather fix the handling of INVALID_SPANCONTEXT in the SDK. Moving generate_spancontext to the API would break open-telemetry/oteps#58 (even though I'm not sure if that will be merged anytime soon, I'd rather not make it impossible to implement). Note that we encountered this problem already in #226 which uses a different solution for the same problem.
I had the same problem while implementing the Jaeger exporter, I had to add exceptions to the different checkers to solve that. Maybe you can take some inspiration from it: #174. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to my understanding there are three different places where to handle the case when the incoming request doesn't contain valid trace information:
- The propagator (proposed in this PR).
- The integration (proposed in ext/wsgi: use current span when extracting fails #226).
- SDK (what @Oberon00 is proposing).
I'm not sure at this point what is the best place to do that, the only thing I can say is that we have to document it so people implementing it understand it's their responsibility to take care of it.
examples/trace/server.py
Outdated
) | ||
return "hello" | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to move this to something under the tests, it could confuse people that is looking at the example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can move this, just wanted to re-use existing examples. But I see the argument to not pollute short and sweet ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, great to see that we're using the W3C tests.
You might want to rebase to pick up #229 and make sure this still works as expected.
I agree that verify_tracecontext
should move into its own test file, even if it means duplicating code with the example. I also think we should find a way of getting the W3C test code that doesn't involve cloning it again with every run, but I don't know that a git submodule is the best solution.
Are you planning to fix the other test errors before merging?
# clone w3c tracecontext tests | ||
mkdir -p target | ||
rm -rf ./target/trace-context | ||
git clone https://github.com/w3c/trace-context ./target/trace-context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity: why do it this way instead of adding as a submodule?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think submodules work fine here, but generally track a branch vs a commit. I'll look through and may re-add as a followup, or fix it here depending on approvers.
@@ -0,0 +1,26 @@ | |||
#!/bin/bash | |||
# set -e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, will re-add.
@@ -0,0 +1,26 @@ | |||
#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#!/bin/bash | |
#!/usr/bin/env bash |
I think /bin/bash
is usually safe, but better to use env
in case the user wants a different bash
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK only /bin/sh is safe (sh is guaranteed by POSIX to exist)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happy to take lowest common denominator to reduced potential issues. I'll go with /bin/sh.
# send a sigint, to ensure | ||
# it is caught as a KeyboardInterrupt in the | ||
# example service. | ||
kill -2 $EXAMPLE_SERVER_PID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to kill the development server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative way is to run the server, spawn a separate unit test, fetch the test result and shutdown the server. I've explored this here and it seems to work well.
With this approach we don't need to wait for indefinite time and later kill the server process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was trying to find a quick way to do this in bash. I can send a standard kill signal which will kill the server, but at the cost of not calling shutdown.
In general I'll try to vet this a little more. Thanks @reyang for the example, I'll switch to a python script if I can't get the bash one fixed up more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@c24t for clarification, I've resolved the issue that kept the dev server from being killed (just switched it back to a sigkill). Was there something else you think needed to be improved aside from that?
The approach Reiley linked will not print out the traces as the unit tests enact them, which I found was critical to debugging the issues. The current output intermingles them as shown in https://travis-ci.org/open-telemetry/opentelemetry-python/jobs/604220488#L1083
tox.ini
Outdated
basepython: python3.7 | ||
deps = | ||
# needed for tracecontext | ||
aiohttp~=3.6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aiohttp~=3.6 | |
aiohttp~=3.6 |
tox.ini
Outdated
pip install -e {toxinidir}/ext/opentelemetry-ext-http-requests | ||
pip install -e {toxinidir}/ext/opentelemetry-ext-wsgi | ||
|
||
commands = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commands = | |
commands = |
I'm still seeing these errors:
|
opentelemetry-api/src/opentelemetry/context/propagation/tracecontexthttptextformat.py
Outdated
Show resolved
Hide resolved
type(self).__name__, | ||
format_trace_id(self.trace_id), | ||
format_span_id(self.span_id), | ||
repr(self.trace_state), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"trace_state={!r}".format(self.trace_state)
@@ -1,5 +1,4 @@ | |||
# Copyright 2019, OpenTelemetry Authors | |||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason to remove this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably my fat finger, will remove.
} | ||
trap on-shutdown EXIT | ||
cd ./target/trace-context/test | ||
python test.py http://127.0.0.1:5000/verify-tracecontext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: empty line before EOF
Dismissing my review, as I don't have time to follow this at the moment
791baaf
to
bdb607b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks great to me that the tracecontext test is now integrated.
I left some general comments.
examples/trace/server.py
Outdated
@@ -14,6 +14,8 @@ | |||
# See the License for the specific language governing permissions and | |||
# limitations under the License. | |||
|
|||
import json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, weird that linting didn't catch that
examples/trace/server.py
Outdated
@@ -44,7 +46,9 @@ | |||
|
|||
|
|||
@app.route("/") | |||
def hello(): | |||
def index(): | |||
"""An example which starts a span within the span created for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a nice comment but I think it's not related to this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I previously added it when the validation server was joined. I've rebased in c24t's changes so this will disappear.
# typing.Dict's update is not recognized by pylint: | ||
# https://github.com/PyCQA/pylint/issues/2420 | ||
tracestate[key] = value # pylint:disable=E1137 | ||
if len(tracestate) > _TRACECONTEXT_MAXIMUM_TRACESTATE_KEYS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this check be moved inside the loop by using a counter?
Having the check outside means the parsing of all headers is done before. Maybe this could be a treat for DoS attacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably DoS could be achieved other ways, like adding a huge list of values that need to be parsed by regex. but agreed this could be a quick shortcut out.
@@ -62,6 +62,7 @@ | |||
""" | |||
|
|||
import enum | |||
import random |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's weird.. shouldn't linting have caught that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will remove.
}, | ||
) | ||
self.assertEqual(span_context, trace.INVALID_SPAN_CONTEXT) | ||
self.assertNotEqual(span_context.span_id, "1234567890123456") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this assertion is not wrong, it is implicit in the previous one.
@@ -213,3 +169,31 @@ def test_propagate_invalid_context(self): | |||
output = {} # type:typing.Dict[str, str] | |||
FORMAT.inject(trace.INVALID_SPAN_CONTEXT, dict.__setitem__, output) | |||
self.assertFalse("traceparent" in output) | |||
|
|||
def test_tracestate_empty_header(self): | |||
"""Do not propagate invalid trace context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment accurate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry copy-pasted boilerplate. will fix.
@@ -301,7 +300,6 @@ def set_status(self, status: trace_api.Status) -> None: | |||
|
|||
def generate_span_id() -> int: | |||
"""Get a new random span ID. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason to remove these empty lines?
from opentelemetry.ext import http_requests | ||
from opentelemetry.ext.wsgi import OpenTelemetryMiddleware | ||
from opentelemetry.sdk.trace import Tracer | ||
from opentelemetry.sdk.trace.export import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the consol exporter used for?
To have debug information in case the tests fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct. If you look at the output of the build, it actually logs the trace information as the test is executing, which was really helpful when it's a bit harder to pdb into separate processes.
@@ -0,0 +1,75 @@ | |||
#!/usr/bin/env python3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some doubts about the location of this file. Shouldn't it be on opentelemetry-api/tests/context/propagation
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That could be a good place, but I felt it wasn't accurate as currently the tracecontext test suite would fail without SDK behavior (specifically creating new spans when a spancontext is invalid from the propagator).
Verifying that our tracecontext is compliant with the w3c tracecontext reference is valuable. Adding a tox command to verify that the TraceContext propagator adheres to the w3c spec.
As the tracecontext spec calls for the creation of new, valid spans in the case of recieving invalid data from headers, it is necessary to have functions that generate valid span and trace ids.
…m ones. This fixes all the errors, leaving 6 failures.
The tracecontexthttptextformat now adheres completely to the w3c tracecontext test suite. moving the test endpoint to a non-root, to ensure that the basic example is clear. Adding unit tests to test_tracecontexhttptextformat that were helpful.
moving the generate span / trace id methods back to API. no longer needed due to open-telemetry#235 moving test service to it's own module. modifying shell script to use bourne shell, using posix standard location
Ensuring resources installed to the target directory are not included in style and linting. Modifying tox invocation to include python version to ensure it's called by travis-ci. Fixing tests that are no longer valid due to previous changes ( tracecontext returning INVALID_SPAN, start_as_current_span called)
8d24e0a
to
16a95d5
Compare
16a95d5
to
2bf7d12
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I think there's still some cleanup to do in the tox file and script (in particular, running the test server as #228 (comment)), but the tests are valuable enough that I think we ought to merge them in and follow up with improvements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I think further improvements can be done in follow up PRs so we can have it for our today's release.
@mauriciovasquezbernal it looks like you approved the changes, but not the full PR. can you approve the full PR? Currently only c24t is listed as approved. |
@toumorokoshi I am not an approver on this repo so mine doesn't count. I just marked it as approved to signal that it looks good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Closes open-telemetry#193 Signed-off-by: Olivier Albertini <olivier.albertini@montreal.ca>
This introduces the w3c tracing validation service as an integration test for opentelemetry's tracecontext implementation, enabling us to validate our tracecontext implementation without authoring another complete suite.
Currently the integration test fails due to w3c/trace-context#341
This includes fixes to the tracecontexthttptextformatter to adhere to the specification.