-
Notifications
You must be signed in to change notification settings - Fork 635
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add Jaeger exporter #174
add Jaeger exporter #174
Conversation
This commit adds a Jeager exporter for OpenTelemetry. This exporter is based on https://github.com/census-instrumentation/opencensus-python/tree/master/contrib/opencensus-ext-jaeger. The exporter uses thrift and can be configured to send data to the agent and also to a remote collector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to the below comments, I have one more about the generated files: If it is at all possible, I'd really like them to be generated automatically during build (pip install -e
) instead of being checked in to version control. Auto-generated, version-controlled files are usually an antipattern.
.flake8
Outdated
@@ -1,2 +1,3 @@ | |||
[flake8] | |||
ignore = E501,W503,E203 | |||
exclude = .svn,CVS,.bzr,.hg,.git,__pycache__,.tox,ext/opentelemetry-ext-jaeger/src/opentelemetry/ext/jaeger/gen/*,ext/opentelemetry-ext-jaeger/build/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from opentelemetry.sdk import trace | ||
from opentelemetry.sdk.trace import export | ||
|
||
tracer = trace.Tracer() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the global tracer also in the example, we probably don't want to encourage constructing tracers willy-nilly.
|
||
"""Exports the spans to Jaeger. | ||
|
||
:param service_name: Service that logged an annotation in a trace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of :param
RST syntax, it's probably more readable to use the Args:
syntax like elsewhere.
agent_host_name: str = DEFAULT_AGENT_HOST_NAME, | ||
agent_port: int = DEFAULT_AGENT_PORT, | ||
collector_host_name: str = None, | ||
collector_port: int = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there a default port for the collector?
self._collector = Collector(thrift_url=thrift_url, auth=auth) | ||
return self._collector | ||
|
||
def export(self, spans: typing.Sequence[Span]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def export(self, spans: typing.Sequence[Span]): | |
def export(self, spans: typing.Collection[Span]): |
since we don't need the reversible property here (probably this should be changed in the API and is thus out of scope for this PR).
udp_socket.sendto(buff, self.address) | ||
|
||
except Exception as e: # pragma: NO COVER | ||
logging.error(getattr(e, "message", e)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #153 (comment) and #153 (comment)
logging.error(getattr(e, "message", e)) | ||
|
||
finally: | ||
if udp_socket is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since 3.2, socket supports the context manager protocol, so you could use with socket.socket(...) as udp_socket:
instead of finally:
.
|
||
finally: | ||
if self.http_transport.isOpen(): | ||
self.http_transport.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we really close http_transport after every submit? Is it automatically reopened? Also, it seems dirty to use both a client that wraps the http transport and the http transport directly, but that might be fine with the thrift library, I know nothing about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I don't have that many answers for you. It's the way it was done in OpenCensus.
I checked and yes, the transport should be close, it automatically opens when flush is called.
@@ -0,0 +1,13 @@ | |||
# Copyright 2019, OpenTelemetry Authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't add license boilerplate to otherwise completely empty files usually.
py3{4,5,6,7,8}-test-{api,sdk,example-app,ext-wsgi,ext-http-requests} | ||
pypy3-test-{api,sdk,example-app,ext-wsgi,ext-http-requests} | ||
py3{4,5,6,7,8}-test-{api,sdk,example-app,ext-wsgi,ext-http-requests,ext-jaeger} | ||
pypy3-test-{api,sdk,example-app,ext-wsgi,ext-http-requests,ext-jaeger} | ||
lint | ||
py37-{mypy,mypyinstalled} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you use type annotations, it would make sense to also add this package to the mypy environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not totally sure how to do it. Could you please provide an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just look at the "mypy:" prefixed lines later in the tox.ini and add a corresponding line for the exporter (though that will almost certainly turn up lots of errors, so maybe do that in a separate PR. Cf. #201).
- use Args: notation in documentation - remove license boilerplate from empty file - use "not x" instead of "len(x) == 0" - rename logger
I know that having generated files could be seen as an anti-pattern, but I also think it makes things easier, specially because of the dependency of a specific thrift version compiler that would be needed. |
We had similar discussion in OpenCensus among @c24t @reyang and @songy23. |
+1 to not having the gen-files in the version control as they may quickly become out-of-sync with upstream. |
deque.copy() was introduced in python3.5, this commit removes it's usage and replaces it for deque(another_deque).
use deque(dq) instead of deque.copy()
I was thinking about including the thrift files here or as a submodule but at the end the synchronization process has to be done manually. Do you know which approach will be the best here? |
It should be possible to generate the files automatically from |
I updated the PR to generate the files using thrift. Now the thrift files are included in this repo, it is not clear for me how can we handle that, maybe a submodule?, btw, I am not the only one wondering how to do it: open-telemetry/opentelemetry-java#235. |
tox.ini
Outdated
@@ -82,6 +86,8 @@ commands = | |||
ext/opentelemetry-ext-azure-monitor/examples/ \ | |||
ext/opentelemetry-ext-azure-monitor/src/ \ | |||
ext/opentelemetry-ext-azure-monitor/tests/ \ | |||
ext/opentelemetry-ext-azure-monitor/src/ \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ext/opentelemetry-ext-azure-monitor/src/ \ | |
ext/opentelemetry-ext-jaeger/src/ \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It explains a lot of things. Thanks for catching this up!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I have some minor comments/suggestions, I think we should get this merged first instead of blocking it indefinitely (given the nature it is a separate exporter package).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to take another pass to look at the tests and jaeger-specific logic. I've got a few small comments on the implementation and one big question about the installation.
If the only alternative to checking in the generated thrift files right now is to depend on docker, I think we're better off checking them in.
thrift_url = "http://{}:{}{}".format( | ||
self.collector_host_name, | ||
self.collector_port, | ||
self.collector_endpoint or DEFAULT_COLLECTOR_ENDPOINT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You want this default here and in the constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow I messed it up in the porting. Only the constructor one is needed.
if code >= 300 or code < 200: | ||
logger.error( | ||
"Traces cannot be uploaded;\ | ||
HTTP status code: {}, message {}".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HTTP status code: {}, message {}".format( | |
HTTP status code: %s, message %s", |
Doing it this way means interpolation only happens if the logger is enabled. And IMO that line break in the error message is going to cause more harm than good.
I know this is copypasta from OC, but this seems like a good time to fix it.
|
||
# set basic auth header | ||
if auth is not None: | ||
import base64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does base64
have some import-time side-effect that we're trying to avoid here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, just moved outside.
self, | ||
thrift_url: str = "", | ||
auth: typing.Tuple[str, str] = None, | ||
client=jaeger.Client, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why bother annotating only half of the args?
def __init__( | ||
self, | ||
thrift_url: str = "", | ||
auth: typing.Tuple[str, str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this need to be Optional[Tuple[str, str]]
to match the default None
value? mypy
doesn't seem to complain about it, but that may be because it's masked by other problems.
service_name: str = "my_service", | ||
agent_host_name: str = DEFAULT_AGENT_HOST_NAME, | ||
agent_port: int = DEFAULT_AGENT_PORT, | ||
collector_host_name: str = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere: it looks like this needs to be Optional[str]
, unless I'm misunderstanding https://mypy.readthedocs.io/en/latest/kinds_of_types.html#optional-types-and-the-none-type (or we expect to check this module with --no-strict-optional
).
Since we're not actually checking this module's type annotations ourselves, it's likely that they'll become stale, or are wrong already. mypy
turns up errors here, even using the mypy-relaxed.ini
config. I think we should either add a test to check these annotations or remove them completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right, however I tried a simple example:
import typing
def myf(x: str = None):
print("hello")
myf(0)
Prints:
error: Argument 1 to "myf" has incompatible type "int"; expected "Optional[str]"
.
It appears that given the default value is None
mypy guesses it should be Optional
, I think we should not rely on this, I'll add Optional
explicitly.
It is my bad, I thought type annotations were being checked already, just found it #201. I'll add this test then.
|
||
for span in spans: | ||
trace_id = span.get_context().trace_id | ||
span_id = span.get_context().span_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: looks like you could avoid the second call to get_context
?
@@ -0,0 +1,27 @@ | |||
# Copyright (c) 2016 Uber Technologies, Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure seems like we need to put these in a separate opentelemetry-protos
repo.
with open(VERSION_FILENAME) as f: | ||
exec(f.read(), PACKAGE_INFO) | ||
|
||
BASE_CMD = """docker run --user `id -u` -v "$PWD:/data" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This adds an install-time dependency on docker? That is a pretty big dependency.
Also it looks like the install depends on the pwd being the dir that includes setup.py
?
% python ext/opentelemetry-ext-jaeger/setup.py install
running install
[FAILURE:arguments:1] Could not open input file with realpath: /data/thrift/agent.thrift
[FAILURE:arguments:1] Could not open input file with realpath: /data/thrift/zipkincore.thrift
[FAILURE:arguments:1] Could not open input file with realpath: /data/thrift/jaeger.thriftrunning build
I understand not wanting to check in the generated thrift files, but we should consider generating them before releases instead. As it is now, a user would have to have docker
installed to install this package via setup.py install
. What about if they're installing from a wheel? How do we include the generated files in the distribution?
@Oberon00 how do you make this work with built wheels? Ideally |
AFAIK, when building wheels, |
It seems to me that there is a lot of discussion to be done around generating the files. It looks really cumbersome for a first implementation of the Jaeger exporter, if there are not objections I'll put the generated files back in this PR again, otherwise it'll be hold too long waiting for that discussion. |
- remove "optional" from docs - fix some type annotations - make pylint check the code - fix pylint issues
there is a long discussion ongoing yet about how to include generated files in the repo, so for now just put them here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the "my_service" default, otherwise LGTM.
|
||
def __init__( | ||
self, | ||
service_name: str = "my_service", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please make this a mandatory argument!
I updated the PR to include the generated files in this PR, this is something that we'll revisit in the future. |
@mauriciovasquezbernal from my perspective, you could merge this right now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the generated files LGTM. It looks like there are still some inconsistent type annotations, but no other blocking comments from me.
self.collector.submit(batch) | ||
self.agent_client.emit(batch) | ||
|
||
return SpanExportResult.SUCCESS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A problem for another PR, but if agent_client.emit
fails above we should eat the exception and report a failure here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, the trickly part is to understand if return FAILED_RETRYABLE
or FAILED_NOT_RETRYABLE
depending on the exception type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just return FAILED_NOT_RETRYABLE
if in doubt.
start_time_us = span.start_time / 1e3 | ||
duration_us = (span.end_time - span.start_time) / 1e3 | ||
|
||
parent_id = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be INVALID_SPAN_ID
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, OpenCensus implementation uses 0 to describe the root span: https://github.com/census-instrumentation/opencensus-python/blob/3f01007a05dcae84cc912a28588d3074fff4f587/contrib/opencensus-ext-jaeger/opencensus/ext/jaeger/trace_exporter/__init__.py#L216.
trace_id = ctx.trace_id | ||
span_id = ctx.span_id | ||
|
||
start_time_us = span.start_time / 1e3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
start_time_us = span.start_time / 1e3 | |
start_time_us = span.start_time // 1e3 |
To save you the int cast below
span_id = ctx.span_id | ||
|
||
start_time_us = span.start_time / 1e3 | ||
duration_us = (span.end_time - span.start_time) / 1e3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duration_us = (span.end_time - span.start_time) / 1e3 | |
duration_us = (span.end_time - span.start_time) // 1e3 |
were are not checking annotations with mypy, so remove types annotation completely
I removed all type annotations since mypy is not configured to check them. We can address it later on. |
👍 I'll go ahead and merge this now. |
for start_time and end_time Make lint happy Addressing comments Addressing comments Allowing 0 as start and end time Fix lint issues Metrics API RFC 0003 cont'd (open-telemetry#136) * Create functions Comments for Meter More comments Add more comments Fix typos * fix lint * Fix lint * fix typing * Remove options, constructors, seperate labels * Consistent naming for float and int * Abstract time series * Use ABC * Fix typo * Fix docs * seperate measure classes * Add examples * fix lint * Update to RFC 0003 * Add spancontext, measurebatch * Fix docs * Fix comments * fix lint * fix lint * fix lint * skip examples * white space * fix spacing * fix imports * fix imports * LabelValues to str * Black formatting * fix isort * Remove aggregation * Fix names * Remove aggregation from docs * Fix lint * metric changes * Typing * Fix lint * Fix lint * Add space * Fix lint * fix comments * address comments * fix comments Adding a working propagator, adding to integrations and example (open-telemetry#137) Adding a full, end-to-end example of propagation at work in the example application, including a test. Adding the use of propagators into the integrations. Metrics API RFC 0009 (open-telemetry#140) * Create functions Comments for Meter More comments Add more comments Fix typos * fix lint * Fix lint * fix typing * Remove options, constructors, seperate labels * Consistent naming for float and int * Abstract time series * Use ABC * Fix typo * Fix docs * seperate measure classes * Add examples * fix lint * Update to RFC 0003 * Add spancontext, measurebatch * Fix docs * Fix comments * fix lint * fix lint * fix lint * skip examples * white space * fix spacing * fix imports * fix imports * LabelValues to str * Black formatting * fix isort * Remove aggregation * Fix names * Remove aggregation from docs * Fix lint * metric changes * Typing * Fix lint * Fix lint * Add space * Fix lint * fix comments * handle, recordbatch * docs * Update recordbatch * black * Fix typo * remove ValueType * fix lint Console exporter (open-telemetry#156) Make use_span more flexible (closes open-telemetry#147). (open-telemetry#154) Co-Authored-By: Reiley Yang <reyang@microsoft.com> Co-Authored-By: Chris Kleinknecht <libc@google.com> WSGI fixes (open-telemetry#148) Fix http.url. Don't delay calling wrapped app. Skeleton for azure monitor exporters (open-telemetry#151) Add link to docs to README (open-telemetry#170) Move example app to the examples folder (open-telemetry#172) WSGI: Fix port 80 always appended in http.host (open-telemetry#173) Build and host docs via github action (open-telemetry#167) Add missing license boilerplate to a few files (open-telemetry#176) sdk/trace/exporters: add batch span processor exporter (open-telemetry#153) The exporters specification states that two built-in span processors should be implemented, the simple processor span and the batch processor span. This commit implements the latter, it is mainly based on the opentelemetry/java one. The algorithm implements the following logic: - a condition variable is used to notify the worker thread in case the queue is half full, so that exporting can start before the queue gets full and spans are dropped. - export is called each schedule_delay_millis if there is a least one new span to export. - when the processor is shutdown all remaining spans are exported. Implementing W3C TraceContext (fixes open-telemetry#116) (open-telemetry#180) * Implementing TraceContext (fixes open-telemetry#116) This introduces a w3c TraceContext propagator, primarily inspired by opencensus. fix time conversion bug (open-telemetry#182) Introduce Context.suppress_instrumentation (open-telemetry#181) Metrics Implementation (open-telemetry#160) * Create functions Comments for Meter More comments Add more comments Fix typos * fix lint * Fix lint * fix typing * Remove options, constructors, seperate labels * Consistent naming for float and int * Abstract time series * Use ABC * Fix typo * Fix docs * seperate measure classes * Add examples * fix lint * Update to RFC 0003 * Add spancontext, measurebatch * Fix docs * Fix comments * fix lint * fix lint * fix lint * skip examples * white space * fix spacing * fix imports * fix imports * LabelValues to str * Black formatting * fix isort * Remove aggregation * Fix names * Remove aggregation from docs * Fix lint * metric changes * Typing * Fix lint * Fix lint * Add space * Fix lint * fix comments * handle, recordbatch * docs * Update recordbatch * black * Fix typo * remove ValueType * fix lint * sdk * metrics * example * counter * Tests * Address comments * ADd tests * Fix typing and examples * black * fix lint * remove override * Fix tests * mypy * fix lint * fix type * fix typing * fix tests * isort * isort * isort * isort * noop * lint * lint * fix tuple typing * fix type * black * address comments * fix type * fix lint * remove imports * default tests * fix lint * usse sequence * remove ellipses * remove ellipses * black * Fix typo * fix example * fix type * fix type * address comments Implement Azure Monitor Exporter (open-telemetry#175) Span add override parameters for start_time and end_time (open-telemetry#179) CONTRIBUTING.md: Fix clone URL (open-telemetry#177) Add B3 exporter to alpha release table (open-telemetry#164) Update README for alpha release (open-telemetry#189) Update Contributing.md doc (open-telemetry#194) Add **simple** client/server examples (open-telemetry#191) Remove unused dev-requirements.txt (open-telemetry#200) The requirements are contained in tox.ini now. Fx bug in BoundedList for Python 3.4 and add tests (open-telemetry#199) * fix bug in BoundedList for python 3.4 and add tests collections.deque.copy() was introduced in python 3.5, this commit changes that by the deque constructor and adds some tests to BoundedList and BoundedDict to avoid similar problems in the future. Also, improve docstrings of BoundedList and BoundedDict classes Move util.time_ns to API. (open-telemetry#205) Add Jaeger exporter (open-telemetry#174) This adds a Jeager exporter for OpenTelemetry. This exporter is based on https://github.com/census-instrumentation/opencensus-python/tree/master/contrib/opencensus-ext-jaeger. The exporter uses thrift and can be configured to send data to the agent and also to a remote collector. There is a long discussion going on about how to include generated files in the repo, so for now just put them here. Add code coverage Revert latest commit Fix some "errors" found by mypy. (open-telemetry#204) Fix some errors found by mypy (split from open-telemetry#201). Update README for new milestones (open-telemetry#218) Refactor current span handling for newly created spans. (open-telemetry#198) 1. Make Tracer.start_span() simply create and start the Span, without setting it as the current instance. 2. Add an extra Tracer.start_as_current_span() to create the Span and set it as the current instance automatically. Co-Authored-By: Chris Kleinknecht <libc@google.com> Add set_status to Span (open-telemetry#213) Initial commit Initial version
|
||
|
||
.. _Jaeger: https://www.jaegertracing.io/ | ||
.. _OpenTelemetry: https://github.com/opentelemetry/opentelemetry-python/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: this link is incorrect. it should be open-telemetry for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!, I think I'm contributor number one with more typos done...
Will be solved in #289.
This commit adds a Jeager exporter for OpenTelemetry. This exporter is based
on https://github.com/census-instrumentation/opencensus-python/tree/master/contrib/opencensus-ext-jaeger.
The exporter uses thrift and can be configured to send data to the agent and
also to a remote collector.
Fixes: #162