-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Storage support using InfluxDB interesting? #1628
Comments
This is interesting. However, l have no idea how to impl span model on influxdb. What's your design on metrics? |
You might be able to harvest some ideas from appdash, which doesn't seem to
be active, but includes influxdb support. Their model is different than
zipkin, but many of the ideas could be portable
sourcegraph/appdash#99
|
@adriancole Great! I'll take a look at it to get some ideas for modeling data. I know it would be an effort to maintain an additional backend to zipkin, but, we at Influx are walking the same open source path and are very willing to stay involved. |
just tweeted to help highlight the gracious offer to help with this. let's see how it goes! |
@JodeZer I have some ideas about how to implement the span model on influxdb. Now, this is really rough idea yet. @adriancole 's link to appdash may very well change my ideas, but here we go: The span's An individual span would be stored with these tags (indexed):
The span's fields (not indexed) would be:
I'm not sure yet about binary annotations... In reviewing the As for cardinality, I'm planning on leaning on InfluxDB's new TSI engine that allows us to store and query over a billion unique series. |
Here some info about the new TSI mentioned by @goller https://www.influxdata.com/path-1-billion-time-series-influxdb-high-cardinality-indexing-ready-testing/ |
👍 for me |
so for indexing you probably don't need to index parent id. Ironically, duration is something sometimes indexed (as the api allows you to search for duration < > some value. In mysql schema we treat annotation and binary annotation the same (making annotation have a dummy type). hope these notes help (plus looks like you've got a fair amount of interest here!) |
@adriancole we have been working on various schemas for the queries in
Because
query = "select * from zipkin where service_name=%s and name=%s and time > %d"
if (annotations)
foreach:
query += "and annotation_key=%s and annotation_value=%s", key, value
if (duration)
query += "and duration > %d and duration < %d"
query += "limit %d order by time DESC" |
Hi, Chris. At first glance, things except getDependencies look good
(dependencies are a bit more than what's pasted here due to the tree)
One thing to be careful about is not storing service_name and name as a
tag. These are special values, separate from tags aka binary annotations.
We don't have any rule that reserves tag names, for example.
PPS I will be trying this with elasticsearch soon. The data format is a bit
simpler and may or may not help you:
#1651
|
@adriancole Great ty! Tags in influx mean data that will be indexed for fast lookup (https://docs.influxdata.com/influxdb/v1.2/concepts/schema_and_data_layout/#encode-meta-data-in-tags) Tags also have special optimized query functions to look them up (e.g. As for the |
Tags in influx mean data that will be indexed for fast lookup (
https://docs.influxdata.com/influxdb/v1.2/concepts/schema_
and_data_layout/#encode-meta-data-in-tags)
Tags also have special optimized query functions to look them up (e.g. SHOW
TAGS)
ah tags naming overlap. ty for the clarification!
|
https://github.com/influxdata/telegraf/tree/master/plugins/inputs/zipkin is related. If the format matches up, especially if matching span2, it could be neat to add storage here so folks can query it with zipkin. |
Hey @adriancole I've been hacking around in jaeger here: uber/jaeger@master...influxdata:master#diff-d3419a852db652ac429192d6bd54262a in order to support telegraf's zipkin collector plugin (I'm far more comfortable in go than java!) I'm pretty happy with the queries at this point and will update our Influx branch here : master...influxdata:feature/influx-store Regarding span2 with my telegraf refactor today (influxdata/telegraf#3150) , it should be straightforward to add another codec. |
Hi, chris. Thanks for the update.
I'd suggest only using span2 vs having multiple codecs on storage.
Particularly it is a lot easier to reason with when you know how something
will be stored. span2 only supports string tags and also handles endpoint
lookup neatly. This stuff will help once this progresses to the point of
integration test (ex SpanStoreIntegrationTest or DependenciesTest)
Jaeger, despite currently accepting zipkin format, is a different system: I
wouldn't consider passing tests on one meaning passing tests on another.
Zipkin's format and what passes are defined here by integration tests and
specs etc.
|
Hey @adriancole, I think I understand ! Are you saying that the storage format should be span2 because the zipkin queries would work against span1 and span2, thus, I should use span2? |
PS the new storage component is out (for a little while now), but just in case. Again, I wouldn't re-introduce binaryAnnotation in any new work as it is complex, harder to query etc. We spent a while ensuring zipkin v2 format could solve these issues. |
I'm guessing by the fact that influx folks integrated with jaeger that nothing is planned here by them. Happy to be wrong. If community members are interested in moving this forward, pull requests welcome. The modeling job is much easier here as we have a simpler v2 json format which doesn't have the complexity present in the v1 model as baked into the telegraph plugin. |
Hi @adriancole, nope, it's just that I'm a one man show on the tracing front right now. We've delayed this work as we work towards our 1.4 release of InfluxDB. 1.4 will have much better support for very high cardinality, but, it's taking a while to get it stable. It is certainly my plan to continue to work integrating influxdb and zipkin. Regarding the v2 vs v1 model in telegraf, I very much want to support that as well. |
@goller ok cool. Yeah was just weird to see interest drummed up here then a blog post on zipkin showing how to use jaeger. |
@adriancole InfluxDB is a time series database and the idea is to have it as a backend for both because we think it can be a valuable way to store traces efficiently. As @goller said we are working hard on making it more efficient for this kind of data, plus we are not very java-oriented people. That's why we are using both. It probably creates a bit of confusion and we are sorry about this. But you know how it works :D At some point everything will be ready. Btw as you said if there is some java dev happy to help here let us know! |
That's why we are using both. It probably creates a bit of confusion and
we are sorry about this. But you know how it works :D At some point
everything will be ready.
yeah in hindsight if it were a resource problem, probably better to ask for
help because I've helped with others for example. For example, I suspect
the schema of "zipkin" in your telegraph plugin is now v1 format, which is
unfortunate as it is likely more complex than needed and would introduce
complexity to migrate if ever to v2. If in hindsight asked for help, I
could have helped you with the java part so that the schema could be
simple. Also, we wouldn't have the awkward situation where a team who
intentionally compete with zipkin's community (jaeger) are promoted with
"zipkin" blog posts.
Btw as you said if there is some java dev happy to help here let us know!
Ideally, if zipkin could own the format of the zipkin plugin that could
make things right. Personally I spent a good deal of effort in attempts to
roll it out in support of your plugin before you guys dropped off the face
of the earth :) Are you up to at least changing the schema to v2? I would
help with the java side if so.
|
@adriancole Telegraf is modular, we can write a new plugin called zipkin2 at some point. This is not a problem at all. We wrote the plugin because we were looking to build a data flow to test InfluxDB with traces and for us was more comfortable to write a telegraf plugin because we know the code better.
We spoke internally about this and I agree with you, we created a small chaos but only because we are really engaged with tracing. I am sorry about that. What I am trying to say is that Telegraf is not related to this issue, what I would like to have is influxdb as backend in zipkin. Do you think we should re-start the integration with zipkinv2? People that are using zipkin now will be able to use the influxdb backend in a easy way (just updating zipkin and configuring it properly) ? Or the migration path from zipkin1 to zipkin2 is more complicated? Thanks! |
@adriancole <https://github.com/adriancole> Telegraf is modular, we can
write a new plugin called zipkin2 at some point. This is not a problem at
all. We wrote the plugin because we were looking to build a data flow to
test InfluxDB with traces and for us was more comfortable to write a
telegraf plugin because we know the code better.
Not sure how things work, but for example if the schema is the same between
the both, sounds good. It would be a pain if you'd have to read two
different formats from influx. Less a pain if the telegraf converted
zipkin1 format to zipkin2 on the way in (so that the storage is coherent).
We spoke internally about this and I agree with you, we created a small
chaos but only because we are really engaged with tracing. I am sorry about
that.
appreciate you saying this.
What I am trying to say is that Telegraf is not related to this issue,
what I would like to have is influxdb as backend in zipkin. Do you think we
should re-start the integration with zipkinv2? People that are using zipkin
now will be able to use the influxdb backend in a easy way (just updating
zipkin and configuring it properly) ? Or the migration path from zipkin1 to
zipkin2 is more complicated?
the main thing is that the schema used in telegraf is v2 (it is simpler to
design it this way and simpler for long-term). For example, we convert v1
to v2 format so that queries on data are always in v2. User apps don't need
to change. So basically, yeah I'd restart the effort in v2 format. Thanks
for asking!
|
Ok, thank you for your clarification. At this point, we can speak internally about how to proceed in order to open a PR here with the new influxdb backend for zipkin2. |
Ok, thank you for your clarification. At this point, we can speak
internally about how to proceed in order to open a PR here with the new
influxdb backend for zipkin2.
sounds good! ps https://github.com/openzipkin/zipkin-go will very soon have
a zipkin2 model in go. You might want to watch the repo or #1778 (most
likely a go host agent) in case any code there becomes helpful to telegraf
in the future.
|
Great, thanks. At the moment my personal idea is to have a influxdb storage up and running in openzipkin. As I said previously the telegraf plugin was for us an easy way to validate our InfluxDB performs with traces and cardinality. We will keep it updated and as best as we can but it's not in the scope of this issue 👍 |
What kind of issue is this?
before. If you don't find anything, tell us what problem you’re trying to solve. Often a
solution already exists! Don’t send pull requests to implement new features without first
getting our support. Sometimes we leave features out on purpose to keep the project small.
I'm interesting in helping to write another storage backend for influxdb. Is this something that would be useful?
A lot of people are using influxdb to store event data and it is really easy to setup. With influx it is pretty straightforward to compare all sorts of metrics.
I'm planning to use the influxdb-java SDK here: https://github.com/influxdata/influxdb-java
The text was updated successfully, but these errors were encountered: