Storage support using InfluxDB interesting? #1628

Open
1 task done
goller opened this issue Jun 26, 2017 · 28 comments
Labels
enhancement storage Group label for Storage components

Comments

@goller

goller commented Jun 26, 2017

What kind of issue is this?

  • Feature Request. First, look at existing issues to see if the feature has been requested
    before. If you don't find anything, tell us what problem you’re trying to solve. Often a
    solution already exists! Don’t send pull requests to implement new features without first
    getting our support. Sometimes we leave features out on purpose to keep the project small.

I'm interested in helping to write another storage backend, this one for influxdb. Is this something that would be useful?

A lot of people are using influxdb to store event data and it is really easy to set up. With influx it is pretty straightforward to compare all sorts of metrics.

I'm planning to use the influxdb-java SDK here: https://github.com/influxdata/influxdb-java

@JodeZer

JodeZer commented Jun 27, 2017

This is interesting. However, I have no idea how to implement the span model on influxdb. What's your design for metrics?

@codefromthecrypt
Member

codefromthecrypt commented Jun 28, 2017 via email

@goller
Author

goller commented Jun 28, 2017

@adriancole Great! I'll take a look at it to get some ideas for modeling data. I know it would be an effort to maintain an additional backend to zipkin, but, we at Influx are walking the same open source path and are very willing to stay involved.

@codefromthecrypt
Member

just tweeted to help highlight the gracious offer to help with this. let's see how it goes!

@goller
Author

goller commented Jun 28, 2017

@JodeZer I have some ideas about how to implement the span model on influxdb. This is still a really rough idea, and @adriancole's link to appdash may very well change my mind, but here we go:

The span's time would be start time.

An individual span would be stored with these tags (indexed):

  • traceID
  • spanID
  • spanName
  • serviceName
  • parentID
  • annotation key

The span's fields (not indexed) would be:

  • duration
  • end time

I'm not sure yet about binary annotations...

In reviewing the QueryRequest.java I think this design would cover most queries.

As for cardinality, I'm planning on leaning on InfluxDB's new TSI engine that allows us to store and query over a billion unique series.
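
The tag/field split proposed above can be sketched as an InfluxDB line protocol encoder. This is only an illustration under stated assumptions: the measurement name `zipkin` and the key names (`trace_id`, `id`, `name`, `service_name`, `parent_id`, `duration`) are borrowed from the queries posted later in this thread, while `end_time`, the input dictionary shape, and both function names are hypothetical. The annotation key tag is left out for brevity.

```python
def escape_tag(value):
    """Escape commas, equals signs and spaces, which line protocol
    requires for tag keys and tag values."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def span_to_line(span, measurement="zipkin"):
    """Encode one span as a single line protocol point.

    `span` is a hypothetical dict with microsecond timestamps, e.g.
    {"trace_id": ..., "id": ..., "name": ..., "service_name": ...,
     "parent_id": ... or None, "timestamp_us": ..., "duration_us": ...}
    """
    # Indexed values go into tags.
    tags = {
        "trace_id": span["trace_id"],
        "id": span["id"],
        "name": span["name"],
        "service_name": span["service_name"],
    }
    if span.get("parent_id"):
        tags["parent_id"] = span["parent_id"]
    tag_part = ",".join(
        f"{escape_tag(k)}={escape_tag(v)}" for k, v in sorted(tags.items())
    )

    # The point's time is the span start; line protocol defaults to nanoseconds.
    start_ns = span["timestamp_us"] * 1000
    end_ns = start_ns + span["duration_us"] * 1000

    # Non-indexed values go into fields; the trailing "i" marks an integer.
    field_part = f"duration={span['duration_us']}i,end_time={end_ns}i"
    return f"{measurement},{tag_part} {field_part} {start_ns}"
```

Every distinct combination of tag values creates a new series, so with `trace_id` and `id` as tags each span is its own series. That is the cardinality concern the TSI engine is expected to address.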

@gianarb
Contributor

gianarb commented Jun 28, 2017

Here is some info about the new TSI mentioned by @goller: https://www.influxdata.com/path-1-billion-time-series-influxdb-high-cardinality-indexing-ready-testing/

@wdalmut

wdalmut commented Jun 28, 2017

👍 for me

@codefromthecrypt
Member

so for indexing you probably don't need to index parent id. Ironically, duration is something that is sometimes indexed (as the api allows you to search for duration < or > some value). In the mysql schema we treat annotation and binary annotation the same (making annotation have a dummy type).
One trick is being able to read-back what you've written. The SpanStoreTest base class will help with this.

hope these notes help (plus looks like you've got a fair amount of interest here!)

@goller
Author

goller commented Jul 13, 2017

@adriancole we have been working on various schemas for the queries in SpanStore.

  • getTrace and getRawTrace
    SELECT * from zipkin where "trace_id"='2623801863023620058'
  • getServiceNames
    show tag values with key="service_name"
  • getSpanNames
    show tag values with key="name" where "service_name"='myservice'
  • getDependencies
     select count("duration") from zipkin where time > now() - 30m and time < now() group by "id","parent_id", time(1d)

Because getDependencies aggregates the call counts across each parent-child link, I have parent_id as a tag.

  • getTraces
    This has a bit of logic to turn QueryRequest into InfluxQL. Here is some pseudo-code:
query = "select * from zipkin where service_name=%s and name=%s and time > %d"

if (annotations)
  foreach:
      query += " and annotation_key=%s and annotation_value=%s", key, value

if (duration)
      query += " and duration > %d and duration < %d"

query += " order by time DESC limit %d"
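
The getTraces pseudo-code above could look roughly like this as runnable code. It is a sketch, not the branch's actual implementation: the function names and parameters are hypothetical, the measurement and key names follow the queries in this comment, and the escaping in `quote_str` assumes InfluxQL accepts backslash-escaped single quotes inside string literals. Note that InfluxQL expects `ORDER BY` before `LIMIT`.

```python
def quote_str(value):
    """Single-quote an InfluxQL string literal (assumed escaping rules)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_get_traces_query(service_name, span_name=None, annotations=None,
                           min_duration=None, max_duration=None,
                           since_ns=0, limit=10):
    """Turn hypothetical query-request parameters into one InfluxQL statement."""
    clauses = [f'"service_name" = {quote_str(service_name)}']
    if span_name:
        clauses.append(f'"name" = {quote_str(span_name)}')

    # One key/value pair of conditions per requested annotation.
    for key, value in (annotations or {}).items():
        clauses.append(f'"annotation_key" = {quote_str(key)}')
        clauses.append(f'"annotation_value" = {quote_str(value)}')

    # Duration bounds are optional and independent of each other.
    if min_duration is not None:
        clauses.append(f'"duration" > {int(min_duration)}')
    if max_duration is not None:
        clauses.append(f'"duration" < {int(max_duration)}')

    # A bare integer compared against time is read as epoch nanoseconds.
    clauses.append(f"time > {int(since_ns)}")

    return ('SELECT * FROM "zipkin" WHERE ' + " AND ".join(clauses)
            + f" ORDER BY time DESC LIMIT {int(limit)}")
```

Quoting values through one helper, rather than formatting them straight into the string, keeps service and span names containing quotes from breaking the statement.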

@codefromthecrypt
Member

codefromthecrypt commented Jul 14, 2017 via email

@goller
Author

goller commented Jul 14, 2017

@adriancole Great ty!

Tags in influx mean data that will be indexed for fast lookup (https://docs.influxdata.com/influxdb/v1.2/concepts/schema_and_data_layout/#encode-meta-data-in-tags)

Tags also have special optimized query functions to look them up (e.g. SHOW TAG KEYS and SHOW TAG VALUES)

As for the getDependencies, I'll work on a better query. Thanks for the tips 👍
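
For context on what a getDependencies query ultimately has to produce: Zipkin reports dependency links as parent service, child service, and call count. The in-memory sketch below shows that aggregation over already-fetched spans. It is a simplification under stated assumptions (the span dict shape and the function name are hypothetical, and same-service parent/child pairs are skipped), not Zipkin's actual linker.

```python
from collections import Counter


def dependency_links(spans):
    """Count calls between services by joining each span to its parent.

    `spans` is a list of hypothetical dicts with the keys
    trace_id, id, parent_id (or None) and service_name.
    """
    # Look up the owning service of every span by (trace, span id).
    service_by_span = {(s["trace_id"], s["id"]): s["service_name"] for s in spans}

    counts = Counter()
    for s in spans:
        if s.get("parent_id") is None:
            continue  # root span: no caller to link to
        parent_service = service_by_span.get((s["trace_id"], s["parent_id"]))
        if parent_service is None or parent_service == s["service_name"]:
            continue  # missing parent, or a local call within one service
        counts[(parent_service, s["service_name"])] += 1

    return [
        {"parent": parent, "child": child, "callCount": n}
        for (parent, child), n in sorted(counts.items())
    ]
```

The join from a child span to its parent's service name is the part that a plain `GROUP BY "id","parent_id"` does not express, since grouping by span IDs yields one group per span rather than per service pair.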

@codefromthecrypt
Member

codefromthecrypt commented Jul 14, 2017 via email

@codefromthecrypt
Member

https://github.com/influxdata/telegraf/tree/master/plugins/inputs/zipkin is related. If the format matches up, especially if matching span2, it could be neat to add storage here so folks can query it with zipkin.

@goller
Author

goller commented Aug 22, 2017

Hey @adriancole I've been hacking around in jaeger here: uber/jaeger@master...influxdata:master#diff-d3419a852db652ac429192d6bd54262a

in order to support telegraf's zipkin collector plugin

(I'm far more comfortable in go than java!)

I'm pretty happy with the queries at this point and will update our Influx branch here : master...influxdata:feature/influx-store

Regarding span2: with my telegraf refactor today (influxdata/telegraf#3150), it should be straightforward to add another codec.

@codefromthecrypt
Member

codefromthecrypt commented Aug 22, 2017 via email

@goller
Author

goller commented Aug 22, 2017

Hey @adriancole, I think I understand!

Are you saying that the storage format should be span2 because the zipkin queries would work against span1 and span2, thus, I should use span2?

@codefromthecrypt
Member

@goller yep, trying to save you some effort. We can convert out into span1 format

#1700 will land soon starting the version 2 storage component. I'll do my best to land the rest of it in the next couple days.

@codefromthecrypt
Member

PS the new storage component is out (for a little while now), but just in case.

Again, I wouldn't re-introduce binaryAnnotation in any new work as it is complex, harder to query etc. We spent a while ensuring zipkin v2 format could solve these issues.
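
To illustrate why the v2 format is easier to model: a v2 span carries its tags as a flat string-to-string map, with no typed binary annotations. The sketch below splits one v2 JSON span into indexed values and plain fields. The function name, the output key names, and the `tag.` prefix are hypothetical choices for illustration only.

```python
import json


def split_v2_span(raw):
    """Split a Zipkin v2 JSON span into (timestamp, indexed values, fields)."""
    span = json.loads(raw)

    # Values a store would likely index for lookups.
    indexed = {
        "trace_id": span["traceId"],
        "id": span["id"],
        "name": span.get("name", ""),
        "service_name": span.get("localEndpoint", {}).get("serviceName", ""),
    }
    if "parentId" in span:
        indexed["parent_id"] = span["parentId"]

    # Everything else is stored without indexing.
    fields = {"duration": span.get("duration", 0)}
    for key, value in span.get("tags", {}).items():
        fields["tag." + key] = value  # hypothetical prefix to avoid key clashes

    # v2 timestamps and durations are in epoch microseconds.
    return span.get("timestamp"), indexed, fields
```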

@codefromthecrypt
Member

I'm guessing by the fact that influx folks integrated with jaeger that nothing is planned here by them. Happy to be wrong.

If community members are interested in moving this forward, pull requests are welcome. The modeling job is much easier here, as we have a simpler v2 json format which doesn't have the complexity of the v1 model that is baked into the telegraf plugin.

@goller
Author

goller commented Oct 30, 2017

Hi @adriancole, nope, it's just that I'm a one man show on the tracing front right now. We've delayed this work as we work towards our 1.4 release of InfluxDB. 1.4 will have much better support for very high cardinality, but, it's taking a while to get it stable.

It is certainly my plan to continue to work integrating influxdb and zipkin. Regarding the v2 vs v1 model in telegraf, I very much want to support that as well.

@codefromthecrypt
Member

@goller ok cool. Yeah was just weird to see interest drummed up here then a blog post on zipkin showing how to use jaeger.

@gianarb
Contributor

gianarb commented Oct 31, 2017

@adriancole InfluxDB is a time series database and the idea is to have it as a backend for both because we think it can be a valuable way to store traces efficiently. As @goller said we are working hard on making it more efficient for this kind of data, plus we are not very java-oriented people. That's why we are using both. It probably creates a bit of confusion and we are sorry about this. But you know how it works :D At some point everything will be ready.

Btw as you said if there is some java dev happy to help here let us know!

@codefromthecrypt
Member

codefromthecrypt commented Oct 31, 2017 via email

@gianarb
Contributor

gianarb commented Nov 1, 2017

@adriancole Telegraf is modular, so we can write a new plugin called zipkin2 at some point. This is not a problem at all. We wrote the plugin because we wanted a data flow to test InfluxDB with traces, and it was more comfortable for us to write a telegraf plugin because we know that code better.

before you guys dropped off the face of the earth

We spoke internally about this and I agree with you, we created a small chaos but only because we are really engaged with tracing. I am sorry about that.

What I am trying to say is that Telegraf is not related to this issue; what I would like to have is influxdb as a backend in zipkin. Do you think we should restart the integration with zipkin v2? Will people who are using zipkin now be able to use the influxdb backend in an easy way (just updating zipkin and configuring it properly)? Or is the migration path from zipkin v1 to zipkin v2 more complicated?

Thanks!

@codefromthecrypt
Member

codefromthecrypt commented Nov 5, 2017 via email

@gianarb
Contributor

gianarb commented Nov 6, 2017

Ok, thank you for your clarification. At this point, we can speak internally about how to proceed in order to open a PR here with the new influxdb backend for zipkin2.

@codefromthecrypt
Member

codefromthecrypt commented Nov 6, 2017 via email

@gianarb
Contributor

gianarb commented Nov 6, 2017

Great, thanks. At the moment my personal idea is to have an influxdb storage up and running in openzipkin. As I said previously, the telegraf plugin was an easy way for us to validate how InfluxDB performs with traces and high cardinality. We will keep it updated as best we can, but it's not in the scope of this issue 👍

@shakuzen added the storage label (Group label for Storage components) on Oct 23, 2018

6 participants