[Discussion] Export OTel data to AWS S3 (exporter S3) #1963

Closed
JohnWu20 opened this issue Jan 8, 2021 · 8 comments

@JohnWu20 (Contributor) commented Jan 8, 2021

Is your feature request related to a problem? Please describe.

S3 is a very cheap way to store data. Create an S3 exporter to export trace, metric, and log data to an AWS S3 bucket.

Describe the solution you'd like

Data format in S3

Traces: Each OTLP span will be sent to S3 as a JSON object, with spans delimited by newlines (see the sketch below).
Metrics: Each OTLP metric will be sent to S3 as a JSON object, with metrics delimited by newlines.
Logs: Each log record will be sent to S3 as a JSON object, with records delimited by newlines.
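
For illustration only, a minimal Go sketch of assembling such a newline-delimited JSON body; the simplified `span` struct and `buildNDJSONBody` helper are stand-ins, not the collector's actual pdata types or marshalers:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// span is a simplified stand-in for an OTLP span; a real exporter would
// marshal pdata spans with the collector's OTLP JSON marshaling instead.
type span struct {
	TraceID string `json:"traceId"`
	SpanID  string `json:"spanId"`
	Name    string `json:"name"`
}

// buildNDJSONBody writes one JSON object per span, each terminated by a
// newline, producing the object body that would be uploaded to S3.
func buildNDJSONBody(spans []span) ([]byte, error) {
	var buf bytes.Buffer
	for _, s := range spans {
		line, err := json.Marshal(s)
		if err != nil {
			return nil, err
		}
		buf.Write(line)
		buf.WriteByte('\n')
	}
	return buf.Bytes(), nil
}

func main() {
	body, _ := buildNDJSONBody([]span{
		{TraceID: "abc123", SpanID: "def456", Name: "GET /users"},
		{TraceID: "abc123", SpanID: "789aaa", Name: "SELECT users"},
	})
	fmt.Print(string(body))
}
```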

Upload methods

All traces, metrics, and logs will be stored temporarily on disk (separately per signal) before being uploaded to S3; this buffering guarantees reliability (see the sketch below).
A method to upload directly to S3 without a persistent disk will also be provided.
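
A rough sketch of what the disk-buffered path could look like, using the AWS SDK for Go's s3manager uploader; the package name, helper names, and S3 key scheme are illustrative assumptions, not the actual exporter design:

```go
package awss3exporter // hypothetical package name

import (
	"fmt"
	"os"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// appendToBuffer appends newline-delimited JSON records to a local buffer
// file so data survives a collector restart before it reaches S3.
func appendToBuffer(path string, records []byte) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write(records)
	return err
}

// flushToS3 uploads the buffer file as a single object and removes it on
// success. The signal/date/timestamp key layout is illustrative only.
func flushToS3(path, bucket, signal string) error {
	sess, err := session.NewSession()
	if err != nil {
		return err
	}
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	key := fmt.Sprintf("%s/%s/%d.ndjson",
		signal, time.Now().UTC().Format("2006/01/02"), time.Now().UnixNano())
	_, err = s3manager.NewUploader(sess).Upload(&s3manager.UploadInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   f,
	})
	if err != nil {
		return err
	}
	return os.Remove(path)
}
```

The direct-upload path would skip appendToBuffer and pass the in-memory body straight to the uploader, trading restart durability for fewer moving parts.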

Please +1 this issue if you agree an S3 exporter would be useful, and please comment with suggestions, feedback, or feature requests:

Are these the right data formats for each type?
Would you be interested in other data formats?
If you are interested in an S3 OTel exporter, please post your use case in a comment: how would you use the data in S3 to improve your operations?

@anuraaga (Contributor) commented Jan 9, 2021

I think a filesystem exporter for traces would be great. It'd probably be better if it's possible to use a storage abstraction so it can work with a normal disk or any cloud, maybe stow (I think there are a few of these libraries; I don't know which is best):

https://github.com/graymeta/stow
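
A hedged sketch of writing through stow's abstraction using its local backend; the "local" kind and "path" config key follow stow's documented backends but should be verified against the library, and the object name is arbitrary:

```go
package main

import (
	"bytes"
	"log"
	"os"

	"github.com/graymeta/stow"
	// Importing a backend registers it with stow; swapping this import and
	// the Dial config (github.com/graymeta/stow/s3 plus credentials) is all
	// that changes between local disk and S3.
	_ "github.com/graymeta/stow/local"
)

func main() {
	if err := os.MkdirAll("/tmp/otel-out", 0o755); err != nil {
		log.Fatal(err)
	}
	// Kind "local" and the "path" config key come from stow's local backend.
	loc, err := stow.Dial("local", stow.ConfigMap{"path": "/tmp/otel-out"})
	if err != nil {
		log.Fatal(err)
	}
	defer loc.Close()

	// Containers map to directories on local disk and to buckets on S3.
	container, err := loc.CreateContainer("traces")
	if err != nil {
		log.Fatal(err)
	}

	payload := []byte(`{"traceId":"abc123","name":"GET /users"}` + "\n")
	// Put takes an object name, a reader, the size, and optional metadata.
	if _, err := container.Put("spans.ndjson",
		bytes.NewReader(payload), int64(len(payload)), nil); err != nil {
		log.Fatal(err)
	}
}
```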

It would be good to measure the performance of writing individual spans vs. whole traces first. I don't know if per-span writing could scale well enough, but if it could, it would circumvent the huge complexity of having to group a trace before writing.

Also, there isn't actually any such thing as an OTLP-format span, or even trace; all we have are ResourceSpans. I have been thinking it could be good to have a spec'd protocol for an OTLP span carrying all of its information (basically a span with resource and ilinfo, i.e. instrumentation-library, fields), but we don't have that yet, I think, and may want to start there. This discussion affects message queue exporting too.
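
To make the idea concrete, a hypothetical shape for such a self-contained span record (this is not an existing OTLP message, just an illustration of what the spec'd protocol might carry):

```go
package otlpjson // hypothetical package

// SelfContainedSpan bundles a span with the resource and
// instrumentation-library ("ilinfo") fields that ResourceSpans normally
// factors out, so a single line in an S3 object is complete on its own.
type SelfContainedSpan struct {
	Resource               map[string]interface{} `json:"resource"`
	InstrumentationLibrary struct {
		Name    string `json:"name"`
		Version string `json:"version"`
	} `json:"instrumentationLibrary"`
	Span map[string]interface{} `json:"span"`
}
```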

@gramidt (Member) commented Jan 11, 2021

Would this be meant just for archiving purposes? If not, what sort of tooling would you use to query and analyze this data?

Depending on the tooling you would like to use, you could use Cortex for metrics, Tempo for traces, and Loki for logs (A Loki exporter is currently a WIP, but I anticipate finishing it up later this week (#1894)). All of these systems support various block storage, such as S3, to keep the TCO to a minimum.

@anuraaga (Contributor) commented Jan 12, 2021

@gramidt I think all of those have ingestion APIs, IIUC. I figured this was more about getting things saved, to take advantage of the global/high-scale ingestion frontends of cloud object storage. I would probably have a Lambda-type object-change listener that then forwards the data on to a backend like Zipkin or Tempo, probably applying a lower, tail-based sampling rate because the backend isn't scaled out as much as the storage. It's something that seems pretty cool.
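
As a rough illustration of that idea, an S3 event-triggered Lambda handler in Go (using aws-lambda-go) could look like the following; forwardObject is a hypothetical stub for the fetch, decode, sample, and forward step:

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

// handler fires on S3 object-created notifications. A real implementation
// would fetch each object, parse the newline-delimited spans, apply a
// (possibly tail-based) sampling decision, and forward the survivors to a
// backend such as Zipkin or Tempo.
func handler(ctx context.Context, e events.S3Event) error {
	for _, record := range e.Records {
		bucket := record.S3.Bucket.Name
		key := record.S3.Object.Key
		log.Printf("new object s3://%s/%s", bucket, key)
		if err := forwardObject(ctx, bucket, key); err != nil {
			return err
		}
	}
	return nil
}

// forwardObject is a hypothetical stub: it would read the object via the S3
// SDK and POST the decoded spans to the tracing backend's ingest API.
func forwardObject(ctx context.Context, bucket, key string) error {
	return nil
}

func main() {
	lambda.Start(handler)
}
```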

@gramidt (Member) commented Jan 12, 2021

@anuraaga - That's an interesting concept! I'm curious how this approach would compare to a pub/sub implementation if object storage were only used as a means of ingestion.

@anuraaga (Contributor) commented Jan 12, 2021

@gramidt With straight-to-storage, I guess the difference is that it's both ingestion and archival. Links to traces from logs, for example, could probably work just fine even when reading straight from the store, with no real latency before the data is available: as soon as it's saved, it's fetchable. If it's possible to have 100% of traces saved to storage and accessible by clicking a link from a log, while having X% accessible through normal query views, that would be pretty cool. If the tracing backend goes down, having the direct links still work is still pretty useful.

Of course, this depends on being able to save 100% to storage. While the storage backends are highly scalable and I would expect pretty good throughput from them, the actual scalability of writing straight to storage with no ingestion frontend in front would need to be verified. But I would generally expect S3, GCS, etc. to do a better job at it than me :)

@gramidt (Member) commented Jan 12, 2021

@anuraaga - Having ingestion plus archival is definitely intriguing. 🤔

From a processing perspective, care will need to be taken when deciding how writing should work. Systems like Cortex, Tempo, and Loki take great care in how they buffer and write to storage to keep costs low. In fact, early in Cortex development, Cortex primarily used S3 and wrote compressed "chunks", but users found that most of the cost of running Cortex was spent on S3 write operations. Thought went into writing multiple chunks into a single S3 object, but that was deemed too complex, and the project moved on to other mechanisms (blocks) to reduce the TCO.

From an availability perspective, I too believe S3 and GCS have a high degree of reliability that is hard to match. 😄

alolita added the comp:aws (AWS components) label Sep 2, 2021

@github-actions bot commented Nov 4, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@MovieStoreGuy (Contributor)

I am going to close this off and leave the conversation to be followed up in #2835.
