[Discussion] Export OTel data to AWS S3 (exporter S3) #1963

Closed
JohnWu20 opened this issue Jan 8, 2021 · 8 comments

@JohnWu20 (Contributor) commented Jan 8, 2021

Is your feature request related to a problem? Please describe.

S3 is a very cheap way to store data. Create an S3 exporter to export trace, metric, and log data to an AWS S3 bucket.

Describe the solution you'd like

Data format in S3

Traces: Each OTLP span will be sent to S3 as a JSON object, with spans delimited by newlines (see the sketch below).
Metrics: Each OTLP metric will be sent to S3 as a JSON object, with metrics delimited by newlines.
Logs: Each log record will be sent to S3 as a JSON object, with records delimited by newlines.
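
For illustration only, a minimal Go sketch of assembling such a newline-delimited JSON body; the simplified `span` struct and `buildNDJSONBody` helper are stand-ins, not the collector's actual pdata types or marshalers:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// span is a simplified stand-in for an OTLP span; a real exporter would
// marshal pdata spans with the collector's OTLP JSON marshaling instead.
type span struct {
	TraceID string `json:"traceId"`
	SpanID  string `json:"spanId"`
	Name    string `json:"name"`
}

// buildNDJSONBody writes one JSON object per span, each terminated by a
// newline, producing the object body that would be uploaded to S3.
func buildNDJSONBody(spans []span) ([]byte, error) {
	var buf bytes.Buffer
	for _, s := range spans {
		line, err := json.Marshal(s)
		if err != nil {
			return nil, err
		}
		buf.Write(line)
		buf.WriteByte('\n')
	}
	return buf.Bytes(), nil
}

func main() {
	body, _ := buildNDJSONBody([]span{
		{TraceID: "abc123", SpanID: "def456", Name: "GET /users"},
		{TraceID: "abc123", SpanID: "789aaa", Name: "SELECT users"},
	})
	fmt.Print(string(body))
}
```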

Upload methods

All traces, metrics, and logs will be stored temporarily on disk (separately per signal) before being uploaded to S3; this buffering guarantees reliability (see the sketch below).
A method to upload directly to S3 without a persistent disk will also be provided.
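
A rough sketch of what the disk-buffered path could look like, using the AWS SDK for Go's s3manager uploader; the package name, helper names, and S3 key scheme are illustrative assumptions, not the actual exporter design:

```go
package awss3exporter // hypothetical package name

import (
	"fmt"
	"os"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// appendToBuffer appends newline-delimited JSON records to a local buffer
// file so data survives a collector restart before it reaches S3.
func appendToBuffer(path string, records []byte) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write(records)
	return err
}

// flushToS3 uploads the buffer file as a single object and removes it on
// success. The signal/date/timestamp key layout is illustrative only.
func flushToS3(path, bucket, signal string) error {
	sess, err := session.NewSession()
	if err != nil {
		return err
	}
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	key := fmt.Sprintf("%s/%s/%d.ndjson",
		signal, time.Now().UTC().Format("2006/01/02"), time.Now().UnixNano())
	_, err = s3manager.NewUploader(sess).Upload(&s3manager.UploadInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   f,
	})
	if err != nil {
		return err
	}
	return os.Remove(path)
}
```

The direct-upload path would skip appendToBuffer and pass the in-memory body straight to the uploader, trading restart durability for fewer moving parts.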

Please +1 this issue if you agree an S3 exporter would be useful, and please comment with suggestions, feedback, or feature requests:

Are these the right data formats for each type?
Would you be interested in other data formats?
If you are interested in an S3 OTel exporter, please post your use case in a comment: how would you use the data in S3 to improve your operations?

@anuraaga (Contributor) commented Jan 9, 2021

I think a filesystem exporter for traces would be great. It'd probably be better if it's possible to use a storage abstraction so it can work with a normal disk or any cloud, maybe stow (I think there are a few of these libraries; I don't know which is best):

https://github.com/graymeta/stow
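
A hedged sketch of writing through stow's abstraction using its local backend; the "local" kind and "path" config key follow stow's documented backends but should be verified against the library, and the object name is arbitrary:

```go
package main

import (
	"bytes"
	"log"
	"os"

	"github.com/graymeta/stow"
	// Importing a backend registers it with stow; swapping this import and
	// the Dial config (github.com/graymeta/stow/s3 plus credentials) is all
	// that changes between local disk and S3.
	_ "github.com/graymeta/stow/local"
)

func main() {
	if err := os.MkdirAll("/tmp/otel-out", 0o755); err != nil {
		log.Fatal(err)
	}
	// Kind "local" and the "path" config key come from stow's local backend.
	loc, err := stow.Dial("local", stow.ConfigMap{"path": "/tmp/otel-out"})
	if err != nil {
		log.Fatal(err)
	}
	defer loc.Close()

	// Containers map to directories on local disk and to buckets on S3.
	container, err := loc.CreateContainer("traces")
	if err != nil {
		log.Fatal(err)
	}

	payload := []byte(`{"traceId":"abc123","name":"GET /users"}` + "\n")
	// Put takes an object name, a reader, the size, and optional metadata.
	if _, err := container.Put("spans.ndjson",
		bytes.NewReader(payload), int64(len(payload)), nil); err != nil {
		log.Fatal(err)
	}
}
```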

It would be good to measure the performance of writing individual spans vs. whole traces first. I don't know if per-span writing could scale well enough, but if it could, it would circumvent the huge complexity of having to group a trace before writing.

Also, there isn't actually any such thing as an OTLP-format span, or even trace; all we have are ResourceSpans. I have been thinking it could be good to have a spec'd protocol for an OTLP span carrying all of its information (basically a span with resource and ilinfo, i.e. instrumentation-library, fields), but we don't have that yet, I think, and may want to start there. This discussion affects message queue exporting too.
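
To make the idea concrete, a hypothetical shape for such a self-contained span record (this is not an existing OTLP message, just an illustration of what the spec'd protocol might carry):

```go
package otlpjson // hypothetical package

// SelfContainedSpan bundles a span with the resource and
// instrumentation-library ("ilinfo") fields that ResourceSpans normally
// factors out, so a single line in an S3 object is complete on its own.
type SelfContainedSpan struct {
	Resource               map[string]interface{} `json:"resource"`
	InstrumentationLibrary struct {
		Name    string `json:"name"`
		Version string `json:"version"`
	} `json:"instrumentationLibrary"`
	Span map[string]interface{} `json:"span"`
}
```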

@gramidt (Member) commented Jan 11, 2021

Would this be meant just for archiving purposes? If not, what sort of tooling would you use to query and analyze this data?

Depending on the tooling you would like to use, you could use Cortex for metrics, Tempo for traces, and Loki for logs (A Loki exporter is currently a WIP, but I anticipate finishing it up later this week (#1894)). All of these systems support various block storage, such as S3, to keep the TCO to a minimum.

@anuraaga (Contributor) commented Jan 12, 2021

@gramidt I think all of those have ingestion APIs, IIUC. I figured this was more about getting things saved, to take advantage of the global/high-scale ingestion frontends of cloud object storage. I would probably have a Lambda-type object-change listener that then forwards the data on to a backend like Zipkin or Tempo, probably applying a lower, tail-based sampling rate because the backend isn't scaled out as much as the storage. It's something that seems pretty cool.
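
As a rough illustration of that idea, an S3 event-triggered Lambda handler in Go (using aws-lambda-go) could look like the following; forwardObject is a hypothetical stub for the fetch, decode, sample, and forward step:

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

// handler fires on S3 object-created notifications. A real implementation
// would fetch each object, parse the newline-delimited spans, apply a
// (possibly tail-based) sampling decision, and forward the survivors to a
// backend such as Zipkin or Tempo.
func handler(ctx context.Context, e events.S3Event) error {
	for _, record := range e.Records {
		bucket := record.S3.Bucket.Name
		key := record.S3.Object.Key
		log.Printf("new object s3://%s/%s", bucket, key)
		if err := forwardObject(ctx, bucket, key); err != nil {
			return err
		}
	}
	return nil
}

// forwardObject is a hypothetical stub: it would read the object via the S3
// SDK and POST the decoded spans to the tracing backend's ingest API.
func forwardObject(ctx context.Context, bucket, key string) error {
	return nil
}

func main() {
	lambda.Start(handler)
}
```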

@gramidt (Member) commented Jan 12, 2021

@anuraaga - That's an interesting concept! I'm curious how this approach would compare to a pub/sub implementation if object storage were only used as a means of ingestion.

@anuraaga (Contributor) commented Jan 12, 2021

@gramidt With straight-to-storage, I guess the difference is that it's both ingestion and archival. Links to traces from logs, for example, could probably work just fine even when reading straight from the store, with no real latency before the data is available: as soon as it's saved, it's fetchable. If it's possible to have 100% of traces saved to storage and accessible by clicking a link from a log, while having X% accessible through normal query views, that would be pretty cool. If the tracing backend goes down, having the direct links still work is still pretty useful.

Of course, this depends on being able to save 100% to storage. While the storage backends are highly scalable and I would expect pretty good throughput from them, the actual scalability of writing straight to storage with no ingestion frontend in front would need to be verified. But I would generally expect S3, GCS, etc. to do a better job at it than me :)

@gramidt (Member) commented Jan 12, 2021

@anuraaga - Having ingestion plus archival is definitely intriguing. 🤔

From a processing perspective, care will need to be taken when deciding how writing should work. Systems like Cortex, Tempo, and Loki take great care in how they buffer and write to storage to keep costs low. In fact, early in Cortex development, Cortex primarily used S3 and wrote compressed "chunks", but users found that most of the cost of running Cortex was spent on S3 write operations. Thought went into writing multiple chunks into a single S3 object, but that was deemed too complex, and the project moved on to other mechanisms (blocks) to reduce the TCO.

From an availability perspective, I too believe S3 and GCS have a high degree of reliability that is hard to match. 😄

alolita added the comp:aws (AWS components) label Sep 2, 2021

@github-actions bot commented Nov 4, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@MovieStoreGuy (Contributor)

I am going to close this off and leave the conversation to be followed up in #2835.
