
Add an S3 exporter #2835

Closed
rakyll opened this issue Mar 23, 2021 · 54 comments · Fixed by #20912
Labels
Accepted Component (New component has been sponsored), comp:aws (AWS components), enhancement (New feature or request), exporter/awss3

Comments

@rakyll
Contributor

rakyll commented Mar 23, 2021

Various OTel users want to export raw telemetry data for long-term storage and analysis. We should add an S3 exporter that exports incoming OTLP data points to sharded S3 objects in a bucket. Sharding can be done per telemetry type (metrics, traces, logs) and by time (with a user-configured window). We could also optionally serialize into Ion at the collector.

(You can assign this issue to me.)

@anuraaga
Contributor

@rakyll What format are you thinking for storage? There's some discussion on open-telemetry/opentelemetry-specification#1443 but I don't think we have a good format defined for stored traces yet.

@bogdandrutu
Member

This can also be a plugin for fileexporter correct? It is just that you write into a remote "file".

@rakyll
Contributor Author

rakyll commented Mar 25, 2021

Once open-telemetry/opentelemetry-specification#1443 is addressed, we should follow the specification there. Otherwise, the format needs discussion/proposal.

@bogdandrutu It's a good idea. Given it's vendor-specific, my initial suggestion was a standalone exporter. But we can ask other vendors to implement support for their services and provide the functionality as part of the fileexporter.

@sirianni
Contributor

What format are you thinking for storage?

Should it be a design goal to pick a format (Parquet, ORC, etc.), or to make the format pluggable, so that the data can be queried "in-place" by a system like AWS Athena?

@emeraldbay

Given the variety of data formats, can we start by building support for a few popular ones, such as Parquet, ORC, and JSON?

@alolita alolita added the enhancement New feature or request label May 13, 2021
@emeraldbay

Initial proposal for the S3 exporter design requirements:

Exporter features:
1). Support metrics first; support logs and traces later.

File output format features:
1). Use Snappy as the base compression codec.
2). Support Parquet and compressed JSON first; support ORC later.
3). The output file format is configurable through the exporter config.
4). The output schema is configurable through a JSON config file.
5). Support specifying the mapping from pdata.Metrics fields to the output schema through the config file.

S3 uploader features:
1). The user specifies the S3 bucket name, S3 key prefix, output file name prefix, and time partition granularity (hour or minute).
The final S3 key will be in the format:
s3://s3_bucket_name/s3_key_prefix/year=XXX/month=XX/day=XX/hour=XX/{min=XX/}file_name_prefix_{random_id}.file_format
2). Support basic S3 uploader configuration options (PartSize/Concurrency) as specified in https://aws.github.io/aws-sdk-go-v2/docs/sdk-utilities/s3/

@jrcamp
Contributor

jrcamp commented Jun 14, 2021

Could this be a generic object storage exporter using a library like https://github.com/chartmuseum/storage (not advocating this particular one, just as an example)? Authentication methods may vary across cloud providers, so some config options may still need to be cloud-specific, but using a library that already abstracts across providers may still make it easier to extend to other providers. It even has a backend for local usage, so it makes sense to extend the file exporter as @bogdandrutu suggested: https://github.com/chartmuseum/storage/blob/main/local.go.

@emeraldbay


But I think this exporter is specific to S3.


@jrcamp
Contributor

jrcamp commented Jun 24, 2021

@emeraldbay why must it be s3 specific?

@alolita alolita added the comp:aws AWS components label Sep 2, 2021
@knvpk

knvpk commented Oct 21, 2021

This feature would be very useful: we could archive large amounts of instrumentation data while keeping it queryable.

@knvpk

knvpk commented Dec 11, 2021

What is the progress on this feature?

@atoulme
Contributor

atoulme commented Dec 14, 2021

Hey folks, I added a file exporter here: #6712
This is just outputting pipeline data as JSON.

@atoulme
Contributor

atoulme commented Dec 14, 2021

Because this format is not well suited for s3, I have started working on Parquet support for OpenTelemetry.
Here is a draft PR: open-telemetry/opentelemetry-proto#346

Once OpenTelemetry data can be serialized to Parquet, we can create a receiver and exporter for Parquet files.

@jpkrohling
Member

I'm open to being a sponsor for the Parquet components on contrib. I think it would be a great addition.

@atoulme
Contributor

atoulme commented Dec 18, 2021

That is great to hear! I'll try and stick to the approach of creating the component structure first. @jpkrohling please see here: #6903

@atoulme
Contributor

atoulme commented Feb 11, 2022

Folks, open-telemetry/opentelemetry-proto#346 is ready for review. It is coming together. It is not a final Parquet mapping by any means, but it gives us enough support to experiment with.

@jmcarp
Contributor

jmcarp commented Mar 15, 2022

Is the idea that the stub parquet exporter in #6903 would eventually learn more formats and destinations, as in #6903 (comment)? Or would it be simpler to have one exporter to export text files to disk, another to export parquet files to disk, another to export parquet/json files to s3, etc.? I would be interested in exporting to both s3 and gcs, and happy to write a patch if it would be helpful.

@pdelewski
Member

Recently, I started working on s3 exporter using this work as a baseline. Is anyone working on that actively?

@jpkrohling
Member

I don't think anyone is working on this. If you decide to work on this, make sure you have a sponsor first, in order for this component to be accepted as part of the contrib repository.

@jmacd
Contributor

jmacd commented Apr 26, 2022

👋 I have a personal interest in seeing OTC be able to write telemetry to S3. I feel that not many vendors have this interest, but for a user who wants to store years' worth of telemetry and cares about costs, S3 looks like a good path forward.

@atoulme
Contributor

atoulme commented Apr 26, 2022

I have been working on a Parquet exporter, but haven't had time recently.

@pmm-sumo
Contributor

I volunteer to be a sponsor of the initiative. I think a Parquet exporter would be very useful, and it's great to see the work started, though it's probably a somewhat separate item (essentially a serialization mechanism). So I think it could live in a common package that is then referenced by both the fileexporter and the S3 exporter.

@nerochiaro

I know this component is still marked in-development and not officially in contrib.

But to what extent can it already be used? Would using it risk crashing the entire collector, or at worst would it fail to do its job while leaving everything else functioning?

@pdelewski
Member

The component is in alpha status; however, it should be fairly stable. To continue working on it, we need a sponsor, as mentioned above.

@MovieStoreGuy
Contributor

Hi all,

First of all, thank you to everyone who is interested in seeing this implemented.
I want to ask whether there is a use case for a dedicated AWS S3 exporter, or would users be okay with this being an extension to the file exporter?

The former has the advantage that it can be implemented sooner and be rather opinionated on how to handle S3 issues and formats.
The latter means that other components could benefit from it, by making it an extension.

I am happy to sponsor either. I would prefer the extension approach; however, it depends on what everyone involved prefers.

@MovieStoreGuy
Contributor

It looks like an earlier discussion leaned towards an abstraction for the fileexporter.

#1963

@pdelewski
Member

@MovieStoreGuy Just to clarify, in the case of fileexporter, s3 storage would be treated as a kind of remote file system, right? Does fileexporter have similar extensions already?

@guyfrid
Contributor

guyfrid commented Feb 7, 2023


Hi @MovieStoreGuy! We also have an interest in this exporter.
Our use case is to export and store spans to an S3 bucket from a collector running locally inside the Lambda handler.
I think (please correct me if I'm wrong) both approaches can work for this use case.

Is there any active work ongoing on this?

@pdelewski
Member

@guyfrid Work stopped due to the lack of a sponsor.

@erez111

erez111 commented Feb 15, 2023

I'm also looking for an S3 exporter.

I'm having difficulties using OTLP with the http+json protocol as a metrics exporter.

I thought calling the S3 REST API directly with http+json could be a quick alternative.

All I can send is http+protobuf or gRPC (which isn't supported by S3).

Has anyone implemented this and could share a config.yaml file, or investigate it as well? Thanks, everyone, for this useful thread.

@philipcwhite

I'd like to see this as well. We have a lot of data we ship to S3. I think it would be useful. Thanks

@atoulme
Contributor

atoulme commented Mar 9, 2023

OK, I can sponsor this.

@atoulme atoulme added Accepted Component New component has been sponsored and removed Sponsor Needed New component seeking sponsor labels Mar 9, 2023
@rakyll
Contributor Author

rakyll commented Mar 23, 2023

If anyone wants to work on this, they should feel free to take it. Unfortunately I'm working on something else nowadays.

@rakyll rakyll removed their assignment Mar 23, 2023
@atoulme
Contributor

atoulme commented Mar 23, 2023

Thank you for your help! We will get it done.

@sudopras

Hi @atoulme, I'd like to know if you need any help with this. I'm interested in contributing and sponsoring (if required).

@atoulme
Contributor

atoulme commented Mar 30, 2023

No need for a sponsor, this PR is the latest:
#9979

Once it's in, we have to get the impl right.

@akshadha22

akshadha22 commented Apr 7, 2023

I was unable to use the awss3 exporter with the latest release because of this error:

error decoding 'exporters': unknown type: "awss3" for id: "awss3" (valid values:
[alibabacloud_logservice clickhouse googlemanagedprometheus instana jaeger pulsar sapm f5cloud loadbalancing sumologic otlp azuremonitor datadog elasticsearch influxdb mezmo skywalking awscloudwatchlogs awsemf dynatrace googlecloud tanzuobservability tencentcloud_logservice logicmonitor opencensus signalfx splunk_hec googlecloudpubsub loki prometheusremotewrite otlphttp azuredataexplorer carbon file jaeger_thrift logzio kafka prometheus sentry zipkin logging awskinesis awsxray coralogix])

@atoulme
Contributor

atoulme commented Apr 7, 2023

The s3 exporter is not code complete yet from what I understand. We have just merged the skeleton of the exporter.

@pdelewski
Member


That's true. There is a second PR with the implementation, #10000; however, it requires some work because interfaces have changed over the last few months.

@miguelb-gk

Any idea when this will make its way in? We are also seeing the same "unknown type" error.

@github-actions
Contributor

Pinging code owners for exporter/awss3: @atoulme @pdelewski. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@atoulme
Contributor

atoulme commented Apr 15, 2023

#20912 is open to finish the job, please review.

@akshadha22

I'm unable to pull the latest release, v0.76.3, from Docker Hub, and v0.75.0 doesn't include the awss3 exporter. When can we expect a new release that includes the latest changes for the AWS S3 exporter?

@MovieStoreGuy
Contributor

Take a look at https://github.com/open-telemetry/opentelemetry-collector-releases/blob/main/distributions/otelcol-contrib/manifest.yaml#LL39C79-L39C92; it should be released as of v0.77.0.
