
[Reporting] CSV "chunked" export #18322

Closed
elasticmachine opened this issue Nov 8, 2017 · 16 comments · Fixed by #108485
Labels
enhancement New value added to drive a business result Feature:Reporting:CSV Reporting issues pertaining to CSV file export high impact:critical This issue should be addressed immediately due to a critical level of impact on the product. loe:large Large Level of Effort needs-team Issues missing a team label

Comments

@elasticmachine
Contributor

Original comment by @kobelb:

Currently, all reporting exports are stored in a single Elasticsearch document. Additionally, CSV exports are capped at a configurable 10mb presently. We can increase this limit by splitting the result across multiple Elasticsearch documents and streaming them to the user in such a way that it appears to be one file.
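The idea described above can be sketched roughly as follows. This is an illustrative sketch only, not Kibana's implementation: the document shape (`report_id`, `seq`, `content`) and chunk size are assumptions chosen for the example.

```python
# Hypothetical sketch: split a large CSV payload into fixed-size chunks that
# could each be stored as a separate Elasticsearch document, then stream the
# chunks back in order so the client sees a single file.

def split_into_chunks(data: bytes, chunk_size: int) -> list[dict]:
    """Return per-chunk documents with an ordering key for reassembly."""
    return [
        {"report_id": "r1", "seq": i, "content": data[off:off + chunk_size]}
        for i, off in enumerate(range(0, len(data), chunk_size))
    ]

def stream_chunks(docs):
    """Yield chunk contents in sequence order, one chunk in memory at a time."""
    for doc in sorted(docs, key=lambda d: d["seq"]):
        yield doc["content"]

csv_bytes = b"col_a,col_b\n" + b"1,2\n" * 1000
docs = split_into_chunks(csv_bytes, chunk_size=256)
reassembled = b"".join(stream_chunks(docs))
assert reassembled == csv_bytes
```

Because the consumer only ever holds one chunk at a time, the per-document size cap no longer bounds the total report size.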

@elasticmachine elasticmachine added :Sharing (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead labels Apr 25, 2018
@stacey-gammon stacey-gammon added enhancement New value added to drive a business result Team:Visualizations Visualization editors, elastic-charts and infrastructure and removed :Sharing labels Sep 13, 2018
@matthew-b-b

For perspective, 10 MB is about seven 3.5-inch floppy disks.

Please increase the cap to something less Windows 95.

@tsullivan
Member

@matthew-b-b the cap is configurable. The setting is xpack.reporting.csv.maxSizeBytes.

I've seen users successfully generate reports of around 200mb. The real limitations are network capacity (web proxies) and Elasticsearch HTTP payload limits. All of those are configurable as well.

I'm not sure this is a real issue.
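For reference, raising the cap involves more than one setting. The snippet below is a sketch of the knobs mentioned in this comment; the values shown are examples, not recommendations, and any web proxy in front of Kibana has its own body-size limit that must be raised separately.

```yaml
# kibana.yml -- raise the Reporting CSV cap (value is in bytes)
xpack.reporting.csv.maxSizeBytes: 209715200   # ~200 MB

# elasticsearch.yml -- the ES HTTP payload limit (default 100mb) must
# also accommodate the larger report document
http.max_content_length: 400mb
```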

@alexfrancoeur

@tsullivan I believe this is a real issue: we have numerous customers asking for larger exports, and maxSizeBytes is still bounded by the Elasticsearch payload limits you mentioned. The chunked export was always meant to be the next phase of CSV export, letting users export larger data sets with ease, both from the UI and via Watcher, in a way that takes as many precautions as possible to reduce impact on the cluster itself.

I believe the request is still valid and the user experience as defined by @kobelb is still common amongst our community.

cc: @AlonaNadler for awareness

@tbone2sk

@alexfrancoeur do you know if the Kibana team is working on enhancements to allow chunked exports? The chunked export approach is the only maintainable way to allow large CSV files to be downloaded.

Based on this blog post it looks like some improvements are being made to export to CSV.
https://www.elastic.co/blog/keeping-up-with-kibana-2019-04-15

We have dozens of users who would like the ability to export large CSV files, and chunked export would be a perfect solution.

@AlonaNadler

@tbone2sk we are not actively working on it at the moment, but we are preparing the reporting backend so it will be easier to address in the future. The upcoming ability to export CSV from a saved search in a dashboard uses a different implementation that should help us chunk CSV exports in the future. How big are the files the users in your org want to export? And, out of curiosity, what do they do with these files after they export them?

@tbone2sk

Sounds great, I look forward to the enhancements!

@AlonaNadler
We have been using Kibana for the analysis of marketing data. We try to complete most of the analysis within the tool, but there are cases where Kibana is not as flexible as other tools and a smaller dataset is exported from Kibana. Most of these files are larger than 200,000 records, and some contain a few million records. It would be preferred if we could export files as large as 1 GB.

@tsullivan tsullivan self-assigned this Jul 9, 2019
@timroes timroes added Team:Stack Services and removed Team:Visualizations Visualization editors, elastic-charts and infrastructure labels Jul 18, 2019
@bmcconaghy bmcconaghy added Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) and removed Team:Stack Services labels Dec 12, 2019
@bmcconaghy bmcconaghy added Team:Reporting Services and removed Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Dec 20, 2019
@elasticmachine
Contributor Author

Pinging @elastic/kibana-reporting-services (Team:Reporting Services)

@i-aggarwal

Hi Elastic team,

We have a use case where a CSV report of around 1 GB needs to be exported as well. The data is then used for analysis and tagging. Do you have any visibility into when the chunked export functionality will be available?

@tsullivan
Member

Thanks Alex, this definitely is a real issue. Chunking the data into smaller documents for storage also relieves the memory pressure this feature puts on the Kibana server. This enhancement would greatly mature the CSV export feature.

> Do you have any visibility into when the chunked export functionality will be available?

@i-aggarwal the Reporting Services team currently doesn't have a timeline for starting work on this.

@tsullivan
Member

In addition to breaking the exported content into as many documents as necessary, another idea that has come up is Base64-encoding or gzipping the chunks.

Also, storing gzipped content would probably require using the binary datatype for the content field in the Reporting document.
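The encode/decode round trip being proposed can be sketched as below. This is an assumption-laden illustration, not Kibana code; note that Elasticsearch's binary field type expects Base64-encoded strings in `_source` anyway, so the two ideas combine naturally.

```python
# Sketch: gzip each CSV chunk and Base64-encode it so it can be stored as a
# string in the Reporting document, then reverse both steps on read.
import base64
import gzip

def encode_chunk(chunk: bytes) -> str:
    """Compress a CSV chunk and encode it for storage in a text/binary field."""
    return base64.b64encode(gzip.compress(chunk)).decode("ascii")

def decode_chunk(stored: str) -> bytes:
    """Recover the original CSV bytes from a stored chunk."""
    return gzip.decompress(base64.b64decode(stored))

chunk = b"timestamp,host,bytes\n" * 500
stored = encode_chunk(chunk)
assert decode_chunk(stored) == chunk
```

For highly repetitive CSV data, gzip typically shrinks each chunk substantially, partially offsetting the ~33% size overhead Base64 adds.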

@tsullivan tsullivan removed their assignment Oct 6, 2020
@matschaffer
Contributor

There is also some code in the wild from folks working around the current state of CSV reporting:

@tsullivan
Member

I spent a bit of time looking into this today. I think it'll be better in the long run to wait until ESQueue is removed.

If we don't wait, this change has a large impact on ESQueue, a legacy JS library for running Reporting tasks in the server background. Removing it and moving to Task Manager gives a better foundation for making the change to use chunked CSV data in Kibana Reporting.

@tsullivan
Member

https://github.com/fabiopipitone/elasticsearch-tocsv - builds a CSV straight from elasticsearch data

For users who just need a dump of Elasticsearch data in CSV, it's better to do that straight out of Elasticsearch instead of using Reporting in the Kibana UI. A script can append CSV text to a file handle as results arrive, so there's no need to hold large amounts of data in RAM before sending it over the network. I haven't reviewed the linked repository, but I would look into custom scripts like this one if I needed to copy all of my Elasticsearch data to another storage system.
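The script-based approach described here can be sketched as follows. This is a hypothetical illustration: `fetch_pages` stands in for a real paged Elasticsearch loop (e.g. `search_after`), since the actual client calls depend on your setup.

```python
# Sketch: page results out of a data source and append each page to a file
# handle, so the full dataset never sits in RAM at once.
import csv
import io

def fetch_pages():
    # Stand-in for paged Elasticsearch hits (e.g. via search_after).
    for page in range(3):
        yield [{"id": page * 2 + i, "value": f"v{page}-{i}"} for i in range(2)]

def dump_to_csv(fileobj, pages, fields=("id", "value")):
    """Write a header, then append each page of rows as it arrives."""
    writer = csv.DictWriter(fileobj, fieldnames=list(fields))
    writer.writeheader()
    for hits in pages:  # one page in memory at a time
        writer.writerows(hits)

buf = io.StringIO()
dump_to_csv(buf, fetch_pages())
print(buf.getvalue().splitlines()[0])  # -> id,value
```

With a real file handle in place of the `StringIO` buffer, peak memory stays at roughly one page of hits regardless of the total export size.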

In contrast, Kibana Reporting applies field formatters, lets users download reports multiple times, and so on. That, of course, currently happens in one pass by holding all the data from the query in server memory. This issue will remain open, though, because resolving it will make Kibana Reporting better at providing CSV reports.

@elasticmachine
Contributor Author

Pinging @elastic/kibana-app-services (Team:AppServices)

@tsullivan
Member

The solution to look for here is a way to stream the report output into storage and back out of storage. While chunking the CSV export allows the entire report to be stored in Elasticsearch, it creates a new problem: the user can't download the data as a single file if the chunks have to be stitched together in RAM.
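The distinction being drawn can be sketched like this: fetch one stored chunk at a time and hand it straight to the response, so peak memory is one chunk rather than the whole report. This is illustrative only; the `store` dict stands in for per-chunk Elasticsearch GETs, and the shape of the store is an assumption.

```python
# Sketch: stream a chunked report out of storage without stitching the
# whole file together in RAM. A generator lets an HTTP layer forward each
# chunk as soon as it is fetched.
def stream_report(store: dict[int, bytes], total_chunks: int):
    for seq in range(total_chunks):
        yield store[seq]  # one chunk in memory at a time

store = {0: b"a,b\n", 1: b"1,2\n", 2: b"3,4\n"}
body = b"".join(stream_report(store, 3))
assert body == b"a,b\n1,2\n3,4\n"
```

The `b"".join(...)` here is only to demonstrate correctness; a real server would pass the generator to a chunked-transfer response rather than materializing the body.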

@tsullivan
Member

Update: there is an alternative proposal in Elasticsearch to support file storage. If that were available, we would use it to store CSV. The benefit is that we could stream the bits to the file without chunking them or holding the entire contents in memory.

@tsullivan tsullivan added high impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. labels Apr 23, 2021
@exalate-issue-sync exalate-issue-sync bot added loe:small Small Level of Effort loe:medium Medium Level of Effort loe:large Large Level of Effort and removed loe:small Small Level of Effort loe:medium Medium Level of Effort labels Apr 26, 2021
@petrklapka petrklapka added 1 and removed 1 labels May 6, 2021
@exalate-issue-sync exalate-issue-sync bot added impact:critical This issue should be addressed immediately due to a critical level of impact on the product. and removed impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. labels Jul 14, 2021
@sophiec20 sophiec20 added Feature:Reporting:CSV Reporting issues pertaining to CSV file export and removed (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead (Deprecated) Team:Reporting Services labels Aug 21, 2024
@botelastic botelastic bot added the needs-team Issues missing a team label label Aug 21, 2024