
[Reporting] Abstract reports storage #106821

Merged: 7 commits into elastic:master on Aug 4, 2021

Conversation

@dokmic (Contributor) commented Jul 27, 2021

Summary

This pull request encapsulates the reporting storage logic behind a Node.js stream.

Resolves #98726.
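The idea can be sketched with a minimal, self-contained Duplex stream that buffers chunks in memory. This is illustrative only: the real ContentStream in this PR persists chunks to Elasticsearch, and the class and method names below (`InMemoryContentStream`, `collect`) are stand-ins, not the actual Kibana code.

```typescript
import { Duplex } from 'stream';

// Minimal sketch: a Duplex stream whose writable side buffers report chunks
// and whose readable side replays them. The real implementation reads from
// and writes to an Elasticsearch document instead of an in-memory array.
class InMemoryContentStream extends Duplex {
  private chunks: Buffer[] = [];

  _write(chunk: Buffer | string, _encoding: string, callback: (error?: Error | null) => void) {
    // Normalize to a Buffer so both string and binary payloads are supported.
    this.chunks.push(typeof chunk === 'string' ? Buffer.from(chunk) : chunk);
    callback();
  }

  _read() {
    // Flush everything written so far, then signal end-of-stream.
    for (const chunk of this.chunks) {
      this.push(chunk);
    }
    this.chunks = [];
    this.push(null);
  }

  // Convenience helper mirroring the PR's approach of collecting the whole
  // content via async iteration of the readable side.
  async collect(): Promise<string> {
    let result = '';
    for await (const chunk of this) {
      result += chunk.toString();
    }
    return result;
  }
}
```

Consumers can then treat report storage like any other Node.js stream: pipe into it while generating a report, and iterate it when serving a download.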


@dokmic dokmic added review (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead v8.0.0 Team:AppServices release_note:skip Skip the PR/issue when compiling release notes v7.15.0 labels Jul 27, 2021
@dokmic dokmic force-pushed the feature/98726 branch 4 times, most recently from ee1bccf to 729ba1f on July 28, 2021 20:34
@dokmic dokmic marked this pull request as ready for review July 29, 2021 07:22
@elasticmachine (Contributor):

Pinging @elastic/kibana-app-services (Team:AppServices)

@dokmic dokmic requested a review from tsullivan July 29, 2021 08:08
@tsullivan tsullivan requested review from a team July 29, 2021 16:58
}
}

_write(chunk: Buffer | string, encoding: string, callback: Callback) {
Review comment (Member):

Let's prefix unused variables with _ to stop warnings in the editor:

Suggested change:
-  _write(chunk: Buffer | string, encoding: string, callback: Callback) {
+  _write(chunk: Buffer | string, _encoding: string, callback: Callback) {

async toString(): Promise<string> {
let result = '';

for await (const chunk of this) {
Review comment (Member):

unnecessary await?


constructor(
private reporting: ReportingCore,
private config: ReportingConfigType,
logger: LevelLogger
) {
this.logger = logger.clone(['runTask']);
this.getContentStream = getContentStreamFactory(reporting);
@tsullivan (Member) commented Jul 29, 2021:

getContentStream always seems to be called only once per file, so I am not seeing a benefit from having it returned from a factory function.

We could remove the factory wrapper by having getContentStream take reporting as the first argument.

There are a lot of factory functions in Reporting, but it is a leftover of the old platform. The existing code that's like that should all be cleaned up in this PR: #106940
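The suggested refactor can be sketched as follows, using simplified stand-in types rather than Kibana's real interfaces: `getContentStream` takes `reporting` as its first argument instead of being produced by `getContentStreamFactory(reporting)`.

```typescript
// Stand-in types for illustration; not the actual Kibana definitions.
interface EsClient {
  get(params: { id: string; index: string }): Promise<unknown>;
}

interface ReportingCore {
  getEsClient(): Promise<EsClient>;
}

class ContentStream {
  constructor(readonly client: EsClient, readonly document: { id: string; index: string }) {}
}

// The dependency is passed directly; no factory indirection needed when the
// function is only ever called once per file.
async function getContentStream(
  reporting: ReportingCore,
  document: { id: string; index: string }
): Promise<ContentStream> {
  const client = await reporting.getEsClient();
  return new ContentStream(client, document);
}
```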

@tsullivan (Member) left a review:

This is looking great! I'm excited to get this in.

I left a few comments, but the main concern is that we need to make sure that when a report document is queried from Elasticsearch, it was created by the authenticated user.

constant_score: {
filter: {
bool: {
must: [{ term: { _id: id } }],
Review comment (Member):

We need to add the filter that was in jobsQueries.getContent, which was to match the user of the document with the authenticated user.

This would preserve the requirement that users can not download reports created by other users.
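A hedged sketch of the filter being described: constrain the lookup by both the document id and the report's creator. The `created_by` field name here is an assumption for illustration, not necessarily the exact field used by jobsQueries.getContent.

```typescript
// Build the content query with an extra term on the creating user, so a user
// cannot fetch another user's report. Field name `created_by` is assumed.
const buildContentQuery = (id: string, username: string) => ({
  constant_score: {
    filter: {
      bool: {
        must: [{ term: { _id: id } }, { term: { created_by: username } }],
      },
    },
  },
});
```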

@dokmic (Contributor, author):

I added that at first but then decided against it, because I don't think it belongs there. The stream itself should be as simple as possible and responsible only for reading and writing the data. Apart from that, we already have this check in the store and perform it here before reading from the stream.

Review comment (Member):

++ After I wrote my previous comment, I realized that .get is checking the username for us.

index: report._index!,
if_primary_term: report._primary_term,
if_seq_no: report._seq_no,
});
@tsullivan (Member) commented Jul 29, 2021:

I had in mind that the abstracted file storage mechanism would be used inside the task runner functions (aka execute_job functions), so they can take control of streaming their output content to storage as it becomes available.

Internally, the file storage mechanism could chunk up the data into multiple documents as it is available, which lends itself towards solving #18322

It might be good to create a stream variable in _performJob, and pass it to the task runner functions on line 247. We can have the task runners return the stream back again because the _seq_no and _primary_term need to be updated. Then we could remove the parts from execute_report.ts that handle the entire output content. Something like that would be the only way to allow the csv.maxSizeBytes setting to be unlimited.
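The proposed flow can be sketched like this; the names (`performJob`, `TaskRunner`) are hypothetical and the real code would also carry the `_seq_no`/`_primary_term` bookkeeping mentioned above.

```typescript
import { Writable } from 'stream';

// Sketch of the proposal: the job wrapper opens a content stream and passes
// it to the task runner, which writes output as it becomes available instead
// of returning the entire payload at once.
type TaskRunner = (stream: Writable) => Promise<void>;

async function performJob(runTask: TaskRunner, stream: Writable): Promise<void> {
  // The runner streams its own output; we only wait for the final flush.
  await runTask(stream);
  await new Promise<void>((resolve, reject) => {
    stream.on('error', reject);
    stream.end(resolve);
  });
}
```

Because output never accumulates in memory as a single string, a setting like csv.maxSizeBytes could in principle become unlimited, as the comment above notes.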

Review comment (Member):

I agree we should be writing inside execute_job as we are getting data, but IMO it's OK to do that in a follow-up PR, keeping this PR as small as possible and just abstracting our storage without any other changes.

@dokmic (Contributor, author):

I have addressed that in the latest commit. The task runner functions no longer return content but write it to the stream.

@tsullivan tsullivan requested a review from ppisljar July 29, 2021 22:19
@tsullivan (Member):

Let's also get @ppisljar to review

@ppisljar (Member) left a review:

LGTM after Tim's concerns are addressed.

@tsullivan (Member) left a review:

LGTM

This tackles the issue very well. GREAT WORK!

@kibanamachine (Contributor):

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

  • 💔 Build #142817 failed dfdae11a8389fff5dec2d8ed7731906aa4344c63
  • 💚 Build #142084 succeeded 431a2e3748c9c57e1936e3a3c1e55e06d67d8d5c
  • 💚 Build #141487 succeeded 729ba1f17eb7ae4159130925c241c3291d20750d
  • 💔 Build #141459 failed ee1bccfa847c57e7e24717057f757df0b024a68c
  • 💔 Build #141113 failed eecc01d1b50524adf63ba97b58ed70b5e074d733

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@dokmic dokmic merged commit 1e8bc92 into elastic:master Aug 4, 2021
@dokmic dokmic deleted the feature/98726 branch August 4, 2021 17:52
dokmic added a commit that referenced this pull request Aug 4, 2021
* Add duplex content stream
* Add content stream factory
* Move report contents gathering and writing to the content stream
* Update jobs executors to use content stream instead of returning report contents
# Conflicts:
#	x-pack/plugins/reporting/server/export_types/printable_pdf/execute_job/index.test.ts
streamich pushed a commit to vadimkibana/kibana that referenced this pull request Aug 8, 2021
* Add duplex content stream
* Add content stream factory
* Move report contents gathering and writing to the content stream
* Update jobs executors to use content stream instead of returning report contents
if_seq_no?: number;
}

export class ContentStream extends Duplex {
Review comment (Contributor):

What is the expected behaviour of writing to a stream? Will we only ever allow writing to non-existing document IDs?


Successfully merging this pull request may close these issues.

abstract reporting file storage as duplex stream
6 participants