[Reporting] Abstract reports storage #106821
Conversation
ee1bccf to 729ba1f
Pinging @elastic/kibana-app-services (Team:AppServices)
_write(chunk: Buffer | string, encoding: string, callback: Callback) {
Let's prefix unused variables with _ to stop warnings in the editor:

- _write(chunk: Buffer | string, encoding: string, callback: Callback) {
+ _write(chunk: Buffer | string, _encoding: string, callback: Callback) {
async toString(): Promise<string> {
  let result = '';

  for await (const chunk of this) {
unnecessary await?
constructor(
  private reporting: ReportingCore,
  private config: ReportingConfigType,
  logger: LevelLogger
) {
  this.logger = logger.clone(['runTask']);
  this.getContentStream = getContentStreamFactory(reporting);
getContentStream always seems to be called only once per file, so I am not seeing a benefit from having it returned from a factory function. We could remove the factory wrapper by having getContentStream take reporting as the first argument.

There are a lot of factory functions in Reporting, but they are a leftover of the old platform. The existing code that's like that should all be cleaned up in this PR: #106940
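To illustrate the suggestion, here is a minimal sketch of what removing the factory wrapper could look like. The `ReportingCore` interface and `ContentStream` class below are simplified stand-ins for the real Kibana types, and `getEsClient` is a hypothetical accessor used only for this example:

```typescript
// Simplified stand-in for the real ContentStream (not the Kibana implementation).
class ContentStream {
  constructor(public client: unknown, public id: string) {}
}

// Simplified stand-in for ReportingCore; getEsClient is assumed for illustration.
interface ReportingCore {
  getEsClient(): unknown;
}

// Before: a factory wrapper that closes over `reporting`.
function getContentStreamFactory(reporting: ReportingCore) {
  return (id: string) => new ContentStream(reporting.getEsClient(), id);
}

// After: a plain function that takes `reporting` as the first argument,
// as suggested in the comment above.
function getContentStream(reporting: ReportingCore, id: string): ContentStream {
  return new ContentStream(reporting.getEsClient(), id);
}

const core: ReportingCore = { getEsClient: () => ({ tag: 'es' }) };
const stream = getContentStream(core, 'report-1');
```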
This is looking great! I'm excited to get this in.
I left a few comments, but the main concern is that, when a report document is queried from Elasticsearch, we need to make sure the report was created by the authenticated user.
constant_score: {
  filter: {
    bool: {
      must: [{ term: { _id: id } }],
We need to add the filter that was in jobsQueries.getContent, which matched the user of the document against the authenticated user. This would preserve the requirement that users cannot download reports created by other users.
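As a rough sketch of the kind of filter being described, the query could carry a second term clause alongside the _id match. The created_by field name and the shape of the query builder below are assumptions for illustration, not the actual jobsQueries.getContent code:

```typescript
// Hypothetical query builder: restricts the document lookup to reports
// created by the authenticated user. Field name `created_by` is assumed.
function getDocumentQuery(id: string, username: string) {
  return {
    constant_score: {
      filter: {
        bool: {
          // Both clauses must match: the document id AND the owning user.
          must: [{ term: { _id: id } }, { term: { created_by: username } }],
        },
      },
    },
  };
}

const query = getDocumentQuery('abc123', 'elastic');
```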
I added that at first, but then decided against it because I don't think it belongs there. The stream itself should be as simple as possible and responsible only for reading and writing the data. Apart from that, we already have this check in the store and perform it here before reading from the stream.
++ After I wrote my previous comment, I realized that .get is checking the username for us.
index: report._index!,
if_primary_term: report._primary_term,
if_seq_no: report._seq_no,
});
I had in mind that the abstracted file storage mechanism would be used inside the task runner functions (aka execute_job functions), so they can take control of streaming their output content to storage as it becomes available.

Internally, the file storage mechanism could chunk the data into multiple documents as it becomes available, which lends itself toward solving #18322.

It might be good to create a stream variable in _performJob and pass it to the task runner functions on line 247. We can have the task runners return the stream back because the _seq_no and _primary_term need to be updated. Then we could remove the parts from execute_report.ts that handle the entire output content. Something like that would be the only way to allow the csv.maxSizeBytes setting to be unlimited.
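A minimal sketch of that control flow, assuming hypothetical names (runTaskStreaming, generateChunks) rather than the actual _performJob code: the caller owns the stream, and the task runner writes output incrementally instead of returning the whole payload, so memory usage stays bounded regardless of report size:

```typescript
import { Writable } from 'stream';

// Hypothetical task-runner flow: the caller supplies the destination stream,
// and output chunks are written as they are produced rather than buffered.
// `generateChunks` stands in for an export type producing data incrementally.
async function runTaskStreaming(
  stream: Writable,
  generateChunks: () => AsyncIterable<string>
): Promise<void> {
  for await (const chunk of generateChunks()) {
    // Respect backpressure: wait for 'drain' when the internal buffer is full.
    if (!stream.write(chunk)) {
      await new Promise<void>((resolve) => stream.once('drain', resolve));
    }
  }
  // Finish the writable side and wait for it to flush.
  await new Promise<void>((resolve) => stream.end(resolve));
}
```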
I agree we should be writing inside the execute job functions as we are getting data, but IMO it's OK to do that in a follow-up PR, keeping this PR as small as possible and just abstracting our storage without any other changes.
I have addressed that in the latest commit. The task runner functions no longer return content but write it to the stream.
Let's also get @ppisljar to review
LGTM after Tim's concerns are addressed.
LGTM
This tackles the issue very well. GREAT WORK!
💚 Build Succeeded
* Add duplex content stream
* Add content stream factory
* Move report contents gathering and writing to the content stream
* Update jobs executors to use content stream instead of returning report contents

Conflicts: x-pack/plugins/reporting/server/export_types/printable_pdf/execute_job/index.test.ts
if_seq_no?: number;
}

export class ContentStream extends Duplex {
What is the expected behaviour of writing to a stream? Will we only ever allow writing to non-existing document IDs?
Summary
This pull request encapsulates the reporting storage logic behind a Node.js stream.
Resolves #98726.
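To make the idea concrete, here is a toy sketch of a Duplex-based content stream. The in-memory buffer below is an assumption purely for illustration; the actual ContentStream in this PR reads from and writes to Elasticsearch documents:

```typescript
import { Duplex } from 'stream';

// Toy stand-in: a Duplex stream that hides where report contents live.
// This version keeps everything in memory (an assumption for the example);
// the real implementation is backed by Elasticsearch.
class InMemoryContentStream extends Duplex {
  private buffer = '';
  private pushed = false;

  _write(chunk: Buffer | string, _encoding: string, callback: () => void) {
    // Accumulate written chunks; real storage would persist them instead.
    this.buffer += chunk.toString();
    callback();
  }

  _read() {
    // Emit everything accumulated so far, then signal end-of-stream.
    if (!this.pushed) {
      this.push(this.buffer);
      this.push(null);
      this.pushed = true;
    }
  }
}

// Consumers can read the whole report back with async iteration.
async function readAll(stream: AsyncIterable<Buffer | string>): Promise<string> {
  let result = '';
  for await (const chunk of stream) {
    result += chunk;
  }
  return result;
}
```

Because the abstraction is a standard Node.js stream, callers can pipe into it, write incrementally, or iterate its contents without knowing anything about the storage layer.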
Checklist
For maintainers