BQ sink produces sample of successful inserts #875

Merged: 2 commits, Jul 13, 2020

pyalex (Collaborator) commented Jul 13, 2020

What this PR does / why we need it:

The output of the BQ batch insert is now a sample of all inserted rows.

Which issue(s) this PR fixes:

Currently, the BQ sink's successful-inserts output contains every row that was in the batch. In some use cases, such as batch ingestion, the sink can process up to 30k rows/sec, or about 5M rows per batch at the default flush frequency of 3 minutes (30k rows/sec × 180 s ≈ 5.4M rows). This output feeds the metrics, where the whole batch (with 5M rows) is grouped into a single list that must fit into the memory of one machine. This can lead to memory issues.

I suspect that this is overkill, since the metrics can easily be calculated from a sample of the inserted data instead of the whole batch. This PR changes the BQ successful-inserts output to emit only a sample of the inserted data.
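To illustrate why a sample keeps memory bounded, here is a minimal sketch of reservoir sampling in Python. It is not the code from this PR (the PR's actual sampling approach is not shown in this description), and the function name `reservoir_sample` and the sample size used below are assumptions made for illustration only. The point is that memory use depends on the sample size `k`, not on the number of rows in the batch.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    rows: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k rows chosen uniformly at random from a stream.

    Hypothetical helper for illustration; only k rows are held in memory,
    no matter how many rows the stream yields.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep the new row with probability k / (i + 1) by picking a
            # slot in [0, i] and replacing only if it falls inside the reservoir.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


# Usage: a generator stands in for a large batch of inserted rows.
batch = (f"row-{n}" for n in range(100_000))
sample = reservoir_sample(batch, k=1_000)
```

Metrics computed on `sample` are estimates of the metrics over the full batch, which is the trade-off the description above accepts in exchange for not materializing the whole batch on one machine.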

Does this PR introduce a user-facing change?:


feast-ci-bot (Collaborator) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pyalex, woop

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

woop (Member) commented Jul 13, 2020

/retest

woop (Member) commented Jul 13, 2020

/lgtm

@feast-ci-bot feast-ci-bot merged commit 2327b29 into feast-dev:master Jul 13, 2020
pyalex pushed a commit to pyalex/feast that referenced this pull request Jul 17, 2020
* feature row batch produces sample

* lint

Co-authored-by: Willem Pienaar <git@willem.co>
pyalex pushed a commit that referenced this pull request Jul 17, 2020
* feature row batch produces sample

* lint

Co-authored-by: Willem Pienaar <git@willem.co>