Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing params to Python Bigtable MutationsBatcher #31791

Merged
merged 1 commit into from
Jul 10, 2024

Conversation

nguymin4
Copy link
Contributor

@nguymin4 nguymin4 commented Jul 7, 2024

Issue
In https://github.com/apache/beam/pull/27085/files, we replaced _MutationsBatcher with MutationsBatcher from google.cloud.bigtable.batcher.

However we didn't set flush_count and max_row_bytes of MutationsBatcher thus both constants FLUSH_COUNT and MAX_ROW_BYTES are currently not in used.

=> This PR fixed that issue

Implication
Although, in google.cloud.bigtable library, MutationsBatcher also set default value for flush-count but there is currently a discrepancy between default value in documentation 1000 and actual default value 100 See more

For small/medium size use cases, there shouldn't be any problems. But I have 1 use case in which 1 million rows are written to Bigtable and flush_count: 100 is too low which creates a lot of requests to Bigtable server.

One potential improvement is to allow both flush_count and max_row_bytes to be configurable. But this is open-to further discussion and out of scope of this PR.


GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

Copy link
Contributor

github-actions bot commented Jul 7, 2024

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@nguymin4
Copy link
Contributor Author

nguymin4 commented Jul 7, 2024

assign set of reviewers

Copy link
Contributor

github-actions bot commented Jul 7, 2024

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @damccorm for label python.
R: @shunping for label io.
R: @igorbernstein2 for label bigtable.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, thanks for the contribution! Sorry for the slow review, fortunately this should still make it into the next release (2.58.0) :)

@damccorm damccorm merged commit 018bcdf into apache:master Jul 10, 2024
87 of 88 checks passed
@nguymin4 nguymin4 deleted the add-python-bigtable-flush-count branch July 10, 2024 10:17
@nguymin4
Copy link
Contributor Author

Thank you, that's good to know.

acrites pushed a commit to acrites/beam that referenced this pull request Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants