Allow GlobalWindows to be encoded as IntervalWindows #32569
R: @Abacn
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment
I'll note that there's probably a better fix here: introspect the function and figure out ahead of time whether it can return globally windowed values, so we can get our coders right. That's a larger, more ambiguous problem, though, and I'm hoping we can just get BQ IO (and other IOs that use this pattern, like Spanner) right for now.
@@ -822,15 +824,19 @@ def _from_normal_time(self, value):

  def encode_to_stream(self, value, out, nested):
    # type: (IntervalWindow, create_OutputStream, bool) -> None
    typed_value = value
    if not TYPE_CHECKING:
We should not be checking this at every element encoding; instead lift the import.
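One way to read this suggestion, as a minimal sketch: resolve the window classes once, outside the per-element path, so each encode call only pays for an `isinstance` check. Every name below (`SketchCoder`, `_window_classes`, `RESOLVE_CALLS`, and the two stand-in window classes) is hypothetical and is not Beam's actual coder code; in the real module the resolver body would presumably be the deferred import.

```python
import functools

RESOLVE_CALLS = 0  # instrumentation for this sketch only


class IntervalWindow:
    """Hypothetical stand-in for a window class defined in another module."""
    def __init__(self, start, end):
        self.start = start
        self.end = end


class GlobalWindow:
    """Hypothetical stand-in; the bounds here are placeholders."""
    def __init__(self):
        self.start = float("-inf")
        self.end = float("inf")


@functools.lru_cache(maxsize=None)
def _window_classes():
    """Resolve the window classes once; later calls hit the cache.

    Assumption: in real code this body would hold the deferred import.
    """
    global RESOLVE_CALLS
    RESOLVE_CALLS += 1
    return IntervalWindow, GlobalWindow


class SketchCoder:
    def __init__(self):
        # Lifted out of encode(): resolved per coder, not per element.
        self._interval_cls, self._global_cls = _window_classes()

    def encode(self, window):
        # The hot path only performs an isinstance check.
        if isinstance(window, self._global_cls):
            window = self._interval_cls(window.start, window.end)
        return (window.start, window.end)


coder = SketchCoder()
encoded = [coder.encode(GlobalWindow()) for _ in range(1000)]
```

However many elements are encoded, the resolver runs a single time, which is the cost profile the comment asks for.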
    pass

  def finish_bundle(self):
    yield beam.transforms.window.GlobalWindows.windowed_value('test')
One should only emit to windows that were part of the input.
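As a rough illustration of that rule, the sketch below remembers which windows appeared during processing and, at the end of the bundle, emits only into those windows. It is plain Python with no Beam dependency: `FlushPerWindowFn` is a hypothetical stand-in for a `DoFn`, and windows are modeled as simple `(start, end)` tuples. In a real `DoFn` the window would presumably arrive through a parameter such as `beam.DoFn.WindowParam`.

```python
class FlushPerWindowFn:
    """Plain-Python stand-in for a DoFn; hypothetical, not a Beam class."""

    def start_bundle(self):
        # Maps each window seen in the input to the elements it contained.
        self._buffer = {}

    def process(self, element, window):
        # Record the element under the window it actually arrived in.
        self._buffer.setdefault(window, []).append(element)
        return []

    def finish_bundle(self):
        # Emit one (window, count) pair per observed window, instead of
        # emitting into a window the input never contained.
        for window, elements in self._buffer.items():
            yield window, len(elements)
        self._buffer = {}


fn = FlushPerWindowFn()
fn.start_bundle()
fn.process("a", (0, 10))
fn.process("b", (0, 10))
fn.process("c", (10, 20))
outputs = list(fn.finish_bundle())
```

With this shape, the coder for the input windows is always sufficient for the outputs, because no new window type is introduced at bundle end.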
Superseded by #32583
Right now, the BigQuery write connector windows its failure outputs back into the global window, but it does so inside the transform rather than in an explicit windowing step. This fails under sampling, which expects each value to be in the window that was passed in. The same pattern exists in other places and is supported in other SDKs, but it doesn't work in Python because we aren't able to encode the window correctly. This change fixes the problem by making the encoding a bit more permissive.
Without my change, this fails when performing sampling (example on Dataflow).
After my change, this succeeds. I also added a representative test.
Fixes #25014
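The idea of a more permissive encoding can be sketched in isolation. The toy encoder below accepts a global window by coercing it to an interval over the same bounds before writing bytes. This is an illustration only, not the code in this PR: the classes, the `MIN_MILLIS`/`MAX_MILLIS` bounds, and the `(end, span)` wire layout are all assumptions made for the example.

```python
import struct
from dataclasses import dataclass

# Hypothetical bounds in milliseconds; Beam's real limits differ.
MIN_MILLIS = -(2**53)
MAX_MILLIS = 2**53


@dataclass(frozen=True)
class IntervalWindow:
    start: int
    end: int


@dataclass(frozen=True)
class GlobalWindow:
    start: int = MIN_MILLIS
    end: int = MAX_MILLIS


def encode_window(window):
    """Encode a window as big-endian (end, span), accepting GlobalWindow."""
    if isinstance(window, GlobalWindow):
        # Permissive step: treat the global window as one large interval.
        window = IntervalWindow(window.start, window.end)
    if not isinstance(window, IntervalWindow):
        raise TypeError(f"cannot encode {type(window).__name__} as an interval")
    return struct.pack(">qq", window.end, window.end - window.start)


def decode_window(data):
    """Decode bytes produced by encode_window back into an IntervalWindow."""
    end, span = struct.unpack(">qq", data)
    return IntervalWindow(end - span, end)


round_tripped = decode_window(encode_window(GlobalWindow()))
```

One consequence visible in this sketch is that the round trip is lossy in type: a global window decodes as an interval with global bounds, not as a `GlobalWindow` object.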