Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow GlobalWindows to be encoded as IntervalWindows #32569

Closed
wants to merge 14 commits into from

Conversation

damccorm
Copy link
Contributor

@damccorm damccorm commented Sep 26, 2024

Right now, we window outputs from bigquery write connector (representing failures) back into the global window, but we do this as part of the transform instead of in an explicit windowing step. This fails because when we do sampling, it expects the value to be in the window which was passed in. This behavior exists in other places and is supported in other SDKs, but doesn't work in python because we aren't able to encode the window correctly. This fixes the problem and allows the encoding to be a bit more permissive.

Without my change, this fails when performing sampling (example on Dataflow)

After my change, this succeeds. I also added a representative test

Fixes #25014


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@damccorm damccorm changed the title [WIP]: Do not merge, trying to repro failure [WIP]: Allow GlobalWindows to be encoded as IntervalWindows Sep 26, 2024
@damccorm damccorm changed the title [WIP]: Allow GlobalWindows to be encoded as IntervalWindows Allow GlobalWindows to be encoded as IntervalWindows Sep 27, 2024
@damccorm damccorm marked this pull request as ready for review September 27, 2024 12:50
@damccorm
Copy link
Contributor Author

R: @Abacn

Copy link
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

@damccorm
Copy link
Contributor Author

I'll note that there's probably a better fix here to introspect the function and figure out if it can return globally windowed things ahead of time so we can get our coders right. That's also a larger more ambiguous problem, and I'm hoping we can just get BQ IO (and other IOs which use this pattern like spanner) right for now

@@ -822,15 +824,19 @@ def _from_normal_time(self, value):

def encode_to_stream(self, value, out, nested):
# type: (IntervalWindow, create_OutputStream, bool) -> None
typed_value = value
if not TYPE_CHECKING:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not be checking this at every element encoding; instead lift the import.

pass

def finish_bundle(self):
yield beam.transforms.window.GlobalWindows.windowed_value('test')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One should only emit to windows that were part of the input.

@damccorm
Copy link
Contributor Author

Superceded by #32583

@damccorm damccorm closed this Sep 27, 2024
@damccorm damccorm deleted the users/damccorm/bq_sampling branch October 28, 2024 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TypeError: Cannot convert GlobalWindow to apache_beam.utils.windowed_value._IntervalWindowBase
2 participants