Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add map_windows support to Go SDK #24307

Merged
merged 6 commits into from
Nov 30, 2022
Merged

Conversation

camphillips22
Copy link
Contributor

@camphillips22 camphillips22 commented Nov 22, 2022

Adds support to the Go SDK for the map_windows urn.

The existing type model does not allow IntervalWindow as a FullValue and the existing coders will throw away the window values when decoding the KV type here in elideSingleElmFV. This change updates coder construction to allow an IntervalWindow as a value coder.

Addresses #23106.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

@github-actions github-actions bot added the go label Nov 22, 2022
@camphillips22
Copy link
Contributor Author

R: @lostluck

@github-actions
Copy link
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@camphillips22
Copy link
Contributor Author

A note about testing: I was unable to write a batch pipeline that resulted in a window mapping operation.

Here is a link to the tests I wrote which did not result in a map_windows operation. It also contains the changes I made to the streaming wordcap example to make it happen on dataflow. I have not written a pipeline yet that works on Flink and uses map windows, but I wanted to go ahead get this change out for review.

@lostluck
Copy link
Contributor

This is a very good start!

A note about testing: I was unable to write a batch pipeline that resulted in a window mapping operation.

Here is a link to the tests I wrote which did not result in a map_windows operation. It also contains the changes I made to the streaming wordcap example to make it happen on dataflow. I have not written a pipeline yet that works on Flink and uses map windows, but I wanted to go ahead get this change out for review.

I looked into the Dataflow's graph analyzer and it doesn't use map windows in batch, so that's not amazing for batch testing. We should at least manually validate in streaming mode. I can do this if you like.

We should still have unit tests for the MapWindows Node in the exec package. We should test to the spec (with the KV<nonce, Window> inputs and outputs), per the proto. eg.That we preserve the "nonce", and similar. https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L285

I've got a few more specific comments in a moment or two.

@codecov
Copy link

codecov bot commented Nov 22, 2022

Codecov Report

Merging #24307 (38931b6) into master (1335b98) will decrease coverage by 0.00%.
The diff coverage is 58.82%.

@@            Coverage Diff             @@
##           master   #24307      +/-   ##
==========================================
- Coverage   73.37%   73.37%   -0.01%     
==========================================
  Files         718      718              
  Lines       97101    97164      +63     
==========================================
+ Hits        71252    71293      +41     
- Misses      24505    24523      +18     
- Partials     1344     1348       +4     
Flag Coverage Δ
go 51.58% <58.82%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdks/go/pkg/beam/core/graph/coder/coder.go 84.68% <0.00%> (-0.82%) ⬇️
sdks/go/pkg/beam/core/runtime/exec/translate.go 17.19% <0.00%> (-0.27%) ⬇️
sdks/go/pkg/beam/core/runtime/graphx/translate.go 38.43% <ø> (ø)
sdks/go/pkg/beam/core/runtime/exec/window.go 55.00% <65.51%> (+5.98%) ⬆️
sdks/go/pkg/beam/core/runtime/exec/coder.go 58.03% <70.00%> (+0.28%) ⬆️
sdks/go/pkg/beam/core/runtime/graphx/coder.go 53.36% <100.00%> (+1.13%) ⬆️
sdks/go/pkg/beam/core/runtime/graphx/dataflow.go 52.43% <100.00%> (+0.72%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@lostluck lostluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And just a heads up, in order to unblock RunInference, I've got a change that makes slices use the beam Iterable coder by default, for better cross language support. It affects some overlapping code, so be aware of merge conflicts.

@@ -177,6 +177,7 @@ const (
Iterable Kind = "I"
KV Kind = "KV"
LP Kind = "LP" // Explicitly length prefixed, likely at the runner's direction.
IWCValue Kind = "IWCvalue"

Window Kind = "window" // A debug wrapper around a window coder.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It bother's me we can't re-use this, but I agree we shouldn't mix these up just now. It's an internal implementation detail at least, so it can be cleaned up at some future point.

sdks/go/pkg/beam/core/graph/coder/coder.go Outdated Show resolved Hide resolved
sdks/go/pkg/beam/core/runtime/graphx/coder.go Outdated Show resolved Hide resolved
sdks/go/pkg/beam/core/runtime/exec/window.go Show resolved Hide resolved
@camphillips22
Copy link
Contributor Author

@lostluck Anything left to do here other than resolve conflicts?

@lostluck
Copy link
Contributor

I think just resolving the conflicts? I'll take another look after i finish lunch.

Adds support to the Go SDK for the [map_windows][1] urn.

The existing type model does not allow `IntervalWindow` as
a FullValue and the existing coders will throw away the window
values when decoding the KV type [here][2] in `elideSingleElmFV`.
This change updates coder construction to allow an IntervalWindow
as a value coder.

[1]: https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/beam_runner_api.proto#L296
[2]: https://github.com/camphillips22/beam/blob/8be4cdcaf65a1e53e3041ac3354e2e99c845e915/sdks/go/pkg/beam/core/runtime/exec/coder.go#L580-L580
@lostluck
Copy link
Contributor

Run Go PreCommit

@lostluck
Copy link
Contributor

PRecommit failure was a flake, so just rerunning to be sure. Otherwise this LGTM.

@lostluck
Copy link
Contributor

Run Go PostCommit

@lostluck
Copy link
Contributor

Run Go Flink ValidatesRunner

@lostluck lostluck merged commit 3951a7d into apache:master Nov 30, 2022
@lostluck
Copy link
Contributor

Thank you very much for working through this! Hopefully now it's possible for someone to cobble together a PeriodicImpulse or similar and have functioning Slowly Changing Side Inputs.

@camphillips22 camphillips22 deleted the cam/mapwindows branch November 30, 2022 14:15
@camphillips22
Copy link
Contributor Author

Thanks for helping me get this merged! It will help tremendously with our pipelines.

@riteshghorse riteshghorse mentioned this pull request Dec 1, 2022
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants