Tour of Beam markdown touchups #32536
Assigning reviewers. If you would like to opt out of this review, comment R: @lostluck added as fallback since no labels match configuration Available commands:
The PR bot will only process comments in the main thread (not review comments).
assign to next reviewer
Sorry about the delay. The tail end of Google internal release tasks made me miss this earlier this week. Looking now.
Approved! Nothing blocking so I'll merge, but we can always discuss and add another PR later.
Sorry again for the delay.
@@ -22,7 +22,7 @@ The Beam SDKs provide several abstractions that simplify the mechanics of large-

 → `PCollection`: A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from in-memory data within your driver program. From there, PCollections are the inputs and outputs for each step in your pipeline.

-→ `PTransform`: A PTransform represents a data processing operation, or a step, in your pipeline. Every PTransform takes one or more PCollection objects as the input, performs a processing function that you provide on the elements of that PCollection, and then produces zero or more output PCollection objects.
+→ `PTransform`: A PTransform represents a data processing operation, or a step, in your pipeline. Every PTransform takes zero or more PCollection objects as the input, performs a processing function that you provide on the elements of that PCollection, and then produces zero or more output PCollection objects.
I'm on the fence between "yes, this is accurate and technically correct" and "no, this doesn't help users learn the model, as it's easier to automatically follow best practices by treating the 0 input cases as special/exceptional".
But I don't feel strongly enough for the latter to force further rewrites.
@@ -61,9 +61,9 @@ In java, you need to set runner to `args` when you start the program.
 {{end}}

 {{if (eq .Sdk "python")}}
-In the Python SDK , the default is runner **DirectRunner**.
+In the Python SDK , the **DirectRunner** is the default runner and is used if no runner is specified.
No action required.
Obligatory complaint that we never explain anywhere that the Direct Runner isn't a monolith and has very different behaviors between SDKs. I can't finish Prism soon enough...
That relates a little bit to your other comment. Technically true that the DirectRunners are not a monolith but I imagine most people are single SDK users so the different SDK DirectRunner behaviors are unlikely to bite them in practice (but I may be overgeneralizing my experience :)
Just some formatting and clarification changes to a few Tour of Beam pages. (I've been referring many people to Tour of Beam!)
- Adjusted the `PTransform` definition, which used to say that PTransforms took one or more PCollections when they can really take zero or more.
- Small capitalization / language clarifications