-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Yaml API: Day Zero tutorial notebook #27284
Conversation
R: @robertwb |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #27284 +/- ##
===========================================
+ Coverage 38.53% 72.15% +33.61%
===========================================
Files 698 691 -7
Lines 102361 101185 -1176
===========================================
+ Hits 39447 73009 +33562
+ Misses 61283 26563 -34720
+ Partials 1631 1613 -18
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice tutorial! and cool idea :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks great, thanks for writing this Bartosz!
@@ -0,0 +1,424 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a landing page (similar to https://beam.apache.org/get-started/try-apache-beam/) which jumps to this collab page?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be great! Is this something @robertwb can arrange?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be wrong, but I imagine it's just a matter of making a new page similar to that one and adding to the TOC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tech Writers can probably help answer this question too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me or @svetakvsundhar will work on this once this PR is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #27450
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM, can you use the new YAML tags that Jeff created? https://bugdashboard.corp.google.com/app/tree;dashboardId=551658
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Just add in the TODOs :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great Bartosz, thanks again
Hi @robertwb, could you take a look at this PR? Thanks! |
…ransform in favour of CombinePerKey with a callable argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Bartosz! This looks really good. I left a couple comments, but overall seems to provide a great starting point. I think there will need to be some extra blocks around setup (installing dependencies, beam, etc.), but I see you opened a FR for that.
…orial # Conflicts: # sdks/python/apache_beam/yaml/yaml_provider.py
Hi @Polber, thanks for the review. All the changes are applied now. |
Co-authored-by: Jeff Kinard <35542536+Polber@users.noreply.github.com>
Hi @Polber, do you think we can merge it now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @bzablocki We are currently trying to get some docs onto the Beam website which I think would be a good idea to get submitted first so that users have docs to reference after trying out the day zero notebook.
While playing around with this notebook again, I noticed a couple more minor things, but no rush.
"pipeline = '''\n", | ||
"pipeline:\n", | ||
" type: chain\n", | ||
" transforms:\n", | ||
" - type: ReadFromCsv\n", | ||
" config:\n", | ||
" path: data/people.csv\n", | ||
" - type: Filter\n", | ||
" config:\n", | ||
" language: python\n", | ||
" keep: \"age >= 18\"\n", | ||
" - type: LogForTesting\n", | ||
"'''\n", | ||
"save_to_file(pipeline, 'pipelines/pipeline-filter-01.yaml')\n", | ||
"! python -m apache_beam.yaml.main --pipeline_spec_file=pipelines/pipeline-filter-01.yaml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which case I would move the run command to its own cell.
Thanks for taking another look at this PR @Polber!
That makes a lot of sense, I agree it's better to release some docs first. Thanks for the update. I'll wait for the documentation and I'll update this PR with the relevant references once that is done. |
@bzablocki The docs have been pushed to the beam site, if you want to link them, I think we can get this PR merged |
Ping! Seems like this is ready to merge? |
Thanks for the ping, I'll submit an updated PR today. |
I updated the PR with documentation and a short explanation on why a pipeline with a transform from an expansion service (Filter-sql) logs output in a different format than a simple pipeline without an expansion service. |
As discussed in person, it feels a bit awkward to have every command be a combination of save file + execute command in Python. Could we set the interpreter to shell instead? |
@Polber I converted saving the file to the built-in Jupyter's '%%writefile'. PR is ready for the final review/merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working so hard on this!
I've created the first version of Getting Started with Yaml notebooks.
The target group are users who don't have a lot of experience with coding and/or beam; and want to quickly get started with writing pipelines in beam.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.