Move stagingLocation and tempLocation to GcsOptions #31

Closed

peihe
Contributor

@peihe peihe commented Mar 8, 2016

This will allow BigQueryOptions to extend GcsOptions, and BigQueryIO can access them through BigQueryOptions.
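To make the proposed relationship concrete, here is a minimal sketch of how an options interface that extends another exposes the parent's getters to IO code. Every type below is a hypothetical stand-in (the `Sketch` suffixes, `SimpleBigQueryOptions`, `exportPrefix`, and the `bq-export` directory name are all assumptions), not the SDK's actual interfaces, and the real options machinery is left out.

```java
// Hypothetical stand-ins for the interfaces named in this PR; the real
// PipelineOptions machinery (proxies, annotations) is deliberately omitted.
interface GcsOptionsSketch {
    String getStagingLocation();
    String getTempLocation();
}

// Extending the GCS options makes both locations visible to BigQuery code.
interface BigQueryOptionsSketch extends GcsOptionsSketch {
    String getProject();
}

// Minimal immutable implementation, for illustration only.
final class SimpleBigQueryOptions implements BigQueryOptionsSketch {
    private final String project;
    private final String stagingLocation;
    private final String tempLocation;

    SimpleBigQueryOptions(String project, String stagingLocation, String tempLocation) {
        this.project = project;
        this.stagingLocation = stagingLocation;
        this.tempLocation = tempLocation;
    }

    @Override public String getProject() { return project; }
    @Override public String getStagingLocation() { return stagingLocation; }
    @Override public String getTempLocation() { return tempLocation; }
}

final class BigQueryIoSketch {
    private BigQueryIoSketch() {}

    // Builds a GCS prefix for an export using only the BigQuery-facing type.
    // The "bq-export" directory name is an assumption made for this sketch.
    static String exportPrefix(BigQueryOptionsSketch options, String jobId) {
        String base = options.getTempLocation();
        String separator = base.endsWith("/") ? "" : "/";
        return base + separator + "bq-export/" + jobId;
    }
}
```

The point of the sketch is only that `exportPrefix` never needs to know about a runner-specific options type; it reads `getTempLocation()` through the BigQuery-facing interface.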

@peihe
Contributor Author

peihe commented Mar 8, 2016

R: @lukecwik @dhalperi

@davorbonaci
Member

Hm... Can you argue more why this is the right change?

GcsOptions should be about accessing GCS itself, not necessarily about job-wide settings like stagingLocation. Also, it is not clear that there is no converse case -- one where somebody would benefit from these options actually staying in DataflowPipelineOptions.

@peihe
Contributor Author

peihe commented Mar 9, 2016

The reason is that BigQueryIO needs a tempLocation option that is available across runners.
The issue is that the default value of tempLocation comes from stagingLocation.
That is why I pulled both of them into GcsOptions.

Another option is to pull both of them into a new PipelineOptions interface.
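The coupling described above, where tempLocation falls back to stagingLocation when unset, can be sketched as follows. This is an illustration under assumptions, not the SDK's default-value mechanism: the class `LocationDefaultsSketch` is hypothetical, and a `null` argument stands in for "option not set".

```java
import java.util.Objects;

// Hypothetical holder showing why the two options have to move together:
// the temp location's default is derived from the staging location.
final class LocationDefaultsSketch {
    private final String stagingLocation; // required
    private final String tempLocation;    // null means "not set by the user"

    LocationDefaultsSketch(String stagingLocation, String tempLocationOrNull) {
        this.stagingLocation = Objects.requireNonNull(stagingLocation, "stagingLocation");
        this.tempLocation = tempLocationOrNull;
    }

    String getStagingLocation() {
        return stagingLocation;
    }

    // Returns the explicit temp location if set, else the staging location.
    String getTempLocation() {
        return tempLocation != null ? tempLocation : stagingLocation;
    }
}
```

Because `getTempLocation()` reads `stagingLocation`, moving only one of the two getters to another interface would leave the default without its source.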

@davorbonaci
Member

Sounds like we might need a job-wide temporary location. I think this needs some discussion.

@dhalperi
Contributor

This one is tough -- the tempLocation here needs to be a GCS bucket, because BigQuery can only export/import to/from GCS. So this needs to be runner-independent (can't use DataflowPipelineOptions.getTempLocation()) but also GCS/GCP/BigQuery-specific.

Would be nice to be able to reuse the one from DataflowPipelineOptions if such exists.
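One way to read the two constraints above (the location must be on GCS, and a runner-provided value should be reused when one exists) is the resolution order sketched below. This is only an assumption about how such a lookup could work; `BigQueryTempLocationSketch`, `resolve`, and the convention that `null` means "unset" are hypothetical and not part of any SDK.

```java
// Hypothetical resolver: prefer a GCP-specific temp location, otherwise
// reuse the runner's temp location, but only if it points at GCS.
final class BigQueryTempLocationSketch {
    private static final String GCS_SCHEME = "gs://";

    private BigQueryTempLocationSketch() {}

    // True for paths of the form gs://<something>.
    static boolean isGcsPath(String path) {
        return path != null
            && path.startsWith(GCS_SCHEME)
            && path.length() > GCS_SCHEME.length();
    }

    // Either argument may be null, meaning the option was not set.
    static String resolve(String gcpTempLocation, String runnerTempLocation) {
        if (gcpTempLocation != null) {
            if (!isGcsPath(gcpTempLocation)) {
                throw new IllegalArgumentException(
                    "BigQuery temp location must be a gs:// path: " + gcpTempLocation);
            }
            return gcpTempLocation;
        }
        if (isGcsPath(runnerTempLocation)) {
            return runnerTempLocation; // reuse the runner's value when usable
        }
        throw new IllegalStateException(
            "No GCS temp location available for BigQuery import/export");
    }
}
```

Under this ordering a runner whose temp directory is not on GCS does not break BigQuery usage silently; the caller gets an error asking for a GCS location instead.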

@peihe peihe force-pushed the temp-option branch 2 times, most recently from f95b969 to a78a806 on March 14, 2016 21:55
@peihe peihe closed this Mar 14, 2016