-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Lineage metrics for BigtableIO #32068
Conversation
@@ -71,6 +71,7 @@ | |||
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO.BigtableSource; | |||
import org.apache.beam.sdk.io.range.ByteKeyRange; | |||
import org.apache.beam.sdk.metrics.Distribution; | |||
import org.apache.beam.sdk.metrics.Lineage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the challenges is that BigtableIO has a complicated fallback logic to resolve projectId / instanceId. From transform setting / Bigtable setting / pipeline options, and there is BigtableConfig/BigtableWriteOptions/BigtableDataSettings. This was partly due to client migration that supported both old and new config class.
For this reason this PR chose to emit metrics at the low level (BigtableServiceImpl) where the worker actually talking to the server, and where projectId/instanceId has been resolved definitely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
@@ -23,11 +23,8 @@ | |||
public class Lineage { | |||
|
|||
public static final String LINEAGE_NAMESPACE = "lineage"; | |||
public static final String SOURCE_METRIC_NAME = "sources"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is breaking change but Lineage is just introduced for one release and only used internally. I would suggest change the constant String to a enum so we can have a query method guarantees the metric name is among defined ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG. Thanks.
checkLineageSourceMetric(r, tableId); | ||
} | ||
|
||
private void checkLineageSourceMetric(PipelineResult r, String tableId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one cannot check Lineage metrics in unit test in BigtableIO as what BigQueryIO does is because the metrics are emitted in ServiceImpl, where unit tests used a fake Reader + ServiceImpl which does not emit Lineage (and it does not make sense add Lineage metrics there as well, because it is completely differerent code path from the real Reader / ServiceImpl)
testE2EBigtableSegmentRead failed Lineage metrics not exist, however the test passed locally The reason is that there is actually an OOM seen in log:
consequently, the pipeline actually failed, but the test does not assert it! This test actually has always been failing |
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
R: @robertwb |
@@ -23,11 +23,8 @@ | |||
public class Lineage { | |||
|
|||
public static final String LINEAGE_NAMESPACE = "lineage"; | |||
public static final String SOURCE_METRIC_NAME = "sources"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG. Thanks.
@@ -218,6 +218,10 @@ task integrationTest(type: Test, dependsOn: processTestResources) { | |||
|
|||
useJUnit { | |||
excludeCategories "org.apache.beam.sdk.testing.UsesKms" | |||
filter { | |||
// https://github.com/apache/beam/issues/32071 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting.....
Thanks for catching this.
...a/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableIO.java
Outdated
Show resolved
Hide resolved
@@ -71,6 +71,7 @@ | |||
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO.BigtableSource; | |||
import org.apache.beam.sdk.io.range.ByteKeyRange; | |||
import org.apache.beam.sdk.metrics.Distribution; | |||
import org.apache.beam.sdk.metrics.Lineage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
@@ -38,4 +38,35 @@ public static StringSet getSources() { | |||
public static StringSet getSinks() { | |||
return SINKS; | |||
} | |||
|
|||
/** Query {@link StringSet} metrics from {@link MetricResults}. */ | |||
public static Set<String> query(MetricResults results, Type type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Please add a meaningful description for your change here
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.