[BEAM-6291] Generic BigQuery schema load tests metrics #7614

kkucharc · 2019-01-24T16:51:49Z

We decided to change the tables so that they have a metrics column holding the name of the collected metric.
This PR also includes two minor changes:

added an environment variable to disable load tests,
added a pipeline option to disable saving metrics.
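A rough sketch of how these two switches could work, for readers unfamiliar with the pattern. The variable name LOAD_TEST_ENABLED and the flag --skip_metrics are assumptions made for illustration, and plain argparse stands in for Beam's PipelineOptions so the snippet runs on its own; the names used in the PR may differ.

```python
import argparse
import os
import unittest

# Hypothetical name of the environment variable that gates the load tests.
LOAD_TEST_ENABLED_VAR = 'LOAD_TEST_ENABLED'


def load_tests_enabled(environ=None):
  """Returns True only if the gate variable is set to 'true' or '1'."""
  environ = os.environ if environ is None else environ
  return environ.get(LOAD_TEST_ENABLED_VAR, '').lower() in ('true', '1')


def parse_options(argv):
  """Parses the hypothetical --skip_metrics flag, ignoring unknown args."""
  parser = argparse.ArgumentParser()
  parser.add_argument('--skip_metrics', action='store_true',
                      help='Do not save collected metrics.')
  known, _ = parser.parse_known_args(argv)
  return known


@unittest.skipIf(not load_tests_enabled(), 'Load tests are disabled.')
class ExampleLoadTest(unittest.TestCase):

  def test_runs_only_when_enabled(self):
    options = parse_options([])
    # Metrics would be saved here unless --skip_metrics was passed.
    self.assertFalse(options.skip_metrics)
```

Skipping at class level keeps the test discoverable while making it a no-op on machines where the variable is not set.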

kkucharc · 2019-01-25T09:25:00Z

Hi @udim, I know that you reviewed similar code for Java recently, so I hope you can find a few minutes to look at this PR as well. I based my BQ changes on your code. The other two commits are small fixes.

@pabloem and @lgajowy, I would be grateful if you could check this too.

udim

+1 for simplification and more consistency!
Please see my comments.

udim · 2019-01-25T23:24:09Z

sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py

-  def _prepare_schema(self, schemas):
-    return [_get_schema_field(schema) for schema in schemas]
+  def _prepare_schema(self):
+    return [get_schema_field(row) for row in SCHEMA]


I think this could be simplified if you rename the key 'type' to 'field_type' in SCHEMA:

SCHEMA = [{'name': ID_LABEL, 'field_type': 'STRING', 'mode': 'REQUIRED'}, ...

then this line could be simplified to:

return [SchemaField(**row) for row in SCHEMA]
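A minimal sketch of the suggestion above. It works because the dict keys match the keyword arguments of google.cloud.bigquery.SchemaField (name, field_type, mode). To keep the snippet runnable without the BigQuery client installed, a namedtuple with the same field names stands in for the real class. The label values and the three rows after the first are assumptions made for illustration.

```python
from collections import namedtuple

# Stand-in for google.cloud.bigquery.SchemaField; only the keyword
# names (name, field_type, mode) matter for this sketch.
SchemaField = namedtuple('SchemaField', ['name', 'field_type', 'mode'])

# Label values are assumed for illustration.
ID_LABEL = 'test_id'
SUBMIT_TIMESTAMP_LABEL = 'timestamp'
METRICS_TYPE_LABEL = 'metric'
VALUE_LABEL = 'value'

# Every key is a valid SchemaField keyword argument, so each row can be
# unpacked straight into the constructor.
SCHEMA = [
    {'name': ID_LABEL, 'field_type': 'STRING', 'mode': 'REQUIRED'},
    {'name': SUBMIT_TIMESTAMP_LABEL, 'field_type': 'TIMESTAMP',
     'mode': 'REQUIRED'},
    {'name': METRICS_TYPE_LABEL, 'field_type': 'STRING', 'mode': 'REQUIRED'},
    {'name': VALUE_LABEL, 'field_type': 'FLOAT', 'mode': 'REQUIRED'},
]


def prepare_schema():
  """Builds one SchemaField per SCHEMA row via keyword unpacking."""
  return [SchemaField(**row) for row in SCHEMA]
```

With the key still named 'type', the unpacking would raise a TypeError for an unexpected keyword argument, which is why the rename is needed.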

Thanks a lot! That's a really clever solution. I'll change it this way.

udim · 2019-01-26T02:45:46Z

sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py

-  def match_and_save(self, result_list):
-    rows_tuple = tuple(self._match_inserts_by_schema(result_list))
-    self._insert_data(rows_tuple)
+  def match_and_save(self, results_lists):


Could you document what type results_list is?
It seems that each item in results_list is a list of dictionaries, and each dict looks like:
{'label': SUBMIT_TIMESTAMP_LABEL, 'value': time.time()}
but I'm not 100% sure.

I think this module would be easier to understand if each item in results_list were a single dict:

{
  ID_LABEL: uuid,
  SUBMIT_TIMESTAMP_LABEL: time.time(),
  METRICS_TYPE_LABEL: RUNTIME_METRIC,
  VALUE_LABEL: value,
}

Note that _bq_client.insert_rows() also accepts a list of dicts so there would be no need to convert the above to tuple form.
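A sketch of the one-dict-per-row shape proposed above, under stated assumptions: the label values, the make_row and save_rows helpers, and the RecordingClient class are hypothetical and exist only to make the example self-contained. The real google.cloud.bigquery.Client.insert_rows() likewise takes a table and a list of rows and returns a list of per-row errors, which is empty on success.

```python
import time

# Label values are assumed for illustration.
ID_LABEL = 'test_id'
SUBMIT_TIMESTAMP_LABEL = 'timestamp'
METRICS_TYPE_LABEL = 'metric'
VALUE_LABEL = 'value'
RUNTIME_METRIC = 'runtime'


def make_row(test_id, metric_type, value, timestamp=None):
  """Builds one result as a single dict keyed by column name."""
  return {
      ID_LABEL: test_id,
      SUBMIT_TIMESTAMP_LABEL: time.time() if timestamp is None else timestamp,
      METRICS_TYPE_LABEL: metric_type,
      VALUE_LABEL: value,
  }


class RecordingClient(object):
  """Hypothetical stand-in for the BigQuery client; it only records rows."""

  def __init__(self):
    self.inserted = []

  def insert_rows(self, table, rows):
    self.inserted.extend(rows)
    return []  # An empty list means no row-level errors.


def save_rows(client, table, rows):
  """Passes the dicts through unchanged; no tuple conversion is needed."""
  errors = client.insert_rows(table, rows)
  if errors:
    raise ValueError('Unable to save rows: %s' % errors)
```

Because each row already names its columns, the step that matched values to schema positions before building tuples is no longer needed.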

I agree with you 100%. I also had the impression that this naming was not very clear. I will refactor it according to your suggestions; hopefully that will simplify the module.

kkucharc · 2019-01-30T16:43:08Z

Thank you @udim for the review. It was really helpful. I've addressed your comments. Do you think it looks OK now?

kkucharc · 2019-01-31T11:25:17Z

Run Python PreCommit

udim

LGTM!
Sorry for the delay

kkucharc · 2019-02-05T09:54:43Z

No problem, thank you @udim!

kkucharc added 3 commits January 24, 2019 18:04

[BEAM-6291] Generic schema for BQ load tests

2833de5

[BEAM-6291] Added pipeline option to check if metrics are required

b5bacf0

[BEAM-6291] Common environment variable to disable load tests.

de395da

kkucharc force-pushed the BEAM-6291-generic-schema-for-BQ-load-tests branch from 08bb6e5 to de395da on January 24, 2019 17:23

udim requested changes Jan 26, 2019

[BEAM-6291] Refactored BQ and Metrics load tests utils. Added missing…

ff8d527

… documentation and pipelineoption.

kkucharc force-pushed the BEAM-6291-generic-schema-for-BQ-load-tests branch from 8223ab0 to ff8d527 on January 30, 2019 16:35

udim approved these changes Feb 5, 2019

pabloem merged commit 09996b6 into apache:master Feb 5, 2019
