[BEAM-6291] Generic BigQuery schema load tests metrics #7614

Merged

kkucharc
Contributor

We decided to change the tables to include a metric column that holds the name of the collected metric.
This PR also includes two minor changes:

  • added an env variable to disable load tests,
  • added a pipeline option to disable saving metrics.

@kkucharc kkucharc force-pushed the BEAM-6291-generic-schema-for-BQ-load-tests branch from 08bb6e5 to de395da Compare January 24, 2019 17:23
@kkucharc
Contributor Author

Hi @udim, I know that you reviewed similar code for Java recently, so I hope you can find a few minutes to take a look at this PR as well. I based my BQ changes on your code. The other two commits are small fixes.

@pabloem and @lgajowy, I would be grateful if you would check this too.

Member

@udim udim left a comment

+1 for simplification and more consistency!
Please see my comments.

-  def _prepare_schema(self, schemas):
-    return [_get_schema_field(schema) for schema in schemas]
+  def _prepare_schema(self):
+    return [get_schema_field(row) for row in SCHEMA]
Member

I think this could be simplified if you rename the 'type' key to 'field_type' in SCHEMA:

SCHEMA = [
    {'name': ID_LABEL,
     'field_type': 'STRING',
     'mode': 'REQUIRED'
    },
....

then this line could be simplified to:

return [SchemaField(**row) for row in SCHEMA]
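
A minimal sketch of why the rename helps: once the dict keys match the constructor's keyword arguments, each row can be unpacked directly. It assumes `SchemaField` accepts `name`, `field_type` and `mode`, as `google.cloud.bigquery.SchemaField` does. To stay runnable without the BigQuery client installed, the sketch uses a stand-in namedtuple; the label values and all schema rows after the first are illustrative assumptions, not the PR's actual constants.

```python
from collections import namedtuple

# Stand-in for google.cloud.bigquery.SchemaField (assumption: only the
# name, field_type and mode arguments are needed for this illustration).
SchemaField = namedtuple('SchemaField', ['name', 'field_type', 'mode'])

# Hypothetical label values; the PR defines its own constants.
ID_LABEL = 'test_id'
SUBMIT_TIMESTAMP_LABEL = 'timestamp'
METRICS_TYPE_LABEL = 'metric'
VALUE_LABEL = 'value'

# Keys are named exactly like the SchemaField keyword arguments.
SCHEMA = [
    {'name': ID_LABEL, 'field_type': 'STRING', 'mode': 'REQUIRED'},
    {'name': SUBMIT_TIMESTAMP_LABEL, 'field_type': 'TIMESTAMP',
     'mode': 'REQUIRED'},
    {'name': METRICS_TYPE_LABEL, 'field_type': 'STRING', 'mode': 'REQUIRED'},
    {'name': VALUE_LABEL, 'field_type': 'FLOAT', 'mode': 'REQUIRED'},
]


def prepare_schema():
  # No per-field helper is needed: each dict is unpacked into keyword args.
  return [SchemaField(**row) for row in SCHEMA]


fields = prepare_schema()
```

With the real class, the only change would be importing `SchemaField` from `google.cloud.bigquery` instead of defining the stand-in.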

Contributor Author

Thanks a lot! That's a really clever solution. I'll change it this way.

-  def match_and_save(self, result_list):
-    rows_tuple = tuple(self._match_inserts_by_schema(result_list))
-    self._insert_data(rows_tuple)
+  def match_and_save(self, results_lists):
Member

Could you document what type results_list is?
It seems that each item in results_list is a list of dictionaries, and each dict looks like:
{'label': SUBMIT_TIMESTAMP_LABEL, 'value': time.time()}
but I'm not 100% sure.

I think this module would be easier to understand if each item in results_list was a single dict:

{
  ID_LABEL: uuid,
  SUBMIT_TIMESTAMP_LABEL: time.time(),
  METRICS_TYPE_LABEL: RUNTIME_METRIC,
  VALUE_LABEL: value,
}

Note that _bq_client.insert_rows() also accepts a list of dicts so there would be no need to convert the above to tuple form.
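
A minimal sketch of the single-dict-per-row shape suggested above. The label values, `RUNTIME_METRIC`, and the `make_row` helper are hypothetical names chosen for illustration; the PR defines its own constants. No BigQuery call is made here, so the sketch only shows how the rows would be built.

```python
import time
import uuid

# Hypothetical label values; the PR defines its own constants.
ID_LABEL = 'test_id'
SUBMIT_TIMESTAMP_LABEL = 'timestamp'
METRICS_TYPE_LABEL = 'metric'
VALUE_LABEL = 'value'
RUNTIME_METRIC = 'runtime'


def make_row(test_id, metric_type, value, timestamp=None):
  """Builds one result row as a flat dict keyed by column name."""
  return {
      ID_LABEL: test_id,
      SUBMIT_TIMESTAMP_LABEL: time.time() if timestamp is None else timestamp,
      METRICS_TYPE_LABEL: metric_type,
      VALUE_LABEL: value,
  }


test_id = uuid.uuid4().hex
rows = [make_row(test_id, RUNTIME_METRIC, 12.5, timestamp=1548866100.0)]
# A list of dicts like `rows` can be handed to the client's insert_rows()
# unchanged, so no conversion to tuples is required.
```

Each metric becomes one self-describing row, which is what lets the same table schema serve every load test.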

Contributor Author

I agree with you 100%. I had the same impression that this naming is not very clear. I will refactor it according to your suggestions; hopefully that will simplify the code.

@kkucharc kkucharc force-pushed the BEAM-6291-generic-schema-for-BQ-load-tests branch from 8223ab0 to ff8d527 Compare January 30, 2019 16:35
@kkucharc
Contributor Author

Thank you @udim for the review. It was really helpful. I applied your comments. Do you think it looks OK now?

@kkucharc
Contributor Author

Run Python PreCommit

Member

@udim udim left a comment

LGTM!
Sorry for the delay

@kkucharc
Contributor Author

kkucharc commented Feb 5, 2019

No problem, thank you @udim !

@pabloem pabloem merged commit 09996b6 into apache:master Feb 5, 2019