BigQuery: Add support to Dataset for project_ids with org prefix. #8877

emar-kar · 2019-08-01T13:32:10Z

Closes: #8646

Added support to Dataset for project_ids with an org prefix.

Updated tests to cover the dataset changes.

IlyaFaer · 2019-08-01T15:09:40Z

@emar-kar, please run the black reformatter on the bigquery/dataset.py file so that the lint session passes.
There are also lines of code that are not covered by tests:
google/cloud/bigquery/dataset.py 309, 306->309
See:
https://source.cloud.google.com/results/invocations/5834d3bb-e5dc-4f3f-aeba-34ddf36780be/targets/cloud-devrel%2Fclient-libraries%2Fgoogle-cloud-python%2Fpresubmit%2Fbigquery/log

tswast · 2019-08-02T23:14:03Z

bigquery/google/cloud/bigquery/dataset.py

+        if with_prefix is None:
+            parts = dataset_id.split(".")
+        else:
+            parts = with_prefix.group("ref").split(".")


I'm confused. What is this doing? I think the prefix needs to be part of the project ID, right?

From the issue's stack trace:

ValueError: Too many parts in dataset_id. Expected a fully-qualified dataset ID in standard SQL format. e.g. "project.dataset_id", got google.com:[project].ryan_dataset

The error is caused by the google.com: prefix. Previously the passed string was split only on ., which raised ValueError because len(parts) > 2 for google.com:[project].ryan_dataset. As I see it, the prefix is not part of the project ID itself. I was looking for a way to parse the string that handles both the previous format and the new one, which is why I decided to use a regular expression. With the pattern's help, the prefix is separated out, which covers several cases:

string-project.string_dataset - is parsed successfully, as before;

prefix:string-project.string_dataset - the part without the prefix is captured and then split;

string-project:string_dataset - raises ValueError if default_project is not defined;

google.com:project:dataset_id - same as above.
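The failure described in this thread can be reproduced with a few lines. This is a minimal sketch, assuming the earlier parsing did nothing more than split on "." and count the parts; naive_split and the my-project / my_dataset names are hypothetical and are not taken from the library.

```python
def naive_split(dataset_id):
    """Split on "." only and reject more than two parts (simplified stand-in)."""
    parts = dataset_id.split(".")
    if len(parts) > 2:
        raise ValueError("Too many parts in dataset_id, got {!r}".format(dataset_id))
    return parts


# A plain "project.dataset" string yields exactly two parts.
print(naive_split("string-project.string_dataset"))

# A domain-scoped project ID carries a "." of its own, so the split yields
# three parts and the length check raises.
try:
    naive_split("google.com:my-project.my_dataset")
except ValueError as exc:
    print("ValueError:", exc)
```

The second call fails because "google.com:my-project.my_dataset".split(".") returns three items, which is the len(parts) > 2 case mentioned above.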

Right, but it appears to me that we're discarding the prefix? Is that correct?

Yeah, that is what I was thinking before this conversation, so now I'm a bit confused. I thought the prefix was an extra part and should just be removed. But if it is actually part of the project ID, I'll need to rework the pattern.

Applying requested chgs. // Removed description for 'single prefix'.

tswast · 2019-08-07T23:29:50Z

bigquery/google/cloud/bigquery/dataset.py

@@ -26,6 +27,14 @@
 from google.cloud.bigquery.table import TableReference


+_W_PREFIX = re.compile(


Can we pick a better name for this? Maybe _PROJECT_PREFIX_PATTERN?

tswast · 2019-08-07T23:33:13Z

bigquery/google/cloud/bigquery/dataset.py

@@ -26,6 +27,14 @@
 from google.cloud.bigquery.table import TableReference


+_W_PREFIX = re.compile(
+    r"""
+    (\S*)\:(?P<ref>\S*)


Since at least one character is required, this should probably be \S+, right?

Also, ref isn't all that meaningful to me. How about remaining, since it's everything after the : character?
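A quick way to see the difference between \S* and \S+ here. This is only a sketch: both patterns are written on a single line without re.VERBOSE, and STAR / PLUS are illustrative names, not identifiers from the PR.

```python
import re

# Quantifier from the diff: zero or more non-space characters on each side.
STAR = re.compile(r"(\S*)\:(?P<ref>\S*)")
# Suggested quantifier: at least one character on each side of the colon.
PLUS = re.compile(r"(\S+)\:(?P<remaining>\S+)")

# A bare ":" is accepted by the "*" version, with both groups empty.
star_match = STAR.match(":")
print(repr(star_match.group(1)), repr(star_match.group("ref")))

# The "+" version rejects it.
plus_match = PLUS.match(":")
print(plus_match)

# A well-formed input still matches the "+" version.
print(PLUS.match("prefix:string-project.string_dataset").group("remaining"))
```

With \S*, a degenerate input such as ":" would be treated as having a prefix, which is presumably not intended.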

tswast · 2019-08-07T23:33:59Z

bigquery/google/cloud/bigquery/dataset.py

+        if with_prefix is None:
+            parts = dataset_id.split(".")
+        else:
+            parts = with_prefix.group("ref").split(".")


Right, but it appears to me that we're discarding the prefix? Is that correct?

tswast · 2019-08-07T23:34:27Z

bigquery/tests/unit/test_dataset.py

+    def test_from_string_w_prefix(self):
+        cls = self._get_target_class()
+        got = cls.from_string("prefix:string-project.string_dataset")
+        self.assertEqual(got.project, "string-project")


Shouldn't this be prefix:string-project, since the prefix is actually part of the project ID?

Complete template change.

tswast · 2019-08-09T16:15:22Z

bigquery/google/cloud/bigquery/dataset.py

@@ -26,6 +27,14 @@
 from google.cloud.bigquery.table import TableReference


+_PROJECT_PREFIX_PATTERN = re.compile(
+    r"""
+    (?P<prefix>\S+\:\S+)\.+(?P<remaining>\S*)


Now that we're matching this way, prefix isn't the right term. Should be project_id. Likewise, remaining should be renamed to dataset_id.

Also, instead of \S, we should be matching for characters other than ., that is [^.]+.

We want to match the whole string, so we should probably end this pattern with $.

I renamed the groups in the pattern, but the second comment, about [^.], doesn't seem right to me. The string could be google.com:project.dataset, which means a dot can be part of the prefix. I checked a couple of variants and, as far as I can see, \S fits better.
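To illustrate the concern about dots in the prefix, here is a rough sketch. It assumes one possible reading of the suggestion, namely that every \S in the pattern is replaced by [^.]; DOT_FREE and NON_SPACE are hypothetical names used only for this comparison.

```python
import re

# Hypothetical reading: no "." allowed anywhere in the project ID.
DOT_FREE = re.compile(r"(?P<project_id>[^.]+)\.(?P<dataset_id>[^.]+)$")
# The \S-based pattern under discussion, written on one line.
NON_SPACE = re.compile(r"(?P<project_id>\S+\:\S+)\.+(?P<dataset_id>\S+)$")

candidate = "google.com:project.dataset"

# The dot inside "google.com" makes the dot-free version fail to match.
print(DOT_FREE.match(candidate))

# The \S-based version keeps the prefix as part of the project ID.
match = NON_SPACE.match(candidate)
print(match.group("project_id"), match.group("dataset_id"))
```

Under this reading the objection holds for the part before the colon. It does not rule out using [^.] only for the part after the colon.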

tswast · 2019-08-09T16:15:33Z

bigquery/google/cloud/bigquery/dataset.py

@@ -26,6 +27,14 @@
 from google.cloud.bigquery.table import TableReference


+_PROJECT_PREFIX_PATTERN = re.compile(
+    r"""


Why are we using a multi-line string here?

Just for readability. I think I’ll switch this to a single line after correcting the pattern implementation.

minor corrections

tswast · 2019-08-14T20:06:08Z

bigquery/tests/unit/test_dataset.py

    def test_from_string_legacy_string(self):
        cls = self._get_target_class()
        with self.assertRaises(ValueError):
            cls.from_string("string-project:string_dataset")

+    def test_from_string_w_incorrect_prefix(self):


Let's add an additional test where the project ID / dataset ID contains an illegal . character. Another way to say that is that the string contains too many "parts", e.g. google.com:project-id.dataset_id.table_id. This should also fail with ValueError.
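A sketch of what such a test could look like. To keep it runnable on its own, it exercises a stand-in parse_dataset_id helper rather than the real from_string; the helper, its pattern, and the test names are assumptions made for illustration and are not the library's code.

```python
import re
import unittest

# Stand-in parser for illustration only; not the library's implementation.
_PATTERN = re.compile(r"(?P<project_id>\S+\:[^.]+)\.(?P<dataset_id>[^.]+)$")


def parse_dataset_id(dataset_id):
    """Return (project_id, dataset_id) or raise ValueError."""
    match = _PATTERN.match(dataset_id)
    if match is not None:
        return match.group("project_id"), match.group("dataset_id")
    parts = dataset_id.split(".")
    if len(parts) != 2:
        raise ValueError("Expected 'project.dataset_id', got {!r}".format(dataset_id))
    return parts[0], parts[1]


class TestParseDatasetId(unittest.TestCase):
    def test_w_prefix(self):
        # The org prefix stays part of the project ID.
        got = parse_dataset_id("google.com:project-id.dataset_id")
        self.assertEqual(got, ("google.com:project-id", "dataset_id"))

    def test_w_prefix_and_too_many_parts(self):
        # An extra ".table_id" segment must be rejected.
        with self.assertRaises(ValueError):
            parse_dataset_id("google.com:project-id.dataset_id.table_id")
```

In the real test module the same assertions would presumably go through cls.from_string, following the existing tests shown in the diff.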

tswast · 2019-08-14T20:07:44Z

bigquery/google/cloud/bigquery/dataset.py

@@ -26,6 +27,9 @@
 from google.cloud.bigquery.table import TableReference


+_PROJECT_PREFIX_PATTERN = re.compile(r"(?P<project_id>\S+\:\S+)\.+(?P<dataset_id>\S+)$")


I think this will match patterns with too many . characters. Let's try something like:

(?P<project_id>\S+\:[^.]+)\.(?P<dataset_id>[^.]+)$

I see what you mean, sorry for misunderstanding. Appreciate your help.
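For reference, the two patterns from this thread can be compared directly on the input with too many parts. This is a small sketch; LOOSE and STRICT are just labels for the comparison.

```python
import re

# Pattern from the diff: \S also matches ".", so extra segments slip through.
LOOSE = re.compile(r"(?P<project_id>\S+\:\S+)\.+(?P<dataset_id>\S+)$")
# Suggested pattern: no "." after the colon except the single separator.
STRICT = re.compile(r"(?P<project_id>\S+\:[^.]+)\.(?P<dataset_id>[^.]+)$")

too_many = "google.com:project-id.dataset_id.table_id"

# The loose pattern matches and folds the dataset into the project ID.
loose = LOOSE.match(too_many)
print(loose.group("project_id"), loose.group("dataset_id"))

# The strict pattern does not match the over-long string.
print(STRICT.match(too_many))

# A well-formed string with an org prefix still matches the strict pattern.
ok = STRICT.match("google.com:project-id.dataset_id")
print(ok.group("project_id"), ok.group("dataset_id"))
```

The strict pattern still allows dots before the colon, so a prefix such as google.com: is accepted.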

Rewrote the pattern with '[^.]' and re.VERBOSE (due to the blacken session); added a test to check for extra parts in a string with a prefix; reconfigured the prefix in an existing test.

emar-kar added 4 commits July 30, 2019 17:43

Update dataset.py

2526dc7

added support to Dataset for project_ids with org prefix

Update test_dataset.py

533f959

updated tests to check dataset chgs

minor chgs

009ef57

*

452adf8

googlebot added the cla: yes This human has signed the Contributor License Agreement. label Aug 1, 2019

IlyaFaer added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 1, 2019

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 1, 2019

IlyaFaer added the api: bigquery Issues related to the BigQuery API. label Aug 1, 2019

fixed tests issue

3e04273

IlyaFaer added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 1, 2019

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 1, 2019

IlyaFaer requested a review from tswast August 2, 2019 07:47

IlyaFaer marked this pull request as ready for review August 2, 2019 07:47

IlyaFaer requested a review from a team August 2, 2019 07:47

tswast reviewed Aug 2, 2019

minor corrections

7bc877f

Applying requested chgs. // Removed description for 'single prefix'.

tswast self-requested a review August 6, 2019 00:27

tswast requested changes Aug 7, 2019

major corrections

4c13b06

Complete template change.

AVaksman added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 8, 2019

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 8, 2019

tswast reviewed Aug 9, 2019

pattern update

07dde13

minor corrections

tseaver added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 12, 2019

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 12, 2019

tswast requested changes Aug 14, 2019

update pattern and tests

e75e0fc

Rewrote the pattern with '[^.]' and re.VERBOSE (due to the blacken session); added a test to check for extra parts in a string with a prefix; reconfigured the prefix in an existing test.

IlyaFaer added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 15, 2019

yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 15, 2019

tswast approved these changes Aug 16, 2019

IlyaFaer merged commit 2ab105b into googleapis:master Aug 22, 2019

emar-kar deleted the adding-support-2-project_ids-w-org-prefix branch August 26, 2019 11:36

tswast mentioned this pull request Aug 26, 2019

BigQuery: Table.from_string() doesn't support GCP project ID with org prefix #7827

Closed

HemangChothani pushed a commit to HemangChothani/google-cloud-python that referenced this pull request Aug 29, 2019

BigQuery: Add support to Dataset for project_ids with org prefix. (googleapis#8877)

67f81e9

emar-kar added a commit to MaxxleLLC/google-cloud-python that referenced this pull request Sep 18, 2019

BigQuery: Add support to Dataset for project_ids with org prefix. (googleapis#8877)

7502a1a

emar-kar commented Aug 1, 2019

IlyaFaer commented Aug 1, 2019

tswast Aug 2, 2019

emar-kar Aug 5, 2019

tswast Aug 7, 2019

emar-kar Aug 8, 2019

tswast Aug 7, 2019

tswast Aug 7, 2019

tswast Aug 7, 2019

tswast Aug 7, 2019

tswast Aug 9, 2019

emar-kar Aug 12, 2019

tswast Aug 9, 2019

emar-kar Aug 9, 2019

tswast Aug 14, 2019

tswast Aug 14, 2019

emar-kar Aug 15, 2019
