From 28d7f284fa8df2f33842f8b4c7afd43c5c5ce410 Mon Sep 17 00:00:00 2001
From: Peter Lamut
Date: Tue, 19 Oct 2021 22:16:20 +0300
Subject: [PATCH 1/7] docs: add migration guide from version 2.x to 3.x

---
 UPGRADING.md   | 103 ++++++++++++++++++++++++++++++++++++++++++++++++-
 docs/index.rst |   3 +-
 2 files changed, 104 insertions(+), 2 deletions(-)

diff --git a/UPGRADING.md b/UPGRADING.md
index c75c4fddb..32564178d 100644
--- a/UPGRADING.md
+++ b/UPGRADING.md
@@ -13,7 +13,108 @@ limitations under the License.
 
 # 3.0.0 Migration Guide
 
-TODO
+## New Required Dependencies
+
+Some of the previously optional dependencies are now *required* in `3.x` versions of the
+library, namely
+[google-cloud-bigquery-storage](https://pypi.org/project/google-cloud-bigquery-storage/)
+(minimum version `2.0.0`) and [pyarrow](https://pypi.org/project/pyarrow/) (minimum
+version `3.0.0`).
+
+The behavior of some of the package "extras" has thus also changed:
+ * The `bqstorage` extra has been preserved for compatibility reasons, but it is now a
+   no-op and should be omitted when installing the BigQuery client library.
+
+   **Before:**
+   ```
+   $ pip install google-cloud-bigquery[bqstorage]
+   ```
+
+   **After:**
+   ```
+   $ pip install google-cloud-bigquery
+   ```
+
+ * The `bignumeric_type` extra has been removed, as the `BIGNUMERIC` type is now
+   automatically supported. That extra should thus not be used.
+
+   **Before:**
+   ```
+   $ pip install google-cloud-bigquery[bignumeric_type]
+   ```
+
+   **After:**
+   ```
+   $ pip install google-cloud-bigquery
+   ```
+
+## Re-organized Types
+
+The auto-generated parts of the library have been removed, and proto-based types formerly
+found in `google.cloud.bigquery_v2` have been replaced by the new implementation (but
+see the [section](#legacy-types) below).
+
+For example, the standard SQL data types should now be imported from a new location:
+
+**Before:**
+```py
+from google.cloud.bigquery_v2 import StandardSqlDataType
+from google.cloud.bigquery_v2.types import StandardSqlField
+from google.cloud.bigquery_v2.types.standard_sql import StandardSqlStructType
+```
+
+**After:**
+```py
+from google.cloud.bigquery import StandardSqlDataType
+from google.cloud.bigquery.standard_sql import StandardSqlField
+from google.cloud.bigquery.standard_sql import StandardSqlStructType
+```
+
+The `TypeKind` enum defining all possible SQL types for schema fields has been renamed
+and is no longer nested under `StandardSqlDataType`:
+
+**Before:**
+```py
+from google.cloud.bigquery_v2 import StandardSqlDataType
+
+if field_type == StandardSqlDataType.TypeKind.STRING:
+    ...
+```
+
+**After:**
+```py
+from google.cloud.bigquery import StandardSqlTypeNames
+
+if field_type == StandardSqlTypeNames.STRING:
+    ...
+```
+
+
+## Legacy Types
+
+For compatibility reasons, the legacy proto-based types still exist as static code
+and can be imported:
+
+```py
+from google.cloud.bigquery_v2 import StandardSqlDataType  # a subclass of proto.Message
+```
+
+Note, however, that importing them will issue a warning, because aside from being
+importable, these types **are not maintained anymore** in any way. They may differ both
+from the types in `google.cloud.bigquery`, and from the types supported on the backend.
+
+Unless you have a very specific situation that warrants using them, you should instead
+use the actively maintained types from `google.cloud.bigquery`.
+
+
+## Destination Table is Preserved on Query Jobs
+
+When the BigQuery client creates a `QueryJob`, it no longer removes the destination
+table from the job's configuration. The destination table for the query can thus be
+explicitly defined by the user.
 
 # 2.0.0 Migration Guide
 
diff --git a/docs/index.rst b/docs/index.rst
index 3f8ba2304..4ab0a298d 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -30,7 +30,8 @@ API Reference
 Migration Guide
 ---------------
 
-See the guide below for instructions on migrating to the 2.x release of this library.
+See the guides below for instructions on migrating from older to newer *major* releases
+of this library (from ``1.x`` to ``2.x``, or from ``2.x`` to ``3.x``).
 
 .. toctree::
     :maxdepth: 2

From babf0d196b3b22e2c3ee1abd35022153c3a38e56 Mon Sep 17 00:00:00 2001
From: Peter Lamut
Date: Thu, 11 Nov 2021 12:45:25 +0200
Subject: [PATCH 2/7] Add a section on type annotations

---
 UPGRADING.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/UPGRADING.md b/UPGRADING.md
index 32564178d..748703e86 100644
--- a/UPGRADING.md
+++ b/UPGRADING.md
@@ -117,6 +117,24 @@ table from the job's configuration. The destination table for the query can thus be
 explicitly defined by the user.
 
 
+## Type Annotations
+
+The library is now type-annotated and declares itself as such. If you use a static
+type checker such as `mypy`, you might start getting errors in places where
+the `google-cloud-bigquery` package is used.
+
+It is recommended to update your code and/or type annotations to fix these errors, but
+if this is not feasible in the short term, you can temporarily ignore type annotations
+in `google-cloud-bigquery`, for example by using a special `# type: ignore` comment:
+
+```py
+from google.cloud import bigquery  # type: ignore
+```
+
+But again, this is only recommended as a possible short-term workaround if immediately
+fixing the type check errors in your project is not feasible.
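As a possible alternative to a per-import comment (a sketch based on standard `mypy` configuration, not a recipe from this changelog), the package can be excluded from analysis in the `mypy` configuration file, which likewise makes all names imported from it untyped:

```ini
; mypy.ini -- a hypothetical project configuration
[mypy]

[mypy-google.cloud.bigquery.*]
; Treat the package as untyped; same effect as `# type: ignore`
; on each import, but kept in one place.
follow_imports = skip
```

This keeps the workaround out of the source files, making it easier to find and remove once the type errors are properly fixed.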
+
+
 # 2.0.0 Migration Guide
 
 The 2.0 release of the `google-cloud-bigquery` client drops support for Python

From c29c8502cdf7aaa2673cb7af6f68d4998b2ade92 Mon Sep 17 00:00:00 2001
From: Peter Lamut
Date: Mon, 15 Nov 2021 17:50:55 +0200
Subject: [PATCH 3/7] Explain additional requirement of pandas extra

---
 UPGRADING.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/UPGRADING.md b/UPGRADING.md
index 748703e86..636293b5d 100644
--- a/UPGRADING.md
+++ b/UPGRADING.md
@@ -22,6 +22,8 @@ library, namely
 version `3.0.0`).
 
 The behavior of some of the package "extras" has thus also changed:
+ * The `pandas` extra now requires the [db-dtypes](https://pypi.org/project/db-dtypes/)
+   package.
  * The `bqstorage` extra has been preserved for compatibility reasons, but it is now a
    no-op and should be omitted when installing the BigQuery client library.

From 691409d3996029681372848218d61f210143f838 Mon Sep 17 00:00:00 2001
From: Peter Lamut
Date: Mon, 15 Nov 2021 18:11:40 +0200
Subject: [PATCH 4/7] Mention new default type for TZ-aware datetimes

---
 UPGRADING.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/UPGRADING.md b/UPGRADING.md
index 636293b5d..cda074bf0 100644
--- a/UPGRADING.md
+++ b/UPGRADING.md
@@ -119,6 +119,21 @@ table from the job's configuration. The destination table for the query can thus be
 explicitly defined by the user.
 
 
+## Changed Default Inferred Type for Naive `datetime` Instances
+
+In the absence of schema information, columns with naive `datetime.datetime` values,
+i.e. without timezone information, are recognized and loaded using the `DATETIME` type.
+On the other hand, for columns with timezone-aware `datetime.datetime` values, the
+`TIMESTAMP` type continues to be used.
+
+
+## Destination Table is Preserved on Query Jobs
+
+When the BigQuery client creates a `QueryJob`, it no longer removes the destination
+table from the job's configuration. The destination table for the query can thus be
+explicitly defined by the user.
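The naive-versus-aware `datetime` distinction described in the new section above can be checked locally before loading. A minimal sketch, assuming `pandas` is installed; the column names are placeholders:

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    {
        # Naive values (no tzinfo): inferred as the BigQuery DATETIME type.
        "created": [datetime.datetime(2021, 1, 1, 12, 0)],
        # Timezone-aware values: inferred as the BigQuery TIMESTAMP type.
        "recorded": [
            datetime.datetime(2021, 1, 1, 12, 0, tzinfo=datetime.timezone.utc)
        ],
    }
)

print(df.dtypes["created"])   # datetime64[ns]
print(df.dtypes["recorded"])  # datetime64[ns, UTC]
```

Absent an explicit schema, a load job created with `client.load_table_from_dataframe(df, table_id)` would then use `DATETIME` for the first column and `TIMESTAMP` for the second.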
+
+
 ## Type Annotations
 
 The library is now type-annotated and declares itself as such. If you use a static

From 47f8e018234b751ad6b1e53ce2e642f66ee557d0 Mon Sep 17 00:00:00 2001
From: Tim Swast
Date: Wed, 19 Jan 2022 17:05:20 -0600
Subject: [PATCH 5/7] rearrange and add a section

---
 UPGRADING.md | 77 ++++++++++++++++++++++++++--------------------------
 1 file changed, 38 insertions(+), 39 deletions(-)

diff --git a/UPGRADING.md b/UPGRADING.md
index cda074bf0..0342fee02 100644
--- a/UPGRADING.md
+++ b/UPGRADING.md
@@ -50,6 +50,24 @@ The behavior of some of the package "extras" has thus also changed:
    $ pip install google-cloud-bigquery
    ```
 
+
+## Type Annotations
+
+The library is now type-annotated and declares itself as such. If you use a static
+type checker such as `mypy`, you might start getting errors in places where
+the `google-cloud-bigquery` package is used.
+
+It is recommended to update your code and/or type annotations to fix these errors, but
+if this is not feasible in the short term, you can temporarily ignore type annotations
+in `google-cloud-bigquery`, for example by using a special `# type: ignore` comment:
+
+```py
+from google.cloud import bigquery  # type: ignore
+```
+
+But again, this is only recommended as a possible short-term workaround if immediately
+fixing the type check errors in your project is not feasible.
+
 ## Re-organized Types
 
 The auto-generated parts of the library have been removed, and proto-based types formerly
@@ -94,62 +112,43 @@ if field_type == StandardSqlTypeNames.STRING:
 ```
 
 
-## Legacy Types
-
-For compatibility reasons, the legacy proto-based types still exist as static code
-and can be imported:
-
-```py
-from google.cloud.bigquery_v2 import StandardSqlDataType  # a subclass of proto.Message
-```
-
-Note, however, that importing them will issue a warning, because aside from being
-importable, these types **are not maintained anymore** in any way.
-They may differ both
-from the types in `google.cloud.bigquery`, and from the types supported on the backend.
-
-Unless you have a very specific situation that warrants using them, you should instead
-use the actively maintained types from `google.cloud.bigquery`.
-
-
-## Destination Table is Preserved on Query Jobs
+## Issuing queries with `Client.create_job` preserves destination table
 
-When the BigQuery client creates a `QueryJob`, it no longer removes the destination
-table from the job's configuration. The destination table for the query can thus be
+The `Client.create_job` method no longer removes the destination table from a
+query job's configuration. The destination table for the query can thus be
 explicitly defined by the user.
 
 
-## Changed Default Inferred Type for Naive `datetime` Instances
+## Changes to data types when reading a pandas DataFrame
+
+TODO
 
+## Changes to data types loading a pandas DataFrame
 
 In the absence of schema information, columns with naive `datetime.datetime` values,
 i.e. without timezone information, are recognized and loaded using the `DATETIME` type.
 On the other hand, for columns with timezone-aware `datetime.datetime` values, the
 `TIMESTAMP` type continues to be used.
 
+## Changes to get_model and list_models
 
-## Destination Table is Preserved on Query Jobs
-
-When the BigQuery client creates a `QueryJob`, it no longer removes the destination
-table from the job's configuration. The destination table for the query can thus be
-explicitly defined by the user.
-
-
-## Type Annotations
+TODO
 
-The library is now type-annotated and declares itself as such. If you use a static
-type checker such as `mypy`, you might start getting errors in places where
-the `google-cloud-bigquery` package is used.
+ +## Legacy Types -It is recommended to update your code and/or type annotations to fix these errors, but -if this is not feasible in the short term, you can temporarily ignore type annotations -in `google-cloud-bigquery`, for example by using a special `# type: ignore` comment: +For compatibility reasons, the legacy proto-based types still exists as static code +and can be imported: ```py -from google.cloud import bigquery # type: ignore +from google.cloud.bigquery_v2 import StandardSqlDataType # a sublcass of proto.Message ``` -But again, this is only recommended as a possible short-term workaround if immediately -fixing the type check errors in your project is not feasible. +Mind, however, that importing them will issue a warning, because aside from being +importable, these types **are not maintained anymore** in any way. They may differ both +from the types in `google.cloud.bigquery`, and from the types supported on the backend. + +Unless you have a very specific situation that warrants using them, you should instead +use the actively maintained types from `google.cloud.bigquery`. # 2.0.0 Migration Guide From 760fd995cde78494f7abc5b57b81b5800b5a5f4b Mon Sep 17 00:00:00 2001 From: Tim Swast Date: Thu, 20 Jan 2022 12:37:13 -0600 Subject: [PATCH 6/7] start documenting model properties that have changed --- UPGRADING.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/UPGRADING.md b/UPGRADING.md index 0342fee02..985bf0f0b 100644 --- a/UPGRADING.md +++ b/UPGRADING.md @@ -129,9 +129,12 @@ i.e. without timezone information, are recognized and loaded using the `DATETIME On the other hand, for columns with timezone-aware `datetime.dateime` values, the `TIMESTAMP` type is continued to be used. -## Changes to get_model and list_models +## Changes to `Model`, `Client.get_model`, `Client.update_model`, and `Client.list_models` -TODO +The types of several `Model` properties have been changed. 
+
+- `Model.feature_columns` now returns a sequence of dictionaries, as received from the [BigQuery REST API](https://cloud.google.com/bigquery/docs/reference/rest/v2/models#Model.FIELDS.feature_columns).
+- `Model.label_columns` now returns a sequence of dictionaries, as received from the [BigQuery REST API](https://cloud.google.com/bigquery/docs/reference/rest/v2/models#Model.FIELDS.label_columns).
 
 ## Legacy Types

From 9cc3c867efb5a4e358480db700e7c9328b5dfa0a Mon Sep 17 00:00:00 2001
From: Tim Swast
Date: Thu, 20 Jan 2022 16:35:17 -0600
Subject: [PATCH 7/7] add table of changes for pandas and Model

---
 UPGRADING.md | 72 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 57 insertions(+), 15 deletions(-)

diff --git a/UPGRADING.md b/UPGRADING.md
index 985bf0f0b..95f87f7ee 100644
--- a/UPGRADING.md
+++ b/UPGRADING.md
@@ -121,38 +121,80 @@ explicitly defined by the user.
 
 ## Changes to data types when reading a pandas DataFrame
 
-TODO
+The default dtypes returned by the `to_dataframe` method have changed.
+
+* Now, the BigQuery `BOOLEAN` data type maps to the pandas `boolean` dtype.
+  Previously, this mapped to the pandas `bool` dtype when the column did not
+  contain `NULL` values and the pandas `object` dtype when `NULL` values are
+  present.
+* Now, the BigQuery `INT64` data type maps to the pandas `Int64` dtype.
+  Previously, this mapped to the pandas `int64` dtype when the column did not
+  contain `NULL` values and the pandas `float64` dtype when `NULL` values are
+  present.
+* Now, the BigQuery `DATE` data type maps to the pandas `dbdate` dtype, which
+  is provided by the
+  [db-dtypes](https://googleapis.dev/python/db-dtypes/latest/index.html)
+  package.
+  If any date value is outside of the range of
+  [pandas.Timestamp.min](https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.min.html)
+  (1677-09-22) and
+  [pandas.Timestamp.max](https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.max.html)
+  (2262-04-11), the data type maps to the pandas `object` dtype. The
+  `date_as_object` parameter has been removed.
+* Now, the BigQuery `TIME` data type maps to the pandas `dbtime` dtype, which
+  is provided by the
+  [db-dtypes](https://googleapis.dev/python/db-dtypes/latest/index.html)
+  package.
 
 
 ## Changes to data types loading a pandas DataFrame
 
-In the absence of schema information, columns with naive `datetime.datetime` values,
-i.e. without timezone information, are recognized and loaded using the `DATETIME` type.
-On the other hand, for columns with timezone-aware `datetime.datetime` values, the
-`TIMESTAMP` type continues to be used.
+In the absence of schema information, pandas columns with naive
+`datetime64[ns]` values, i.e. without timezone information, are recognized and
+loaded using the `DATETIME` type. On the other hand, for columns with
+timezone-aware `datetime64[ns, UTC]` values, the `TIMESTAMP` type continues
+to be used.
 
 ## Changes to `Model`, `Client.get_model`, `Client.update_model`, and `Client.list_models`
 
 The types of several `Model` properties have been changed.
 
-- `Model.feature_columns` now returns a sequence of dictionaries, as received from the [BigQuery REST API](https://cloud.google.com/bigquery/docs/reference/rest/v2/models#Model.FIELDS.feature_columns).
-- `Model.label_columns` now returns a sequence of dictionaries, as received from the [BigQuery REST API](https://cloud.google.com/bigquery/docs/reference/rest/v2/models#Model.FIELDS.label_columns).
+- `Model.feature_columns` now returns a sequence of `google.cloud.bigquery.standard_sql.StandardSqlField`.
+- `Model.label_columns` now returns a sequence of `google.cloud.bigquery.standard_sql.StandardSqlField`.
+- `Model.model_type` now returns a string.
+- `Model.training_runs` now returns a sequence of dictionaries, as received from the [BigQuery REST API](https://cloud.google.com/bigquery/docs/reference/rest/v2/models#Model.FIELDS.training_runs).
 
-## Legacy Types
+## Legacy Protocol Buffers Types
 
 For compatibility reasons, the legacy proto-based types still exist as static code
 and can be imported:
 
 ```py
-from google.cloud.bigquery_v2 import StandardSqlDataType  # a subclass of proto.Message
+from google.cloud.bigquery_v2 import Model  # a subclass of proto.Message
 ```
 
-Note, however, that importing them will issue a warning, because aside from being
-importable, these types **are not maintained anymore** in any way. They may differ both
-from the types in `google.cloud.bigquery`, and from the types supported on the backend.
+Note, however, that importing them will issue a warning, because aside from
+being importable, these types **are not maintained anymore**. They may differ
+both from the types in `google.cloud.bigquery`, and from the types supported on
+the backend.
+
+### Maintaining compatibility with `google-cloud-bigquery` version 2.0
+
+If you maintain a library or system that needs to support both
+`google-cloud-bigquery` version 2.x and 3.x, it is recommended that you detect
+when version 2.x is in use and convert properties that use the legacy protocol
+buffer types, such as `Model.training_runs`, into the types used in 3.x.
+
+Call the [`to_dict`
+method](https://proto-plus-python.readthedocs.io/en/latest/reference/message.html#proto.message.Message.to_dict)
+on the protocol buffer objects to get a JSON-compatible dictionary.
+
+```py
+from google.cloud.bigquery_v2 import Model
+
+training_run: Model.TrainingRun = ...
+training_run_dict = training_run.to_dict()
+```
 
 # 2.0.0 Migration Guide