-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support persist docs on BigQuery: columns #2335
Comments
Just like in the generated docs, could the column descriptions also include the schema tests? We are using the schema tests as documentation as well as tests, and I think it would be helpful to show whether a column is UNIQUE, NOT NULL, etc in BigQuery too. |
I think I'd recommend using the dbt documentation website for that purpose. I'm super on board with persisting column descriptions into the database, but I'd hesitate to add a bunch of free-form text generated by dbt at the same time. What I can imagine is providing tools to generate a list of tests for a column as a string, then include that string in the stated column description. I'm also open to figuring out a way to add this to every column comment so that it happens automatically, but that's out of scope for the feature described in this issue IMO |
In case it helps, here's the code we use for this: from typing import Dict, List, Sequence
from google.cloud import bigquery
from google.cloud.bigquery.schema import SchemaField
def update_schema_documentation(
client: bigquery.Client,
table_name: str,
table_description: str,
field_descriptions: Dict[str, str],
) -> None: # pragma: no cover
"""Update the documentation for a BigQuery table.
Known limitations:
- Does not support nested field definitions
Feel free to fix these limitations!
"""
table = client.get_table(table_name)
table.description = table_description
table.schema = _document_schema(table_name, table.schema, field_descriptions)
client.update_table(table, ["description", "schema"])
def _document_schema(
table_name: str, schema: Sequence[SchemaField], field_descriptions: Dict[str, str]
) -> List[SchemaField]:
"""Create a documented version of the given BigQuery schema."""
existing_fields = set()
new_schema = []
for field in schema:
description = field_descriptions.get(field.name)
new_schema.append(
SchemaField(
name=field.name,
field_type=field.field_type,
mode=field.mode,
description=description,
fields=field.fields,
)
)
existing_fields.add(field.name)
undescribed_fields = existing_fields - set(field_descriptions)
# TODO: Also raise an exception if we have described fields that don't exist.
if undescribed_fields:
# TODO: Raise a more specific exception so we don't have to pass table_name in.
raise RuntimeError(f"Unexpected fields defined in {table_name}: {undescribed_fields}")
return new_schema Tests: def test_document_schema_undocumented_fields() -> None:
# _document_schema raises RuntimeError if we are missing documentation for
# a field.
with pytest.raises(RuntimeError):
_document_schema("foo", [SchemaField("name", "INTEGER")], {})
def test_document_schema_empty() -> None:
# We successfully document an empty schema by returning the same thing.
assert _document_schema("foo", [], {}) == []
def test_document_schema_adds_docs() -> None:
# _document_schema returns a new schema with the given descriptions added.
new_schema = _document_schema(
"table",
[SchemaField("foo", "INTEGER"), SchemaField("bar", "STRING")],
{"foo": "Full of fooness", "bar": "As you'd expect"},
)
assert new_schema == [
SchemaField("foo", "INTEGER", "NULLABLE", "Full of fooness"),
SchemaField("bar", "STRING", "NULLABLE", "As you'd expect"),
] |
thanks @jml - that's super helpful! Are you able to submit a PR with a change like this? It would be super welcomed! |
I'd genuinely love to, but I doubt I'll have an opportunity to do so any time soon. |
I need this feature, so I'll take a stab at it. But I'm new to the codebase so might be slow... if anybody else wants to work on this please do! |
hey @oliverxchen - we're wrapping up all of the code for v0.17.0, and we anticipate releasing a "release candidate" in the next couple of days! To that end, I submitted #2402 which adds column-level docs persistence for BigQuery. Just wanted to give you a quick heads up in case this was something you're still chipping away at. Let me know if there's another issue that catches your eye - would be happy to help out with a contribution however I can! |
@drewbanin great, thank you for the feature and up coming release! Sorry I couldn't get this in, was still trying to understand the code. I'll look for other issues I can try to help on |
Hello guys, |
Child issue of #1573
To do:
If the project specifies a column that does not exist in the destination table, the
comment on column
statement may fail. This is the desired behavior.BigQuery does not support adding column-level descriptions in DDL unless all columns in the relation are specified. Instead, we should pursue the following approach:
The text was updated successfully, but these errors were encountered: