Skip to content

Commit

Permalink
Fix ensuring we produce valid jsonschema artifacts for manifest, cata…
Browse files Browse the repository at this point in the history
…log, sources, and run-results (#9155)

* Drop `all_refs=True` from jsonschema-ization build process

Passing `all_refs=True` makes it so that Everything is a ref, even
the top level schema. In jsonschema land, this essentially makes the
produced artifact not a full schema, but a fractal object to be included
in a schema. Thus when `$id` is passed in, jsonschema tools blow up
because `$id` is for identifying a schema, which we explicitly weren't
creating. The alternative was to drop the inclusion of `$id`. Howver, we're
intending to create a schema, and having an `$id` is recommended best
practice. Additionally since we were intending to create a schema,
not a fractal, it seemed best to create to full schema.

* Explicity produce jsonschemas using DRAFT_2020_12 dialect

Previously were were implicitly using the `DRAFT_2020_12` dialect through
mashumaro. It felt wise to begin explicitly specifying this. First, it
is closest in available mashumaro provided dialects to what we produced
pre 1.7. Secondly, if mashumaro changes its default for whatever reason
(say a new dialect is added, and mashumaro moves to that), we don't want
to automatically inherit that.

* Bump manifest version to v12

Core 1.7 released with manifest v11, and we don't want to be overriding
that with 1.8. It'd be weird for 1.7 and 1.8 to both have v11 manifests,
but for them to be different, right?

* Begin including schema dialect specification in produced jsonschema

In jsonschema's documentation they state
> It's not always easy to tell which draft a JSON Schema is using.
> You can use the $schema keyword to declare which version of the JSON Schema specification the schema is written to.
> It's generally good practice to include it, though it is not required.

and

> For brevity, the $schema keyword isn't included in most of the examples in this book, but it should always be used in the real world.

Basically, to know how to parse a schema, it's important to include what
schema dialect is being used for the schema specification. The change in
this commit ensures we include that information.

* Create manifest v12 jsonschema specification

* Add change documentation for jsonschema schema production fix

* Bump run-results version to v6

* Generate new v6 run-results jsonschema

* Regenerate catalog v1 and sources v3 with fixed jsonschema production

* Update tests to handle bumped versions of manifest and run-results

(cherry picked from commit 0ab954e)
  • Loading branch information
QMalcolm authored and github-actions[bot] committed Dec 6, 2023
1 parent fc2d16f commit 30e160e
Show file tree
Hide file tree
Showing 12 changed files with 20,201 additions and 454 deletions.
7 changes: 7 additions & 0 deletions .changes/unreleased/Fixes-20231127-165244.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
kind: Fixes
body: Ensure we produce valid jsonschema schemas for manifest, catalog, run-results,
and sources
time: 2023-11-27T16:52:44.590313-08:00
custom:
Author: QMalcolm
Issue: "8991"
3 changes: 2 additions & 1 deletion core/dbt/contracts/graph/manifest.py
Original file line number Diff line number Diff line change
Expand Up @@ -1550,7 +1550,7 @@ def __init__(self, macros) -> None:


@dataclass
@schema_version("manifest", 11)
@schema_version("manifest", 12)
class WritableManifest(ArtifactMixin):
nodes: Mapping[UniqueID, ManifestNode] = field(
metadata=dict(description=("The nodes defined in the dbt project and its dependencies"))
Expand Down Expand Up @@ -1618,6 +1618,7 @@ def compatible_previous_versions(cls) -> Iterable[Tuple[str, int]]:
("manifest", 8),
("manifest", 9),
("manifest", 10),
("manifest", 11),
]

@classmethod
Expand Down
3 changes: 2 additions & 1 deletion core/dbt/contracts/results.py
Original file line number Diff line number Diff line change
Expand Up @@ -249,7 +249,7 @@ def write(self, path: str):


@dataclass
@schema_version("run-results", 5)
@schema_version("run-results", 6)
class RunResultsArtifact(ExecutionResult, ArtifactMixin):
results: Sequence[RunResultOutput]
args: Dict[str, Any] = field(default_factory=dict)
Expand All @@ -275,6 +275,7 @@ def from_execution_results(
def compatible_previous_versions(cls) -> Iterable[Tuple[str, int]]:
return [
("run-results", 4),
("run-results", 5),
]

@classmethod
Expand Down
3 changes: 2 additions & 1 deletion core/dbt/contracts/util.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@
ValidationError,
)
from mashumaro.jsonschema import build_json_schema
from mashumaro.jsonschema.dialects import DRAFT_2020_12
import functools


Expand Down Expand Up @@ -197,7 +198,7 @@ class VersionedSchema(dbtClassMixin):
@classmethod
@functools.lru_cache
def json_schema(cls) -> Dict[str, Any]:
json_schema_obj = build_json_schema(cls, all_refs=True)
json_schema_obj = build_json_schema(cls, dialect=DRAFT_2020_12, with_dialect_uri=True)
json_schema = json_schema_obj.to_dict()
json_schema["$id"] = str(cls.dbt_schema_version)
return json_schema
Expand Down
Loading

0 comments on commit 30e160e

Please sign in to comment.