-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Write JSON manifest file to disk #761
Conversation
Leaves the voluptuous stuff in for now as other parts require it
Unit tests pass, at least.
… things that are going to be tricky
…t that has the agate_table in it.
…fely receive an UnparsedNode without having to set additionalProperties to true
…into the compiler, register macro loading with the graph loader.
…stuff. Punt on compiler for now
this looks great! since we paired on most of it, and it's a big change, i'm going to give @drewbanin a chance to review before merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question for my edification, but this looks A+
COMPILED_NODE_CONTRACT = deep_merge( | ||
PARSED_NODE_CONTRACT, | ||
{ | ||
# TODO: when we add 'extra_ctes' back in, flip this back to False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the deal with extra_ctes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're stored as an OrderedDict
, which doesn't have any nice JSON representation (the standard technique I've seen is to use a list of dicts like [{'key': 'foo', 'value': 'bar'}, {'key': 'baz', 'value': 'quux'}, ...]
). So to store it as schema-validated JSON, we'd end up spending a lot of time packing/unpacking everything into/from the OrderedDict
/list-of-dicts.
A better longer-term path is probably to convert the extra_ctes
into just an array, or leave it out of the contract, like I did with the agate tables. I briefly tested out the array change and broke a lot of tests, so obviously there's something I don't understand going on.
There are a two significant components to this PR:
APIObject
subclasses instead of dictionaries and pass those around, and generate a manifest object (defined incontracts/graph/parsed.py
) that can be converted to a json-serializable dict for writing to disk, and to the flat_graph required by the compiler.Some side effects of note:
all
validator invalidation_utils
and writing a newany
one.UnparsedNode
object, sodbt.parser.parse_node
had some slight signature changes, anddbt.parser.parse_seed
now returns the unparsed node and the agate tables separatelyUnparsedNode
sdbt.loader.GraphLoader
a bit to only register node loaders, since macros have to be loaded before any of the nodes can be loaded anyway.We probably want to understand/examine the performance of serializing all this JSON to strings and writing that out like the current code does. It might be good to make writing the manifest an option depending upon how slow it is.
This PR looks quite a bit bigger than it really is, a large chunk of it is converting sets/tuples to lists and stuff like that so the json schema validator is happy.