[bugfix] Fix for issue #109 #111
Conversation
If we do it the other way around, changing the record denesting instead to match the schema, we get:
To get this change, `_denest_record` would become:

```python
def _denest_record(table_path, record, records_map, key_properties, pk_fks, level):
    """
    {...}
    """
    denested_record = {}

    for prop, value in record.items():
        """
        str : {...} | [...] | None | <literal>
        """
        if isinstance(value, dict):
            """
            {...}
            """
            _denest_subrecord(table_path + (prop,),
                              table_path + (prop,),
                              denested_record,
                              value,
                              records_map,
                              key_properties,
                              pk_fks,
                              level)

        elif isinstance(value, list):
            """
            [...]
            """
            _denest_records(table_path + (prop,),
                            value,
                            records_map,
                            key_properties,
                            pk_fks=pk_fks,
                            level=level + 1)

        elif value is None:
            """
            None
            """
            continue

        else:
            """
            <literal>
            """
            denested_record[(prop,)] = (json_schema.python_type(value), value)

    if table_path not in records_map:
        records_map[table_path] = []
    records_map[table_path].append(denested_record)
```
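For readers following along, here is a stripped-down, self-contained sketch of what this style of denesting produces. The helper names are hypothetical, not the actual target-postgres functions, and the prop-path handling for arrays under objects is simplified:

```python
# Hypothetical illustration of the denesting idea: nested objects fold
# into the parent row under a longer column path; nested arrays split
# off into child tables keyed by an extended table path.

def denest(table_path, record, records_map):
    row = {}
    _denest_into(row, (), record, table_path, records_map)
    records_map.setdefault(table_path, []).append(row)

def _denest_into(row, prop_path, record, table_path, records_map):
    for prop, value in record.items():
        if isinstance(value, dict):
            # Objects stay in the same table, under a longer column path.
            _denest_into(row, prop_path + (prop,), value, table_path, records_map)
        elif isinstance(value, list):
            # Arrays become rows of a child table at an extended table path.
            # (Assumes elements are objects, for brevity.)
            for element in value:
                denest(table_path + (prop,), element, records_map)
        elif value is None:
            continue
        else:
            row[prop_path + (prop,)] = value

records_map = {}
denest(('cats',),
       {'name': 'Morris',
        'vaccination_type': {'shot': 'rabies'},
        'adoptions': [{'agency': 'PAWS'}]},
       records_map)
# records_map == {
#     ('cats', 'adoptions'): [{('agency',): 'PAWS'}],
#     ('cats',): [{('name',): 'Morris',
#                  ('vaccination_type', 'shot'): 'rabies'}]}
```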
I have prepared a branch for this other scenario and will push it as well, and then you can decide which fix to use.
Merge either this PR or #112.
@CStejmar we definitely want this over changing the records. The way you have this implemented here is great. Nice job tracking down the problem! I think we should merge this, then rebase my tests fork on master and merge it as well. The tests therein are beneficial since this change:
a) doesn't break any tests 😬
b) doesn't fix any tests
@awm33 I think this fixes currently broken schemas etc. There will be a problem, though: tables which already have this problem will still have the busted column remaining. I'm not sure whether trying to create a migration for this is worth it. Thoughts?
(@CStejmar I'd approve, but I'd like to suss out the current implications of the bug with @awm33 here first, so we understand what's going to change etc. and whether we need to "fix" this for current schemas)
@AlexanderMann Thank you! Glad I can help! Yes, I think we should do as you suggest: merge this and then rebase your tests on top of that 👍. Regarding your list: yes, I understand, discuss it with @awm33 and get back to me. Btw, what do you mean by "whether we need to 'fix' this for current schemas"? The fix shouldn't break any schemas or records, only match them. However, current databases and their tables could change if the schemas used to create them are complex (nested) enough. Am I right?
I noticed one thing now when looking through my tables in a test database. The objects look fine and all data enters the database. However, the naming of subtables (arrays in the schemas) misses the array name when we have arrays within arrays: the first array name is dropped from the table name. With the denest fix for the schema:
Without the denest fix (fixing the record instead):
The above has errors in the naming of nested objects. So my conclusion is that we need some more work on this denesting fix before merging. I will start looking into it now!
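To pin down the naming issue being described (the names here are hypothetical, since the original screenshots are not reproduced above): with arrays nested inside arrays, the first array's name goes missing from the child table's path.

```python
# Hypothetical record shape: a root table with an array of objects,
# each of which contains another array.
record = {'adoptions': [                 # first array level
    {'immunizations': [                  # second array level
        {'type': 'rabies'},
    ]},
]}

# Expected table paths after denesting:
#   ('root',)
#   ('root', 'adoptions')
#   ('root', 'adoptions', 'immunizations')
#
# Observed with the bug: the first array name is dropped, yielding
#   ('root', 'immunizations')
# instead of ('root', 'adoptions', 'immunizations').
```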
Yeah @CStejmar I think you're correct here. In looking into the code, it looks like the schema denesting logic has a problem here, and then on top of that, so does the denesting logic for records.

Expected Behaviour

@awm33 should confirm this, but my recollection of how this logic should be working is based on the StitchData docs. Specifically, a JSON object can only have three cases:
For each case, we should be doing the following (schema and records alike; a rough sketch follows this list):
Scalar
Object
Array
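The per-case details did not survive the page formatting above, so here is a hedged sketch of how I read the StitchData-style convention this refers to: scalars become columns on the current table, objects flatten into the parent under a longer column path, and arrays split into sub-tables. All names here are illustrative, not the target-postgres API:

```python
# Illustrative dispatch over the three JSON cases (sketch only).

def denest_value(table_path, column_path, value, emit_column, emit_subtable):
    if isinstance(value, dict):
        # Object: flatten each field into the current table, extending
        # the column path (rendered later as parent__child).
        for prop, sub_value in value.items():
            denest_value(table_path, column_path + (prop,), sub_value,
                         emit_column, emit_subtable)
    elif isinstance(value, list):
        # Array: becomes its own sub-table, named by extending the
        # table path with the full property path down to this array.
        emit_subtable(table_path + column_path, value)
    else:
        # Scalar: a plain column on the current table.
        emit_column(table_path, column_path, value)

cols, tables = {}, {}
denest_value(('cats',), ('vaccination_type',), {'shot': 'rabies'},
             lambda t, c, v: cols.setdefault(t, {}).update({c: v}),
             lambda t, rows: tables.setdefault(t, []).extend(rows))
# cols == {('cats',): {('vaccination_type', 'shot'): 'rabies'}}
```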
Changes which are probably necessary in the denesting logic...

@CStejmar I'm curious to see where you get with this, but I'm nervous that for the current schema denesting logic we need to introduce a new concept. Then, for records denesting, we'll need to fix the issues around object denesting changing/not-changing the table path... If I get the time, I'm going to update my fork/branch with the broken tests to also include tests for the additional bug you've got up top here.
@AlexanderMann thanks for the breakdown of the problem! It is similar, if not identical, to the thoughts I had this morning :). I have been working on a fix for this today and just pushed it. It produces the correct and expected tables, with all data in the database, and also passes all tests currently on the master branch. As you wrote, I needed to introduce a new concept in the denesting logic.
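The exact identifier being referred to above did not survive the page formatting. As a loose guess at the shape of the change, separating the table path from the property path inside a record might look roughly like this (all names hypothetical):

```python
# Hypothetical sketch: keep the table path (which table rows go into)
# separate from the in-row property path (which column a value lands
# in), so nested objects extend only the column name while nested
# arrays extend the table name.

def denest_subrecord(table_path, prop_path, parent_row, record, records_map):
    for prop, value in record.items():
        if isinstance(value, dict):
            # Same table, deeper column: extend only prop_path.
            denest_subrecord(table_path, prop_path + (prop,),
                             parent_row, value, records_map)
        elif isinstance(value, list):
            # New sub-table: extend the table path with the full
            # property path down to this array. (Elements appended
            # as-is here for brevity.)
            records_map.setdefault(
                table_path + prop_path + (prop,), []).extend(value)
        elif value is None:
            continue
        else:
            parent_row[prop_path + (prop,)] = value
```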
issue: datamill-co#109 The table_schemas and the table_records did not match for more complex and nested schemas.
@CStejmar this is 👍 from me. I also don't think that this will break/mess up any current usages of Target Postgres. Anyone who is using this now, who doesn't already have a problem, is someone who has nullable fields in their schema. Due to this, when we replicate, create new columns, and populate those, we'll have null values to populate the broken columns.
@awm33 if this gets a 👍 from you, I can merge and get Target Redshift etc. deployed.
Fix for issue #109
The table_schemas and the table_records did not match for more complex and nested schemas. This is now fixed and they match. All nested data that I had trouble with before now enters the database as expected. One question remains: is this the way we want to organize/structure the tables and their naming? I guess we could change the records denesting instead to match the schema we had before; that should also work, and it would give us a slightly different structure/naming of tables and objects.
Below you find an example of how the tables for the schema and the record mismatched before, and how they now match, for a simple example using data from the `CATS_SCHEMA` in target-postgres. Look for the `vaccination_type` especially.

Before:

After:
In the database for this example it now looks like this:

As you can see, we now have data in the field `vaccination_type__shot`, which didn't work before.
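To spell out what "matching" buys here: a nested object denests into a column on the same table, with path segments joined by double underscores (the rendering convention is my assumption, since the screenshots are not reproduced above).

```python
# A record like this...
record = {'vaccination_type': {'shot': 'rabies'}}

# ...denests into a column on the same table, with the nested path
# joined by double underscores:
#   vaccination_type__shot = 'rabies'
#
# Before the fix, the schema side and the record side produced
# different paths here, so the column existed but never received data.
```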