Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDK: make TypeTransformer more robust to incorrect incoming records #16544

Merged
merged 8 commits into from
Sep 13, 2022

Conversation

grubberr
Copy link
Contributor

@grubberr grubberr commented Sep 9, 2022

Signed-off-by: Sergey Chvalyuk grubberr@gmail.com

What

Try to fix
https://github.com/airbytehq/alpha-beta-issues/issues/222
#15405

There are cases which can kill transformer by exceptions, for example:

schema = {
    "type": "object",
    "properties": {
        "value": {
            "type": "array",
            "items": {"type": "string"}
        }
    }
}

record = {"value": {"key": "value"}}
TypeTransformer(TransformConfig.DefaultSchemaNormalization).transform(record, schema)

RuntimeError: dictionary changed size during iteration

Schema assume that "value" has to be array of strings, but we get sub-object.
In such cases transformer has to generate warning message, leave record as is and not be killed.

How

When we process "array" just make sure that instance variable of type list
if it's not list just skip it, we can do nothing in such cases.

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
@grubberr grubberr requested a review from a team as a code owner September 9, 2022 17:55
@github-actions github-actions bot added the CDK Connector Development Kit label Sep 9, 2022
@brianjlai
Copy link
Contributor

Changes seem pretty straightforward, but can you add some additional context to the PR description as to which connector you were upgrading and what exactly triggered the need for this change to the CDK?

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
@grubberr grubberr changed the title cdk transform fix CDK: make TypeTransformer more sustainable to incorrect incoming records Sep 9, 2022
@grubberr grubberr changed the title CDK: make TypeTransformer more sustainable to incorrect incoming records CDK: make TypeTransformer more robust to incorrect incoming records Sep 9, 2022
@grubberr grubberr linked an issue Sep 9, 2022 that may be closed by this pull request
@grubberr grubberr self-assigned this Sep 9, 2022
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
@@ -145,14 +145,14 @@ def resolve(subschema):
return subschema

# Transform object and array values before running json schema type checking for each element.
if schema_key == "properties":
if schema_key == "properties" and isinstance(instance, dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the comment you added in the PR description as a source code comment to clarify for the next developer reading this why we verify the isinstance?

@grubberr
Copy link
Contributor Author

grubberr commented Sep 13, 2022

/publish-cdk dry-run=true

🕑 https://github.com/airbytehq/airbyte/actions/runs/3042879452
https://github.com/airbytehq/airbyte/actions/runs/3042879452

@grubberr
Copy link
Contributor Author

grubberr commented Sep 13, 2022

/publish-cdk dry-run=false

🕑 https://github.com/airbytehq/airbyte/actions/runs/3043056678
https://github.com/airbytehq/airbyte/actions/runs/3043056678

@grubberr grubberr merged commit 5a3b6d8 into master Sep 13, 2022
@grubberr grubberr deleted the grubberr/15405-airbyte_cdk_transform_fix_2 branch September 13, 2022 07:30
robbinhan pushed a commit to robbinhan/airbyte that referenced this pull request Sep 29, 2022
jhammarstedt pushed a commit to jhammarstedt/airbyte that referenced this pull request Oct 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Source Mixpanel: Fails during import
3 participants