[Low-code CDK] Add ability to remove fields #14402

Merged: sherifnada merged 17 commits into master from sherif/yaml-cdk-field-transforms on Jul 12, 2022

sherifnada commented Jul 5, 2022

What

Closes #12988

The main differences from the issue as spec'd by Alex:

  1. Transformations are defined on the Stream level rather than on the retriever or extractor. The reason is that transformations are probably going to be stream-specific rather than retrieval-strategy specific. Open to feedback on this though!
  2. Transformations are defined as an ordered list then applied to the records in the definition order. The main downside is that this is not "declarative". The main upside is that we push the concern of dependencies between transformations up to the developer. Otherwise there would be some weird undefined behavior (what if I remove a field then add a field with the same name, or add a field that depends on a field that is removed?).
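
To make point 2 concrete, here is a minimal sketch of order-dependent application. The helpers `remove_field`, `add_field`, and `apply_transformations` are hypothetical names made up for this illustration, not the CDK's API; the only point is that swapping two entries in the list changes the output.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Transformation = Callable[[Record], Record]


def remove_field(name: str) -> Transformation:
    """Hypothetical helper: build a transformation that drops a top-level field."""
    def transform(record: Record) -> Record:
        return {k: v for k, v in record.items() if k != name}
    return transform


def add_field(name: str, value: Any) -> Transformation:
    """Hypothetical helper: build a transformation that sets a top-level field."""
    def transform(record: Record) -> Record:
        return {**record, name: value}
    return transform


def apply_transformations(record: Record, transformations: List[Transformation]) -> Record:
    """Apply each transformation in list order; the output of one feeds the next."""
    for transformation in transformations:
        record = transformation(record)
    return record


record = {"id": 1, "status": "raw"}

# Remove, then add a field with the same name: the added value survives.
remove_then_add = apply_transformations(
    record, [remove_field("status"), add_field("status", "clean")]
)

# Add, then remove: the field is gone.
add_then_remove = apply_transformations(
    record, [add_field("status", "clean"), remove_field("status")]
)

print(remove_then_add)  # {'id': 1, 'status': 'clean'}
print(add_then_remove)  # {'id': 1}
```

With an ordered list, both outcomes are well defined, and the developer picks one by choosing the order.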

Recommended reading order

  1. DeclarativeStream
  2. RecordTransformation
  3. RemoveFieldTransformation
  4. tests

github-actions bot added the "CDK Connector Development Kit" label Jul 5, 2022
        """
        for pointer in self._field_pointers:
            try:
                dpath.util.delete(record, pointer)

Contributor

very cool! can we make sure there's a test for removing nested fields?

Contributor Author

done
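
For readers who want a picture of what a nested-field test could look like, here is a rough sketch. It does not use dpath: `remove_fields` below is a hypothetical, stdlib-only stand-in that handles dictionary keys only, and the record shapes are invented for the example. The actual tests in this PR may look different.

```python
from typing import Any, Dict, List

FieldPointer = List[str]


def remove_fields(record: Dict[str, Any], field_pointers: List[FieldPointer]) -> Dict[str, Any]:
    """Delete each pointed-to field in place; skip pointers whose path is missing."""
    for pointer in field_pointers:
        if not pointer:
            continue
        parent: Any = record
        try:
            # Walk down to the parent of the field being removed.
            for key in pointer[:-1]:
                parent = parent[key]
            del parent[pointer[-1]]
        except (KeyError, IndexError, TypeError):
            # The field or one of its parents does not exist: nothing to remove.
            continue
    return record


def test_remove_nested_field() -> None:
    record = {"id": 1, "user": {"name": "a", "address": {"city": "x", "zip": "y"}}}
    result = remove_fields(record, [["user", "address", "zip"]])
    assert result == {"id": 1, "user": {"name": "a", "address": {"city": "x"}}}


def test_missing_parent_is_ignored() -> None:
    record = {"id": 1}
    result = remove_fields(record, [["user", "address", "zip"]])
    assert result == {"id": 1}


test_remove_nested_field()
test_missing_parent_is_ignored()
```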

sherifnada marked this pull request as ready for review July 11, 2022 05:31
sherifnada requested a review from a team as a code owner July 11, 2022 05:31
sherifnada requested a review from girarda July 11, 2022 05:32
#


# The order of imports matters, so we add the split directive below to tell isort to sort imports while keeping RecordTransformation

Contributor

why does the order matter?

Contributor Author

because every other class in this package implements RecordTransformation, so if you try to import RemoveFields first, it'll try importing RecordTransformation, which goes through this `__init__.py`. Added a comment to clarify.
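
The import-order problem can be reproduced in isolation. The sketch below writes two throwaway packages to a temporary directory; the package and module names (`base_first_pkg`, `subclass_first_pkg`, `transformation.py`, `remove_fields.py`) are made up for the demo and are not the CDK's real layout. It assumes, as described above, that the subclass module imports the base class through the package's `__init__.py`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

BASE_MODULE = "class RecordTransformation:\n    pass\n"
SUBCLASS_MODULE = (
    "from {pkg} import RecordTransformation\n\n\n"
    "class RemoveFields(RecordTransformation):\n    pass\n"
)


def write_package(root: Path, pkg: str, init_source: str) -> None:
    """Create a throwaway package with a base-class module and a subclass module."""
    pkg_dir = root / pkg
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(init_source)
    (pkg_dir / "transformation.py").write_text(BASE_MODULE)
    (pkg_dir / "remove_fields.py").write_text(SUBCLASS_MODULE.format(pkg=pkg))


def import_outcome(pkg: str) -> str:
    """Return 'ok' if the package imports cleanly, 'ImportError' otherwise."""
    try:
        importlib.import_module(pkg)
        return "ok"
    except ImportError:
        return "ImportError"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Base class first; the split comment keeps isort from reordering the two imports.
    write_package(
        root,
        "base_first_pkg",
        "from base_first_pkg.transformation import RecordTransformation\n"
        "# isort: split\n"
        "from base_first_pkg.remove_fields import RemoveFields\n",
    )
    # Subclass first: the package is only partially initialized when
    # remove_fields.py asks it for RecordTransformation.
    write_package(
        root,
        "subclass_first_pkg",
        "from subclass_first_pkg.remove_fields import RemoveFields\n"
        "from subclass_first_pkg.transformation import RecordTransformation\n",
    )
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        base_first = import_outcome("base_first_pkg")
        subclass_first = import_outcome("subclass_first_pkg")
    finally:
        sys.path.remove(tmp)

print(base_first)      # ok
print(subclass_first)  # ImportError
```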

During transformation, if a field or any of its parents does not exist in the record, no error is thrown.

If an input field pointer references an item in a list (e.g. ["k", 0] in the object {"k": ["a", "b", "c"]}), then
the object at that index is set to None rather than being removed from the list entirely. TODO change this behavior.

Contributor

+1 for removing the None value from the list

Contributor Author

I agree this is the better behavior, but given that dpath implements it this way, I hesitated to add a custom implementation. After thinking about it, this seemed like an OK thing to leave as a TODO, for two reasons. First, removing an element from an array seems like an uncommon use case; the 98% case is removing the whole field. Second, for reasons outlined in an issue I opened in dpath, I think implementing this correctly is tricky, because it requires shifting array indices in a stateful way.

WDYT?
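
As a rough illustration of why index shifting is awkward, the stdlib-only sketch below compares three strategies on a plain list. None of it is dpath code, all three function names are hypothetical, and it shows only one way the problem can surface, not necessarily the case discussed in the dpath issue.

```python
from typing import Any, List


def delete_with_placeholder(items: List[Any], indices: List[int]) -> List[Any]:
    """Mimic the documented behavior: deleted slots become None, so indices stay stable."""
    result = list(items)
    for index in indices:
        if 0 <= index < len(result):
            result[index] = None
    return result


def delete_naively(items: List[Any], indices: List[int]) -> List[Any]:
    """Remove elements one at a time; every deletion shifts the later indices left."""
    result = list(items)
    for index in indices:
        if 0 <= index < len(result):
            del result[index]
    return result


def delete_descending(items: List[Any], indices: List[int]) -> List[Any]:
    """Remove from the highest index down so earlier deletions cannot shift later targets."""
    result = list(items)
    for index in sorted(set(indices), reverse=True):
        if 0 <= index < len(result):
            del result[index]
    return result


items = ["a", "b", "c"]
targets = [0, 1]  # intent: drop "a" and "b"

print(delete_with_placeholder(items, targets))  # [None, None, 'c']
print(delete_naively(items, targets))           # ['b'], wrong: "c" was removed instead of "b"
print(delete_descending(items, targets))        # ['c']
```

Processing pointers in descending index order works for a single flat list, but doing the same across arbitrary nested pointers means tracking which deletions have already happened, which is the stateful part.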

Contributor

agreed that this is not a common use case. sounds good to me!

@@ -0,0 +1,55 @@
#
# Copyright (c) 2022 Airbyte, Inc., all rights reserved.
#

Contributor

i think CI will complain about a missing new line between the header and the imports

sherifnada merged commit 743e6c2 into master Jul 12, 2022
sherifnada deleted the sherif/yaml-cdk-field-transforms branch July 12, 2022 02:04
Successfully merging this pull request may close these issues.

Low-code connectors: Support removing fields from records