Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Databuilder support for nested columns #1695

Merged

Conversation

kristenarmes
Copy link
Contributor

@kristenarmes kristenarmes commented Feb 2, 2022

Signed-off-by: Kristen Armes karmes@lyft.com

Summary of Changes

This PR adds databuilder support for nested columns by introducing a new TypeMetadata base class, along with Map/Array/Struct/Scalar subclasses (with opportunity to add more as needed). Columns will have an empty TypeMetadata by default and Amundsen will not require it for the time being. To eliminate breaking changes, setting TypeMetadata is done via the ComplexTypeTransformer. With time this logic can easily be integrated into extractors themselves.

The TypeMetadata classes are intended to be general and reusable. Syntax-specific parsers will need to be added per-database for converting type strings into structured objects. In this PR we have implemented a Hive parser that uses pyparsing.

After this we'll work on the APIs and then the frontend.

Tests

Added test cases in test_table_metadata.py, test_type_metadata.py, test_complex_type_transformer.py, test_hive_complex_type_parser.py

Documentation

Added details about the ComplexTypeTransformer in databuilder/README.md

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does

Signed-off-by: Kristen Armes <karmes@lyft.com>
@boring-cyborg boring-cyborg bot added area:databuilder From databuilder folder category:models labels Feb 2, 2022
Signed-off-by: Kristen Armes <karmes@lyft.com>
Copy link

@dkunitsk dkunitsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No major problems, looks great. A few tweaks I'd suggest, let's discuss IRL.

Signed-off-by: Kristen Armes <karmes@lyft.com>
…g extra StructItem class and using dict for structs instead, and implemented ScalarTypeMetadata to use as the terminal type rather than string

Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Kristen Armes <karmes@lyft.com>
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
Signed-off-by: Kristen Armes <karmes@lyft.com>
…type transformer

Signed-off-by: Kristen Armes <karmes@lyft.com>
@@ -0,0 +1,150 @@
# Copyright Contributors to the Amundsen project.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separated DescriptionMetadata from TableMetadata due to circular import issues with TypeMetadata

Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Kristen Armes <karmes@lyft.com>
…etadata can now get own key/parent key, fixing struct node handling

Signed-off-by: Kristen Armes <karmes@lyft.com>
@kristenarmes kristenarmes changed the title feat: Type metadata classes to represent complex column types feat: Ability to parse complex column types, store in ColumnMetadata, and create graph nodes for them Feb 18, 2022
Signed-off-by: Kristen Armes <karmes@lyft.com>
@boring-cyborg boring-cyborg bot added area:all Related to all the project area:docs labels Feb 18, 2022
databuilder/databuilder/models/table_metadata.py Outdated Show resolved Hide resolved
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
…ame instead of data_type (test and mypy fixes remaining todo)

Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Kristen Armes <karmes@lyft.com>
@dkunitsk dkunitsk marked this pull request as ready for review February 23, 2022 19:23
@dkunitsk dkunitsk requested a review from a team as a code owner February 23, 2022 19:23
@dkunitsk dkunitsk changed the title feat: Ability to parse complex column types, store in ColumnMetadata, and create graph nodes for them feat: Backend support for nested columns Feb 23, 2022
Signed-off-by: Kristen Armes <karmes@lyft.com>
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
Signed-off-by: Kristen Armes <karmes@lyft.com>
databuilder/README.md Outdated Show resolved Hide resolved
databuilder/databuilder/models/type_metadata.py Outdated Show resolved Hide resolved
Signed-off-by: Kristen Armes <karmes@lyft.com>
Copy link

@dkunitsk dkunitsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very very nice. Love the test coverage also.

@dkunitsk dkunitsk merged commit 9854948 into amundsen-io:main Feb 24, 2022
@kristenarmes kristenarmes deleted the support-complex-column-types-on-backend branch February 24, 2022 19:11
@dkunitsk dkunitsk changed the title feat: Backend support for nested columns feat: Databuilder support for nested columns Feb 24, 2022
ozandogrultan pushed a commit to deliveryhero/amundsen that referenced this pull request Apr 28, 2022
* Type metadata classes to represent complex column types

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Adding a few more type metadata tests

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Adding __eq__ implementation for the different types for easier testing

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Addressing PR comments, bringing description into base class, removing extra StructItem class and using dict for structs instead, and implemented ScalarTypeMetadata to use as the terminal type rather than string

Signed-off-by: Kristen Armes <karmes@lyft.com>

* lint

Signed-off-by: Kristen Armes <karmes@lyft.com>

* mypy

Signed-off-by: Kristen Armes <karmes@lyft.com>

* lint

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Changes to type metadata, adding hive parser, adding generic complex type transformer

Signed-off-by: Kristen Armes <karmes@lyft.com>

* lint

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Updating description metadata imports

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Adding parent and name attributes to child type metadata, base type metadata can now get own key/parent key, fixing struct node handling

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Adding documentation for the transformer

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Changing the type metadata and parser to be created with parent and name instead of data_type (test and mypy fixes remaining todo)

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Fixing tests to work with changes

Signed-off-by: Kristen Armes <karmes@lyft.com>

* mypy fixes

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Addressing PR comments

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Addressing PR comments and fixing up is terminal check handling

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Ceaning up map attributes and removing info from README for now

Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Ozan Dogrultan <ozan.dogrultan@deliveryhero.com>
zacr pushed a commit to SaltIO/amundsen that referenced this pull request May 3, 2022
* Type metadata classes to represent complex column types

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Adding a few more type metadata tests

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Adding __eq__ implementation for the different types for easier testing

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Addressing PR comments, bringing description into base class, removing extra StructItem class and using dict for structs instead, and implemented ScalarTypeMetadata to use as the terminal type rather than string

Signed-off-by: Kristen Armes <karmes@lyft.com>

* lint

Signed-off-by: Kristen Armes <karmes@lyft.com>

* mypy

Signed-off-by: Kristen Armes <karmes@lyft.com>

* lint

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Changes to type metadata, adding hive parser, adding generic complex type transformer

Signed-off-by: Kristen Armes <karmes@lyft.com>

* lint

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Updating description metadata imports

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Adding parent and name attributes to child type metadata, base type metadata can now get own key/parent key, fixing struct node handling

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Adding documentation for the transformer

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Changing the type metadata and parser to be created with parent and name instead of data_type (test and mypy fixes remaining todo)

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Fixing tests to work with changes

Signed-off-by: Kristen Armes <karmes@lyft.com>

* mypy fixes

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Addressing PR comments

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Addressing PR comments and fixing up is terminal check handling

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Ceaning up map attributes and removing info from README for now

Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Zachary Ruiz <zacharyruiz@Zacharys-MacBook-Pro-2.local>
zacr pushed a commit to SaltIO/amundsen that referenced this pull request May 13, 2022
* Type metadata classes to represent complex column types

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Adding a few more type metadata tests

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Adding __eq__ implementation for the different types for easier testing

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Addressing PR comments, bringing description into base class, removing extra StructItem class and using dict for structs instead, and implemented ScalarTypeMetadata to use as the terminal type rather than string

Signed-off-by: Kristen Armes <karmes@lyft.com>

* lint

Signed-off-by: Kristen Armes <karmes@lyft.com>

* mypy

Signed-off-by: Kristen Armes <karmes@lyft.com>

* lint

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Changes to type metadata, adding hive parser, adding generic complex type transformer

Signed-off-by: Kristen Armes <karmes@lyft.com>

* lint

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Updating description metadata imports

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Adding parent and name attributes to child type metadata, base type metadata can now get own key/parent key, fixing struct node handling

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Adding documentation for the transformer

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Changing the type metadata and parser to be created with parent and name instead of data_type (test and mypy fixes remaining todo)

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Fixing tests to work with changes

Signed-off-by: Kristen Armes <karmes@lyft.com>

* mypy fixes

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Addressing PR comments

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Addressing PR comments and fixing up is terminal check handling

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Ceaning up map attributes and removing info from README for now

Signed-off-by: Kristen Armes <karmes@lyft.com>
hansadriaans pushed a commit to DataChefHQ/amundsen that referenced this pull request Jun 30, 2022
* Type metadata classes to represent complex column types

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Adding a few more type metadata tests

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Adding __eq__ implementation for the different types for easier testing

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Addressing PR comments, bringing description into base class, removing extra StructItem class and using dict for structs instead, and implemented ScalarTypeMetadata to use as the terminal type rather than string

Signed-off-by: Kristen Armes <karmes@lyft.com>

* lint

Signed-off-by: Kristen Armes <karmes@lyft.com>

* mypy

Signed-off-by: Kristen Armes <karmes@lyft.com>

* lint

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Changes to type metadata, adding hive parser, adding generic complex type transformer

Signed-off-by: Kristen Armes <karmes@lyft.com>

* lint

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Updating description metadata imports

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Adding parent and name attributes to child type metadata, base type metadata can now get own key/parent key, fixing struct node handling

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Adding documentation for the transformer

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Changing the type metadata and parser to be created with parent and name instead of data_type (test and mypy fixes remaining todo)

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Fixing tests to work with changes

Signed-off-by: Kristen Armes <karmes@lyft.com>

* mypy fixes

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Addressing PR comments

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Addressing PR comments and fixing up is terminal check handling

Signed-off-by: Kristen Armes <karmes@lyft.com>

* Ceaning up map attributes and removing info from README for now

Signed-off-by: Kristen Armes <karmes@lyft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:all Related to all the project area:databuilder From databuilder folder
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants