-
Notifications
You must be signed in to change notification settings - Fork 966
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Databuilder support for nested columns #1695
feat: Databuilder support for nested columns #1695
Conversation
Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Kristen Armes <karmes@lyft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No major problems, looks great. A few tweaks I'd suggest, let's discuss IRL.
Signed-off-by: Kristen Armes <karmes@lyft.com>
…g extra StructItem class and using dict for structs instead, and implemented ScalarTypeMetadata to use as the terminal type rather than string Signed-off-by: Kristen Armes <karmes@lyft.com>
…type transformer Signed-off-by: Kristen Armes <karmes@lyft.com>
@@ -0,0 +1,150 @@ | |||
# Copyright Contributors to the Amundsen project. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separated DescriptionMetadata
from TableMetadata
due to circular import issues with TypeMetadata
databuilder/databuilder/transformer/complex_type_transformer.py
Outdated
Show resolved
Hide resolved
Signed-off-by: Kristen Armes <karmes@lyft.com>
…etadata can now get own key/parent key, fixing struct node handling Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Kristen Armes <karmes@lyft.com>
…ame instead of data_type (test and mypy fixes remaining todo) Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Kristen Armes <karmes@lyft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very very nice. Love the test coverage also.
* Type metadata classes to represent complex column types Signed-off-by: Kristen Armes <karmes@lyft.com> * Adding a few more type metadata tests Signed-off-by: Kristen Armes <karmes@lyft.com> * Adding __eq__ implementation for the different types for easier testing Signed-off-by: Kristen Armes <karmes@lyft.com> * Addressing PR comments, bringing description into base class, removing extra StructItem class and using dict for structs instead, and implemented ScalarTypeMetadata to use as the terminal type rather than string Signed-off-by: Kristen Armes <karmes@lyft.com> * lint Signed-off-by: Kristen Armes <karmes@lyft.com> * mypy Signed-off-by: Kristen Armes <karmes@lyft.com> * lint Signed-off-by: Kristen Armes <karmes@lyft.com> * Changes to type metadata, adding hive parser, adding generic complex type transformer Signed-off-by: Kristen Armes <karmes@lyft.com> * lint Signed-off-by: Kristen Armes <karmes@lyft.com> * Updating description metadata imports Signed-off-by: Kristen Armes <karmes@lyft.com> * Adding parent and name attributes to child type metadata, base type metadata can now get own key/parent key, fixing struct node handling Signed-off-by: Kristen Armes <karmes@lyft.com> * Adding documentation for the transformer Signed-off-by: Kristen Armes <karmes@lyft.com> * Changing the type metadata and parser to be created with parent and name instead of data_type (test and mypy fixes remaining todo) Signed-off-by: Kristen Armes <karmes@lyft.com> * Fixing tests to work with changes Signed-off-by: Kristen Armes <karmes@lyft.com> * mypy fixes Signed-off-by: Kristen Armes <karmes@lyft.com> * Addressing PR comments Signed-off-by: Kristen Armes <karmes@lyft.com> * Addressing PR comments and fixing up is terminal check handling Signed-off-by: Kristen Armes <karmes@lyft.com> * Ceaning up map attributes and removing info from README for now Signed-off-by: Kristen Armes <karmes@lyft.com> Signed-off-by: Ozan Dogrultan <ozan.dogrultan@deliveryhero.com>
* Type metadata classes to represent complex column types Signed-off-by: Kristen Armes <karmes@lyft.com> * Adding a few more type metadata tests Signed-off-by: Kristen Armes <karmes@lyft.com> * Adding __eq__ implementation for the different types for easier testing Signed-off-by: Kristen Armes <karmes@lyft.com> * Addressing PR comments, bringing description into base class, removing extra StructItem class and using dict for structs instead, and implemented ScalarTypeMetadata to use as the terminal type rather than string Signed-off-by: Kristen Armes <karmes@lyft.com> * lint Signed-off-by: Kristen Armes <karmes@lyft.com> * mypy Signed-off-by: Kristen Armes <karmes@lyft.com> * lint Signed-off-by: Kristen Armes <karmes@lyft.com> * Changes to type metadata, adding hive parser, adding generic complex type transformer Signed-off-by: Kristen Armes <karmes@lyft.com> * lint Signed-off-by: Kristen Armes <karmes@lyft.com> * Updating description metadata imports Signed-off-by: Kristen Armes <karmes@lyft.com> * Adding parent and name attributes to child type metadata, base type metadata can now get own key/parent key, fixing struct node handling Signed-off-by: Kristen Armes <karmes@lyft.com> * Adding documentation for the transformer Signed-off-by: Kristen Armes <karmes@lyft.com> * Changing the type metadata and parser to be created with parent and name instead of data_type (test and mypy fixes remaining todo) Signed-off-by: Kristen Armes <karmes@lyft.com> * Fixing tests to work with changes Signed-off-by: Kristen Armes <karmes@lyft.com> * mypy fixes Signed-off-by: Kristen Armes <karmes@lyft.com> * Addressing PR comments Signed-off-by: Kristen Armes <karmes@lyft.com> * Addressing PR comments and fixing up is terminal check handling Signed-off-by: Kristen Armes <karmes@lyft.com> * Ceaning up map attributes and removing info from README for now Signed-off-by: Kristen Armes <karmes@lyft.com> Signed-off-by: Zachary Ruiz <zacharyruiz@Zacharys-MacBook-Pro-2.local>
* Type metadata classes to represent complex column types Signed-off-by: Kristen Armes <karmes@lyft.com> * Adding a few more type metadata tests Signed-off-by: Kristen Armes <karmes@lyft.com> * Adding __eq__ implementation for the different types for easier testing Signed-off-by: Kristen Armes <karmes@lyft.com> * Addressing PR comments, bringing description into base class, removing extra StructItem class and using dict for structs instead, and implemented ScalarTypeMetadata to use as the terminal type rather than string Signed-off-by: Kristen Armes <karmes@lyft.com> * lint Signed-off-by: Kristen Armes <karmes@lyft.com> * mypy Signed-off-by: Kristen Armes <karmes@lyft.com> * lint Signed-off-by: Kristen Armes <karmes@lyft.com> * Changes to type metadata, adding hive parser, adding generic complex type transformer Signed-off-by: Kristen Armes <karmes@lyft.com> * lint Signed-off-by: Kristen Armes <karmes@lyft.com> * Updating description metadata imports Signed-off-by: Kristen Armes <karmes@lyft.com> * Adding parent and name attributes to child type metadata, base type metadata can now get own key/parent key, fixing struct node handling Signed-off-by: Kristen Armes <karmes@lyft.com> * Adding documentation for the transformer Signed-off-by: Kristen Armes <karmes@lyft.com> * Changing the type metadata and parser to be created with parent and name instead of data_type (test and mypy fixes remaining todo) Signed-off-by: Kristen Armes <karmes@lyft.com> * Fixing tests to work with changes Signed-off-by: Kristen Armes <karmes@lyft.com> * mypy fixes Signed-off-by: Kristen Armes <karmes@lyft.com> * Addressing PR comments Signed-off-by: Kristen Armes <karmes@lyft.com> * Addressing PR comments and fixing up is terminal check handling Signed-off-by: Kristen Armes <karmes@lyft.com> * Ceaning up map attributes and removing info from README for now Signed-off-by: Kristen Armes <karmes@lyft.com>
* Type metadata classes to represent complex column types Signed-off-by: Kristen Armes <karmes@lyft.com> * Adding a few more type metadata tests Signed-off-by: Kristen Armes <karmes@lyft.com> * Adding __eq__ implementation for the different types for easier testing Signed-off-by: Kristen Armes <karmes@lyft.com> * Addressing PR comments, bringing description into base class, removing extra StructItem class and using dict for structs instead, and implemented ScalarTypeMetadata to use as the terminal type rather than string Signed-off-by: Kristen Armes <karmes@lyft.com> * lint Signed-off-by: Kristen Armes <karmes@lyft.com> * mypy Signed-off-by: Kristen Armes <karmes@lyft.com> * lint Signed-off-by: Kristen Armes <karmes@lyft.com> * Changes to type metadata, adding hive parser, adding generic complex type transformer Signed-off-by: Kristen Armes <karmes@lyft.com> * lint Signed-off-by: Kristen Armes <karmes@lyft.com> * Updating description metadata imports Signed-off-by: Kristen Armes <karmes@lyft.com> * Adding parent and name attributes to child type metadata, base type metadata can now get own key/parent key, fixing struct node handling Signed-off-by: Kristen Armes <karmes@lyft.com> * Adding documentation for the transformer Signed-off-by: Kristen Armes <karmes@lyft.com> * Changing the type metadata and parser to be created with parent and name instead of data_type (test and mypy fixes remaining todo) Signed-off-by: Kristen Armes <karmes@lyft.com> * Fixing tests to work with changes Signed-off-by: Kristen Armes <karmes@lyft.com> * mypy fixes Signed-off-by: Kristen Armes <karmes@lyft.com> * Addressing PR comments Signed-off-by: Kristen Armes <karmes@lyft.com> * Addressing PR comments and fixing up is terminal check handling Signed-off-by: Kristen Armes <karmes@lyft.com> * Ceaning up map attributes and removing info from README for now Signed-off-by: Kristen Armes <karmes@lyft.com>
Signed-off-by: Kristen Armes karmes@lyft.com
Summary of Changes
This PR adds databuilder support for nested columns by introducing a new TypeMetadata base class, along with Map/Array/Struct/Scalar subclasses (with opportunity to add more as needed). Columns will have an empty
TypeMetadata
by default and Amundsen will not require it for the time being. To eliminate breaking changes, setting TypeMetadata is done via theComplexTypeTransformer
. With time this logic can easily be integrated into extractors themselves.The TypeMetadata classes are intended to be general and reusable. Syntax-specific parsers will need to be added per-database for converting type strings into structured objects. In this PR we have implemented a Hive parser that uses
pyparsing
.After this we'll work on the APIs and then the frontend.
Tests
Added test cases in
test_table_metadata.py
,test_type_metadata.py
,test_complex_type_transformer.py
,test_hive_complex_type_parser.py
Documentation
Added details about the
ComplexTypeTransformer
indatabuilder/README.md
CheckList
Make sure you have checked all steps below to ensure a timely review.