feat: Databuilder support for nested columns #1695
Merged: dkunitsk merged 18 commits into amundsen-io:main from kristenarmes:support-complex-column-types-on-backend on Feb 24, 2022.
Commits (18, all by kristenarmes):
- b3c77b0: Type metadata classes to represent complex column types
- 0c3072d: Adding a few more type metadata tests
- ece1376: Adding __eq__ implementation for the different types for easier testing
- fa3a6de: Addressing PR comments, bringing description into base class, removin…
- c8c6cfd: lint
- 31ced31: mypy
- 8c2e7ad: lint
- 215d037: Changes to type metadata, adding hive parser, adding generic complex …
- 1b73a15: lint
- b9389c6: Updating description metadata imports
- 5aa6cf5: Adding parent and name attributes to child type metadata, base type m…
- be6b937: Adding documentation for the transformer
- 814ced2: Changing the type metadata and parser to be created with parent and n…
- 21139b9: Fixing tests to work with changes
- ed2f949: mypy fixes
- 9ba71f7: Addressing PR comments
- 29f4e6f: Addressing PR comments and fixing up is terminal check handling
- 62cc060: Cleaning up map attributes and removing info from README for now
@@ -0,0 +1,158 @@ (new file, 158 lines)
# Copyright Contributors to the Amundsen project.
# SPDX-License-Identifier: Apache-2.0

from typing import (
    Any, Iterator, Optional, Union,
)

from databuilder.models.atlas_entity import AtlasEntity
from databuilder.models.atlas_relationship import AtlasRelationship
from databuilder.models.atlas_serializable import AtlasSerializable
from databuilder.models.graph_node import GraphNode
from databuilder.models.graph_relationship import GraphRelationship
from databuilder.models.graph_serializable import GraphSerializable

DESCRIPTION_NODE_LABEL_VAL = 'Description'
DESCRIPTION_NODE_LABEL = DESCRIPTION_NODE_LABEL_VAL


class DescriptionMetadata(GraphSerializable, AtlasSerializable):
    DESCRIPTION_NODE_LABEL = DESCRIPTION_NODE_LABEL_VAL
    PROGRAMMATIC_DESCRIPTION_NODE_LABEL = 'Programmatic_Description'
    DESCRIPTION_KEY_FORMAT = '{description}'
    DESCRIPTION_TEXT = 'description'
    DESCRIPTION_SOURCE = 'description_source'

    DESCRIPTION_RELATION_TYPE = 'DESCRIPTION'
    INVERSE_DESCRIPTION_RELATION_TYPE = 'DESCRIPTION_OF'

    # The default editable source.
    DEFAULT_SOURCE = "description"

    def __init__(self,
                 text: Optional[str],
                 source: str = DEFAULT_SOURCE,
                 description_key: Optional[str] = None,
                 start_label: Optional[str] = None,  # Table, Column, Schema, Subtype
                 start_key: Optional[str] = None,
                 ):
        """
        :param source: The unique source of what is populating this description.
        :param text: the description text. Markdown supported.
        :param description_key: key of the description node; defaults to
            '<start_key>/_description' ('<start_key>/_<source>_description' for other sources).
        :param start_label: label of the node being described (e.g. Table, Column, Schema, Subtype).
        :param start_key: key of the node being described.
        """
        self.source = source
        self.text = text
        # There are so many dependencies on Description node, that it is probably easier to just separate the rest out.
        if self.source == self.DEFAULT_SOURCE:
            self.label = self.DESCRIPTION_NODE_LABEL
        else:
            self.label = self.PROGRAMMATIC_DESCRIPTION_NODE_LABEL

        self.start_label = start_label
        self.start_key = start_key
        self.description_key = description_key or self.get_description_default_key(start_key)

        self._node_iter = self._create_node_iterator()
        self._relation_iter = self._create_relation_iterator()

    def __eq__(self, other: Any) -> bool:
        if isinstance(other, DescriptionMetadata):
            return (self.text == other.text and
                    self.source == other.source and
                    self.description_key == other.description_key and
                    self.start_label == other.start_label and
                    self.start_key == other.start_key)
        return False

    @staticmethod
    def create_description_metadata(text: Union[None, str],
                                    source: Optional[str] = DEFAULT_SOURCE,
                                    description_key: Optional[str] = None,
                                    start_label: Optional[str] = None,  # Table, Column, Schema
                                    start_key: Optional[str] = None,
                                    ) -> Optional['DescriptionMetadata']:
        # We do not want to create a node if there is no description text!
        if text is None:
            return None
        description_node = DescriptionMetadata(text=text,
                                               source=source or DescriptionMetadata.DEFAULT_SOURCE,
                                               description_key=description_key,
                                               start_label=start_label,
                                               start_key=start_key)
        return description_node

    def get_description_id(self) -> str:
        if self.source == self.DEFAULT_SOURCE:
            return "_description"
        else:
            return "_" + self.source + "_description"

    def get_description_default_key(self, start_key: Optional[str]) -> Optional[str]:
        return f'{start_key}/{self.get_description_id()}' if start_key else None

    def get_node(self, node_key: str) -> GraphNode:
        node = GraphNode(
            key=node_key,
            label=self.label,
            attributes={
                DescriptionMetadata.DESCRIPTION_SOURCE: self.source,
                DescriptionMetadata.DESCRIPTION_TEXT: self.text
            }
        )
        return node

    def get_relation(self,
                     start_node: str,
                     start_key: str,
                     end_key: str,
                     ) -> GraphRelationship:
        relationship = GraphRelationship(
            start_label=start_node,
            start_key=start_key,
            end_label=self.label,
            end_key=end_key,
            type=DescriptionMetadata.DESCRIPTION_RELATION_TYPE,
            reverse_type=DescriptionMetadata.INVERSE_DESCRIPTION_RELATION_TYPE,
            attributes={}
        )
        return relationship

    def create_next_node(self) -> Optional[GraphNode]:
        # Return the next description node, or None once the iterator is exhausted.
        try:
            return next(self._node_iter)
        except StopIteration:
            return None

    def create_next_relation(self) -> Optional[GraphRelationship]:
        try:
            return next(self._relation_iter)
        except StopIteration:
            return None

    def _create_node_iterator(self) -> Iterator[GraphNode]:
        if not self.description_key:
            raise Exception('Required description node key cannot be None')
        yield self.get_node(self.description_key)

    def _create_relation_iterator(self) -> Iterator[GraphRelationship]:
        if not self.start_label:
            raise Exception('Required relation start node label cannot be None')
        if not self.start_key:
            raise Exception('Required relation start key cannot be None')
        if not self.description_key:
            raise Exception('Required relation end key cannot be None')
        yield self.get_relation(
            start_node=self.start_label,
            start_key=self.start_key,
            end_key=self.description_key
        )

    def create_next_atlas_relation(self) -> Union[AtlasRelationship, None]:
        pass

    def create_next_atlas_entity(self) -> Union[AtlasEntity, None]:
        pass

    def __repr__(self) -> str:
        return f'DescriptionMetadata({self.source!r}, {self.text!r})'
Review comment: Separated DescriptionMetadata from TableMetadata due to circular import issues with TypeMetadata.
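The circular import that motivated the move can be reproduced in isolation. The throwaway script below is a hypothetical reconstruction, not the real databuilder module layout: it writes stand-in modules named table_metadata, type_metadata and description_metadata to a temporary directory and imports them twice. In the first layout DescriptionMetadata sits next to TableMetadata, so type_metadata has to import it back from a half-initialized table_metadata. In the second it lives in its own module that both of the others import.

```python
import importlib
import os
import sys
import tempfile
import textwrap
from typing import Dict

# Hypothetical layout 1: table_metadata imports type_metadata, which needs
# DescriptionMetadata from table_metadata before that class has been defined.
CIRCULAR = {
    "table_metadata": """
        from type_metadata import TypeMetadata

        class DescriptionMetadata: ...
        class TableMetadata: ...
    """,
    "type_metadata": """
        from table_metadata import DescriptionMetadata

        class TypeMetadata: ...
    """,
}

# Hypothetical layout 2: DescriptionMetadata moved into its own leaf module.
SEPARATED = {
    "description_metadata": """
        class DescriptionMetadata: ...
    """,
    "type_metadata": """
        from description_metadata import DescriptionMetadata

        class TypeMetadata: ...
    """,
    "table_metadata": """
        from description_metadata import DescriptionMetadata
        from type_metadata import TypeMetadata

        class TableMetadata: ...
    """,
}


def import_layout(modules: Dict[str, str], entry: str) -> str:
    """Write the modules to a temp dir, import `entry`, and report the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in modules.items():
            with open(os.path.join(tmp, f"{name}.py"), "w") as fh:
                fh.write(textwrap.dedent(source))
        importlib.invalidate_caches()
        sys.path.insert(0, tmp)
        try:
            importlib.import_module(entry)
            return "ok"
        except ImportError as exc:
            return f"ImportError: {exc}"
        finally:
            # Undo the path change and drop the stand-in modules so layouts don't leak.
            sys.path.remove(tmp)
            for name in modules:
                sys.modules.pop(name, None)


circular_result = import_layout(CIRCULAR, "table_metadata")
separated_result = import_layout(SEPARATED, "table_metadata")
print(circular_result)   # starts with "ImportError: cannot import name 'DescriptionMetadata'"
print(separated_result)  # prints: ok
```

Moving the shared class into a module with no imports back into the others turns the dependency graph into a tree, so the import order no longer matters.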