Add test for consume name to name mapping #867
Don't we need to compile the pipeline to actually test this? I believe it will fail then.
    continue

    if key in spec_mapping:
        mapping[value] = Field(name=value, type=mapping.pop(key).type)
    if operations_column_name in operations_schema:
We could add a check for `additionalProperties` here and ignore this exception if `additionalProperties` are allowed. In that case, however, we would have to change the string mapping to a `string: pa.DataType` mapping to avoid issues later on, and the `ComponentOp` would need access to the dataset schema.
I think it would be cleaner to implement this behaviour before we initialise the `ComponentOp`: infer the consumes, validate it, and fix it if needed. The `ComponentOp` would then be initialised with a valid consumes definition.
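To make the suggestion concrete, here is a minimal sketch of resolving the consumes definition before the op is constructed. It is not the project's implementation: `Field` is a simplified stand-in whose type is a plain string instead of a `pa.DataType`, and `resolve_consumes`, `name_mapping`, and the `additional_properties` flag are hypothetical names. It also assumes the mapping goes from the name in the component spec to the dataset column name.

```python
from dataclasses import dataclass
import typing as t


@dataclass(frozen=True)
class Field:
    """Simplified stand-in for the project's Field (assumption: type is a string)."""

    name: str
    type: str


def resolve_consumes(
    spec_consumes: t.Mapping[str, Field],
    dataset_fields: t.Mapping[str, Field],
    name_mapping: t.Mapping[str, str],
    additional_properties: bool = False,
) -> t.Dict[str, Field]:
    """Return a validated consumes definition keyed by dataset column name."""
    resolved = dict(spec_consumes)
    for spec_name, dataset_name in name_mapping.items():
        if spec_name in resolved:
            # Declared in the spec: rename it and keep the declared type.
            declared = resolved.pop(spec_name)
            resolved[dataset_name] = Field(name=dataset_name, type=declared.type)
        elif additional_properties and dataset_name in dataset_fields:
            # Not declared, but extra fields are allowed: take the type
            # from the dataset schema instead of the spec.
            resolved[dataset_name] = Field(
                name=dataset_name, type=dataset_fields[dataset_name].type
            )
        else:
            raise ValueError(
                f"Cannot map '{spec_name}' to '{dataset_name}': the field is not "
                "declared in the spec and cannot be inferred from the dataset."
            )
    return resolved
```

With this shape, the op itself never needs the dataset schema: the caller resolves the mapping first and passes the result in.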
Agree, see my comment below.
Thanks @mrchtr! I think we can indeed refactor a bit more to split the functionality more cleanly.
@@ -145,13 +146,18 @@ def __init__(
        cache: t.Optional[bool] = True,
        resources: t.Optional[Resources] = None,
        component_dir: t.Optional[Path] = None,
        dataset_fields: t.Optional[t.Mapping[str, Field]] = None,
I'm wondering if we shouldn't split all this `consumes` / `produces` logic into a separate class, e.g. a `ComponentOpFactory`. I think it could help with maintainability and testability to keep the `ComponentOp` limited to a representation of the operation, close to a dataclass.
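A rough sketch of what that split could look like, purely as an illustration. The class layout, the `create` method, and its parameters are assumptions and not the project's API, and `Field` is again a simplified stand-in with a string type.

```python
from dataclasses import dataclass
import typing as t


@dataclass(frozen=True)
class Field:
    """Simplified stand-in for the project's Field (assumption)."""

    name: str
    type: str


@dataclass(frozen=True)
class ComponentOp:
    """Plain representation of an operation; holds no dataset logic."""

    name: str
    consumes: t.Mapping[str, Field]
    produces: t.Mapping[str, Field]


class ComponentOpFactory:
    """Hypothetical factory that owns the consumes/produces resolution."""

    def __init__(self, dataset_fields: t.Mapping[str, Field]) -> None:
        self._dataset_fields = dataset_fields

    def create(
        self,
        name: str,
        consumes: t.Sequence[str],
        produces: t.Mapping[str, Field],
    ) -> ComponentOp:
        # Validate against the dataset here, so the op is always built
        # from an already valid consumes definition.
        missing = [column for column in consumes if column not in self._dataset_fields]
        if missing:
            raise ValueError(f"Fields not found in dataset: {missing}")
        resolved = {column: self._dataset_fields[column] for column in consumes}
        return ComponentOp(name=name, consumes=resolved, produces=dict(produces))
```

The factory can then be unit-tested with a hand-written dataset schema, while `ComponentOp` stays a passive value object.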
I think it makes sense. I'll create an issue for it that we can tackle separately.
OK, but I would prefer to do it as a follow-up right away. I think this change tips it to the point where this is needed, since we're starting to mingle the operation and dataset.
Thanks @mrchtr.
Looks good, but let's make sure all the naming is consistent. I left some suggestions.
        self._operation_consumes: t.Optional[t.Mapping[str, Field]] = None
        self._consumes_from_dataset: t.Optional[t.Mapping[str, Field]] = None
        self._operation_produces: t.Optional[t.Mapping[str, Field]] = None
        self._produces_to_dataset: t.Optional[t.Mapping[str, Field]] = None
I like these names 👍
Thanks @mrchtr!
Fix #863