Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add test for consume name to name mapping #867

Merged
merged 8 commits into from
Feb 27, 2024

Conversation

mrchtr
Copy link
Contributor

@mrchtr mrchtr commented Feb 20, 2024

Fix #863

Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to compile the pipeline to actually test this? I believe it will fail then.

continue

if key in spec_mapping:
mapping[value] = Field(name=value, type=mapping.pop(key).type)
if operations_column_name in operations_schema:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are able to add a check for addtionalProperties here, and ignore this exception if additionalProperties are allowed. However, in that case we have to change the string mapping to a string: pa.DataType mapping to avoid issues later on. The ComponentOp would need to access the dataset schema.
I think it might be cleaner to implement this behaviour before we are initialising the ComponentOp, infer the consumes, validate and fix it (if needed). The ComponentOp will be initialised with a valid consumes definition afterwards.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, see my comment below.

Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mrchtr! I think we can indeed refactor a bit more to split the functionality more cleanly.

tests/pipeline/test_pipeline.py Outdated Show resolved Hide resolved
tests/pipeline/test_pipeline.py Outdated Show resolved Hide resolved
tests/pipeline/test_pipeline.py Outdated Show resolved Hide resolved
tests/pipeline/test_pipeline.py Outdated Show resolved Hide resolved
@@ -145,13 +146,18 @@ def __init__(
cache: t.Optional[bool] = True,
resources: t.Optional[Resources] = None,
component_dir: t.Optional[Path] = None,
dataset_fields: t.Optional[t.Mapping[str, Field]] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we shouldn't split all this consumes / produces logic into a separate class, eg. a ComponentOpFactory. I think it could help with maintainability and testability to keep the ComponentOp limited to a representation of the operation, close to a dataclass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense. I'll create an issue for it that we can tackle separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but would prefer to do it as a follow-up right away. I think this change tips it to the point where this is needed, since we're starting to mingle the operation and dataset.

continue

if key in spec_mapping:
mapping[value] = Field(name=value, type=mapping.pop(key).type)
if operations_column_name in operations_schema:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, see my comment below.

@mrchtr mrchtr mentioned this pull request Feb 22, 2024
@mrchtr mrchtr changed the base branch from feature/renaming-inner-outer-consumes-produces to main February 22, 2024 13:53
Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mrchtr.

Looks good, but let's make sure all the naming is consistent. I left some suggestions.

Comment on lines +356 to +359
self._operation_consumes: t.Optional[t.Mapping[str, Field]] = None
self._consumes_from_dataset: t.Optional[t.Mapping[str, Field]] = None
self._operation_produces: t.Optional[t.Mapping[str, Field]] = None
self._produces_to_dataset: t.Optional[t.Mapping[str, Field]] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like these names 👍

src/fondant/core/component_spec.py Outdated Show resolved Hide resolved
src/fondant/core/component_spec.py Outdated Show resolved Hide resolved
src/fondant/core/component_spec.py Outdated Show resolved Hide resolved
src/fondant/core/component_spec.py Outdated Show resolved Hide resolved
src/fondant/core/component_spec.py Outdated Show resolved Hide resolved
@@ -145,13 +146,18 @@ def __init__(
cache: t.Optional[bool] = True,
resources: t.Optional[Resources] = None,
component_dir: t.Optional[Path] = None,
dataset_fields: t.Optional[t.Mapping[str, Field]] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but would prefer to do it as a follow-up right away. I think this change tips it to the point where this is needed, since we're starting to mingle the operation and dataset.

src/fondant/core/component_spec.py Outdated Show resolved Hide resolved
Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mrchtr!

@RobbeSneyders RobbeSneyders merged commit 88c6ea8 into main Feb 27, 2024
8 of 9 checks passed
@RobbeSneyders RobbeSneyders deleted the feature/consumes-field-name-mapping branch February 27, 2024 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow mapping of dataset field names with additionalProperties: true
2 participants