Add test for consume name to name mapping #867

mrchtr · 2024-02-20T13:55:22Z

RobbeSneyders

Don't we need to compile the pipeline to actually test this? I believe it will fail then.

mrchtr · 2024-02-21T08:35:05Z

src/fondant/core/component_spec.py

                continue

-            if key in spec_mapping:
-                mapping[value] = Field(name=value, type=mapping.pop(key).type)
+            if operations_column_name in operations_schema:


We can add a check for additionalProperties here and ignore this exception when additional properties are allowed. In that case, however, we have to change the string-to-string mapping into a string-to-pa.DataType mapping to avoid issues later on, which means the ComponentOp would need access to the dataset schema.
I think it might be cleaner to implement this behaviour before we initialise the ComponentOp: infer the consumes, then validate and fix it if needed. The ComponentOp would then be initialised with a valid consumes definition.

Agree, see my comment below.
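
A minimal sketch of the resolution step discussed in this thread, assuming a consumes mapping of the form {operation column name: dataset column name}. The `Field` class below is a simplified stand-in with a string type instead of a pa.DataType, and `resolve_consumes` and its parameters are hypothetical names for illustration, not the code in this PR.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class Field:
    """Simplified stand-in for a typed field; the type is a plain string here."""

    name: str
    type: str


def resolve_consumes(
    consumes: Mapping[str, str],
    spec_fields: Mapping[str, Field],
    dataset_fields: Mapping[str, Field],
    additional_properties: bool,
) -> Dict[str, Field]:
    """Turn {operation column name: dataset column name} into typed fields."""
    resolved: Dict[str, Field] = {}
    for operation_name, dataset_name in consumes.items():
        if operation_name in spec_fields:
            # The component spec declares this field, so its type is known.
            field_type = spec_fields[operation_name].type
        elif additional_properties and dataset_name in dataset_fields:
            # Not in the spec: fall back to the type found in the dataset schema.
            field_type = dataset_fields[dataset_name].type
        else:
            raise ValueError(
                f"Cannot resolve a type for {operation_name!r} "
                f"(mapped from dataset column {dataset_name!r})"
            )
        resolved[operation_name] = Field(name=operation_name, type=field_type)
    return resolved


# Hypothetical example data: the spec knows "image", the dataset has two columns.
spec = {"image": Field("image", "binary")}
dataset = {"pictures": Field("pictures", "binary"), "caption": Field("caption", "string")}
typed = resolve_consumes(
    {"image": "pictures", "text": "caption"}, spec, dataset, additional_properties=True
)
```

Here "text" is not declared in the spec, so its type comes from the dataset column "caption". With additional properties disallowed, the same call raises instead.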

RobbeSneyders

Thanks @mrchtr! I think we can indeed refactor a bit more to split the functionality more cleanly.

tests/pipeline/test_pipeline.py

RobbeSneyders · 2024-02-22T09:42:35Z

src/fondant/pipeline/pipeline.py

@@ -145,13 +146,18 @@ def __init__(
        cache: t.Optional[bool] = True,
        resources: t.Optional[Resources] = None,
        component_dir: t.Optional[Path] = None,
+        dataset_fields: t.Optional[t.Mapping[str, Field]] = None,


I'm wondering whether we should split all this consumes / produces logic into a separate class, e.g. a ComponentOpFactory. I think it would help maintainability and testability to keep the ComponentOp limited to a representation of the operation, close to a dataclass.

I think it makes sense. I'll create an issue for it that we can tackle separately.

Ok, but would prefer to do it as a follow-up right away. I think this change tips it to the point where this is needed, since we're starting to mingle the operation and dataset.
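
To make the proposed split concrete, here is a rough sketch of what such a factory could look like. Everything in it is an assumption for illustration: the simplified `Field` and `ComponentOp` classes, the `create` method, and the choice to consume every dataset field when no mapping is given. It is not the design that was eventually implemented.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class Field:
    name: str
    type: str  # simplified stand-in for a real data type object


@dataclass(frozen=True)
class ComponentOp:
    """Plain representation of an operation; no inference logic lives here."""

    name: str
    consumes: Mapping[str, Field]


class ComponentOpFactory:
    """Hypothetical factory that owns the dataset-aware validation."""

    def __init__(self, dataset_fields: Mapping[str, Field]) -> None:
        self._dataset_fields = dataset_fields

    def create(
        self, name: str, consumes: Optional[Mapping[str, str]] = None
    ) -> ComponentOp:
        # Assumption: without an explicit mapping, consume every dataset field as-is.
        if consumes is None:
            consumes = {column: column for column in self._dataset_fields}
        resolved: Dict[str, Field] = {}
        for operation_name, dataset_name in consumes.items():
            if dataset_name not in self._dataset_fields:
                raise ValueError(f"Dataset has no field named {dataset_name!r}")
            dataset_field = self._dataset_fields[dataset_name]
            resolved[operation_name] = Field(operation_name, dataset_field.type)
        # The operation is only constructed once its consumes definition is valid.
        return ComponentOp(name=name, consumes=resolved)


factory = ComponentOpFactory({"pictures": Field("pictures", "binary")})
op = factory.create("resize_images", consumes={"image": "pictures"})
```

The point of the split is that `ComponentOp` never sees the dataset schema, so it can be tested as plain data, while the factory can be tested on its own with small in-memory schemas.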

RobbeSneyders · 2024-02-22T09:43:58Z

Agree, see my comment below.

RobbeSneyders

Thanks @mrchtr.

Looks good, but let's make sure all the naming is consistent. I left some suggestions.

RobbeSneyders · 2024-02-23T11:02:21Z

src/fondant/core/component_spec.py

+        self._operation_consumes: t.Optional[t.Mapping[str, Field]] = None
+        self._consumes_from_dataset: t.Optional[t.Mapping[str, Field]] = None
+        self._operation_produces: t.Optional[t.Mapping[str, Field]] = None
+        self._produces_to_dataset: t.Optional[t.Mapping[str, Field]] = None


I like these names 👍
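
As a reading aid for the four attribute names, the sketch below shows one plausible interpretation: the "operation" mappings use the column names the component sees, while the "dataset" mappings use the column names stored in the dataset. This interpretation is inferred from the names only, and `Field` and `to_dataset_view` are hypothetical helpers, not code from this PR.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class Field:
    name: str
    type: str  # simplified stand-in for a real data type object


def to_dataset_view(
    operation_fields: Mapping[str, Field], name_mapping: Mapping[str, str]
) -> Dict[str, Field]:
    """Rename operation-side fields to their dataset-side column names.

    name_mapping maps operation column names to dataset column names;
    fields without an entry keep their name.
    """
    dataset_view: Dict[str, Field] = {}
    for operation_name, field in operation_fields.items():
        dataset_name = name_mapping.get(operation_name, operation_name)
        dataset_view[dataset_name] = Field(dataset_name, field.type)
    return dataset_view


# Consumes: the component reads "image", which is stored as "pictures".
operation_consumes = {"image": Field("image", "binary")}
consumes_from_dataset = to_dataset_view(operation_consumes, {"image": "pictures"})

# Produces: the component writes "width", which is stored as "image_width".
operation_produces = {"width": Field("width", "int32"), "height": Field("height", "int32")}
produces_to_dataset = to_dataset_view(operation_produces, {"width": "image_width"})
```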

RobbeSneyders · 2024-02-23T11:09:46Z

Ok, but would prefer to do it as a follow-up right away. I think this change tips it to the point where this is needed, since we're starting to mingle the operation and dataset.

RobbeSneyders

Thanks @mrchtr!

mrchtr added 4 commits February 20, 2024 12:01

Renaming inner and outer consumes/produces

f0de86a

Renaming inner and outer consumes/produces

a6266b5

Renaming inner and outer consumes/produces

6d35126

Add test for consume name to name mapping

4b73734

RobbeSneyders reviewed Feb 20, 2024

Enable string to string name mapping when additional fields true

c1a271c

mrchtr commented Feb 21, 2024

Fixes test case

e7fa36a

RobbeSneyders reviewed Feb 22, 2024

Addressing comments

42c3b3c

mrchtr mentioned this pull request Feb 22, 2024

Add ComponentOpFactory #874

Open

mrchtr changed the base branch from feature/renaming-inner-outer-consumes-produces to main February 22, 2024 13:53

mrchtr mentioned this pull request Feb 22, 2024

Renaming inner and outer consumes/produces #865

Closed

RobbeSneyders reviewed Feb 23, 2024

Addressing comments

2cefe03

RobbeSneyders approved these changes Feb 27, 2024

RobbeSneyders merged commit 88c6ea8 into main Feb 27, 2024
8 of 9 checks passed

RobbeSneyders deleted the feature/consumes-field-name-mapping branch February 27, 2024 16:27
