Enhancing Kedro Support for Nested Folder Structures #3891
Replies: 13 comments
-
Thanks @Gabriel2409 for the detailed writeup! Does #2701 help with your feature request?
-
See also #2543.
-
Hi @astrojuanlu, maybe I did not understand them correctly, but it seems these issues are about overwriting the default template. I hope this clarifies my request.
-
Thank you for the clear proposal @Gabriel2409. I think it all makes a lot of sense. The only part I'm not 100% sure about is:
To me personally, this makes total sense and is probably a necessary restriction to avoid confusion around, e.g., pipeline parameter files. But my uncertainty comes from not knowing enough about how most users structure their pipelines. Would this break existing pipeline setups, or is this nested structure something users generally haven't implemented because Kedro doesn't support it well? I'd love to get some more thoughts on this. Tagging: @marrrcin, @Galileo-Galilei, @deepyaman, @yetudada
-
This would be a super nice feature to have! I was experimenting with the same idea (I regularly use nested folders to bring additional structure to the project), but I haven't gotten nearly as far as you did. @merelcht, as long as this is just an extension of the current way of creating pipelines, i.e. not using subfolders behaves exactly the same, this is not a breaking change. Users can still create their own custom file structure without using
-
Sorry that I didn't see this previously! I completely agree that this is a very useful feature to have. If I recall correctly (it's been a few years!), when I used to lead development of CustomerOne (a McKinsey-internal asset), we had extended Kedro to support nested hierarchy, because of some of the reasons you mention. On the whole, I'd personally be very supportive of this.
I personally would rather not restrict this. I think you may need to artificially come up with different names for pipelines that are similar in function but live in different top-level packages. I think the parameters/catalog hierarchies could be created to deal with this pretty easily.
-
Hello everyone, @deepyaman. Here is my reasoning on the proposed limitations:
Parameter files flat structure
Kedro now has a flat structure for the parameter files: when creating a pipeline, a file
Registered name in find_pipelines
If we want to support different names, we must change
I see 3 possibilities for the final design:
What do you all think?
-
Hi @Gabriel2409, thanks so much for your further comments. We have discussed this issue as a team and these were some of the points that came up:
We do like this proposal as a team, but would only invest in having it fully implemented if there's enough interest from users. Would you be able to help gauge that interest through our user-research channel on Slack? We can connect there and I can explain more 🙂
-
@merelcht Thank you for your reply. I am OK with continuing the discussion on Slack. Could you please share the link?
-
Yes, it's https://slack.kedro.org/. The channel for user research is called #user-research
-
Hi @Gabriel2409, it's been a while. Are you still interested in this topic?
-
Hi @merelcht, I am still interested in the feature, but I don't have time to work on it at the moment.
-
No worries! In that case, I suggest moving this to GitHub Discussions so interested people can continue the discussion there 🙂
-
Description
This issue stems from #3096 and proposes an enhancement to Kedro to better support working with nested folder structures. While addressing this, we need to ensure that pipeline names remain unique independently of the folder they are in, especially given the new flat parameter structure introduced in 0.18.13.
Context
In large projects, pipelines tend to be grouped into their own folders. However, the Kedro CLI and the new find_pipelines function currently don't support this, which makes moving from a flat to a nested structure painful.
Possible Implementation
I've outlined the proposed implementation for this feature in the bullet points below, and I'm eager to hear your thoughts and suggestions. A PR is available here: #3106. Thank you very much.
Create Pipeline
`kedro pipeline create myfolder.mypipeline` should create:
- `src/<project_name>/pipelines/myfolder/mypipeline`.
- `src/tests/pipelines/myfolder/mypipeline`.
- `mypipeline.yml` in `conf/base/parameters_mypipeline.yml`, to adhere to the new flat structure. Note that `myfolder` does not appear in the name of the file, to avoid very long file names.
In addition:
- `kedro pipeline create myfolder.mypipeline` followed by `kedro pipeline create myotherfolder.mypipeline` should result in an error.
- `kedro pipeline create parent_pipeline` followed by `kedro pipeline create parent_pipeline.child_pipeline` should work. Running the commands in the opposite order should work as well. Note that on this specific point, we could also force all the pipelines to be at the same level in the hierarchy.
Delete Pipeline
`kedro pipeline delete myfolder.mypipeline` should delete the pipeline, parameter, and test files as before. However, it should fail if a child pipeline is detected.
For example, running `kedro pipeline create myfolder.mypipeline`, then `kedro pipeline create myfolder.mypipeline.musubpipeline`, and then `kedro pipeline delete myfolder.mypipeline` should fail and ask the user to first delete `myfolder.mypipeline.musubpipeline`.
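The create and delete rules described above could be sketched roughly as follows. This is only an illustration of the proposed behaviour, not Kedro's actual code or the code in the linked PR: the helper names `planned_paths`, `check_can_create` and `check_can_delete` are hypothetical, and the set of existing dotted pipeline names is assumed to be tracked by the caller.

```python
from pathlib import PurePosixPath


def planned_paths(dotted_name: str, project_name: str) -> dict[str, PurePosixPath]:
    """Map a dotted pipeline name to the locations listed in the proposal.

    Hypothetical helper: the parameter file uses only the last name segment,
    so parent folders never appear in the file name.
    """
    parts = dotted_name.split(".")
    leaf = parts[-1]
    return {
        "pipeline": PurePosixPath("src", project_name, "pipelines", *parts),
        "tests": PurePosixPath("src", "tests", "pipelines", *parts),
        "parameters": PurePosixPath("conf", "base", f"parameters_{leaf}.yml"),
    }


def check_can_create(existing: set[str], dotted_name: str) -> None:
    """Reject a new pipeline whose last segment clashes with an existing one."""
    leaf = dotted_name.split(".")[-1]
    for other in sorted(existing):
        if other.split(".")[-1] == leaf:
            raise ValueError(
                f"Pipeline name '{leaf}' is already used by '{other}'; "
                "names must be unique regardless of folder."
            )


def check_can_delete(existing: set[str], dotted_name: str) -> None:
    """Refuse to delete a pipeline that still has child pipelines."""
    children = sorted(n for n in existing if n.startswith(dotted_name + "."))
    if children:
        raise ValueError(
            f"Delete child pipeline(s) first: {', '.join(children)}"
        )
```

With these helpers, `check_can_create({"parent_pipeline"}, "parent_pipeline.child_pipeline")` passes, while creating `myotherfolder.mypipeline` when `myfolder.mypipeline` already exists raises an error, matching the cases listed in the proposal.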
Integration with find_pipelines
`find_pipelines` should support nested structures, with pipeline names excluding parent folders to avoid excessively long commands. If a pipeline name appears twice, it should raise an error.
For example, running `kedro pipeline create myfolder.child_pipeline` and then manually creating a pipeline in `another_folder.child_pipeline` should trigger a failure in `find_pipelines`. Note that this is an extra safeguard, as it should already be impossible to create said pipeline with `kedro pipeline create another_folder.child_pipeline`.
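A minimal sketch of the duplicate-name check described above, written as a standalone function rather than as a change to Kedro's real `find_pipelines`. The function name `discover_pipeline_names` is hypothetical, and treating any folder that contains a `pipeline.py` file as a pipeline is an assumption made for illustration only.

```python
from pathlib import Path


def discover_pipeline_names(pipelines_dir: Path) -> dict[str, str]:
    """Return {registered_name: dotted_path} for nested pipeline folders.

    Assumption: a folder counts as a pipeline if it contains `pipeline.py`.
    The registered name is the last folder only, so parent folders are
    excluded; a name found twice raises an error.
    """
    found: dict[str, str] = {}
    for marker in sorted(pipelines_dir.rglob("pipeline.py")):
        rel = marker.parent.relative_to(pipelines_dir)
        if not rel.parts:
            # pipeline.py sitting directly in pipelines_dir has no name.
            continue
        dotted = ".".join(rel.parts)
        leaf = rel.parts[-1]
        if leaf in found:
            raise ValueError(
                f"Pipeline name '{leaf}' is used by both "
                f"'{found[leaf]}' and '{dotted}'."
            )
        found[leaf] = dotted
    return found
```

Keeping the registered name short means the name passed on the command line stays the same whether or not the pipeline lives in a subfolder, at the cost of requiring globally unique names.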