Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should find_pipelines work with nested folder structure? #3096

Closed
Gabriel2409 opened this issue Sep 29, 2023 · 4 comments
Closed

Should find_pipelines work with nested folder structure? #3096

Gabriel2409 opened this issue Sep 29, 2023 · 4 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@Gabriel2409
Copy link

Gabriel2409 commented Sep 29, 2023

Description

The new find_pipelines feature is really great and works very well in simple cases.
I would like to propose to extend the functionality so that it also works for nested structure

Context

In most of our kedro projects, we resort to creating subfolders of the src/pipelines folder, for ex

src/myproject/pipelines/
├── __init__.py
├── myfolder
│   └── mysubfolder
│       ├── README.md
│       ├── __init__.py
│       ├── nodes.py
│       └── pipeline.py

This is incompatible with find_pipelines which means we have to register the pipelines manually. It works but I feel like this is extra work that could be avoided.

Possible Implementation

Fortunately I think it is easy to implement. We just need to replace the following lines

https://github.com/kedro-org/kedro/blob/aa3e2106f1491e83004a19a87f1d4e839476a33a/kedro/framework/project/__init__.py#L347C5-L355C75

with something like:

for pipeline_dir in pipelines_package.glob("**/"): # navigates dirs and subdirs
	if not pipeline_dir.is_dir():
		continue
	# new check, we want to ignore folders that don't include a pipeline.py file
	if not (pipeline_dir / "pipeline.py").is_file(): # maybe it is better to check for __init__.py
		continue

	...
	# keep pycache check
	...

	pipeline_relative_path = pipeline_dir.relative_to(pipelines_package) # path/to/dir
	full_pipeline_name = os.path.normpath(pipeline_relative_path).replace(os.path.sep, '.') #path.to.dir, also works with windows path
	pipeline_module_name = f"{PACKAGE_NAME}.pipelines.{full_pipeline_name}" # same as before
	...

If you are interested in this change, I can create a pull request.
If you think find_pipelines should not support nested folder structure, I am also interested in your arguments.

Thanks a lot.

@Gabriel2409 Gabriel2409 added the Issue: Feature Request New feature or improvement to existing feature label Sep 29, 2023
@noklam
Copy link
Contributor

noklam commented Sep 29, 2023

@deepyaman

@astrojuanlu
Copy link
Member

Thanks for this idea @Gabriel2409 ! I think this makes a ton of sense. We are focusing on getting the 0.19 release out of the door so it's unlikely we look at this in the next month or so, but we'll make sure we consider it.

@Gabriel2409
Copy link
Author

Hi @astrojuanlu thank you for your reply.
I understand that this is not an urgent change.
Still I added a draft PR as a support for the discussion. Do not hesitate to add comments if and when you have the time.

Thank you very much.

@Gabriel2409
Copy link
Author

closed in favor of #3105

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Feature Request New feature or improvement to existing feature
Projects
Archived in project
Development

No branches or pull requests

3 participants