Avoid systematic deepcopy of inference datasets #133

Closed
takikadiri opened this issue Dec 9, 2020 · 0 comments · Fixed by #152
Labels
enhancement New feature or request

Comments

takikadiri (Collaborator) commented Dec 9, 2020

Description

Currently, kedro_mlflow_model creates a new catalog called loaded_catalog, where it declares all the pipeline_ml artifacts with their new filepaths (see here).
Our current problem is that each of these datasets is deep-copied between the kedro nodes, and some artifacts/datasets take a long time to deep-copy (a Keras model, for example), which is not suitable for an API serving pattern.

We need a way to avoid deep-copying some (or all) datasets in the inference pipeline.
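For illustration, here is a minimal sketch of the copy behaviour at stake, using the `MemoryDataSet` API of Kedro at the time (the sample data is a hypothetical stand-in): with the default copy mode, every `load()` hands a fresh deep copy to the next node.

```python
from kedro.io import MemoryDataSet

# Stand-in for a heavy artifact such as a fitted Keras model.
model = {"weights": list(range(1_000_000))}

ds = MemoryDataSet(data=model)  # copy_mode is inferred as "deepcopy" for plain objects
loaded = ds.load()              # each load (i.e. each node input) triggers a full deep copy
assert loaded is not model      # a brand new object every time
```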

Possible Implementation

Defining the inference datasets as MemoryDataset(name, copy_mode="assign") solves the problem, but it can break inference pipelines that mutate the dataset state between nodes. We could configure the copy_mode at one of these two levels:

  • Propose an option to redefine the inference dataset types at the PipelineML level
  • Propose an option to redefine the inference dataset types at the kedro_pipeline_model level (see the sketch below)
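As a rough illustration of the second option, here is a minimal sketch of what a copy_mode argument on the kedro_pipeline_model side could do once the artifacts are loaded; the helper name and its exact placement are assumptions for illustration, not the actual kedro_mlflow API:

```python
from kedro.io import DataCatalog, MemoryDataSet

def _apply_copy_mode(loaded_catalog: DataCatalog, copy_mode: str = "assign") -> DataCatalog:
    """Hypothetical sketch: load each inference artifact once, then re-register it
    as a MemoryDataSet that hands out references (copy_mode="assign") instead of
    deep copies when the pipeline runs."""
    fast_catalog = DataCatalog()
    for name in loaded_catalog.list():
        data = loaded_catalog.load(name)  # e.g. the fitted Keras model
        fast_catalog.add(name, MemoryDataSet(data=data, copy_mode=copy_mode))
    return fast_catalog
```

With copy_mode="assign" the nodes share the same objects, which is why the option should stay configurable: an inference pipeline that mutates its inputs between nodes would still need the default deepcopy behaviour.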