Feature/pipeline ml inputs #101

Merged
merged 2 commits into from
Oct 27, 2020

Conversation

Galileo-Galilei
Owner

Description

Closes #71 and #100.

Development notes

  • PipelineML.extract_pipeline_catalog is renamed PipelineML._extract_pipeline_catalog to show it is private
  • Change the doc to deprecate using extract_pipeline_catalog in favor of extract_pipeline_artifacts
  • PipelineML now has a logger property
  • PipelineML now accepts that inference inputs may be in training inputs (and not only in all outputs)
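
To illustrate the last note, here is a minimal sketch of the relaxed rule, assuming it boils down to a set check: an inference input is acceptable if the training pipeline either reads it or produces it. The function name and the dataset names are hypothetical and are not taken from the kedro-mlflow code base.

```python
from typing import Iterable, Set


def unresolved_inference_inputs(
    inference_inputs: Iterable[str],
    training_inputs: Iterable[str],
    training_outputs: Iterable[str],
    input_name: str,
) -> Set[str]:
    """Return the inference inputs that the training pipeline cannot provide.

    Hypothetical helper: it only illustrates the rule described in the notes.
    """
    # Datasets the inference pipeline may rely on: anything the training
    # pipeline reads or writes, plus the data passed at prediction time.
    available = set(training_inputs) | set(training_outputs) | {input_name}
    return set(inference_inputs) - available


# "encoder" is produced by training; "stopwords" is only read by training.
missing = unresolved_inference_inputs(
    inference_inputs={"instances", "encoder", "stopwords"},
    training_inputs={"raw_data", "stopwords"},
    training_outputs={"encoder", "model"},
    input_name="instances",
)
print(missing)  # set()
```

Under the previous behaviour described in the note, "stopwords" would presumably have been reported as missing because it is not a training output.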

Checklist

  • Read the contributing guidelines
  • Open this PR as a 'Draft Pull Request' if it is work-in-progress
  • Update the documentation to reflect the code changes
  • Add a description of this change and add your name to the list of supporting contributions in the CHANGELOG.md file. Please respect Keep a Changelog guidelines.
  • Add tests to cover your changes

Notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

  • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.

  • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.

  • I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

@codecov-io

codecov-io commented Oct 20, 2020

Codecov Report

Merging #101 into develop will decrease coverage by 1.22%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop     #101      +/-   ##
===========================================
- Coverage    98.53%   97.31%   -1.23%     
===========================================
  Files           20       20              
  Lines          616      632      +16     
===========================================
+ Hits           607      615       +8     
- Misses           9       17       +8     
Impacted Files                                   Coverage Δ
kedro_mlflow/framework/hooks/pipeline_hook.py    98.79% <100.00%> (ø)
kedro_mlflow/mlflow/kedro_pipeline_model.py      100.00% <100.00%> (ø)
kedro_mlflow/pipeline/pipeline_ml.py             92.30% <100.00%> (-7.70%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e12a74c...ca7d520.

@Galileo-Galilei
Owner Author

Galileo-Galilei commented Oct 20, 2020

Known problem with test coverage, which includes the test folder. It will be solved once #98 is merged, so we should merge that first and I'll rebase on it.

Collaborator

@takikadiri takikadiri left a comment

I pointed out some typos and raised a question about how PipelineML handles Kedro parameters.

kedro_mlflow/pipeline/pipeline_ml.py (outdated)
self._input_name = name

- def extract_pipeline_catalog(self, catalog: DataCatalog) -> DataCatalog:
+ def _extract_pipeline_catalog(self, catalog: DataCatalog) -> DataCatalog:
Collaborator

Do we allow the use of parameters as inference / training inputs?
Kedro creates params:xxx inputs as MemoryDataSet entries. The following PipelineML code excludes them from our inference pipelines:

if isinstance(data_set, MemoryDataSet):
     raise KedroMlflowPipelineMLDatasetsError(...)

Owner Author

I hesitated to deal with parameters automatically, but:

  • it is quite complicated: there are some edge cases to deal with, and we have to decide how / when / where to persist them
  • it is error prone: I don't want to persist parameters that are not explicitly intended to be persisted.

On the other hand, it is very easy for a user to enforce a parameter just by persisting it as either an input or an output of a "training" node, e.g. by creating a YAMLDataSet, so I think we can leave it to the user, to be sure that it is voluntary.
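
As a rough sketch of the trade-off discussed in this thread, the example below mimics the check quoted in the question using stand-in classes. None of the classes here are the real Kedro or kedro-mlflow ones, and the helper name, dataset names and file path are hypothetical. It only shows why a raw params:xxx entry is rejected while a value the user chose to persist is accepted.

```python
from typing import Dict, Iterable


class MemoryDataSet:
    """Stand-in for Kedro's in-memory dataset (not the real class)."""

    def __init__(self, data=None):
        self.data = data


class YAMLDataSet:
    """Stand-in for a dataset persisted to a YAML file (not the real class)."""

    def __init__(self, filepath: str):
        self.filepath = filepath


class KedroMlflowPipelineMLDatasetsError(Exception):
    """Stand-in for the error named in the quoted snippet."""


def check_inference_inputs_are_persisted(
    catalog: Dict[str, object], inference_inputs: Iterable[str]
) -> None:
    """Raise if any inference input only lives in memory."""
    for name in inference_inputs:
        data_set = catalog[name]
        if isinstance(data_set, MemoryDataSet):
            raise KedroMlflowPipelineMLDatasetsError(
                f"'{name}' is a MemoryDataSet and cannot be reloaded at inference time"
            )


# A raw parameter is held in memory, so it is rejected.
in_memory_catalog = {"params:threshold": MemoryDataSet(0.5)}
try:
    check_inference_inputs_are_persisted(in_memory_catalog, ["params:threshold"])
    rejected = False
except KedroMlflowPipelineMLDatasetsError:
    rejected = True

# The same value persisted by the user (hypothetical path) passes the check.
persisted_catalog = {"threshold": YAMLDataSet("data/06_models/threshold.yml")}
check_inference_inputs_are_persisted(persisted_catalog, ["threshold"])
```

Keeping the persistence step in the user's hands means nothing is stored with the model unless a training node explicitly reads or writes it.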

kedro_mlflow/pipeline/pipeline_ml.py (outdated)
@Galileo-Galilei
Owner Author

It seems coverage decreased when I rebased; I may have skipped a test. Do not merge it yet.

@Galileo-Galilei Galileo-Galilei marked this pull request as draft October 25, 2020 22:03
@takikadiri
Collaborator

takikadiri commented Oct 25, 2020

Ok! For merging multiple FIX PRs, do you prefer to pack the commits with a PR merge commit, or should I just rebase and merge?

@Galileo-Galilei
Owner Author

I always "rebase and merge" to the develop branch. The only merges are from develop to master.

@Galileo-Galilei Galileo-Galilei marked this pull request as ready for review October 26, 2020 20:04
@Galileo-Galilei
Owner Author

@takikadiri It's good to go!

@takikadiri takikadiri merged commit 08f0645 into develop Oct 27, 2020
@Galileo-Galilei Galileo-Galilei deleted the feature/pipeline-ml-inputs branch October 27, 2020 07:45