mlwhatif

Data-Centric What-If Analysis for Native Machine Learning Pipelines.

This project uses the mlinspect project as a foundation, mainly for its plan extraction from native ML pipelines.

Run mlwhatif locally

Prerequisite: Python 3.9

Clone this repository (optionally, with Git LFS, to also download the datasets for the scalability experiment)
Set up the environment

cd mlwhatif
python -m venv venv
source venv/bin/activate
If you want to use the visualisation functions we provide, install Graphviz, which cannot be installed via pip

Linux: apt-get install graphviz
macOS: brew install graphviz
Install pip dependencies

SETUPTOOLS_USE_DISTUTILS=stdlib pip install -e ."[dev]"
To ensure everything works, you can run the tests (without graphviz, the visualisation test will fail)

python setup.py test
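
Before running the tests, it can help to confirm the two prerequisites named above: the Python version and a Graphviz installation. The following is a minimal sketch, not part of mlwhatif. The function name `check_prerequisites` is hypothetical. The sketch assumes that "Python 3.9" means exactly that minor version and that a working Graphviz install exposes the `dot` executable on the PATH.

```python
import shutil
import sys


def check_prerequisites(required=(3, 9)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Assumption: the prerequisite "Python 3.9" is read as an exact major.minor match.
    found = tuple(sys.version_info[:2])
    if found != tuple(required):
        problems.append(
            f"Expected Python {required[0]}.{required[1]}, found {found[0]}.{found[1]}"
        )

    # Graphviz provides the `dot` command-line tool; without it, the
    # visualisation functions (and the visualisation test) cannot work.
    if shutil.which("dot") is None:
        problems.append("Graphviz executable 'dot' not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

If the script prints nothing, both checks passed. A missing `dot` executable only affects the visualisation functions, so the warning can be ignored if you do not need them.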

How to use mlwhatif

mlwhatif makes it easy to analyze your pipeline and automatically run what-if analyses.

from mlwhatif import PipelineAnalyzer
from mlwhatif.analysis import DataCleaning, ErrorType

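# Path to the notebook that contains the pipeline to analyze
IPYNB_PATH = ...
# Map each column to the type of data error to examine for it
cleanlearn = DataCleaning({'category': ErrorType.CAT_MISSING_VALUES,
                           'vine': ErrorType.CAT_MISSING_VALUES,
                           'star_rating': ErrorType.NUM_MISSING_VALUES,
                           'total_votes': ErrorType.OUTLIERS,
                           'review_id': ErrorType.DUPLICATES,
                           None: ErrorType.MISLABEL
                           })

# Extract the pipeline from the notebook, register the analysis, and run it
analysis_result = PipelineAnalyzer \
    .on_pipeline_from_ipynb_file(IPYNB_PATH) \
    .add_what_if_analysis(cleanlearn) \
    .execute()

# Look up the report produced for this analysis
cleanlearn_report = analysis_result.analysis_to_result_reports[cleanlearn]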
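
To make the idea of a data-cleaning what-if analysis concrete, the sketch below does by hand what such an analysis asks: how would the pipeline's result change if the data had been cleaned differently? It does not use mlwhatif and does not reflect its internals. The toy data, the `pipeline` function (a simple threshold classifier), and the two cleaning variants are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# Toy data as (feature, label) pairs; None marks a missing feature value.
TRAIN = [(1.0, 0), (2.0, 0), (3.0, 0), (8.0, 1), (9.0, 1), (None, 1), (None, 1)]
TEST = [(1.5, 0), (5.0, 0), (7.5, 1), (8.5, 1)]


def drop_missing(rows):
    """Cleaning variant 1: discard rows whose feature is missing."""
    return [(x, y) for x, y in rows if x is not None]


def impute_mean(rows):
    """Cleaning variant 2: replace missing features with the mean of the observed ones."""
    fill = mean(x for x, _ in rows if x is not None)
    return [(fill if x is None else x, y) for x, y in rows]


def pipeline(train, test):
    """Hypothetical pipeline: fit a threshold halfway between the class means,
    then return the accuracy on the test rows."""
    mean_0 = mean(x for x, y in train if y == 0)
    mean_1 = mean(x for x, y in train if y == 1)
    threshold = (mean_0 + mean_1) / 2
    correct = sum((x > threshold) == bool(y) for x, y in test)
    return correct / len(test)


def what_if(train, test, variants):
    """Rerun the pipeline once per cleaning variant and collect the scores."""
    return {name: pipeline(clean(train), test) for name, clean in variants.items()}


report = what_if(TRAIN, TEST, {"drop_missing": drop_missing, "impute_mean": impute_mean})
for name, accuracy in report.items():
    print(f"{name}: accuracy={accuracy:.2f}")
```

Writing such variants by hand means re-implementing the pipeline for every question. The snippet above this sketch instead declares the error types per column and lets mlwhatif run the analysis on the unmodified pipeline.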

Detailed Example

We prepared a demo notebook to showcase mlwhatif and its features.

Notes

For debugging in PyCharm, set the pytest flag --no-cov.
If you want to see log output in PyCharm, you can also set the pytest flags --log-cli-level=10 -s. The -s flag is needed because it disables pytest's stdout capturing.

Publications

Stefan Grafberger, Shubha Guha, Paul Groth, Sebastian Schelter (2023). mlwhatif: What If You Could Stop Re-Implementing Your Machine Learning Pipeline Analyses Over and Over? VLDB (demo).
Stefan Grafberger, Paul Groth, Sebastian Schelter (2023). Automating and Optimizing Data-Centric What-If Analyses on Native Machine Learning Pipelines. ACM SIGMOD.
Stefan Grafberger, Paul Groth, Sebastian Schelter (2022). Towards Data-Centric What-If Analysis for Native Machine Learning Pipelines. Data Management for End-to-End Machine Learning workshop at ACM SIGMOD.

License

This library is licensed under the Apache 2.0 License.
