The Institute for Ethical AI & ML

The state of Production ML in 2020



Alejandro Saucedo | a@ethical.institute

Twitter: @AxSaucedo

[NEXT]

The Institute for Ethical AI & ML

The state of Production ML in 2020


![portrait](images/aletechuk.png)
Alejandro Saucedo
Twitter: @AxSaucedo
    <br>
    Chief Scientist
    <br>
    <a style="color: cyan" href="http://e-x.io">The Institute for Ethical AI & ML</a
    <br>
    <br>
    <br>
    Engineering Director
    <br>
    <a style="color: cyan" href="#">Seldon Technologies</a>
    <br>
    <br>
    <hr>
    <br>
    Head of Solutions Eng. & Sci.
    <br>
    <a style="color: cyan" href="http://eigentech.com">Eigen Technologies</a>
    <br>
    <br>
    Software Engineer
    <br>
    <a style="color: cyan" href="#">Bloomberg LP.</a>


[NEXT]

classification_large

OSS ML Serving in k8s

classification_large

We're hiring: seldon.io

[NEXT]

The Institute for Ethical AI & Machine Learning

classification_large

[NEXT]

We are part of the Linux Foundation AI

classification_large

[NEXT]

Small data science projects

classification_large

Works relatively well

[NEXT]

However

As our data science requirements grow...

We face new issues

[NEXT]

Increasing complexity in flow of data

classification_large

[NEXT]

Each data scientist has their own set of tools

  • Some ♥ TensorFlow
  • Some ♥ R
  • Some ♥ Spark
![classification_large](images/mlibs.jpg)

### Some ♥ all of them

[NEXT]

Serving models becomes increasingly harder

classification_large

[NEXT]

When stuff goes wrong it's hard to trace back

classification_large

[NEXT]

As your technical functions grow...

classification_large

[NEXT]

So should your infrastructure

classification_large

[NEXT]

It's challenging

full_height

[NEXT]

Mapping the Ecosystem

[NEXT]

Principles today


  • Orchestration
  • Explainability
  • Reproducibility

[NEXT SECTION]

2.1 Model Orchestration

classification_large

Training & serving at scale

[NEXT]

Computational Resource allocation

Services with different computational requirements

With often complex computational graphs

We need to be able to allocate the right resources


### This is a hard problem

[NEXT]

Adding Governance/Compliance

classification_large

[NEXT]

Standardisation of metrics

classification_large

[NEXT]

Standardisation of errors

classification_large

[NEXT]

Complex Deployment Strategies

classification_large

[NEXT]

Hands on example using:

Seldon Core is an OSS library for machine learning orchestration and monitoring in production

[NEXT]

Basic Example:

Wrapping an income classifier Python model

classification_large
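
As a rough sketch of what the wrapping step looks like: Seldon Core's Python wrapper expects a class exposing a `predict` method. The class and artifact names below are illustrative, not the exact code from the example.

```python
# Minimal sketch of a Seldon Core Python model wrapper (names are illustrative).
# The s2i Python wrapper looks for a class whose predict() serves the request payload.
import joblib

class IncomeClassifier:
    def __init__(self):
        # Load a pre-trained scikit-learn income classifier baked into the image
        self.model = joblib.load("model.joblib")

    def predict(self, X, features_names=None):
        # X arrives as a numpy array; return class probabilities
        return self.model.predict_proba(X)
```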

[NEXT]

GitOps Strategies for ML

classification_large

[NEXT]

More advanced Example:

PyTorch Hub Deployment: https://bit.ly/pytorchseldon

classification_large
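
A hedged sketch of the same wrapper pattern applied to a PyTorch Hub model; the model choice and preprocessing are illustrative, not the exact code behind the link above.

```python
# Illustrative sketch: serving a PyTorch Hub model with the Seldon wrapper pattern.
import torch

class PyTorchHubClassifier:
    def __init__(self):
        # Pull a pre-trained model from PyTorch Hub (resnet18 chosen for illustration)
        self.model = torch.hub.load("pytorch/vision", "resnet18", pretrained=True)
        self.model.eval()

    def predict(self, X, features_names=None):
        # Assumes X is already a preprocessed batch of image arrays
        with torch.no_grad():
            return self.model(torch.tensor(X, dtype=torch.float32)).numpy()
```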

[NEXT]

Other libraries to watch

[NEXT]

KFServing

Serverless machine learning inference on Kubernetes, built on Knative

classification_large

[NEXT]

DeepDetect

Unifying multiple external machine learning libraries behind a single API

classification_large

[NEXT SECTION]

2.2 Explainability

Tackling "black box model" situations

classification_large

[NEXT]

Going beyond the algorithms

Explainability through tools, process and domain expertise.

classification_large

[Our talk on Explainability of Tensorflow Models]

[NEXT]

Data assessment


  • Class imbalances
  • Protected features
  • Correlations
  • Data representability
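
A minimal sketch of the first two checks above with pandas; the dataset and column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical income dataset with a binary "income" label and a "gender" column
df = pd.read_csv("income-data.csv")

# Class imbalance: how skewed is the target label?
print(df["income"].value_counts(normalize=True))

# Protected features & correlations: how do numeric features relate to each other
# (and to an encoded protected attribute)?
df["gender_encoded"] = df["gender"].astype("category").cat.codes
print(df.select_dtypes("number").corr())
```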

[NEXT]

Model assessment


  • Feature importance
  • Model specific methods
  • Domain knowledge abstraction
  • Model metrics analysis
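
For the feature-importance point above, a hedged sketch using scikit-learn's permutation importance; the toy dataset and model stand in for your own pipeline.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy dataset used purely for illustration; swap in your own model and data
data = load_breast_cancer()
X_train, X_val, y_train, y_val = train_test_split(data.data, data.target, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)

# Rank features by how much shuffling them degrades validation performance
ranked = sorted(zip(data.feature_names, result.importances_mean), key=lambda p: -p[1])
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```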

[NEXT]

Production monitoring

  • Evaluation of metrics
  • Manual human review
  • Monitoring of anomalies
  • Setting thresholds for divergence
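
A minimal sketch of the "thresholds for divergence" idea: compare the live distribution of a feature against its training distribution and alert when a KL-divergence estimate crosses a threshold. All numbers here are illustrative.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes the KL divergence KL(p || q)

def feature_drift(train_values, live_values, bins=20, threshold=0.1):
    # Bin both samples on the same edges so the distributions are comparable
    hist_train, edges = np.histogram(train_values, bins=bins, density=True)
    hist_live, _ = np.histogram(live_values, bins=edges, density=True)
    # Small epsilon avoids division by zero for empty bins
    kl = entropy(hist_train + 1e-9, hist_live + 1e-9)
    return kl, kl > threshold

# Illustrative check: live traffic shifted relative to training data
train = np.random.normal(0.0, 1.0, 10_000)
live = np.random.normal(0.5, 1.2, 1_000)
kl, drifted = feature_drift(train, live)
print(f"KL divergence: {kl:.3f}, drift alert: {drifted}")
```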

[NEXT]

Infrastructure level XAI Design patterns

classification_large

[NEXT]

Hands on example using:

Alibi is a library that contains production-level black box model explainability techniques

[NEXT]

Example

Deploying Explainer Modules: http://bit.ly/seldonexplainer

classification_large
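
As a rough sketch of what an Alibi explainer looks like in code (the classifier and data are toy assumptions; the linked example wires the same idea into a Seldon deployment):

```python
from alibi.explainers import AnchorTabular
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Toy model standing in for the deployed classifier
data = load_iris()
clf = RandomForestClassifier(random_state=0).fit(data.data, data.target)

# Anchor explanations: human-readable rules that "anchor" a single prediction
explainer = AnchorTabular(clf.predict, feature_names=list(data.feature_names))
explainer.fit(data.data)

explanation = explainer.explain(data.data[0])
print("Anchor:", " AND ".join(explanation.anchor))
print("Precision:", explanation.precision)
```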

[NEXT]

Other OSS libraries to watch

[NEXT]

ELI5

classification_large

[NEXT]

SHAP

Unifying multiple model explainability techniques

classification_large
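
A hedged sketch of SHAP's unified API on a toy tree model; the dataset and model are illustrative.

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Toy tree-based model; TreeExplainer is SHAP's fast path for tree ensembles
data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(data.data)

# Global view: which features drive predictions across the whole dataset
shap.summary_plot(shap_values, data.data, feature_names=data.feature_names)
```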

[NEXT]

XAI

Analyse datasets, evaluate models and monitor production

classification_large

[NEXT SECTION]

2.3 Reproducibility

classification_large

Model & data versioning

[NEXT]

Abstracting individual steps

classification_large

Data in


$ cat data-input.csv

>            Date    Open    High     Low   Close     Market Cap
> 1608 2013-04-28  135.30  135.98  132.10  134.21  1,500,520,000
> 1607 2013-04-29  134.44  147.49  134.00  144.54  1,491,160,000
> 1606 2013-04-30  144.00  146.93  134.05  139.00  1,597,780,000

Code / Config


$ cat feature-extractor.py

def open_norm_feature_extractor(df):
    feature = some_lib.get_open(df)
    return feature

Data out


$ cat data-output.csv

Open
0.57
0.59
0.47

[NEXT]

![classification_large](images/versioning.jpg)
## Going one level higher

We can abstract our entire pipeline and data flows

classification_large

[NEXT]

Hands on example using:

Kubeflow is a Cloud Native platform for reusable machine learning pipelines in Kubernetes

[NEXT]

Example

Reusable NLP Pipelines: https://bit.ly/seldon-kf-nlp

classification_large
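
A minimal sketch of how a pipeline is defined with the Kubeflow Pipelines SDK (kfp v1 style); the step images and arguments are hypothetical, not the ones from the linked NLP example.

```python
import kfp
from kfp import dsl

@dsl.pipeline(name="nlp-pipeline", description="Illustrative reusable NLP pipeline")
def nlp_pipeline(raw_data: str = "raw.csv"):
    # Each step runs as its own container, so steps can be reused across pipelines
    clean = dsl.ContainerOp(
        name="clean-text",
        image="myrepo/clean-text:0.1",          # hypothetical image
        arguments=["--input", raw_data, "--output", "clean.csv"],
    )
    train = dsl.ContainerOp(
        name="train-classifier",
        image="myrepo/train-classifier:0.1",    # hypothetical image
        arguments=["--input", "clean.csv"],
    )
    train.after(clean)                          # explicit ordering between steps

if __name__ == "__main__":
    # Compile to an Argo workflow that Kubeflow Pipelines can run
    kfp.compiler.Compiler().compile(nlp_pipeline, "nlp_pipeline.yaml")
```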

[NEXT]

Other OSS libraries to watch

[NEXT]

Data Version Control (DVC)

Add your data

dvc add images.zip

Track the data input, model output and code as a pipeline stage

dvc run -d images.zip -o model.p ./cnn.py

Add a remote storage location (here S3)

dvc remote add myrepo s3://mybucket

Push to the location specified

dvc push

Check it out at dvc.org

[NEXT]

MLFlow

classification_large

[NEXT]

Pachyderm

classification_large

[NEXT SECTION]

Much more content

🔍 Explainability 🔏 Privacy 📜 Versioning
🏁 Orchestration 🌀 FeaturEng 🤖 AutoML
📓 Notebooks 📊 Visualisation 🔠 NLP
🧡 ETL 🗞️ Storage 📑 FaaS
🗺️ Computation 📥 Serialisation 🎁 Compiler
💸 CommercialML 💰 CommercialETL

### Check it out & add more libraries

[NEXT]

The Institute for Ethical AI & ML

The state of Production ML in 2020


![portrait](images/aletechuk.png)
Alejandro Saucedo
Twitter: @AxSaucedo
    <br>
    Chief Scientist
    <br>
    <a style="color: cyan" href="http://e-x.io">The Institute for Ethical AI & ML</a
    <br>
    <br>
    <br>
    Engineering Director
    <br>
    <a style="color: cyan" href="#">Seldon Technologies</a>
    <br>
    <br>
    <hr>
    <br>
    Head of Solutions Eng. & Sci.
    <br>
    <a style="color: cyan" href="http://eigentech.com">Eigen Technologies</a>
    <br>
    <br>
    Software Engineer
    <br>
    <a style="color: cyan" href="#">Bloomberg LP.</a>
