This is my own work following the "Deploy Machine Learning Models with Django" tutorial by Piotr Płoński at deploymachinelearning.com.
The original tutorial briefly covers a large number of topics and has a few issues, so this version improves upon the original in a number of ways.
- Newer versions of various software packages are used over the ones that were available in 2019 when the original tutorial was written. Namely, Django 4.0.3 is used instead of the specified 2.2.4 from the tutorial. See `requirements.txt` for other software versions.
- This project does not create a `backend` directory for the `server` Django project, as it appears in the tutorial that the `backend` directory contains nothing other than the Django project itself.
- The Django apps `endpoints` and `ml` exist at the top of the `server` Django directory instead of creating a new `app` directory just to hold Django apps.
- The data training notebook `Data Training.ipynb` uses an `OrdinalEncoder` to encode categorical data instead of the `LabelEncoder` used in the tutorial. The sklearn docs for `LabelEncoder` explain why: "This transformer should be used to encode target values, i.e. `y`, and not the input `X`."
- The data training notebook `Data Training.ipynb` trains the `OrdinalEncoder` on the full set of inputs `X` instead of only using `X_train` as in the tutorial. This solves the issue that occurs when `X_test` contains unique values that are not also found in `X_train`: the encoder must be trained on the full set of possible values for all input features.
- The data training notebook `Data Training.ipynb` takes one additional step after training the algorithms to evaluate their accuracy with `sklearn.metrics.confusion_matrix`.
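
The encoder changes above can be illustrated with a minimal sketch. The toy rows, the manual split, and the label values are made up for illustration and are not taken from the notebook; only `OrdinalEncoder` and `confusion_matrix` come from scikit-learn.

```python
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy inputs: each row is [workclass, education].
X = [
    ["Private", "Bachelors"],
    ["Private", "Masters"],
    ["State-gov", "Bachelors"],
    ["Self-emp", "Doctorate"],
]
X_train, X_test = X[:3], X[3:]  # X_test holds categories absent from X_train

# Fitting on X_train alone: the default handle_unknown="error" setting
# makes transform() raise ValueError for categories it has never seen.
train_only_failed = False
try:
    OrdinalEncoder().fit(X_train).transform(X_test)
except ValueError:
    train_only_failed = True

# Fitting on the full X lets the encoder learn every category first.
encoder = OrdinalEncoder().fit(X)
X_test_enc = encoder.transform(X_test)

# Confusion matrix: rows are true labels, columns are predicted labels.
# These label lists are made-up values, not model output.
y_true = [0, 1, 0, 1]
y_pred = [0, 1, 1, 1]
matrix = confusion_matrix(y_true, y_pred)
```

Here the encoder only learns the vocabulary of each feature from `X`; the classifiers themselves would still be trained on the training split.
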
- A number of `CharField` attributes in `endpoints/models.py` were changed to `TextField`, which is more appropriate for strings of significant length, and `max_length` parameters were removed from `MLAlgorithm.description` and `MLAlgorithm.code`.
- `__str__` methods were written in various model classes to improve readability on the Django site admin.
- A `Meta` class was added to a number of models to improve readability in the generated pages.
- Docstrings were added in various places to improve understanding.
- Replaced hardcoding of relative paths with `pathlib.Path` in places such as `ml.income_classifier.random_forest`.
- Replaced hardcoding of categorical features in `ml.income_classifier` with `OrdinalEncoder.feature_names_in_` to allow for more dynamic processing.
- Instead of using a `RandomForestClassifier` from `ml.income_classifier` and creating a new class for each new type of classifier algorithm, a general `IncomeClassifier` class was created to hold algorithm data for different income classifier models.
- The `MLRegistry.endpoints` property was changed to `MLRegistry.__endpoint_algorithms` to indicate it should not be directly manipulated outside the class.
- A `get_algorithm` method was added to the `MLRegistry` which finds and returns algorithm objects from the registry. This method is then used in the `PredictView` to make requested predictions.
- A `__str__` method was added to the `MLRegistry` class which returns a string representation of the endpoint algorithms dictionary. This method is used to test the absence (len == 2, since an empty dictionary prints as `{}`) or presence (len > 2) of algorithms in the registry.
- The `MLRegistry` was reworked to strictly associate DB objects with `IncomeClassifier` instances. The `__init__` method instantiates an `IncomeClassifier` for each `MLAlgorithm` object in the database and adds it to the registry.
- `IncomeClassifier` now keeps track of which artifacts belong to which algorithm and no longer requires the file name upon instantiation.
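
The registry behaviour described above can be sketched roughly as follows. This is a simplified stand-in, not the project's actual class: it skips the database lookup, and the `add_algorithm` signature, the integer key, and the placeholder string are assumptions made only for illustration.

```python
class MLRegistry:
    """Simplified stand-in for the project's registry (no database access)."""

    def __init__(self):
        # The double underscore triggers Python name mangling, which
        # discourages direct access from outside the class.
        self.__endpoint_algorithms = {}

    def add_algorithm(self, algorithm_id, algorithm_object):
        # Hypothetical helper: the real registry is filled from DB objects.
        self.__endpoint_algorithms[algorithm_id] = algorithm_object

    def get_algorithm(self, algorithm_id):
        # Returns None when no algorithm is registered under this id.
        return self.__endpoint_algorithms.get(algorithm_id)

    def __str__(self):
        return str(self.__endpoint_algorithms)


registry = MLRegistry()
empty_repr = str(registry)  # an empty dict prints as "{}", so its length is 2

registry.add_algorithm(1, "random forest placeholder")
filled_repr = str(registry)  # longer than 2 once anything is registered
```

Outside the class the attribute is only reachable under its mangled name, `_MLRegistry__endpoint_algorithms`, which is what makes accidental manipulation unlikely.
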
- The `PredictView` now consistently expects JSON input from `request.data`. It uses `json.loads` on the data before sending it to the classifier for prediction.
- The `IncomeClassifier` now correctly fills in missing values in the input data with training mode values by using `DataFrame.fillna()` with the `inplace=True` parameter.
- The `MLRequest` model has a new field called `prediction` to use as an unambiguous record of the prediction label that can be compared to the `feedback` field to calculate accuracy.
- `MLRequest` objects use `json.loads` instead of `json.dumps` to record input data since it is already expected to be a JSON string.
- The `EndpointTests.test_predict_view` test case now dumps the test data dict to a JSON string with `json.dumps()` instead of posting the dict directly to the predict view.
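
The JSON handling described above amounts to a round trip between the test client and the view. The sketch below shows only that round trip with the standard library; the sample field names and values are hypothetical and no Django code is involved.

```python
import json

# Hypothetical sample input; the real test data has more features.
input_data = {"age": 37, "workclass": "Private", "hours-per-week": 40}

# What the test case posts: the dict serialized to a JSON string.
payload = json.dumps(input_data)

# What the view does before prediction: parse the string back into a dict.
received = json.loads(payload)
```

Because the payload is already a JSON string when it arrives, parsing it with `json.loads` recovers the original dict, whereas calling `json.dumps` on it again would wrap the whole string in another layer of quoting.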