From 3f01cbfaa0e414bf0b69a5a63ef7a312d95ec8e3 Mon Sep 17 00:00:00 2001 From: "Documenter.jl" Date: Wed, 13 Nov 2024 20:07:47 +0000 Subject: [PATCH] build based on 318a61e --- dev/.documenter-siteinfo.json | 2 +- dev/about/index.html | 2 +- .../architecture_search/README/index.html | 2 +- .../architecture_search/notebook/index.html | 4 +- .../comparison/README/index.html | 2 +- .../comparison/notebook/index.html | 2 +- .../composition/README/index.html | 2 +- .../composition/notebook/index.html | 2 +- .../early_stopping/README/index.html | 2 +- .../notebook/{1bc2ace8.svg => f1978e4e.svg} | 68 ++++++------- .../early_stopping/notebook/index.html | 2 +- .../hyperparameter_tuning/README/index.html | 2 +- .../notebook/{b81b7212.svg => be01b12a.svg} | 52 +++++----- .../hyperparameter_tuning/notebook/index.html | 2 +- .../incremental_training/README/index.html | 2 +- .../incremental_training/notebook/index.html | 6 +- .../live_training/README/index.html | 2 +- .../live_training/notebook/index.html | 6 +- dev/contributing/index.html | 2 +- dev/extended_examples/Boston/index.html | 2 +- dev/extended_examples/MNIST/README/index.html | 2 +- .../notebook/{ef67068c.svg => ad20afb5.svg} | 72 ++++++------- .../notebook/{dc63cda3.svg => d68b7b1e.svg} | 96 +++++++++--------- .../MNIST/notebook/index.html | 6 +- .../spam_detection/README/index.html | 2 +- .../spam_detection/notebook/index.html | 10 +- dev/index.html | 2 +- dev/interface/Builders/index.html | 4 +- dev/interface/Classification/index.html | 4 +- dev/interface/Custom Builders/index.html | 2 +- dev/interface/Image Classification/index.html | 2 +- .../Multitarget Regression/index.html | 2 +- dev/interface/Regression/index.html | 2 +- dev/interface/Summary/index.html | 2 +- dev/objects.inv | Bin 1786 -> 1786 bytes 35 files changed, 187 insertions(+), 187 deletions(-) rename dev/common_workflows/early_stopping/notebook/{1bc2ace8.svg => f1978e4e.svg} (86%) rename dev/common_workflows/hyperparameter_tuning/notebook/{b81b7212.svg => be01b12a.svg} (85%) rename dev/extended_examples/MNIST/notebook/{ef67068c.svg => ad20afb5.svg} (85%) rename dev/extended_examples/MNIST/notebook/{dc63cda3.svg => d68b7b1e.svg} (85%) diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 700e4d7..1ffa001 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.10.6","generation_timestamp":"2024-11-13T20:06:13","documenter_version":"1.8.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.10.6","generation_timestamp":"2024-11-13T20:07:43","documenter_version":"1.8.0"}} \ No newline at end of file diff --git a/dev/about/index.html b/dev/about/index.html index b7d40b0..5ffd775 100644 --- a/dev/about/index.html +++ b/dev/about/index.html @@ -1,2 +1,2 @@ -- · MLJFlux
+- · MLJFlux
diff --git a/dev/common_workflows/architecture_search/README/index.html b/dev/common_workflows/architecture_search/README/index.html index 2bcaf8d..6510817 100644 --- a/dev/common_workflows/architecture_search/README/index.html +++ b/dev/common_workflows/architecture_search/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/architecture_search/notebook/index.html b/dev/common_workflows/architecture_search/notebook/index.html index 229e568..37d8ff4 100644 --- a/dev/common_workflows/architecture_search/notebook/index.html +++ b/dev/common_workflows/architecture_search/notebook/index.html @@ -86,7 +86,7 @@ fit!(mach, verbosity = 0); fitted_params(mach).best_model
NeuralNetworkClassifier(
   builder = MLP(
-        hidden = (9, 37, 25), 
+        hidden = (9, 5, 37), 
         σ = NNlib.relu), 
   finaliser = NNlib.softmax, 
   optimiser = Adam(0.01, (0.9, 0.999), 1.0e-8), 
@@ -103,4 +103,4 @@
     mlp = [x[:model].builder for x in history],
     measurement = [x[:measurement][1] for x in history],
 )
-first(sort!(history_df, [order(:measurement)]), 10)
10×2 DataFrame
Row | mlp | measurement
 | MLP… | Float64
1 | MLP(hidden = (9, 37, 25), …) | 0.0623247
2 | MLP(hidden = (21, 45, 37), …) | 0.0713132
3 | MLP(hidden = (41, 17, 13), …) | 0.0857736
4 | MLP(hidden = (61, 21, 33), …) | 0.093103
5 | MLP(hidden = (61, 53, 21), …) | 0.0959491
6 | MLP(hidden = (21, 9, 13), …) | 0.0966514
7 | MLP(hidden = (45, 17, 21), …) | 0.0970939
8 | MLP(hidden = (25, 17, 29), …) | 0.0992405
9 | MLP(hidden = (29, 61, 17), …) | 0.100194
10 | MLP(hidden = (33, 57, 17), …) | 0.100631

This page was generated using Literate.jl.

+first(sort!(history_df, [order(:measurement)]), 10)
10×2 DataFrame
Row | mlp | measurement
 | MLP… | Float64
1 | MLP(hidden = (9, 5, 37), …) | 0.0702663
2 | MLP(hidden = (25, 9, 49), …) | 0.0867743
3 | MLP(hidden = (33, 9, 49), …) | 0.0892747
4 | MLP(hidden = (25, 45, 49), …) | 0.0894714
5 | MLP(hidden = (21, 9, 45), …) | 0.0905676
6 | MLP(hidden = (25, 17, 29), …) | 0.0992405
7 | MLP(hidden = (29, 9, 9), …) | 0.0995201
8 | MLP(hidden = (53, 9, 33), …) | 0.101136
9 | MLP(hidden = (57, 45, 37), …) | 0.101165
10 | MLP(hidden = (45, 49, 49), …) | 0.103885

This page was generated using Literate.jl.

diff --git a/dev/common_workflows/comparison/README/index.html b/dev/common_workflows/comparison/README/index.html index 4fa3e08..970574d 100644 --- a/dev/common_workflows/comparison/README/index.html +++ b/dev/common_workflows/comparison/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/comparison/notebook/index.html b/dev/common_workflows/comparison/notebook/index.html index 73b71b1..a0ba64c 100644 --- a/dev/common_workflows/comparison/notebook/index.html +++ b/dev/common_workflows/comparison/notebook/index.html @@ -54,4 +54,4 @@ mlp = [x[:model] for x in history], measurement = [x[:measurement][1] for x in history], ) -sort!(history_df, [order(:measurement)])
4×2 DataFrame
Row | mlp | measurement
 | Probabil… | Float64
1 | BayesianLDA(method = gevd, …) | 0.0610826
2 | NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …) | 0.0857014
3 | RandomForestClassifier(max_depth = -1, …) | 0.101502
4 | ProbabilisticTunedModel(model = XGBoostClassifier(test = 1, …), …) | 0.221056

This is Occam's razor in practice.


This page was generated using Literate.jl.

+sort!(history_df, [order(:measurement)])
4×2 DataFrame
Row | mlp | measurement
 | Probabil… | Float64
1 | BayesianLDA(method = gevd, …) | 0.0610826
2 | NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …) | 0.0857014
3 | RandomForestClassifier(max_depth = -1, …) | 0.103571
4 | ProbabilisticTunedModel(model = XGBoostClassifier(test = 1, …), …) | 0.221056

This is Occam's razor in practice.


This page was generated using Literate.jl.

diff --git a/dev/common_workflows/composition/README/index.html b/dev/common_workflows/composition/README/index.html index f0b15e8..fef5848 100644 --- a/dev/common_workflows/composition/README/index.html +++ b/dev/common_workflows/composition/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/composition/notebook/index.html b/dev/common_workflows/composition/notebook/index.html index b866293..6801a26 100644 --- a/dev/common_workflows/composition/notebook/index.html +++ b/dev/common_workflows/composition/notebook/index.html @@ -66,4 +66,4 @@ ├────────────────────────────┼─────────┤ │ [1.0, 1.0, 0.95, 1.0, 1.0] │ 0.0219 │ └────────────────────────────┴─────────┘ -

This page was generated using Literate.jl.

+

This page was generated using Literate.jl.

diff --git a/dev/common_workflows/early_stopping/README/index.html b/dev/common_workflows/early_stopping/README/index.html index a6c1aec..b655239 100644 --- a/dev/common_workflows/early_stopping/README/index.html +++ b/dev/common_workflows/early_stopping/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/early_stopping/notebook/1bc2ace8.svg b/dev/common_workflows/early_stopping/notebook/f1978e4e.svg similarity index 86% rename from dev/common_workflows/early_stopping/notebook/1bc2ace8.svg rename to dev/common_workflows/early_stopping/notebook/f1978e4e.svg index 72d607f..3edc618 100644 --- a/dev/common_workflows/early_stopping/notebook/1bc2ace8.svg +++ b/dev/common_workflows/early_stopping/notebook/f1978e4e.svg @@ -1,48 +1,48 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/common_workflows/early_stopping/notebook/index.html b/dev/common_workflows/early_stopping/notebook/index.html index 571b237..38bc6ed 100644 --- a/dev/common_workflows/early_stopping/notebook/index.html +++ b/dev/common_workflows/early_stopping/notebook/index.html @@ -57,4 +57,4 @@ [ Info: final training loss: 0.045833383 [ Info: Stop triggered by EarlyStopping.NumberLimit(100) stopping criterion. [ Info: Total of 100 iterations.

Results

We can see that the model converged after 100 iterations.

plot(training_losses, label="Training Loss", linewidth=2)
-plot!(validation_losses, label="Validation Loss", linewidth=2, size=(800,400))
Example block output

This page was generated using Literate.jl.

+plot!(validation_losses, label="Validation Loss", linewidth=2, size=(800,400))Example block output

This page was generated using Literate.jl.

diff --git a/dev/common_workflows/hyperparameter_tuning/README/index.html b/dev/common_workflows/hyperparameter_tuning/README/index.html index 8eb2ed2..cb34077 100644 --- a/dev/common_workflows/hyperparameter_tuning/README/index.html +++ b/dev/common_workflows/hyperparameter_tuning/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/hyperparameter_tuning/notebook/b81b7212.svg b/dev/common_workflows/hyperparameter_tuning/notebook/be01b12a.svg similarity index 85% rename from dev/common_workflows/hyperparameter_tuning/notebook/b81b7212.svg rename to dev/common_workflows/hyperparameter_tuning/notebook/be01b12a.svg index 99e7796..68ba729 100644 --- a/dev/common_workflows/hyperparameter_tuning/notebook/b81b7212.svg +++ b/dev/common_workflows/hyperparameter_tuning/notebook/be01b12a.svg @@ -1,40 +1,40 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/common_workflows/hyperparameter_tuning/notebook/index.html b/dev/common_workflows/hyperparameter_tuning/notebook/index.html index e4e5242..efd23e8 100644 --- a/dev/common_workflows/hyperparameter_tuning/notebook/index.html +++ b/dev/common_workflows/hyperparameter_tuning/notebook/index.html @@ -67,4 +67,4 @@ xlab=curve.parameter_name, xscale=curve.parameter_scale, ylab = "Cross Entropy", -)Example block output

This page was generated using Literate.jl.

+)Example block output

This page was generated using Literate.jl.

diff --git a/dev/common_workflows/incremental_training/README/index.html b/dev/common_workflows/incremental_training/README/index.html index b8aec5a..5e1dded 100644 --- a/dev/common_workflows/incremental_training/README/index.html +++ b/dev/common_workflows/incremental_training/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/incremental_training/notebook/index.html b/dev/common_workflows/incremental_training/notebook/index.html index 7b4ffd9..404d948 100644 --- a/dev/common_workflows/incremental_training/notebook/index.html +++ b/dev/common_workflows/incremental_training/notebook/index.html @@ -35,8 +35,8 @@ fit!(mach)
trained Machine; caches model-specific representations of data
   model: NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …)
   args: 
-    1:	Source @492 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
-    2:	Source @820 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}
+    1:	Source @484 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
+    2:	Source @180 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}
 

Let's evaluate the training loss and validation accuracy

training_loss = cross_entropy(predict(mach, X_train), y_train)
0.4392339631006042
val_acc = accuracy(predict_mode(mach, X_test), y_test)
0.9

Poor performance, it seems.

Incremental Training

Now let's train it for another 30 epochs at half the original learning rate. All we need to do is change these hyperparameters and call fit! again. This won't reset the model parameters before training.

clf.optimiser = Optimisers.Adam(clf.optimiser.eta/2)
 clf.epochs = clf.epochs + 30
 fit!(mach, verbosity=2);
[ Info: Updating machine(NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …), …).
@@ -69,4 +69,4 @@
 [ Info: Loss is 0.1353
 [ Info: Loss is 0.1251
 [ Info: Loss is 0.1173
-[ Info: Loss is 0.1102

Let's evaluate the training loss and validation accuracy

training_loss = cross_entropy(predict(mach, X_train), y_train)
0.10519664737051289
training_acc = accuracy(predict_mode(mach, X_test), y_test)
0.9666666666666667

That's much better. If we would rather reset the model parameters before fitting, we can call fit!(mach, force=true).


This page was generated using Literate.jl.

+[ Info: Loss is 0.1102

Let's evaluate the training loss and validation accuracy

training_loss = cross_entropy(predict(mach, X_train), y_train)
0.10519664737051289
training_acc = accuracy(predict_mode(mach, X_test), y_test)
0.9666666666666667

That's much better. If we would rather reset the model parameters before fitting, we can call fit!(mach, force=true).
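
For example, to discard the learned parameters and retrain from scratch for the currently set number of epochs, a minimal sketch continuing the machine above would be:

fit!(mach, force=true)
accuracy(predict_mode(mach, X_test), y_test)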


This page was generated using Literate.jl.

diff --git a/dev/common_workflows/live_training/README/index.html b/dev/common_workflows/live_training/README/index.html index 7d4f5a7..52d7ef8 100644 --- a/dev/common_workflows/live_training/README/index.html +++ b/dev/common_workflows/live_training/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/common_workflows/live_training/notebook/index.html b/dev/common_workflows/live_training/notebook/index.html index c7a8570..7a87155 100644 --- a/dev/common_workflows/live_training/notebook/index.html +++ b/dev/common_workflows/live_training/notebook/index.html @@ -78,6 +78,6 @@ fit!(mach, force=true)
trained Machine; does not cache data
   model: ProbabilisticIteratedModel(model = NeuralNetworkClassifier(builder = MLP(hidden = (5, 4), …), …), …)
   args: 
-    1:	Source @583 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
-    2:	Source @848 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}}
-

This page was generated using Literate.jl.

+ 1: Source @725 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}} + 2: Source @489 ⏎ AbstractVector{ScientificTypesBase.Multiclass{3}} +

This page was generated using Literate.jl.

diff --git a/dev/contributing/index.html b/dev/contributing/index.html index 03141ae..ca166b1 100644 --- a/dev/contributing/index.html +++ b/dev/contributing/index.html @@ -1,2 +1,2 @@ -Contributing · MLJFlux

Adding new models to MLJFlux

This section assumes familiarity with the MLJ model API

If one subtypes a new model type as either MLJFlux.MLJFluxProbabilistic or MLJFlux.MLJFluxDeterministic, then, instead of defining new methods for MLJModelInterface.fit and MLJModelInterface.update, one can make use of fallbacks by implementing the lower-level methods shape, build, and fitresult. See the classifier source code for an example.

One still needs to implement a new predict method.

+Contributing · MLJFlux

Adding new models to MLJFlux

This section assumes familiarity with the MLJ model API

If one subtypes a new model type as either MLJFlux.MLJFluxProbabilistic or MLJFlux.MLJFluxDeterministic, then, instead of defining new methods for MLJModelInterface.fit and MLJModelInterface.update, one can make use of fallbacks by implementing the lower-level methods shape, build, and fitresult. See the classifier source code for an example.

One still needs to implement a new predict method.
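
A rough, hypothetical sketch of this pattern (the type name and field list are invented for illustration, and the lower-level signatures shown in the comments are assumptions; consult the classifier source code linked above for the authoritative ones):

import MLJFlux
import MLJModelInterface

# Hypothetical model type opting into the probabilistic fit/update fallbacks:
mutable struct MyProbabilisticModel <: MLJFlux.MLJFluxProbabilistic
    builder          # e.g. MLJFlux.Short()
    optimiser        # e.g. Optimisers.Adam()
    loss             # e.g. Flux.crossentropy
    epochs::Int
    batch_size::Int
end

# The fallbacks are driven by lower-level methods implemented for this type,
# roughly (signatures are assumptions, not the authoritative API):
#   MLJFlux.shape(model, X, y)         -> data-dependent sizes, e.g. (n_in, n_out)
#   MLJFlux.build(model, rng, shape)   -> a Flux.Chain for those sizes
#   MLJFlux.fitresult(model, chain, y) -> whatever `predict` will need
#
# A `predict` method still has to be written by hand, e.g.
#   MLJModelInterface.predict(model::MyProbabilisticModel, fitresult, Xnew)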

diff --git a/dev/extended_examples/Boston/index.html b/dev/extended_examples/Boston/index.html index 1a1c0cd..891119b 100644 --- a/dev/extended_examples/Boston/index.html +++ b/dev/extended_examples/Boston/index.html @@ -1,2 +1,2 @@ -- · MLJFlux
+- · MLJFlux
diff --git a/dev/extended_examples/MNIST/README/index.html b/dev/extended_examples/MNIST/README/index.html index e2d7255..ac3cef1 100644 --- a/dev/extended_examples/MNIST/README/index.html +++ b/dev/extended_examples/MNIST/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/extended_examples/MNIST/notebook/ef67068c.svg b/dev/extended_examples/MNIST/notebook/ad20afb5.svg similarity index 85% rename from dev/extended_examples/MNIST/notebook/ef67068c.svg rename to dev/extended_examples/MNIST/notebook/ad20afb5.svg index 7c3016d..ad3b43f 100644 --- a/dev/extended_examples/MNIST/notebook/ef67068c.svg +++ b/dev/extended_examples/MNIST/notebook/ad20afb5.svg @@ -1,50 +1,50 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/extended_examples/MNIST/notebook/dc63cda3.svg b/dev/extended_examples/MNIST/notebook/d68b7b1e.svg similarity index 85% rename from dev/extended_examples/MNIST/notebook/dc63cda3.svg rename to dev/extended_examples/MNIST/notebook/d68b7b1e.svg index 2e33064..901e6d2 100644 --- a/dev/extended_examples/MNIST/notebook/dc63cda3.svg +++ b/dev/extended_examples/MNIST/notebook/d68b7b1e.svg @@ -1,62 +1,62 @@ - + - + - + - + - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dev/extended_examples/MNIST/notebook/index.html b/dev/extended_examples/MNIST/notebook/index.html index 90b4553..fc8a6fd 100644 --- a/dev/extended_examples/MNIST/notebook/index.html +++ b/dev/extended_examples/MNIST/notebook/index.html @@ -78,7 +78,7 @@ 0.055122323 0.057923194

Adding 20 more epochs:

clf.epochs = clf.epochs + 20
 fit!(mach, rows=train);
[ Info: Updating machine(ImageClassifier(builder = Main.MyConvBuilder(3, 16, 32, 32), …), …).
-
Optimising neural net:  10%[==>                      ]  ETA: 0:00:07
Optimising neural net:  14%[===>                     ]  ETA: 0:00:08
Optimising neural net:  19%[====>                    ]  ETA: 0:00:08
Optimising neural net:  24%[=====>                   ]  ETA: 0:00:08
Optimising neural net:  29%[=======>                 ]  ETA: 0:00:07
Optimising neural net:  33%[========>                ]  ETA: 0:00:07
Optimising neural net:  38%[=========>               ]  ETA: 0:00:06
Optimising neural net:  43%[==========>              ]  ETA: 0:00:06
Optimising neural net:  48%[===========>             ]  ETA: 0:00:05
Optimising neural net:  52%[=============>           ]  ETA: 0:00:05
Optimising neural net:  57%[==============>          ]  ETA: 0:00:05
Optimising neural net:  62%[===============>         ]  ETA: 0:00:04
Optimising neural net:  67%[================>        ]  ETA: 0:00:04
Optimising neural net:  71%[=================>       ]  ETA: 0:00:03
Optimising neural net:  76%[===================>     ]  ETA: 0:00:03
Optimising neural net:  81%[====================>    ]  ETA: 0:00:02
Optimising neural net:  86%[=====================>   ]  ETA: 0:00:02
Optimising neural net:  90%[======================>  ]  ETA: 0:00:01
Optimising neural net:  95%[=======================> ]  ETA: 0:00:01
Optimising neural net: 100%[=========================] Time: 0:00:10

Computing an out-of-sample estimate of the loss:

predicted_labels = predict(mach, rows=test);
+
Optimising neural net:  10%[==>                      ]  ETA: 0:00:08
Optimising neural net:  14%[===>                     ]  ETA: 0:00:11
Optimising neural net:  19%[====>                    ]  ETA: 0:00:11
Optimising neural net:  24%[=====>                   ]  ETA: 0:00:11
Optimising neural net:  29%[=======>                 ]  ETA: 0:00:11
Optimising neural net:  33%[========>                ]  ETA: 0:00:10
Optimising neural net:  38%[=========>               ]  ETA: 0:00:09
Optimising neural net:  43%[==========>              ]  ETA: 0:00:08
Optimising neural net:  48%[===========>             ]  ETA: 0:00:08
Optimising neural net:  52%[=============>           ]  ETA: 0:00:07
Optimising neural net:  57%[==============>          ]  ETA: 0:00:06
Optimising neural net:  62%[===============>         ]  ETA: 0:00:06
Optimising neural net:  67%[================>        ]  ETA: 0:00:05
Optimising neural net:  71%[=================>       ]  ETA: 0:00:04
Optimising neural net:  76%[===================>     ]  ETA: 0:00:03
Optimising neural net:  81%[====================>    ]  ETA: 0:00:03
Optimising neural net:  86%[=====================>   ]  ETA: 0:00:02
Optimising neural net:  90%[======================>  ]  ETA: 0:00:01
Optimising neural net:  95%[=======================> ]  ETA: 0:00:01
Optimising neural net: 100%[=========================] Time: 0:00:14

Computing an out-of-sample estimate of the loss:

predicted_labels = predict(mach, rows=test);
 cross_entropy(predicted_labels, labels[test])
0.4883231265583621

Or to fit and predict, in one line:

evaluate!(mach,
           resampling=Holdout(fraction_train=0.5),
           measure=cross_entropy,
@@ -183,7 +183,7 @@
     parameter_means2,
     title="Flux parameter mean weights",
     xlab = "epoch",
-)
Example block output

Note. The higher the number in the plot legend, the deeper the layer we are weight-averaging.

savefig(joinpath(tempdir(), "weights.png"))
"/tmp/weights.png"

Retrieving a snapshot for a prediction:

mach2 = machine(joinpath(tempdir(), "mnist3.jls"))
+)
Example block output

Note. The higher the number in the plot legend, the deeper the layer we are weight-averaging.

savefig(joinpath(tempdir(), "weights.png"))
"/tmp/weights.png"

Retrieving a snapshot for a prediction:

mach2 = machine(joinpath(tempdir(), "mnist3.jls"))
 predict_mode(mach2, images[501:503])
3-element CategoricalArrays.CategoricalArray{Int64,1,UInt32}:
  7
  9
@@ -197,4 +197,4 @@
     ylab = "cross entropy",
     label="out-of-sample",
 )
-plot!(epochs, training_losses, label="training")
Example block output

This page was generated using Literate.jl.

+plot!(epochs, training_losses, label="training")Example block output

This page was generated using Literate.jl.

diff --git a/dev/extended_examples/spam_detection/README/index.html b/dev/extended_examples/spam_detection/README/index.html index 7c515b9..bc32b38 100644 --- a/dev/extended_examples/spam_detection/README/index.html +++ b/dev/extended_examples/spam_detection/README/index.html @@ -1,2 +1,2 @@ -Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

+Contents · MLJFlux

Contents

file | description
notebook.ipynb | Jupyter notebook (executed)
notebook.unexecuted.ipynb | Jupyter notebook (unexecuted)
notebook.md | static markdown (included in MLJFlux.jl docs)
notebook.jl | executable Julia script annotated with comments
generate.jl | maintainers only: execute to generate first 3 from 4th

Important

Scripts or notebooks in this folder cannot be reliably executed without the accompanying Manifest.toml and Project.toml files.

diff --git a/dev/extended_examples/spam_detection/notebook/index.html b/dev/extended_examples/spam_detection/notebook/index.html index 4c129b3..7867937 100644 --- a/dev/extended_examples/spam_detection/notebook/index.html +++ b/dev/extended_examples/spam_detection/notebook/index.html @@ -93,13 +93,13 @@ mach = machine(clf, x_train_processed_equalized_fixed, y_train)
untrained Machine; caches model-specific representations of data
   model: NeuralNetworkClassifier(builder = GenericBuilder(apply = #15), …)
   args: 
-    1:	Source @120 ⏎ AbstractMatrix{ScientificTypesBase.Continuous}
-    2:	Source @800 ⏎ AbstractVector{ScientificTypesBase.Multiclass{2}}
+    1:	Source @931 ⏎ AbstractMatrix{ScientificTypesBase.Continuous}
+    2:	Source @417 ⏎ AbstractVector{ScientificTypesBase.Multiclass{2}}
 

Train the Model

fit!(mach)
trained Machine; caches model-specific representations of data
   model: NeuralNetworkClassifier(builder = GenericBuilder(apply = #15), …)
   args: 
-    1:	Source @120 ⏎ AbstractMatrix{ScientificTypesBase.Continuous}
-    2:	Source @800 ⏎ AbstractVector{ScientificTypesBase.Multiclass{2}}
+    1:	Source @931 ⏎ AbstractMatrix{ScientificTypesBase.Continuous}
+    2:	Source @417 ⏎ AbstractVector{ScientificTypesBase.Multiclass{2}}
 

Evaluate the Model

ŷ = predict_mode(mach, x_val_processed_equalized_fixed)
 balanced_accuracy(ŷ, y_val)
0.8840999384477648

Acceptable performance. Let's see some live examples:

using Random: Random;
 Random.seed!(99);
@@ -112,4 +112,4 @@
 z_encoded_equalized_fixed = coerce(z_encoded_equalized_fixed, Continuous)
 z_pred = predict_mode(mach, z_encoded_equalized_fixed)
 
-print("SMS: `$(z)` and the prediction is `$(z_pred)`")
SMS: `Hi elaine, is today's meeting confirmed?` and the prediction is `CategoricalArrays.CategoricalValue{InlineStrings.String7, UInt32}[InlineStrings.String7("ham")]`

This page was generated using Literate.jl.

+print("SMS: `$(z)` and the prediction is `$(z_pred)`")
SMS: `Hi elaine, is today's meeting confirmed?` and the prediction is `CategoricalArrays.CategoricalValue{InlineStrings.String7, UInt32}[InlineStrings.String7("ham")]`

This page was generated using Literate.jl.

diff --git a/dev/index.html b/dev/index.html index 5079321..bc0ffba 100644 --- a/dev/index.html +++ b/dev/index.html @@ -40,4 +40,4 @@ ├─────────────────────────────┼─────────┤ │ [1.0, 1.0, 0.967, 0.9, 1.0] │ 0.0426 │ └─────────────────────────────┴─────────┘ -

As you can see, we are able to use MLJ meta-functionality (e.g., cross-validation) with a Flux deep learning model. All arguments provided have defaults.

Notice that we are also able to define the neural network in a high-level fashion by only specifying the number of neurons in each hidden layer and the activation function. Meanwhile, MLJFlux is able to infer the input and output layer as well as use a suitable default for the loss function and output activation given the classification task. Notice as well that we did not need to manually implement a training or prediction loop.

Basic idea: "builders" for data-dependent architecture

As in the example above, any MLJFlux model has a builder hyperparameter, an object encoding instructions for creating a neural network given the data that the model eventually sees (e.g., the number of classes in a classification problem). While each MLJ model has a simple default builder, users may need to define custom builders to get optimal results (see Defining Custom Builders), and this will require familiarity with the Flux API for defining a neural network chain.

Flux or MLJFlux?

Flux is a deep learning framework in Julia that comes with everything you need to build deep learning models (e.g., GPU support, automatic differentiation, layers, activations, losses, optimizers, etc.). MLJFlux wraps models built with Flux, providing a higher-level interface for building and training such models. More importantly, it empowers Flux models by extending their support to many common machine learning workflows that are possible via MLJ, such as:

  • Estimating performance of your model using a holdout set or other resampling strategy (e.g., cross-validation) as measured by one or more metrics (e.g., loss functions) that may not have been used in training

  • Optimizing hyper-parameters such as a regularization parameter (e.g., dropout) or the width/height/number of channels of a convolution layer

  • Composing with other models, such as introducing data pre-processing steps (e.g., missing data imputation) into a pipeline. It might make sense to include non-deep learning models in this pipeline. Other kinds of model composition include blending the predictions of a deep learner with those of some other kind of model (as in “model stacking”). Models composed with MLJ can also be tuned as a single unit.

  • Controlling iteration by adding an early stopping criterion based on an out-of-sample estimate of the loss, dynamically changing the learning rate (e.g., cyclic learning rates), periodically saving snapshots of the model, or generating live plots of sample weights to judge training progress (as in TensorBoard)

  • Comparing your model with non-deep learning models

A comparable project, FastAI/FluxTraining, also provides a high-level interface for interacting with Flux models and supports a set of features that may overlap with (but not include all of) those supported by MLJFlux.

Many of the features mentioned above are showcased in the workflow examples that you can access from the sidebar.

+

As you can see, we are able to use MLJ meta-functionality (e.g., cross-validation) with a Flux deep learning model. All arguments provided have defaults.

Notice that we are also able to define the neural network in a high-level fashion by only specifying the number of neurons in each hidden layer and the activation function. Meanwhile, MLJFlux is able to infer the input and output layer as well as use a suitable default for the loss function and output activation given the classification task. Notice as well that we did not need to manually implement a training or prediction loop.

Basic idea: "builders" for data-dependent architecture

As in the example above, any MLJFlux model has a builder hyperparameter, an object encoding instructions for creating a neural network given the data that the model eventually sees (e.g., the number of classes in a classification problem). While each MLJ model has a simple default builder, users may need to define custom builders to get optimal results (see Defining Custom Builders), and this will require familiarity with the Flux API for defining a neural network chain.
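
For example, a custom builder can be as small as the following sketch, which uses the MLJFlux.@builder macro documented under the Builders section of the interface docs (the layer sizes and activation here are arbitrary):

using MLJFlux, Flux

# n_in and n_out are substituted from the data when the model is eventually fit:
builder = MLJFlux.@builder Chain(Dense(n_in, 64, relu), Dense(64, n_out))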

Flux or MLJFlux?

Flux is a deep learning framework in Julia that comes with everything you need to build deep learning models (e.g., GPU support, automatic differentiation, layers, activations, losses, optimizers, etc.). MLJFlux wraps models built with Flux, providing a higher-level interface for building and training such models. More importantly, it empowers Flux models by extending their support to many common machine learning workflows that are possible via MLJ, such as the following (a minimal tuning sketch appears after the list):

  • Estimating performance of your model using a holdout set or other resampling strategy (e.g., cross-validation) as measured by one or more metrics (e.g., loss functions) that may not have been used in training

  • Optimizing hyper-parameters such as a regularization parameter (e.g., dropout) or the width/height/number of channels of a convolution layer

  • Composing with other models, such as introducing data pre-processing steps (e.g., missing data imputation) into a pipeline. It might make sense to include non-deep learning models in this pipeline. Other kinds of model composition include blending the predictions of a deep learner with those of some other kind of model (as in “model stacking”). Models composed with MLJ can also be tuned as a single unit.

  • Controlling iteration by adding an early stopping criterion based on an out-of-sample estimate of the loss, dynamically changing the learning rate (e.g., cyclic learning rates), periodically saving snapshots of the model, or generating live plots of sample weights to judge training progress (as in TensorBoard)

  • Comparing your model with non-deep learning models
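
For instance, here is a minimal sketch of the hyper-parameter optimization workflow listed above, wrapping an MLJFlux classifier in MLJ's TunedModel; the tuned hyper-parameter, grid, resampling strategy, and dataset are arbitrary choices for illustration:

using MLJ, MLJFlux

clf = NeuralNetworkClassifier(builder=MLJFlux.MLP(hidden=(16,)), epochs=20)

# tune the optimiser's learning rate on a log-spaced grid
r = range(clf, :(optimiser.eta), lower=1e-4, upper=1e-1, scale=:log)
tuned_clf = TunedModel(
    model=clf,
    range=r,
    tuning=Grid(goal=6),
    resampling=CV(nfolds=3),
    measure=cross_entropy,
)

X, y = @load_iris   # any table of Continuous features with a categorical target
mach = machine(tuned_clf, X, y) |> fit!
fitted_params(mach).best_model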

A comparable project, FastAI/FluxTraining, also provides a high-level interface for interacting with Flux models and supports a set of features that may overlap with (but not include all of) those supported by MLJFlux.

Many of the features mentioned above are showcased in the workflow examples that you can access from the sidebar.

diff --git a/dev/interface/Builders/index.html b/dev/interface/Builders/index.html index e545b01..7d50c28 100644 --- a/dev/interface/Builders/index.html +++ b/dev/interface/Builders/index.html @@ -1,5 +1,5 @@ -Builders · MLJFlux
MLJFlux.LinearType
Linear(; σ=Flux.relu)

MLJFlux builder that constructs a fully connected two layer network with activation function σ. The number of input and output nodes is determined from the data. Weights are initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.ShortType
Short(; n_hidden=0, dropout=0.5, σ=Flux.sigmoid)

MLJFlux builder that constructs a fully connected three-layer network using n_hidden nodes in the hidden layer and the specified dropout (defaulting to 0.5). An activation function σ is applied between the hidden and final layers. If n_hidden=0 (the default) then n_hidden is the geometric mean of the number of input and output nodes. The number of input and output nodes is determined from the data.

Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.MLPType
MLP(; hidden=(100,), σ=Flux.relu)

MLJFlux builder that constructs a Multi-layer perceptron network. The ith element of hidden represents the number of neurons in the ith hidden layer. An activation function σ is applied between each layer.

Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.@builderMacro
@builder neural_net

Creates a builder for neural_net. The variables rng, n_in, n_out and n_channels can be used to create builders for any random number generator rng, input and output sizes n_in and n_out and number of input channels n_channels.

Examples

julia> import MLJFlux: @builder;
+Builders · MLJFlux
MLJFlux.LinearType
Linear(; σ=Flux.relu)

MLJFlux builder that constructs a fully connected two layer network with activation function σ. The number of input and output nodes is determined from the data. Weights are initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.ShortType
Short(; n_hidden=0, dropout=0.5, σ=Flux.sigmoid)

MLJFlux builder that constructs a fully connected three-layer network using n_hidden nodes in the hidden layer and the specified dropout (defaulting to 0.5). An activation function σ is applied between the hidden and final layers. If n_hidden=0 (the default) then n_hidden is the geometric mean of the number of input and output nodes. The number of input and output nodes is determined from the data.

Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
MLJFlux.MLPType
MLP(; hidden=(100,), σ=Flux.relu)

MLJFlux builder that constructs a Multi-layer perceptron network. The ith element of hidden represents the number of neurons in the ith hidden layer. An activation function σ is applied between each layer.

Each layer is initialized using Flux.glorot_uniform(rng), where rng is inferred from the rng field of the MLJFlux model.

source
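
A brief usage sketch for the builders above (the builder choice, layer sizes, activation, and epochs are arbitrary):

using MLJFlux

# three hidden layers; input and output sizes are inferred from the data at fit time
builder = MLJFlux.MLP(hidden=(32, 16, 8), σ=tanh)
clf = NeuralNetworkClassifier(builder=builder, epochs=30)
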
MLJFlux.@builderMacro
@builder neural_net

Creates a builder for neural_net. The variables rng, n_in, n_out and n_channels can be used to create builders for any random number generator rng, input and output sizes n_in and n_out and number of input channels n_channels.

Examples

julia> import MLJFlux: @builder;
 
 julia> nn = NeuralNetworkRegressor(builder = @builder(Chain(Dense(n_in, 64, relu),
                                                             Dense(64, 32, relu),
@@ -11,4 +11,4 @@
            Chain(front, Dense(d, n_out));
        end
 
-julia> conv_nn = NeuralNetworkRegressor(builder = conv_builder);
source
+julia> conv_nn = NeuralNetworkRegressor(builder = conv_builder);
source
diff --git a/dev/interface/Classification/index.html b/dev/interface/Classification/index.html index 542f2a2..6549c8d 100644 --- a/dev/interface/Classification/index.html +++ b/dev/interface/Classification/index.html @@ -20,7 +20,7 @@ xlab=curve.parameter_name, xscale=curve.parameter_scale, ylab = "Cross Entropy") -

See also ImageClassifier, NeuralNetworkBinaryClassifier.

source
MLJFlux.NeuralNetworkBinaryClassifierType
NeuralNetworkBinaryClassifier

A model type for constructing a neural network binary classifier, based on MLJFlux.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

NeuralNetworkBinaryClassifier = @load NeuralNetworkBinaryClassifier pkg=MLJFlux

Do model = NeuralNetworkBinaryClassifier() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in NeuralNetworkBinaryClassifier(builder=...).

NeuralNetworkBinaryClassifier is for training a data-dependent Flux.jl neural network for making probabilistic predictions of a binary (Multiclass{2} or OrderedFactor{2}) target, given a table of Continuous features. Users provide a recipe for constructing the network, based on properties of the data that is encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.

In addition to features with Continuous scientific element type, this model supports categorical features in the input table. If present, such features are embedded into dense vectors by the use of an additional EntityEmbedder layer after the input, as described in Entity Embeddings of Categorical Variables by Cheng Guo, Felix Berkhahn arXiv, 2016.

Training data

In MLJ or MLJBase, bind an instance model to data with

mach = machine(model, X, y)

Here:

  • X provides input features and is either: (i) a Matrix with Continuous element scitype (typically Float32); or (ii) a table of input features (eg, a DataFrame) whose columns have Continuous, Multiclass or OrderedFactor element scitype; check column scitypes with schema(X). If any Multiclass or OrderedFactor features appear, the constructed network will use an EntityEmbedder layer to transform them into dense vectors. If X is a Matrix, it is assumed that columns correspond to features and rows to observations.
  • y is the target, which can be any AbstractVector whose element scitype is Multiclass{2} or OrderedFactor{2}; check the scitype with scitype(y)

Train the machine with fit!(mach, rows=...).

Hyper-parameters

  • builder=MLJFlux.Short(): An MLJFlux builder that constructs a neural network. Possible builders include: MLJFlux.Linear, MLJFlux.Short, and MLJFlux.MLP. See MLJFlux.jl documentation for examples of user-defined builders. See also finaliser below.

  • optimiser::Flux.Adam(): A Flux.Optimise optimiser. The optimiser performs the updating of the weights of the network. For further reference, see the Flux optimiser documentation. To choose a learning rate (the update rate of the optimizer), a good rule of thumb is to start out at 10e-3, and tune using powers of 10 between 1 and 1e-7.

  • loss=Flux.binarycrossentropy: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a classification task, the most natural loss functions are:

    • Flux.binarycrossentropy: Standard binary classification loss, also known as the log loss.

    • Flux.logitbinarycrossentropy: Mathematically equal to crossentropy, but numerically more stable than finalising the outputs with σ and then calculating crossentropy. You will need to specify finaliser=identity to remove MLJFlux's default sigmoid finaliser, and understand that the output of predict is then unnormalized (no longer probabilistic).

    • Flux.tversky_loss: Used with imbalanced data to give more weight to false negatives.

    • Flux.binary_focal_loss: Used with highly imbalanced data. Weights harder examples more than easier examples.

    Currently MLJ measures are not supported values of loss.

  • epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.

  • batch_size::int=1: the batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 512. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.

  • lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞).

  • alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.

  • rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training.

  • optimizer_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on fit! call, otherwise it will not.

  • acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For Training on GPU, use CUDALibs().

  • finaliser=Flux.σ: The final activation function of the neural network (applied after the network defined by builder). Defaults to Flux.σ.

  • embedding_dims: a Dict whose keys are names of categorical features, given as symbols, and whose values are numbers representing the desired dimensionality of the entity embeddings of such features: an integer value of 7, say, sets the embedding dimensionality to 7; a float value of 0.5, say, sets the embedding dimensionality to ceil(0.5 * c), where c is the number of feature levels. Unspecified feature dimensionality defaults to min(c - 1, 10).

Operations

  • predict(mach, Xnew): return predictions of the target given new features Xnew, which should have the same scitype as X above. Predictions are probabilistic but uncalibrated.

  • predict_mode(mach, Xnew): Return the modes of the probabilistic predictions returned above.

  • transform(mach, Xnew): Assuming Xnew has the same schema as X, transform the categorical features of Xnew into dense Continuous vectors using the MLJFlux.EntityEmbedder layer present in the network. Does nothing in case the model was trained on an input X that lacks categorical features.

Fitted parameters

The fields of fitted_params(mach) are:

  • chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network. This includes the final layer specified by finaliser (eg, softmax).

Report

The fields of report(mach) are:

  • training_losses: A vector of training losses (penalised if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.

Examples

In this example we build a classification model using the Iris dataset. This is a very basic example, using a default builder and no standardization. For a more advanced illustration, see NeuralNetworkRegressor or ImageClassifier, and examples in the MLJFlux.jl documentation.

using MLJ, Flux
+

See also ImageClassifier, NeuralNetworkBinaryClassifier.

source
MLJFlux.NeuralNetworkBinaryClassifierType
NeuralNetworkBinaryClassifier

A model type for constructing a neural network binary classifier, based on MLJFlux.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

NeuralNetworkBinaryClassifier = @load NeuralNetworkBinaryClassifier pkg=MLJFlux

Do model = NeuralNetworkBinaryClassifier() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in NeuralNetworkBinaryClassifier(builder=...).

NeuralNetworkBinaryClassifier is for training a data-dependent Flux.jl neural network for making probabilistic predictions of a binary (Multiclass{2} or OrderedFactor{2}) target, given a table of Continuous features. Users provide a recipe for constructing the network, based on properties of the data that is encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.

In addition to features with Continuous scientific element type, this model supports categorical features in the input table. If present, such features are embedded into dense vectors by the use of an additional EntityEmbedder layer after the input, as described in Entity Embeddings of Categorical Variables by Cheng Guo, Felix Berkhahn arXiv, 2016.

Training data

In MLJ or MLJBase, bind an instance model to data with

mach = machine(model, X, y)

Here:

  • X provides input features and is either: (i) a Matrix with Continuous element scitype (typically Float32); or (ii) a table of input features (eg, a DataFrame) whose columns have Continuous, Multiclass or OrderedFactor element scitype; check column scitypes with schema(X). If any Multiclass or OrderedFactor features appear, the constructed network will use an EntityEmbedder layer to transform them into dense vectors. If X is a Matrix, it is assumed that columns correspond to features and rows to observations.
  • y is the target, which can be any AbstractVector whose element scitype is Multiclass{2} or OrderedFactor{2}; check the scitype with scitype(y)

Train the machine with fit!(mach, rows=...).

Hyper-parameters

  • builder=MLJFlux.Short(): An MLJFlux builder that constructs a neural network. Possible builders include: MLJFlux.Linear, MLJFlux.Short, and MLJFlux.MLP. See MLJFlux.jl documentation for examples of user-defined builders. See also finaliser below.

  • optimiser::Flux.Adam(): A Flux.Optimise optimiser. The optimiser performs the updating of the weights of the network. For further reference, see the Flux optimiser documentation. To choose a learning rate (the update rate of the optimizer), a good rule of thumb is to start out at 10e-3, and tune using powers of 10 between 1 and 1e-7.

  • loss=Flux.binarycrossentropy: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a classification task, the most natural loss functions are:

    • Flux.binarycrossentropy: Standard binary classification loss, also known as the log loss.

    • Flux.logitbinarycrossentropy: Mathematically equal to crossentropy, but numerically more stable than finalising the outputs with σ and then calculating crossentropy. You will need to specify finaliser=identity to remove MLJFlux's default sigmoid finaliser, and understand that the output of predict is then unnormalized (no longer probabilistic).

    • Flux.tversky_loss: Used with imbalanced data to give more weight to false negatives.

    • Flux.binary_focal_loss: Used with highly imbalanced data. Weights harder examples more than easier examples.

    Currently MLJ measures are not supported values of loss.

  • epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.

  • batch_size::int=1: the batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 512. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.

  • lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞).

  • alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.

  • rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training.

  • optimizer_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on fit! call, otherwise it will not.

  • acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For Training on GPU, use CUDALibs().

  • finaliser=Flux.σ: The final activation function of the neural network (applied after the network defined by builder). Defaults to Flux.σ.

  • embedding_dims: a Dict whose keys are names of categorical features, given as symbols, and whose values are numbers representing the desired dimensionality of the entity embeddings of such features: an integer value of 7, say, sets the embedding dimensionality to 7; a float value of 0.5, say, sets the embedding dimensionality to ceil(0.5 * c), where c is the number of feature levels. Unspecified feature dimensionality defaults to min(c - 1, 10).
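
For instance, the Flux.logitbinarycrossentropy option described under loss above is combined with finaliser=identity roughly as follows (a sketch, not part of the docstring; the builder and epochs are arbitrary):

using MLJFlux, Flux

clf = NeuralNetworkBinaryClassifier(
    builder = MLJFlux.MLP(hidden=(16,)),
    loss = Flux.logitbinarycrossentropy,
    finaliser = identity,   # predict then returns raw scores, not probabilities
    epochs = 20,
)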

Operations

  • predict(mach, Xnew): return predictions of the target given new features Xnew, which should have the same scitype as X above. Predictions are probabilistic but uncalibrated.

  • predict_mode(mach, Xnew): Return the modes of the probabilistic predictions returned above.

  • transform(mach, Xnew): Assuming Xnew has the same schema as X, transform the categorical features of Xnew into dense Continuous vectors using the MLJFlux.EntityEmbedder layer present in the network. Does nothing in case the model was trained on an input X that lacks categorical features.

Fitted parameters

The fields of fitted_params(mach) are:

  • chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network. This includes the final layer specified by finaliser (eg, softmax).

Report

The fields of report(mach) are:

  • training_losses: A vector of training losses (penalised if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.

Examples

In this example we build a binary classification model using the mtcars dataset, predicting the binary variable VS. This is a very basic example, using a default builder and no standardization. For a more advanced illustration, see NeuralNetworkRegressor or ImageClassifier, and examples in the MLJFlux.jl documentation.

using MLJ, Flux
 import Optimisers
 import RDatasets

First, we can load the data:

mtcars = RDatasets.dataset("datasets", "mtcars");
 y, X = unpack(mtcars, ==(:VS), in([:MPG, :Cyl, :Disp, :HP, :WT, :QSec]));

Note that y is a vector and X a table.

y = categorical(y) # classifier takes categorical input
@@ -48,4 +48,4 @@
    xscale=curve.parameter_scale,
    ylab = "Cross Entropy",
 )
-

See also ImageClassifier.

source
+

See also ImageClassifier.

source diff --git a/dev/interface/Custom Builders/index.html b/dev/interface/Custom Builders/index.html index 7609743..3c1e1a4 100644 --- a/dev/interface/Custom Builders/index.html +++ b/dev/interface/Custom Builders/index.html @@ -12,4 +12,4 @@ Dense(nn.n2, n_out, init=init), ) end

Note here that n_in and n_out depend on the size of the data (see Table 1).

For a concrete image classification example, see Using MLJ to classify the MNIST image dataset.

More generally, defining a new builder means defining a new struct sub-typing MLJFlux.Builder and defining a new MLJFlux.build method with one of these signatures:

MLJFlux.build(builder::MyBuilder, rng, n_in, n_out)
-MLJFlux.build(builder::MyBuilder, rng, n_in, n_out, n_channels) # for use with `ImageClassifier`

This method must return a Flux.Chain instance, chain, subject to the following conditions:

  • chain(x) must make sense:

    • for any x <: Array{<:AbstractFloat, 2} of size (n_in, batch_size) where batch_size is any integer (for all models except ImageClassifier); or
    • for any x <: Array{<:Float32, 4} of size (W, H, n_channels, batch_size), where (W, H) = n_in, n_channels is 1 or 3, and batch_size is any integer (for use with ImageClassifier)
  • The object returned by chain(x) must be an AbstractFloat vector of length n_out.

Alternatively, use MLJFlux.@builder(neural_net) to automatically create a builder for any valid Flux chain expression neural_net, where the symbols n_in, n_out, n_channels and rng can appear literally, with the interpretations explained above. For example,

builder = MLJFlux.@builder Chain(Dense(n_in, 128), Dense(128, n_out, tanh))
+MLJFlux.build(builder::MyBuilder, rng, n_in, n_out, n_channels) # for use with `ImageClassifier`

This method must return a Flux.Chain instance, chain, subject to the following conditions:

  • chain(x) must make sense:

    • for any x <: Array{<:AbstractFloat, 2} of size (n_in, batch_size) where batch_size is any integer (for all models except ImageClassifier); or
    • for any x <: Array{<:Float32, 4} of size (W, H, n_channels, batch_size), where (W, H) = n_in, n_channels is 1 or 3, and batch_size is any integer (for use with ImageClassifier)
  • The object returned by chain(x) must be an AbstractFloat vector of length n_out.

Alternatively, use MLJFlux.@builder(neural_net) to automatically create a builder for any valid Flux chain expression neural_net, where the symbols n_in, n_out, n_channels and rng can appear literally, with the interpretations explained above. For example,

builder = MLJFlux.@builder Chain(Dense(n_in, 128), Dense(128, n_out, tanh))
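
To make the pattern concrete, here is a minimal sketch of a struct-based builder satisfying the above requirements; the name MyDropoutBuilder and its fields are illustrative only, not part of MLJFlux:

import MLJFlux
import Flux

mutable struct MyDropoutBuilder <: MLJFlux.Builder
    n_hidden::Int
    dropout::Float64
end

function MLJFlux.build(builder::MyDropoutBuilder, rng, n_in, n_out)
    init = Flux.glorot_uniform(rng)
    return Flux.Chain(
        Flux.Dense(n_in, builder.n_hidden, Flux.relu, init=init),
        Flux.Dropout(builder.dropout),
        Flux.Dense(builder.n_hidden, n_out, init=init),
    )
end

An instance such as MyDropoutBuilder(64, 0.3) can then be passed as the builder hyperparameter of any non-image MLJFlux model.
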
diff --git a/dev/interface/Image Classification/index.html b/dev/interface/Image Classification/index.html index 503396c..b1ec341 100644 --- a/dev/interface/Image Classification/index.html +++ b/dev/interface/Image Classification/index.html @@ -46,4 +46,4 @@ resampling=Holdout(fraction_train=0.5), measure=cross_entropy, rows=1:1000, - verbosity=0)

See also NeuralNetworkClassifier.

source + verbosity=0)

See also NeuralNetworkClassifier.

source diff --git a/dev/interface/Multitarget Regression/index.html b/dev/interface/Multitarget Regression/index.html index 84fb61c..522b601 100644 --- a/dev/interface/Multitarget Regression/index.html +++ b/dev/interface/Multitarget Regression/index.html @@ -23,4 +23,4 @@ # loss for `(Xtest, test)`: fit!(mach) # trains on all data `(X, y)` yhat = predict(mach, Xtest) -multitarget_l2(yhat, ytest)

See also NeuralNetworkRegressor

source +multitarget_l2(yhat, ytest)

See also NeuralNetworkRegressor

source diff --git a/dev/interface/Regression/index.html b/dev/interface/Regression/index.html index ae6d71c..71a30a5 100644 --- a/dev/interface/Regression/index.html +++ b/dev/interface/Regression/index.html @@ -43,4 +43,4 @@ # loss for `(Xtest, test)`: fit!(mach) # train on `(X, y)` yhat = predict(mach, Xtest) -l2(yhat, ytest)

These losses, for the pipeline model, refer to the target on the original, unstandardized, scale.

For implementing stopping criteria and other iteration controls, refer to examples linked from the MLJFlux documentation.

See also MultitargetNeuralNetworkRegressor

source +l2(yhat, ytest)

These losses, for the pipeline model, refer to the target on the original, unstandardized, scale.

For implementing stopping criteria and other iteration controls, refer to examples linked from the MLJFlux documentation.

See also MultitargetNeuralNetworkRegressor

source diff --git a/dev/interface/Summary/index.html b/dev/interface/Summary/index.html index 5c32eaf..8533fc5 100644 --- a/dev/interface/Summary/index.html +++ b/dev/interface/Summary/index.html @@ -2,4 +2,4 @@ Summary · MLJFlux

Models

MLJFlux provides the model types below, for use with input features X and targets y of the scientific type indicated in the table below. The parameters n_in, n_out and n_channels refer to information passed to the builder, as described under Defining Custom Builders.

Model Type | Prediction type | scitype(X) <: _ | scitype(y) <: _
NeuralNetworkRegressor | Deterministic | AbstractMatrix{Continuous} or Table(Continuous) with n_in columns | AbstractVector{<:Continuous} (n_out = 1)
MultitargetNeuralNetworkRegressor | Deterministic | AbstractMatrix{Continuous} or Table(Continuous) with n_in columns | <: Table(Continuous) with n_out columns
NeuralNetworkClassifier | Probabilistic | AbstractMatrix{Continuous} or Table(Continuous) with n_in columns | AbstractVector{<:Finite} with n_out classes
NeuralNetworkBinaryClassifier | Probabilistic | AbstractMatrix{Continuous} or Table(Continuous) with n_in columns | AbstractVector{<:Finite{2}} (but n_out = 1)
ImageClassifier | Probabilistic | AbstractVector{<:Image{W,H}} with n_in = (W, H) | AbstractVector{<:Finite} with n_out classes
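
As a quick sketch, input and target can be coerced to these scientific types before use; the column names here are hypothetical:

using MLJ
X = coerce(X, :age => Continuous, :height => Continuous)  # ensure Table(Continuous) input
y = coerce(y, Multiclass)                                  # ensure AbstractVector{<:Finite} target
scitype(X), scitype(y)                                     # inspect the result
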
What exactly is a "model"?

In MLJ a model is a mutable struct storing hyper-parameters for some learning algorithm indicated by the model name, and that's all. In particular, an MLJ model does not store learned parameters.

Difference in Definition

In Flux the term "model" has another meaning. However, as all Flux "models" used in MLJFlux are Flux.Chain objects, we call them chains, and restrict use of "model" to models in the MLJ sense.

Are observations rows or columns?

In MLJ the convention for two-dimensional data (tables and matrices) is rows=observations. For matrices, Flux adopts the opposite convention. If your data is a matrix whose columns index the observations, the most efficient option is to present the adjoint or transpose of that matrix to MLJFlux models. If rows index the observations, you can use the matrix as is, or permute the dimensions once with permutedims and then present the adjoint or transpose of the result.
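
For instance (a sketch with made-up dimensions):

Xraw = rand(Float32, 5, 100)  # 5 features, 100 observations as columns (Flux convention)
X = Xraw'                     # lazy adjoint: rows now index observations (MLJ convention)
mach = machine(model, X, y)   # `model`: any MLJFlux model accepting matrix input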

Instructions for coercing common image formats into some AbstractVector{<:Image} are here.

Fitting and warm restarts

MLJ machines cache state enabling the "warm restart" of model training, as demonstrated in the incremental training example. In the case of MLJFlux models, fit!(mach) will use a warm restart if:

  • only model.epochs has changed since the last call; or

  • only model.epochs or model.optimiser have changed since the last call and model.optimiser_changes_trigger_retraining == false (the default) (the "state" part of the optimiser is ignored in this comparison). This allows one to dynamically modify learning rates, for example.

Here model=mach.model is the associated MLJ model.

The warm restart feature makes it possible to externally control iteration. See, for example, Early Stopping with MLJFlux and Using MLJ to classify the MNIST image dataset.
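
For example, the following sketch relies on warm restarts; model, X and y are placeholders:

import Optimisers
mach = machine(model, X, y)
fit!(mach)                                 # cold start: trains for model.epochs epochs
model.epochs += 10
fit!(mach)                                 # warm restart: only the 10 extra epochs are run
model.optimiser = Optimisers.Adam(0.0001)  # lower the learning rate
model.epochs += 10
fit!(mach)                                 # still a warm restart, by default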

Model Hyperparameters

All models share the following hyper-parameters. See individual model docstrings for a full list.

Hyper-parameter | Description | Default
builder | Default builder for models. | MLJFlux.Linear(σ=Flux.relu) (regressors) or MLJFlux.Short(n_hidden=0, dropout=0.5, σ=Flux.σ) (classifiers)
optimiser | The optimiser to use for training. | Optimisers.Adam()
loss | The loss function used for training. | Flux.mse (regressors) and Flux.crossentropy (classifiers)
epochs | Number of epochs to train for. | 10
batch_size | The batch size for the data. | 1
lambda | The regularization strength. Range = [0, ∞). | 0
alpha | The L2/L1 mix of regularization. Range = [0, 1]. | 0
rng | The random number generator (RNG) passed to builders, for weight initialization, for example. Can be any AbstractRNG or the seed (integer) for a Xoshiro that is reset on every cold restart of model (machine) training. | GLOBAL_RNG
acceleration | Use CUDALibs() for training on a GPU; default is CPU1(). | CPU1()
optimiser_changes_trigger_retraining | True if fitting an associated machine should trigger retraining from scratch whenever the optimiser changes. | false

The classifiers have an additional hyperparameter finaliser (default is Flux.softmax, or Flux.σ in the binary case) which is the operation applied to the unnormalized output of the final layer to obtain probabilities (outputs summing to one). It should return a vector of the same length as its input.

Loss Functions

Currently, the loss function specified by loss=... is applied internally by Flux and needs to conform to the Flux API. You cannot, for example, supply one of MLJ's probabilistic loss functions, such as MLJ.cross_entropy, to one of the classifier constructors.

That said, MLJ loss functions and metrics can still be used in evaluation meta-algorithms, such as cross-validation, and they work even if the underlying model comes from MLJFlux; they just cannot be passed as the training loss.
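
The following sketch illustrates the distinction; X, y are assumed to be suitable classification data:

using MLJ
import Flux
NeuralNetworkClassifier = @load NeuralNetworkClassifier pkg=MLJFlux
model = NeuralNetworkClassifier(loss=Flux.crossentropy)  # training loss: must be a Flux loss
evaluate(model, X, y,
         resampling=CV(nfolds=5),
         measure=[cross_entropy, accuracy])              # evaluation: MLJ measures work here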

More on accelerated training with GPUs

As indicated in the table above, when instantiating a model for training on a GPU, specify acceleration=CUDALibs(), as in

using MLJ
 ImageClassifier = @load ImageClassifier
 model = ImageClassifier(epochs=10, acceleration=CUDALibs())
-mach = machine(model, X, y) |> fit!

In this example, the data X, y is copied onto the GPU under the hood on the call to fit! and cached for use in any warm restart (see above). The Flux chain used in training is always copied back to the CPU at the conclusion of fit!, and made available as fitted_params(mach).

Builders

Builder | Description
MLJFlux.MLP(hidden=(10,)) | General multi-layer perceptron
MLJFlux.Short(n_hidden=0, dropout=0.5, σ=sigmoid) | Fully connected network with one hidden layer and dropout
MLJFlux.Linear(σ=relu) | Vanilla linear network with no hidden layers and activation function σ
MLJFlux.@builder | Macro for customized builders
+mach = machine(model, X, y) |> fit!

In this example, the data X, y is copied onto the GPU under the hood on the call to fit! and cached for use in any warm restart (see above). The Flux chain used in training is always copied back to the CPU at the conclusion of fit!, and made available as fitted_params(mach).

Builders

Builder | Description
MLJFlux.MLP(hidden=(10,)) | General multi-layer perceptron
MLJFlux.Short(n_hidden=0, dropout=0.5, σ=sigmoid) | Fully connected network with one hidden layer and dropout
MLJFlux.Linear(σ=relu) | Vanilla linear network with no hidden layers and activation function σ
MLJFlux.@builder | Macro for customized builders
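
For instance, assuming NeuralNetworkRegressor has already been loaded with @load, a model using the MLP builder with two hidden layers (sizes here are arbitrary) might be specified as:

import MLJFlux
model = NeuralNetworkRegressor(builder=MLJFlux.MLP(hidden=(32, 16)))
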
diff --git a/dev/objects.inv b/dev/objects.inv index 6b37ad90d55cafb22f93919e06269006af4923c4..00580284f0457fe0917ea89c6d0247cb77c46949 100644 GIT binary patch delta 12 Tcmeyx`-^viJ)_}9hh1y{BEAH{ delta 12 Tcmeyx`-^viJ)^-!hh1y{BDn;>