adapt api for MLJBase v1 compatibility
davnn committed Dec 24, 2023
1 parent 77528ee commit 99249ae
Showing 14 changed files with 191 additions and 296 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -11,3 +11,4 @@
.history/
notebooks/
*.jld
.CondaPkg/
11 changes: 6 additions & 5 deletions Project.toml
@@ -1,7 +1,7 @@
name = "OutlierDetection"
uuid = "262411bb-c475-4342-ba9e-03b8c0183ca6"
authors = ["David Muhr <muhrdavid+github@gmail.com> and contributors"]
version = "0.3.4"
version = "0.4.0"

[deps]
MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
@@ -10,14 +10,15 @@ SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

[compat]
MLJBase = "0.21"
OutlierDetectionInterface = "~0.1.8"
MLJBase = "1.0"
OutlierDetectionInterface = "0.2.0"
SpecialFunctions = "1, 2"
julia = "^1.6"
julia = "1.6 - 1"

[extras]
Combinatorics = "861a8166-3701-5b0c-9a16-15d98fcdc6aa"
StatisticalMeasures = "a19d573c-0a75-4610-95b3-7071388c7541"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Test", "Combinatorics"]
test = ["Test", "Combinatorics", "StatisticalMeasures"]
3 changes: 2 additions & 1 deletion docs/Project.toml
@@ -4,7 +4,8 @@ DocumenterMarkdown = "997ab1e6-3595-5248-9280-8efb232c3433"
MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
OutlierDetection = "262411bb-c475-4342-ba9e-03b8c0183ca6"
OutlierDetectionData = "128abb87-8f03-48fe-b7ad-0e2a52164fda"
OutlierDetectionNetworks = "c7f57e37-4fcb-4a0b-a36c-c2204bc839a7"
OutlierDetectionInterface = "1722ece6-f894-4ffc-b6be-6ca1174e2011"
OutlierDetectionNeighbors = "51249a0a-cb36-4849-8e04-30c7f8d311bb"
OutlierDetectionNetworks = "c7f57e37-4fcb-4a0b-a36c-c2204bc839a7"
OutlierDetectionPython = "2449c660-d36c-460e-a68b-92ab3c865b3e"
StatisticalMeasures = "a19d573c-0a75-4610-95b3-7071388c7541"
12 changes: 6 additions & 6 deletions docs/src/API/interface.md
@@ -54,6 +54,12 @@ OutlierDetectionInterface.Labels
OutlierDetectionInterface.Fit
```

### `FitResult`

```@docs
OutlierDetectionInterface.FitResult
```

## Functions

### `fit`
@@ -70,12 +76,6 @@ OutlierDetectionInterface.transform

## Macros

### `@detector`

```@docs
OutlierDetectionInterface.@detector
```

### `@default_frontend`

```@docs
8 changes: 1 addition & 7 deletions docs/src/API/score-helpers.md
@@ -4,13 +4,7 @@

## Transformers

In order to normalize scores or classify them, both the training and testing scores are necessary. We thus provide a helper function called [`augmented_transform`](@ref) that returns a tuple of training and test scores. Transformers can make use of one or more such train/test tuples to convert them into normalized scores, probabilities or classes.

### `augmented_transform`

```@docs
OutlierDetection.augmented_transform
```
In order to normalize scores or classify them, both the training and testing scores are necessary. We thus return a tuple of training and test scores in all [`transform`](@ref) calls. Transformers can make use of one or more such train/test tuples to convert them into normalized scores, probabilities or classes.
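The reason the training scores matter can be seen in a plain-Julia sketch (illustration only, not the package's exported API, with hypothetical score vectors): a min-max normalizer must derive its range from the training scores and then apply that same range to the test scores.

```julia
# Min-max normalize test scores using the *training* scores' range.
# Plain Julia for illustration; not the package API.
function normalize_minmax(scores_train, scores_test)
    lo, hi = extrema(scores_train)
    return clamp.((scores_test .- lo) ./ (hi - lo), 0, 1)
end

scores_train = [1.0, 2.0, 3.0, 5.0]
scores_test = [0.5, 3.0, 6.0]
normalize_minmax(scores_train, scores_test)  # -> [0.0, 0.5, 1.0]
```

Test scores outside the training range are clamped to `[0, 1]`, which is why a transformer needs the full train/test tuple rather than the test scores alone.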

### `ScoreTransformer`

31 changes: 15 additions & 16 deletions docs/src/documentation/advanced-usage.md
@@ -4,7 +4,7 @@ The simple usage guide covered how you can use and optimize an existing outlier

## Working with scores

An outlier detection model, whether supervised or unsupervised, typically assigns an *outlier score* to each datapoint. We further differentiate between outlier scores achieved during *training* or *testing*. Because both train and test scores are essential for further score processing, e.g. converting the scores to classes, we provide an [`augmented_transform`](@ref) that returns a tuple of train and test scores.
An outlier detection model, whether supervised or unsupervised, typically assigns an *outlier score* to each datapoint. We further differentiate between outlier scores achieved during *training* or *testing*. Because both train and test scores are essential for further score processing, e.g. converting the scores to classes, we provide a [`transform`](@ref) that returns a tuple of train and test scores.

```@example advanced
using MLJ, OutlierDetection
@@ -16,12 +16,12 @@ KNN = @iload KNNDetector pkg=OutlierDetectionNeighbors verbosity=0
knn = KNN()
```

Let's bind the detector to data and perform an `augmented_transform`.
Let's bind the detector to data and perform a [`transform`](@ref).

```@example advanced
mach = machine(knn, X, y)
fit!(mach, rows=train)
scores = augmented_transform(mach, rows=test)
scores = transform(mach, rows=test)
scores_train, scores_test = scores
```

@@ -83,31 +83,30 @@ Sometimes we need more flexibility to define outlier models. Unfortunately MLJ's
Xs, ys = source(X), source(y)
Xstd = transform(machine(Standardizer(), Xs), Xs)
ŷ = predict(machine(knn, Xstd), Xstd)
knn_std = machine(ProbabilisticUnsupervisedDetector(), Xs, ys; predict=ŷ)
```

We can `fit!` and `predict` with the resulting model as usual.
We can `fit!` and predict with the resulting model as usual.

```@example advanced
fit!(knn_std, rows=train)
predict(knn_std, rows=test)
fit!(ŷ, rows=train)
ŷ(rows=test)
```

Note that we supplied labels `ys` to an unsupervised algorithm; this is not necessary if you just want to predict, but it *is necessary if you want to evaluate the resulting learning network*. We can easily export such a learning network as a model with `@from_network`.
Furthermore, if the goal is to create a standalone model from a network, we provide a helper macro called [`@surrogate`](@ref), which lets you directly implement a `prefit` function and implicitly generates the required model struct. The standalone model can be bound to data again like any other model. Have a look at [the original learning networks documentation](https://alan-turing-institute.github.io/MLJ.jl/dev/learning_networks/) if you would like to understand how `prefit` and composite models work.

```@example advanced
@from_network knn_std mutable struct StandardizedKNN end
```

Furthermore, if the goal is to create a standalone model from a network, we could use empty sources (`source()`) for `Xs` and `ys`. The standalone model can be bound to data again like any other model.
@surrogate(StandardizedKNN) do Xs
Xstd = transform(machine(Standardizer(), Xs), Xs)
ŷ = predict(machine(knn, Xstd), Xstd)
return (;predict=ŷ)
end
```@example advanced
knn_std = machine(StandardizedKNN(), X, y)
knn_std = machine(StandardizedKNN(), X)
fit!(knn_std, rows=train)
predict(knn_std, rows=test)
```

There might be occasions where our [`ProbabilisticDetector`](@ref) or [`DeterministicDetector`](@ref) wrappers are not flexible enough. In such cases we can directly use [`augmented_transform`](@ref) in our learning networks and use a [`ProbabilisticTransformer`](@ref) or [`DeterministicTransformer`](@ref), which take one or more train/test tuples as input, returning probabilistic or deterministic predictions.
There might be occasions where our [`ProbabilisticDetector`](@ref) or [`DeterministicDetector`](@ref) wrappers are not flexible enough. In such cases we can directly use [`transform`](@ref) in our learning networks and use a [`ProbabilisticTransformer`](@ref) or [`DeterministicTransformer`](@ref), which take one or more train/test tuples as input, returning probabilistic or deterministic predictions.

## Implementing models

@@ -142,7 +141,7 @@ Let's further define a helper function to calculate the distance from the center
```@example advanced
function distances_from(center, vectors::AbstractMatrix, p)
deviations = vectors .- center
return [norm(deviations[:, i], p) for i in 1:size(deviations, 2)]
return [norm(deviations[:, i], p) for i in axes(deviations, 2)]
end
```
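As a quick sanity check of the helper above (hypothetical values, each column an instance, assuming the Euclidean norm `p = 2`):

```julia
using LinearAlgebra  # provides `norm`

# Distance of each column of `vectors` from `center` under the p-norm.
function distances_from(center, vectors::AbstractMatrix, p)
    deviations = vectors .- center
    return [norm(deviations[:, i], p) for i in axes(deviations, 2)]
end

center = [0.0, 0.0]
X = [3.0 0.0; 4.0 0.0]        # two instances as columns: [3, 4] and [0, 0]
distances_from(center, X, 2)  # -> [5.0, 0.0]
```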

2 changes: 1 addition & 1 deletion docs/src/documentation/getting-started.md
@@ -45,7 +45,7 @@ fit!(knn_probabilistic, rows=train)
fit!(knn_deterministic, rows=train)
```

Transform the test data into raw outlier scores.
Transform the data into raw outlier scores.

```@example ex
transform(knn_raw, rows=test)
7 changes: 5 additions & 2 deletions src/OutlierDetection.jl
@@ -27,8 +27,7 @@ module OutlierDetection
outlier_fraction

# mlj_helpers.jl
export augmented_transform,
to_categorical,
export to_categorical,
to_univariate_finite,
from_categorical,
from_univariate_finite
@@ -43,6 +42,9 @@ module OutlierDetection
DeterministicDetector,
CompositeDetector

# mlj_surrogate.jl
export @surrogate

# utilities
include("normalization.jl")
include("classification.jl")
@@ -53,4 +55,5 @@ module OutlierDetection
include("mlj_helpers.jl")
include("mlj_transformers.jl")
include("mlj_wrappers.jl")
include("mlj_surrogate.jl")
end
104 changes: 0 additions & 104 deletions src/mlj_helpers.jl
@@ -82,107 +82,3 @@ A vector of raw classes.
"""
from_categorical(categorical) = MLJ.unwrap.(categorical)
from_categorical(categorical::MLJ.Node) = MLJ.node(from_categorical, categorical)

# transform a fitresult (containing only the model) back to a Fit containing the model and training scores
to_fitresult(mach::MLJ.Machine{<:OD.Detector})::Fit = (mach.fitresult, MLJ.report(mach).scores)

# this includes all composites defined in mlj_wrappers.jl
const DetectorComposites = Union{
MLJ.Machine{<:MLJ.SupervisedDetectorComposite},
MLJ.Machine{<:MLJ.UnsupervisedDetectorComposite},
MLJ.Machine{<:MLJ.ProbabilisticUnsupervisedDetectorComposite},
MLJ.Machine{<:MLJ.DeterministicUnsupervisedDetectorComposite},
MLJ.Machine{<:MLJ.ProbabilisticSupervisedDetectorComposite},
MLJ.Machine{<:MLJ.DeterministicSupervisedDetectorComposite}
}

const DetectorSurrogates = Union{
MLJ.Machine{<:MLJ.SupervisedDetectorSurrogate},
MLJ.Machine{<:MLJ.UnsupervisedDetectorSurrogate},
MLJ.Machine{<:MLJ.ProbabilisticUnsupervisedDetectorSurrogate},
MLJ.Machine{<:MLJ.DeterministicUnsupervisedDetectorSurrogate},
MLJ.Machine{<:MLJ.ProbabilisticSupervisedDetectorSurrogate},
MLJ.Machine{<:MLJ.DeterministicSupervisedDetectorSurrogate}
}

function check_mach(mach)
# catch deserialized machine with no data:
isempty(mach.args) && MLJ._err_serialized(augmented_transform)
# catch not-yet-trained machine:
mach.state > 0 || error("$mach has not been trained.")
end

function _augmented_transform(detector::Detector, fitresult::Fit, X)
model, scores_train = fitresult
scores_test = MLJ.transform(detector, model, X)
return scores_train, scores_test
end

# 0. augmented_transform given rows:
"""
augmented_transform(mach; rows=:)
Extends `transform` by additionally returning the training scores from detectors as a train/test score tuple.
Parameters
----------
mach::MLJ.Machine{<:OD.Detector}
A fitted machine with a detector model.
rows
Test data specified as rows of machine-bound data (as in `transform`); new test data `X` can also be provided.
Returns
----------
augmented_scores::Tuple{AbstractVector{<:Real}, AbstractVector{<:Real}}
A tuple of raw training and test scores.
"""
function augmented_transform(mach::MLJ.Machine{<:OD.Detector}; rows=:)
check_mach(mach)
return _augmented_transform(mach.model, to_fitresult(mach), selectrows(mach.model, rows, mach.data[1])...)
end

function get_scores_from_composite_report(mach)
fit_report = MLJ.report_given_method(mach)[:fit]
if haskey(fit_report, :additions) && haskey(fit_report.additions, :scores)
return fit_report.additions.scores
else
return fit_report.scores
end
end

function augmented_transform(mach::DetectorComposites; rows=:)
check_mach(mach)
scores_train = get_scores_from_composite_report(mach)
scores_test = mach.fitresult.transform(selectrows(mach.model, rows, mach.data[1])...)
return scores_train, scores_test
end

function augmented_transform(mach::DetectorSurrogates; rows=:)
check_mach(mach)
scores_train = get_scores_from_composite_report(mach)
scores_test = mach.fitresult.transform(rows=rows)
return scores_train, scores_test
end

# 1. augmented_transform on machines, given *concrete* data:
function augmented_transform(mach::MLJ.Machine{<:OD.Detector}, X)
check_mach(mach)
return _augmented_transform(mach.model, to_fitresult(mach), reformat(mach.model, X)...)
end

function augmented_transform(mach::DetectorComposites, X)
check_mach(mach)
scores_train = get_scores_from_composite_report(mach)
scores_test = mach.fitresult.transform(X)
return scores_train, scores_test
end

# 2. operations on machines, given *dynamic* data (nodes):
function augmented_transform(mach::MLJ.Machine{<:OD.Detector}, X::MLJ.AbstractNode)
MLJ.node(augmented_transform, mach, X)
end

function augmented_transform(mach::DetectorComposites, X::MLJ.AbstractNode)
MLJ.node(augmented_transform, mach, X)
end
25 changes: 25 additions & 0 deletions src/mlj_surrogate.jl
@@ -0,0 +1,25 @@

"""
@surrogate(fn, name)
Create a surrogate model from a learning network, implicitly defining a composite
struct using `name` and a `prefit` function using `fn`.
Parameters
----------
fn::Function
A function implementing the learning network: it receives one source node per data argument and must return a named tuple of network nodes (e.g. `(; predict=ŷ)`), as expected by `MLJ.prefit`.
name::Symbol
The name of the resulting composite model (the surrogate model).
"""
macro surrogate(fn, name)
esc(
quote
mutable struct $name <: $MLJ.AnnotatorNetworkComposite end
function $MLJ.prefit(::$name, ::Integer, data...)
$fn(map($MLJ.source, data)...)
end
end
)
end
3 changes: 1 addition & 2 deletions src/mlj_transformers.jl
@@ -62,8 +62,7 @@ const StaticTransformer = Union{

# returns the augmented train/test scores
function MLJ.transform(ev::StaticTransformer, _, scores::Tuple{Scores, Scores}...) # _ because there is no fitresult
_, scores_test = to_scores(ev.normalize, ev.combine, scores...)
scores_test
to_scores(ev.normalize, ev.combine, scores...)
end

function MLJ.predict(ev::ProbabilisticTransformer, _, scores::Tuple{Scores, Scores}...) # _ because there is no fitresult
