Merge pull request #32 from JuliaAI/incremental
Add an observation-updatable density estimator to tests
ablaom authored Oct 11, 2024
2 parents 168e0c6 + d82eaa5 commit c1d4220
Showing 11 changed files with 208 additions and 40 deletions.
2 changes: 2 additions & 0 deletions Project.toml
@@ -11,6 +11,7 @@ julia = "1.6"

[extras]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MLUtils = "f1d291b0-491e-4a28-83b9-f70985020b54"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
@@ -23,6 +24,7 @@ Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
[targets]
test = [
"DataFrames",
"Distributions",
"LinearAlgebra",
"MLUtils",
"Random",
8 changes: 3 additions & 5 deletions docs/src/common_implementation_patterns.md
@@ -1,8 +1,6 @@
# Common Implementation Patterns

```@raw html
🚧
```
!!! warning

This section is only an implementation guide. The definitive specification of the
Learn API is given in [Reference](@ref reference).
@@ -25,7 +23,7 @@ implementations fall into one (or more) of the following informally understood p

- [Iterative Algorithms](@ref)

- Incremental Algorithms
- [Incremental Algorithms](@ref): Algorithms that can be updated with new observations.

- [Feature Engineering](@ref): Algorithms for selecting or combining features

@@ -48,7 +46,7 @@ implementations fall into one (or more) of the following informally understood p

- Survival Analysis

- Density Estimation: Algorithms that learn a probability distribution
- [Density Estimation](@ref): Algorithms that learn a probability distribution

- Bayesian Algorithms

2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -98,7 +98,7 @@ A key to enabling toolboxes to enhance LearnAPI.jl algorithm functionality is th
implementation of two key additional methods, beyond the usual `fit` and
`predict`/`transform`. Given any training `data` consumed by `fit` (such as `data = (X,
y)` in the example above) [`LearnAPI.features(algorithm, data)`](@ref input) tells us what
part of `data` comprises *features*, which is something that can be passsed onto to
part of `data` comprises *features*, which is something that can be passed onto to
`predict` or `transform` (`X` in the example) while [`LearnAPI.target(algorithm,
data)`](@ref), if implemented, tells us what part comprises the target (`y` in the
example). By explicitly requiring such methods, we free algorithms to consume data in
4 changes: 4 additions & 0 deletions docs/src/patterns/density_estimation.md
@@ -1 +1,5 @@
# Density Estimation

See these examples from tests:

- [normal distribution estimator](https://github.com/JuliaAI/LearnAPI.jl/blob/dev/test/patterns/incremental_algorithms.jl)
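
The `predict` docstring changed in this commit notes that a density estimator has no features, so `predict` is called without a `data` argument. The following is a minimal, self-contained sketch of that calling pattern. It is not the normal distribution estimator in the linked test file: `ToyNormalModel`, `ToyDistribution`, `ToyPoint`, `toy_kinds_of_proxy` and `toy_predict` are hypothetical stand-ins defined here for illustration, and the one-argument fallback only mirrors the shape of the new `predict(model)` method (which looks up the kinds of proxy via `algorithm(model)`).

```julia
# Hypothetical stand-ins for illustration only; none of these names are LearnAPI.jl API.
abstract type ToyKindOfProxy end
struct ToyDistribution <: ToyKindOfProxy end
struct ToyPoint <: ToyKindOfProxy end

# A "fitted model": a normal distribution summarized by its mean and standard deviation.
struct ToyNormalModel
    mean::Float64
    std::Float64
end

# Supported kinds of proxy, most preferred first.
toy_kinds_of_proxy(::ToyNormalModel) = (ToyDistribution(), ToyPoint())

# No `data` argument: there are no features to predict on.
toy_predict(model::ToyNormalModel, ::ToyDistribution) = (mean = model.mean, std = model.std)
toy_predict(model::ToyNormalModel, ::ToyPoint) = model.mean

# Fallback: use the first listed kind of proxy when none is given.
toy_predict(model) = toy_predict(model, first(toy_kinds_of_proxy(model)))

model = ToyNormalModel(0.0, 1.0)
toy_predict(model)              # dispatches to the `ToyDistribution` method
toy_predict(model, ToyPoint())  # returns the stored mean
```

Listing the preferred kind of proxy first is what lets the short form `toy_predict(model)` do something sensible without the caller naming a proxy type.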
5 changes: 5 additions & 0 deletions docs/src/patterns/incremental_algorithms.md
@@ -0,0 +1,5 @@
# Incremental Algorithms

See these examples from tests:

- [normal distribution estimator](https://github.com/JuliaAI/LearnAPI.jl/blob/dev/test/patterns/incremental_algorithms.jl)
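
To make "updated with new observations" concrete, here is a minimal sketch of an observation-updatable normal estimator. It is an assumption-laden illustration, not the code in the linked test file: `ToyRunningNormal`, `toy_fit`, `toy_update` and `toy_params` are hypothetical names, and the update uses Welford's running-moments algorithm, which the linked implementation may or may not use.

```julia
# Hypothetical sketch; these names are illustrative and not part of LearnAPI.jl.
struct ToyRunningNormal
    n::Int         # number of observations seen so far
    mean::Float64  # running mean
    m2::Float64    # running sum of squared deviations from the mean
end

# Fold new observations into an existing estimate (Welford's algorithm).
function toy_update(model::ToyRunningNormal, ynew::AbstractVector{<:Real})
    n, mu, m2 = model.n, model.mean, model.m2
    for y in ynew
        n += 1
        delta = y - mu
        mu += delta / n
        m2 += delta * (y - mu)
    end
    return ToyRunningNormal(n, mu, m2)
end

# Fitting from scratch is just updating an empty estimate.
toy_fit(y::AbstractVector{<:Real}) = toy_update(ToyRunningNormal(0, 0.0, 0.0), y)

# Maximum-likelihood parameters (the variance uses divisor `n`).
toy_params(model::ToyRunningNormal) = (mean = model.mean, var = model.m2 / model.n)

y = [1.0, 2.0, 3.0, 4.0]
full = toy_fit(y)                               # all observations at once
incremental = toy_update(toy_fit(y[1:2]), y[3:4])  # two batches
```

Under these assumptions, fitting in one pass and fitting in batches give the same parameters up to floating-point error, which is the property an incremental algorithm is expected to offer.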
12 changes: 6 additions & 6 deletions src/predict_transform.jl
@@ -4,10 +4,6 @@ function DOC_IMPLEMENTED_METHODS(name; overloaded=false)
"[`LearnAPI.functions`](@ref) trait. "
end

const OPERATIONS = (:predict, :transform, :inverse_transform)
const DOC_OPERATIONS_LIST_SYMBOL = join(map(op -> "`:$op`", OPERATIONS), ", ")
const DOC_OPERATIONS_LIST_FUNCTION = join(map(op -> "`LearnAPI.$op`", OPERATIONS), ", ")

DOC_MUTATION(op) =
"""
@@ -66,6 +62,9 @@ which lists all supported target proxies.
The argument `model` is anything returned by a call of the form `fit(algorithm, ...)`.
If `LearnAPI.features(LearnAPI.algorithm(model)) == nothing`, then argument `data` is
omitted. An example is density estimators.
# Example
In the following, `algorithm` is some supervised learning algorithm with
@@ -105,6 +104,7 @@ $(DOC_DATA_INTERFACE(:predict))
"""
predict(model, data) = predict(model, kinds_of_proxy(algorithm(model)) |> first, data)
predict(model) = predict(model, kinds_of_proxy(algorithm(model)) |> first)

# automatic slurping of multiple data arguments:
predict(model, k::KindOfProxy, data1, data2, datas...; kwargs...) =
@@ -167,8 +167,8 @@ $(DOC_MUTATION(:transform))
$(DOC_DATA_INTERFACE(:transform))
"""
transform(model, data1, data2...; kwargs...) =
transform(model, (data1, datas...); kwargs...) # automatic slurping
transform(model, data1, data2, datas...; kwargs...) =
transform(model, (data1, data2, datas...); kwargs...) # automatic slurping

"""
inverse_transform(model, data)
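
The `transform` change in this file appears to address two problems with the old slurping method: its body referred to `datas`, which was not one of its argument names, and its signature `(model, data1, data2...)` also matched calls with a single data argument. The sketch below reproduces the corrected pattern in isolation. `ToyScaler` and `toy_transform` are hypothetical names used only for illustration; they are not LearnAPI.jl code.

```julia
# Hypothetical stand-ins for illustration only; not LearnAPI.jl code.
struct ToyScaler
    factor::Float64
end

# "Core" method: exactly one data argument, here a tuple of arrays.
toy_transform(model::ToyScaler, data::Tuple; kwargs...) =
    map(x -> model.factor .* x, data)

# Slurping method: requires at least two data arguments, so it never
# matches a call that passes a single data argument.
toy_transform(model::ToyScaler, data1, data2, datas...; kwargs...) =
    toy_transform(model, (data1, data2, datas...); kwargs...)

scaler = ToyScaler(2.0)

# Equivalent to toy_transform(scaler, ([1.0, 2.0], [3.0]))
toy_transform(scaler, [1.0, 2.0], [3.0])
```

Naming the second positional argument explicitly (`data2`) before the varargs is what forces "two or more"; a bare `data2...` would also accept zero extra arguments.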
54 changes: 27 additions & 27 deletions src/types.jl
@@ -22,27 +22,27 @@ See also [`LearnAPI.KindOfProxy`](@ref).
| type | form of an observation |
|:-------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `LearnAPI.Point` | same as target observations; may have the interpretation of a 50% quantile, 50% expectile or mode |
| `LearnAPI.Sampleable` | object that can be sampled to obtain object of the same form as target observation |
| `LearnAPI.Distribution` | explicit probability density/mass function whose sample space is all possible target observations |
| `LearnAPI.LogDistribution` | explicit log-probability density/mass function whose sample space is possible target observations |
| `LearnAPI.Probability`¹ | numerical probability or probability vector |
| `LearnAPI.LogProbability`¹ | log-probability or log-probability vector |
| `LearnAPI.Parametric`¹ | a list of parameters (e.g., mean and variance) describing some distribution |
| `LearnAPI.LabelAmbiguous` | collections of labels (in case of multi-class target) but without a known correspondence to the original target labels (and of possibly different number) as in, e.g., clustering |
| `LearnAPI.LabelAmbiguousSampleable` | sampleable version of `LabelAmbiguous`; see `Sampleable` above |
| `LearnAPI.LabelAmbiguousDistribution` | pdf/pmf version of `LabelAmbiguous`; see `Distribution` above |
| `LearnAPI.LabelAmbiguousFuzzy` | same as `LabelAmbiguous` but with multiple values of indeterminant number |
| `LearnAPI.Quantile`² | same as target but with quantile interpretation |
| `LearnAPI.Expectile`² | same as target but with expectile interpretation |
| `LearnAPI.ConfidenceInterval`² | confidence interval |
| `LearnAPI.Fuzzy` | finite but possibly varying number of target observations |
| `LearnAPI.ProbabilisticFuzzy` | as for `Fuzzy` but labeled with probabilities (not necessarily summing to one) |
| `LearnAPI.SurvivalFunction` | survival function |
| `LearnAPI.SurvivalDistribution` | probability distribution for survival time |
| `LearnAPI.SurvivalHazardFunction` | hazard function for survival time |
| `LearnAPI.OutlierScore` | numerical score reflecting degree of outlierness (not necessarily normalized) |
| `LearnAPI.Continuous` | real-valued approximation/interpolation of a discrete-valued target, such as a count (e.g., number of phone calls) |
| `Point` | same as target observations; may have the interpretation of a 50% quantile, 50% expectile or mode |
| `Sampleable` | object that can be sampled to obtain object of the same form as target observation |
| `Distribution` | explicit probability density/mass function whose sample space is all possible target observations |
| `LogDistribution` | explicit log-probability density/mass function whose sample space is possible target observations |
| `Probability`¹ | numerical probability or probability vector |
| `LogProbability`¹ | log-probability or log-probability vector |
| `Parametric`¹ | a list of parameters (e.g., mean and variance) describing some distribution |
| `LabelAmbiguous` | collections of labels (in case of multi-class target) but without a known correspondence to the original target labels (and of possibly different number) as in, e.g., clustering |
| `LabelAmbiguousSampleable` | sampleable version of `LabelAmbiguous`; see `Sampleable` above |
| `LabelAmbiguousDistribution` | pdf/pmf version of `LabelAmbiguous`; see `Distribution` above |
| `LabelAmbiguousFuzzy` | same as `LabelAmbiguous` but with multiple values of indeterminant number |
| `Quantile`² | same as target but with quantile interpretation |
| `Expectile`² | same as target but with expectile interpretation |
| `ConfidenceInterval`² | confidence interval |
| `Fuzzy` | finite but possibly varying number of target observations |
| `ProbabilisticFuzzy` | as for `Fuzzy` but labeled with probabilities (not necessarily summing to one) |
| `SurvivalFunction` | survival function |
| `SurvivalDistribution` | probability distribution for survival time |
| `SurvivalHazardFunction` | hazard function for survival time |
| `OutlierScore` | numerical score reflecting degree of outlierness (not necessarily normalized) |
| `Continuous` | real-valued approximation/interpolation of a discrete-valued target, such as a count (e.g., number of phone calls) |
¹Provided for completeness but discouraged to avoid [ambiguities in
representation](https://github.com/alan-turing-institute/MLJ.jl/blob/dev/paper/paper.md#a-unified-approach-to-probabilistic-predictions-and-their-evaluation).
@@ -86,9 +86,9 @@ space ``Y^n``, where ``Y`` is the space from which the target variable takes its
| type `T` | form of output of `predict(model, ::T, data)` |
|:-------------------------------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `LearnAPI.JointSampleable` | object that can be sampled to obtain a *vector* whose elements have the form of target observations; the vector length matches the number of observations in `data`. |
| `LearnAPI.JointDistribution` | explicit probability density/mass function whose sample space is vectors of target observations; the vector length matches the number of observations in `data` |
| `LearnAPI.JointLogDistribution` | explicit log-probability density/mass function whose sample space is vectors of target observations; the vector length matches the number of observations in `data` |
| `JointSampleable` | object that can be sampled to obtain a *vector* whose elements have the form of target observations; the vector length matches the number of observations in `data`. |
| `JointDistribution` | explicit probability density/mass function whose sample space is vectors of target observations; the vector length matches the number of observations in `data` |
| `JointLogDistribution` | explicit log-probability density/mass function whose sample space is vectors of target observations; the vector length matches the number of observations in `data` |
"""
abstract type Joint <: KindOfProxy end
@@ -108,9 +108,9 @@ single object representing a probability distribution.
| type `T` | form of output of `predict(model, ::T)` |
|:--------------------------------:|:-----------------------------------------------------------------------|
| `LearnAPI.SingleSampleable` | object that can be sampled to obtain a single target observation |
| `LearnAPI.SingleDistribution` | explicit probability density/mass function for sampling the target |
| `LearnAPI.SingleLogDistribution` | explicit log-probability density/mass function for sampling the target |
| `SingleSampleable` | object that can be sampled to obtain a single target observation |
| `SingleDistribution` | explicit probability density/mass function for sampling the target |
| `SingleLogDistribution` | explicit log-probability density/mass function for sampling the target |
"""
abstract type Single <: KindOfProxy end