Merge pull request #228 from MilesCranmer/units4
Dimensional constraints (v4)
MilesCranmer authored Jul 21, 2023
2 parents 727493d + 821a76f commit 905db77
Showing 24 changed files with 1,536 additions and 415 deletions.
21 changes: 14 additions & 7 deletions Project.toml
@@ -4,9 +4,11 @@ authors = ["MilesCranmer <miles.cranmer@gmail.com>"]
 version = "0.20.0"
 
 [deps]
+Compat = "34da2185-b29b-5c13-b0c7-acf172513d20"
 Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
 Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
 DynamicExpressions = "a40a106e-89c9-4ca8-8020-a735e8728b6b"
+DynamicQuantities = "06fc5a27-2a28-4c7c-a15d-362465fb6821"
 JSON3 = "0f8b85d8-7281-11e9-16c2-39a750bddbf1"
 LineSearches = "d3d80556-e9d4-5f37-9878-2ab0fcc64255"
 LossFunctions = "30fc2ffe-d236-52d8-8643-a9d8f7c094a7"
@@ -23,15 +25,12 @@ Requires = "ae029012-a4dd-5104-9daa-d747884805df"
 SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b"
 StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"
 TOML = "fa267f1f-6049-4f14-aa54-33bafae1ed76"
-
-[weakdeps]
-SymbolicUtils = "d1185830-fcd6-423d-90d6-eec64667417b"
-
-[extensions]
-SymbolicRegressionSymbolicUtilsExt = "SymbolicUtils"
+Tricks = "410a4b4d-49e4-4fbc-ab6d-cb71b17b3775"
 
 [compat]
-DynamicExpressions = "0.9"
+Compat = "^4.2"
+DynamicExpressions = "0.10"
+DynamicQuantities = "^0.6.2"
 JSON3 = "1"
 LineSearches = "7"
 LossFunctions = "0.6, 0.7, 0.8, 0.10"
@@ -46,8 +45,12 @@ Requires = "1"
 SpecialFunctions = "0.10.1, 1, 2"
 StatsBase = "0.33, 0.34"
 SymbolicUtils = "0.19, ^1.0.5"
+Tricks = "0.1"
 julia = "1.6"
 
+[extensions]
+SymbolicRegressionSymbolicUtilsExt = "SymbolicUtils"
+
 [extras]
 ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
@@ -56,7 +59,11 @@ MLJTestInterface = "72560011-54dd-4dc2-94f3-c5de45b75ecd"
 SafeTestsets = "1bc83da4-3b8d-516f-aca4-4fe02f6d838f"
 SymbolicUtils = "d1185830-fcd6-423d-90d6-eec64667417b"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
+Unitful = "1986cc42-f94f-5a68-af5c-568840ba703d"
 Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
 
 [targets]
 test = ["Test", "SafeTestsets", "ForwardDiff", "LinearAlgebra", "MLJBase", "MLJTestInterface", "SymbolicUtils", "Zygote"]
+
+[weakdeps]
+SymbolicUtils = "d1185830-fcd6-423d-90d6-eec64667417b"
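
For readers unfamiliar with the `[weakdeps]`/`[extensions]` stanzas being reorganized above: on Julia 1.9+, each entry in `[extensions]` names a module under `ext/` that is loaded only once the corresponding weak dependency is installed. A minimal sketch of what such a file typically looks like (the actual extension in this repository may differ):

```julia
# ext/SymbolicRegressionSymbolicUtilsExt.jl -- hypothetical skeleton.
# Julia loads this module automatically once SymbolicUtils is present,
# because Project.toml maps the extension name to that weak dependency.
module SymbolicRegressionSymbolicUtilsExt

using SymbolicRegression
import SymbolicUtils

# Conversion methods between SymbolicRegression trees and
# SymbolicUtils expressions would be defined here.

end
```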
4 changes: 2 additions & 2 deletions docs/src/api.md
@@ -13,7 +13,7 @@ MultitargetSRRegressor
 equation_search(X::AbstractMatrix{T}, y::AbstractMatrix{T};
     niterations::Int=10,
     weights::Union{AbstractVector{T}, Nothing}=nothing,
-    varMap::Union{Array{String, 1}, Nothing}=nothing,
+    variable_names::Union{Array{String, 1}, Nothing}=nothing,
     options::Options=Options(),
     numprocs::Union{Int, Nothing}=nothing,
     procs::Union{Array{Int, 1}, Nothing}=nothing,
@@ -59,7 +59,7 @@ eval_grad_tree_array(tree::Node, X::AbstractMatrix, options::Options; kws...)
 
 ```@docs
 node_to_symbolic(tree::Node, options::Options;
-    varMap::Union{Array{String, 1}, Nothing}=nothing,
+    variable_names::Union{Array{String, 1}, Nothing}=nothing,
     index_functions::Bool=false)
 ```
218 changes: 150 additions & 68 deletions docs/src/examples.md
@@ -1,36 +1,8 @@
 # Toy Examples with Code
 
 ## Preamble
 
 ```julia
 using SymbolicRegression
-using DataFrames
-```
-
-We'll also code up a simple function to print a single expression:
-
-```julia
-function get_best(; hof::HallOfFame{T,L}, options) where {T,L}
-    dominating = calculate_pareto_frontier(hof)
-
-    df = DataFrame(;
-        tree=[m.tree for m in dominating],
-        loss=[m.loss for m in dominating],
-        complexity=[compute_complexity(m, options) for m in dominating],
-    )
-
-    df[!, :score] = vcat(
-        [L(0.0)],
-        -1 .* log.(df.loss[2:end] ./ df.loss[1:(end - 1)]) ./
-            (df.complexity[2:end] .- df.complexity[1:(end - 1)]),
-    )
-
-    min_loss = min(df.loss...)
-
-    best_idx = argmax(df.score .* (df.loss .<= (2 * min_loss)))
-
-    return df.tree[best_idx], df
-end
+using MLJ
 ```

## 1. Simple search
Expand All @@ -39,110 +11,220 @@ Here's a simple example where we
 find the expression `2 cos(x3) + x0^2 - 2`.
 
 ```julia
-X = 2randn(5, 1000)
-y = @. 2*cos(X[4, :]) + X[1, :]^2 - 2
+X = 2randn(1000, 5)
+y = @. 2*cos(X[:, 4]) + X[:, 1]^2 - 2
 
-options = Options(; binary_operators=[+, -, *, /], unary_operators=[cos])
-hof = equation_search(X, y; options=options, niterations=30)
+model = SRRegressor(
+    binary_operators=[+, -, *, /],
+    unary_operators=[cos],
+    niterations=30
+)
+mach = machine(model, X, y)
+fit!(mach)
 ```
+
+Let's look at the returned table:
+
+```julia
+r = report(mach)
+r
+```
 
-Let's look at the most accurate expression:
+We can get the selected best tradeoff expression with:
 
 ```julia
-best, df = get_best(; hof, options)
-println(best)
+r.equations[r.best_idx]
 ```

 ## 2. Custom operator
 
 Here, we define a custom operator and use it to find an expression:
 
 ```julia
-X = 2randn(5, 1000)
-y = @. 1/X[1, :]
-
-options = Options(; binary_operators=[+, *], unary_operators=[inv])
-hof = equation_search(X, y; options=options)
-println(get_best(; hof, options)[1])
+X = 2randn(1000, 5)
+y = @. 1/X[:, 1]
+
+my_inv(x) = 1/x
+
+model = SRRegressor(
+    binary_operators=[+, *],
+    unary_operators=[my_inv],
+)
+mach = machine(model, X, y)
+fit!(mach)
+r = report(mach)
+println(r.equations[r.best_idx])
 ```

 ## 3. Multiple outputs
 
 Here, we do the same thing, but with multiple expressions at once,
-each requiring a different feature.
+each requiring a different feature. This means that we need to use
+`MultitargetSRRegressor` instead of `SRRegressor`:
 
 ```julia
-X = 2rand(5, 1000) .+ 0.1
-y = @. 1/X[1:3, :]
-options = Options(; binary_operators=[+, *], unary_operators=[inv])
-hofs = equation_search(X, y; options=options)
-bests = [get_best(; hof=hofs[i], options)[1] for i=1:3]
-println(bests)
+X = 2rand(1000, 5) .+ 0.1
+y = @. 1/X[:, 1:3]
+
+my_inv(x) = 1/x
+
+model = MultitargetSRRegressor(; binary_operators=[+, *], unary_operators=[my_inv])
+mach = machine(model, X, y)
+fit!(mach)
 ```
+
+The report gives us lists of expressions instead:
+
+```julia
+r = report(mach)
+for i in 1:3
+    println("y[$(i)] = ", r.equations[i][r.best_idx[i]])
+end
+```

 ## 4. Plotting an expression
 
-For now, let's consider the expressions for output 1.
-We can see the SymbolicUtils version with:
+For now, let's consider the expressions for output 1 from the previous example.
+We can get a SymbolicUtils version with:
 
 ```julia
 using SymbolicUtils
 
-eqn = node_to_symbolic(bests[1], options)
+eqn = node_to_symbolic(r.equations[1][r.best_idx[1]], model)
 ```
 
-We can get the LaTeX version with:
+We can get the LaTeX version with `Latexify`:
 
 ```julia
 using Latexify
 
 latexify(string(eqn))
 ```
 
-Let's plot the prediction against the truth:
+We can also plot the prediction against the truth:
 
 ```julia
 using Plots
 
-scatter(y[1, :], bests[1](X, options), xlabel="Truth", ylabel="Prediction")
+ypred = predict(mach, X)
+scatter(y[:, 1], ypred[:, 1], xlabel="Truth", ylabel="Prediction")
 ```
-
-Here, we have used the convenience function `(::Node{T})(::AbstractMatrix, ::Options)` to evaluate
-the expression.

 ## 5. Other types
 
 SymbolicRegression.jl can handle most numeric types you wish to use.
 For example, passing a `Float32` array will result in the search using
 32-bit precision everywhere in the codebase:
 
 ```julia
-X = 2randn(Float32, 5, 1000)
-y = @. 2*cos(X[4, :]) + X[1, :]^2 - 2
+X = 2randn(Float32, 1000, 5)
+y = @. 2*cos(X[:, 4]) + X[:, 1]^2 - 2
 
-options = Options(; binary_operators=[+, -, *, /], unary_operators=[cos])
-hof = equation_search(X, y; options=options, niterations=30)
+model = SRRegressor(binary_operators=[+, -, *, /], unary_operators=[cos], niterations=30)
+mach = machine(model, X, y)
+fit!(mach)
 ```
 
 We can see that the output types are `Float32`:
 
 ```julia
-best, df = get_best(; hof, options)
+r = report(mach)
+best = r.equations[r.best_idx]
 println(typeof(best))
 # Node{Float32}
 ```

-We can also use `Complex` numbers:
+We can also use `Complex` numbers (ignore the warning
+from MLJ):
 
 ```julia
 cos_re(x::Complex{T}) where {T} = cos(abs(x)) + 0im
 
-X = 15 .* rand(ComplexF64, 5, 1000) .- 7.5
-y = @. 2*cos_re((2+1im) * X[4, :]) + 0.1 * X[1, :]^2 - 2
+X = 15 .* rand(ComplexF64, 1000, 5) .- 7.5
+y = @. 2*cos_re((2+1im) * X[:, 4]) + 0.1 * X[:, 1]^2 - 2
 
-options = Options(; binary_operators=[+, -, *, /], unary_operators=[cos_re], maxsize=30)
-hof = equation_search(X, y; options=options, niterations=100)
+model = SRRegressor(
+    binary_operators=[+, -, *, /],
+    unary_operators=[cos_re],
+    maxsize=30,
+    niterations=100
+)
+mach = machine(model, X, y)
+fit!(mach)
 ```
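
As a quick sanity check of the operator defined above: `abs(3.0 + 4.0im) == 5.0` exactly, so `cos_re` reduces to a real cosine wrapped in a complex type (values chosen by hand for illustration):

```julia
cos_re(x::Complex{T}) where {T} = cos(abs(x)) + 0im

# abs(3 + 4im) == 5, so this is just cos(5) promoted to a complex number:
@assert cos_re(3.0 + 4.0im) == cos(5.0) + 0.0im
```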

+## 6. Dimensional constraints
+
+One other feature we can exploit is dimensional analysis.
+Say that we know the physical units of each feature and output,
+and we want to find an expression that is dimensionally consistent.
+
+We can do this as follows, using `DynamicQuantities` to assign units.
+First, let's make some data on Newton's law of gravitation:
+
+```julia
+using DynamicQuantities
+using SymbolicRegression
+
+M = (rand(100) .+ 0.1) .* Constants.M_sun
+m = 100 .* (rand(100) .+ 0.1) .* u"kg"
+r = (rand(100) .+ 0.1) .* Constants.R_earth
+
+G = Constants.G
+
+F = @. (G * M * m / r^2)
+```
+
+(Note that the `u` macro from `DynamicQuantities` will automatically convert to SI units. To avoid this,
+use the `us` macro.)
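
To illustrate the difference between the two macros (assumed `DynamicQuantities` behavior; exact printing may vary by version):

```julia
julia> using DynamicQuantities

julia> 1.5u"km/s"    # `u` expands to SI base units
1500.0 m s⁻¹

julia> 1.5us"km/s"   # `us` keeps the symbolic units you wrote
1.5 km s⁻¹
```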

+Now, let's ready the data for MLJ:
+
+```julia
+X = (; M=M, m=m, r=r)
+y = F
+```
+
+Since this data has such a large dynamic range, let's also create a custom loss function
+that looks at the error in log-space:
+
+```julia
+function loss_fnc(prediction, target)
+    # Useful loss for large dynamic range
+    scatter_loss = abs(log((abs(prediction)+1e-20) / (abs(target)+1e-20)))
+    sign_loss = 10 * (sign(prediction) - sign(target))^2
+    return scatter_loss + sign_loss
+end
+```
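
To get a feel for this loss, here are two values worked out by hand from the definition above: a 10% relative error costs about 0.1, while a sign flip alone costs 40:

```julia
loss_fnc(1.0e10, 1.1e10)  # ≈ abs(log(1.0/1.1)) ≈ 0.0953 (signs agree)
loss_fnc(1.0, -1.0)       # scatter term ≈ 0; sign term = 10 * 2^2 = 40
```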

+Now let's define and fit our model:
+
+```julia
+model = SRRegressor(
+    binary_operators=[+, -, *, /],
+    unary_operators=[square],
+    elementwise_loss=loss_fnc,
+    complexity_of_constants=2,
+    maxsize=25,
+    niterations=100,
+    npopulations=50,
+    dimensional_constraint_penalty=10^5,
+)
+mach = machine(model, X, y)
+fit!(mach)
+```
+
+You can observe that all expressions with a loss under
+our penalty are dimensionally consistent! (The `"[⋅]"` indicates free units in a constant,
+which can cancel out other units in the expression.) For example,
+
+```julia
+"y[m s⁻² kg] = (M[kg] * 2.6353e-22[⋅])"
+```
+
+would indicate that the expression is dimensionally consistent, with
+a constant `"2.6353e-22[m s⁻²]"`.
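
You can verify such a constant's inferred units by hand with `DynamicQuantities` (a sketch; the printed value and unit order may differ):

```julia
julia> 2.6353e-22u"m/s^2" * 1.989e30u"kg"  # constant × one solar mass
5.2416e8 m kg s⁻²                          # i.e., units of force (newtons)
```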

 ## 7. Additional features
 
 For the many other features available in SymbolicRegression.jl,
17 changes: 8 additions & 9 deletions docs/src/types.md
@@ -75,15 +75,14 @@ HallOfFame(options::Options, ::Type{T}, ::Type{L}) where {T<:DATA_TYPE,L<:LOSS_TYPE}
 
 ```@docs
 Dataset
-Dataset(
-    X::AbstractMatrix{T},
-    y::Union{AbstractVector{T},Nothing}=nothing;
-    weights::Union{AbstractVector{T},Nothing}=nothing,
-    variable_names::Union{Array{String,1},Nothing}=nothing,
-    extra::NamedTuple=NamedTuple(),
-    loss_type::Type=Nothing,
-    # Deprecated:
-    varMap=nothing,
+Dataset(X::AbstractMatrix{T}, y::Union{AbstractVector{T},Nothing}=nothing;
+    weights::Union{AbstractVector{T}, Nothing}=nothing,
+    variable_names::Union{Array{String, 1}, Nothing}=nothing,
+    y_variable_name::Union{String,Nothing}=nothing,
+    extra::NamedTuple=NamedTuple(),
+    loss_type::Type=Nothing,
+    X_units::Union{AbstractVector, Nothing}=nothing,
+    y_units=nothing,
 ) where {T<:DATA_TYPE}
 update_baseline_loss!
 ```
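
A hypothetical construction using the new keywords from this signature (here units are given as strings, which we assume are parsed like the quantities in the examples above; the low-level `Dataset` stores features as rows):

```julia
using SymbolicRegression

X = randn(3, 100)  # 3 features × 100 samples
y = randn(100)

dataset = Dataset(
    X, y;
    variable_names=["M", "m", "r"],
    y_variable_name="F",
    X_units=["kg", "kg", "m"],
    y_units="kg*m/s^2",
)
```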