diff --git a/dev/composition/index.html b/dev/composition/index.html index 4c478e52..2c8175bf 100644 --- a/dev/composition/index.html +++ b/dev/composition/index.html @@ -1,2 +1,2 @@ -
This document was generated with Documenter.jl version 0.27.25 on Thursday 21 September 2023. Using Julia version 1.9.3.
This document was generated with Documenter.jl version 0.27.25 on Monday 25 September 2023. Using Julia version 1.9.3.
Load it with DelimitedFiles and Tables
data_raw, data_header = readdlm(fpath, ',', header=true)
data_table = Tables.table(data_raw; header=Symbol.(vec(data_header)))
Retrieve the conversions:
for (n, st) in zip(names(data), scitype_union.(eachcol(data)))
println(":$n=>$st,")
-end
Copy and paste the result in a coerce
data_table = coerce(data_table, ...)
MLJBase.load_dataset
— Methodload_dataset(fpath, coercions)
Load one of the standard datasets, such as Boston, assuming the file is comma-separated with a header.
MLJBase.load_sunspots
— Method Load a well-known sunspot time series (table with one column). See https://www.sws.bom.gov.au/Educational/2/3/6
MLJBase.@load_ames
— MacroLoad the full version of the well-known Ames Housing task.
MLJBase.@load_boston
— MacroLoad a well-known public regression dataset with Continuous
features.
MLJBase.@load_crabs
— MacroLoad a well-known crab classification dataset with nominal features.
MLJBase.@load_iris
— MacroLoad a well-known public classification task with nominal features.
MLJBase.@load_reduced_ames
— MacroLoad a reduced version of the well-known Ames Housing task
MLJBase.@load_smarket
— Macro Load the S&P Stock Market dataset, as used in An Introduction to Statistical Learning with Applications in R (https://rdrr.io/cran/ISLR/man/Smarket.html), by Witten et al (2013), Springer-Verlag, New York.
MLJBase.@load_sunspots
— MacroLoad a well-known sunspot time series (single table with one column).
MLJBase.x
— Constantfinalize_Xy(X, y, shuffle, as_table, eltype, rng; clf)
Internal function to finalize the make_*
functions.
MLJBase.augment_X
— Methodaugment_X(X, fit_intercept)
Given a matrix X
, append a column of ones if fit_intercept
is true. See make_regression
.
MLJBase.make_blobs
— FunctionX, y = make_blobs(n=100, p=2; kwargs...)
Generate Gaussian blobs for clustering and classification problems.
Return value
By default, a table X
with p
columns (features) and n
rows (observations), together with a corresponding vector of n
Multiclass
target observations y
, indicating blob membership.
Keyword arguments
shuffle=true
: whether to shuffle the resulting points,
centers=3
: either a number of centers or a c x p
matrix with c
pre-determined centers,
cluster_std=1.0
: the standard deviation(s) of each blob,
center_box=(-10. => 10.)
: the limits of the p
-dimensional cube within which the cluster centers are drawn if they are not provided,
eltype=Float64
: machine type of points (any subtype of AbstractFloat
).
rng=Random.GLOBAL_RNG
: any AbstractRNG
object, or integer to seed a MersenneTwister
(for reproducibility).
as_table=true
: whether to return the points as a table (true) or a matrix (false). If false
the target y
has integer element type.
Example
X, y = make_blobs(100, 3; centers=2, cluster_std=[1.0, 3.0])
MLJBase.make_circles
— FunctionX, y = make_circles(n=100; kwargs...)
Generate n
labeled points close to two concentric circles for classification and clustering models.
Return value
By default, a table X
with 2
columns and n
rows (observations), together with a corresponding vector of n
Multiclass
target observations y
. The target is either 0
or 1
, corresponding to membership to the smaller or larger circle, respectively.
Keyword arguments
shuffle=true
: whether to shuffle the resulting points,
noise=0
: standard deviation of the Gaussian noise added to the data,
factor=0.8
: ratio of the smaller radius over the larger one,
eltype=Float64
: machine type of points (any subtype of AbstractFloat
).
rng=Random.GLOBAL_RNG
: any AbstractRNG
object, or integer to seed a MersenneTwister
(for reproducibility).
as_table=true
: whether to return the points as a table (true) or a matrix (false). If false
the target y
has integer element type.
Example
X, y = make_circles(100; noise=0.5, factor=0.3)
MLJBase.make_moons
— Function make_moons(n::Int=100; kwargs...)
Generates labeled two-dimensional points lying close to two interleaved semi-circles, for use with classification and clustering models.
Return value
By default, a table X
with 2
columns and n
rows (observations), together with a corresponding vector of n
Multiclass
target observations y
. The target is either 0
or 1
, corresponding to membership to the left or right semi-circle.
Keyword arguments
shuffle=true
: whether to shuffle the resulting points,
noise=0.1
: standard deviation of the Gaussian noise added to the data,
xshift=1.0
: horizontal translation of the second center with respect to the first one.
yshift=0.3
: vertical translation of the second center with respect to the first one.
eltype=Float64
: machine type of points (any subtype of AbstractFloat
).
rng=Random.GLOBAL_RNG
: any AbstractRNG
object, or integer to seed a MersenneTwister
(for reproducibility).
as_table=true
: whether to return the points as a table (true) or a matrix (false). If false
the target y
has integer element type.
Example
X, y = make_moons(100; noise=0.5)
MLJBase.make_regression
— Functionmake_regression(n, p; kwargs...)
Generate Gaussian input features and a linear response with Gaussian noise, for use with regression models.
Return value
By default, a tuple (X, y)
where table X
has p
columns and n
rows (observations), together with a corresponding vector of n
Continuous
target observations y
.
Keywords
intercept=true
: Whether to generate data from a model with intercept.
n_targets=1
: Number of columns in the target.
sparse=0
: Proportion of the generating weight vector that is sparse.
noise=0.1
: Standard deviation of the Gaussian noise added to the response (target).
outliers=0
: Proportion of the response vector to make as outliers by adding a random quantity with high variance. (Only applied if binary
is false
.)
as_table=true
: Whether X
(and y
, if n_targets > 1
) should be a table or a matrix.
eltype=Float64
: Element type for X
and y
. Must subtype AbstractFloat
.
binary=false
: Whether the target should be binarized (via a sigmoid).
eltype=Float64
: machine type of points (any subtype of AbstractFloat
).
rng=Random.GLOBAL_RNG
: any AbstractRNG
object, or integer to seed a MersenneTwister
(for reproducibility).
as_table=true
: whether to return the points as a table (true) or a matrix (false).
Example
X, y = make_regression(100, 5; noise=0.5, sparse=0.2, outliers=0.1)
MLJBase.outlify!
— MethodAdd outliers to portion s of vector.
MLJBase.runif_ab
— Methodrunif_ab(rng, n, p, a, b)
Internal function to generate n
points in [a, b]ᵖ
uniformly at random.
MLJBase.sigmoid
— Methodsigmoid(x)
Return the sigmoid computed in a numerically stable way:
$σ(x) = 1/(1+exp(-x))$
MLJBase.sparsify!
— Methodsparsify!(rng, θ, s)
Make portion s
of vector θ
exactly 0.
MLJBase.complement
— Methodcomplement(folds, i)
The complement of the i
th fold of folds
in the concatenation of all elements of folds
. Here folds
is a vector or tuple of integer vectors, typically representing row indices of a vector, matrix or table.
complement(([1,2], [3,], [4, 5]), 2) # [1, 2, 4, 5]
MLJBase.corestrict
— Methodcorestrict(X, folds, i)
The restriction of X
, a vector, matrix or table, to the complement of the i
th fold of folds
, where folds
is a tuple of vectors of row indices.
The method is curried, so that corestrict(folds, i)
is the operator on data defined by corestrict(folds, i)(X) = corestrict(X, folds, i)
.
Example
folds = ([1, 2], [3, 4, 5], [6,])
-corestrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x1, :x2, :x6]
MLJBase.partition
— Methodpartition(X, fractions...;
+end
Copy and paste the result in a coerce
data_table = coerce(data_table, ...)
MLJBase.load_dataset
— Methodload_dataset(fpath, coercions)
Load one of the standard datasets, such as Boston, assuming the file is comma-separated with a header.
MLJBase.load_sunspots
— Method Load a well-known sunspot time series (table with one column). See https://www.sws.bom.gov.au/Educational/2/3/6
MLJBase.@load_ames
— MacroLoad the full version of the well-known Ames Housing task.
MLJBase.@load_boston
— MacroLoad a well-known public regression dataset with Continuous
features.
MLJBase.@load_crabs
— MacroLoad a well-known crab classification dataset with nominal features.
MLJBase.@load_iris
— MacroLoad a well-known public classification task with nominal features.
MLJBase.@load_reduced_ames
— MacroLoad a reduced version of the well-known Ames Housing task
MLJBase.@load_smarket
— Macro Load the S&P Stock Market dataset, as used in An Introduction to Statistical Learning with Applications in R (https://rdrr.io/cran/ISLR/man/Smarket.html), by Witten et al (2013), Springer-Verlag, New York.
MLJBase.@load_sunspots
— MacroLoad a well-known sunspot time series (single table with one column).
MLJBase.x
— Constantfinalize_Xy(X, y, shuffle, as_table, eltype, rng; clf)
Internal function to finalize the make_*
functions.
MLJBase.augment_X
— Methodaugment_X(X, fit_intercept)
Given a matrix X
, append a column of ones if fit_intercept
is true. See make_regression
.
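The behaviour described for augment_X can be sketched in plain Julia. The name augment_sketch is hypothetical and this is not MLJBase's source; it only illustrates appending a column of ones when an intercept is requested.

```julia
# Hypothetical stand-in for the documented behaviour, not MLJBase's source.
function augment_sketch(X::AbstractMatrix, fit_intercept::Bool)
    fit_intercept || return X
    # Append a column of ones with the same element type as X.
    return hcat(X, ones(eltype(X), size(X, 1)))
end

X = [1.0 2.0; 3.0 4.0]
Xaug = augment_sketch(X, true)    # 2×3 matrix whose last column is all ones
Xsame = augment_sketch(X, false)  # X, returned unchanged
```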
MLJBase.make_blobs
— FunctionX, y = make_blobs(n=100, p=2; kwargs...)
Generate Gaussian blobs for clustering and classification problems.
Return value
By default, a table X
with p
columns (features) and n
rows (observations), together with a corresponding vector of n
Multiclass
target observations y
, indicating blob membership.
Keyword arguments
shuffle=true
: whether to shuffle the resulting points,
centers=3
: either a number of centers or a c x p
matrix with c
pre-determined centers,
cluster_std=1.0
: the standard deviation(s) of each blob,
center_box=(-10. => 10.)
: the limits of the p
-dimensional cube within which the cluster centers are drawn if they are not provided,
eltype=Float64
: machine type of points (any subtype of AbstractFloat
).
rng=Random.GLOBAL_RNG
: any AbstractRNG
object, or integer to seed a MersenneTwister
(for reproducibility).
as_table=true
: whether to return the points as a table (true) or a matrix (false). If false
the target y
has integer element type.
Example
X, y = make_blobs(100, 3; centers=2, cluster_std=[1.0, 3.0])
MLJBase.make_circles
— FunctionX, y = make_circles(n=100; kwargs...)
Generate n
labeled points close to two concentric circles for classification and clustering models.
Return value
By default, a table X
with 2
columns and n
rows (observations), together with a corresponding vector of n
Multiclass
target observations y
. The target is either 0
or 1
, corresponding to membership to the smaller or larger circle, respectively.
Keyword arguments
shuffle=true
: whether to shuffle the resulting points,
noise=0
: standard deviation of the Gaussian noise added to the data,
factor=0.8
: ratio of the smaller radius over the larger one,
eltype=Float64
: machine type of points (any subtype of AbstractFloat
).
rng=Random.GLOBAL_RNG
: any AbstractRNG
object, or integer to seed a MersenneTwister
(for reproducibility).
as_table=true
: whether to return the points as a table (true) or a matrix (false). If false
the target y
has integer element type.
Example
X, y = make_circles(100; noise=0.5, factor=0.3)
MLJBase.make_moons
— Function make_moons(n::Int=100; kwargs...)
Generates labeled two-dimensional points lying close to two interleaved semi-circles, for use with classification and clustering models.
Return value
By default, a table X
with 2
columns and n
rows (observations), together with a corresponding vector of n
Multiclass
target observations y
. The target is either 0
or 1
, corresponding to membership to the left or right semi-circle.
Keyword arguments
shuffle=true
: whether to shuffle the resulting points,
noise=0.1
: standard deviation of the Gaussian noise added to the data,
xshift=1.0
: horizontal translation of the second center with respect to the first one.
yshift=0.3
: vertical translation of the second center with respect to the first one.
eltype=Float64
: machine type of points (any subtype of AbstractFloat
).
rng=Random.GLOBAL_RNG
: any AbstractRNG
object, or integer to seed a MersenneTwister
(for reproducibility).
as_table=true
: whether to return the points as a table (true) or a matrix (false). If false
the target y
has integer element type.
Example
X, y = make_moons(100; noise=0.5)
MLJBase.make_regression
— Functionmake_regression(n, p; kwargs...)
Generate Gaussian input features and a linear response with Gaussian noise, for use with regression models.
Return value
By default, a tuple (X, y)
where table X
has p
columns and n
rows (observations), together with a corresponding vector of n
Continuous
target observations y
.
Keywords
intercept=true
: Whether to generate data from a model with intercept.
n_targets=1
: Number of columns in the target.
sparse=0
: Proportion of the generating weight vector that is sparse.
noise=0.1
: Standard deviation of the Gaussian noise added to the response (target).
outliers=0
: Proportion of the response vector to make as outliers by adding a random quantity with high variance. (Only applied if binary
is false
.)
as_table=true
: Whether X
(and y
, if n_targets > 1
) should be a table or a matrix.
eltype=Float64
: Element type for X
and y
. Must subtype AbstractFloat
.
binary=false
: Whether the target should be binarized (via a sigmoid).
eltype=Float64
: machine type of points (any subtype of AbstractFloat
).
rng=Random.GLOBAL_RNG
: any AbstractRNG
object, or integer to seed a MersenneTwister
(for reproducibility).
as_table=true
: whether to return the points as a table (true) or a matrix (false).
Example
X, y = make_regression(100, 5; noise=0.5, sparse=0.2, outliers=0.1)
MLJBase.outlify!
— MethodAdd outliers to portion s of vector.
MLJBase.runif_ab
— Methodrunif_ab(rng, n, p, a, b)
Internal function to generate n
points in [a, b]ᵖ
uniformly at random.
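A minimal sketch of drawing n points uniformly from the cube [a, b]ᵖ, assuming one point per row. The name runif_sketch is hypothetical; the internal MLJBase function may be written differently.

```julia
using Random

# Hypothetical helper: rescale uniform draws on [0, 1) to [a, b), one point per row.
function runif_sketch(rng::AbstractRNG, n::Integer, p::Integer, a::Real, b::Real)
    return (b - a) .* rand(rng, n, p) .+ a
end

U = runif_sketch(MersenneTwister(1), 5, 2, -10.0, 10.0)
size(U)  # (5, 2)
```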
MLJBase.sigmoid
— Methodsigmoid(x)
Return the sigmoid computed in a numerically stable way:
$σ(x) = 1/(1+exp(-x))$
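One common way to evaluate this formula stably is to branch on the sign of x so that exp is never called on a large positive argument. The sketch below illustrates that idea under the hypothetical name stable_sigmoid; it is not necessarily how MLJBase implements sigmoid.

```julia
# Hypothetical sketch of a sign-branching sigmoid.
function stable_sigmoid(x::Real)
    if x >= 0
        # exp(-x) <= 1 here, so the denominator stays in [1, 2].
        return 1 / (1 + exp(-x))
    else
        # Algebraically identical form that uses exp(x) < 1 instead.
        z = exp(x)
        return z / (1 + z)
    end
end

stable_sigmoid(0.0)      # 0.5
stable_sigmoid(-1000.0)  # 0.0
stable_sigmoid(1000.0)   # 1.0
```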
MLJBase.sparsify!
— Methodsparsify!(rng, θ, s)
Make portion s
of vector θ
exactly 0.
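As an illustration, the sketch below zeroes a randomly chosen portion s of a vector in place. The name sparsify_sketch! and the rounding rule (round(s * length(θ)) entries) are assumptions made for the example, not taken from MLJBase.

```julia
using Random

# Hypothetical stand-in: set round(s * length(θ)) randomly chosen entries of θ to zero.
function sparsify_sketch!(rng::AbstractRNG, θ::AbstractVector, s::Real)
    k = round(Int, s * length(θ))
    idx = randperm(rng, length(θ))[1:k]  # k distinct positions
    θ[idx] .= zero(eltype(θ))
    return θ
end

θ = collect(1.0:10.0)
sparsify_sketch!(MersenneTwister(0), θ, 0.3)
count(iszero, θ)  # 3
```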
MLJBase.complement
— Methodcomplement(folds, i)
The complement of the i
th fold of folds
in the concatenation of all elements of folds
. Here folds
is a vector or tuple of integer vectors, typically representing row indices of a vector, matrix or table.
complement(([1,2], [3,], [4, 5]), 2) # [1, 2, 4, 5]
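The definition above amounts to concatenating every fold except the i-th. A minimal sketch, using the hypothetical name complement_sketch (not MLJBase's source) and assuming at least two folds:

```julia
# Hypothetical stand-in: concatenate all folds other than the i-th.
complement_sketch(folds, i) =
    reduce(vcat, [f for (j, f) in enumerate(folds) if j != i])

rows = complement_sketch(([1, 2], [3], [4, 5]), 2)  # [1, 2, 4, 5]
```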
MLJBase.corestrict
— Methodcorestrict(X, folds, i)
The restriction of X
, a vector, matrix or table, to the complement of the i
th fold of folds
, where folds
is a tuple of vectors of row indices.
The method is curried, so that corestrict(folds, i)
is the operator on data defined by corestrict(folds, i)(X) = corestrict(X, folds, i)
.
Example
folds = ([1, 2], [3, 4, 5], [6,])
+corestrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x1, :x2, :x6]
MLJBase.partition
— Methodpartition(X, fractions...;
shuffle=nothing,
rng=Random.GLOBAL_RNG,
stratify=nothing,
@@ -21,8 +21,8 @@
X, y = make_blobs() # a table and vector
Xtrain, Xtest = partition(X, 0.8, stratify=y)
-(Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.8, rng=123, multi=true)
Keywords
shuffle=nothing
: if set to true
, shuffles the rows before taking fractions.
rng=Random.GLOBAL_RNG
: the random number generator to use; may be an integer seed. If rng is specified and shuffle === nothing
, then shuffle is interpreted as true.
stratify=nothing
: if a vector is specified, the partition will match the stratification of the given vector. In that case, shuffle
cannot be false
.
multi=false
: if true
then X
is expected to be a tuple
of objects sharing a common length, which are each partitioned separately using the same specified fractions
and the same row shuffling. Returns a tuple of partitions (a tuple of tuples).
MLJBase.restrict
— Methodrestrict(X, folds, i)
The restriction of X
, a vector, matrix or table, to the i
th fold of folds
, where folds
is a tuple of vectors of row indices.
The method is curried, so that restrict(folds, i)
is the operator on data defined by restrict(folds, i)(X) = restrict(X, folds, i)
.
Example
folds = ([1, 2], [3, 4, 5], [6,])
-restrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x3, :x4, :x5]
See also corestrict
MLJBase.skipinvalid
— Methodskipinvalid(itr)
Return an iterator over the elements in itr
skipping missing
and NaN
values. Behaviour is similar to skipmissing
.
skipinvalid(A, B)
For vectors A
and B
of the same length, return a tuple of vectors (A[mask], B[mask])
where mask[i]
is true
if and only if A[i]
and B[i]
are both valid (non-missing
and non-NaN
). Can also be called on other iterators of matching length, such as arrays, but always returns a vector. Does not remove Missing
from the element types if present in the original iterators.
MLJBase.unpack
— Methodunpack(table, f1, f2, ... fk;
+(Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.8, rng=123, multi=true)
Keywords
shuffle=nothing
: if set to true
, shuffles the rows before taking fractions.
rng=Random.GLOBAL_RNG
: the random number generator to use; may be an integer seed. If rng is specified and shuffle === nothing
, then shuffle is interpreted as true.
stratify=nothing
: if a vector is specified, the partition will match the stratification of the given vector. In that case, shuffle
cannot be false
.
multi=false
: if true
then X
is expected to be a tuple
of objects sharing a common length, which are each partitioned separately using the same specified fractions
and the same row shuffling. Returns a tuple of partitions (a tuple of tuples).
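To make the role of fractions and shuffle concrete, here is a rough sketch that splits row indices 1:n into consecutive blocks. The name partition_sketch and the rounding rule are assumptions for illustration; stratify and multi are not covered, and MLJBase's own partition may compute block sizes differently.

```julia
using Random

# Hypothetical stand-in: split row indices by the given fractions;
# the final block holds whatever rows remain.
function partition_sketch(n::Integer, fractions::Real...;
                          shuffle=false, rng=Random.default_rng())
    rows = shuffle ? randperm(rng, n) : collect(1:n)
    parts = Vector{Vector{Int}}()
    start = 1
    for f in fractions
        stop = start + round(Int, f * n) - 1
        push!(parts, rows[start:stop])
        start = stop + 1
    end
    push!(parts, rows[start:end])
    return Tuple(parts)
end

train, test = partition_sketch(10, 0.8)  # rows 1 to 8, then rows 9 and 10
```

Indexing several objects with the same train and test rows, for example a feature matrix and a target vector, keeps them aligned, which is the effect the multi=true option describes.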
MLJBase.restrict
— Methodrestrict(X, folds, i)
The restriction of X
, a vector, matrix or table, to the i
th fold of folds
, where folds
is a tuple of vectors of row indices.
The method is curried, so that restrict(folds, i)
is the operator on data defined by restrict(folds, i)(X) = restrict(X, folds, i)
.
Example
folds = ([1, 2], [3, 4, 5], [6,])
+restrict([:x1, :x2, :x3, :x4, :x5, :x6], folds, 2) # [:x3, :x4, :x5]
See also corestrict
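The relationship between restrict, corestrict and their curried forms can be sketched for plain vectors as follows. The names restrict_sketch and corestrict_sketch are hypothetical; the documented methods also accept matrices and tables, which this sketch does not handle.

```julia
# Hypothetical stand-ins for vectors only.
restrict_sketch(X::AbstractVector, folds, i) = X[folds[i]]
corestrict_sketch(X::AbstractVector, folds, i) =
    X[reduce(vcat, [f for (j, f) in enumerate(folds) if j != i])]

# Curried forms: fix folds and i first, then apply the result to data.
restrict_sketch(folds, i) = X -> restrict_sketch(X, folds, i)
corestrict_sketch(folds, i) = X -> corestrict_sketch(X, folds, i)

folds = ([1, 2], [3, 4, 5], [6])
X = [:x1, :x2, :x3, :x4, :x5, :x6]
inside = restrict_sketch(X, folds, 2)     # [:x3, :x4, :x5]
outside = corestrict_sketch(folds, 2)(X)  # [:x1, :x2, :x6]
```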
MLJBase.skipinvalid
— Methodskipinvalid(itr)
Return an iterator over the elements in itr
skipping missing
and NaN
values. Behaviour is similar to skipmissing
.
skipinvalid(A, B)
For vectors A
and B
of the same length, return a tuple of vectors (A[mask], B[mask])
where mask[i]
is true
if and only if A[i]
and B[i]
are both valid (non-missing
and non-NaN
). Can also be called on other iterators of matching length, such as arrays, but always returns a vector. Does not remove Missing
from the element types if present in the original iterators.
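The two-argument form can be pictured as building a shared Boolean mask and indexing both vectors with it. The sketch below uses the hypothetical names isvalid_sketch and skipinvalid_sketch and handles vectors only; it is not MLJBase's implementation.

```julia
# Hypothetical stand-in: a value is valid if it is neither missing nor a floating-point NaN.
isvalid_sketch(x) = !ismissing(x) && !(x isa AbstractFloat && isnan(x))

function skipinvalid_sketch(A::AbstractVector, B::AbstractVector)
    # mask[i] is true only when both A[i] and B[i] are valid.
    mask = [isvalid_sketch(a) && isvalid_sketch(b) for (a, b) in zip(A, B)]
    return A[mask], B[mask]
end

A = [1.0, missing, 3.0, NaN]
B = [10.0, 20.0, missing, 40.0]
Av, Bv = skipinvalid_sketch(A, B)  # only the first pair survives: 1.0 and 10.0
```

As in the description above, indexing with a mask leaves the element type of each result untouched, so Missing stays in the element type even though no missing values remain.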
MLJBase.unpack
— Methodunpack(table, f1, f2, ... fk;
wrap_singles=false,
shuffle=false,
rng::Union{AbstractRNG,Int,Nothing}=nothing,
@@ -51,4 +51,4 @@
julia> W # the column(s) left over
2-element Vector{String}:
"A"
- "B"
Whenever a returned table contains a single column, it is converted to a vector unless wrap_singles=true
.
If coerce_options
are specified then table
is first replaced with coerce(table, coerce_options)
. See ScientificTypes.coerce
for details.
If shuffle=true
then the rows of table
are first shuffled, using the global RNG, unless rng
is specified; if rng
is an integer, it specifies the seed of an automatically generated Mersenne twister. If rng
is specified then shuffle=true
is implicit.