diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index a800b2c0..c6c7a66d 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.6.7","generation_timestamp":"2024-11-04T12:53:23","documenter_version":"1.7.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.6.7","generation_timestamp":"2024-12-15T17:22:58","documenter_version":"1.8.0"}} \ No newline at end of file diff --git a/dev/api/densenet/index.html b/dev/api/densenet/index.html index 3e106626..2ec834b2 100644 --- a/dev/api/densenet/index.html +++ b/dev/api/densenet/index.html @@ -1,5 +1,5 @@ DenseNet · Metalhead.jl

DenseNet

This is the API reference for the DenseNet model present in Metalhead.jl.

The higher level model

Metalhead.DenseNetType
DenseNet(config::Int; pretrain = false, growth_rate = 32,
         reduction = 0.5, inchannels = 3, nclasses = 1000)

Create a DenseNet model with specified configuration. Currently supported values are (121, 161, 169, 201) (reference).

Arguments

  • config: the configuration of the model
  • pretrain: whether to load the model with pre-trained weights for ImageNet.
  • growth_rate: the output feature map growth rate of dense blocks (i.e. k in the reference)
  • reduction: the factor by which the number of feature maps is scaled across each transition
  • inchannels: the number of input channels
  • nclasses: the number of output classes
Warning

DenseNet does not currently support pretrained weights.

See also Metalhead.densenet.

source
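As a quick usage sketch (not part of the docstring above; it assumes Metalhead and Flux are installed and the standard 224×224 ImageNet input size):

using Metalhead, Flux

model = DenseNet(121)               # DenseNet-121 with the default 1000-class head
x = rand(Float32, 224, 224, 3, 1)   # a dummy WHCN image batch
size(model(x))                      # expected to be (1000, 1)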

The core function

Metalhead.densenetFunction
densenet(nblocks::AbstractVector{Int}; growth_rate = 32,
          reduction = 0.5, dropout_prob = nothing, inchannels = 3,
         nclasses = 1000)

Create a DenseNet model (reference).

Arguments

  • nblocks: number of dense blocks between transitions
  • growth_rate: the output feature map growth rate of dense blocks (i.e. k in the reference)
  • reduction: the factor by which the number of feature maps is scaled across each transition
  • dropout_prob: the dropout probability for the classifier head. Set to nothing to disable dropout
  • inchannels: the number of input channels
  • nclasses: the number of output classes
source
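As a hedged sketch of this mid-level entry point: the block counts below are the standard DenseNet-121 configuration, so this should roughly match what DenseNet(121) builds internally:

using Metalhead

# (6, 12, 24, 16) dense-block sizes correspond to DenseNet-121
model = Metalhead.densenet([6, 12, 24, 16]; growth_rate = 32, reduction = 0.5, nclasses = 1000)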
diff --git a/dev/api/efficientnet/index.html b/dev/api/efficientnet/index.html index 0f2872c6..f0da212d 100644 --- a/dev/api/efficientnet/index.html +++ b/dev/api/efficientnet/index.html @@ -1,6 +1,6 @@ EfficientNet family of models · Metalhead.jl

EfficientNet family of models

This is the API reference for the EfficientNet family of models supported by Metalhead.jl.

The higher-level model constructors

Metalhead.EfficientNetType
EfficientNet(config::Symbol; pretrain::Bool = false, inchannels::Integer = 3,
             nclasses::Integer = 1000)

Create an EfficientNet model (reference).

Arguments

  • config: size of the model. Can be one of [:b0, :b1, :b2, :b3, :b4, :b5, :b6, :b7, :b8].
  • pretrain: set to true to load the pre-trained weights for ImageNet
  • inchannels: number of input channels.
  • nclasses: number of output classes.
Warning

EfficientNet does not currently support pretrained weights.

See also Metalhead.efficientnet.

source
Metalhead.EfficientNetv2Type
EfficientNetv2(config::Symbol; pretrain::Bool = false, inchannels::Integer = 3,
               nclasses::Integer = 1000)

Create an EfficientNetv2 model (reference).

Arguments

  • config: size of the network (one of [:small, :medium, :large, :xlarge])
  • pretrain: whether to load the pre-trained weights for ImageNet
  • inchannels: number of input channels
  • nclasses: number of output classes
Warning

EfficientNetv2 does not currently support pretrained weights.

See also efficientnet.

source
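A minimal usage sketch (assuming Metalhead and Flux are available); both constructors take the configuration as a Symbol:

using Metalhead, Flux

effnet = EfficientNet(:b0)           # smallest EfficientNet configuration
effnetv2 = EfficientNetv2(:small)    # EfficientNetv2-S
x = rand(Float32, 224, 224, 3, 1)
size(effnet(x))                      # expected to be (1000, 1)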

The mid-level functions

Metalhead.efficientnetFunction
efficientnet(config::Symbol; norm_layer = BatchNorm, stochastic_depth_prob = 0.2,
             dropout_prob = nothing, inchannels::Integer = 3, nclasses::Integer = 1000)

Create an EfficientNet model. (reference).

Arguments

  • config: size of the model. Can be one of [:b0, :b1, :b2, :b3, :b4, :b5, :b6, :b7, :b8].
  • norm_layer: normalization layer to use.
  • stochastic_depth_prob: probability of stochastic depth. Set to nothing to disable stochastic depth.
  • dropout_prob: probability of dropout in the classifier head. Set to nothing to disable dropout.
  • inchannels: number of input channels.
  • nclasses: number of output classes.
source
Metalhead.efficientnetv2Function
efficientnetv2(config::Symbol; norm_layer = BatchNorm, stochastic_depth_prob = 0.2,
               dropout_prob = nothing, inchannels::Integer = 3, nclasses::Integer = 1000)

Create an EfficientNetv2 model. (reference).

Arguments

  • config: size of the network (one of [:small, :medium, :large, :xlarge])
  • norm_layer: normalization layer to use.
  • stochastic_depth_prob: probability of stochastic depth. Set to nothing to disable stochastic depth.
  • dropout_prob: probability of dropout in the classifier head. Set to nothing to disable dropout.
  • inchannels: number of input channels.
  • nclasses: number of output classes.
source
diff --git a/dev/api/hybrid/index.html b/dev/api/hybrid/index.html index 164d25fd..3ad7aa0c 100644 --- a/dev/api/hybrid/index.html +++ b/dev/api/hybrid/index.html @@ -1,7 +1,7 @@ Hybrid CNN architectures · Metalhead.jl

Hybrid CNN architectures

These models are hybrid CNN architectures that borrow certain ideas from vision transformer models.

The higher-level model constructors

Metalhead.ConvMixerType
ConvMixer(config::Symbol; pretrain::Bool = false, inchannels::Integer = 3,
          nclasses::Integer = 1000)

Creates a ConvMixer model. (reference)

Arguments

  • config: the size of the model, either :base, :small or :large
  • pretrain: whether to load the pre-trained weights for ImageNet
  • inchannels: number of input channels
  • nclasses: number of classes in the output
Warning

ConvMixer does not currently support pretrained weights.

See also Metalhead.convmixer.

source
Metalhead.ConvNeXtType
ConvNeXt(config::Symbol; pretrain::Bool = true, inchannels::Integer = 3,
         nclasses::Integer = 1000)

Creates a ConvNeXt model. (reference)

Arguments

  • config: The size of the model, one of tiny, small, base, large or xlarge.
  • pretrain: set to true to load pre-trained weights for ImageNet
  • inchannels: number of input channels
  • nclasses: number of output classes
Warning

ConvNeXt does not currently support pretrained weights.

See also Metalhead.convnext.

source
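A minimal usage sketch for these constructors (pretrain is passed explicitly below because pretrained weights are not yet available):

using Metalhead

mixer = ConvMixer(:small)                     # ConvMixer with the :small configuration
convnext = ConvNeXt(:tiny; pretrain = false)  # ConvNeXt-T without pretrained weights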

The mid-level functions

Metalhead.convmixerFunction
convmixer(planes::Integer, depth::Integer; kernel_size::Dims{2} = (9, 9),
           patch_size::Dims{2} = (7, 7), activation = gelu,
          inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a ConvMixer model. (reference)

Arguments

  • planes: number of planes in the output of each block
  • depth: number of layers
  • kernel_size: kernel size of the convolutional layers
  • patch_size: size of the patches
  • activation: activation function used after the convolutional layers
  • inchannels: number of input channels
  • nclasses: number of classes in the output
source
Metalhead.convnextFunction
convnext(config::Symbol; stochastic_depth_prob = 0.0, layerscale_init = 1.0f-6,
         inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a ConvNeXt model. (reference)

Arguments

  • config: The size of the model, one of tiny, small, base, large or xlarge.
  • stochastic_depth_prob: Stochastic depth probability.
  • layerscale_init: Initial value for LayerScale (reference)
  • inchannels: number of input channels.
  • nclasses: number of output classes
source
diff --git a/dev/api/inception/index.html b/dev/api/inception/index.html index f1df9543..0ac66adf 100644 --- a/dev/api/inception/index.html +++ b/dev/api/inception/index.html @@ -1,4 +1,4 @@ -Inception family of models · Metalhead.jl

Inception family of models

This is the API reference for the Inception family of models supported by Metalhead.jl.

The higher-level model constructors

Metalhead.GoogLeNetType
GoogLeNet(; pretrain::Bool = false, inchannels::Integer = 3, nclasses::Integer = 1000)

Create an Inception-v1 model (commonly referred to as GoogLeNet) (reference).

Arguments

  • pretrain: set to true to load the model with pre-trained weights for ImageNet
  • nclasses: the number of output classes
  • batchnorm: set to true to use batch normalization after each convolution
  • bias: set to true to use bias in the convolution layers
Warning

GoogLeNet does not currently support pretrained weights.

See also Metalhead.googlenet.

source
Metalhead.Inceptionv3Type
Inceptionv3(; pretrain::Bool = false, inchannels::Integer = 3, nclasses::Integer = 1000)

Create an Inception-v3 model (reference).

Arguments

  • pretrain: set to true to load the pre-trained weights for ImageNet
  • inchannels: number of input channels
  • nclasses: the number of output classes
Warning

Inceptionv3 does not currently support pretrained weights.

See also Metalhead.inceptionv3.

source
Metalhead.Inceptionv4Type
Inceptionv4(; pretrain::Bool = false, inchannels::Integer = 3,
            nclasses::Integer = 1000)

Creates an Inceptionv4 model. (reference)

Arguments

  • pretrain: set to true to load the pre-trained weights for ImageNet
  • inchannels: number of input channels.
  • nclasses: the number of output classes.
Warning

Inceptionv4 does not currently support pretrained weights.

See also Metalhead.inceptionv4.

source
Metalhead.InceptionResNetv2Type
InceptionResNetv2(; pretrain::Bool = false, inchannels::Integer = 3, 
                  nclasses::Integer = 1000)

Creates an InceptionResNetv2 model. (reference)

Arguments

  • pretrain: set to true to load the pre-trained weights for ImageNet
  • inchannels: number of input channels.
  • nclasses: the number of output classes.
Warning

InceptionResNetv2 does not currently support pretrained weights.

See also Metalhead.inceptionresnetv2.

source
Metalhead.XceptionType
Xception(; pretrain::Bool = false, inchannels::Integer = 3, nclasses::Integer = 1000)

Creates an Xception model. (reference)

Arguments

  • pretrain: set to true to load the pre-trained weights for ImageNet.
  • inchannels: number of input channels.
  • nclasses: the number of output classes.
Warning

Xception does not currently support pretrained weights.

See also Metalhead.xception.

source
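A minimal usage sketch of these constructors (assuming Metalhead is installed); all of them default to 3 input channels and 1000 output classes:

using Metalhead

googlenet_model = GoogLeNet()                      # Inception-v1
inceptionv3_model = Inceptionv3(; nclasses = 10)   # Inception-v3 with a 10-class head
xception_model = Xception()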

The mid-level functions

Metalhead.googlenetFunction
googlenet(; dropout_prob = 0.4, inchannels::Integer = 3, nclasses::Integer = 1000)

Create an Inception-v1 model (commonly referred to as GoogLeNet) (reference).

Arguments

  • dropout_prob: the dropout probability in the classifier head. Set to nothing to disable dropout.
  • inchannels: the number of input channels
  • nclasses: the number of output classes
  • batchnorm: set to true to include batch normalization after each convolution
  • bias: set to true to use bias in the convolution layers
source
Metalhead.inceptionv3Function
inceptionv3(; dropout_prob = 0.2, inchannels::Integer = 3, nclasses::Integer = 1000)

Create an Inception-v3 model (reference).

Arguments

  • dropout_prob: the dropout probability in the classifier head. Set to nothing to disable dropout.
  • inchannels: number of input feature maps
  • nclasses: the number of output classes
source
Metalhead.inceptionv4Function
inceptionv4(; dropout_prob = nothing, inchannels::Integer = 3, nclasses::Integer = 1000)

Create an Inceptionv4 model. (reference)

Arguments

  • dropout_prob: probability of dropout in classifier head. Set to nothing to disable dropout.
  • inchannels: number of input channels.
  • nclasses: the number of output classes.
source
Metalhead.inceptionresnetv2Function
inceptionresnetv2(; inchannels::Integer = 3, dropout_prob = nothing, nclasses::Integer = 1000)

Creates an InceptionResNetv2 model. (reference)

Arguments

  • dropout_prob: probability of dropout in classifier head. Set to nothing to disable dropout.
  • inchannels: number of input channels.
  • nclasses: the number of output classes.
source
Metalhead.xceptionFunction
xception(; dropout_prob = nothing, inchannels::Integer = 3, nclasses::Integer = 1000)

Creates an Xception model. (reference)

Arguments

  • dropout_prob: probability of dropout in classifier head. Set to nothing to disable dropout.
  • inchannels: number of input channels.
  • nclasses: the number of output classes.
source
diff --git a/dev/api/layers_adv/index.html b/dev/api/layers_adv/index.html index dd63b8dd..36f7cbaa 100644 --- a/dev/api/layers_adv/index.html +++ b/dev/api/layers_adv/index.html @@ -1,17 +1,17 @@ More advanced layers · Metalhead.jl

More advanced layers

This page contains the API reference for some more advanced layers present in the Layers module. These layers are used in Metalhead.jl to build more complex models, and can also be used by the user to build custom models. For a more basic introduction to the Layers module, please refer to the introduction guide for the Layers module.

Squeeze-and-excitation blocks

These are used in models like SE-ResNet and SE-ResNeXt, as well as in the design of inverted residual blocks used in the MobileNet and EfficientNet family of models.

Metalhead.Layers.squeeze_exciteFunction
squeeze_excite(inplanes::Integer; reduction::Real = 16, round_fn = _round_channels, 
               norm_layer = identity, activation = relu, gate_activation = sigmoid)

Creates a squeeze-and-excitation layer used in MobileNets, EfficientNets and SE-ResNets.

Arguments

  • inplanes: The number of input feature maps
  • reduction: The reduction factor for the number of hidden feature maps in the squeeze and excite layer. The number of hidden feature maps is calculated as round_fn(inplanes / reduction).
  • round_fn: The function to round the number of reduced feature maps.
  • activation: The activation function for the first convolution layer
  • gate_activation: The activation function for the gate layer
  • norm_layer: The normalization layer to be used after the convolution layers
  • rd_planes: The number of hidden feature maps in a squeeze and excite layer
source
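A small, hedged usage sketch (the output shape noted in the comment is an assumption based on the layer's channel-rescaling behaviour):

using Flux, Metalhead

se = Metalhead.Layers.squeeze_excite(64; reduction = 16)
x = rand(Float32, 7, 7, 64, 1)   # a 7×7 feature map with 64 channels
size(se(x))                      # expected to be (7, 7, 64, 1)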

Inverted residual blocks

These blocks are designed to be used in the MobileNet and EfficientNet family of convolutional neural networks.

Metalhead.Layers.dwsep_conv_normFunction
dwsep_conv_norm(kernel_size::Dims{2}, inplanes::Integer, outplanes::Integer,
                 activation = relu; norm_layer = BatchNorm, stride::Integer = 1,
                bias::Bool = !(norm_layer !== identity), pad::Integer = 0, [bias, weight, init])

Create a depthwise separable convolution chain as used in MobileNetv1. This is a sequence of layers:

  • a kernel_size depthwise convolution from inplanes => inplanes
  • a (batch) normalisation layer + activation (if norm_layer !== identity; otherwise activation is applied to the convolution output)
  • a kernel_size convolution from inplanes => outplanes
  • a (batch) normalisation layer + activation (if norm_layer !== identity; otherwise activation is applied to the convolution output)

See Fig. 3 in reference.

Arguments

  • kernel_size: size of the convolution kernel (tuple)
  • inplanes: number of input feature maps
  • outplanes: number of output feature maps
  • activation: the activation function for the final layer
  • norm_layer: the normalisation layer used. Note that using identity as the normalisation layer will result in no normalisation being applied.
  • bias: whether to use bias in the convolution layers.
  • stride: stride of the first convolution kernel
  • pad: padding of the first convolution kernel
  • weight, init: initialization for the convolution kernel (see Flux.Conv)
source
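A hedged usage sketch, assuming that (like conv_norm) this returns a vector of layers meant to be splatted into a Chain:

using Flux, Metalhead

# depthwise separable 3×3 block taking 3 channels to 32, with stride 2
layers = Metalhead.Layers.dwsep_conv_norm((3, 3), 3, 32; stride = 2, pad = 1)
model = Chain(layers...)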
Metalhead.Layers.mbconvFunction
mbconv(kernel_size::Dims{2}, inplanes::Integer, explanes::Integer,
        outplanes::Integer, activation = relu; stride::Integer,
        reduction::Union{Nothing, Real} = nothing,
       se_round_fn = x -> round(Int, x), norm_layer = BatchNorm, kwargs...)

Create a basic inverted residual block for MobileNet and EfficientNet variants. This is a sequence of layers:

  • a 1x1 convolution from inplanes => explanes followed by a (batch) normalisation layer

  • activation if inplanes != explanes

  • a kernel_size depthwise separable convolution from explanes => explanes

  • a (batch) normalisation layer

  • a squeeze-and-excitation block (if reduction != nothing) from explanes => se_round_fn(explanes / reduction) and back to explanes

  • a 1x1 convolution from explanes => outplanes

  • a (batch) normalisation layer + activation

Warning

This function does not handle the residual connection by default. The user must add this manually to use this block as a standalone. To construct a model, check out the builders, which handle the residual connection and other details.

First introduced in the MobileNetv2 paper. (See Fig. 3 in reference.)

Arguments

  • kernel_size: kernel size of the convolutional layers
  • inplanes: number of input feature maps
  • explanes: The number of expanded feature maps. This is the number of feature maps after the first 1x1 convolution.
  • outplanes: The number of output feature maps
  • activation: The activation function for the first two convolution layers
  • stride: The stride of the convolutional kernel, has to be either 1 or 2
  • reduction: The reduction factor for the number of hidden feature maps in a squeeze and excite layer (see squeeze_excite)
  • se_round_fn: The function to round the number of reduced feature maps in the squeeze and excite layer
  • norm_layer: The normalization layer to use
source
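A hedged sketch of how the residual connection mentioned in the warning might be added manually, for the stride-1 case where inplanes == outplanes:

using Flux, Metalhead

# inverted residual block: 64 => 256 (expanded) => 64, with a squeeze-and-excite stage
block = Metalhead.Layers.mbconv((3, 3), 64, 256, 64, swish; stride = 1, reduction = 4)
res_block = SkipConnection(block, +)   # residual connection added by the user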
Metalhead.Layers.fused_mbconvFunction
fused_mbconv(kernel_size::Dims{2}, inplanes::Integer, explanes::Integer,
              outplanes::Integer, activation = relu;
             stride::Integer, norm_layer = BatchNorm)

Create a fused inverted residual block.

This is a sequence of layers:

  • a kernel_size depthwise separable convolution from explanes => explanes
  • a (batch) normalisation layer
  • a 1x1 convolution from explanes => outplanes followed by a (batch) normalisation layer + activation if inplanes != explanes
Warning

This function does not handle the residual connection by default. The user must add this manually to use this block as a standalone. To construct a model, check out the builders, which handle the residual connection and other details.

Originally introduced by Google in EfficientNet-EdgeTPU: Creating Accelerator-Optimized Neural Networks with AutoML. Later used in the EfficientNetv2 paper.

Arguments

  • kernel_size: kernel size of the convolutional layers
  • inplanes: number of input feature maps
  • explanes: The number of expanded feature maps
  • outplanes: The number of output feature maps
  • activation: The activation function for the first two convolution layers
  • stride: The stride of the convolutional kernel, has to be either 1 or 2
  • norm_layer: The normalization layer to use
source

The Layers module contains specific layers that are used to build vision transformer (ViT)-inspired models:

Metalhead.Layers.MultiHeadSelfAttentionType
MultiHeadSelfAttention(planes::Integer, nheads::Integer = 8; qkv_bias::Bool = false, 
            attn_dropout_prob = 0., proj_dropout_prob = 0.)

Multi-head self-attention layer.

Arguments

  • planes: number of input channels
  • nheads: number of heads
  • qkv_bias: whether to use bias in the layer to get the query, key and value
  • attn_dropout_prob: dropout probability after the self-attention layer
  • proj_dropout_prob: dropout probability after the projection layer
source
Metalhead.Layers.ClassTokensType
ClassTokens(planes::Integer; init = Flux.zeros32)

Appends class tokens to an input with embedding dimension planes for use in many vision transformer models.

source
Metalhead.Layers.ViPosEmbeddingType
ViPosEmbedding(embedsize::Integer, npatches::Integer; 
               init = (dims::Dims{2}) -> rand(Float32, dims))

Positional embedding layer used by many vision transformer-like models.

source
Metalhead.Layers.PatchEmbeddingFunction
PatchEmbedding(imsize::Dims{2} = (224, 224); inchannels::Integer = 3,
                patch_size::Dims{2} = (16, 16), embedplanes = 768,
               norm_layer = planes -> identity, flatten = true)

Patch embedding layer used by many vision transformer-like models to split the input image into patches.

Arguments

  • imsize: the size of the input image
  • inchannels: number of input channels
  • patch_size: the size of the patches
  • embedplanes: the number of channels in the embedding
  • norm_layer: the normalization layer - by default the identity function but otherwise takes a single argument constructor for a normalization layer like LayerNorm or BatchNorm
  • flatten: set true to flatten the input spatial dimensions after the embedding
source
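A hedged sketch of how these layers compose into a ViT-style stem; the shapes in the comments assume 224×224 inputs split into 16×16 patches (196 patches plus one class token):

using Flux, Metalhead

patchify = Metalhead.Layers.PatchEmbedding((224, 224); patch_size = (16, 16), embedplanes = 768)
stem = Chain(patchify,
             Metalhead.Layers.ClassTokens(768),           # append a learnable class token
             Metalhead.Layers.ViPosEmbedding(768, 197))   # 196 patches + 1 class token
x = rand(Float32, 224, 224, 3, 1)
size(stem(x))   # expected to be (768, 197, 1)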

Apart from this, the Layers module also contains certain blocks used in MLPMixer-style models:

Metalhead.Layers.gated_mlp_blockFunction
gated_mlp(gate_layer, inplanes::Integer, hidden_planes::Integer, 
          outplanes::Integer = inplanes; dropout_prob = 0.0, activation = gelu)

Feedforward block based on the implementation in the paper "Pay Attention to MLPs". (reference)

Arguments

  • gate_layer: Layer to use for the gating.
  • inplanes: Number of dimensions in the input.
  • hidden_planes: Number of dimensions in the intermediate layer.
  • outplanes: Number of dimensions in the output - by default it is the same as inplanes.
  • dropout_prob: Dropout probability.
  • activation: Activation function to use.
source
Metalhead.Layers.mlp_blockFunction
mlp_block(inplanes::Integer, hidden_planes::Integer, outplanes::Integer = inplanes; 
          dropout_prob = 0., activation = gelu)

Feedforward block used in many MLPMixer-like and vision-transformer models.

Arguments

  • inplanes: Number of dimensions in the input.
  • hidden_planes: Number of dimensions in the intermediate layer.
  • outplanes: Number of dimensions in the output - by default it is the same as inplanes.
  • dropout_prob: Dropout probability.
  • activation: Activation function to use.
source
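A hedged usage sketch; since Flux's Dense layers act on the first dimension, these blocks can be applied directly to (embedding, patches, batch) arrays:

using Flux, Metalhead

block = Metalhead.Layers.mlp_block(192, 768)   # 192 => 768 => 192 feedforward block
tokens = rand(Float32, 192, 196, 1)            # (embedding, patches, batch)
size(block(tokens))                            # expected to be (192, 196, 1)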

Miscellaneous utilities for layers

These are some miscellaneous utilities present in the Layers module, and are used with other custom/inbuilt layers to make certain common operations in neural networks easier.

Metalhead.Layers.inputscaleFunction
inputscale(λ; activation = identity)

Scales the input by a scalar λ and applies an activation function to it. Equivalent to activation.(λ .* x).

source
Metalhead.Layers.actaddFunction
actadd(activation = relu, xs...)

Convenience function for summing up the input arrays after applying an activation function to them. Useful as the connection argument for the block function in Metalhead.resnet.

source
Metalhead.Layers.addactFunction
addact(activation = relu, xs...)

Convenience function for applying an activation function to the output after summing up the input arrays. Useful as the connection argument for the block function in Metalhead.resnet.

source
Metalhead.Layers.cat_channelsFunction
cat_channels(x, y, zs...)

Concatenate x and y (and any zs) along the channel dimension (third dimension). Equivalent to cat(x, y, zs...; dims=3). Convenient reduction operator for use with Parallel.

source
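A hedged sketch of the typical Inception-style use of cat_channels as the reduction in a Parallel block:

using Flux, Metalhead

branches = Parallel(Metalhead.Layers.cat_channels,
                    Conv((1, 1), 16 => 8),
                    Conv((3, 3), 16 => 8; pad = 1))
x = rand(Float32, 32, 32, 16, 1)
size(branches(x))   # expected to be (32, 32, 16, 1), i.e. 8 + 8 channels concatenated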
Metalhead.Layers.flatten_chainsFunction
flatten_chains(m::Chain)
flatten_chains(m)

Convenience function for traversing nested layers of a Chain object and flattening them into a single iterator.

source
Metalhead.Layers.swapdimsFunction
swapdims(perm)

Convenience function that returns a closure which permutes the dimensions of an array. perm is a vector or tuple specifying a permutation of the input dimensions. Equivalent to permutedims(x, perm).

source
diff --git a/dev/api/layers_intro/index.html b/dev/api/layers_intro/index.html index acbb05b2..8bd2ccc7 100644 --- a/dev/api/layers_intro/index.html +++ b/dev/api/layers_intro/index.html @@ -3,16 +3,16 @@ using Metalhead.Layers

Convolution + Normalisation: the conv_norm layer

One of the most common patterns in modern neural networks is to have a convolutional layer followed by a normalisation layer. Most major deep learning libraries have a way to combine these two layers into a single layer. In Metalhead.jl, this is done with the Metalhead.Layers.conv_norm layer. The function signature for this is given below:

Metalhead.Layers.conv_normFunction
conv_norm(kernel_size::Dims{2}, inplanes::Integer, outplanes::Integer,
           activation = relu; norm_layer = BatchNorm, revnorm::Bool = false,
           preact::Bool = false, stride::Integer = 1, pad::Integer = 0,
          dilation::Integer = 1, groups::Integer = 1, [bias, weight, init])

Create a convolution + normalisation layer pair with activation.

Arguments

  • kernel_size: size of the convolution kernel (tuple)
  • inplanes: number of input feature maps
  • outplanes: number of output feature maps
  • activation: the activation function for the final layer
  • norm_layer: the normalisation layer used. Note that using identity as the normalisation layer will result in no normalisation being applied. (This is only compatible with preact and revnorm both set to false.)
  • revnorm: set to true to place the normalisation layer before the convolution
  • preact: set to true to place the activation function before the normalisation layer (only compatible with revnorm = false)
  • bias: bias for the convolution kernel. This is set to false by default if norm_layer is not identity and true otherwise.
  • stride: stride of the convolution kernel
  • pad: padding of the convolution kernel
  • dilation: dilation of the convolution kernel
  • groups: groups for the convolution kernel
  • weight, init: initialization for the convolution kernel (see Flux.Conv)
source

To know more about the exact details of each of these parameters, you can refer to the documentation for this function. For now, we will focus on some common use cases. For example, if you want to create a convolutional layer with a kernel size of 3x3, with 32 input channels and 64 output channels, along with a BatchNorm layer, you can do the following:

conv_norm((3, 3), 32, 64)

This returns a Vector with the desired layers. To use it in a model, the user should splat it into a Chain. For example:

Chain(Dense(3, 32), conv_norm((3, 3), 32, 64)..., Dense(64, 10))

The default activation function for conv_norm is relu, and the default normalisation layer is BatchNorm. To use a different activation function, you can just pass it in as a positional argument. For example, to use a sigmoid activation function:

conv_norm((3, 3), 32, 64, sigmoid)

Let's try something else. Suppose you want to use a GroupNorm layer instead of a BatchNorm layer. Note that norm_layer is a keyword argument in the function signature of conv_norm as shown above. Then we can write:

conv_norm((3, 3), 32, 64; norm_layer = GroupNorm)

What if you want to change certain specific parameters of the norm_layer? For example, what if you want to change the number of groups in the GroupNorm layer?

# defining the norm layer
 norm_layer = planes -> GroupNorm(planes, 4)
 # passing it to the conv_norm layer
 conv_norm((3, 3), 32, 64; norm_layer = norm_layer)

One of Julia's features is that functions are first-class objects, and can be passed around as arguments to other functions. Here, we have created an anonymous function that takes in the number of planes as an argument, and returns a GroupNorm layer with 4 groups. This is then passed to the norm_layer keyword argument of the conv_norm layer. Using anonymous functions allows us to configure the layers in a very flexible manner, and this is a common pattern in Metalhead.jl.

Let's take a slightly more complicated example. TensorFlow uses different defaults for its normalisation layers. In particular, it uses an epsilon value of 1e-3 for BatchNorm layers. If you want to use the same defaults as TensorFlow, you can do the following:

# note that 1e-3 is not a Float32 and Flux is optimized for Float32, so we use 1.0f-3
 conv_norm((3, 3), 32, 64; norm_layer = planes -> BatchNorm(planes, eps = 1.0f-3))

which, incidentally, is very similar to the code Metalhead uses internally for the Metalhead.Layers.basic_conv_bn layer that is used in the Inception family of models.

Metalhead.Layers.basic_conv_bnFunction
basic_conv_bn(kernel_size::Dims{2}, inplanes, outplanes, activation = relu;
              kwargs...)

Returns a convolution + batch normalisation pair with activation as used by the Inception family of models with default values matching those used in the official TensorFlow implementation.

Arguments

  • kernel_size: size of the convolution kernel (tuple)
  • inplanes: number of input feature maps
  • outplanes: number of output feature maps
  • activation: the activation function for the final layer
  • batchnorm: set to true to include batch normalization after each convolution
  • kwargs: keyword arguments passed to conv_norm
source

Normalisation layers

The Layers module provides some custom normalisation functions that are not present in Flux.

Metalhead.Layers.LayerScaleFunction
LayerScale(planes::Integer, λ)

Creates a Flux.Scale layer that performs "LayerScale" (reference).

Arguments

  • planes: Size of channel dimension in the input.
  • λ: initialisation value for the learnable diagonal matrix.
source
Metalhead.Layers.LayerNormV2Type
LayerNormV2(size..., λ=identity; affine=true, eps=1f-5)

Same as Flux's LayerNorm but eps is added before taking the square root in the denominator. Therefore, LayerNormV2 matches pytorch's LayerNorm.

source
Metalhead.Layers.ChannelLayerNormType
ChannelLayerNorm(sz::Integer, λ = identity; eps = 1.0f-6)

A variant of LayerNorm where the input is normalised along the channel dimension. The input is expected to have channel dimension with size sz. It also applies a learnable shift and rescaling after the normalization.

Note that this is specifically for inputs with 4 dimensions in the format (H, W, C, N) where H, W are the height and width of the input, C is the number of channels, and N is the batch size.

source

There is also a utility function, prenorm, which applies a normalisation layer before a given block and simply returns a Chain with the normalisation layer and the block. This is useful for creating Vision Transformers (ViT)-like models.

Metalhead.Layers.prenormFunction
prenorm(planes, block; norm_layer = LayerNorm)

Utility function to apply a normalization layer before a block.

Arguments

  • planes: Size of dimension to normalize.
  • block: The block before which the normalization layer is applied.
  • norm_layer: The normalization layer to use.
source
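A hedged sketch of the pre-norm transformer pattern this enables: normalisation applied before self-attention, wrapped in a residual connection:

using Flux, Metalhead

attn = Metalhead.Layers.MultiHeadSelfAttention(768, 12)
block = SkipConnection(Metalhead.Layers.prenorm(768, attn), +)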

Dropout layers

The Layers module provides two dropout-like layers not present in Flux:

Metalhead.Layers.DropBlockType
DropBlock(drop_block_prob = 0.1, block_size = 7, gamma_scale = 1.0, [rng])

The DropBlock layer. While training, it zeroes out contiguous regions of size block_size in the input. During inference, it simply returns the input x. It can be used in two ways: either with all blocks having the same survival probability or with a linear scaling rule across the blocks. This is performed only at training time. At test time, the DropBlock layer is equivalent to identity.

(reference)

Arguments

  • drop_block_prob: probability of dropping a block. If nothing is passed, it returns identity. Note that some literature uses the term "survival probability" instead, which is equivalent to 1 - drop_block_prob.
  • block_size: size of the block to drop
  • gamma_scale: multiplicative factor for gamma used. For the calculation of gamma, refer to the paper.
  • rng: can be used to pass in a custom RNG instead of the default. Custom RNGs are only supported on the CPU.
source
Metalhead.Layers.StochasticDepthFunction
StochasticDepth(p, mode = :row; [rng])

Implements Stochastic Depth. This is a Dropout layer from Flux that drops values with probability p. (reference)

This layer can be used to drop certain blocks in a residual structure and allow them to propagate completely through the skip connection. It can be used in two ways: either with all blocks having the same survival probability or with a linear scaling rule across the blocks. This is performed only at training time. At test time, the StochasticDepth layer is equivalent to identity.

Arguments

  • p: probability of Stochastic Depth. Note that some literature uses the term "survival probability" instead, which is equivalent to 1 - p.
  • mode: Either :batch or :row. :batch randomly zeroes the entire input, while :row zeroes randomly selected rows from the batch. The default is :row.
  • rng: can be used to pass in a custom RNG instead of the default. See Flux.Dropout for more information on the behaviour of this argument. Custom RNGs are only supported on the CPU.
source

DropBlock also has a functional variant present in the Layers module:

Metalhead.Layers.dropblockFunction
dropblock([rng], x::AbstractArray{T, 4}, drop_block_prob, block_size,
          gamma_scale, active::Bool = true)

The dropblock function. If active is true, for each input, it zeroes out contiguous regions of size block_size in the input. Otherwise, it simply returns the input x.

Arguments

  • rng: can be used to pass in a custom RNG instead of the default. Custom RNGs are only supported on the CPU.
  • x: input array
  • drop_block_prob: probability of dropping a block. If nothing is passed, it returns identity.
  • block_size: size of the block to drop
  • gamma_scale: multiplicative factor for gamma used. For the calculations, refer to the paper.

If you are not a package developer, you most likely do not want this function. Use DropBlock instead.

source

Both DropBlock and StochasticDepth are used along with probability values that vary based on a linear schedule across the structure of the model (see the respective papers for more details). The Layers module provides a utility function to create such a schedule as well:

Metalhead.Layers.linear_schedulerFunction
linear_scheduler(drop_prob = 0.0; start_value = 0.0, depth)
linear_scheduler(drop_prob::Nothing; depth::Integer)

Returns the dropout probabilities for a given depth using the linear scaling rule. Note that this returns evenly spaced values between start_value and drop_prob, not including drop_prob. If drop_prob is nothing, it returns a Vector of length depth with all values equal to nothing.

source
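An illustrative (hedged) example of the values this produces, following the description above:

using Metalhead

Metalhead.Layers.linear_scheduler(0.2; depth = 4)
# expected to be a vector like [0.0, 0.05, 0.1, 0.15]
Metalhead.Layers.linear_scheduler(nothing; depth = 4)
# expected to be [nothing, nothing, nothing, nothing]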

The Metalhead.resnet function which powers the ResNet family of models in Metalhead.jl is configured to allow the use of both these layers. For examples, check out the guide for using the ResNet family in Metalhead here. These layers can also be used by the user to construct other custom models.

Pooling layers

The Layers module provides a Metalhead.Layers.AdaptiveMeanMaxPool layer, which is inspired by a similar layer present in timm.

Metalhead.Layers.AdaptiveMeanMaxPoolFunction
AdaptiveMeanMaxPool([connection = +], output_size::Tuple = (1, 1))

A type of adaptive pooling layer which uses both mean and max pooling and combines them to produce a single output. Note that this is equivalent to Parallel(connection, AdaptiveMeanPool(output_size), AdaptiveMaxPool(output_size)). When connection is not specified, it defaults to +.

Arguments

  • connection: The connection type to use.
  • output_size: The size of the output after pooling.
source
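A hedged sketch of a classifier head built with this pooling layer (the expected shapes in the comments are assumptions):

using Flux, Metalhead

pool = Metalhead.Layers.AdaptiveMeanMaxPool()   # defaults to an output size of (1, 1)
head = Chain(pool, Flux.flatten, Dense(512 => 1000))
x = rand(Float32, 7, 7, 512, 1)
size(head(x))   # expected to be (1000, 1)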

Many mid-level model functions in Metalhead.jl have been written to support passing custom pooling layers to them if applicable (either in the model itself or in the classifier head). For example, the Metalhead.resnet function supports this, and examples of this can be found in the guide for using the ResNet family in Metalhead here.

Classifier creation

Metalhead provides a function to create a classifier for neural network models that is quite flexible, and is used by the library extensively to create the classifier "head" for networks. This function is called Metalhead.Layers.create_classifier and is documented below:

Metalhead.Layers.create_classifierFunction
create_classifier(inplanes::Integer, nclasses::Integer, activation = identity;
+              kwargs...)

Returns a convolution + batch normalisation pair with activation as used by the Inception family of models with default values matching those used in the official TensorFlow implementation.

Arguments

  • kernel_size: size of the convolution kernel (tuple)
  • inplanes: number of input feature maps
  • outplanes: number of output feature maps
  • activation: the activation function for the final layer
  • batchnorm: set to true to include batch normalization after each convolution
  • kwargs: keyword arguments passed to conv_norm
source

Normalisation layers

The Layers module provides some custom normalisation functions that are not present in Flux.

Metalhead.Layers.LayerScaleFunction
LayerScale(planes::Integer, λ)

Creates a Flux.Scale layer that performs "LayerScale" (reference).

Arguments

  • planes: Size of channel dimension in the input.
  • λ: initialisation value for the learnable diagonal matrix.
source
Metalhead.Layers.LayerNormV2Type
LayerNormV2(size..., λ=identity; affine=true, eps=1f-5)

Same as Flux's LayerNorm but eps is added before taking the square root in the denominator. Therefore, LayerNormV2 matches pytorch's LayerNorm.

source
Metalhead.Layers.ChannelLayerNormType
ChannelLayerNorm(sz::Integer, λ = identity; eps = 1.0f-6)

A variant of LayerNorm where the input is normalised along the channel dimension. The input is expected to have channel dimension with size sz. It also applies a learnable shift and rescaling after the normalization.

Note that this is specifically for inputs with 4 dimensions in the format (H, W, C, N) where H, W are the height and width of the input, C is the number of channels, and N is the batch size.

source
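
For illustration, here is a small usage sketch (not taken from the package documentation; the sizes are illustrative and follow the signatures shown above) that applies both layers to random inputs:

using Metalhead, Flux

# LayerNormV2 follows Flux's LayerNorm calling convention; here it normalises
# a 16-dimensional feature vector per column.
ln = Metalhead.Layers.LayerNormV2(16)
size(ln(rand(Float32, 16, 10)))            # (16, 10)

# ChannelLayerNorm expects (H, W, C, N) inputs and normalises along C.
cln = Metalhead.Layers.ChannelLayerNorm(16)
size(cln(rand(Float32, 8, 8, 16, 2)))      # (8, 8, 16, 2)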

There is also a utility function, prenorm, which applies a normalisation layer before a given block and simply returns a Chain with the normalisation layer and the block. This is useful for creating Vision Transformers (ViT)-like models.

Metalhead.Layers.prenormFunction
prenorm(planes, block; norm_layer = LayerNorm)

Utility function to apply a normalization layer before a block.

Arguments

  • planes: Size of dimension to normalize.
  • block: The block before which the normalization layer is applied.
  • norm_layer: The normalization layer to use.
source
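
A minimal sketch of prenorm, assuming a Dense block of width 64 (the numbers are illustrative only):

using Metalhead, Flux

# prenorm wraps the block in a Chain with the normalisation layer first,
# as in pre-norm transformer blocks.
block = Metalhead.Layers.prenorm(64, Dense(64, 64, gelu))
size(block(rand(Float32, 64, 32)))         # (64, 32)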

Dropout layers

The Layers module provides two dropout-like layers not present in Flux:

Metalhead.Layers.DropBlockType
DropBlock(drop_block_prob = 0.1, block_size = 7, gamma_scale = 1.0, [rng])

The DropBlock layer. While training, it zeroes out contiguous regions of size block_size in the input; at test time, it is equivalent to identity and simply returns the input x. It can be used either with all blocks having the same drop probability or with a linear scaling rule for the probability across the blocks.

(reference)

Arguments

  • drop_block_prob: probability of dropping a block. If nothing is passed, it returns identity. Note that some literature uses the term "survival probability" instead, which is equivalent to 1 - drop_block_prob.
  • block_size: size of the block to drop
  • gamma_scale: multiplicative factor for gamma used. For the calculation of gamma, refer to the paper.
  • rng: can be used to pass in a custom RNG instead of the default. Custom RNGs are only supported on the CPU.
source
Metalhead.Layers.StochasticDepthFunction
StochasticDepth(p, mode = :row; [rng])

Implements Stochastic Depth. This is a Dropout layer from Flux that drops values with probability p. (reference)

This layer can be used to drop certain blocks in a residual structure and allow them to propagate completely through the skip connection. It can be used in two ways: either with all blocks having the same survival probability or with a linear scaling rule across the blocks. This is performed only at training time. At test time, the StochasticDepth layer is equivalent to identity.

Arguments

  • p: probability of Stochastic Depth. Note that some literature uses the term "survival probability" instead, which is equivalent to 1 - p.
  • mode: Either :batch or :row. :batch randomly zeroes the entire input, while :row zeroes randomly selected rows (samples) from the batch. The default is :row.
  • rng: can be used to pass in a custom RNG instead of the default. See Flux.Dropout for more information on the behaviour of this argument. Custom RNGs are only supported on the CPU.
source
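
As a rough usage sketch (constructor arguments follow the docstrings above; the sizes are illustrative):

using Metalhead

# DropBlock with drop_block_prob = 0.1 and block_size = 5; StochasticDepth with
# p = 0.2 in :row mode. Both preserve the shape of the input.
db = Metalhead.Layers.DropBlock(0.1, 5)
sd = Metalhead.Layers.StochasticDepth(0.2, :row)

x = rand(Float32, 16, 16, 8, 4)
size(db(x)), size(sd(x))                   # both (16, 16, 8, 4)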

DropBlock also has a functional variant present in the Layers module:

Metalhead.Layers.dropblockFunction
dropblock([rng], x::AbstractArray{T, 4}, drop_block_prob, block_size,
+          gamma_scale, active::Bool = true)

The dropblock function. If active is true, for each input, it zeroes out contiguous regions of size block_size in the input. Otherwise, it simply returns the input x.

Arguments

  • rng: can be used to pass in a custom RNG instead of the default. Custom RNGs are only supported on the CPU.
  • x: input array
  • drop_block_prob: probability of dropping a block. If nothing is passed, it returns identity.
  • block_size: size of the block to drop
  • gamma_scale: multiplicative factor for gamma used. For the calculations, refer to the paper.

If you are not a package developer, you most likely do not want this function. Use DropBlock instead.

source

Both DropBlock and StochasticDepth are used along with probability values that vary based on a linear schedule across the structure of the model (see the respective papers for more details). The Layers module provides a utility function to create such a schedule as well:

Metalhead.Layers.linear_schedulerFunction
linear_scheduler(drop_prob = 0.0; start_value = 0.0, depth)
+linear_scheduler(drop_prob::Nothing; depth::Integer)

Returns the dropout probabilities for a given depth using the linear scaling rule. Note that this returns evenly spaced values between start_value and drop_prob, not including drop_prob. If drop_prob is nothing, it returns a Vector of length depth with all values equal to nothing.

source
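
For example, a hypothetical model with four blocks could be given a schedule like this (a sketch of the documented behaviour):

using Metalhead

# Evenly spaced values from 0.0 up to, but not including, 0.2.
Metalhead.Layers.linear_scheduler(0.2; depth = 4)

# Passing `nothing` disables the layers: a length-4 vector of `nothing`s.
Metalhead.Layers.linear_scheduler(nothing; depth = 4)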

The Metalhead.resnet function which powers the ResNet family of models in Metalhead.jl is configured to allow the use of both these layers. For examples, check out the guide for using the ResNet family in Metalhead here. These layers can also be used by the user to construct other custom models.

Pooling layers

The Layers module provides a Metalhead.Layers.AdaptiveMeanMaxPool layer, which is inspired by a similar layer present in timm.

Metalhead.Layers.AdaptiveMeanMaxPoolFunction
AdaptiveMeanMaxPool([connection = +], output_size::Tuple = (1, 1))

A type of adaptive pooling layer which uses both mean and max pooling and combines them to produce a single output. Note that this is equivalent to Parallel(connection, AdaptiveMeanPool(output_size), AdaptiveMaxPool(output_size)). When connection is not specified, it defaults to +.

Arguments

  • connection: The connection type to use.
  • output_size: The size of the output after pooling.
source

Many mid-level model functions in Metalhead.jl have been written to support passing custom pooling layers to them if applicable (either in the model itself or in the classifier head). For example, the Metalhead.resnet function supports this, and examples of this can be found in the guide for using the ResNet family in Metalhead here.
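
As a small sketch (the channel and image sizes are made up for illustration), the layer can be dropped into a classifier head like any other pooling layer:

using Metalhead, Flux

# Mean and max pooling are combined with `+` by default, so the number of
# channels is unchanged (32 here).
pool = Metalhead.Layers.AdaptiveMeanMaxPool((1, 1))
head = Chain(pool, Flux.flatten, Dense(32, 10))
size(head(rand(Float32, 14, 14, 32, 8)))   # (10, 8)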

Classifier creation

Metalhead provides a function to create a classifier for neural network models that is quite flexible, and is used by the library extensively to create the classifier "head" for networks. This function is called Metalhead.Layers.create_classifier and is documented below:

Metalhead.Layers.create_classifierFunction
create_classifier(inplanes::Integer, nclasses::Integer, activation = identity;
                   use_conv::Bool = false, pool_layer = AdaptiveMeanPool((1, 1)), 
-                  dropout_prob = nothing)

Creates a classifier head to be used for models.

Arguments

  • inplanes: number of input feature maps
  • nclasses: number of output classes
  • activation: activation function to use
  • use_conv: whether to use a 1x1 convolutional layer instead of a Dense layer.
  • pool_layer: pooling layer to use. This is passed in with the layer instantiated with any arguments that are needed i.e. as AdaptiveMeanPool((1, 1)), for example.
  • dropout_prob: dropout probability used in the classifier head. Set to nothing to disable dropout.
source
create_classifier(inplanes::Integer, hidden_planes::Integer, nclasses::Integer,
                   activations::NTuple{2} = (relu, identity);
                   use_conv::NTuple{2, Bool} = (false, false),
-                  pool_layer = AdaptiveMeanPool((1, 1)), dropout_prob = nothing)

Creates a classifier head to be used for models with an extra hidden layer.

Arguments

  • inplanes: number of input feature maps
  • hidden_planes: number of hidden feature maps
  • nclasses: number of output classes
  • activations: activation functions to use for the hidden and output layers. This is a tuple of two elements, the first being the activation function for the hidden layer and the second for the output layer.
  • use_conv: whether to use a 1x1 convolutional layer instead of a Dense layer. This is a tuple of two booleans, the first for the hidden layer and the second for the output layer.
  • pool_layer: pooling layer to use. This is passed in with the layer instantiated with any arguments that are needed i.e. as AdaptiveMeanPool((1, 1)), for example.
  • dropout_prob: dropout probability used in the classifier head. Set to nothing to disable dropout.
source

Due to the power of multiple dispatch in Julia, the above function can be called with two different signatures - one of which creates a classifier with no hidden layers, and the other which creates a classifier with a single hidden layer. The function signature for both is documented above, and the user can choose the one that is most convenient for them. Both are used in Metalhead.jl - the latter is used in MobileNetv3, and the former is used almost everywhere else.
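
A short sketch of both signatures, with illustrative sizes (512 input feature maps, a 1280-wide hidden layer, and hardswish chosen only as an example activation):

using Metalhead, Flux

# No hidden layer: the head maps 512 feature maps to 1000 classes.
head = Metalhead.Layers.create_classifier(512, 1000)
size(head(rand(Float32, 7, 7, 512, 2)))     # (1000, 2)

# With an extra hidden layer, in the style used by MobileNetv3.
head2 = Metalhead.Layers.create_classifier(512, 1280, 1000, (hardswish, identity))
size(head2(rand(Float32, 7, 7, 512, 2)))    # (1000, 2)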

diff --git a/dev/api/mixers/index.html b/dev/api/mixers/index.html index 3f3f787a..caccb7d4 100644 --- a/dev/api/mixers/index.html +++ b/dev/api/mixers/index.html @@ -1,13 +1,13 @@ MLPMixer-like models · Metalhead.jl

MLPMixer-like models

This is the API reference for the MLPMixer-like models supported by Metalhead.jl.

The higher-level model constructors

Metalhead.MLPMixerType
MLPMixer(config::Symbol; patch_size::Dims{2} = (16, 16), imsize::Dims{2} = (224, 224),
-         inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a model with the MLPMixer architecture. (reference).

Arguments

  • config: the size of the model - one of :small, :base, :large or :huge
  • patch_size: the size of the patches
  • imsize: the size of the input image
  • stochastic_depth_prob: Stochastic depth probability
  • inchannels: the number of input channels
  • nclasses: number of output classes

See also Metalhead.mlpmixer.

source
Metalhead.ResMLPType
ResMLP(config::Symbol; patch_size::Dims{2} = (16, 16), imsize::Dims{2} = (224, 224),
-       inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a model with the ResMLP architecture. (reference).

Arguments

  • config: the size of the model - one of :small, :base, :large or :huge
  • patch_size: the size of the patches
  • imsize: the size of the input image
  • inchannels: the number of input channels
  • nclasses: number of output classes

See also Metalhead.mlpmixer.

source
Metalhead.gMLPType
gMLP(config::Symbol; patch_size::Dims{2} = (16, 16), imsize::Dims{2} = (224, 224),
-     inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a model with the gMLP architecture. (reference).

Arguments

  • config: the size of the model - one of :small, :base, :large or :huge
  • patch_size: the size of the patches
  • imsize: the size of the input image
  • inchannels: the number of input channels
  • nclasses: number of output classes

See also Metalhead.mlpmixer.

source
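
As a quick construction sketch (weights are randomly initialised; the input is a dummy 224×224 RGB batch):

using Metalhead

# The three constructors share the same interface; ResMLP and gMLP accept the
# same configs and keyword arguments.
model = MLPMixer(:small; nclasses = 100)
size(model(rand(Float32, 224, 224, 3, 1)))  # (100, 1)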

The core MLPMixer function

Metalhead.mlpmixerFunction
mlpmixer(block, imsize::Dims{2} = (224, 224); inchannels::Integer = 3, norm_layer = LayerNorm,
          patch_size::Dims{2} = (16, 16), embedplanes = 512, stochastic_depth_prob = 0.,
-         depth::Integer = 12, nclasses::Integer = 1000, kwargs...)

Creates a model with the MLPMixer architecture. (reference).

Arguments

  • block: the type of mixer block to use in the model - architecture dependent (a constructor of the form block(embedplanes, npatches; stochastic_depth_prob, kwargs...))
  • imsize: the size of the input image
  • inchannels: the number of input channels
  • norm_layer: the normalization layer to use in the model
  • patch_size: the size of the patches
  • embedplanes: the number of channels after the patch embedding (denotes the hidden dimension)
  • stochastic_depth_prob: Stochastic depth probability
  • depth: the number of blocks in the model
  • nclasses: number of output classes
  • kwargs: additional arguments (if any) to pass to the mixer block. Will use the defaults if not specified.
source
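
A hedged sketch of calling the mid-level function directly, reusing Metalhead.mixerblock to build a much smaller mixer (all sizes here are illustrative):

using Metalhead

model = Metalhead.mlpmixer(Metalhead.mixerblock, (224, 224);
                           patch_size = (32, 32), embedplanes = 128,
                           depth = 4, nclasses = 10)
size(model(rand(Float32, 224, 224, 3, 1)))  # (10, 1)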

The block functions

Metalhead.mixerblockFunction
mixerblock(planes::Integer, npatches::Integer; mlp_layer = mlp_block,
            mlp_ratio = (0.5, 4.0), dropout_prob = 0.0, stochastic_depth_prob = 0.0,
-           activation = gelu)

Creates a feedforward block for the MLPMixer architecture. (reference)

Arguments

  • planes: the number of planes in the block
  • npatches: the number of patches of the input
  • mlp_ratio: number(s) that determine(s) the number of hidden channels in the token mixing MLP and/or the channel mixing MLP as a ratio to the number of planes in the block.
  • mlp_layer: the MLP layer to use in the block
  • dropout_prob: the dropout probability to use in the MLP blocks
  • stochastic_depth_prob: Stochastic depth probability
  • activation: the activation function to use in the MLP blocks
source
Metalhead.resmixerblockFunction
resmixerblock(planes, npatches; dropout_prob = 0., stochastic_depth_prob = 0., mlp_ratio = 4.0,
-              activation = gelu, layerscale_init = 1e-4)

Creates a block for the ResMixer architecture. (reference).

Arguments

  • planes: the number of planes in the block
  • npatches: the number of patches of the input
  • mlp_ratio: ratio of the number of hidden channels in the channel mixing MLP to the number of planes in the block
  • mlp_layer: the MLP block to use
  • dropout_prob: the dropout probability to use in the MLP blocks
  • stochastic_depth_prob: Stochastic depth probability
  • activation: the activation function to use in the MLP blocks
  • layerscale_init: initialisation constant for the LayerScale
source
Metalhead.SpatialGatingUnitType
SpatialGatingUnit(planes::Integer, npatches::Integer; norm_layer = LayerNorm)

Creates a spatial gating unit as described in the gMLP paper. (reference)

Arguments

  • planes: the number of planes in the block
  • npatches: the number of patches of the input
  • norm_layer: the normalisation layer to use
source
Metalhead.spatialgatingblockFunction
spatialgatingblock(planes::Integer, npatches::Integer; mlp_ratio = 4.0,
                    norm_layer = LayerNorm, mlp_layer = gated_mlp_block,
                    dropout_prob = 0.0, stochastic_depth_prob = 0.0,
-                   activation = gelu)

Creates a feedforward block based on the gMLP model architecture described in the paper. (reference)

Arguments

  • planes: the number of planes in the block
  • npatches: the number of patches of the input
  • mlp_ratio: ratio of the number of hidden channels in the channel mixing MLP to the number of planes in the block
  • norm_layer: the normalisation layer to use
  • dropout_prob: the dropout probability to use in the MLP blocks
  • stochastic_depth_prob: Stochastic depth probability
  • activation: the activation function to use in the MLP blocks
source
diff --git a/dev/api/mobilenet/index.html index 1b31073a..dfde2747 100644 --- a/dev/api/mobilenet/index.html +++ b/dev/api/mobilenet/index.html @@ -1,10 +1,10 @@ MobileNet family of models · Metalhead.jl

MobileNet family of models

This is the API reference for the MobileNet family of models supported by Metalhead.jl.

The higher-level model constructors

Metalhead.MobileNetv1Type
MobileNetv1(width_mult::Real = 1; pretrain::Bool = false,
-            inchannels::Integer = 3, nclasses::Integer = 1000)

Create a MobileNetv1 model with the baseline configuration (reference).

Arguments

  • width_mult: Controls the number of output feature maps in each block (with 1 being the default in the paper; this is usually a value between 0.1 and 1.4)
  • pretrain: Whether to load the pre-trained weights for ImageNet
  • inchannels: The number of input channels.
  • nclasses: The number of output classes
Warning

MobileNetv1 does not currently support pretrained weights.

See also Metalhead.mobilenetv1.

source
Metalhead.MobileNetv2Type
MobileNetv2(width_mult = 1.0; inchannels::Integer = 3, pretrain::Bool = false,
-            nclasses::Integer = 1000)

Create a MobileNetv2 model with the specified configuration. (reference).

Arguments

  • width_mult: Controls the number of output feature maps in each block (with 1 being the default in the paper; this is usually a value between 0.1 and 1.4)
  • pretrain: Whether to load the pre-trained weights for ImageNet
  • inchannels: The number of input channels.
  • nclasses: The number of output classes
Warning

MobileNetv2 does not currently support pretrained weights.

See also Metalhead.mobilenetv2.

source
Metalhead.MobileNetv3Type
MobileNetv3(config::Symbol; width_mult::Real = 1, pretrain::Bool = false,
-            inchannels::Integer = 3, nclasses::Integer = 1000)

Create a MobileNetv3 model with the specified configuration. (reference). Set pretrain = true to load the model with pre-trained weights for ImageNet.

Arguments

  • config: :small or :large for the size of the model (see paper).
  • width_mult: Controls the number of output feature maps in each block (with 1 being the default in the paper; this is usually a value between 0.1 and 1.4)
  • pretrain: whether to load the pre-trained weights for ImageNet
  • inchannels: number of input channels
  • nclasses: the number of output classes
Warning

MobileNetv3 does not currently support pretrained weights.

See also Metalhead.mobilenetv3.

source
Metalhead.MNASNetType
MNASNet(config::Symbol; width_mult::Real = 1, pretrain::Bool = false,
-        inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a MNASNet model with the specified configuration. (reference)

Arguments

  • config: configuration of the model. One of B1, A1 or small. B1 is without squeeze-and-excite layers, A1 is with squeeze-and-excite layers, and small is a smaller version of A1.
  • width_mult: Controls the number of output feature maps in each block (with 1 being the default in the paper; this is usually a value between 0.1 and 1.4)
  • pretrain: Whether to load the pre-trained weights for ImageNet
  • inchannels: The number of input channels.
  • nclasses: The number of output classes
Warning

MNASNet does not currently support pretrained weights.

See also Metalhead.mnasnet.

source
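
As a construction sketch (no pretrained weights are loaded; the width multipliers and class counts are illustrative):

using Metalhead

m1 = MobileNetv1(0.75)                      # width multiplier 0.75
m2 = MobileNetv2()                          # defaults: width_mult = 1, 1000 classes
m3 = MobileNetv3(:small; nclasses = 10)
mn = MNASNet(:A1)

size(m3(rand(Float32, 224, 224, 3, 1)))     # (10, 1)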

The mid-level functions

Metalhead.mobilenetv1Function
mobilenetv1(width_mult::Real = 1; inplanes::Integer = 32, dropout_prob = nothing,
-            inchannels::Integer = 3, nclasses::Integer = 1000)

Create a MobileNetv1 model. (reference).

Arguments

  • width_mult: Controls the number of output feature maps in each block (with 1 being the default in the paper; this is usually a value between 0.1 and 1.4)
  • inplanes: Number of input channels to the first convolution layer
  • dropout_prob: Dropout probability for the classifier head. Set to nothing to disable dropout.
  • inchannels: Number of input channels.
  • nclasses: Number of output classes.
source
Metalhead.mobilenetv2Function
mobilenetv2(width_mult::Real = 1; max_width::Integer = 1280,
             inplanes::Integer = 32, dropout_prob = 0.2,
             inchannels::Integer = 3, nclasses::Integer = 1000)

Create a MobileNetv2 model. (reference).

Arguments

- `width_mult`: Controls the number of output feature maps in each block
 (with 1 being the default in the paper; this is usually a value between 0.1 and 1.4)
@@ -12,11 +12,11 @@
 - `inplanes`: Number of input channels to the first convolution layer
 - `dropout_prob`: Dropout probability for the classifier head. Set to `nothing` to disable dropout.
 - `inchannels`: Number of input channels.
-- `nclasses`: Number of output classes.
source
Metalhead.mobilenetv3Function
mobilenetv3(config::Symbol; width_mult::Real = 1, dropout_prob = 0.2,
             inchannels::Integer = 3, nclasses::Integer = 1000)

Create a MobileNetv3 model with the specified configuration. (reference).

Arguments

- `config`: The configuration of the model. Can be either `small` or `large`.
 - `width_mult`: Controls the number of output feature maps in each block
   (with 1 being the default in the paper; this is usually a value between 0.1 and 1.4)
 - `dropout_prob`: Dropout probability for the classifier head. Set to `nothing` to disable dropout.
 - `inchannels`: The number of input channels.
-- `nclasses`: The number of output classes.
source
Metalhead.mnasnetFunction
mnasnet(config::Symbol; width_mult::Real = 1, max_width::Integer = 1280,
-        dropout_prob = 0.2, inchannels::Integer = 3, nclasses::Integer = 1000)

Create an MNASNet model. (reference)

Arguments

  • config: configuration of the model. One of B1, A1 or small. B1 is without squeeze-and-excite layers, A1 is with squeeze-and-excite layers, and small is a smaller version of A1.
  • width_mult: Controls the number of output feature maps in each block (with 1 being the default in the paper; this is usually a value between 0.1 and 1.4)
  • max_width: Controls the maximum number of output feature maps in each block
  • dropout_prob: Dropout probability for the classifier head. Set to nothing to disable dropout.
  • inchannels: Number of input channels.
  • nclasses: Number of output classes.
source
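
A small sketch of the mid-level functions, which build the model directly rather than going through the exported wrapper types (the keyword values are illustrative):

using Metalhead

mnas = Metalhead.mnasnet(:A1; width_mult = 0.5, nclasses = 100)
mbv3 = Metalhead.mobilenetv3(:large; dropout_prob = nothing, nclasses = 10)
size(mbv3(rand(Float32, 224, 224, 3, 1)))   # (10, 1)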
diff --git a/dev/api/others/index.html b/dev/api/others/index.html index f29b0d5c..2944437c 100644 --- a/dev/api/others/index.html +++ b/dev/api/others/index.html @@ -1,14 +1,14 @@ Other models · Metalhead.jl

Other models

This is the API reference for some of the models supported by Metalhead.jl that do not fit into the other categories.

The higher-level model constructors

Metalhead.AlexNetType
AlexNet(; pretrain::Bool = false, inchannels::Integer = 3,
-        nclasses::Integer = 1000)

Create an AlexNet model. (reference).

Arguments

  • pretrain: set to true to load pre-trained weights for ImageNet
  • inchannels: The number of input channels.
  • nclasses: the number of output classes
Warning

AlexNet does not currently support pretrained weights.

See also alexnet.

source
Metalhead.VGGType
VGG(depth::Integer; pretrain::Bool = false, batchnorm::Bool = false,
-    inchannels::Integer = 3, nclasses::Integer = 1000)

Create a VGG style model with specified depth. (reference).

Warning

VGG does not currently support pretrained weights for the batchnorm = true option.

Arguments

  • depth: the depth of the VGG model. Must be one of [11, 13, 16, 19].
  • pretrain: set to true to load pre-trained model weights for ImageNet
  • batchnorm: set to true to use batch normalization after each convolution
  • inchannels: number of input channels
  • nclasses: number of output classes

See also vgg.

source
Metalhead.SqueezeNetType
SqueezeNet(; pretrain::Bool = false, inchannels::Integer = 3,
-           nclasses::Integer = 1000)

Create a SqueezeNet (reference).

Arguments

  • pretrain: set to true to load the pre-trained weights for ImageNet
  • inchannels: number of input channels.
  • nclasses: the number of output classes.

See also squeezenet.

source
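
A quick construction sketch for these models (no pretrained weights are loaded; the class count for SqueezeNet is illustrative):

using Metalhead

alex  = AlexNet()
vgg16 = VGG(16; batchnorm = true)
sq    = SqueezeNet(; nclasses = 10)
size(sq(rand(Float32, 224, 224, 3, 1)))     # (10, 1)
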
Metalhead.UNetType
UNet(imsize::Dims{2} = (256, 256), inchannels::Integer = 3, outplanes::Integer = 3,
-     encoder_backbone = Metalhead.backbone(DenseNet(121)); pretrain::Bool = false)

Creates a UNet model with an encoder built from the specified backbone. By default it uses a DenseNet backbone, but any ResNet-like Metalhead model can be used for the encoder. (reference).

Arguments

  • imsize: size of input image
  • inchannels: number of channels in input image
  • outplanes: number of output feature planes.
  • encoder_backbone: The backbone layers of specified model to be used as encoder. For example, Metalhead.backbone(Metalhead.ResNet(18)) can be passed to instantiate a UNet with layers of resnet18 as encoder.
  • pretrain: Whether to load the pre-trained weights for ImageNet
Warning

UNet does not currently support pretrained weights.

See also Metalhead.unet.

source
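
Following the docstring, a sketch of a UNet with a ResNet-18 encoder and a single output plane (for example a binary segmentation mask):

using Metalhead

unet_model = UNet((256, 256), 3, 1, Metalhead.backbone(Metalhead.ResNet(18)))
size(unet_model(rand(Float32, 256, 256, 3, 1)))   # expected: (256, 256, 1, 1)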

The mid-level functions

Metalhead.alexnetFunction
alexnet(; dropout_prob = 0.5, inchannels::Integer = 3, nclasses::Integer = 1000)

Create an AlexNet model (reference).

Arguments

  • dropout_prob: dropout probability for the classifier
  • inchannels: The number of input channels.
  • nclasses: the number of output classes
source
Metalhead.vggFunction
vgg(imsize::Dims{2}; config, batchnorm::Bool = false, fcsize::Integer = 4096,
-    dropout_prob = 0.0, inchannels::Integer = 3, nclasses::Integer = 1000)

Create a VGG model (reference).

Arguments

  • imsize: input image width and height as a tuple
  • config: the configuration for the convolution layers (see Metalhead.vgg_convolutional_layers)
  • inchannels: number of input channels
  • batchnorm: set to true to use batch normalization after each convolution
  • nclasses: number of output classes
  • fcsize: intermediate fully connected layer size (see Metalhead.vgg_classifier_layers)
  • dropout_prob: dropout level between fully connected layers
source
Metalhead.squeezenetFunction
squeezenet(; dropout_prob = 0.5, inchannels::Integer = 3, nclasses::Integer = 1000)

Create a SqueezeNet model. (reference).

Arguments

  • dropout_prob: dropout probability for the classifier head. Set to nothing to disable dropout.
  • inchannels: number of input channels.
  • nclasses: the number of output classes.
source
Metalhead.unetFunction
unet(encoder_backbone, imgdims, outplanes::Integer, final::Any = unet_final_block,
      fdownscale::Integer = 0)

Creates a UNet model with the specified convolutional backbone. The backbone of any Metalhead ResNet-like model can be used as the encoder (reference).

Arguments

- `encoder_backbone`: The backbone layers of specified model to be used as encoder.
 	For example, `Metalhead.backbone(Metalhead.ResNet(18))` can be passed 
 	to instantiate a UNet with layers of resnet18 as encoder.
 - `inputsize`: size of input image
 - `outplanes`: number of output feature planes
 - `final`: final block as described in original paper
-- `fdownscale`: downscale factor
source

Block-level functions

Metalhead.vgg_blockFunction
vgg_block(ifilters, ofilters, depth, batchnorm)

A VGG block of convolution layers (reference).

Arguments

  • ifilters: number of input feature maps
  • ofilters: number of output feature maps
  • depth: number of convolution/convolution + batch norm layers
  • batchnorm: set to true to include batch normalization after each convolution
source
Metalhead.vgg_convolutional_layersFunction
vgg_convolutional_layers(config, batchnorm, inchannels)

Create VGG convolution layers (reference).

Arguments

  • config: vector of tuples (output_channels, num_convolutions) for each block (see Metalhead.vgg_block)
  • batchnorm: set to true to include batch normalization after each convolution
  • inchannels: number of input channels
source
Metalhead.vgg_classifier_layersFunction
vgg_classifier_layers(imsize, nclasses, fcsize, dropout_prob)

Create VGG classifier (fully connected) layers (reference).

Arguments

  • imsize: tuple (width, height, channels) indicating the size after the convolution layers (see Metalhead.vgg_convolutional_layers)
  • nclasses: number of output classes
  • fcsize: input and output size of the intermediate fully connected layer
  • dropout_prob: the dropout level between each fully connected layer
source
diff --git a/dev/api/resnet/index.html b/dev/api/resnet/index.html index 94a08cb8..088c34ee 100644 --- a/dev/api/resnet/index.html +++ b/dev/api/resnet/index.html @@ -1,11 +1,11 @@ -ResNet-like models · Metalhead.jl

ResNet-like models

This is the API reference for the ResNet inspired model structures present in Metalhead.jl.

The higher-level model constructors

Metalhead.ResNetType
ResNet(depth::Integer; pretrain::Bool = false, inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a ResNet model with the specified depth. (reference)

Arguments

  • depth: one of [18, 34, 50, 101, 152]. The depth of the ResNet model.
  • pretrain: set to true to load the model with pre-trained weights for ImageNet
  • inchannels: The number of input channels.
  • nclasses: the number of output classes

Advanced users who want more configuration options will be better served by using resnet.

source
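
For example (the pretrained call downloads ImageNet weights on first use; the 10-class variant is randomly initialised):

using Metalhead

resnet50 = ResNet(50; pretrain = true)
resnet18 = ResNet(18; nclasses = 10)
size(resnet18(rand(Float32, 224, 224, 3, 1)))   # (10, 1)
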
Metalhead.WideResNetType
WideResNet(depth::Integer; pretrain::Bool = false, inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a Wide ResNet model with the specified depth. The model is the same as ResNet except that the number of bottleneck channels is twice as large in every block. The number of channels in the outer 1x1 convolutions is the same. (reference)

Arguments

  • depth: one of [18, 34, 50, 101, 152]. The depth of the Wide ResNet model.
  • pretrain: set to true to load the model with pre-trained weights for ImageNet
  • inchannels: The number of input channels.
  • nclasses: The number of output classes

Advanced users who want more configuration options will be better served by using resnet.

source
Metalhead.ResNeXtType
ResNeXt(depth::Integer; pretrain::Bool = false, cardinality::Integer = 32,
-        base_width::Integer = 4, inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a ResNeXt model with the specified depth, cardinality, and base width. (reference)

Arguments

  • depth: one of [50, 101, 152]. The depth of the ResNeXt model.

  • pretrain: set to true to load the model with pre-trained weights for ImageNet. Supported configurations are:

    • depth 50, cardinality of 32 and base width of 4.
    • depth 101, cardinality of 32 and base width of 8.
    • depth 101, cardinality of 64 and base width of 4.
  • cardinality: the number of groups to be used in the 3x3 convolution in each block.

  • base_width: the number of feature maps in each group.

  • inchannels: the number of input channels.

  • nclasses: the number of output classes

Advanced users who want more configuration options will be better served by using resnet.

source
Metalhead.SEResNetType
SEResNet(depth::Integer; pretrain::Bool = false, inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a SEResNet model with the specified depth. (reference)

Arguments

  • depth: one of [18, 34, 50, 101, 152]. The depth of the SEResNet model.
  • pretrain: set to true to load the model with pre-trained weights for ImageNet
  • inchannels: the number of input channels.
  • nclasses: the number of output classes
Warning

SEResNet does not currently support pretrained weights.

Advanced users who want more configuration options will be better served by using resnet.

source
Metalhead.SEResNeXtType
SEResNeXt(depth::Integer; pretrain::Bool = false, cardinality::Integer = 32,
-          base_width::Integer = 4, inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a SEResNeXt model with the specified depth, cardinality, and base width. (reference)

Arguments

  • depth: one of [50, 101, 152]. The depth of the SEResNeXt model.
  • pretrain: set to true to load the model with pre-trained weights for ImageNet
  • cardinality: the number of groups to be used in the 3x3 convolution in each block.
  • base_width: the number of feature maps in each group.
  • inchannels: the number of input channels
  • nclasses: the number of output classes
Warning

SEResNeXt does not currently support pretrained weights.

Advanced users who want more configuration options will be better served by using resnet.

source
Metalhead.Res2NetType
Res2Net(depth::Integer; pretrain::Bool = false, scale::Integer = 4,
         base_width::Integer = 26, inchannels::Integer = 3,
-        nclasses::Integer = 1000)

Creates a Res2Net model with the specified depth, scale, and base width. (reference)

Arguments

  • depth: one of [50, 101, 152]. The depth of the Res2Net model.
  • pretrain: set to true to load the model with pre-trained weights for ImageNet
  • scale: the number of feature groups in the block. See the paper for more details.
  • base_width: the number of feature maps in each group.
  • inchannels: the number of input channels.
  • nclasses: the number of output classes
Warning

Res2Net does not currently support pretrained weights.

Advanced users who want more configuration options will be better served by using resnet.

source
Metalhead.Res2NeXtType
Res2NeXt(depth::Integer; pretrain::Bool = false, scale::Integer = 4,
          base_width::Integer = 4, cardinality::Integer = 8,
-         inchannels::Integer = 3, nclasses::Integer = 1000)

Creates a Res2NeXt model with the specified depth, scale, base width and cardinality. (reference)

Arguments

  • depth: one of [50, 101, 152]. The depth of the Res2Net model.
  • pretrain: set to true to load the model with pre-trained weights for ImageNet
  • scale: the number of feature groups in the block. See the paper for more details.
  • base_width: the number of feature maps in each group.
  • cardinality: the number of groups in the 3x3 convolutions.
  • inchannels: the number of input channels.
  • nclasses: the number of output classes
Warning

Res2NeXt does not currently support pretrained weights.

Advanced users who want more configuration options will be better served by using resnet.

source

The mid-level function

Metalhead.resnetFunction
resnet(block_type, block_repeats::AbstractVector{<:Integer},
        downsample_opt::NTuple{2, Any} = (downsample_conv, downsample_identity);
        cardinality::Integer = 1, base_width::Integer = 64,
        inplanes::Integer = 64, reduction_factor::Integer = 1,
@@ -15,28 +15,28 @@
        use_conv::Bool = false, dropblock_prob = nothing,
        stochastic_depth_prob = nothing, dropout_prob = nothing,
        imsize::Dims{2} = (256, 256), inchannels::Integer = 3,
-       nclasses::Integer = 1000, kwargs...)

Creates a generic ResNet-like model that is used to create the higher-level model constructors like ResNet, Wide ResNet, ResNeXt and Res2Net. For an even more generic model API, see Metalhead.build_resnet.

Arguments

  • block_type: The type of block to be used in the model. This can be one of Metalhead.basicblock, Metalhead.bottleneck and Metalhead.bottle2neck. basicblock is used in the original ResNet paper for ResNet-18 and ResNet-34, and bottleneck is used in the original ResNet-50 and ResNet-101 models, as well as for the Wide ResNet and ResNeXt models. bottle2neck is introduced in the Res2Net paper.
  • block_repeats: A Vector of integers specifying the number of times each block is repeated in each stage of the ResNet model. For example, [3, 4, 6, 3] is the configuration used in ResNet-50, which has 3 blocks in the first stage, 4 blocks in the second stage, 6 blocks in the third stage and 3 blocks in the fourth stage.
  • downsample_opt: A NTuple of two callbacks that are used to determine the downsampling operation to be used in the model. The first callback is used to determine the convolutional operation to be used in the downsampling operation and the second callback is used to determine the identity operation to be used in the downsampling operation.
  • cardinality: The number of groups to be used in the 3x3 convolutional layer in the bottleneck block. This is usually modified from the default value of 1 in the ResNet models to 32 or 64 in the ResNeXt models.
  • base_width: The base width of the convolutional layer in the blocks of the model.
  • inplanes: The number of input channels in the first convolutional layer.
  • reduction_factor: The reduction factor used in the model.
  • connection: This is a function that determines the residual connection in the model. For resnets, either of Metalhead.Layers.addact or Metalhead.Layers.actadd is recommended. These decide whether the residual connection is added before or after the activation function.
  • norm_layer: The normalisation layer to be used in the model.
  • revnorm: set to true to place the normalisation layers before the convolutions
  • attn_fn: A callback that is used to determine the attention function to be used in the model. See Metalhead.Layers.squeeze_excite for an example.
  • pool_layer: A fully-instantiated pooling layer passed in to be used by the classifier head. For example, AdaptiveMeanPool((1, 1)) is used in the ResNet family by default, but something like MeanPool((3, 3)) should also work provided the dimensions after applying the pooling layer are compatible with the rest of the classifier head.
  • use_conv: Set to true to use convolutions instead of identity operations in the model.
  • dropblock_prob: DropBlock probability to be used in the model. Set to nothing to disable DropBlock. See Metalhead.DropBlock for more details.
  • stochastic_depth_prob: StochasticDepth probability to be used in the model. Set to nothing to disable StochasticDepth. See Metalhead.StochasticDepth for more details.
  • dropout_prob: Dropout probability to be used in the classifier head. Set to nothing to disable Dropout.
  • imsize: The size of the input (height, width).
  • inchannels: The number of input channels.
  • nclasses: The number of output classes.
  • kwargs: Additional keyword arguments to be passed to the block builder (note: ignore this argument if you are not sure what it does. To know more about how this works, check out the section of the documentation that talks about builders in Metalhead and specifically for the ResNet block functions).
source

Lower-level functions and builders

Block functions

Metalhead.basicblockFunction
basicblock(inplanes::Integer, planes::Integer; stride::Integer = 1,
+       nclasses::Integer = 1000, kwargs...)

Creates a generic ResNet-like model that is used to create the higher-level model constructors such as ResNet, Wide ResNet, ResNeXt and Res2Net. For an even more generic model API, see Metalhead.build_resnet.

Arguments

  • block_type: The type of block to be used in the model. This can be one of Metalhead.basicblock, Metalhead.bottleneck and Metalhead.bottle2neck. basicblock is used in the original ResNet paper for ResNet-18 and ResNet-34, and bottleneck is used in the original ResNet-50 and ResNet-101 models, as well as for the Wide ResNet and ResNeXt models. bottle2neck is introduced in the Res2Net paper.
  • block_repeats: A Vector of integers specifying the number of times each block is repeated in each stage of the ResNet model. For example, [3, 4, 6, 3] is the configuration used in ResNet-50, which has 3 blocks in the first stage, 4 blocks in the second stage, 6 blocks in the third stage and 3 blocks in the fourth stage.
  • downsample_opt: An NTuple of two callbacks that determine the downsampling operations used in the model. The first callback determines the convolutional downsampling operation, and the second determines the identity downsampling operation.
  • cardinality: The number of groups to be used in the 3x3 convolutional layer in the bottleneck block. This is usually modified from the default value of 1 in the ResNet models to 32 or 64 in the ResNeXt models.
  • base_width: The base width of the convolutional layer in the blocks of the model.
  • inplanes: The number of input channels in the first convolutional layer.
  • reduction_factor: The reduction factor used in the model.
  • connection: This is a function that determines the residual connection in the model. For resnets, either of Metalhead.Layers.addact or Metalhead.Layers.actadd is recommended. These decide whether the residual connection is added before or after the activation function.
  • norm_layer: The normalisation layer to be used in the model.
  • revnorm: set to true to place the normalisation layers before the convolutions
  • attn_fn: A callback that is used to determine the attention function to be used in the model. See Metalhead.Layers.squeeze_excite for an example.
  • pool_layer: A fully-instantiated pooling layer passed in to be used by the classifier head. For example, AdaptiveMeanPool((1, 1)) is used in the ResNet family by default, but something like MeanPool((3, 3)) should also work provided the dimensions after applying the pooling layer are compatible with the rest of the classifier head.
  • use_conv: Set to true to use convolutions instead of identity operations in the model.
  • dropblock_prob: DropBlock probability to be used in the model. Set to nothing to disable DropBlock. See Metalhead.DropBlock for more details.
  • stochastic_depth_prob: StochasticDepth probability to be used in the model. Set to nothing to disable StochasticDepth. See Metalhead.StochasticDepth for more details.
  • dropout_prob: Dropout probability to be used in the classifier head. Set to nothing to disable Dropout.
  • imsize: The size of the input (height, width).
  • inchannels: The number of input channels.
  • nclasses: The number of output classes.
  • kwargs: Additional keyword arguments to be passed to the block builder (note: ignore this argument if you are not sure what it does. To know more about how this works, check out the section of the documentation that talks about builders in Metalhead and specifically for the ResNet block functions).
source
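
For instance, a ResNet-50-style network can be sketched directly with this function using only the options documented above (see the how-to guide for richer configurations):

using Metalhead

# bottleneck blocks repeated [3, 4, 6, 3] times, as in ResNet-50
model = Metalhead.resnet(Metalhead.bottleneck, [3, 4, 6, 3]; nclasses = 1000)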

Lower-level functions and builders

Block functions

Metalhead.basicblockFunction
basicblock(inplanes::Integer, planes::Integer; stride::Integer = 1,
            reduction_factor::Integer = 1, activation = relu,
            norm_layer = BatchNorm, revnorm::Bool = false,
            drop_block = identity, drop_path = identity,
-           attn_fn = planes -> identity)

Creates a basic residual block (see reference). This function creates the layers. For more configuration options and to see the function used to build the block for the model, see Metalhead.basicblock_builder.

Arguments

  • inplanes: number of input feature maps
  • planes: number of feature maps for the block
  • stride: the stride of the block
  • reduction_factor: the factor by which the input feature maps are reduced before the first convolution.

  • activation: the activation function to use.
  • norm_layer: the normalization layer to use.
  • revnorm: set to true to place the normalisation layer before the convolution
  • drop_block: the drop block layer
  • drop_path: the drop path layer
  • attn_fn: the attention function to use. See squeeze_excite for an example.
source
Metalhead.bottleneckFunction
bottleneck(inplanes::Integer, planes::Integer; stride::Integer,
+           attn_fn = planes -> identity)

Creates a basic residual block (see reference). This function creates the layers. For more configuration options and to see the function used to build the block for the model, see Metalhead.basicblock_builder.

Arguments

  • inplanes: number of input feature maps
  • planes: number of feature maps for the block
  • stride: the stride of the block
  • reduction_factor: the factor by which the input feature maps are reduced before the first convolution.

  • activation: the activation function to use.
  • norm_layer: the normalization layer to use.
  • revnorm: set to true to place the normalisation layer before the convolution
  • drop_block: the drop block layer
  • drop_path: the drop path layer
  • attn_fn: the attention function to use. See squeeze_excite for an example.
source
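
A rough sketch of calling this function on its own follows; normally the builders described below call it for you, and the residual connection is added separately:

using Metalhead

# layers of a single basic block with 64 input and 64 output feature maps
layers = Metalhead.basicblock(64, 64; stride = 1)
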
Metalhead.bottleneckFunction
bottleneck(inplanes::Integer, planes::Integer; stride::Integer,
            cardinality::Integer = 1, base_width::Integer = 64,
            reduction_factor::Integer = 1, activation = relu,
            norm_layer = BatchNorm, revnorm::Bool = false,
            drop_block = identity, drop_path = identity,
-           attn_fn = planes -> identity)

Creates a bottleneck residual block (see reference). This function creates the layers. For more configuration options and to see the function used to build the block for the model, see Metalhead.bottleneck_builder.

Arguments

  • inplanes: number of input feature maps
  • planes: number of feature maps for the block
  • stride: the stride of the block
  • cardinality: the number of groups in the convolution.
  • base_width: the number of output feature maps for each convolutional group.
  • reduction_factor: the factor by which the input feature maps are reduced before the first convolution.
  • activation: the activation function to use.
  • norm_layer: the normalization layer to use.
  • revnorm: set to true to place the normalisation layer before the convolution
  • drop_block: the drop block layer
  • drop_path: the drop path layer
  • attn_fn: the attention function to use. See squeeze_excite for an example.
source
Metalhead.bottle2neckFunction
bottle2neck(inplanes::Integer, planes::Integer; stride::Integer = 1,
+           attn_fn = planes -> identity)

Creates a bottleneck residual block (see reference). This function creates the layers. For more configuration options and to see the function used to build the block for the model, see Metalhead.bottleneck_builder.

Arguments

  • inplanes: number of input feature maps
  • planes: number of feature maps for the block
  • stride: the stride of the block
  • cardinality: the number of groups in the convolution.
  • base_width: the number of output feature maps for each convolutional group.
  • reduction_factor: the factor by which the input feature maps are reduced before the first convolution.
  • activation: the activation function to use.
  • norm_layer: the normalization layer to use.
  • revnorm: set to true to place the normalisation layer before the convolution
  • drop_block: the drop block layer
  • drop_path: the drop path layer
  • attn_fn: the attention function to use. See squeeze_excite for an example.
source
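
A minimal sketch showing how the grouped-convolution options give a ResNeXt-flavoured block (the values are illustrative):

using Metalhead

# bottleneck block with 32 groups and a base width of 4, as in ResNeXt-style models
layers = Metalhead.bottleneck(256, 64; stride = 1, cardinality = 32, base_width = 4)
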
Metalhead.bottle2neckFunction
bottle2neck(inplanes::Integer, planes::Integer; stride::Integer = 1,
             cardinality::Integer = 1, base_width::Integer = 26,
             scale::Integer = 4, activation = relu, norm_layer = BatchNorm,
-            revnorm::Bool = false, attn_fn = planes -> identity)

Creates a bottleneck block as described in the Res2Net paper. (reference) This function creates the layers. For more configuration options and to see the function used to build the block for the model, see Metalhead.bottle2neck_builder.

Arguments

  • inplanes: number of input feature maps
  • planes: number of feature maps for the block
  • stride: the stride of the block
  • cardinality: the number of groups in the 3x3 convolutions.
  • base_width: the number of output feature maps for each convolutional group.
  • scale: the number of feature groups in the block. See the paper for more details.
  • activation: the activation function to use.
  • norm_layer: the normalization layer to use.
  • revnorm: set to true to place the batch norm before the convolution
  • attn_fn: the attention function to use. See squeeze_excite for an example.
source

Downsampling functions

Metalhead.downsample_identityFunction
downsample_identity(inplanes::Integer, outplanes::Integer; kwargs...)

Creates an identity downsample layer. This returns identity if inplanes == outplanes. If outplanes > inplanes, it maps the input to outplanes channels using a 1x1 max pooling layer and zero padding.

Warning

This does not currently support the scenario where inplanes > outplanes.

Arguments

  • inplanes: number of input feature maps
  • outplanes: number of output feature maps

Note that kwargs are ignored and only included for compatibility with other downsample layers.

source
Metalhead.downsample_convFunction
downsample_conv(inplanes::Integer, outplanes::Integer; stride::Integer = 1,
-                norm_layer = BatchNorm, revnorm::Bool = false)

Creates a 1x1 convolutional downsample layer as used in ResNet.

Arguments

  • inplanes: number of input feature maps
  • outplanes: number of output feature maps
  • stride: the stride of the convolution
  • norm_layer: the normalization layer to use.
  • revnorm: set to true to place the normalisation layer before the convolution
source
Metalhead.downsample_poolFunction
downsample_pool(inplanes::Integer, outplanes::Integer; stride::Integer = 1,
-                norm_layer = BatchNorm, revnorm::Bool = false)

Creates a pooling-based downsample layer as described in the Bag of Tricks paper. This adds an average pooling layer of size (2, 2) with the given stride, followed by a 1x1 convolution.

Arguments

  • inplanes: number of input feature maps
  • outplanes: number of output feature maps
  • stride: the stride of the convolution
  • norm_layer: the normalization layer to use.
  • revnorm: set to true to place the normalisation layer before the convolution
source

Block builders

Metalhead.basicblock_builderFunction
basicblock_builder(block_repeats::AbstractVector{<:Integer};
+            revnorm::Bool = false, attn_fn = planes -> identity)

Creates a bottleneck block as described in the Res2Net paper. (reference) This function creates the layers. For more configuration options and to see the function used to build the block for the model, see Metalhead.bottle2neck_builder.

Arguments

  • inplanes: number of input feature maps
  • planes: number of feature maps for the block
  • stride: the stride of the block
  • cardinality: the number of groups in the 3x3 convolutions.
  • base_width: the number of output feature maps for each convolutional group.
  • scale: the number of feature groups in the block. See the paper for more details.
  • activation: the activation function to use.
  • norm_layer: the normalization layer to use.
  • revnorm: set to true to place the batch norm before the convolution
  • attn_fn: the attention function to use. See squeeze_excite for an example.
source
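
As a sketch, a single Res2Net-style block using the defaults shown in the signature (illustrative only):

using Metalhead

# bottle2neck block with 4 feature groups ("scale") and a base width of 26
layers = Metalhead.bottle2neck(256, 64; stride = 1, base_width = 26, scale = 4)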

Downsampling functions

Metalhead.downsample_identityFunction
downsample_identity(inplanes::Integer, outplanes::Integer; kwargs...)

Creates an identity downsample layer. This returns identity if inplanes == outplanes. If outplanes > inplanes, it maps the input to outplanes channels using a 1x1 max pooling layer and zero padding.

Warning

This does not currently support the scenario where inplanes > outplanes.

Arguments

  • inplanes: number of input feature maps
  • outplanes: number of output feature maps

Note that kwargs are ignored and only included for compatibility with other downsample layers.

source
Metalhead.downsample_convFunction
downsample_conv(inplanes::Integer, outplanes::Integer; stride::Integer = 1,
+                norm_layer = BatchNorm, revnorm::Bool = false)

Creates a 1x1 convolutional downsample layer as used in ResNet.

Arguments

  • inplanes: number of input feature maps
  • outplanes: number of output feature maps
  • stride: the stride of the convolution
  • norm_layer: the normalization layer to use.
  • revnorm: set to true to place the normalisation layer before the convolution
source
Metalhead.downsample_poolFunction
downsample_pool(inplanes::Integer, outplanes::Integer; stride::Integer = 1,
+                norm_layer = BatchNorm, revnorm::Bool = false)

Creates a pooling-based downsample layer as described in the Bag of Tricks paper. This adds an average pooling layer of size (2, 2) with the given stride, followed by a 1x1 convolution.

Arguments

  • inplanes: number of input feature maps
  • outplanes: number of output feature maps
  • stride: the stride of the convolution
  • norm_layer: the normalization layer to use.
  • revnorm: set to true to place the normalisation layer before the convolution
source
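
A small sketch contrasting the three downsample helpers (an illustration, not library test code):

using Metalhead

down_id   = Metalhead.downsample_identity(64, 64)            # identity, since inplanes == outplanes
down_conv = Metalhead.downsample_conv(64, 128; stride = 2)   # 1x1 convolution + normalisation
down_pool = Metalhead.downsample_pool(64, 128; stride = 2)   # mean pooling followed by a 1x1 convolution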

Block builders

Metalhead.basicblock_builderFunction
basicblock_builder(block_repeats::AbstractVector{<:Integer};
                    inplanes::Integer = 64, reduction_factor::Integer = 1,
                    expansion::Integer = 1, norm_layer = BatchNorm,
                    revnorm::Bool = false, activation = relu,
                    attn_fn = planes -> identity,
                    dropblock_prob = nothing, stochastic_depth_prob = nothing,
                    stride_fn = resnet_stride, planes_fn = resnet_planes,
-                   downsample_tuple = (downsample_conv, downsample_identity))

Builder for creating a basic block for a ResNet model. (reference)

Arguments

  • block_repeats: number of repeats of a block in each stage

  • inplanes: number of input channels

  • reduction_factor: reduction factor for the number of channels in each stage

  • expansion: expansion factor for the number of channels for the block

  • norm_layer: normalization layer to use

  • revnorm: set to true to place normalization layer before the convolution

  • activation: activation function to use

  • attn_fn: attention function to use

  • dropblock_prob: dropblock probability. Set to nothing to disable DropBlock

  • stochastic_depth_prob: stochastic depth probability. Set to nothing to disable StochasticDepth

  • stride_fn: callback for computing the stride of the block

  • planes_fn: callback for computing the number of channels in each block

  • downsample_tuple: two-element tuple of downsample functions to use. The first one is used when the number of channels changes in the block, the second one is used when the number of channels stays the same.

source
Metalhead.bottleneck_builderFunction
bottleneck_builder(block_repeats::AbstractVector{<:Integer};
+                   downsample_tuple = (downsample_conv, downsample_identity))

Builder for creating a basic block for a ResNet model. (reference)

Arguments

  • block_repeats: number of repeats of a block in each stage

  • inplanes: number of input channels

  • reduction_factor: reduction factor for the number of channels in each stage

  • expansion: expansion factor for the number of channels for the block

  • norm_layer: normalization layer to use

  • revnorm: set to true to place normalization layer before the convolution

  • activation: activation function to use

  • attn_fn: attention function to use

  • dropblock_prob: dropblock probability. Set to nothing to disable DropBlock

  • stochastic_depth_prob: stochastic depth probability. Set to nothing to disable StochasticDepth

  • stride_fn: callback for computing the stride of the block

  • planes_fn: callback for computing the number of channels in each block

  • downsample_tuple: two-element tuple of downsample functions to use. The first one is used when the number of channels changes in the block, the second one is used when the number of channels stays the same.

source
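
As an illustration, a builder like this is created once per model and then handed to Metalhead.build_resnet; a fuller end-to-end sketch appears below, after the build_resnet documentation:

using Metalhead

block_repeats = [2, 2, 2, 2]                              # ResNet-18-like configuration
get_layers = Metalhead.basicblock_builder(block_repeats)  # callable of (stage_idx, block_idx)
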
Metalhead.bottleneck_builderFunction
bottleneck_builder(block_repeats::AbstractVector{<:Integer};
                    inplanes::Integer = 64, cardinality::Integer = 1,
                    base_width::Integer = 64, reduction_factor::Integer = 1,
                    expansion::Integer = 4, norm_layer = BatchNorm,
@@ -44,13 +44,13 @@
                    attn_fn = planes -> identity, dropblock_prob = nothing,
                    stochastic_depth_prob = nothing, stride_fn = resnet_stride,
                    planes_fn = resnet_planes,
-                   downsample_tuple = (downsample_conv, downsample_identity))

Builder for creating a bottleneck block for a ResNet/ResNeXt model. (reference)

Arguments

  • block_repeats: number of repeats of a block in each stage
  • inplanes: number of input channels
  • cardinality: number of groups for the convolutional layer
  • base_width: base width for the convolutional layer
  • reduction_factor: reduction factor for the number of channels in each stage
  • expansion: expansion factor for the number of channels for the block
  • norm_layer: normalization layer to use
  • revnorm: set to true to place normalization layer before the convolution
  • activation: activation function to use
  • attn_fn: attention function to use
  • dropblock_prob: dropblock probability. Set to nothing to disable DropBlock
  • stochastic_depth_prob: stochastic depth probability. Set to nothing to disable StochasticDepth
  • stride_fn: callback for computing the stride of the block
  • planes_fn: callback for computing the number of channels in each block
  • downsample_tuple: two-element tuple of downsample functions to use. The first one is used when the number of channels changes in the block, the second one is used when the number of channels stays the same.
source
Metalhead.bottle2neck_builderFunction
bottle2neck_builder(block_repeats::AbstractVector{<:Integer};
+                   downsample_tuple = (downsample_conv, downsample_identity))

Builder for creating a bottleneck block for a ResNet/ResNeXt model. (reference)

Arguments

  • block_repeats: number of repeats of a block in each stage
  • inplanes: number of input channels
  • cardinality: number of groups for the convolutional layer
  • base_width: base width for the convolutional layer
  • reduction_factor: reduction factor for the number of channels in each stage
  • expansion: expansion factor for the number of channels for the block
  • norm_layer: normalization layer to use
  • revnorm: set to true to place normalization layer before the convolution
  • activation: activation function to use
  • attn_fn: attention function to use
  • dropblock_prob: dropblock probability. Set to nothing to disable DropBlock
  • stochastic_depth_prob: stochastic depth probability. Set to nothing to disable StochasticDepth
  • stride_fn: callback for computing the stride of the block
  • planes_fn: callback for computing the number of channels in each block
  • downsample_tuple: two-element tuple of downsample functions to use. The first one is used when the number of channels changes in the block, the second one is used when the number of channels stays the same.
source
Metalhead.bottle2neck_builderFunction
bottle2neck_builder(block_repeats::AbstractVector{<:Integer};
                     inplanes::Integer = 64, cardinality::Integer = 1,
                     base_width::Integer = 26, scale::Integer = 4,
                     expansion::Integer = 4, norm_layer = BatchNorm,
                     revnorm::Bool = false, activation = relu,
                     attn_fn = planes -> identity, stride_fn = resnet_stride,
                     planes_fn = resnet_planes,
-                    downsample_tuple = (downsample_conv, downsample_identity))

Builder for creating a bottle2neck block for a Res2Net model. (reference)

Arguments

  • block_repeats: number of repeats of a block in each stage
  • inplanes: number of input channels
  • cardinality: number of groups for the convolutional layer
  • base_width: base width for the convolutional layer
  • scale: scale for the number of channels in each block
  • expansion: expansion factor for the number of channels for the block
  • norm_layer: normalization layer to use
  • revnorm: set to true to place normalization layer before the convolution
  • activation: activation function to use
  • attn_fn: attention function to use
  • stride_fn: callback for computing the stride of the block
  • planes_fn: callback for computing the number of channels in each block
  • downsample_tuple: two-element tuple of downsample functions to use. The first one is used when the number of channels changes in the block, the second one is used when the number of channels stays the same.
source

Generic ResNet model builder

Metalhead.build_resnetFunction
build_resnet(img_dims, stem, get_layers, block_repeats::AbstractVector{<:Integer},
-             connection, classifier_fn)

Creates a generic ResNet-like model.

Info

This is a very generic and flexible, but low-level, function that can be used to create any of the ResNet variants. For a more user-friendly function, see Metalhead.resnet.

Arguments

  • img_dims: The dimensions of the input image. This is used to determine the number of feature maps to be passed to the classifier. This should be a tuple of the form (height, width, channels).
  • stem: The stem of the ResNet model. The stem should be created outside of this function and passed in as an argument. This is done to allow for more flexibility in creating the stem. resnet_stem is a helper function that Metalhead provides which is recommended for creating the stem.
  • get_layers is a function that takes in two inputs - the stage_idx, or the index of the stage, and the block_idx, or the index of the block within the stage. It returns a tuple of layers. If the tuple returned by get_layers has more than one element, then connection is used to splat this tuple into Parallel - if not, then the only element of the tuple is directly inserted into the network. get_layers is a very specific function and should not be created on its own. Instead, use one of the builders provided by Metalhead to create it.
  • block_repeats: This is a Vector of integers that specifies the number of repeats of each block in each stage.
  • connection: This is a function that determines the residual connection in the model. For resnets, either of Metalhead.Layers.addact or Metalhead.Layers.actadd is recommended.
  • classifier_fn: This is a function that takes in the number of feature maps and returns a classifier. This is usually built as a closure using a function like Metalhead.create_classifier. For example, if the number of output classes is nclasses, then the function can be defined as channels -> create_classifier(channels, nclasses).
source

Utility callbacks

Metalhead.resnet_planesFunction
resnet_planes(block_repeats::AbstractVector{<:Integer})

Default callback for determining the number of channels in each block in a ResNet model.

Arguments

block_repeats: A Vector of integers specifying the number of times each block is repeated in each stage of the ResNet model. For example, [3, 4, 6, 3] is the configuration used in ResNet-50, which has 3 blocks in the first stage, 4 blocks in the second stage, 6 blocks in the third stage and 3 blocks in the fourth stage.

source
Metalhead.resnet_strideFunction
resnet_stride(stage_idx::Integer, block_idx::Integer)

Default callback for determining the stride of a block in a ResNet model. Returns 2 for the first block in every stage except the first stage and 1 for all other blocks.

Arguments

  • stage_idx: The index of the stage in the ResNet model.
  • block_idx: The index of the block in the stage.
source
Metalhead.resnet_stemFunction
resnet_stem(; stem_type = :default, inchannels::Integer = 3, replace_stem_pool = false,
-              norm_layer = BatchNorm, activation = relu)

Builds a stem to be used in a ResNet model. See the stem argument of resnet for details on how to use this function.

Arguments

  • stem_type: The type of stem to be built. One of [:default, :deep, :deep_tiered].

    • :default: Builds a stem based on the default ResNet stem, which consists of a single 7x7 convolution with stride 2 and a normalisation layer followed by a 3x3 max pooling layer with stride 2.
    • :deep: This borrows ideas from other papers (InceptionResNetv2, for example) in using a deeper stem with 3 successive 3x3 convolutions having normalisation layers after each one. This is followed by a 3x3 max pooling layer with stride 2.
    • :deep_tiered: A variant of the :deep stem that has a larger width in the second convolution. This is an experimental variant from the timm library in Python that shows performance improvements over the :deep stem in some cases.
  • inchannels: number of input channels

  • replace_stem_pool: Set to true to replace the max pooling layer in the stem with a 3x3 convolution + normalization with a stride of two.

  • norm_layer: The normalisation layer used in the stem.

  • activation: The activation function used in the stem.

source
+ downsample_tuple = (downsample_conv, downsample_identity))

Builder for creating a bottle2neck block for a Res2Net model. (reference)

Arguments

  • block_repeats: number of repeats of a block in each stage
  • inplanes: number of input channels
  • cardinality: number of groups for the convolutional layer
  • base_width: base width for the convolutional layer
  • scale: scale for the number of channels in each block
  • expansion: expansion factor for the number of channels for the block
  • norm_layer: normalization layer to use
  • revnorm: set to true to place normalization layer before the convolution
  • activation: activation function to use
  • attn_fn: attention function to use
  • stride_fn: callback for computing the stride of the block
  • planes_fn: callback for computing the number of channels in each block
  • downsample_tuple: two-element tuple of downsample functions to use. The first one is used when the number of channels changes in the block, the second one is used when the number of channels stays the same.
source

Generic ResNet model builder

Metalhead.build_resnetFunction
build_resnet(img_dims, stem, get_layers, block_repeats::AbstractVector{<:Integer},
+             connection, classifier_fn)

Creates a generic ResNet-like model.

Info

This is a very generic and flexible, but low-level, function that can be used to create any of the ResNet variants. For a more user-friendly function, see Metalhead.resnet.

Arguments

  • img_dims: The dimensions of the input image. This is used to determine the number of feature maps to be passed to the classifier. This should be a tuple of the form (height, width, channels).
  • stem: The stem of the ResNet model. The stem should be created outside of this function and passed in as an argument. This is done to allow for more flexibility in creating the stem. resnet_stem is a helper function that Metalhead provides which is recommended for creating the stem.
  • get_layers is a function that takes in two inputs - the stage_idx, or the index of the stage, and the block_idx, or the index of the block within the stage. It returns a tuple of layers. If the tuple returned by get_layers has more than one element, then connection is used to splat this tuple into Parallel - if not, then the only element of the tuple is directly inserted into the network. get_layers is a very specific function and should not be created on its own. Instead, use one of the builders provided by Metalhead to create it.
  • block_repeats: This is a Vector of integers that specifies the number of repeats of each block in each stage.
  • connection: This is a function that determines the residual connection in the model. For resnets, either of Metalhead.Layers.addact or Metalhead.Layers.actadd is recommended.
  • classifier_fn: This is a function that takes in the number of feature maps and returns a classifier. This is usually built as a closure using a function like Metalhead.create_classifier. For example, if the number of output classes is nclasses, then the function can be defined as channels -> create_classifier(channels, nclasses).
source
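
Putting the pieces together, here is a rough sketch of how a small ResNet-18-like model could be assembled from these low-level parts. The connection and classifier closures below are our own illustrations based on the argument descriptions above; the exact closures used internally by Metalhead.resnet may differ:

using Metalhead, Flux

block_repeats = [2, 2, 2, 2]
stem = Metalhead.resnet_stem(; inchannels = 3)
get_layers = Metalhead.basicblock_builder(block_repeats)
connection = (x, y) -> relu.(x .+ y)                     # simple add-then-activate residual connection
classifier_fn = channels -> Metalhead.create_classifier(channels, 1000)
model = Metalhead.build_resnet((256, 256, 3), stem, get_layers, block_repeats,
                               connection, classifier_fn)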

Utility callbacks

Metalhead.resnet_planesFunction
resnet_planes(block_repeats::AbstractVector{<:Integer})

Default callback for determining the number of channels in each block in a ResNet model.

Arguments

block_repeats: A Vector of integers specifying the number of times each block is repeated in each stage of the ResNet model. For example, [3, 4, 6, 3] is the configuration used in ResNet-50, which has 3 blocks in the first stage, 4 blocks in the second stage, 6 blocks in the third stage and 3 blocks in the fourth stage.

source
Metalhead.resnet_strideFunction
resnet_stride(stage_idx::Integer, block_idx::Integer)

Default callback for determining the stride of a block in a ResNet model. Returns 2 for the first block in every stage except the first stage and 1 for all other blocks.

Arguments

  • stage_idx: The index of the stage in the ResNet model.
  • block_idx: The index of the block in the stage.
source
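
For example, following the rule described above:

using Metalhead

Metalhead.resnet_stride(1, 1)   # 1: the first stage never downsamples
Metalhead.resnet_stride(2, 1)   # 2: first block of every later stage
Metalhead.resnet_stride(2, 2)   # 1: remaining blocks in a stage
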
Metalhead.resnet_stemFunction
resnet_stem(; stem_type = :default, inchannels::Integer = 3, replace_stem_pool = false,
+              norm_layer = BatchNorm, activation = relu)

Builds a stem to be used in a ResNet model. See the stem argument of resnet for details on how to use this function.

Arguments

  • stem_type: The type of stem to be built. One of [:default, :deep, :deep_tiered].

    • :default: Builds a stem based on the default ResNet stem, which consists of a single 7x7 convolution with stride 2 and a normalisation layer followed by a 3x3 max pooling layer with stride 2.
    • :deep: This borrows ideas from other papers (InceptionResNetv2, for example) in using a deeper stem with 3 successive 3x3 convolutions having normalisation layers after each one. This is followed by a 3x3 max pooling layer with stride 2.
    • :deep_tiered: A variant of the :deep stem that has a larger width in the second convolution. This is an experimental variant from the timm library in Python that shows performance improvements over the :deep stem in some cases.
  • inchannels: number of input channels

  • replace_stem_pool: Set to true to replace the max pooling layer in the stem with a 3x3 convolution + normalization with a stride of two.

  • norm_layer: The normalisation layer used in the stem.

  • activation: The activation function used in the stem.

source
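
A brief sketch of building stems with the options documented above (the values are illustrative):

using Metalhead

stem = Metalhead.resnet_stem()                           # default 7x7 convolution stem
deep_stem = Metalhead.resnet_stem(; stem_type = :deep)   # three successive 3x3 convolutions
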
diff --git a/dev/api/utilities/index.html b/dev/api/utilities/index.html index 191bb312..fc3deb8c 100644 --- a/dev/api/utilities/index.html +++ b/dev/api/utilities/index.html @@ -1,2 +1,2 @@ -Model Utilities · Metalhead.jl

Model utilities

Metalhead provides some utility functions for making it easier to work with the models inside the library or to build new ones. The API reference for these is documented below.

Metalhead.backboneFunction
backbone(model)

This function returns the backbone of a model that can be used for feature extraction. A Flux.Chain is returned, which can be indexed/sliced into to get the desired layer(s). Note that the model used here as input must be the "camel-cased" version of the model, e.g. ResNet instead of resnet.

source
Metalhead.classifierFunction
classifier(model)

This function returns the classifier head of a model. This is sometimes useful for fine-tuning a model on a different dataset. A Flux.Chain is returned, which can be indexed/sliced into to get the desired layer(s). Note that the model used here as input must be the "camel-cased" version of the model, e.g. ResNet instead of resnet.

source
+Model Utilities · Metalhead.jl

Model utilities

Metalhead provides some utility functions for making it easier to work with the models inside the library or to build new ones. The API reference for these is documented below.

Metalhead.backboneFunction
backbone(model)

This function returns the backbone of a model that can be used for feature extraction. A Flux.Chain is returned, which can be indexed/sliced into to get the desired layer(s). Note that the model used here as input must be the "camel-cased" version of the model, e.g. ResNet instead of resnet.

source
Metalhead.classifierFunction
classifier(model)

This function returns the classifier head of a model. This is sometimes useful for fine-tuning a model on a different dataset. A Flux.Chain is returned, which can be indexed/sliced into to get the desired layer(s). Note that the model used here as input must be the "camel-cased" version of the model, e.g. ResNet instead of resnet.

source
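
A short sketch of how these two utilities are typically used together, for example when preparing a model for fine-tuning:

using Metalhead, Flux

model = ResNet(18)
features = Metalhead.backbone(model)     # Flux.Chain usable for feature extraction
head = Metalhead.classifier(model)       # classifier head, e.g. as a starting point for fine-tuning
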
diff --git a/dev/api/vit/index.html b/dev/api/vit/index.html index 8ef1feca..ff0df044 100644 --- a/dev/api/vit/index.html +++ b/dev/api/vit/index.html @@ -1,5 +1,5 @@ Vision Transformer models · Metalhead.jl

Vision Transformer models

This is the API reference for the Vision Transformer models supported by Metalhead.jl.

The higher-level model constructors

Metalhead.ViTType
ViT(config::Symbol = base; imsize::Dims{2} = (224, 224), inchannels::Integer = 3,
-    patch_size::Dims{2} = (16, 16), pool = :class, nclasses::Integer = 1000)

Creates a Vision Transformer (ViT) model. (reference).

Arguments

  • config: the model configuration, one of [:tiny, :small, :base, :large, :huge, :giant, :gigantic]
  • imsize: image size
  • inchannels: number of input channels
  • patch_size: size of the patches
  • pool: pooling type, either :class or :mean
  • nclasses: number of classes in the output

See also Metalhead.vit.

source

The mid-level functions

Metalhead.vitFunction
vit(imsize::Dims{2} = (256, 256); inchannels::Integer = 3, patch_size::Dims{2} = (16, 16),
+    patch_size::Dims{2} = (16, 16), pool = :class, nclasses::Integer = 1000)

Creates a Vision Transformer (ViT) model. (reference).

Arguments

  • config: the model configuration, one of [:tiny, :small, :base, :large, :huge, :giant, :gigantic]
  • imsize: image size
  • inchannels: number of input channels
  • patch_size: size of the patches
  • pool: pooling type, either :class or :mean
  • nclasses: number of classes in the output

See also Metalhead.vit.

source
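
For instance, a minimal sketch using only the documented options:

using Metalhead

model = ViT(:base; imsize = (224, 224), patch_size = (16, 16), nclasses = 1000)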

The mid-level functions

Metalhead.vitFunction
vit(imsize::Dims{2} = (256, 256); inchannels::Integer = 3, patch_size::Dims{2} = (16, 16),
     embedplanes = 768, depth = 6, nheads = 16, mlp_ratio = 4.0, dropout_prob = 0.1,
-    emb_dropout_prob = 0.1, pool = :class, nclasses::Integer = 1000)

Creates a Vision Transformer (ViT) model. (reference).

Arguments

  • imsize: image size
  • inchannels: number of input channels
  • patch_size: size of the patches
  • embedplanes: the number of channels after the patch embedding
  • depth: number of blocks in the transformer
  • nheads: number of attention heads in the transformer
  • mlp_ratio: the ratio of hidden channels to embedding channels in the MLP block in the transformer
  • dropout_prob: dropout probability
  • emb_dropout_prob: dropout probability for the positional embedding layer
  • pool: pooling type, either :class or :mean
  • nclasses: number of classes in the output
source
+ emb_dropout_prob = 0.1, pool = :class, nclasses::Integer = 1000)

Creates a Vision Transformer (ViT) model. (reference).

Arguments

source

Contribute to Metalhead.jl

We welcome contributions from anyone to Metalhead.jl! Thank you for taking the time to make our ecosystem better.

You can contribute by fixing bugs, adding new models, or adding pre-trained weights. If you aren't ready to write some code, but you think you found a bug or have a feature request, please post an issue.

Before continuing, make sure you read the FluxML contributing guide for general guidelines and tips.

Fixing bugs

To fix a bug in Metalhead.jl, you can open a PR. It would be helpful to file an issue first so that we can confirm the bug.

Adding models

To add a new model architecture to Metalhead.jl, you can open a PR. Keep in mind a few guiding principles for how this package is designed:

  • reuse layers from Flux as much as possible (e.g. use Parallel before defining a Bottleneck struct)
  • adhere as closely as possible to a reference such as a published paper (i.e. the structure of your model should follow intuitively from the paper)
  • use generic functional builders (e.g. Metalhead.resnet is the underlying function that builds "ResNet-like" models)
  • use multiple dispatch to add convenience constructors that wrap your functional builder

When in doubt, just open a PR! We are more than happy to help review your code to help it align with the rest of the library. After adding a model, you might consider adding some pre-trained weights (see below).

Adding pre-trained weights

To add pre-trained weights for an existing model or new model, you can open a PR. Below, we describe the steps you should follow to get there.

All Metalhead.jl model artifacts are hosted on HuggingFace. You can find the FluxML account here. This documentation from HuggingFace will provide you with an introduction to their ModelHub. In short, the Model Hub is a collection of Git repositories, similar to Julia packages on GitHub. This means you can make a pull request to our HuggingFace repositories to upload updated weight artifacts just like you would make a PR on GitHub to upload code.

  1. Train your model or port the weights from another framework.
  2. Save the model state using BSON.jl with BSON.@save "modelname.bson" model_state=Flux.state(model). It is important that your model is saved under the key model_state.
  3. Compress the saved model as a tarball using tar -cvzf modelname.tar.gz modelname.bson.
  4. Obtain the SHAs (see the Pkg docs). Edit the Artifacts.toml file in the Metalhead.jl repository and add an entry for your model. You can leave the URL empty for now.
  5. Open a PR on Metalhead.jl. Be sure to ping a maintainer (e.g. @darsnack or @theabhirath) to let us know that you are adding a pre-trained weight. We will create a model repository on HuggingFace if it does not already exist.
  6. Open a PR to the corresponding HuggingFace repo. Do this by going to the "Community" tab in the HuggingFace repository. PRs and discussions are shown as the same thing in the HuggingFace web app. You can use your local Git program to clone the repo and make PRs if you wish. Check out the guide on PRs to HuggingFace for more information.
  7. Copy the download URL for the model file that you added to HuggingFace. Make sure to grab the URL for a specific commit and not for the main branch.
  8. Update your Metalhead.jl PR by adding the URL to the Artifacts.toml.
  9. If the tests pass for your weights, we will merge your PR! Your model should pass the acctest function in the Metalhead.jl test suite. If your model already exists in the repo, then these tests are already in place, and you can add your model configuration to the PRETRAINED_MODELS list in the runtests.jl file. Please refer to the ResNet tests as an example.

If you want to fix existing weights, then you can follow the same set of steps.

See the scripts/ folder in the repo for some helpful scripts that can be used to automate some of these steps.

+Contributing to Metalhead · Metalhead.jl

Contribute to Metalhead.jl

We welcome contributions from anyone to Metalhead.jl! Thank you for taking the time to make our ecosystem better.

You can contribute by fixing bugs, adding new models, or adding pre-trained weights. If you aren't ready to write some code, but you think you found a bug or have a feature request, please post an issue.

Before continuing, make sure you read the FluxML contributing guide for general guidelines and tips.

Fixing bugs

To fix a bug in Metalhead.jl, you can open a PR. It would be helpful to file an issue first so that we can confirm the bug.

Adding models

To add a new model architecture to Metalhead.jl, you can open a PR. Keep in mind a few guiding principles for how this package is designed:

  • reuse layers from Flux as much as possible (e.g. use Parallel before defining a Bottleneck struct)
  • adhere as closely as possible to a reference such as a published paper (i.e. the structure of your model should follow intuitively from the paper)
  • use generic functional builders (e.g. Metalhead.resnet is the underlying function that builds "ResNet-like" models)
  • use multiple dispatch to add convenience constructors that wrap your functional builder

When in doubt, just open a PR! We are more than happy to help review your code to help it align with the rest of the library. After adding a model, you might consider adding some pre-trained weights (see below).

Adding pre-trained weights

To add pre-trained weights for an existing model or new model, you can open a PR. Below, we describe the steps you should follow to get there.

All Metalhead.jl model artifacts are hosted on HuggingFace. You can find the FluxML account here. This documentation from HuggingFace will provide you with an introduction to their ModelHub. In short, the Model Hub is a collection of Git repositories, similar to Julia packages on GitHub. This means you can make a pull request to our HuggingFace repositories to upload updated weight artifacts just like you would make a PR on GitHub to upload code.

  1. Train your model or port the weights from another framework.
  2. Save the model state using BSON.jl with BSON.@save "modelname.bson" model_state=Flux.state(model). It is important that your model is saved under the key model_state.
  3. Compress the saved model as a tarball using tar -cvzf modelname.tar.gz modelname.bson.
  4. Obtain the SHAs (see the Pkg docs; a sketch of how to compute them is shown after this list). Edit the Artifacts.toml file in the Metalhead.jl repository and add an entry for your model. You can leave the URL empty for now.
  5. Open a PR on Metalhead.jl. Be sure to ping a maintainer (e.g. @darsnack or @theabhirath) to let us know that you are adding a pre-trained weight. We will create a model repository on HuggingFace if it does not already exist.
  6. Open a PR to the corresponding HuggingFace repo. Do this by going to the "Community" tab in the HuggingFace repository. PRs and discussions are shown as the same thing in the HuggingFace web app. You can use your local Git program to clone the repo and make PRs if you wish. Check out the guide on PRs to HuggingFace for more information.
  7. Copy the download URL for the model file that you added to HuggingFace. Make sure to grab the URL for a specific commit and not for the main branch.
  8. Update your Metalhead.jl PR by adding the URL to the Artifacts.toml.
  9. If the tests pass for your weights, we will merge your PR! Your model should pass the acctest function in the Metalhead.jl test suite. If your model already exists in the repo, then these tests are already in place, and you can add your model configuration to the PRETRAINED_MODELS list in the runtests.jl file. Please refer to the ResNet tests as an example.
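
To make step 4 concrete, the SHAs can be computed along the lines of the snippet in the Pkg documentation on artifacts (the filename below is a placeholder for the tarball created in step 3):

using Tar, Inflate, SHA

filename = "modelname.tar.gz"   # placeholder name, not a real artifact
println("sha256: ", bytes2hex(open(sha256, filename)))
println("git-tree-sha1: ", Tar.tree_hash(IOBuffer(inflate_gzip(filename))))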

If you want to fix existing weights, then you can follow the same set of steps.

See the scripts/ folder in the repo for some helpful scripts that can be used to automate some of these steps.

diff --git a/dev/howto/resnet/index.html b/dev/howto/resnet/index.html index 6dac49a7..c2582a84 100644 --- a/dev/howto/resnet/index.html +++ b/dev/howto/resnet/index.html @@ -8,4 +8,4 @@ stochastic_depth_prob = 0.2)

To make this a ResNeXt-like model, all we need to do is configure the cardinality and the base width:

custom_resnet = Metalhead.resnet(Metalhead.bottleneck, [3, 4, 6, 3];
                                  cardinality = 32, base_width = 4,
                                  pool_layer = AdaptiveMeanMaxPool((1, 1)),
-                                 stochastic_depth_prob = 0.2)

And we have a custom model, built with minimal effort! The documentation for Metalhead.resnet has been written with extensive care and in as much detail as possible to facilitate ease of use. Still, if you find anything difficult to understand, feel free to open an issue and we will be happy to help you out, and to improve the documentation where necessary.

+ stochastic_depth_prob = 0.2)

And we have a custom model, built with minimal effort! The documentation for Metalhead.resnet has been written with extensive care and in as much detail as possible to facilitate ease of use. Still, if you find anything difficult to understand, feel free to open an issue and we will be happy to help you out, and to improve the documentation where necessary.

diff --git a/dev/index.html b/dev/index.html index 4814a717..667d180f 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,2 +1,2 @@ -Home · Metalhead.jl

Metalhead

Dev CI Coverage

Metalhead.jl provides standard machine learning vision models for use with Flux.jl. The architectures in this package make use of pure Flux layers, and they represent best practices for creating modules like residual blocks, inception blocks, etc. in Flux. Metalhead also provides some building blocks for more complex models in the Layers module.

Installation

julia> ]add Metalhead

Getting Started

You can find the Metalhead.jl getting started guide here.

Available models

To contribute new models, see our contributing docs.

Image Classification

| Model Name | Constructor | Pre-trained? |
|:--|:--|:--|
| AlexNet | AlexNet | N |
| ConvMixer | ConvMixer | N |
| ConvNeXt | ConvNeXt | N |
| DenseNet | DenseNet | N |
| EfficientNet | EfficientNet | N |
| EfficientNetv2 | EfficientNetv2 | N |
| gMLP | gMLP | N |
| GoogLeNet | GoogLeNet | N |
| Inception-v3 | Inceptionv3 | N |
| Inception-v4 | Inceptionv4 | N |
| InceptionResNet-v2 | InceptionResNetv2 | N |
| MLPMixer | MLPMixer | N |
| MobileNetv1 | MobileNetv1 | N |
| MobileNetv2 | MobileNetv2 | N |
| MobileNetv3 | MobileNetv3 | N |
| MNASNet | MNASNet | N |
| ResMLP | ResMLP | N |
| ResNet | ResNet | Y |
| ResNeXt | ResNeXt | Y |
| SqueezeNet | SqueezeNet | Y |
| Xception | Xception | N |
| WideResNet | WideResNet | Y |
| VGG | VGG | Y |
| Vision Transformer | ViT | Y |

Other Models

| Model Name | Constructor | Pre-trained? |
|:--|:--|:--|
| UNet | UNet | N |

diff --git a/dev/tutorials/pretrained/index.html b/dev/tutorials/pretrained/index.html
diff --git a/dev/tutorials/quickstart/index.html b/dev/tutorials/quickstart/index.html

model = ResNet(18);

The API reference contains the documentation and options for each model function. These models also support loading pre-trained weights for ImageNet.

Note

Metalhead is still under active development and thus not all models have pre-trained weights supported. While we are working on expanding the footprint of the pre-trained models, if you would like to help contribute model weights yourself, please check out the contributing guide.

To use a pre-trained model, just instantiate the model with the pretrain keyword argument set to true:

using Metalhead
   
model = ResNet(18; pretrain = true);
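
As a minimal sketch of what you can do next (the random input is only a placeholder; meaningful predictions require the image preprocessing described in the pretraining guide):

using Flux, Metalhead

model = ResNet(18; pretrain = true);
x = rand(Float32, 224, 224, 3, 1)    # placeholder 224×224 RGB image in WHCN order
Flux.onecold(model(x))               # index of the highest-scoring ImageNet class for the batch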

Refer to the pretraining guide for more details on how to use pre-trained models.

More model configuration options

For users who want to use more options for model configuration, Metalhead provides a "mid-level" API for models. These are the model functions that are in lowercase such as Metalhead.resnet or Metalhead.mobilenetv3. End-users who want to experiment with model architectures should use these functions. These models do not support the option for loading pre-trained weights from ImageNet out of the box, although one can always load weights explicitly using the loadmodel! function from Flux.
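
For example, one way to combine the two APIs is to build a mid-level model and then copy pre-trained weights into it with Flux.loadmodel!. This is only a sketch: it assumes that Metalhead.resnet(Metalhead.basicblock, [2, 2, 2, 2]) with default options produces the same layer structure as ResNet(18), so that the parameter shapes line up, and that the wrapper's Chain is accessible via its layers field.

using Flux, Metalhead

backbone = Metalhead.resnet(Metalhead.basicblock, [2, 2, 2, 2])    # ResNet-18-like configuration
Flux.loadmodel!(backbone, ResNet(18; pretrain = true).layers)      # copy the pre-trained weights across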

To use any of these models, check out the docstrings for the model functions (these are documented in the API reference). Note that these functions typically require more configuration options to be passed in, but offer a lot more flexibility in terms of model architecture. Metalhead defines as many default options as possible so as to make it easier for the user to pick and choose specific options to customise.

Builders for the advanced user

For users who want the ability to customise their models as much as possible, Metalhead offers a powerful low-level interface. These are known as builders and allow the user to hack into the core of models and build them up to their liking. Most users will not need to use builders, since a large number of configuration options are already exposed at the mid-level API. However, for package developers and users who want to build customised versions of their own models, the low-level API provides the customisability required while still reducing user code.
