Parameters And Constants
This page has migrated to our new site. Please update any bookmarks.
Creates a scalar, vector, matrix, or tensor of learnable parameters.
ParameterTensor {shape,
init='uniform'/*|heNormal|...*/, initOutputRank=1, initValueScale=1.0,
randomSeed=-1,
initValue=0.0, initFromFilePath='',
learningRateMultiplier=1.0}
- shape: shape (dimensions) of parameter as an array. E.g. (13:42) to create a matrix with 13 rows and 42 columns. For some operations, dimensions given as 0 are automatically inferred (see here)
- init (default 'uniform'): specifies random initialization, e.g. init='heNormal' (see here)
- initOutputRank (default 1): specifies the number of leading fan-out axes. If negative, its absolute value is the number of trailing fan-out axes (see here)
- initValueScale (default 1): additional scaling factor applied to random initialization values
- randomSeed (default -1): if positive, use this random seed for random initialization. If negative, use a counter that gets increased for each ParameterTensor{}
- initValue: specifies initialization with a constant value, e.g. initValue=0
- initFromFilePath: specifies initialization by loading initial values from a file. E.g. initFromFilePath="my_init_vals.txt"
- learningRateMultiplier: system learning rate will be scaled by this (0 to disable learning) (see here)
A tensor of learnable parameters.
This factory function creates a scalar, vector, matrix or tensor of learnable parameters, that is, a tensor that is recognized by the "train" action as containing parameters that shall be updated during training.
The values will be initialized, depending on which optional parameter is given, to
- random numbers, if init is given;
- a constant if initValue is given; or
- a tensor read from an external input file if initFromFilePath is given.

The default is init="uniform".
To create a scalar, vector, matrix, or tensor with rank>2, pass the following as the shape parameter:
- (1) for a scalar;
- (M) for a column vector with M elements;
- (1:N) for a row vector with N elements. Row vectors are one-row matrices;
- (M:N) for a matrix with M rows and N columns;
- (I:J:K...) for a tensor of arbitrary rank>2 (note: the maximum allowed rank is 12); and
- (W:H:C) for a tensor that matches the dimensions of a [W x H] image with C channels.
When a ParameterTensor is used for weights as an immediate input of specific operations, some of its dimensions may be specified as Inferred. For example, the matrix product ParameterTensor {42:Inferred} * x will automatically infer the second dimension to be equal to the dimension of x.
This is extremely handy for inputs of layers, as it frees the user's BrainScript code from the burden of passing around the input dimensions. Further, in some situations it is very cumbersome to determine the precise input dimensions of a layer, for example for the first fully connected layer on top of a pyramid of convolution/pooling combinations without padding, where each convolution and pooling operation may drop rows or columns of boundary pixels, and strides scale the dimensions.
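To illustrate why tracking these dimensions by hand is tedious, the following minimal Python sketch (not CNTK code) follows one spatial axis through a stack of unpadded convolution and pooling steps. It assumes the common rule floor((in - kernel) / stride) + 1 for the output extent; CNTK's exact rounding rules may differ in detail. The layer sizes and the helper name unpadded_output_size are hypothetical.

```python
def unpadded_output_size(in_size, kernel, stride):
    """Output extent of one unpadded convolution/pooling step along one axis.

    Assumption: the common rule floor((in - kernel) / stride) + 1.
    """
    if in_size < kernel:
        raise ValueError("kernel is larger than the input")
    return (in_size - kernel) // stride + 1


# Hypothetical pyramid: (kernel, stride) per step, applied to a 28-pixel axis.
pyramid = [(5, 1), (2, 2), (3, 1), (2, 2)]

size = 28
sizes = [size]
for kernel, stride in pyramid:
    size = unpadded_output_size(size, kernel, stride)
    sizes.append(size)

# The last entry is the extent that the first fully connected layer would see.
print(sizes)
```

Every change to a kernel size or stride shifts all later values, which is the bookkeeping that dimension inference takes over.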
This feature is what allows CNTK's predefined layers to be specified by their output dimension only (e.g. DenseLayer {1024}).
Random initialization is selected by the init parameter, which chooses between uniform and normal distribution, where the range/standard deviation is computed as a function of fan-in and fan-out:

value of init | distribution | range/standard deviation |
---|---|---|
'heNormal' | normal | sqrt (2 / fanIn) |
'heUniform' | uniform | sqrt (6 / fanIn) |
'glorotNormal' | normal | sqrt (2 / (fanIn+fanOut)) |
'glorotUniform' | uniform | sqrt (6 / (fanIn+fanOut)) |
'xavier' | uniform | sqrt (3 / fanIn) |
'uniform' | uniform | 1/20 |
'gaussian' | normal | sqrt (0.04 / fanIn) |
'zero' | n/a | 0 |
(Where zero is a sometimes convenient alternative to specifying initValue=0.) For uniform distribution, the parameters will be initialized uniformly in [-range, range]; for normal distribution, the mean is always zero.
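As a worked illustration of the table, the Python sketch below (not CNTK code) transcribes each row into a function of fan-in and fan-out. The names INIT_TABLE and init_scale are hypothetical, and the handling of initValueScale as a plain multiplier on the range/standard deviation is an assumption based on its description as an additional scaling factor.

```python
import math

# (distribution, scale as a function of fan-in and fan-out), one entry per table row.
INIT_TABLE = {
    "heNormal":      ("normal",  lambda fan_in, fan_out: math.sqrt(2.0 / fan_in)),
    "heUniform":     ("uniform", lambda fan_in, fan_out: math.sqrt(6.0 / fan_in)),
    "glorotNormal":  ("normal",  lambda fan_in, fan_out: math.sqrt(2.0 / (fan_in + fan_out))),
    "glorotUniform": ("uniform", lambda fan_in, fan_out: math.sqrt(6.0 / (fan_in + fan_out))),
    "xavier":        ("uniform", lambda fan_in, fan_out: math.sqrt(3.0 / fan_in)),
    "uniform":       ("uniform", lambda fan_in, fan_out: 1.0 / 20.0),
    "gaussian":      ("normal",  lambda fan_in, fan_out: math.sqrt(0.04 / fan_in)),
    "zero":          ("n/a",     lambda fan_in, fan_out: 0.0),
}


def init_scale(init, fan_in, fan_out, init_value_scale=1.0):
    """Return (distribution, range or standard deviation) for a value of init.

    Assumption: initValueScale simply multiplies the tabulated value.
    """
    distribution, scale = INIT_TABLE[init]
    return distribution, init_value_scale * scale(fan_in, fan_out)


# Example: a (13:42) matrix has fan-out 13 and fan-in 42 under the default convention.
for name in INIT_TABLE:
    print(name, init_scale(name, fan_in=42, fan_out=13))
```

For a uniform entry the returned number is the half-width of the interval [-range, range]; for a normal entry it is the standard deviation around a mean of zero.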
Note that the default for init is uniform when using ParameterTensor{} directly. However, the default is glorotUniform for layers that contain parameters inside, such as DenseLayer{} and ConvolutionalLayer{}.
Random initialization assumes that the parameters are part of some form of matrix-product like operation which has a well-defined fan-in and fan-out, which are used in determining the scaling of the random values per above table. By default, the first axis is considered fan-out, and the remaining axis/axes are fan-in, matching semantics of the regular matrix product.
The optional parameter initOutputRank can be used to specify the number of leading axes that should be considered fan-out. For example, a matrix product in CNTK's extended tensor interpretation that maps a [K]-dimensional vector x to an [I x J]-dimensional rank-2 object can be written as Times (W, x, outputRank=2), where W has the shape [I x J x K]. Here, initOutputRank=2 specifies that in scaling the random initialization values, the fan-out is I*J and the fan-in is K.
Negative values for initOutputRank indicate that the fan-out axes are trailing axes. For example, the filter kernel of the ConvolutionalLayer{} and the underlying Convolution() operation for a typical image-processing setup has a shape [W x H x C x K], where K is the fan-out, while the fan-in is W*H*C. This is specified by initOutputRank=-1.
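The split into fan-in and fan-out described in this section can be sketched in a few lines of Python (not CNTK code). The function name fan_in_fan_out is hypothetical, and rejecting a rank of zero or a rank larger than the tensor rank is a choice made for this sketch, not documented CNTK behavior.

```python
from math import prod


def fan_in_fan_out(shape, init_output_rank=1):
    """Split a parameter shape into (fan_in, fan_out).

    A positive init_output_rank counts leading fan-out axes;
    a negative one counts trailing fan-out axes.
    """
    rank = abs(init_output_rank)
    if rank == 0 or rank > len(shape):
        raise ValueError("init_output_rank does not fit the shape")
    if init_output_rank > 0:
        out_axes, in_axes = shape[:rank], shape[rank:]
    else:
        in_axes, out_axes = shape[:-rank], shape[-rank:]
    return prod(in_axes), prod(out_axes)


# Regular matrix (13:42): the first axis is fan-out, the rest is fan-in.
print(fan_in_fan_out((13, 42)))
# W of shape [I x J x K] = [3 x 4 x 5] with initOutputRank=2: fan-out is I*J, fan-in is K.
print(fan_in_fan_out((3, 4, 5), init_output_rank=2))
# Filter kernel [W x H x C x K] = [3 x 3 x 16 x 32] with initOutputRank=-1.
print(fan_in_fan_out((3, 3, 16, 32), init_output_rank=-1))
```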
The initial values can be read from a text file. To do this, pass a pathname for the optional parameter initFromFilePath. The text file is expected to consist of one line per matrix row, each line consisting of space-separated numbers, one per column. The row and column dimensions in the file must match shape.
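A minimal Python sketch of producing a file in the layout described here: one line per matrix row, with space-separated numbers. The helpers write_init_file and read_init_file are hypothetical; the reader only mimics the dimension check and is not CNTK's loader. Which number formats CNTK accepts beyond plain decimals is not stated in this page, so the sketch writes plain decimal values.

```python
import os
import tempfile


def write_init_file(path, rows):
    """Write one text line per matrix row, numbers separated by single spaces."""
    with open(path, "w") as f:
        for row in rows:
            f.write(" ".join(repr(float(v)) for v in row) + "\n")


def read_init_file(path, shape):
    """Read the file back and check that its dimensions match shape (rows, cols)."""
    n_rows, n_cols = shape
    with open(path) as f:
        rows = [[float(tok) for tok in line.split()] for line in f if line.strip()]
    if len(rows) != n_rows or any(len(r) != n_cols for r in rows):
        raise ValueError("file dimensions do not match shape")
    return rows


# A 2 x 3 matrix, matching a parameter declared with shape (2:3).
values = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_init_vals.txt")
    write_init_file(path, values)
    loaded = read_init_file(path, (2, 3))
```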
Parameter-specific learning rates can be realized with the optional learningRateMultiplier parameter. This factor is multiplied with the actual learning rate when performing parameter updates. For example, if specified as 0, the parameter will not be updated; it remains constant.
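The effect of the multiplier can be sketched with a plain gradient-descent update in Python (not CNTK code). This assumes the simplest possible update rule; CNTK's actual learners (for example with momentum) are more involved, and the function name sgd_step is hypothetical.

```python
def sgd_step(value, gradient, learning_rate, learning_rate_multiplier=1.0):
    """One plain SGD update with a per-parameter learning rate multiplier."""
    effective_rate = learning_rate * learning_rate_multiplier
    return [v - effective_rate * g for v, g in zip(value, gradient)]


w = [1.0, -2.0]
g = [0.5, 0.5]

updated = sgd_step(w, g, learning_rate=0.1)
# With a multiplier of 0 the effective rate is 0, so the values stay as they are.
frozen = sgd_step(w, g, learning_rate=0.1, learning_rate_multiplier=0.0)
```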
A regular parameter matrix that will be initialized as heUniform (default would be uniform):
W = ParameterTensor {(outDim:inDim), init='heUniform'}
A regular bias parameter that will be initialized as zero:
b = ParameterTensor {outDim, initValue=0}
An embedding matrix that should be read from a file and kept constant:
E = ParameterTensor {(embedDim:vocabSize),
initFromFilePath='./EmbeddingMatrix.txt',
learningRateMultiplier=0} # prevents learning
A bias parameter of the full size of a [width x height]-size image with numChannels color planes:
bFull = ParameterTensor {(width:height:numChannels)}
Create a constant tensor.
Constant {scalarValue, rows=1, cols=1}
- scalarValue: value of this constant
- rows (default: 1): number of rows, if constant is not a scalar
- cols (default: 1): number of cols, if constant is not a scalar
A constant, either a scalar or a rank-1 or rank-2 object of dimension [rows x cols], where all elements are filled with scalarValue.
A constant value. It can be either a scalar, or a rank-1 object (vector) or rank-2 object (matrix) initialized with a single value (such as 0). Note that because for vector and matrix constants all values are identical, constants used in conjunction with element-wise operations can often be specified as a scalar, while taking advantage of broadcasting.
A Constant() is a ParameterTensor{} with learningRateMultiplier=0.
Interpolation between two values with interpolation weight alpha in range 0..1 ("soft multiplexer"):
SoftMUX (x, y, alpha) = Constant (1-alpha) .* x + Constant (alpha) .* y
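To make the arithmetic of the soft multiplexer concrete, here is a small Python sketch (not CNTK code) that applies the same formula element-wise to plain lists. The scalar alpha is applied to every element, which mirrors how the scalar constants broadcast in the BrainScript expression. The function name soft_mux is hypothetical.

```python
def soft_mux(x, y, alpha):
    """Element-wise (1 - alpha) * x + alpha * y for two equal-length lists."""
    return [(1 - alpha) * xi + alpha * yi for xi, yi in zip(x, y)]


x = [0.0, 10.0]
y = [4.0, 20.0]

only_x = soft_mux(x, y, 0.0)    # alpha = 0 selects x
only_y = soft_mux(x, y, 1.0)    # alpha = 1 selects y
blend = soft_mux(x, y, 0.25)    # a quarter of the way from x to y
```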
Hamming loss (cf. here):
HammingLoss (y, p) = ReduceSum (BS.Boolean.Xor (y, Greater (p, Constant(0.5))))
hl = HammingLoss (multiLabels, probabilities)
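The same Hamming loss computation can be traced in plain Python (not CNTK code): threshold the probabilities at 0.5 with a strict comparison, XOR them with the 0/1 labels, and sum the mismatches. The function name hamming_loss and the sample data are hypothetical.

```python
def hamming_loss(y, p, threshold=0.5):
    """Count the labels whose thresholded prediction disagrees with the target.

    y holds 0/1 labels and p holds probabilities; the comparison is strict,
    so a probability equal to the threshold counts as a prediction of 0.
    """
    predicted = [1 if pi > threshold else 0 for pi in p]
    return sum(yi ^ pred for yi, pred in zip(y, predicted))


multi_labels = [1, 0, 1, 1]
probabilities = [0.9, 0.6, 0.2, 0.7]

# Predictions are [1, 1, 0, 1], so the second and third labels are wrong.
hl = hamming_loss(multi_labels, probabilities)
```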