All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Option `--pretrained-model` to be used for network weights initialization with a pretrained model
- Version number saved in the model file
- CMake option `-DCOMPILE_SERVER=ON`
- Right-to-left training, scoring, decoding with `--right-left`
- Fixed marian-server compilation with Boost 1.66
- Fixed compilation on g++-4.8.4
- Fixed compilation without marian-server if openssl is not available
- Added back gradient-dropping
- Fixed parameter initialization for `--tied-embeddings` during translation
- Fixed ensembling with language model and batched decoding
- Fixed attention reduction kernel with large matrices (added missing `__syncthreads()`), which should fix stability with large batches and beam-size during batched decoding
- Option `--max-length-crop` to be used together with `--max-length N` to crop sentences to length N rather than omitting them
- Experimental model with convolution over input characters
- Fixed a number of bugs for vocabulary and directory handling
- Batched translation for all model types, significant translation speed-up
- Batched translation during validation with translation
- `--maxi-batch-sort` option for `marian-decoder`
- Support for CUBLAS_TENSOR_OP_MATH mode for cuBLAS in CUDA 9.0
- The "marian-vocab" tool to create vocabularies
- Multi-gpu validation, scorer and in-training translation
- summary-mode for scorer
- New "transformer" model based on "Attention Is All You Need"
- Options specific for the transformer model
- Linear learning rate warmup with and without initial value
- Cyclic learning rate warmup
- More options for learning rate decay, including: optimizer history reset, repeated warmup
- Continuous inverted square root decay of learning rate (`--lr-decay-inv-sqrt`) based on number of updates
- Exposed optimizer parameters (e.g. momentum etc. for Adam)
- Version of deep RNN-based models compatible with Nematus (`--type nematus`)
- Synchronous SGD training for multi-gpu (enable with `--sync-sgd`)
- Dynamic construction of complex models with different encoders and decoders, currently only available through the C++ API
- Option `--quiet` to suppress output to stderr
- Option to choose different variants of optimization criterion: mean cross-entropy, perplexity, cross-entropy sum
- In-process translation for validation, uses the same memory as training
- Label Smoothing
- CHANGELOG.md
- CONTRIBUTING.md
- Swish activation function default for Transformer (https://arxiv.org/pdf/1710.05941.pdf)
- Changed shape organization to follow numpy.
- Changed option `--moving-average` to `--exponential-smoothing` and inverted the formula to `s_t = (1 - \alpha) * s_{t-1} + \alpha * x_t`; `\alpha` is now `1e-4` by default
- Got rid of thrust for compile-time mathematical expressions
- Changed boolean option `--normalize` to `--normalize [arg=1] (=0)`. New behaviour is backwards-compatible and can also be specified as `--normalize=0.6`
- Renamed "s2s" binary to "marian-decoder"
- Renamed "rescorer" binary to "marian-scorer"
- Renamed "server" binary to "marian-server"
- Renamed option `--dynamic-batching` to `--mini-batch-fit`
- Unified cross-entropy-based validation, which now supports perplexity and other CE variants
- Changed `--normalize (bool)` to `--normalize (float)arg`, which allows changing the length normalization weight as `score / pow(length, arg)`.
- Temporarily removed gradient dropping (`--drop-rate X`) until refactoring.
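
The `--exponential-smoothing` entry above gives the update rule `s_t = (1 - \alpha) * s_{t-1} + \alpha * x_t`. The following Python sketch illustrates that rule only and is not taken from the Marian sources. The function name, the choice to seed the average with the first value, and the default of `1e-4` for `\alpha` are assumptions made for the example.

```python
def exponential_smoothing(values, alpha=1e-4, initial=None):
    """Apply s_t = (1 - alpha) * s_{t-1} + alpha * x_t to a sequence.

    Hypothetical helper for illustration; `initial` seeds s_0 and
    defaults to the first value (an assumption, not Marian behaviour).
    """
    if not values:
        return []
    s = values[0] if initial is None else initial
    smoothed = []
    for x in values:
        # A small alpha keeps most of the previous average and
        # mixes in only a little of the new value.
        s = (1 - alpha) * s + alpha * x
        smoothed.append(s)
    return smoothed


if __name__ == "__main__":
    # With alpha = 0.5 the average moves halfway towards each new value.
    print(exponential_smoothing([0.0, 2.0, 2.0], alpha=0.5, initial=0.0))
```

With a small `\alpha` such as the assumed default, the smoothed value changes slowly, which is the point of averaging parameters over many updates.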
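
The `--normalize` entries above describe length normalization as `score / pow(length, arg)`. The sketch below shows how the weight changes the ranking of two hypotheses. It is an illustration under assumptions: the function names and the example scores are hypothetical, and scores are taken to be summed log-probabilities, so they are negative and higher is better.

```python
def normalized_score(score, length, arg=0.0):
    """Length-normalize a hypothesis score as score / pow(length, arg).

    arg = 0 leaves the score unchanged; arg = 1 divides by the length.
    """
    return score / pow(length, arg)


def best_hypothesis(hypotheses, arg=0.0):
    """Return the index of the best (score, length) pair after normalization."""
    return max(
        range(len(hypotheses)),
        key=lambda i: normalized_score(hypotheses[i][0], hypotheses[i][1], arg),
    )


if __name__ == "__main__":
    # Hypothetical (log-probability, length) pairs.
    hyps = [(-4.0, 4), (-4.5, 6)]
    for arg in (0.0, 0.6, 1.0):
        print(arg, best_hypothesis(hyps, arg))
```

Without normalization the shorter hypothesis wins on raw score; with `arg=1` the longer one wins because its per-token score is better. Intermediate values such as `0.6` sit between these two behaviours.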