
Marian v1.11.0

Released by @emjotde on 08 Feb 2022

[1.11.0] - 2022-02-08

Added

  • Parallelized data reading with e.g. --data-threads 8
  • Top-k sampling during decoding with e.g. --output-sampling topk 10 (see the decoding example after this list)
  • Improved mixed precision training with --fp16 (see the training example after this list)
  • Set FFN width in decoder independently from encoder with e.g. --transformer-dim-ffn 4096 --transformer-decoder-dim-ffn 2048
  • Add option --add-lsh to marian-conv, which allows the LSH to be memory-mapped.
  • Early stopping based on first, all, or any validation metrics via --early-stopping-on
  • Support for compute capability 8.6 when using CUDA >= 11.1
  • Support for RMSNorm as a drop-in replacement for LayerNorm, following Biao Zhang and Rico Sennrich (2019), "Root Mean Square Layer Normalization". Enabled in the Transformer model via --transformer-postprocess dar instead of dan.
  • Extended suppression of unwanted output symbols, specifically "\n", from the default vocabulary if generated by SentencePiece with byte-fallback. Can be deactivated with --allow-special
  • Allow fine-grained CPU intrinsics overrides when BUILD_ARCH != native, e.g. -DBUILD_ARCH=x86-64 -DCOMPILE_AVX512=off (see the CMake example after this list)
  • Add a custom bias epilogue kernel.
  • Add support for fusing ReLU and bias addition into GEMMs when using CUDA 11.
  • Display decoder time statistics with marian-decoder --stat-freq 10 ...
  • Support for MS-internal binary shortlist
  • Local/global sharding with MPI training via --sharding local
  • fp16 support for factors.
  • Correct training with fp16 via --fp16.
  • Dynamic cost-scaling with --cost-scaling.
  • Dynamic gradient-scaling with --dynamic-gradient-scaling.
  • Add unit tests for binary files.
  • Fix compilation with OMP
  • Added --model-mmap option to enable mmap loading for CPU-based translation
  • Compute aligned memory sizes using exact sizing
  • Support for loading lexical shortlist from a binary blob
  • Integrate a shortlist converter into marian-conv via the --shortlist option, which converts a text lexical shortlist to a binary shortlist
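
    For illustration, several of the new decoding options can be combined in a single marian-decoder call. This is only a sketch: model, vocabulary, and data paths are placeholders, and it assumes a CPU run with a binary (*.bin) model so that --model-mmap applies.

        ./marian-decoder --models model.bin --vocabs vocab.spm vocab.spm \
            --cpu-threads 8 --model-mmap \
            --data-threads 8 --output-sampling topk 10 --stat-freq 10 \
            < input.txt > output.txt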
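
    A corresponding training sketch exercising the new mixed-precision and Transformer options could look as follows. Paths and the validation setup are placeholders; --early-stopping-on any uses one of the values (first, all, any) mentioned above, and the FFN sizes and the dar setting are the ones quoted in the entries.

        ./marian --type transformer --model model.npz \
            --train-sets corpus.src corpus.trg --vocabs vocab.spm vocab.spm \
            --fp16 \
            --transformer-dim-ffn 4096 --transformer-decoder-dim-ffn 2048 \
            --transformer-postprocess dar \
            --valid-sets dev.src dev.trg --valid-metrics ce-mean-words bleu \
            --early-stopping-on any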
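
    The fine-grained intrinsics overrides are plain CMake switches. A minimal sketch, assuming an out-of-source build directory; only -DBUILD_ARCH and -DCOMPILE_AVX512 are taken from the entry above, the rest is standard CMake:

        mkdir -p build && cd build
        cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_ARCH=x86-64 -DCOMPILE_AVX512=off
        make -j8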

Fixed

  • Fix AVX2 and AVX512 detection on macOS
  • Add GCC11 support into FBGEMM
  • Added pragma to ignore unused-private-field error on elementType_ on macOS
  • Do not set guided alignments for case augmented data if vocab is not factored
  • Various fixes to enable LSH in Quicksand
  • Added support to MPIWrapper::bcast (and similar) for counts of type size_t
  • Add new validation metrics when training is restarted and --reset-valid-stalled is used
  • Missing depth-scaling in transformer FFN
  • Fixed an issue when loading intgemm16 models from unaligned memory.
  • Fix building marian with gcc 9.3+ and FBGEMM
  • Find MKL installed under Ubuntu 20.04 via apt-get
  • Support for CUDA 11.
  • General improvements and fixes for MPI handling, which was essentially non-functional before (syncing, random seeds, deadlocks during saving, validation, etc.)
  • Allow compiling with -DUSE_MPI=on and -DUSE_STATIC_LIBS=on, although MPI still gets linked dynamically since it has so many dependencies.
  • Fix building server with Boost 1.75
  • Missing implementation for cos/tan expression operator
  • Fixed loading binary models on architectures where size_t != uint64_t.
  • Missing float template specialisation for elem::Plus
  • Broken links to MNIST data sets
  • Enforce validation for the task alias in training mode.

Changed

  • On macOS, Marian uses the Apple Accelerate framework by default instead of OpenBLAS/MKL.
  • Optimize the LSH for speed by treating it as a shortlist generator. No option changes in the decoder
  • Set REQUIRED_BIAS_ALIGNMENT = 16 in tensors/gpu/prod.cpp to avoid memory-misalignment on certain Ampere GPUs.
  • For BUILD_ARCH != native, all intrinsics types are enabled by default; individual ones can be disabled, e.g. -DCOMPILE_AVX512=off
  • Moved FBGEMM pointer to commit c258054 for gcc 9.3+ fix
  • Change compile options such as -DCOMPILE_CUDA_SM35 to -DCOMPILE_KEPLER, -DCOMPILE_MAXWELL, -DCOMPILE_PASCAL, -DCOMPILE_VOLTA, -DCOMPILE_TURING, and -DCOMPILE_AMPERE (see the CMake example after this list)
  • Disable -DCOMPILE_KEPLER and -DCOMPILE_MAXWELL by default.
  • Dropped support for legacy graph groups.
  • Developer documentation framework based on Sphinx+Doxygen+Breathe+Exhale
  • Expression graph documentation (#788)
  • Graph operators documentation (#801)
  • Remove unused variable from expression graph
  • Factor groups and concatenation: doc/factors.md
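
    A hedged CMake sketch for the renamed GPU-architecture switches; the build-directory layout and -DCOMPILE_CUDA=on are assumptions, while the per-architecture flags are the ones listed above:

        cmake .. -DCMAKE_BUILD_TYPE=Release -DCOMPILE_CUDA=on \
            -DCOMPILE_KEPLER=off -DCOMPILE_MAXWELL=off \
            -DCOMPILE_PASCAL=on -DCOMPILE_VOLTA=on -DCOMPILE_TURING=on -DCOMPILE_AMPERE=on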