Releases · facebookresearch/fairseq
v0.7.0
Notable (possibly breaking) changes:
- d45db80: Move checkpoint utility functions from utils.py into checkpoint_utils.py
- f2563c2: Move LM definitions into separate files
- dffb167: Updates to model API (see the sketch after this list):
  - `FairseqModel` -> `FairseqEncoderDecoderModel`
  - add `FairseqDecoder.extract_features` and `FairseqDecoder.output_layer`
  - `encoder_out_dict` -> `encoder_out`
  - remove unused `remove_head` functions
- 34726d5: Move `distributed_init` into `DistributedFairseqModel`
- cf17068: Simplify distributed launch by automatically launching multiprocessing on each node for all visible GPUs (allows launching just one job per node instead of one per GPU)
- d45db80: Change default LR scheduler from `reduce_lr_on_plateau` to `fixed`
- 96ac28d: Rename `--sampling-temperature` -> `--temperature`
- fc1a19a: Deprecate dummy batches
- a1c997b: Add memory mapped datasets
- 0add50c: Allow cycling over multiple datasets, where each one becomes an "epoch"
Plus many additional features and bugfixes
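As a rough illustration of the decoder API split listed above, the sketch below shows a toy decoder whose `forward` is just `output_layer(extract_features(...))`. This is not fairseq's implementation; the class and the exact signatures are illustrative assumptions based on the method names in the release notes.

```python
import torch
import torch.nn as nn


class ToyDecoder(nn.Module):
    """Illustrative decoder (not a fairseq class) showing the extract_features/output_layer split."""

    def __init__(self, vocab_size=1000, embed_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, embed_dim, batch_first=True)
        self.proj = nn.Linear(embed_dim, vocab_size)

    def extract_features(self, prev_output_tokens, encoder_out=None):
        # Return hidden states only; note `encoder_out` replaces the old `encoder_out_dict` name.
        x = self.embed(prev_output_tokens)
        x, _ = self.lstm(x)
        return x

    def output_layer(self, features):
        # Project features to vocabulary logits.
        return self.proj(features)

    def forward(self, prev_output_tokens, encoder_out=None):
        return self.output_layer(self.extract_features(prev_output_tokens, encoder_out))


logits = ToyDecoder()(torch.randint(0, 1000, (2, 5)))  # (batch, tgt_len, vocab)
```

Splitting the decoder this way lets callers grab hidden features without the final vocabulary projection, which is useful for feature extraction and adaptive softmax variants.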
v0.6.2
Changelog:
- 998ba4f: Add language models from Baevski & Auli (2018)
- 4294c4f: Add mixture of experts code from Shen et al. (2019)
- 0049349: Add example for multilingual training
- 48d9afb: Speed improvements, including fused operators from apex
- 44d27e6: Add Tensorboard support
- d17fa85: Add Adadelta optimizer
- 9e1c880: Add `FairseqEncoderModel`
- b65c579: Add `FairseqTask.inference_step` to modularize generate.py (see the sketch after this list)
- 2ad1178: Add back `--curriculum`
- Misc bug fixes and other features
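The sketch below illustrates the idea behind `FairseqTask.inference_step`: generation moves behind a task-level hook so generate.py no longer calls the generator directly. The `ToyTask`/`ToyGenerator` classes and the argument names are illustrative assumptions, not fairseq code.

```python
class ToyGenerator:
    def generate(self, models, sample):
        # A real generator would run beam search over the models; here we just echo the input.
        return [{"tokens": sample["net_input"]}]


class ToyTask:
    def inference_step(self, generator, models, sample, prefix_tokens=None):
        # Tasks can override this hook to customize generation
        # (e.g., constrained decoding for a specific task).
        return generator.generate(models, sample)


hypos = ToyTask().inference_step(ToyGenerator(), models=[], sample={"net_input": [1, 2, 3]})
print(hypos[0]["tokens"])  # -> [1, 2, 3]
```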
v0.6.1
v0.6.0
Changelog:
- 4908863: Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0
- no more FP16Trainer, we just have an FP16Optimizer wrapper
- most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
- Trainer now requires an extra `dummy_batch` argument at initialization; when workers have an uneven number of batches, we run forward/backward on this dummy batch and hide its gradients by multiplying the loss by 0 (see the sketch after this list)
- Trainer.train_step now takes a list of samples, which enables cleaner handling of `--update-freq`
- 1c56b58: Parallelize preprocessing
- Misc bug fixes and features
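A minimal sketch of the dummy-batch trick mentioned above, assuming a plain PyTorch model and loss (this is not fairseq's Trainer): the loss is multiplied by 0 so the backward pass stays in sync across workers while contributing no gradient.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1)
criterion = nn.MSELoss()


def train_step(batch, is_dummy_batch):
    x, y = batch
    loss = criterion(model(x), y)
    if is_dummy_batch:
        # Keeps the graph (and any distributed all-reduce) alive, but zeroes the gradients.
        loss = loss * 0.0
    loss.backward()
    return loss


real = (torch.randn(4, 8), torch.randn(4, 1))
dummy = (torch.zeros(4, 8), torch.zeros(4, 1))
train_step(real, is_dummy_batch=False)
train_step(dummy, is_dummy_batch=True)  # gradients contributed by this call are all zero
```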
v0.5.0: 0.4.0 -> 0.5.0
Changelog:
- 97b58b4: Add Transformer model from Vaswani et al. (2017)
- b2374e5: Faster Transformer inference with improved caching
- 2d27ae0: Simulate large mini-batch training with delayed updates (`--update-freq`); see the sketch after this list
- 7ee1d28: Add FP16 training support (`--fp16`)
- 2a84f46: Faster inference by removing completed sentences from the batch
- 663fd80: Batched interactive generation
- 4c2ef2d: Add language modeling / gated convolutional model from Dauphin et al. (2017)
- b59815b: Add Hierarchical Neural Story Generation model from Fan et al. (2018)
- ff68a9e: Add FairseqTask to modularize task definitions (e.g., translation, language modeling)
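A minimal sketch of the delayed-update idea behind `--update-freq`, assuming plain PyTorch (not fairseq's implementation): gradients are accumulated over `update_freq` mini-batches before a single optimizer step, simulating a proportionally larger batch.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
update_freq = 4  # corresponds to e.g. --update-freq 4

optimizer.zero_grad()
batches = ((torch.randn(4, 8), torch.randn(4, 1)) for _ in range(16))
for i, (x, y) in enumerate(batches):
    loss = criterion(model(x), y) / update_freq  # average over the accumulated batches
    loss.backward()                              # gradients accumulate in .grad
    if (i + 1) % update_freq == 0:
        optimizer.step()                         # one parameter update per update_freq batches
        optimizer.zero_grad()
```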
Changelog: - 97b58b4: add Transformer model from Vaswani et al. (2017) - b2374e5: faster Transformer inference with improved caching - 2d27ae0: simulate large mini-batch training with delayed updates (`--update-freq`) - 7ee1d28: add FP16 training support (`--fp16`) - 2a84f46: faster inference by removing completed sentences from the batch - 663fd80: batched interactive generation - 4c2ef2d: add language modeling / gated convolutional model from Dauphin et al. (2017) - b59815b: add Hierarchical Neural Story Generation model from Fan et al. (2018) - ff68a9e: add FairseqTask to modularize task definitions (e.g., translation, language modeling)