Rebasing tpu branch on a more recent fairseq upstream commit (#19)
TPU specific changes [here](https://gist.github.com/taylanbil/150abd31b1fbf5c91ca90ef5a4d79f08)

The rest is rebasing on a more current fairseq upstream commit.

---


* v0.7.1 -> v0.7.2 (#891)

Summary:
No major API changes since the last release. Cutting a new release since we'll be merging significant (possibly breaking) changes to logging, data loading and the masked LM implementation soon.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/891

Differential Revision: D16377132

Pulled By: myleott

fbshipit-source-id: f1cb88e671ccd510e53334d0f449fe18585268c7

* Switch to torch.nn.functional.gelu when available

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/735

Differential Revision: D16377046

Pulled By: myleott

fbshipit-source-id: 9725d4a3ce6b2fc8cee0b1d1cb8921f9d59c551a

* Improve interactive generation (support --tokenizer and --bpe)

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/734

Differential Revision: D16377044

Pulled By: myleott

fbshipit-source-id: 37d5553d76aa7c653113fec089f59710281c31d7

* Store task in the criterion base class

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/737

Differential Revision: D16377805

Pulled By: myleott

fbshipit-source-id: 1e090a02ff4fbba8695173f57d3cc5b88ae98bbf

* Create standalone label_smoothed_nll_loss

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/739

Differential Revision: D16377798

Pulled By: myleott

fbshipit-source-id: 20047c80de2e6f108269ace4ae3eec906a5920dd

* Allow not specifying --warmup-init-lr

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/736

Differential Revision: D16378001

Pulled By: myleott

fbshipit-source-id: 2907f63bcbf7068ceaa48b00096040fa2639e569

* Rename _load_model_ensemble -> load_model_ensemble_and_task

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/738

Differential Revision: D16377803

Pulled By: myleott

fbshipit-source-id: 6beb2f78e7464b70ff65a965d2b747cdca0ca951

* Rename data.transforms -> data.encoders

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/747

Differential Revision: D16403464

Pulled By: myleott

fbshipit-source-id: ee3b4184f129a02be833c7bdc00685978b4de883

* Fix topp sampling issues (#882)

Summary:
Two issues here:

1. `last_included` should be the last included index `cumsum_mask[:, :, -1:]` instead of `cumsum_mask[:, :, :1]`  (which is either 0 or 1);

2. If `--no-repeat-ngram-size` is set, the sum of `probs` may be less than 1, so we need to re-normalize it to make it a valid probability distribution.

The following code reproduces these issues:

```
import torch
import numpy as np

def _sample_topp(probs):

    # =====  Code from  fairseq/search.py _sample_topp ======

    # sort the last dimension (vocab dimension) in descending order
    sorted_probs, sorted_indices = probs.sort(descending=True)

    # compute a mask to indicate the words to be included in the top-P set.
    cumsum_probs = sorted_probs.cumsum(dim=2)
    mask = cumsum_probs.lt(sampling_topp)

    # note that mask was computed by 'lt'. One more word needs to be included
    # so that the cumulative probability mass can exceed p.
    cumsum_mask = mask.cumsum(dim=2)
    last_included = cumsum_mask[:, :, :1]
    mask = mask.scatter_(2, last_included, 1)

    # truncate unnecessary dims.
    max_dim = last_included.max()
    truncated_mask = mask[:, :, :max_dim + 1]
    truncated_probs = sorted_probs[:, :, :max_dim + 1]
    truncated_indices = sorted_indices[:, :, :max_dim + 1]

    # trim the words that are not in top-P by setting their probabilities
    # to 0, so that they would not be sampled later.
    trim_mask = 1 - truncated_mask
    trimed_probs = truncated_probs.masked_fill_(trim_mask, 0)
    return trimed_probs, truncated_indices

    # ========================================================

if __name__ == '__main__':
    np.random.seed(1234)
    torch.manual_seed(1234)

    sampling_topp = 0.9
    probs = torch.softmax(torch.randn(1, 1, 10), dim=-1)
    # probs = tensor([0.0545, 0.0779, 0.0189, 0.0647, 0.0282, 0.0862, 0.0656, 0.1041, 0.0399, 0.4600])
    print('probs =', probs[0][0])

    trimed_probs, truncated_indices = _sample_topp(probs)

    cum_probs = trimed_probs.cumsum(dim=-1)[0][0]
    # cumsum = tensor([0.4600, 0.5641])
    print('cumsum =', cum_probs)
    # Will throw AssertionError
    assert float(cum_probs[-1]) >= sampling_topp

```
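For comparison, here is a minimal sketch of the intended behaviour once both issues are addressed. It uses plain Python lists rather than tensors so that it stays self-contained. The name `top_p_filter` is hypothetical and this is not the code in `fairseq/search.py`; it only illustrates renormalizing first and then keeping every word up to and including the one whose cumulative mass reaches `p`.

```python
def top_p_filter(probs, p):
    """Return (kept_indices, kept_probs) for the top-p (nucleus) set.

    Hypothetical helper for illustration only.
    """
    # Renormalize so the input is a valid distribution even if some
    # entries were zeroed out beforehand (e.g. by n-gram blocking).
    total = sum(probs)
    normalized = [x / total for x in probs]

    # Visit words from most to least probable.
    order = sorted(range(len(normalized)), key=lambda i: normalized[i], reverse=True)

    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += normalized[i]
        # Stop only after including the word that pushes the mass to p or beyond.
        if cumulative >= p:
            break
    return kept, [normalized[i] for i in kept]


# Same distribution as in the reproduction script above.
probs = [0.0545, 0.0779, 0.0189, 0.0647, 0.0282,
         0.0862, 0.0656, 0.1041, 0.0399, 0.4600]
kept_indices, kept_probs = top_p_filter(probs, 0.9)
print(kept_indices, sum(kept_probs))
```

With this loop the kept mass always reaches `p`, which is the property the assertion in the reproduction script checks.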
Pull Request resolved: https://github.com/pytorch/fairseq/pull/882

Differential Revision: D16409269

Pulled By: xingz9

fbshipit-source-id: 94b1122eed50c656057b64e22af6f4a6ea7a68af

* Default to mmap and infer dataset implementations automatically

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/751

Differential Revision: D16410989

Pulled By: myleott

fbshipit-source-id: ddbbee49756f9ff6c4487977a3f5d2259b7abafe

* Update GPT-2 BPE

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/749

Differential Revision: D16410984

Pulled By: myleott

fbshipit-source-id: 7698df46b8a179afccb287990f9705358690454a

* Misc improvements to torch hub interface

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/750

Differential Revision: D16410986

Pulled By: myleott

fbshipit-source-id: 8ee6b4371d6ae5b041b00a54a6039a422345795e

* Move Masked LM components to legacy/ -- new ones are coming

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/740

Differential Revision: D16377797

Pulled By: myleott

fbshipit-source-id: f7d6c8b00a77e279ea94376b1f0fcd15087eaf5f

* Add fallback for SLURM config

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/752

Differential Revision: D16417582

Pulled By: myleott

fbshipit-source-id: 6b4289febcf9290452bb91f1f2181a02c09c82a7

* Fix --reset-meters

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/756

Differential Revision: D16418302

Pulled By: myleott

fbshipit-source-id: 62495a0bff41d1741e2b09807a3b43ff2c66c8fb

* Simplify hubconf

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/758

Differential Revision: D16418932

Pulled By: myleott

fbshipit-source-id: 59f005164b61b9fa712922eeb23525f7eec38f38

* Add new Datasets

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/757

Differential Revision: D16418305

Pulled By: myleott

fbshipit-source-id: 25f293a2792509f7a75c688e4bf8cff02e6bba2e

* Add new Masked LM task + criterion

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/761

Differential Revision: D16421335

Pulled By: myleott

fbshipit-source-id: 257d92c2b90361147642e2baa38486b4d18f6297

* Implement sparse transformer fixed attention pattern (#804)

Summary:
Pull Request resolved: https://github.com/facebookresearch/pytext/pull/804

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/746

Pull Request resolved: https://github.com/pytorch/fairseq/pull/894

Adding an implementation of the sparse transformer to multi-head attention using the fixed attention pattern specified in https://arxiv.org/pdf/1904.10509.pdf. The sparse_mask masks out words using -inf; after softmax, -inf becomes 0. Thus, a mask does not need to be re-calculated and re-applied when multiplying attn_weights and values.

Four inputs are added to the config: sparse, is_bidirectional, stride, expressivity. If we are using the sparse transformer, is_bidirectional, stride, and expressivity must be specified (there are defaults). If is_bidirectional is False, the mask uses the fixed attention pattern described in the paper. If is_bidirectional is True, subset one includes all values in the current stride window and a summary from every stride window; all other values are masked. Stride (L in the paper) controls the window size and expressivity (c in the paper) controls the size of the summary.
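As a rough illustration of the unidirectional case, the sketch below builds an additive mask in plain Python. It assumes one reading of the paper's fixed pattern: a query may attend to earlier positions in its own stride window, plus the last `expressivity` positions of every window (the "summary"). The names `fixed_sparse_mask` and `masked_softmax` are hypothetical and this is not the implementation added by this commit.

```python
import math


def fixed_sparse_mask(seq_len, stride, expressivity):
    """Additive mask: 0.0 where query i may attend to key j, -inf elsewhere."""
    neg_inf = float("-inf")
    mask = [[neg_inf] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(i + 1):  # causal: only current and earlier positions
            same_window = (j // stride) == (i // stride)
            is_summary = (j % stride) >= stride - expressivity
            if same_window or is_summary:
                mask[i][j] = 0.0
    return mask


def masked_softmax(scores, mask_row):
    """Softmax over scores + mask; positions masked with -inf get weight 0."""
    shifted = [s + m for s, m in zip(scores, mask_row)]
    top = max(shifted)  # finite, because position i is always allowed
    exps = [math.exp(s - top) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


mask = fixed_sparse_mask(seq_len=8, stride=4, expressivity=1)
weights = masked_softmax([0.0] * 8, mask[5])
print(weights)
```

Because the masked positions already receive exactly zero weight after the softmax, multiplying the weights by the values needs no second masking step, which is the point made in the summary above.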

Reviewed By: borguz

Differential Revision: D16042988

fbshipit-source-id: c59166dc7cfe89187a256e4076000c2458842fd5

* Fix read_binarized.py script

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/762

Differential Revision: D16427266

Pulled By: myleott

fbshipit-source-id: 9bd9b8c6b4994ae98a62a37b34d03265bd365453

* Initializing mask as a tensor of ints (not long) (#875)

Summary:
Since mask really is a tensor of ints, this change should be mathematically
equivalent to the base.

On the other hand, this has performance implications for xla, hence the
pull request.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/875

Differential Revision: D16232877

Pulled By: myleott

fbshipit-source-id: e63175ee0016dcf0dfe10e2fd22570b8bbfbde84

* Update README.md

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/899

Differential Revision: D16448602

Pulled By: myleott

fbshipit-source-id: afd1a1b713274b6328150cd85d7f8a81833597aa

* check save_dir before beginning training

Summary: I sadly discovered that my checkpoint directory wasn't globally readable after 8 hours of training. Adding this check at the beginning of the train loop to keep that from happening again!
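A check of this kind could look like the sketch below. It assumes the goal is to fail fast if the directory cannot be created or written to; the name `check_save_dir` is hypothetical and the actual check in this commit may differ.

```python
import os
import tempfile


def check_save_dir(save_dir):
    """Raise OSError early if save_dir cannot be created or written to."""
    os.makedirs(save_dir, exist_ok=True)
    try:
        # Creating (and automatically deleting) a temporary file proves
        # that the process has write access to the directory.
        with tempfile.NamedTemporaryFile(dir=save_dir):
            pass
    except OSError as err:
        raise OSError(
            "Unable to write to checkpoint directory: {}".format(save_dir)
        ) from err


# Usage: call once before the first training step.
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "checkpoints")
    check_save_dir(target)
    created = os.path.isdir(target)
    leftovers = os.listdir(target)
print(created, leftovers)
```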

Reviewed By: myleott

Differential Revision: D16455394

fbshipit-source-id: 35959aa058150b2afb63710c468d01ebc8a12b0c

* Update torch.hub usage

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/770

Differential Revision: D16491911

Pulled By: myleott

fbshipit-source-id: 8dd2b76f8fa24183640ae9d1129ea47ded77d43d

* Standardize on 'teacher forcing' rather than 'input feeding' which is… (#769)

Summary:
Input feeding generally refers to a slightly different concept.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/769

Differential Revision: D16491898

Pulled By: myleott

fbshipit-source-id: 68573584e820f11f199db4e7e37e9ee7a69a3287

* Add RoBERTa README

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/778

Differential Revision: D16525447

Pulled By: myleott

fbshipit-source-id: e721e3a10e243a2408a04f89f06b5adbbe2fdff2

* Add return_all_hiddens flag to hub interface

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/909

Differential Revision: D16532919

Pulled By: myleott

fbshipit-source-id: 16ce884cf3d84579026e4406a75ba3c01a128dbd

* Fix compatibility with PyTorch 1.0.x (Fixes #906)

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/910

Differential Revision: D16536532

Pulled By: myleott

fbshipit-source-id: 56bb5570e70b5670ad87c64d9dd20c64c1fa9f5c

* Make hub_utils.generator inherit from nn.Module

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/913

Differential Revision: D16536562

Pulled By: myleott

fbshipit-source-id: ce28642da6868ec884e3e416388a652977a062df

* Misc dataset improvements

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/911

Differential Revision: D16536559

Pulled By: myleott

fbshipit-source-id: 7fe495054ce5b7658b1d3a43eca38c5858360236

* Correctly zero padding index in TransformerSentenceEncoder

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/912

Differential Revision: D16536561

Pulled By: myleott

fbshipit-source-id: 54c5c20a826a14f4e690770e027bcb282acdf911

* Add Adamax optimizer

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/914

Differential Revision: D16536670

Pulled By: myleott

fbshipit-source-id: 8a41c98f0fb87af6c384cdade756e3eae2978a88

* Change default --num-workers to 1

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/779

Differential Revision: D16536673

Pulled By: myleott

fbshipit-source-id: bf56e9a81d3086f3d95a3273391dc5e04ed2dbc4

* Update BPE library code

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/780

Differential Revision: D16537567

Pulled By: myleott

fbshipit-source-id: 4e18c529959935e82ea122c3a2ee477308ffcbe3

* Add RoBERTa

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/916

Differential Revision: D16537774

Pulled By: myleott

fbshipit-source-id: 86bb7b1913a428ee4a21674cc3fc7b39264067ec

* Add instructions to load RoBERTa models on PyTorch 1.0

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/921

Differential Revision: D16541025

Pulled By: myleott

fbshipit-source-id: bb78d30fe285da2adfc7c4e5897ee01fa413b2e4

* Fix RoBERTa model import (fixes #918)

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/920

Differential Revision: D16540932

Pulled By: myleott

fbshipit-source-id: b64438ad8651ecc8fe8904c5f69fa6111b4bed64

* Add missing files for RoBERTa hub interface

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/923

Differential Revision: D16541289

Pulled By: myleott

fbshipit-source-id: b3563a9d61507d4864ac6ecf0648672eaa40b5f3

* Update README.md to add top-p sampling (#783)

Summary:
Update README.md to include the recently implemented top-p/nucleus sampling.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/783

Differential Revision: D16543974

Pulled By: myleott

fbshipit-source-id: 27c502af10ee390d29607038118a99ff0067aec4

* Support different --max-positions and --tokens-per-sample

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/924

Differential Revision: D16548165

Pulled By: myleott

fbshipit-source-id: 49569ece3e54fad7b4f0dfb201ac99123bfdd4f2

* adding glue data preprocessing scripts (#771)

Summary:
1) Added glue data pre-processing script.
2) updated README with usage.

TODO:
1) releasing fairseq dictionary and remove hardcoded path.
2) remove hard-coded path for bpe-encoding,

myleott what do you recommend for above TODOs?
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/771

Reviewed By: myleott

Differential Revision: D16547679

Pulled By: myleott

fbshipit-source-id: 6a6562d9b6215523d048fdf3daee63ffac21e231

* Fix tokenization (fixes #926) (#929)

Summary:
Fixes https://github.com/pytorch/fairseq/issues/926
Pull Request resolved: https://github.com/pytorch/fairseq/pull/929

Differential Revision: D16560281

Pulled By: myleott

fbshipit-source-id: 751051bcdbf25207315bb05f5bee0235d21be627

* Relicense fairseq under MIT license (#786)

Summary:
The previous BSD+PATENTS license was controversial. We have been
approved to relicense fairseq under the MIT license.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/786

Differential Revision: D16560654

Pulled By: myleott

fbshipit-source-id: f78b1beb4f2895dd7b9bfc79f5f952a2bfb94034

* 1) replaced fstring 2) fixed error from max-positions arg

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/787

Differential Revision: D16562052

fbshipit-source-id: 640e30b2378ec917d60092558d3088a77f9741cb

* Add roberta.decode to hub interface to decode BPE (#931)

Summary:
Fixes https://github.com/pytorch/fairseq/issues/930.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/931

Differential Revision: D16562511

Pulled By: myleott

fbshipit-source-id: c4c07e2f067326b79daa547dcb3db84aeddbd555

* Wmt19 models (#767)

Summary:
Release of the WMT 19 pretrained models
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/767

Reviewed By: edunov

Differential Revision: D16472717

Pulled By: nng555

fbshipit-source-id: acf0fa3548c33f2bf2b5f71e551c782ad8c31a42

* Use commandline interface in preprocess_GLUE_tasks.sh (#937)

Summary:
Just a small fix for issue https://github.com/pytorch/fairseq/issues/936 .
Pull Request resolved: https://github.com/pytorch/fairseq/pull/937

Differential Revision: D16580263

Pulled By: myleott

fbshipit-source-id: 1777e782491c63697726e95bd555892da3fed4ec

* Update language_model README.md (#941)

Summary:
Adding a backslash in the convolutional language model training usage.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/941

Differential Revision: D16581388

Pulled By: myleott

fbshipit-source-id: 7e2e05ecf13e86cb844dc5200d49f560c63b12ff

* Roberta add classification finetuning example readme (#790)

Summary:
Added readme for IMDB classification as a tutorial for custom finetuning of RoBERTa
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/790

Reviewed By: myleott

Differential Revision: D16587877

Pulled By: myleott

fbshipit-source-id: ed265b7254e6fa2fc8a899ba04c0d2bb45a7f5c4

* Fix citation errors (#791)

Summary:
Fixing booktitle in wmt19 citation
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/791

Reviewed By: myleott

Differential Revision: D16589372

Pulled By: nng555

fbshipit-source-id: 28402784bb6ef0615e46b8d8383bfa52d79e46de

* Fix small syntax error in hub_utils.py (fixes #942)

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/944

Differential Revision: D16593568

Pulled By: myleott

fbshipit-source-id: 611bccae2ad0b8dc704c47a8a3343161010c2356

* Update PyTorch Hub interface

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/782

Differential Revision: D16542256

Pulled By: myleott

fbshipit-source-id: ea3279e7a1ce4687a5914f32b76787c419be1ffa

* Fix sampling with beam>1

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/792

Differential Revision: D16591987

Pulled By: myleott

fbshipit-source-id: d27c490ae75f80ded19226b8384f4776485dd694

* Changed tensor comparison return type from uint8 to bool (#21113)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21113
ghimport-source-id: 9c4ba63457a72bfc41894387e0b01be3fd9a9baf

Test Plan: Imported from OSS

Differential Revision: D15552204

Pulled By: izdeby

fbshipit-source-id: a608213668649d058e22b510d7755cb99e7d0037

* Add more details for bulk BPE encoding

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/793

Differential Revision: D16603930

Pulled By: myleott

fbshipit-source-id: b302db3743db4f36c14fb0dc7f3456fe8a0079dd

* Use ==/!= to compare str, bytes, and int literals (#948)

Summary:
Identity is not the same thing as equality in Python.
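A small illustration of the difference, using a string assembled at runtime. The helper `build_flag` is hypothetical. Whether `is` returns `True` for two equal strings depends on interpreter details such as interning, so only the `==` result should be relied on.

```python
def build_flag(parts):
    """Assemble a string at runtime, so it need not be the same object as a literal."""
    return "".join(parts)


expected = "valid"
actual = build_flag(["va", "lid"])

# Equality compares contents and is what the code usually means.
equal_by_value = actual == expected

# Identity compares object addresses; the result is an implementation
# detail for strings, bytes, and ints and must not be used for comparison.
same_object = actual is expected

print(equal_by_value, same_object)
```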
Pull Request resolved: https://github.com/pytorch/fairseq/pull/948

Differential Revision: D16608269

Pulled By: myleott

fbshipit-source-id: be203d62e7824c96c59400d1b342196adb89a839

* Fix wmt19 links (#796)

Summary:
fix links to .tar.gz vs .tar.bz2
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/796

Reviewed By: myleott

Differential Revision: D16611740

Pulled By: nng555

fbshipit-source-id: 76210484225ed917ff14ef626845680d918948f5

* Update beam search code to support torch.bool change

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/797

Differential Revision: D16617067

Pulled By: myleott

fbshipit-source-id: 52e3aeb98d6e3b55ff9154b784028bf13eabfe38

* Update READMEs for torch.hub

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/795

Differential Revision: D16620488

Pulled By: myleott

fbshipit-source-id: 1998a9ccd8816fc7f590861fb4898f910a36bc1e

* Add single-models for WMT'19 for hub tutorial

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/800

Differential Revision: D16621509

Pulled By: myleott

fbshipit-source-id: d3e8e97d30bcafbc35c3f67cd8bbc657b6fa5fe7

* Fewer torch.hub requirements (#959)

Summary:
We will raise exceptions if these are needed and aren't available. Only keep minimum set of reqs
Pull Request resolved: https://github.com/pytorch/fairseq/pull/959

Differential Revision: D16623304

Pulled By: myleott

fbshipit-source-id: 8e65253742e393b527e8396a9433e64ebec9bb55

* Avoid cast in PositionalEmbeddings to fix BLEU drop in pytorch native export

Summary:
Tracing mode doesn't generalize correctly in positional embedding calculation, which caused -5 BLEU at transformer export when using pytorch native.

Details: The original issue was that in ensemble_export, _to_tensor(x) in scripting mode turns integer x into 1-d tensor torch.tensor([x]), not 0-d tensor (scalar x) which is expected in the embedding. So the return value in embedding forward() is actually of wrong shape. When self.weights is of size [x,y], the return value should be (bsz, y, 1) but it was (bsz, 1, y), which caused problems in downstream computation. Tracing only became an issue when I used pos = timestep.view(-1)[0] to fix the shape. Casting the scalar to a primitive int, to be used as an index, does not generalize in tracing mode. Thus I need to convert everything to tensors and replace the advanced indexing with the index_select operator.

In summary, less-understood features on both the scripting and tracing sides caused the BLEU drop. :)

Reviewed By: myleott

Differential Revision: D16623025

fbshipit-source-id: 0c7a2c3eafbd774760a5c880c6034009ee084abb

* Fix generating with a fixed prefix

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/801

Differential Revision: D16628318

Pulled By: myleott

fbshipit-source-id: 50e93bb9108afd2ba90f1edd4f34306a7c9964a4

* remove default params from args so architecture works properly

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/798

Reviewed By: myleott

Differential Revision: D16619502

Pulled By: alexeib

fbshipit-source-id: af20c90c4522458850d8f42cab001259ef4293cc

* Add doc string for Roberta.encode function

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/969

Differential Revision: D16642388

Pulled By: myleott

fbshipit-source-id: c5b1655dbddb697822feefa433f33f6bb08253ab

* fixed roberta finetuning with --find-unused-parameters on multiGPU

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/806

Differential Revision: D16649933

fbshipit-source-id: 6eeda6e2caf8019228e3efc0c27ddfcc3c4d8674

* Add back set_epoch functionality lost in RoBERTa merge

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/982

Differential Revision: D16668353

Pulled By: myleott

fbshipit-source-id: 699243d6c028c47cd0e3f801d89051b3f919b17e

* Add code to realign RoBERTa features to word-level tokenizers

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/805

Differential Revision: D16670825

Pulled By: myleott

fbshipit-source-id: 872a1a0274681a34d54bda00bfcfcda2e94144c6

* Fix tests and GLUE finetuning (fixes #989)

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/991

Differential Revision: D16687970

Pulled By: myleott

fbshipit-source-id: d877fc16891a8ab97aec47a8d440baa56c2b5f46

* Added mask_fill api and some examples in README (#807)

Summary:
1) This currently works only for a single `<mask>` token; for multiple masks, we might have to look more into the order of factorization.
2) This is currently only for a single BPE token
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/807

Differential Revision: D16674509

fbshipit-source-id: 0a020030ee5df6a5115e5f85d5a9ef52b1ad9e1c

* fixed reloading from checkpoint (#811)

Summary:
Tested by starting training from (a) `roberta.large`, (b) `roberta.large.mnli`, (c) `checkpoints/checkpoint_last.pt`
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/811

Reviewed By: myleott

Differential Revision: D16689528

Pulled By: myleott

fbshipit-source-id: 849d72ede9d526c34b4753c1bffd689554d1f837

* Asr initial push (#810)

Summary:
Initial code for speech recognition task.
Right now only one ASR model added - https://arxiv.org/abs/1904.11660

unit test testing:
python -m unittest discover tests

Also ran model training with this code and obtained
5.0 test_clean | 13.4 test_other
on librispeech with pytorch/audio features
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/810

Reviewed By: cpuhrsch

Differential Revision: D16706659

Pulled By: okhonko

fbshipit-source-id: 89a5f9883e50bc0e548234287aa0ea73f7402514

* Integrate with Apache Arrow/Plasma in-memory store for large datasets (#995)

Summary:
Datasets with many examples can generate very large indexes in TokenBlockDataset (and possibly elsewhere). When using `--num-workers>0` these indexes are pickled and transferred via a multiprocessing pipe, which is slow and can fail if the index grows beyond 4GB (~0.5B examples). Apache Arrow has an in-memory store called Plasma that will offload these arrays to shared memory, which both reduces duplication of the data and avoids needing to pickle.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/995

Differential Revision: D16697219

Pulled By: myleott

fbshipit-source-id: 1b679ee5b3d2726af54ff418f6159a3671173fb8

* replace 'mkdir' with 'mkdir -p' (#997)

Summary:
Allow shell script to create sub directories with -p flag. Amends readme file too.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/997

Differential Revision: D16710813

Pulled By: myleott

fbshipit-source-id: 89abefa27e8fac99d212fc9b7b0dbc3690043ba0

* added superglue dev set results to readme

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/815

Differential Revision: D16733633

fbshipit-source-id: 0a5029e41b6dbb9fb28e9703ad057d939d489d90

* MacOS requires c++ flag (#1000)

Summary:
To install on MacOS, `-stdlib=libc++` needs to be specified.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1000

Differential Revision: D16733819

Pulled By: myleott

fbshipit-source-id: 7a1ed11e2b4e1071e61c64c379c84f72e02ad2b5

* added sentence ranking task and loss (#809)

Summary:
This task and loss are used for sentence ranking and multiple choice tasks such as RACE
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/809

Reviewed By: myleott

Differential Revision: D16715745

Pulled By: jingfeidu

fbshipit-source-id: cb4d1c7b26ebb3e2382449ba51af5745ef56f30f

* Fix Python 3.5 compat

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1005

Differential Revision: D16751489

Pulled By: myleott

fbshipit-source-id: 6e372ac23643e32a3791044c13f4466bdc28f049

* Add WSC task and criterion

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1004

Differential Revision: D16751443

Pulled By: myleott

fbshipit-source-id: f70acd6c7be6d69da45b5b32fe4c4eff021539ab

* Fix torch.hub for MNLI

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1006

Differential Revision: D16753078

Pulled By: myleott

fbshipit-source-id: 970055632edffcce4e75931ed93b42a249120a4a

* Update --restore-file logic (partially fixes #999)

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1007

Differential Revision: D16762490

Pulled By: myleott

fbshipit-source-id: d67137bcf581887850323d188bb4ea643a35ac9e

* Remove LAMB optimizer (at least until we can test it more)

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1008

Differential Revision: D16763315

Pulled By: myleott

fbshipit-source-id: d4bad8384eec273f2d5de4ed29fb8d158ab9187c

* Lint

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/817

Differential Revision: D16762905

Pulled By: myleott

fbshipit-source-id: d920595bec44ed26b72dfc6fbc15c0aa107b4e56

* Minor fixes for RACE finetuning (#818)

Summary:
- remove unnecessary extra spaces in RACE data in preprocessing
- fix finetuning instructions (add `--truncate-sequence` and add `--dropout` params)
- close file handle in SentenceRankingTask
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/818

Differential Revision: D16770055

Pulled By: myleott

fbshipit-source-id: 2c80084e92cdf8692f2ea7e43f7c344c402b9e61

* ignore files starting with . e.g. .ipynb_checkpoints (#819)

Summary:
A .ipynb_checkpoints folder inside a models folder crashed importlib;
there is now a check for this.
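The kind of filter described here can be sketched as follows. This assumes the importer walks a directory and imports every `.py` file or sub-package it finds; the function name `importable_module_names` is hypothetical and is not taken from fairseq.

```python
import os
import tempfile


def importable_module_names(directory):
    """List module names in directory, skipping private and hidden entries."""
    names = []
    for entry in sorted(os.listdir(directory)):
        # Skip "_private" files and hidden entries such as .ipynb_checkpoints.
        if entry.startswith("_") or entry.startswith("."):
            continue
        path = os.path.join(directory, entry)
        if entry.endswith(".py"):
            names.append(entry[: -len(".py")])
        elif os.path.isdir(path):
            names.append(entry)
    return names


# Usage on a throwaway directory that mimics a models folder.
with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "my_model.py"), "w").close()
    open(os.path.join(root, "__init__.py"), "w").close()
    os.mkdir(os.path.join(root, ".ipynb_checkpoints"))
    os.mkdir(os.path.join(root, "my_package"))
    found = importable_module_names(root)
print(found)
```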
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/819

Differential Revision: D16772192

Pulled By: myleott

fbshipit-source-id: 01c956aef4ed312bc7645c31c83dbf98af89d931

* fix cosine scheduler docstring

Summary: as title

Reviewed By: myleott

Differential Revision: D16773845

fbshipit-source-id: 2d10e197c31f94d894430559327289a4d03e33f7

* added readme code for inference with GLUE finetuned model

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/820

Differential Revision: D16783469

fbshipit-source-id: d5af8ba6a6685608d67b72d584952b8e43eabf9f

* Add Commonsense QA task

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1014

Differential Revision: D16784120

Pulled By: myleott

fbshipit-source-id: 946c0e33b594f8378e4ab6482ce49efcb36e1743

* Add fairseq-validate

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/765

Differential Revision: D16763357

Pulled By: myleott

fbshipit-source-id: 758b03158e486ee82786e2d5bf4e46073b50c503

* Updates for PyTorch 1.2 masking/bool behavior

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/821

Differential Revision: D16790120

Pulled By: myleott

fbshipit-source-id: 2fb5070172636561d08596a29f08c93df07548bf

* Fix tests

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/822

Differential Revision: D16800078

Pulled By: myleott

fbshipit-source-id: b86e08e01f2fe13c64b77f1d23a5f6800f252bf7

* v0.7.2 -> v0.8.0 (#1017)

Summary:
Changelog:
- Relicensed under MIT license
- Add RoBERTa
- Add wav2vec
- Add WMT'19 models
- Add initial ASR code
- Changed torch.hub interface (`generate` renamed to `translate`)
- Add `--tokenizer` and `--bpe`
- `f812e52`: Renamed data.transforms -> data.encoders
- `654affc`: New Dataset API (optional)
- `47fd985`: Deprecate old Masked LM components
- `5f78106`: Set mmap as default dataset format and infer format automatically
- Misc fixes for sampling
- Misc fixes to support PyTorch 1.2
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1017

Differential Revision: D16799880

Pulled By: myleott

fbshipit-source-id: 45ad8bc531724a53063cbc24ca1c93f715cdc5a7

* Update READMEs

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/823

Differential Revision: D16804995

Pulled By: myleott

fbshipit-source-id: abac5dc0ed6b7bfe2309ba273456e54b37340b2c

* initial light and dynamic convolution kernels (#547)

Summary:
CUDA code for light/dynamicconv kernels, including pytorch modules. Modules can be built by running setup.py in each respective folder, and can then be imported and used like any other module.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/547

Reviewed By: myleott, shubho

Differential Revision: D15703660

Pulled By: nng555

fbshipit-source-id: e9c913753be3a1cd571965f7200df6678b644520

* added efficient wsc task/criterion for winogrande (#825)

Summary:
1) So far getting `78%` on winogrande validation dataset compared to `63.5%` in the paper.
2) Will upgrade readme once everything is finalized.

Questions:

1) Should I just call `binary_wsc_task` instead of `winogrande` to be less specific to dataset and be generic?
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/825

Differential Revision: D16810159

fbshipit-source-id: cfde73561fa4caaaa63a4773c0aecd12ce1fa518

* Update README

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/826

Differential Revision: D16830402

Pulled By: myleott

fbshipit-source-id: 25afaa6d9de7b51cc884e3f417c8e6b349f5a7bc

* Backward reranking public (#667)

Summary:
Implementation of noisy channel model reranking for release with paper
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/667

Reviewed By: michaelauli

Differential Revision: D15901665

Pulled By: nng555

fbshipit-source-id: 2de2c518be8e5828ffad72db3e741b0940623373

* Update README

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/827

Differential Revision: D16833252

Pulled By: myleott

fbshipit-source-id: 8eded8cc651002dfd60869fc2383d305ed335d3a

* BMUF Resetting local state param

Summary:
BMUF
1) Resetting BMUF parameters after warmup.
2) Resetting local param state after warmup.
3) Allowing user to pass block momentum value instead of gpu derived Block Momentum.

Reviewed By: skritika, mrshenli

Differential Revision: D16692026

fbshipit-source-id: d02eaf29d0e4b37007418166ec937d4bf5fe6aca

* added hf bert bpe

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/829

Differential Revision: D16856693

fbshipit-source-id: 545bbf4815f5c40e72a6ed241312a51dc90e34a1

* added check in token block dataset for multiple consecutive blank lines

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/830

Differential Revision: D16861799

fbshipit-source-id: d85deaf78ec5b9c23eafd4145a96252e3901fa22

* implement tri-stage lr_scheduler (#1028)

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1028

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/831

The tri-stage lr-scheduler consists of 3 stages: 1. warmup; 2. hold; 3.
(exponential) decay. It is used in https://arxiv.org/pdf/1904.08779.pdf
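
The three stages can be sketched as a plain function of the update step. This is a minimal illustration, not the scheduler added by this PR: the linear shape of the warmup, the parameter names, and the decay-to-`final_lr` parameterization are assumptions made here.

```python
import math


def tri_stage_lr(step, peak_lr, init_lr, final_lr,
                 warmup_steps, hold_steps, decay_steps):
    """Learning rate at `step` for a warmup / hold / exponential-decay schedule.

    All parameter names are hypothetical; the warmup is assumed to be linear.
    """
    if step < warmup_steps:
        # Stage 1: ramp linearly from init_lr up to peak_lr.
        return init_lr + (peak_lr - init_lr) * step / warmup_steps
    step -= warmup_steps
    if step < hold_steps:
        # Stage 2: hold the peak learning rate.
        return peak_lr
    step -= hold_steps
    if step < decay_steps:
        # Stage 3: decay exponentially from peak_lr towards final_lr.
        return peak_lr * math.exp(math.log(final_lr / peak_lr) * step / decay_steps)
    # After the decay stage, stay at final_lr.
    return final_lr
```

With `warmup_steps=10`, `hold_steps=5` and `decay_steps=10`, the rate rises for the first 10 updates, stays at `peak_lr` for 5, and reaches `final_lr` at update 25.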

Reviewed By: myleott

Differential Revision: D16806206

fbshipit-source-id: 40e472ec382449a0fb711f8ee980f14d27d2114a

* Fix bug (the returned value has a dimension mismatch) in label-smoothed-cross-entropy for MoE (#1037)

Summary:
MoE will encounter a dimension mismatch bug when using label-smoothed cross entropy as the criterion, which occurs at https://github.com/pytorch/fairseq/blob/master/fairseq/tasks/translation_moe.py#L125. This is a fix for that bug.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1037

Differential Revision: D16892674

Pulled By: myleott

fbshipit-source-id: a73bc03d2280356667d02422d22ad11d968d0c65

* remove shlex.quote in scripts/spm_train.py (#972)

Summary:
to resolve the issue https://github.com/pytorch/fairseq/issues/971
Pull Request resolved: https://github.com/pytorch/fairseq/pull/972

Differential Revision: D16892827

Pulled By: myleott

fbshipit-source-id: baf277961f1e292f4593eefe31e3541aa9d0d8c4

* add constraints when checking multiple consecutive blank lines (#1031)

Summary:
The check causes a runtime error on some standard datasets (e.g. wikitext-103).

Details:
After preprocessing wikitext-103 with the current master branch, running fairseq-train gives the following error:
```bash
Traceback (most recent call last):
  File "/home/trinkle/.local/bin/fairseq-train", line 11, in <module>
    load_entry_point('fairseq', 'console_scripts', 'fairseq-train')()
  File "/data/git/Transformer/fairseq/fairseq_cli/train.py", line 321, in cli_main
    main(args)
  File "/data/git/Transformer/fairseq/fairseq_cli/train.py", line 46, in main
    task.load_dataset(valid_sub_split, combine=False, epoch=0)
  File "/data/git/Transformer/fairseq/fairseq/tasks/language_modeling.py", line 167, in load_dataset
    break_mode=self.args.sample_break_mode, include_targets=True,
  File "/data/git/Transformer/fairseq/fairseq/data/token_block_dataset.py", line 54, in init
    "Found multiple blank lines in the dataset, please remove them"
AssertionError: Found multiple blank lines in the dataset, please remove them (eg. cat -s raw.txt) and preprocess the data again.
```

This happens because these datasets contain multiple consecutive blank lines. The assertion was added in https://github.com/pytorch/fairseq/commit/851c022610b27da3beaa4e40a6834b5fb3b44f44; however, a hard assertion is not a good way to handle this.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1031
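
For reference, the `cat -s` workaround suggested by the assertion message can be approximated in Python. This is only a sketch of that preprocessing step, not code from this PR; the function name is hypothetical, and unlike `cat -s` it also treats whitespace-only lines as blank.

```python
def squeeze_blank_lines(lines):
    """Collapse each run of consecutive blank lines into a single blank line."""
    out = []
    prev_blank = False
    for line in lines:
        is_blank = line.strip() == ""
        if is_blank and prev_blank:
            # Skip the second and later blank lines of a run.
            continue
        out.append(line)
        prev_blank = is_blank
    return out
```

Applied to the raw text before preprocessing, this leaves document boundaries (single blank lines) intact while removing the repeated blank lines that trip the assertion.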

Differential Revision: D16892942

Pulled By: myleott

fbshipit-source-id: 90c41b7d98a7b78f506bb57320f9f6b901e05d5b

* Add instructions to resume training from released RoBERTa models (fixes #1034)

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1041

Differential Revision: D16904073

Pulled By: myleott

fbshipit-source-id: 22e5e25a15f7a0b6f2d827d98c953a6cec07610e

* Small fixes

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/835

Differential Revision: D16904038

Pulled By: myleott

fbshipit-source-id: 2c9d0b913f8d688297ac80fcabd905bd1397f66a

* Back out "[fairseq][PR] Fix bug (the returned value has a dimension mismatch) in label-smoothed-cross-entropy for MoE" (#837)

Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/837

Original commit changeset: a73bc03d2280

Differential Revision: D16904372

fbshipit-source-id: b4c4047b2686ba47258cdf0783059726134c920a

* Fix method has same name as property

Summary:
Training is failing sometimes because `self.collater` can be both method and property for AsrDataset
https://github.com/pytorch/fairseq/issues/1036

Reviewed By: jcai1

Differential Revision: D16919945

fbshipit-source-id: b34ba54e4dae315b7c723996610a348a8e3031af

* Give path when checkpoint can't be found (#1040)

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1040

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/836

Reviewed By: myleott, liezl200

Differential Revision: D16889252

fbshipit-source-id: 45a1b6c1217fb099f0350096e38e1c7d83ea0a64

* vggblock support without pooling and pooling_kernel_size missing self (#839)

Summary:
1) VggBlock was not supported if pooling kernel size was None.
2) Since we modify the pooling kernel size using `_pair`, we should use `self.pooling_kernel_size`. That said, it doesn't matter much, as PyTorch is robust to this.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/839

Differential Revision: D16934112

Pulled By: okhonko

fbshipit-source-id: b6b95163b0e7f7203d76d535f01a41912382bdc3

* Multiset (#838)

Summary:
Adds ability to tag individual examples with the names of their datasets, along with some minor miscellaneous fixes and improvements
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/838

Differential Revision: D16919175

Pulled By: alexeib

fbshipit-source-id: 4bf493299645bae63f3ee6382e15f18a9f73666c

* Parameterized criterions (#808)

Summary:
Support criterion with parameters, such as AutoSegmentationCriterion (ASG) used in wav2letter which has a transition matrix parameter. This is needed to integrate wav2letter's ASG into PySpeech.

With this diff, parameters in criterions will be:
(1) updated by optimizers, with a configurable learning rate
(2) saved and loaded from checkpoints, preserving backward compatibility for criterions without parameters
(3) synchronized across nodes in distributed training.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/808

Reviewed By: jcai1

Differential Revision: D16934097

Pulled By: okhonko

fbshipit-source-id: 121ec9382459385c6f9cbef3a8274bec1a434038

* fix string format to work in python 3.5 (#1050)

Summary:
Change the string format in fairseq/data/subsample_dataset.py#20
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1050
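
As background, f-strings were only introduced in Python 3.6, so a message built with one is a syntax error on 3.5, while `str.format` works on both. The snippet below is a hypothetical illustration of that kind of change; the function name and message text are made up here and are not taken from `subsample_dataset.py`.

```python
def describe_subsample(total_size, actual_size, ratio):
    """Build a log message using str.format, which also works on Python 3.5."""
    # An f-string such as f"subsampled {total_size} items ..." would need 3.6+.
    return "subsampled {} items to {} (ratio={})".format(
        total_size, actual_size, ratio
    )
```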

Differential Revision: D16946060

Pulled By: okhonko

fbshipit-source-id: 0eabf22e7ffd4f658b6d18c87dc6e59c81a355c7

* Misc changes

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/840

Differential Revision: D16947645

Pulled By: myleott

fbshipit-source-id: e869789bc22bbf5cb08d9adfa44f9fc09b3805af

* Add links to cuda models (#828)

Summary:
Add links to pre-trained cuda models in pay less attention
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/828

Reviewed By: michaelauli

Differential Revision: D16833577

Pulled By: nng555

fbshipit-source-id: 1556aa77fd87ea259812de8ef65963257c370f9b

* Fix year in noisy channel citation (#842)

Summary:
2018->2019
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/842

Differential Revision: D16973530

Pulled By: nng555

fbshipit-source-id: 00207b79821ac0257a53a0581a84582130e1bff5

* wav2vec everstore support

Summary: changes for internal support

Differential Revision: D16646887

fbshipit-source-id: ac5bf6c32901819726249422324eae32a0a6e148

* Cythonize token block dataset (#834)

Summary:
Cythonized the token block dataset code; it's `> 100x` faster. Token blocking for the entire `bookwiki+CC+stories+openweb` takes just ~`39.9` seconds.

TODO:
1) I think I can make it 2x faster.
2) Cleanup.

EDIT History:
~~First pass at parallelizing `token_block_dataset`. The code feels somewhat complicated and cluttered.
This is 2-3x faster though on my tests on `bookwiki` dataset with both `complete` and `complete_doc` modes.
myleott Can you take a look for correctness as I am still not 100% sure that I am not missing corner cases.~~
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/834

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Test workflow: f133816198

Reviewed By: myleott

Differential Revision: D16970257

Pulled By: myleott

fbshipit-source-id: ec45a308193c9e9f3e7075336c15df4723228d6f

* Suppress leaked semaphore warnings

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/844

Differential Revision: D16985131

Pulled By: myleott

fbshipit-source-id: 66ba3b9aa0cdf329a1e38fc09786f34906afdb43

* fix cython dependency in the setup (#847)

Summary:
Fixes broken build for `pytext` https://github.com/pytorch/fairseq/commit/4fc39538aec5141aa41f5d6d7dc0097e7c0f7b48

Earlier versions of setuptools required `cython` to be installed before even starting setup.py. This change fixes that.
More details: https://github.com/pypa/setuptools/blob/master/CHANGES.rst#180
and https://stackoverflow.com/questions/37471313/setup-requires-with-cython
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/847

Differential Revision: D16997450

fbshipit-source-id: 5f65026c228a1b94280ca73937078ee3e21ce4f8

* wav2vec everstore support fix

Summary: fixes some merge issues that prevented wav2vec from training properly

Reviewed By: myleott

Differential Revision: D16981120

fbshipit-source-id: cad39aaf2f44daabcbafe7b4e8735d055b3842a7

* installing numpy headers for cython

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/848

Differential Revision: D17060283

fbshipit-source-id: c7e61cae76a0566cc3e2ddc3ab4d48f8dec9d777

* Minor update of README.md of language model example (#1063)

Summary:
With this white space, the command might fail.
```
fairseq-preprocess: error: unrecognized arguments:
zsh: command not found: --destdir
```
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1063

Differential Revision: D17072516

Pulled By: myleott

fbshipit-source-id: 68bb9d05b40b215b18aceac2bff3f5ec1ef2f537

* Minor cleanup for setup.py

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1078

Differential Revision: D17072514

Pulled By: myleott

fbshipit-source-id: 69a8c8c9cc7caa7e04c414329a5d79e6e1a6621c

* use numpy function for filter by size when possible (#845)

Summary:
For the general masked language modeling use case, this is much faster (`3 minutes vs 1 sec`).

Let me know what you think about it myleott. If you don't like all the special-case checking, we can think of reorganizing the dataset APIs to always have `sizes` as a property calculated in `__init__`.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/845

Reviewed By: myleott

Differential Revision: D16993769

Pulled By: myleott

fbshipit-source-id: 161bba62af2965190c07c47e838ee967cb886e88

* Fix multi-gpu training (fixes #1088)

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1089

Differential Revision: D17108918

Pulled By: myleott

fbshipit-source-id: 818c77a5bbf3b146028991aca64d79b93f144b28

* Adopt Contributor Covenant

Summary:
In order to foster healthy open source communities, we're adopting the
[Contributor Covenant](https://www.contributor-covenant.org/). It has been
built by open source community members and represents a shared understanding of
what is expected from a healthy community.

Reviewed By: josephsavona, danobi, rdzhabarov

Differential Revision: D17104640

fbshipit-source-id: d210000de686c5f0d97d602b50472d5869bc6a49

* set numpy seed explicitly + other minor fixes (#850)

Summary:
Not setting the numpy seed explicitly at the beginning was an extremely annoying bug to find. It caused different GPUs to have a different view of the data if some randomization was used in the dataset (e.g. subsample dataset).
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/850

Differential Revision: D17085006

Pulled By: alexeib

fbshipit-source-id: 62bb2116369fb703df878e6bc24c06f1ea4e75a0

* add missing colorize dataset

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/851

Differential Revision: D17145769

Pulled By: alexeib

fbshipit-source-id: 9dd26799d044ae5386e8204a129b5e3fc66d6e85

* Improve support for `python setup.py build_ext --inplace`

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/852

Differential Revision: D17147452

Pulled By: myleott

fbshipit-source-id: 5fd9c7da3cc019c7beec98d41db1aef1329ee57a

* Cleaner handling of numpy-based extensions in setup.py

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/853

Differential Revision: D17147879

Pulled By: myleott

fbshipit-source-id: b1f5e838533de62ade52fa82112ea5308734c70f

* fixed numpy based size filtering (#854)

Summary:
This bug got introduced in my [commit](https://github.com/fairinternal/fairseq-py/commit/9624f9651478bcb88022decf7e1b0685b410133b) for fast numpy based size filtering.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/854

Differential Revision: D17150350

fbshipit-source-id: cb564119543e116d6a17784d1c22e9bce7059a0c

* Fix an error in the command about Hierarchical Neural Story Generation (#1099)

Summary:
When I tried to reproduce the experiment in _Hierarchical Neural Story Generation_, I found that the generation command cannot be executed.

It fails with **fairseq-generate: error: unrecognized arguments: --sampling-temperature 0.8**
In the documentation, I find:
```
--temperature   temperature for generation
Default: 1.0
```
I can't find a parameter named `--sampling-temperature`, so I think `--sampling-temperature` should be changed to `--temperature`.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1099

Differential Revision: D17163065

Pulled By: myleott

fbshipit-source-id: 25c430eeee4703f8ec30353825ffec4bb973da0d

* added cython to install_requires

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/856

Reviewed By: myleott

Differential Revision: D17162411

Pulled By: myleott

fbshipit-source-id: e70ecc802398bbba2b5326e9700f2121c422fd18

* Fix multilingual translation bug for to-many case

Summary:
The logic for adding the decoder-side language token was implemented incorrectly.
We inject the language token by replacing the eos symbol with the language token symbol. However, the parameter for the source / target eos symbol was not set correctly.
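
A simplified sketch of the replacement, and of why a wrong eos parameter silently disables it. The function name and the token indices are hypothetical, and this replaces every eos occurrence for brevity; it is not the fairseq implementation.

```python
def replace_eos_with_lang_token(tokens, eos_idx, lang_tok_idx):
    """Return a copy of `tokens` with each eos symbol replaced by the language token."""
    return [lang_tok_idx if tok == eos_idx else tok for tok in tokens]


# Hypothetical indices: 2 is the eos actually present in the data,
# 9 is the target-language token.
sentence = [5, 6, 2]
correct = replace_eos_with_lang_token(sentence, eos_idx=2, lang_tok_idx=9)
# Passing an eos index that does not match the data leaves the sentence
# unchanged, so no language token is injected at all.
wrong = replace_eos_with_lang_token(sentence, eos_idx=3, lang_tok_idx=9)
```

Because the mismatch does not raise an error, a bug of this kind only shows up as degraded translation into the wrong language rather than as a crash.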

Reviewed By: tangyuq

Differential Revision: D17129108

fbshipit-source-id: 6fae385b787370656fd7ca7ab74e6bb91fe5463b

* Return predicted token for RoBERTa filling mask

Summary:
Added the `predicted_token` to each `topk` filled output item

Updated RoBERTa filling mask example in README.md

Reviewed By: myleott

Differential Revision: D17188810

fbshipit-source-id: 5fdc57ff2c13239dabf13a8dad43ae9a55e8931c

* Average local optimizer param after warmup and during bmuf sync

Summary: We have seen that averaging the local param instead of doing reset or broadcast after warmup improves the WER.

Reviewed By: skritika

Differential Revision: D16739278

fbshipit-source-id: 75033d2d25f9a88fd6dd325d0d9d4c856d22d947

* added fast stats sync option (#858)

Summary:
Added `--fast-stat-sync` option.
This avoids pickle and achieves `~7%` more `wps` on 16 nodes.
It is less flexible, as it aggregates only basic stats and ignores the aggregate function defined by the criterion.

Let me know what you think myleott
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/858

Differential Revision: D17398770

fbshipit-source-id: 36261a1d970e67deeda8211af8f009ef9b4f9c14

* Update README.md

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1140

Differential Revision: D17431506

Pulled By: myleott

fbshipit-source-id: b47dae303d7e76daa5b49795476b5e48d7b090ad

* Fix link to RACE fine-tuning instructions.

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1125

Differential Revision: D17431557

Pulled By: myleott

fbshipit-source-id: f712e5355d8dbb0a8f1170674d62e2b6880295b4

* don't project unmasked tokens for mlm loss (#859)

Summary:
This saves ~4-5 GB of GPU memory while training roberta large with `seq_len=512`.

I am able to fit `--max-sentences=16` on `volta32gb` for `roberta-large`
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/859

Differential Revision: D17435814

fbshipit-source-id: 2663909768fac0ef0102107613770ee01b1f8c00

* Minor fix to make adafactor work for >2d conv kernels (#1122)

Summary:
A `.unsqueeze(-1)` was missing in line 124.
Without this change we encounter a runtime error for >2d convolutional kernels; with this fix, Adafactor's 2d logic is applied to the two final dimensions.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1122

Differential Revision: D17431662

Pulled By: myleott

fbshipit-source-id: e7435e77270a9252f75f01b2457ef0048f5bcf36

* Add autogenerated cython files to gitignore (#860)

Summary:
`python setup.py build_ext --inplace` generates C++ source files directly in the Python source tree. They should most likely be ignored by git.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/860

Differential Revision: D17460597

Pulled By: jma127

fbshipit-source-id: 72a29d438ebb57627b68ec7e9a2a77c8a36f1c21

* Add cython language_level hints

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1147

Differential Revision: D17468447

Pulled By: myleott

fbshipit-source-id: 0dbac04b92c8df74ad991d5e92cd02036d662369

* Add dataset class for weighted sampling with replacement. (#861)

Summary:
As discussed with Naman earlier today. Weighted sampling with
replacement can be done on a per-epoch basis using `set_epoch()`
functionality, which generates the samples as a function of random seed
and epoch.

Additionally, `FairseqTask` needs to set the starting epoch for the
dataset at the very beginning of iterator construction.

Not yet implemented is the per-epoch iterator construction, which
is necessary to actually regenerate the batches for each epoch.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/861
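
The idea of drawing samples "as a function of random seed and epoch" can be sketched with the standard library alone. This is an illustration under assumptions, not the dataset class added by this PR: the function name is hypothetical, and combining seed and epoch into a string seed is a choice made here.

```python
import random


def resample_indices(weights, num_samples, seed, epoch):
    """Draw `num_samples` indices with replacement, proportionally to `weights`.

    The draw depends only on (seed, epoch), so every worker that calls this
    with the same arguments sees the same resampled dataset for that epoch.
    """
    rng = random.Random("{}-{}".format(seed, epoch))
    return rng.choices(range(len(weights)), weights=weights, k=num_samples)
```

Calling this again from a `set_epoch()`-style hook with a new epoch number yields a fresh but reproducible sample, which is why the batches also have to be regenerated per epoch.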

Differential Revision: D17460687

Pulled By: jma127

fbshipit-source-id: 1c2a54f04ac96b3561c100a6fd66a9fccbe3c658

* added multilingual masked LM training (#849)

Summary:
The multilingual-RoBERTa training is working with aconneau XLM data.

Two pieces remaining:

1) `XLM` limits each batch to a single language. I am not 100% sure about the reason for that, but it should be easy to implement: basically we can add `batch_by_size_and_language` instead of the default `batch_by_size` function. If it's not critical, I would want to leave it out, as it keeps the code very clean and simple.

2) `sample_ratio` in `ConcatDataset` works with `int` by tiling the datasets based on ratio. Currently I am handling it by rounding off the ratio to the `first decimal` and then multiplying by `10`. We can see if such simple heuristics are good enough; there are other options (we can talk about them offline).
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/849
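
The rounding heuristic from point 2 can be sketched as follows. The helper names are hypothetical and plain lists stand in for datasets; `round(ratio * 10)` is used here as an equivalent of rounding to the first decimal and multiplying by 10 that avoids floating-point truncation.

```python
def ratio_to_repeats(ratios):
    """Turn float sample ratios into integer tiling counts (first decimal x 10)."""
    return [round(ratio * 10) for ratio in ratios]


def tile_by_ratio(datasets, ratios):
    """Concatenate datasets, repeating each one according to its integer count."""
    tiled = []
    for data, repeats in zip(datasets, ratio_to_repeats(ratios)):
        tiled.extend(list(data) * repeats)
    return tiled
```

Note that the heuristic inflates the concatenated dataset by a factor of 10 and cannot distinguish ratios that agree to the first decimal.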

Differential Revision: D17162460

fbshipit-source-id: d967f3d872f7a1f0aa4ea418bd362b68af9e432f

* Update README.race.md

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1155

Differential Revision: D17509762

Pulled By: myleott

fbshipit-source-id: 4de535289c1f35abff0d8142d8580f3ede039f47

* Remove extraneous call to RNG in multi-GPU code path

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/865

Differential Revision: D17510276

Pulled By: myleott

fbshipit-source-id: 24119402ad5fe95a1312fadb77bafe49a9197c6b

* fixed train valid epoch iter

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/866

Differential Revision: D17517115

fbshipit-source-id: fd6921e642c99e37fce6ad58b24c93e70a5364e5

* Miscellaneous documentation improvements: (#868)

Summary:
- More clearly document the correspondence between FairseqAdam and torch.optim.AdamW
- Add ResamplingDataset to Sphinx docs
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/868

Differential Revision: D17523244

Pulled By: jma127

fbshipit-source-id: 8e7b34b24889b2c8f70b09a52a625d2af135734b

* fixed corner case in mlm criterion when all tokens get masked

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/869

Reviewed By: myleott

Differential Revision: D17531776

Pulled By: myleott

fbshipit-source-id: 349c9449a0a7db5d3bb8449561302d4220cfa60c

* Issue 1146: Minor fix to roberta pre-training readme (#1165)

Summary:
This is to make these instructions a little more generalizable, since on some systems bash will parse the spaces within quotes.

Addressing https://github.com/pytorch/fairseq/issues/1146
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1165

Differential Revision: D17547810

Pulled By: myleott

fbshipit-source-id: 5a026d42f678126b5ca8bc4477ba8f26ea549dcd

* PR for Issue #1154: Two comments in lstm.py seem to be incorrect

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1185

Differential Revision: D17602249

Pulled By: lematt1991

fbshipit-source-id: bd515b7d2ebce8181a80684f45223a8db7c7e3cd

* Update getting_started.rst (#1188)

Summary:
Hi,

I think there is a minor mistake in the doc. `--distributed-no-spawn` argument is needed for distributed training on multiple machines without `slurm`. Otherwise, the program will start 8 jobs on each GPU, when `nproc_per_node=8`.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1188

Differential Revision: D17627778

Pulled By: myleott

fbshipit-source-id: 35ab6b650dc1132d7cb2d150e80d2ebf0caf3e69

* Explain the language modelling format in RoBERTa pretraining readme

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1174

Differential Revision: D17627767

Pulled By: myleott

fbshipit-source-id: 7b5f77146b8776a5967699e430136039c066c851

* Fixing BMUF warmup and sync strategy

Summary:
BMUF sync started happening even before warmup was done.
This diff fixes the behavior and does the BMUF sync once warmup is done, or if the warmup is zero.

TODO: write a unit test case so that these problems can be figured out faster.
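
The intended gating can be sketched as a small predicate, which is also the kind of logic such a unit test would exercise. The function and parameter names are hypothetical, and whether the boundary update itself triggers a sync is a choice made here, not something taken from the diff.

```python
def should_bmuf_sync(num_updates, warmup_iterations, global_sync_iter):
    """Return True if a BMUF block sync should run at this update.

    No sync happens while warmup is still in progress; with zero warmup,
    syncing starts at the first multiple of `global_sync_iter`.
    """
    if num_updates <= warmup_iterations:
        return False
    return num_updates % global_sync_iter == 0
```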

Reviewed By: jay-mahadeokar

Differential Revision: D17356277

fbshipit-source-id: 21500e6ed1225b97794e4ee203e5d7d04a2840f8

* Levenshtein Transformer paper code

Summary:
Code for our NeurIPS paper [Levenshtein Transformer](https://arxiv.org/abs/1905.11006)
* Added Levenshtein Transformer model, task and criterion class
* Added iterative NAT Transformer, insertion Transformer and CMLM Transformer model class for baselines
* Add an option for prepending BOS to dictionary class and translation task class

Reviewed By: myleott

Differential Revision: D17297372

fbshipit-source-id: 54eca60831ae95dc721c2c34e882e1810ee575c7

* Fixing example of batched predictions for Roberta (#1195)

Summary:
For batched predictions in Roberta, the README was giving an example that was pretty unclear. After a thorough discussion with ngoyal2707 in issue https://github.com/pytorch/fairseq/issues/1167 he gave a clear example of how batched predictions were supposed to be done. Since I spent a lot of time on this inconsistency, I thought that it might benefit the community if his solution was in the official README 😄 !

For more details, see issue https://github.com/pytorch/fairseq/issues/1167
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1195

Differential Revision: D17639354

Pulled By: myleott

fbshipit-source-id: 3eb60c5804a6481f533b19073da7880dfd0d522d

* RoBERTa now supported on TPU and TensorFlow via transformers library

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1197

Differential Revision: D17651374

Pulled By: myleott

fbshipit-source-id: 5feb986de1e682eb83c4479f419ad51325718572

* Implementation of the WeCNLP abstract "Cross+Self-Attention for Transformer Models" (#1097)

Summary:
This PR implements a new attention module which combines cross-attention (encoder-decoder attention) and the decoder self-attention. This work was accepted as an abstract at WeCNLP 2019 (https://www.wecnlp.ai/wecnlp-2019).

Cross+Self-Attention reduces the number of parameters and increases the inference speed without any degradation in translation quality.
More details can be found in the attached [abstract](https://github.com/pytorch/fairseq/files/3561282/paper.pdf)
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1097

Differential Revision: D17653168

Pulled By: myleott

fbshipit-source-id: deb834c2c78a229d7418ffbfea20ba3ce252991c

* fix typo in README of examples/translation

Summary: Pull Request resolved: https://github.com/pytorch/fairseq/pull/1200

Differential Revision: D17659658

Pulled By: myleott

fbshipit-source-id: 1863e6d60a439dbb7e71e5da68817c9d53649737

* Fix torch.hub to not depend on libnat

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/878

Differential Revision: D17661768

Pulled By: myleott

fbshipit-source-id: 1e4c5f09eb14c40d491ca2459fd2adb8382fb6d2

* Implementation of the paper "Jointly Learning to Align and Translate with Transformer Models" (#877)

Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/877

This PR implements guided alignment training described in  "Jointly Learning to Align and Translate with Transformer Models (https://arxiv.org/abs/1909.02074)".

In summary, it allows for training selected heads of the Transformer Model with external alignments computed by Statistical Alignment Toolkits. During inference, attention probabilities from the trained heads can be used to extract reliable alignments. In our work, we did not see any regressions in the translation performance because of guided alignment training.
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1095

Differential Revision: D17170337

Pulled By: myleott

fbshipit-source-id: daa418bef70324d7088dbb30aa2adf9f95774859

* extract FP16OptimizerMixin to share the same logic in PyText (#1180)

Summary:
Pull Request resolved: https://github.com/pytorch/fairseq/pull/1180

Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/874

Extract FP16OptimizerMixin to share the same logic in PyText.

Reviewed By: hudeven

Differential Revision: D17594102

fbshipit-source-id: 8625a4e4f3e09cbaba6ae92599c1121b86ed4e78

* Native Torchscript Wordpiece Tokenizer Op for BERTSquadQA, Torchscriptify BertSQUADQAModel (#879)

Summary:
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/879

Pull Request resolved: https://github.com/facebookresearch/pytext/pull/1023

Pull Request resolved: https://github.com/pytorch/fairseq/pull/1211

Added a new native op that does wordpiece tokenization while additionally returning token start and end indices in the raw text as required by BertSquadQA. Includes Unit Tests for the native op and also to check its parity with the PyText Wordpiece Tokenizer.

Also combined is a torchscript implementation of the Bert SQUAD QA Model.

There are scripts for evaluation and testing of the torchscript code as well.

Reviewed By: borguz, hikushalhere

Differential Revision: D17455985

fbshipit-source-id: c2617c7ecbce0f733b31d04558da965d0b62637b

* Add periodic CUDA cache cleanup (#882)

Summary:
This adds a periodic call to `torch.cuda.empty_cache()` in order to
mitigate memory fragmentation in the PyTorch CUDA cached allocator
that can cause OOMs on models approaching GPU memory limit.
By default, this will occur every 64 updates.

Performance considerations:

- I've benchmarked this on a reasonably large model with memory
  footprint 16 GB, and the overhead with the default setting is <0.2%.
  With `update-freq > 1`, the cost is mitigated even further.
- This behavior can be disabled with a value of zero.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/882
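
The "every N updates, disabled with zero" behavior can be sketched independently of CUDA. This is a minimal illustration with hypothetical names, not the code added by this PR; in training, the `action` would be `torch.cuda.empty_cache`, and a stand-in callable is used here so the sketch runs without a GPU.

```python
from typing import Callable


def make_periodic_hook(action: Callable[[], None], interval: int) -> Callable[[int], bool]:
    """Return a hook that runs `action` every `interval` updates.

    An interval of zero (or less) disables the hook entirely.
    """
    def maybe_run(num_updates: int) -> bool:
        if interval > 0 and num_updates > 0 and num_updates % interval == 0:
            action()
            return True
        return False

    return maybe_run


# Usage: count how often the hook fires over 128 updates with the default of 64.
calls = []
hook = make_periodic_hook(lambda: calls.append(1), interval=64)
fired = [n for n in range(1, 129) if hook(n)]
```

Tying the call to the update count rather than to each forward/backward pass is what makes the cost shrink further with `update-freq > 1`.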

Differential Revision: D17742386

Pulled By: jma127

fbshipit-source-id: 68d8f93f798d6818b5efc3d67d43b52dfb8b2865

* add pre-trained wav2vec model

Summary: Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/884

Differential Revision: D17774515

Pulled By: alexeib

fbshipit-source-id: d1ffe8ab723fa284c69b067bbd43d699eaa2f02f

* Setting Global sync to 50 in BMUF

Summary:
In all our final settings, we are using global_sync = 50 and we get comparable results with DDP and caffe2.

Setting the default global-sync-iter = 50
and users can just define --use-bmuf to enable it for training.

Reviewed By: skritika

Differential Revision: D17765094

fbshipit-source-id: 369591eeff266d757f89e1fc8dda01711146fdbc

* fix max lengths in Levenshtein Transformer

Summary: Fix the max length calculation in Levenshtein Transformer

Reviewed By: jhcross

Differential Revision: D17672946

fbshipit-source-id: e5efbe7e56cf879d3e822864e4398f99f45b04d4

* ensemble levts

Summary:
Add ensemble wrappers to the Levenshtein NAT.
The final softmax ensemble runs over the pipeline of three steps:
1. Deletion
2. Placeholder Insertion
3. Word Selection

Each step involves scoring, averaging the scores over the ensemble, and then making hard decisions with argmax. Then the next step follows. By design, the three steps cannot be run in parallel.
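
One such step (score, average over the ensemble, argmax) can be sketched with plain lists. The function name and the nested-list layout (models x positions x choices) are assumptions for illustration; the actual wrappers operate on model outputs, not Python lists.

```python
def ensemble_step(per_model_scores):
    """Average scores over models, then take a hard argmax decision per position.

    `per_model_scores[m][p][c]` is the score model m assigns to choice c
    at position p.
    """
    num_models = len(per_model_scores)
    decisions = []
    for position_scores in zip(*per_model_scores):
        # position_scores holds one score vector per model for this position.
        avg = [sum(vals) / num_models for vals in zip(*position_scores)]
        decisions.append(max(range(len(avg)), key=avg.__getitem__))
    return decisions
```

Because each step consumes the hard decisions of the previous one, deletion, placeholder insertion and word selection would call a function like this in sequence rather than in parallel.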

Reviewed By: kahne

Differential Revision: D17723202

fbshipit-source-id: 05f7a4fcd922a972cc4796ca397e8220f0b4d53e

* Add printing of PyTorch memory summary on OOM (#885)

Summary:
PyTorch now has more comprehensive memory instrumentation, added in https://github.com/pytorch/pytorch/pull/27361 . This PR makes fairseq print a summary table of the memory state when an OOM occurs.
Pull Request resolved: https://github.com/fairinternal/fairseq-py/pull/885

Differential Revision: D17820445

Pulled By: jma127

fbshipit-source-id: 1887417c7648d703f78e1cff9f2a5b89901f49d0

* Fix data loading memory issue in pyspeech

Summary:
We currently shard data when creating the batch iterator. This means we first load all indices/frame lengths/handles into memory, and then do the sharding. This makes it impossible to train on large datasets with a large number of workers because each worker will need to load the entire dataset into memory. For training on a million hours of data (i.e. semi-supervised or unsupervised approaches) this data loading just makes it…
taylanbil authored Nov 19, 2019
1 parent aa2c3b3 commit 130d455
Showing 335 changed files with 23,157 additions and 3,130 deletions.
11 changes: 10 additions & 1 deletion .gitignore
@@ -110,7 +110,16 @@ ENV/
.mypy_cache/

# Generated files
fairseq/temporal_convolution_tbc
/fairseq/temporal_convolution_tbc
/fairseq/modules/*_layer/*_forward.cu
/fairseq/modules/*_layer/*_backward.cu

# data
data-bin/

# reranking
/examples/reranking/rerank_data

# Cython-generated C++ source files
/fairseq/data/data_utils_fast.cpp
/fairseq/data/token_block_utils_fast.cpp
77 changes: 76 additions & 1 deletion CODE_OF_CONDUCT.md
@@ -1,2 +1,77 @@
# Code of Conduct
Facebook has adopted a Code of Conduct that we expect project participants to adhere to. Please [read the full text](https://code.fb.com/codeofconduct) so that you can understand what actions will and will not be tolerated.

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to make participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies within all project spaces, and it also applies when
an individual is representing the project or its community in public spaces.
Examples of representing a project or community include using an official
project e-mail address, posting via an official social media account, or acting
as an appointed representative at an online or offline event. Representation of
a project may be further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at <opensource-conduct@fb.com>. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq

10 changes: 4 additions & 6 deletions CONTRIBUTING.md
@@ -1,4 +1,4 @@
# Contributing to FAIR Sequence-to-Sequence Toolkit (PyTorch)
# Contributing to Facebook AI Research Sequence-to-Sequence Toolkit (fairseq)
We want to make contributing to this project as easy and transparent as
possible.

@@ -22,9 +22,7 @@ Complete your CLA here: <https://code.facebook.com/cla>
We use GitHub issues to track public bugs. Please ensure your description is
clear and has sufficient instructions to be able to reproduce the issue.

## Coding Style
We try to follow the PEP style guidelines and encourage you to as well.

## License
By contributing to FAIR Sequence-to-Sequence Toolkit, you agree that your contributions will be licensed
under the LICENSE file in the root directory of this source tree.
By contributing to Facebook AI Research Sequence-to-Sequence Toolkit (fairseq),
you agree that your contributions will be licensed under the LICENSE file in
the root directory of this source tree.
43 changes: 17 additions & 26 deletions LICENSE
@@ -1,30 +1,21 @@
BSD License
MIT License

For fairseq software
Copyright (c) Facebook, Inc. and its affiliates.

Copyright (c) 2017-present, Facebook, Inc. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

* Neither the name Facebook nor the names of its contributors may be used to
endorse or promote products derived from this software without specific
prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
33 changes: 0 additions & 33 deletions PATENTS

This file was deleted.

112 changes: 67 additions & 45 deletions README.md
@@ -1,33 +1,53 @@
# Introduction <img src="fairseq_logo.png" width="50">
# <img src="fairseq_logo.png" width="30"> Introduction

Fairseq(-py) is a sequence modeling toolkit that allows researchers and
developers to train custom models for translation, summarization, language
modeling and other text generation tasks. It provides reference implementations
of various sequence-to-sequence models, including:
modeling and other text generation tasks.

### What's New:

- September 2019: [Nonautoregressive translation code released](examples/nonautoregressive_translation/README.md)
- August 2019: [WMT'19 models released](examples/wmt19/README.md)
- July 2019: fairseq relicensed under MIT license
- July 2019: [RoBERTa models and code released](examples/roberta/README.md)
- June 2019: [wav2vec models and code released](examples/wav2vec/README.md)

### Features:

Fairseq provides reference implementations of various sequence-to-sequence models, including:
- **Convolutional Neural Networks (CNN)**
- [Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks](examples/language_model/conv_lm/README.md)
- [Gehring et al. (2017): Convolutional Sequence to Sequence Learning](examples/conv_seq2seq/README.md)
- [Edunov et al. (2018): Classical Structured Prediction Losses for Sequence to Sequence Learning](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
- [Fan et al. (2018): Hierarchical Neural Story Generation](examples/stories/README.md)
- **_New_** [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md)
- [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples/language_model/conv_lm/README.md)
- [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples/conv_seq2seq/README.md)
- [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
- [Hierarchical Neural Story Generation (Fan et al., 2018)](examples/stories/README.md)
- [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md)
- **LightConv and DynamicConv models**
- **_New_** [Wu et al. (2019): Pay Less Attention with Lightweight and Dynamic Convolutions](examples/pay_less_attention_paper/README.md)
- [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples/pay_less_attention_paper/README.md)
- **Long Short-Term Memory (LSTM) networks**
- [Luong et al. (2015): Effective Approaches to Attention-based Neural Machine Translation](https://arxiv.org/abs/1508.04025)
- [Wiseman and Rush (2016): Sequence-to-Sequence Learning as Beam-Search Optimization](https://arxiv.org/abs/1606.02960)
- Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)
- **Transformer (self-attention) networks**
- [Vaswani et al. (2017): Attention Is All You Need](https://arxiv.org/abs/1706.03762)
- [Ott et al. (2018): Scaling Neural Machine Translation](examples/scaling_nmt/README.md)
- [Edunov et al. (2018): Understanding Back-Translation at Scale](examples/backtranslation/README.md)
- **_New_** [Baevski and Auli (2018): Adaptive Input Representations for Neural Language Modeling](examples/language_model/transformer_lm/README.md)
- **_New_** [Shen et al. (2019): Mixture Models for Diverse Machine Translation: Tricks of the Trade](examples/translation_moe/README.md)

Fairseq features:
- Attention Is All You Need (Vaswani et al., 2017)
- [Scaling Neural Machine Translation (Ott et al., 2018)](examples/scaling_nmt/README.md)
- [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md)
- [Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018)](examples/language_model/transformer_lm/README.md)
- [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md)
- [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md)
- [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md)
- [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md )
- **Non-autoregressive Transformers**
- Non-Autoregressive Neural Machine Translation (Gu et al., 2017)
- Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al. 2018)
- Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al. 2019)
- Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019)
- [Levenshtein Transformer (Gu et al., 2019)](examples/nonautoregressive_translation/README.md)


**Additionally:**
- multi-GPU (distributed) training on one machine or across multiple machines
- fast generation on both CPU and GPU with multiple search algorithms implemented:
- beam search
- Diverse Beam Search ([Vijayakumar et al., 2016](https://arxiv.org/abs/1610.02424))
- sampling (unconstrained and top-k)
- sampling (unconstrained, top-k and top-p/nucleus)
- large mini-batch training even on a single GPU via delayed updates
- mixed precision training (trains faster with less GPU memory on [NVIDIA tensor cores](https://developer.nvidia.com/tensor-cores))
- extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
@@ -39,35 +59,33 @@ translation and language modeling datasets.

# Requirements and Installation

* [PyTorch](http://pytorch.org/) version >= 1.0.0
* [PyTorch](http://pytorch.org/) version >= 1.2.0
* Python version >= 3.5
* For training new models, you'll also need an NVIDIA GPU and [NCCL](https://github.com/NVIDIA/nccl)
* **For faster training** install NVIDIA's [apex](https://github.com/NVIDIA/apex) library with the `--cuda_ext` option

To install fairseq:
```bash
pip install fairseq
```

Please follow the instructions here to install PyTorch: https://github.com/pytorch/pytorch#installation.
On MacOS:
```bash
CFLAGS="-stdlib=libc++" pip install fairseq
```

If you use Docker make sure to increase the shared memory size either with
`--ipc=host` or `--shm-size` as command line options to `nvidia-docker run`.

After PyTorch is installed, you can install fairseq with `pip`:
```
pip install fairseq
```

**Installing from source**

To install fairseq from source and develop locally:
```
```bash
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable .
```

**Improved training speed**

Training speed can be further improved by installing NVIDIA's
[apex](https://github.com/NVIDIA/apex) library with the `--cuda_ext` option.
fairseq will automatically switch to the faster modules provided by apex.

# Getting Started

The [full documentation](https://fairseq.readthedocs.io/) contains instructions
@@ -80,28 +98,32 @@ We provide pre-trained models and pre-processed, binarized test sets for several
as well as example training and evaluation commands.

- [Translation](examples/translation/README.md): convolutional and transformer models are available
- [Language Modeling](examples/language_model/README.md): convolutional models are available
- [Language Modeling](examples/language_model/README.md): convolutional and transformer models are available
- [wav2vec](examples/wav2vec/README.md): wav2vec large model is available

We also have more detailed READMEs to reproduce results from specific papers:
- [Schneider et al. (2019): wav2vec: Unsupervised Pre-training for Speech Recognition](examples/wav2vec/README.md)
- [Shen et al. (2019) Mixture Models for Diverse Machine Translation: Tricks of the Trade](examples/translation_moe/README.md)
- [Wu et al. (2019): Pay Less Attention with Lightweight and Dynamic Convolutions](examples/pay_less_attention_paper/README.md)
- [Edunov et al. (2018): Understanding Back-Translation at Scale](examples/backtranslation/README.md)
- [Edunov et al. (2018): Classical Structured Prediction Losses for Sequence to Sequence Learning](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
- [Fan et al. (2018): Hierarchical Neural Story Generation](examples/stories/README.md)
- [Ott et al. (2018): Scaling Neural Machine Translation](examples/scaling_nmt/README.md)
- [Gehring et al. (2017): Convolutional Sequence to Sequence Learning](examples/conv_seq2seq/README.md)
- [Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks](examples/language_model/conv_lm/README.md)
- [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md )
- [Levenshtein Transformer (Gu et al., 2019)](examples/nonautoregressive_translation/README.md)
- [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md)
- [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md)
- [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md)
- [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md)
- [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples/pay_less_attention_paper/README.md)
- [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md)
- [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
- [Hierarchical Neural Story Generation (Fan et al., 2018)](examples/stories/README.md)
- [Scaling Neural Machine Translation (Ott et al., 2018)](examples/scaling_nmt/README.md)
- [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples/conv_seq2seq/README.md)
- [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples/language_model/conv_lm/README.md)

# Join the fairseq community

* Facebook page: https://www.facebook.com/groups/fairseq.users
* Google group: https://groups.google.com/forum/#!forum/fairseq-users

# License
fairseq(-py) is BSD-licensed.
fairseq(-py) is MIT-licensed.
The license applies to the pre-trained models as well.
We also provide an additional patent grant.

# Citation
