Skip to content

Latest commit

 

History

History
9 lines (7 loc) · 980 Bytes

released_model.md

File metadata and controls

9 lines (7 loc) · 980 Bytes

Released Models

Language Model Released

Language Model Training Data Token-based Size Descriptions
English LM CommonCrawl(en.00) Word-based 8.3 GB Pruned with 0 1 1 1 1;
About 1.85 billion n-grams;
'trie' binary with '-a 22 -q 8 -b 8'
Mandarin LM Small Baidu Internal Corpus Char-based 2.8 GB Pruned with 0 1 2 4 4;
About 0.13 billion n-grams;
'probing' binary with default settings
Mandarin LM Large Baidu Internal Corpus Char-based 70.4 GB No Pruning;
About 3.7 billion n-grams;
'probing' binary with default settings