0.4.8
Minor release
Experimental Support for fast Rust Tokenizers (#482)
While preprocessing is usually not the bottleneck in our pipelines, it still accounts for a significant share of time (~20% for QA inference). We saw substantial speed-ups with HuggingFace's "fast tokenizers", which are implemented in Rust. We are therefore introducing a basic "experimental" implementation with this release. We plan to stabilize it and integrate it more smoothly into the FARM processor.
Usage:
```python
tokenizer = Tokenizer.load(pretrained_model_name_or_path="bert-base-german-cased",
                           do_lower_case=False,
                           use_fast=True)
```
Upgrade to transformers 3.1.0 (#464)
The latest transformers release brings some interesting new features. One of them is basic support for a DPR (Dense Passage Retriever) model class. This will simplify the dense passage retriever integration in Haystack and the upcoming DPR training we plan to add to FARM.
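To illustrate the retrieval idea behind DPR: questions and passages are embedded by two separate encoders, and passages are ranked by the dot product of the two embeddings. The sketch below uses toy embedding vectors as stand-ins for the actual BERT encoders, just to show the scoring step; it is not the transformers or FARM API.

```python
# Minimal sketch of dense passage retrieval scoring.
# Real DPR produces these vectors with two BERT encoders
# (one for questions, one for passages); here we hard-code
# toy 3-dim embeddings to keep the example self-contained.

def dot_score(query_vec, passage_vec):
    """DPR ranks passages by the dot product of question and passage embeddings."""
    return sum(q * p for q, p in zip(query_vec, passage_vec))

def retrieve(query_vec, passages):
    """Return passage ids sorted by similarity to the query, best first."""
    scored = [(pid, dot_score(query_vec, vec)) for pid, vec in passages.items()]
    return [pid for pid, _ in sorted(scored, key=lambda x: x[1], reverse=True)]

# Toy corpus of pre-embedded passages (hypothetical values).
passages = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.1, 0.8, 0.3],
    "p3": [0.0, 0.2, 0.9],
}

ranking = retrieve([1.0, 0.0, 0.1], passages)  # "p1" scores highest here
```

In the real setup, the passage embeddings are precomputed and stored in a vector index, so retrieval at query time is a single encoder pass plus a nearest-neighbor search.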
Details
Question Answering
- Add asserts on doc_stride and max_seq_len to prevent issues with sliding window #538
- Fix Natural Questions inference processing #521
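For context on the new asserts in #538: with the sliding-window approach, a document longer than the sequence length is split into overlapping chunks, and doc_stride must stay smaller than the usable window, otherwise tokens between chunks are silently skipped. The simplified chunker below is a stand-in for FARM's actual preprocessing; the parameter names mirror the FARM config, but the constant for special/query tokens is an assumption for illustration.

```python
def split_with_sliding_window(doc_tokens, max_seq_len, doc_stride,
                              n_special_and_query_tokens=20):
    """Split doc_tokens into overlapping windows (simplified QA preprocessing).

    If doc_stride were >= the usable window, consecutive chunks would no
    longer overlap and some tokens would never appear in any chunk; this
    is the kind of relation the new asserts guard against.
    """
    window = max_seq_len - n_special_and_query_tokens
    assert doc_stride < window, (
        "doc_stride must be smaller than the usable window "
        "(max_seq_len minus question and special tokens)"
    )
    chunks = []
    start = 0
    while start < len(doc_tokens):
        chunks.append(doc_tokens[start:start + window])
        if start + window >= len(doc_tokens):
            break
        start += doc_stride
    return chunks

# 100 tokens, window of 30, stride of 20 -> overlapping chunks cover every token.
chunks = split_with_sliding_window(list(range(100)), max_seq_len=50, doc_stride=20)
```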
Other
- Fix logging of error msg for FastTokenizer + QA #541
- Fix truncation warnings in tokenizer #528
- Evaluate model on best model when doing early stopping #524
- Bump transformers version to 3.1.0 #515
- Add warmup run to component benchmark #504
- Add optional s3 auth via params #511
- Add option to use fast HF tokenizer. #482
- CodeBERT support for embeddings #488
- Store test eval result in variable #506
- Fix typo f1 micro vs. macro #505
Big thanks to all contributors!
@PhilipMay @lambdaofgod @Timoeller @tanaysoni @brandenchan @bogdankostic @kolk @tholor