0.4.8
Minor release
Experimental Support for fast Rust Tokenizers (#482)
While preprocessing is usually not the bottleneck in our pipelines, it still accounts for a significant share of time (~20% for QA inference). We saw substantial speed-ups with HuggingFace's "fast tokenizers", which are implemented in Rust. We are therefore introducing a basic "experimental" implementation with this release. We plan to stabilize it and integrate it more smoothly into the FARM processor.
Usage:
```python
tokenizer = Tokenizer.load(pretrained_model_name_or_path="bert-base-german-cased",
                           do_lower_case=False,
                           use_fast=True)
```
Upgrade to transformers 3.1.0 (#464)
The latest transformers release brings some interesting new features. One of them is basic support for a DPR (Dense Passage Retriever) model class. This will simplify the dense passage retriever integration in Haystack and the upcoming DPR training we plan to add to FARM.
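To illustrate the retrieval idea behind DPR: questions and passages are embedded by two separate encoders, and passages are ranked by the dot product of the two embeddings. The sketch below uses toy embedding vectors as stand-ins for the actual BERT encoders, just to show the scoring step; it is not the transformers or FARM API.

```python
# Minimal sketch of dense passage retrieval scoring.
# Real DPR produces these vectors with two BERT encoders
# (one for questions, one for passages); here we hard-code
# toy 3-dim embeddings to keep the example self-contained.

def dot_score(query_vec, passage_vec):
    """DPR ranks passages by the dot product of question and passage embeddings."""
    return sum(q * p for q, p in zip(query_vec, passage_vec))

def retrieve(query_vec, passages):
    """Return passage ids sorted by similarity to the query, best first."""
    scored = [(pid, dot_score(query_vec, vec)) for pid, vec in passages.items()]
    return [pid for pid, _ in sorted(scored, key=lambda x: x[1], reverse=True)]

# Toy corpus of pre-embedded passages (hypothetical values).
passages = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.1, 0.8, 0.3],
    "p3": [0.0, 0.2, 0.9],
}

ranking = retrieve([1.0, 0.0, 0.1], passages)  # "p1" scores highest here
```

In the real setup, the passage embeddings are precomputed and stored in a vector index, so retrieval at query time is a single encoder pass plus a nearest-neighbor search.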
Details
Question Answering
- Add asserts on doc_stride and max_seq_len to prevent issues with sliding window #538
- Fix Natural Questions inference processing #521
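For context on the new asserts in #538: with the sliding-window approach, a document longer than the sequence length is split into overlapping chunks, and doc_stride must stay smaller than the usable window, otherwise tokens between chunks are silently skipped. The simplified chunker below is a stand-in for FARM's actual preprocessing; the parameter names mirror the FARM config, but the constant for special/query tokens is an assumption for illustration.

```python
def split_with_sliding_window(doc_tokens, max_seq_len, doc_stride,
                              n_special_and_query_tokens=20):
    """Split doc_tokens into overlapping windows (simplified QA preprocessing).

    If doc_stride were >= the usable window, consecutive chunks would no
    longer overlap and some tokens would never appear in any chunk; this
    is the kind of relation the new asserts guard against.
    """
    window = max_seq_len - n_special_and_query_tokens
    assert doc_stride < window, (
        "doc_stride must be smaller than the usable window "
        "(max_seq_len minus question and special tokens)"
    )
    chunks = []
    start = 0
    while start < len(doc_tokens):
        chunks.append(doc_tokens[start:start + window])
        if start + window >= len(doc_tokens):
            break
        start += doc_stride
    return chunks

# 100 tokens, window of 30, stride of 20 -> overlapping chunks cover every token.
chunks = split_with_sliding_window(list(range(100)), max_seq_len=50, doc_stride=20)
```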
Other
- Fix logging of error msg for FastTokenizer + QA #541
- Fix truncation warnings in tokenizer #528
- Evaluate model on best model when doing early stopping #524
- Bump transformers version to 3.1.0 #515
- Add warmup run to component benchmark #504
- Add optional s3 auth via params #511
- Add option to use fast HF tokenizer. #482
- CodeBERT support for embeddings #488
- Store test eval result in variable #506
- Fix typo f1 micro vs. macro #505
Big thanks to all contributors!
@PhilipMay @lambdaofgod @Timoeller @tanaysoni @brandenchan @bogdankostic @kolk @tholor