v1.1.0 - Improvements, Docker Image & REST API

@nreimers nreimers released this 15 Mar 13:32

This release brings several improvements and is the first step towards the release of a Docker Image + REST API.

Improvements:

  • Docker REST API: We have published Docker images for a REST API that makes EasyNMT easy to use. Just run the Docker image and start translating using REST API calls: more info
  • Google Colab REST API Hosting: We have published a Colab notebook that shows how to wrap EasyNMT in a REST API and host it on Google Colab with a free GPU. Useful if you need to translate large amounts of text.
  • Long sentences are translated first: Sentences are sorted by length before they are translated so that as little time as possible is wasted on padding tokens. In the previous version, the shortest sentences were translated first and the longer sentences later. Now the order is reversed. This has two advantages: If an OOM (out of memory) error happens, it happens at the start of the translation process and not at the end. Also, the time estimate from the progress bar is more accurate, as the longest and slowest sentences are now translated first.
  • Improve language detection: Automatic language detection is still an issue, especially for mixed languages. Language detection is now performed on document level and not on sentence level. If you need sentence-level language detection, you can set document_language_detection=False for the translate method. Also, text is now lowercased before the language is detected (the language detection scripts had issues with all-uppercase text).
  • Max length parameter: When you create your model like this: model = EasyNMT(model_name, max_length=100), all sentences with more than 100 word pieces are truncated to at most 100 word pieces. This can prevent OOM errors caused by overly long sentences.
  • Load model without translator: If you just want to use the language detection methods, you can now load your model like this: model = EasyNMT(model_name, load_translator=False). This prevents the translation engine from being loaded.
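
The longest-first ordering described above can be illustrated with a minimal sketch. This is not the EasyNMT implementation: the function name translate_longest_first and the translate_batch callback are hypothetical, and character count is used here as a simple stand-in for sentence length (the library may measure length differently).

```python
from typing import Callable, List, Sequence


def translate_longest_first(
    sentences: Sequence[str],
    translate_batch: Callable[[List[str]], List[str]],  # hypothetical callback
    batch_size: int = 16,
) -> List[str]:
    """Translate the longest sentences first, then restore the input order."""
    # Indices ordered from longest to shortest sentence.
    # Assumption: character count approximates the real sequence length.
    order = sorted(range(len(sentences)),
                   key=lambda i: len(sentences[i]),
                   reverse=True)

    results: List[str] = [""] * len(sentences)
    for start in range(0, len(order), batch_size):
        batch_idx = order[start:start + batch_size]
        # Sentences of similar length share a batch, so little padding is needed.
        outputs = translate_batch([sentences[i] for i in batch_idx])
        # Write each output back to the position of its source sentence.
        for i, out in zip(batch_idx, outputs):
            results[i] = out
    return results
```

Because the first batch contains the longest inputs, it is also the most memory-hungry one, so an out-of-memory error would surface right away rather than after most of the work is done.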
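
The difference between document-level and sentence-level language detection, including the lowercasing step, can be sketched as follows. This is an illustration under assumptions, not EasyNMT's code: detect_languages and toy_detect are hypothetical names, and toy_detect is a deliberately naive stand-in for a real language detector.

```python
from typing import Callable, List, Sequence


def detect_languages(
    sentences: Sequence[str],
    detect: Callable[[str], str],  # hypothetical detector: text -> language code
    document_level: bool = True,
) -> List[str]:
    """Return one language code per sentence."""
    if not sentences:
        return []
    if document_level:
        # One detection over the whole lowercased document,
        # applied to every sentence.
        lang = detect(" ".join(sentences).lower())
        return [lang] * len(sentences)
    # Sentence level: each lowercased sentence is detected on its own.
    return [detect(s.lower()) for s in sentences]


def toy_detect(text: str) -> str:
    """Naive stand-in detector (assumption): 'de' if the word 'der' occurs."""
    return "de" if "der" in text.split() else "en"
```

With a short, ambiguous sentence inside a longer document, the document-level mode lets the surrounding text decide, while the sentence-level mode keeps mixed-language documents apart at the cost of less context per decision.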

Roadmap

  • As soon as Huggingface transformers v4.4.0 is released, the dependency on fairseq can be removed, as the mBART50 and m2m models will be available in HF transformers. This will make installation on Windows machines possible.