Merlin - bring back auto-language detection #121

smahoney58 · 2017-10-30T14:31:13Z

Newman used to auto-detect the language used in an email and index it appropriately. Now the user has to pick the language before ingesting. Problems with this include:

1 - How do you know what language is used in an email before you ingest it?
2 - It only works if the email dataset contains just two languages (e.g. email datasets that have English, Spanish, and Chinese emails can't be processed, since you can only pick one other language).

Currently, the only other language supported is Spanish. Issue #120 is the request to support other languages.

In general, how version 4.x handles multiple languages needs to be re-designed and re-implemented. Almost every dataset we have ingested includes multiple languages.
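
To make the request concrete, here is a rough sketch of what per-email detection could look like. It is not how Newman did it previously or how it should be implemented; the function names (`detect_language`, `group_by_language`), the stopword lists, and the language codes are all made up for illustration, and the stopword counting is a toy heuristic standing in for a real language-identification model.

```python
from collections import defaultdict

# Tiny illustrative stopword lists (assumption: a real detector would use
# a trained language-identification model instead of these).
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "for", "you", "with"},
    "es": {"el", "la", "los", "las", "de", "que", "y", "en", "para", "con", "una", "por"},
}


def detect_language(text, default="unknown"):
    """Guess the language of one email body (hypothetical helper)."""
    # Crude check: any CJK Unified Ideograph is treated as Chinese.
    # This would also match Japanese kanji, so it is only a placeholder.
    if any("\u4e00" <= ch <= "\u9fff" for ch in text):
        return "zh"
    # Lowercase, split on whitespace, and trim surrounding punctuation.
    words = [w.strip(".,;:!?¡¿\"'()") for w in text.lower().split()]
    # Score each language by how many of its stopwords appear.
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def group_by_language(emails):
    """Group (email_id, body) pairs by detected language (hypothetical helper)."""
    groups = defaultdict(list)
    for email_id, body in emails:
        groups[detect_language(body)].append(email_id)
    return dict(groups)
```

The point is that the language is decided per email at ingest time, so a dataset with any number of languages could be routed to the appropriate index or analyzer without the user choosing one up front.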

smahoney58 added enhancement v4.0.0 2019 and removed v4.0.0 labels Oct 5, 2018

smahoney58 assigned justinlueders Oct 5, 2018
