Merlin - bring back auto-language detection #121

Open
smahoney58 opened this issue Oct 30, 2017 · 0 comments
smahoney58 commented Oct 30, 2017

Newman used to be able to auto-detect the language used in an email and index it appropriately. Now the user has to pick the language before ingesting. Problems with this include:

1 - How do you know what language is used in an email before you ingest it?
2 - It only works if there are just two languages in the email dataset (i.e. email datasets that have English, Spanish, and Chinese emails can't be processed, since you can only pick one other language).

Currently, the only other language supported is Spanish. Issue #120 is the request to support other languages.

In general, how version 4.x handles multiple languages needs to be re-designed and re-implemented. Almost every dataset we have ingested includes multiple languages.
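
To make the request concrete, here is a minimal sketch of per-email detection, where each message is tagged with its own language at ingest time instead of one language being chosen for the whole dataset. This is not Newman's code: `STOPWORDS`, `detect_language`, and `group_by_language` are hypothetical names, and the stopword-count heuristic is only a stand-in for whatever detector a real implementation would use. Emails that match no known language fall into an `unknown` bucket rather than blocking the ingest.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword lists; a real detector would use
# a proper language-identification library or model instead.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "for", "you", "with"},
    "es": {"el", "la", "los", "las", "de", "que", "y", "en", "por", "para", "una", "con"},
}


def detect_language(text, default="unknown"):
    """Guess the language of one email body by counting stopword hits."""
    # Split into runs of letters (Unicode-aware), lowercased.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for token in tokens if token in words)
        for lang, words in STOPWORDS.items()
    }
    top = max(scores.values())
    # No hits at all, or a tie between languages: do not guess.
    if top == 0 or list(scores.values()).count(top) > 1:
        return default
    return max(scores, key=scores.get)


def group_by_language(emails):
    """Map each detected language to the ids of the emails written in it.

    `emails` is an iterable of (email_id, body) pairs.
    """
    groups = defaultdict(list)
    for email_id, body in emails:
        groups[detect_language(body)].append(email_id)
    return dict(groups)
```

With something like this in the ingest path, a dataset mixing English, Spanish, and Chinese emails could be split into per-language groups automatically, and each group indexed with the matching analyzer once that language is supported.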
