
How to deal with different languages #373

Closed
rickgoud opened this issue Oct 2, 2020 · 4 comments

Comments

@rickgoud

rickgoud commented Oct 2, 2020

Hi,

Great repo! However, it only works for English. How would it work if we wanted to add classifiers for Dutch? What is the best way to add them, and how does Presidio know which classifiers to use depending on the language? Or will it always run all of them? That feels like overkill, so there should be a more intelligent way that I don't fully understand yet.

Any help would be greatly appreciated!!

Regards,
Rick

@omri374
Contributor

omri374 commented Oct 4, 2020

Hi @rickgoud, thanks for your comment!

To get started with additional languages, see here.
Note that you would have to create your own recognizers in Dutch. Presidio's architecture supports multiple languages, as well as the spaCy and Stanza NLP frameworks, which probably have Dutch language models (I haven't checked myself).

For best practices on developing new recognizers, see here.
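
To make "create your own recognizers in Dutch" concrete, here is a minimal, dependency-free sketch of the logic such a recognizer might contain. It is not Presidio code: the entity name `NL_BSN`, the `Match` class and the `find_bsn`/`is_valid_bsn` helpers are hypothetical. It finds 9-digit candidates and keeps only those that pass a simplified version of the "elfproef" checksum used for the Dutch citizen service number (BSN). In Presidio itself, the regex would presumably go into a `Pattern` inside a `PatternRecognizer` created with `supported_language="nl"`, with the checksum placed in an overridden `validate_result` method.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Match:
    entity_type: str
    start: int
    end: int
    text: str


# Candidate pattern: a standalone run of exactly nine digits.
BSN_CANDIDATE = re.compile(r"\b\d{9}\b")
# Elfproef weights: 9..2 for the first eight digits, -1 for the last one.
BSN_WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2, -1)


def is_valid_bsn(digits: str) -> bool:
    """Simplified elfproef: the weighted digit sum must be divisible by 11."""
    if len(digits) != 9 or not digits.isdigit():
        return False
    total = sum(w * int(d) for w, d in zip(BSN_WEIGHTS, digits))
    return total % 11 == 0


def find_bsn(text: str) -> list[Match]:
    """Return hypothetical NL_BSN matches whose digits pass the checksum."""
    return [
        Match("NL_BSN", m.start(), m.end(), m.group())
        for m in BSN_CANDIDATE.finditer(text)
        if is_valid_bsn(m.group())
    ]


print(find_bsn("Mijn BSN is 111222333, niet 123456789."))
```

Only `111222333` is reported here; `123456789` has the right length but fails the checksum, which is the kind of false positive a regex alone would let through.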

Each call to Presidio has a language parameter. Presidio assumes you know what language you're sending to it; if that's not the case, consider running a language detection step before calling Presidio. In addition, each recognizer (in charge of detecting one or more PII entities) is configured to support specific languages. In a nutshell, this is how Presidio can be set up to support multiple languages at once, or to call only the subset of recognizers that matches the input language.
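
The mechanism described above can be illustrated with a small toy model. This is a hypothetical, standard-library-only sketch, not Presidio's implementation: `Recognizer`, `Registry`, `analyze` and `detect_language` are made-up names, and the stopword-counting detector is only a stand-in for a real language detection library. Each recognizer declares a single `supported_language`, and a call for a given language runs only the matching recognizers, so a Dutch-only recognizer never runs on English input.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Recognizer:
    """Toy recognizer: one entity type, one supported language, one regex."""
    name: str
    entity: str
    supported_language: str
    pattern: str

    def analyze(self, text: str) -> list[tuple[str, int, int]]:
        return [(self.entity, m.start(), m.end()) for m in re.finditer(self.pattern, text)]


class Registry:
    def __init__(self) -> None:
        self.recognizers: list[Recognizer] = []

    def add(self, recognizer: Recognizer) -> None:
        self.recognizers.append(recognizer)

    def for_language(self, language: str) -> list[Recognizer]:
        # Only recognizers configured for the requested language are selected.
        return [r for r in self.recognizers if r.supported_language == language]


def analyze(registry: Registry, text: str, language: str) -> list[tuple[str, int, int]]:
    results: list[tuple[str, int, int]] = []
    for recognizer in registry.for_language(language):
        results.extend(recognizer.analyze(text))
    return results


# Hypothetical stand-in for a real language detector: count common stopwords.
STOPWORDS = {
    "en": {"the", "and", "is", "my", "of"},
    "nl": {"de", "het", "en", "is", "mijn", "van"},
}


def detect_language(text: str) -> str:
    words = re.findall(r"\w+", text.lower())
    scores = {lang: sum(w in stop for w in words) for lang, stop in STOPWORDS.items()}
    return max(scores, key=scores.get)


EMAIL = r"[\w.+-]+@[\w-]+\.[\w.]+"
registry = Registry()
registry.add(Recognizer("email_en", "EMAIL_ADDRESS", "en", EMAIL))
registry.add(Recognizer("email_nl", "EMAIL_ADDRESS", "nl", EMAIL))
registry.add(Recognizer("nl_mobile", "NL_PHONE", "nl", r"\b06-?\d{8}\b"))

text = "Mijn nummer is 06-12345678 en mijn e-mail is jan@voorbeeld.nl"
language = detect_language(text)  # pick the language first, then analyze
print(language, analyze(registry, text, language))
```

In Presidio, the corresponding pieces are the `language` argument of `AnalyzerEngine.analyze` and the language each recognizer is configured with. Registering the same email logic once per language (`email_en` and `email_nl` above) mirrors, as far as I know, Presidio's one-language-per-recognizer-instance setup.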

omri374 closed this as completed Oct 25, 2020
@Oheed911

I cannot find the files there. Can you provide the links, if they have been updated? Thanks

@omri374
Contributor

omri374 commented Dec 21, 2023

Hi @Oheed911, please see the updated link here: https://microsoft.github.io/presidio/analyzer/languages/

@Oheed911

Thanks!
