Cannot process Chinese correctly #99

Open
TomoakiChenSinica opened this issue Jul 18, 2023 · 0 comments

Language
Chinese

Describe the bug
I cannot process Chinese sentences correctly: Catalyst does not split them into words.

To Reproduce
  1. Run code like the snippet under Screenshots.
  2. The output is:
{"Language":"zh","Length":5,"Value":"往前走五步","TokensData":[[{"Bounds":[0,4],"Tag":"PROPN"}]]}
The entire five-character input is returned as a single token (Bounds [0,4]) tagged PROPN instead of being segmented into words.

Expected behavior
Chinese sentences should be split into individual word tokens, and each token should receive an appropriate part-of-speech tag.

Screenshots

Here is my code:

using System;
using Catalyst;
using Mosaik.Core;

Catalyst.Models.Chinese.Register(); // Each language must be registered first (and its NuGet package installed)

Storage.Current = new DiskStorage("catalyst-models");
var nlp = await Pipeline.ForAsync(Language.Chinese);
var doc = new Document("諸葛亮是三國時代著名軍師", Language.Chinese);
nlp.ProcessSingle(doc);
Console.WriteLine(doc.ToJson());

Additional context
Thank you for your help!
