Cannot process Chinese correctly #99

TomoakiChenSinica · 2023-07-18T05:39:44Z

Language
Which language(s) this issue relates to.
Chinese

Describe the bug
A clear and concise description of what the bug is.
I cannot process Chinese sentences correctly.

To Reproduce
Steps to reproduce the behavior

I ran code like the code block under Screenshots.
I got a result like this, where the whole sentence comes back as a single PROPN token:

{"Language":"zh","Length":5,"Value":"往前走五步","TokensData":[[{"Bounds":[0,4],"Tag":"PROPN"}]]}

Expected behavior
A clear and concise description of what you expected to happen.
The sentence should be tokenized and tagged correctly.

Screenshots
If applicable, add a code example to help explain your problem.

Here is my code:

using System;
using Catalyst;
using Mosaik.Core;

Catalyst.Models.Chinese.Register(); // You need to pre-register each language (and install the respective NuGet packages)

Storage.Current = new DiskStorage("catalyst-models");
var nlp = await Pipeline.ForAsync(Language.Chinese);
var doc = new Document("諸葛亮是三國時代著名軍師", Language.Chinese);
nlp.ProcessSingle(doc);
Console.WriteLine(doc.ToJson());

Additional context
Thank you for your help!

TomoakiChenSinica added the language-bug label Jul 18, 2023
