accent encoding / expected charcodes are wrong #2

drzraf · 2017-04-28T17:33:20Z

Most characters use the "\uXXXX" notation, but the values for XXXX are taken from the ISO version of the snowball files.
The FrenchStemmer r_un_accent function ends up using things like:
if (!sbp.e_s_b(1, "\u00E9"))
instead of
if (!sbp.e_s_b(1, "\uC3A9"))
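
To check how these two escapes relate to the character é, here is a minimal sketch. It assumes an environment that provides `TextEncoder` (modern browsers, Node.js 11+); the variable names are illustrative only and not part of the stemmer.

```javascript
// "é" written with the first escape from the snippet above.
const eAcute = "\u00E9";

// Code point stored in the JavaScript string, in hex.
const codePoint = eAcute.codePointAt(0).toString(16); // "e9"

// UTF-8 bytes of the same character, in hex.
const utf8Bytes = Array.from(
  new TextEncoder().encode(eAcute),
  (b) => b.toString(16)
); // ["c3", "a9"]

// The second escape is a single code unit U+C3A9, so the two literals differ.
const other = "\uC3A9";
const sameChar = other === eAcute; // false

console.log(codePoint, utf8Bytes, sameChar);
```

Since "\uXXXX" in a JavaScript string literal names a UTF-16 code unit rather than a UTF-8 byte sequence, comparing the values this way may help narrow down where the mismatch with the indexed text comes from.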

That's an issue for programs like https://github.com/MihaiValentin/lunr-languages, which intend to use this stemmer for client-side full-text search.

drzraf changed the title ~~accent are wrong~~ accent encoding / expected charcodes are wrong Apr 28, 2017

drzraf mentioned this issue Apr 28, 2017

trimmer MihaiValentin/lunr-languages#12

Closed
