Support CJK (multiple language) search #2052

HillLiu · 2023-03-28T01:47:15Z

Problem

For anyone who needs CJK search, there is a quick hack.

Proposed Solution

You just need to add two extra JS files.

Add them to additional-js in book.toml:

https://github.com/allfunc/docker-mdbook/blob/main/mdbook-demo/book.toml#L20

Try it directly with Docker:

curl https://raw.githubusercontent.com/allfunc/docker-mdbook/main/bin/preview.sh | bash -s -- start

Notes

The trick is to replace Elasticlunr.js with fzf.


CoralPink · 2023-03-28T04:45:25Z

This is great! It now responds to Japanese!

(Sorry, I used it without permission...😓)

wc7086 · 2023-05-14T10:52:27Z

ajitid/fzf-for-js#112

duskmoon314 · 2024-01-24T03:19:54Z

I just created a modified fork in this way, and it worked well. The modified version and the modified commit.

It is better to adjust the makeTeaser function according to fzf's score if we want to replace all of Elasticlunr's functions. But I still need to dig into it.

CoralPink · 2024-01-24T05:46:35Z

This topic is interesting to me as well, and I would like to see it develop!

I have always wanted to know how to reflect search scores.

duskmoon314 · 2024-01-25T02:57:27Z

> I have always wanted to know how to reflect search scores.

Fzf calculates the score according to how well the result matches the query and where the match occurs (if my understanding is correct).

I just took a look at makeTeaser. It uses a sliding window to extract the most valuable part containing the result, showing the user where the match occurs.

But it depends on two assumptions:

1. Sentences end with ". " (a dot followed by whitespace).
2. Words can be split on " " (whitespace).

As far as I know, these two assumptions do not hold for Chinese and Japanese. makeTeaser can still be used with Chinese, but it produces a long context with incorrectly emphasized phrases.
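A small sketch of why the whitespace assumption breaks: splitting on spaces returns a Japanese sentence as one giant "word". The built-in Intl.Segmenter is one possible alternative. Using it here is my own assumption, not something mdBook or the fork does, and the helper names are hypothetical.

```javascript
// Split text the way the whitespace assumption expects.
function splitOnWhitespace(text) {
    return text.split(/\s+/).filter((w) => w.length > 0);
}

// Assumption: use the built-in Intl.Segmenter for locale-aware word
// boundaries. Available in modern browsers and recent Node.js versions.
function segmentWords(text, locale) {
    const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
    return Array.from(segmenter.segment(text))
        .filter((s) => s.isWordLike) // drop spaces and punctuation
        .map((s) => s.segment);
}

const english = "Search works well here";
const japanese = "検索はここでうまく動きます";

console.log(splitOnWhitespace(english).length);  // 4
console.log(splitOnWhitespace(japanese).length); // 1 (the whole sentence)
console.log(segmentWords(japanese, "ja"));       // several shorter segments
```

The exact Japanese segments depend on the ICU data shipped with the runtime, so they should not be treated as stable.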

To my understanding, we must figure out how makeTeaser should work in different languages to push this issue further.
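One possible direction, as a rough sketch only: build the teaser from a window of characters around the first literal match instead of relying on sentence and word boundaries. makeCharTeaser and its radius parameter are hypothetical, and this is not how mdBook's makeTeaser works.

```javascript
// Hypothetical helper: cut a character window around the first literal match.
// Caveats: fzf matches fuzzily, so the term may not appear literally (we then
// fall back to the start of the body); no HTML escaping is done; slicing by
// UTF-16 index can split characters outside the Basic Multilingual Plane.
function makeCharTeaser(body, term, radius = 10) {
    const at = body.indexOf(term);
    if (at === -1) {
        return body.slice(0, radius * 2);
    }
    const start = Math.max(0, at - radius);
    const end = Math.min(body.length, at + term.length + radius);
    const prefix = start > 0 ? "…" : "";
    const suffix = end < body.length ? "…" : "";
    return (
        prefix +
        body.slice(start, at) +
        "<em>" + term + "</em>" +
        body.slice(at + term.length, end) +
        suffix
    );
}

console.log(makeCharTeaser("abcdefghij", "ef", 2)); // …cd<em>ef</em>gh…
```

Because the window is counted in characters, it does not care whether the text contains spaces or ". " sentence endings.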

CoralPink · 2024-01-25T06:37:10Z

I see!
To be honest, I was using this code without really understanding it😅

By the way, this is a bit off topic, but .......

https://rust-lang.github.io/mdBook/format/configuration/renderers.html?highlight=score#outputhtmlsearch

I think this is an elasticlunr option that can be changed by the user through book.toml, but is the score boost applied the same way in fzf?

duskmoon314 · 2024-01-25T07:06:08Z

> I think this is an elasticlunr option that can be changed by the user through book.toml, but is the score boost applied the same way in fzf?

No. There are not many options for fzf's searching function. I think the way to calculate the score is taken from the original Go implementation, and the API doesn't provide a way to control it. API doc

> To be honest, I was using this code without really understanding it

My understanding:

function fzfLoad(index) {
    // The argument `index` is generated by crate elasticlunr-rs
    // It contains all pages with their title, breadcrumbs, contents, and some metadata

    // Extract docs from index
    // `docs` is an obj: { [id: number]: { title, breadcrumbs, body, ... } }
    const docs = index.documentStore.docs;

    // Init fzf
    // The first argument is the list to search. (I have tried using Object.entries, but I failed to get it to work)
    // The second argument is the option. `selector` tells fzf what the real content to search.
    const fzf = new Fzf(Object.keys(docs), {
        selector: (id) => {

            // These lines concatenate title, breadcrumbs and body to let fzf search them at once.
            const doc = docs[id];
            return `${doc.title} ${doc.breadcrumbs} ${doc.body}`
        }
    });

    // To be compatible with the original elasticlunr's usage,
    // this obj is needed to provide a `search` method.
    return {

        // `search` takes two arguments, the `term` to search and `options` for elasticlunr
        search: (term, _options) => {

            // We use fzf to search the term
            const entries = fzf.find(term);

            // Then we form the result
            // `doc` and `ref` are used to make the teaser and the link
            const res = entries.map((entry) => {
                // Each entry also carries `score` and `positions`; only `item` is needed here.
                const { item } = entry;
                return {
                    doc: docs[item],
                    ref: item,
                }
            });
            return res;
        }
    }
}

CoralPink · 2024-01-25T08:18:16Z

> No. There are not many options for fzf's searching function. I think the way to calculate the score is taken from the original Go implementation, and the API doesn't provide a way to control it. API doc

Ugh, I knew it!
If I wanted to use the score boost, I would have to change the order myself 😱

... On the other hand, you mean that we might be able to do it if we really wanted to.
(I don't know if it's worth the hassle 😮)
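For what it's worth, changing the order by hand could look roughly like the sketch below. It assumes each entry has the { item, score } shape returned by fzf.find() and that a higher score is better. rerankWithBoost and its boost parameter are hypothetical; the default values only mirror the boost-title and boost-hierarchy options from the mdBook docs, and the multiplication is an arbitrary choice, not what elasticlunr does.

```javascript
// Hypothetical re-ranking step: multiply the fzf score by a factor when the
// term literally appears in the title or breadcrumbs, then sort descending.
// `docs` maps ids to { title, breadcrumbs, body }, as in fzfLoad.
function rerankWithBoost(entries, docs, term, boost = { title: 2, breadcrumbs: 1 }) {
    const needle = term.toLowerCase();
    return entries
        .map((entry) => {
            const doc = docs[entry.item];
            let factor = 1;
            if (doc.title.toLowerCase().includes(needle)) factor += boost.title;
            if (doc.breadcrumbs.toLowerCase().includes(needle)) factor += boost.breadcrumbs;
            return { ...entry, boosted: entry.score * factor };
        })
        .sort((a, b) => b.boosted - a.boosted);
}

// Usage with made-up data:
const sampleDocs = {
    a: { title: "Intro", breadcrumbs: "Intro", body: "how search works" },
    b: { title: "Search", breadcrumbs: "Guide » Search", body: "details" },
};
const sampleEntries = [
    { item: "a", score: 10 },
    { item: "b", score: 6 },
];
const ranked = rerankWithBoost(sampleEntries, sampleDocs, "search");
console.log(ranked.map((e) => e.item)); // [ 'b', 'a' ]
```

Inside fzfLoad's search method, this would run on the result of fzf.find(term) before mapping the entries to { doc, ref }.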

Thanks for chatting with us!
I'm getting a little curious about search logic!


madjxatw · 2024-02-06T04:47:34Z

It works, but with one small flaw: the matched words are not bolded/highlighted.
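A possible way to get highlighting back, as a sketch only: fzf result entries include a `positions` set holding the indices of the matched characters, and those can be wrapped in tags. highlightPositions is a hypothetical helper, and the choice of <mark> is mine.

```javascript
// Hypothetical helper: wrap runs of matched character indices in <mark>.
// `positions` is a Set of indices into `text`. No HTML escaping is done.
function highlightPositions(text, positions) {
    let out = "";
    let open = false;
    for (let i = 0; i < text.length; i++) {
        const hit = positions.has(i);
        if (hit && !open) {
            out += "<mark>";
            open = true;
        }
        if (!hit && open) {
            out += "</mark>";
            open = false;
        }
        out += text[i];
    }
    if (open) out += "</mark>";
    return out;
}

console.log(highlightPositions("abcdef", new Set([1, 2, 4])));
// a<mark>bc</mark>d<mark>e</mark>f
```

Note that with the fzfLoad snippet above, the positions index into the concatenated "title breadcrumbs body" string, so the offsets would have to be shifted before applying them to the body alone.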

HillLiu added the C-enhancement Category: Enhancement or feature request label Mar 28, 2023

CoralPink added a commit to CoralPink/commentary that referenced this issue Mar 28, 2023

I hear it can handle Japanese searches, so I'll give it a try 🦖

8e94446

refs: Support CJK (mutiple language) search #2052 rust-lang/mdBook#2052

CoralPink added a commit to CoralPink/commentary that referenced this issue Mar 28, 2023

I hear it can handle Japanese searches, so I'll give it a try 🦖

f52a941

refs: Support CJK (mutiple language) search #2052 rust-lang/mdBook#2052

wc7086 mentioned this issue May 14, 2023

Add Chinese search functionality (添加中文搜索功能) sunface/rust-course#86

Open

makenowjust added a commit to makenowjust/kantan-regex-book that referenced this issue Jan 29, 2024

Enable CJK search

3a3ed54

Ref rust-lang/mdBook#2052

makenowjust mentioned this issue Jan 29, 2024

Enable CJK search makenowjust/kantan-regex-book#6

Merged

ehuss added the A-Search Area: Search label Feb 13, 2024

duskmoon314 mentioned this issue May 22, 2024

Non-English search support #2393

Open

9 tasks
