Support CJK (multiple language) search #2052

Open
HillLiu opened this issue Mar 28, 2023 · 9 comments
Labels
A-Search Area: Search C-enhancement Category: Enhancement or feature request

Comments

@HillLiu

HillLiu commented Mar 28, 2023

Problem

For anyone who needs CJK search, there is a quick hack.

Proposed Solution

You only need to add two extra JS files.

Add them to additional-js in book.toml.

You can try it directly with Docker:

curl https://raw.githubusercontent.com/allfunc/docker-mdbook/main/bin/preview.sh | bash -s -- start

Notes

The trick is to replace Elasticlunr.js with fzf.

@HillLiu HillLiu added the C-enhancement Category: Enhancement or feature request label Mar 28, 2023
CoralPink added a commit to CoralPink/commentary that referenced this issue Mar 28, 2023
CoralPink added a commit to CoralPink/commentary that referenced this issue Mar 28, 2023
@CoralPink

This is great! It now responds to Japanese!

(Sorry, I used it without permission...😓)

@wc7086

wc7086 commented May 14, 2023

ajitid/fzf-for-js#112

@duskmoon314

I just created a modified fork this way, and it works well. See the modified version and the modified commit.

If we want to replace all of Elasticlunr's functionality, it would be better to adjust the makeTeaser function according to fzf's score, but I still need to dig into it.

@CoralPink

This topic is interesting to me as well, and I would like to see it develop!

I have always wanted to know how to reflect search scores.

@duskmoon314

> I have always wanted to know how to reflect search scores.

If my understanding is correct, fzf calculates the score from how well the result matches the query and where the match occurs.


I just took a look at makeTeaser. It uses a sliding window to extract the most valuable part of the text containing the match, showing the user where the match occurs.

But it depends on two assumptions:

  1. Sentences end with ". " (dot + whitespace).
  2. Words can be split on " " (whitespace).

As far as I know, neither assumption holds for Chinese or Japanese. makeTeaser can still be used with Chinese, but it produces a long context with incorrectly emphasized phrases.

To my understanding, we must figure out how makeTeaser should work in different languages to push this issue further.
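
To make the two assumptions concrete, here is a minimal sketch of how they fail on a Japanese string. It is not the makeTeaser code itself; `splitSentences` is a hypothetical helper that additionally treats CJK sentence punctuation as a boundary, and it does nothing about word segmentation.

```javascript
// Sample Japanese text: two sentences, no spaces, each ending with "。".
const text = "今日は天気がいいです。明日も晴れるでしょう。";

// Assumption 1: sentences end with ". " (dot + whitespace).
const sentencesByDot = text.split(". ");

// Assumption 2: words are separated by whitespace.
const wordsBySpace = text.split(" ");

// Hypothetical alternative: split right after CJK sentence punctuation,
// or after ASCII punctuation that is followed by whitespace.
function splitSentences(input) {
    return input
        .split(/(?<=[。！？])|(?<=[.!?])\s+/u)
        .map((s) => s.trim())
        .filter((s) => s.length > 0);
}

console.log(sentencesByDot.length); // 1: the whole text is one "sentence"
console.log(wordsBySpace.length);   // 1: the whole text is one "word"
console.log(splitSentences(text).length); // 2
```

With both splits returning a single chunk, a sliding window over "words" has nothing to slide over, which would explain the long teaser. Word boundaries in CJK text still need a real segmenter; this sketch only covers sentences.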

@CoralPink

I see!
To be honest, I was using this code without really understanding it😅

By the way, this is a bit off topic, but .......

https://rust-lang.github.io/mdBook/format/configuration/renderers.html?highlight=score#outputhtmlsearch

I think this is an elasticlunr option that can be changed by the user through book.toml, but is the score boost applied the same way in fzf?

@duskmoon314

> I think this is an elasticlunr option that can be changed by the user through book.toml, but is the score boost applied the same way in fzf?

No. fzf's search function does not have many options. I think the score calculation is taken from the original Go implementation, and the API doesn't provide a way to control it (see the API doc).

> To be honest, I was using this code without really understanding it

My understanding:

function fzfLoad(index) {
    // The argument `index` is generated by crate elasticlunr-rs
    // It contains all pages with their title, breadcrumbs, contents, and some metadata

    // Extract docs from index
    // `docs` is an obj: { [id: number]: { title, breadcrumbs, body, ... } }
    const docs = index.documentStore.docs;

    // Init fzf
    // The first argument is the list to search. (I tried using Object.entries but could not get it to work.)
    // The second argument is the options object. `selector` tells fzf which string to search for each item.
    const fzf = new Fzf(Object.keys(docs), {
        selector: (id) => {

            // These lines concatenate title, breadcrumbs and body to let fzf search them at once.
            const doc = docs[id];
            return `${doc.title} ${doc.breadcrumbs} ${doc.body}`
        }
    });

    // To be compatible with the original elasticlunr's usage,
    // this obj is needed to provide a `search` method.
    return {

        // `search` takes two arguments, the `term` to search and `options` for elasticlunr
        search: (term, _options) => {

            // We use fzf to search the term
            const entries = fzf.find(term);

            // Then we form the result
            // `doc` and `ref` are used to make the teaser and the link
            const res = entries.map((entry) => {
                // Each entry also carries a `score`, which is not used here
                const { item } = entry;
                return {
                    doc: docs[item],
                    ref: item,
                }
            });
            return res;
        }
    }
}

@CoralPink

> No. There are not many options for fzf's searching function. I think the way to calculate the score is taken from the original Go implementation, and the API doesn't provide a way to control it. API doc

Ugh, I knew it!
If I wanted to use the score boost, I would have to change the order myself 😱

On the other hand, that means we might be able to do it if we really wanted to.
(I don't know if it's worth the hassle 😮)
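
Changing the order by hand could look roughly like the sketch below. It assumes, hypothetically, that each field (title, body, ...) is searched separately, for example with one Fzf instance per field, and that each result entry exposes `item` and `score`. `applyBoost` and the shape of `boosts` are made up for illustration and are not part of mdBook or fzf.

```javascript
// Hypothetical helper: combine per-field result lists into one ranked list.
// `resultsByField` maps a field name to an array of { item, score } entries.
// `boosts` maps a field name to a numeric weight (missing fields default to 1).
function applyBoost(resultsByField, boosts) {
    const totals = new Map();
    for (const [field, entries] of Object.entries(resultsByField)) {
        const weight = boosts[field] ?? 1;
        for (const { item, score } of entries) {
            totals.set(item, (totals.get(item) ?? 0) + weight * score);
        }
    }
    // Highest combined score first.
    return [...totals.entries()]
        .sort((a, b) => b[1] - a[1])
        .map(([item, score]) => ({ item, score }));
}

// Stand-in data shaped like search results; the scores are invented.
const ranked = applyBoost(
    {
        title: [{ item: "2", score: 40 }],
        body: [{ item: "1", score: 60 }, { item: "2", score: 30 }],
    },
    { title: 2, body: 1 }
);
console.log(ranked.map((r) => r.item)); // [ '2', '1' ]
```

Document "2" wins because its title match is doubled (2 * 40 + 30 = 110) against 60 for document "1". Whether scores from separate searches are comparable enough to add up like this is an open question.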

Thanks for chatting with us!
I'm getting a little curious about search logic!

makenowjust added a commit to makenowjust/kantan-regex-book that referenced this issue Jan 29, 2024
@madjxatw

madjxatw commented Feb 6, 2024

It works, but with one small flaw: the matched words are not bolded/highlighted.
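
One possible direction for the highlighting, as a sketch only: fzf-for-js result entries carry a `positions` set holding the indices of the matched characters in the searched string. `highlightMatches` and `escapeHtml` below are hypothetical helpers, not part of mdBook or fzf, and wiring them into the teaser code is not shown. The sketch assumes the indices line up with JavaScript string indices, which may not hold for characters outside the Basic Multilingual Plane.

```javascript
// Hypothetical helper: escape characters that are special in HTML.
function escapeHtml(s) {
    const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };
    return s.replace(/[&<>"]/g, (c) => map[c]);
}

// Hypothetical helper: wrap runs of matched characters in <mark> tags.
// `positions` is a Set of character indices into `text`.
function highlightMatches(text, positions) {
    let html = "";
    let open = false;
    for (let i = 0; i < text.length; i++) {
        const hit = positions.has(i);
        if (hit && !open) html += "<mark>";
        if (!hit && open) html += "</mark>";
        open = hit;
        html += escapeHtml(text[i]);
    }
    return open ? html + "</mark>" : html;
}

// Stand-in for a result entry; the positions are invented for the example.
console.log(highlightMatches("検索機能のテスト", new Set([0, 1])));
// <mark>検索</mark>機能のテスト
```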

@ehuss ehuss added the A-Search Area: Search label Feb 13, 2024

6 participants