Can you give some guides to use otherTree on Numeric Field? #115

Open
hengfeiyang opened this issue May 6, 2022 · 4 comments

@hengfeiyang

I'm trying to use another tree structure (B+Tree / BKD tree) in place of text tokens for numeric and DateTime fields.

Right now in Bluge everything is text or tokens, so it seems there is a lot of work to do.

Can you give me some guidance on this?

@mschoch
Member

mschoch commented May 6, 2022

Sure, so one of the last things I did for the Bleve project before leaving was to prototype something I called "index sections". The code for this is in this commit: blevesearch/bleve@674c535

The basic idea (as I recall, I have not read the details of this commit recently) was to introduce a new concept to the bleve index. Today, for a field there is just a single type of index, the inverted index. Our idea was that there should be support for other indexes, but because of the design of Bleve, we allow people to mix different data types in fields. So, we can't just say field X is type Y and use the appropriate index, because at runtime someone can hand us some document with a different type for that field.

Anyway, the idea was that all of the existing code folds into an implementation of an "inverted index" type. There are some basic things that all index sections need to support to co-exist with the inverted index, but we also allow for the new index sections to support new access methods.

So, this was my approach to extending the existing structures to support new sections in the index (sections that use new data structures appropriate for certain types of data). Something pretty similar should be possible, as the code is still roughly organized the same way.
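A rough sketch of the index-section idea in Go may help make this concrete. Every name here (`Section`, `invertedSection`, `numericSection`, `indexValue`) is hypothetical and is not taken from the commit. It only illustrates the point that a field cannot be bound to one type up front, so each value is offered to every section and a section accepts only the types it understands.

```go
package main

import "fmt"

// Section is a hypothetical interface, not the API from the bleve commit.
// Add reports whether this section was able to index the given value.
type Section interface {
	Name() string
	Add(docNum uint64, field string, value interface{}) bool
}

// invertedSection stands in for the existing inverted index: it accepts any
// value by turning it into a single term.
type invertedSection struct {
	postings map[string][]uint64 // "field:term" -> document numbers
}

func (s *invertedSection) Name() string { return "inverted" }

func (s *invertedSection) Add(docNum uint64, field string, value interface{}) bool {
	key := field + ":" + fmt.Sprint(value)
	s.postings[key] = append(s.postings[key], docNum)
	return true
}

// numericEntry pairs a numeric value with the document that contains it.
type numericEntry struct {
	value  float64
	docNum uint64
}

// numericSection only accepts float64 values and ignores everything else.
type numericSection struct {
	values map[string][]numericEntry // field -> numeric entries
}

func (s *numericSection) Name() string { return "numeric" }

func (s *numericSection) Add(docNum uint64, field string, value interface{}) bool {
	f, ok := value.(float64)
	if !ok {
		return false // not a number: leave it to the other sections
	}
	s.values[field] = append(s.values[field], numericEntry{value: f, docNum: docNum})
	return true
}

// indexValue offers the value to every section and returns the names of the
// sections that accepted it.
func indexValue(sections []Section, docNum uint64, field string, value interface{}) []string {
	var accepted []string
	for _, s := range sections {
		if s.Add(docNum, field, value) {
			accepted = append(accepted, s.Name())
		}
	}
	return accepted
}

func main() {
	sections := []Section{
		&invertedSection{postings: map[string][]uint64{}},
		&numericSection{values: map[string][]numericEntry{}},
	}
	fmt.Println(indexValue(sections, 1, "price", 9.5))   // [inverted numeric]
	fmt.Println(indexValue(sections, 2, "price", "n/a")) // [inverted]
}
```

The same field receives a number in one document and a string in another, and only the sections that can handle each value index it.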

If you have other ideas about how to go about this, please share.

@mschoch
Member

mschoch commented May 6, 2022

Also, I just remembered that the commit includes an example second index type, which uses a sorted slice and binary search for numeric values. This was not intended to be shipped, but rather to illustrate how a new index section would work.
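For illustration, a sorted slice with binary search for numeric range lookups could look roughly like the following. This is a minimal sketch written from the description above, not the code in the commit; the type and function names (`sortedNumericIndex`, `buildSortedNumericIndex`, `Range`) are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// entry pairs a numeric value with the document that contains it.
type entry struct {
	value  float64
	docNum uint64
}

// sortedNumericIndex keeps entries ordered by value (then by docNum).
type sortedNumericIndex struct {
	entries []entry
}

// buildSortedNumericIndex sorts all (docNum, value) pairs once, up front.
func buildSortedNumericIndex(values map[uint64]float64) *sortedNumericIndex {
	entries := make([]entry, 0, len(values))
	for docNum, v := range values {
		entries = append(entries, entry{value: v, docNum: docNum})
	}
	sort.Slice(entries, func(i, j int) bool {
		if entries[i].value != entries[j].value {
			return entries[i].value < entries[j].value
		}
		return entries[i].docNum < entries[j].docNum
	})
	return &sortedNumericIndex{entries: entries}
}

// Range returns the documents whose value v satisfies low <= v <= high,
// in value order. Two binary searches find the bounds of the matching run.
func (idx *sortedNumericIndex) Range(low, high float64) []uint64 {
	start := sort.Search(len(idx.entries), func(i int) bool {
		return idx.entries[i].value >= low
	})
	end := sort.Search(len(idx.entries), func(i int) bool {
		return idx.entries[i].value > high
	})
	if start >= end {
		return nil
	}
	docs := make([]uint64, 0, end-start)
	for _, e := range idx.entries[start:end] {
		docs = append(docs, e.docNum)
	}
	return docs
}

func main() {
	idx := buildSortedNumericIndex(map[uint64]float64{1: 3.5, 2: 10, 3: 7, 4: 42})
	fmt.Println(idx.Range(5, 20)) // [3 2]
}
```

A range lookup costs two binary searches plus a scan of the matching run, and it needs no per-value tokens.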

@hengfeiyang
Author

Thank you. If I make any progress, I'll let you know.

@hengfeiyang
Author

hengfeiyang commented May 8, 2022

Here is my proposed design. What do you think, @mschoch?

  1. Change the numeric field to behave like a keyword field and store the numeric value as a []byte, because we still need to search the field with field:value.
  2. Add a type attribute to the field. When the index is loaded, a BKD tree is created on the numeric field for range queries.
  3. Queries such as match or query_string (field:value) go through the default inverted index.
  4. Range queries go through the BKD tree.

With this design, range queries are supported without creating many tokens for each numeric field.
