Skip to content
This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

[On Hold] Add bit hamming #284

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

luyuncheng
Copy link

*Issue #283,

Description of changes:

As I see #264 that add Hamming distance in custom scoring it is a great functionality. i see there is bit_hamming space space_bit_hamming in nmslib. i think we can add this into plugin.

i refer to the code space_bit_hamming and space_bit_hamming_test, may be we could add "SpaceBitVector" into plugin and support bit_hamming space which is no optimized index.

i also refer to PR: #161 which add no optimized index for "negdotprod", i see the nmslib's python_binding code python_binding_nmslib, may be we could add a "save_data" into plugin and can store index and dataset for "no optimized index".

so, i submit this PR.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@codecov
Copy link

codecov bot commented Dec 18, 2020

Codecov Report

Merging #284 (c2abcb2) into master (8c802d9) will increase coverage by 1.62%.
The diff coverage is 85.07%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #284      +/-   ##
============================================
+ Coverage     79.27%   80.89%   +1.62%     
- Complexity      359      413      +54     
============================================
  Files            58       62       +4     
  Lines          1404     1581     +177     
  Branches        126      147      +21     
============================================
+ Hits           1113     1279     +166     
- Misses          243      249       +6     
- Partials         48       53       +5     
Impacted Files Coverage Δ Complexity Δ
...istroforelasticsearch/knn/index/v206/KNNIndex.java 74.71% <75.00%> (-2.04%) 18.00 <7.00> (+10.00) ⬇️
...index/codec/KNN80Codec/KNN80DocValuesConsumer.java 75.00% <83.33%> (+4.11%) 18.00 <0.00> (+3.00)
...relasticsearch/knn/index/KNNVectorFieldMapper.java 79.22% <86.66%> (+0.27%) 17.00 <0.00> (+1.00)
...nn/index/codec/KNN80Codec/KNN80CompoundFormat.java 96.15% <90.00%> (-3.85%) 4.00 <1.00> (+1.00) ⬇️
...forelasticsearch/knn/index/codec/KNNCodecUtil.java 87.50% <90.90%> (+4.16%) 5.00 <3.00> (+3.00)
...opendistroforelasticsearch/knn/index/KNNQuery.java 81.81% <100.00%> (+9.59%) 10.00 <2.00> (+3.00)
...pendistroforelasticsearch/knn/index/KNNWeight.java 97.67% <100.00%> (+0.53%) 13.00 <3.00> (+3.00)
...endistroforelasticsearch/knn/index/SpaceTypes.java 100.00% <100.00%> (ø) 7.00 <2.00> (+2.00)
...earch/knn/plugin/script/KNNWhitelistExtension.java 100.00% <0.00%> (ø) 3.00% <0.00%> (?%)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8c802d9...62455ed. Read the comment docs.

@jmazanec15
Copy link
Member

Hi @luyuncheng,

Thanks for the PR! I will take a look some time early next week. My concern about the "non-optimized" indices is that we will end up storing two different files for the HNSW structure in the Lucene index. I need to think about the implications of doing this some more.

@luyuncheng
Copy link
Author

Hi @luyuncheng,

Thanks for the PR! I will take a look some time early next week. My concern about the "non-optimized" indices is that we will end up storing two different files for the HNSW structure in the Lucene index. I need to think about the implications of doing this some more.

#161 in this PR, it Write footer for .dat file. I reuse this as "non-optimized" data storage.

As i checked the nmslib's code, When we add "non-optimized" data storage, K-NN plugin can support all nmslib's Space Types

@jmazanec15
Copy link
Member

@luyuncheng Apologies, have not gotten a chance to look at this yet. Will review in January.

luyuncheng and others added 2 commits January 6, 2021 21:40
@luyuncheng
Copy link
Author

luyuncheng commented Jan 6, 2021

Previously, this PR using string to parse the bit vector, this may occupy too much memory!
As i checked the code: space_bit_hamming.h in nmslib, i think use a int32 array to represent a bit vector is better. so, add modified some code to use vector as bit vector

@vamshin vamshin changed the title Add bit hamming [On Hold] Add bit hamming Jan 6, 2021
@vamshin
Copy link
Member

vamshin commented Jan 7, 2021

This Pr is put on hold. Please refer #283 (comment)

luyuncheng and others added 2 commits January 11, 2021 18:07
…us field create.

2. Free memory data in destructor
1. FIXED SpaceType Get From IndexSettings may be EMPTY when a anonymo…
Base automatically changed from master to main February 12, 2021 18:16
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants