Add compile time feature-gate for lookups and removals #170
@@ -609,12 +638,14 @@ class index_punned_dense_gt {
auto typed_result = typed_->copy(config);
if (!typed_result)
    return result.failed(std::move(typed_result.error));
#if USEARCH_LOOKUP_LABEL
if (!result.index.free_ids_.reserve(free_ids_.size()))
    return result.failed(std::move(typed_result.error));
for (std::size_t i = 0; i != free_ids_.size(); ++i)
    result.index.free_ids_.push(free_ids_[i]);

result.index.labeled_lookup_ = labeled_lookup_;
Where would `copy` be used?
Can the old index be used while or after copying?
If so, shouldn't this also keep track of the old lookup table's lock?
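One reading of this question is that the copy should hold the source table's lock while reading it, so a concurrent writer cannot mutate the table mid-copy. A minimal sketch of that idea, assuming a hypothetical `toy_lookup_t` guarded by a `std::shared_mutex` (this is not how usearch actually locks `labeled_lookup_`):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical label -> slot table guarded by a reader-writer lock.
class toy_lookup_t {
  public:
    using label_t = std::uint64_t;

    toy_lookup_t() = default;

    // std::shared_mutex is not copyable, so the copy gets a fresh mutex.
    // Only the map is copied, while holding the *source's* lock in shared mode,
    // which blocks concurrent writers on `other` but not concurrent readers.
    toy_lookup_t(const toy_lookup_t& other) {
        std::shared_lock<std::shared_mutex> guard(other.mutex_);
        map_ = other.map_;
    }
    toy_lookup_t& operator=(const toy_lookup_t&) = delete; // keep the sketch small

    void insert(label_t label, std::size_t slot) {
        std::unique_lock<std::shared_mutex> guard(mutex_); // exclusive for writes
        map_.emplace(label, slot);
    }

    bool contains(label_t label) const {
        std::shared_lock<std::shared_mutex> guard(mutex_); // shared for reads
        return map_.count(label) != 0;
    }

  private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<label_t, std::size_t> map_;
};
```

After the copy returns, the two tables are independent: later insertions into the original are not visible in the copy.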
Thanks, @Ngalstyan4! It's a valid suggestion. We can also solve it differently, by adding a single template argument. Will discuss that today ;)
@Ngalstyan4, just to clarify, memory savings from removing the lookup table are more important than the ability to remove entries?
Right
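The "single template argument" alternative could look roughly like the sketch below. The class `toy_index_gt` and its `lookups_ak` parameter are hypothetical, not usearch's API. The idea is that the `false` instantiation carries no lookup table and no lookup code path, while both variants can coexist in one binary, whereas a single preprocessor switch fixes the choice for the whole build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch: one bool template argument decides whether the
// label -> slot lookup table exists at all.
template <bool lookups_ak>
class toy_index_gt {
  public:
    using label_t = std::uint64_t;

    void add(label_t label, std::vector<float> vector) {
        if constexpr (lookups_ak)
            lookup_.emplace(label, vectors_.size()); // slot of the vector appended below
        labels_.push_back(label);
        vectors_.push_back(std::move(vector));
    }

    std::size_t size() const noexcept { return labels_.size(); }

    // Only instantiated when called; with lookups_ak == false the call fails to compile.
    bool contains(label_t label) const {
        static_assert(lookups_ak, "contains() requires toy_index_gt<true>");
        return lookup_.count(label) != 0;
    }

  private:
    struct no_lookup_t {}; // stand-in member when lookups are compiled out
    using lookup_t =
        std::conditional_t<lookups_ak, std::unordered_map<label_t, std::size_t>, no_lookup_t>;

    std::vector<label_t> labels_;
    std::vector<std::vector<float>> vectors_;
    lookup_t lookup_;
};
```

Calling `contains` on a `toy_index_gt<false>` trips the `static_assert` at compile time, so the IDE and compiler still flag unsupported calls.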
# [0.23.0](v0.22.3...v0.23.0) (2023-08-05)

### Add

* `Matches` and `BatchMatches` simple API ([1b40f13](1b40f13))
* Add node offsets in a serialized file ([c600ffd](c600ffd))
* Batch add ([74860d6](74860d6))
* Batch add test ([5f99b05](5f99b05))
* Changing the metric at runtime ([d7bfac7](d7bfac7))
* Compactions ([434c1da](434c1da))
* efficiency estimate in `recall_members` ([64a60b4](64a60b4))
* Exact search shortcut ([a005084](a005084)), closes [#176](#176)
* Multi-`Index` lookups ([c5b7ccd](c5b7ccd))
* Parallel View ([ed3f845](ed3f845))
* Prefetching functionality for external memory ([b544ddb](b544ddb)), closes [#170](#170) [#171](#171)
* Streaming and in-memory serialization in C++ ([7da44a2](7da44a2))
* Vector alignment ([ea230e0](ea230e0))

### Break

* Final patches for 1.0 release ([8d557e2](8d557e2))

### Docs

* add descriptions of match-related classes ([637e5ef](637e5ef))
* Annotating C 99 and GoLang interfaces ([4b910a8](4b910a8))
* Documenting Python tests ([1f89e0a](1f89e0a))
* Shorten name ([9a6a01c](9a6a01c))
* Spelling and details ([6f25ed9](6f25ed9))
* Spelling and links ([20566e0](20566e0))
* TypeScript docs factual errors ([fe8103c](fe8103c))
* Update benchmarking sections ([96baa09](96baa09))

### Fix

* `reset` and serialization code ([11d7844](11d7844))
* Avoid exception in `.xbin` file is missing ([4863bea](4863bea))
* Avoid spawning needless threads ([9dff0fb](9dff0fb))
* Concurrent file access issues in tests ([5ae6db1](5ae6db1))
* Dead-lock on post-removal insertions ([284b058](284b058)), closes [#175](#175)
* Exception handling for `index_dense_metadata` ([d9627ba](d9627ba))
* Heap overflow for fractional-size scalars ([459abcd](459abcd))
* Imports in Python benchmarks ([cffe507](cffe507))
* Inferring OS-dependent file path in Python ([7743709](7743709)), closes [#174](#174)
* JavaScript bindings ([ee04856](ee04856))
* JS keys should be `bigint` ([e1fbec4](e1fbec4)), closes [#178](#178)
* Memory leak and multi-index lookup overflow ([597b0d5](597b0d5))
* Narrowing conversions for WASM 32-bit builds ([79add97](79add97))
* Portable way of matching 32-bit builds ([604e634](604e634))
* Progress reporting issue ([b2565e5](b2565e5))
* Reclaiming file descriptor ([05e908f](05e908f))
* Report error if `reserve` hasn't been called ([f94f358](f94f358))
* Typo in metric name ([34f5530](34f5530))
* Undefined behaviour on duplicate labels ([c04a5cc](c04a5cc))

### Improve

* `usearch_remove` C99 interface ([2072540](2072540))
* Align allocations to page size by default ([134a6f0](134a6f0))
* Broader types support in `usearch.io` ([b1a1439](b1a1439))
* Exposing search stats to users ([2779ffc](2779ffc))
* Feature-complete GoLang bindings ([e2058d1](e2058d1))
* More flexibility for Python args ([6aa06cb](6aa06cb))
* Out-of-bounds checks ([54cecb6](54cecb6))
* Task scheduling with STL threads ([9131287](9131287))

### Make

* Add CMake for C builds ([4d2127b](4d2127b))
* All targets enabled for debugging ([ea0f835](ea0f835))
* Build only WASM tests ([372738b](372738b))
* Typescript ([dacfbed](dacfbed))
* Upgrade to the newest SimSIMD ([368d853](368d853))

### Refactor

* `label_t` to `key_t` ([0d6c800](0d6c800))
* Add ([5d62180](5d62180))
* Index serialization in a file ([ba72585](ba72585))
* JS and GoLang tests ([a45fc40](a45fc40))
* Keep only batch requests in CPython ([44c0318](44c0318))
* Rename `f8` to `i8` to match IEEE ([c37f80b](c37f80b))
* Revert `Matches` ([5731e70](5731e70))
* Splitting proximity-graphs and vectors ([e996b38](e996b38))
* Use Executor instead of std::thread ([c3a3693](c3a3693))
* Vector alignment issue ([b02d0ad](b02d0ad))

### Test

* Set vector alignment ([0acb54a](0acb54a))
* Wrong buffer size caused illegal access ([830e280](830e280))
I am not sure you want this in upstream usearch, but here is why we use it:
For our use-case, we only need vector insertions and search.
Currently, usearch always builds a lookup table for vector labels at index creation time (e.g. here) to support `contains`, `get`, and `remove` queries. I think many use-cases do not need these (including ours), so it would be good not to pay the CPU and memory cost of building this lookup table. This PR feature-gates the code that builds the lookup table and all the methods that depend on it, so they can easily be disabled and their runtime cost removed.
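The gating pattern described above can be sketched with the preprocessor. The macro `TOY_LOOKUP_LABEL` and the class `toy_index_t` are hypothetical stand-ins for the PR's `USEARCH_LOOKUP_LABEL` and usearch's index; `remove` is omitted here but would be gated the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical switch mirroring the PR's USEARCH_LOOKUP_LABEL gate.
// Enabled by default unless the build passes -DTOY_LOOKUP_LABEL=0.
#ifndef TOY_LOOKUP_LABEL
#define TOY_LOOKUP_LABEL 1
#endif

// Toy flat "index": not usearch, just enough to show the gating pattern.
class toy_index_t {
  public:
    using label_t = std::uint64_t;

    void add(label_t label, std::vector<float> vector) {
        std::size_t slot = labels_.size();
        labels_.push_back(label);
        vectors_.push_back(std::move(vector));
#if TOY_LOOKUP_LABEL
        lookup_.emplace(label, slot); // maintained only when lookups are compiled in
#else
        (void)slot;
#endif
    }

    std::size_t size() const noexcept { return labels_.size(); }

#if TOY_LOOKUP_LABEL
    // These members do not exist when TOY_LOOKUP_LABEL == 0,
    // so any call site fails to compile instead of failing at runtime.
    bool contains(label_t label) const { return lookup_.count(label) != 0; }

    const std::vector<float>* get(label_t label) const {
        auto it = lookup_.find(label);
        return it == lookup_.end() ? nullptr : &vectors_[it->second];
    }
#endif

  private:
    std::vector<label_t> labels_;
    std::vector<std::vector<float>> vectors_;
#if TOY_LOOKUP_LABEL
    std::unordered_map<label_t, std::size_t> lookup_; // the memory we want to opt out of
#endif
};
```

Building with `-DTOY_LOOKUP_LABEL=0` drops the `lookup_` member and both lookup methods, so any remaining call to `contains` or `get` becomes a compile error.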
Perhaps the one-time cost of building the lookup table and the small cost of maintaining it are trivial for most use-cases. For ours, however, we need to re-initialize the usearch context for every search query, so minimizing its memory footprint is crucial. This is likely not the typical usearch workload, but it is one case to consider.
I considered leaving the code in and just adding a runtime flag to enable lookups, but preferred the compile-time approach because it lets the compiler and the IDE help ensure that only the available APIs are used.
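For contrast, the runtime-flag design that was considered might look like the hypothetical `toy_runtime_index_t` below: `contains` always compiles, misuse is only caught when it executes, and every instance still carries the `std::unordered_map` member even when lookups are off.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical runtime-flag variant, for contrast with the compile-time gate.
class toy_runtime_index_t {
  public:
    using label_t = std::uint64_t;

    explicit toy_runtime_index_t(bool lookups_enabled) : lookups_enabled_(lookups_enabled) {}

    void add(label_t label, std::vector<float> vector) {
        if (lookups_enabled_)
            lookup_.emplace(label, vectors_.size()); // slot of the vector appended below
        labels_.push_back(label);
        vectors_.push_back(std::move(vector));
    }

    // Always declared, so it type-checks everywhere; misuse surfaces only at run time.
    bool contains(label_t label) const {
        if (!lookups_enabled_)
            throw std::logic_error("lookups were disabled when this index was constructed");
        return lookup_.count(label) != 0;
    }

  private:
    bool lookups_enabled_;
    std::vector<label_t> labels_;
    std::vector<std::vector<float>> vectors_;
    std::unordered_map<label_t, std::size_t> lookup_; // present, if empty, in every instance
};
```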