Add compile time feature-gate for lookups and removals #170

Closed

Ngalstyan4
Contributor

I am not sure you want this in upstream usearch, but here is why we use it:

For our use-case, we only need vector insertions and search.
Currently, usearch always builds a lookup table for vector labels at index creation time (e.g. here) to support `contains`, `get`, and `remove` queries. I think many use-cases, including ours, do not need these, so it would be good not to pay the CPU and memory cost of creating this lookup table.

This PR feature-gates the index creation functions and all the methods depending on them so one can easily disable them and remove their runtime cost.
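
A minimal sketch of the idea, using a hypothetical toy class rather than the real usearch types. Only the macro name `USEARCH_LOOKUP_LABEL` and the member name `labeled_lookup_` come from this PR's diff; `toy_index_t` and its methods are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Same macro name as in the diff; this sketch assumes it defaults to "enabled".
#ifndef USEARCH_LOOKUP_LABEL
#define USEARCH_LOOKUP_LABEL 1
#endif

// Hypothetical toy index, NOT the real usearch class.
class toy_index_t {
    std::size_t dimensions_;
    std::vector<float> vectors_;        // flattened storage, `dimensions_` floats per entry
    std::vector<std::uint64_t> labels_; // label of every stored vector, by slot
#if USEARCH_LOOKUP_LABEL
    // label -> slot; exists only when the feature is compiled in
    std::unordered_map<std::uint64_t, std::size_t> labeled_lookup_;
#endif

  public:
    explicit toy_index_t(std::size_t dimensions) : dimensions_(dimensions) {}

    void add(std::uint64_t label, float const* vector) {
        assert(vector != nullptr);
#if USEARCH_LOOKUP_LABEL
        labeled_lookup_[label] = labels_.size(); // bookkeeping paid only when enabled
#endif
        labels_.push_back(label);
        vectors_.insert(vectors_.end(), vector, vector + dimensions_);
    }

    std::size_t size() const { return labels_.size(); }

#if USEARCH_LOOKUP_LABEL
    // Declared only when the lookup table exists.
    bool contains(std::uint64_t label) const { return labeled_lookup_.count(label) != 0; }
#endif
};
```

Building such a class with `-DUSEARCH_LOOKUP_LABEL=0` removes both the map member and `contains`, so any remaining call to `contains` fails to compile instead of failing at runtime.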

The one-time cost of creating the lookup table and the small cost of maintaining it are probably trivial for most use-cases. In our case, however, we need to re-initialize the usearch context for each search query, so it is crucial to minimize its memory footprint. This is likely not the typical case for usearch, but it is one worth considering.

I considered leaving the code in and just adding a runtime flag to enable lookups, but preferred this approach because the compiler and the IDE can then do more to ensure that only the available APIs are used.
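
For contrast, here is a rough sketch of what the runtime-flag alternative could look like. Every name in it (`runtime_gated_index_t`, `lookups_enabled_`) is hypothetical and does not come from usearch.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical runtime-flag variant, NOT the real usearch class.
class runtime_gated_index_t {
    bool lookups_enabled_;
    // The map member is always part of the object, even when it stays empty.
    std::unordered_map<std::uint64_t, std::size_t> labeled_lookup_;
    std::size_t count_ = 0;

  public:
    explicit runtime_gated_index_t(bool lookups_enabled) : lookups_enabled_(lookups_enabled) {}

    void add(std::uint64_t label) {
        if (lookups_enabled_)
            labeled_lookup_.emplace(label, count_);
        ++count_;
    }

    std::size_t size() const { return count_; }

    // Always callable; an empty optional means "lookups are disabled",
    // which the caller can only discover at runtime.
    std::optional<bool> contains(std::uint64_t label) const {
        if (!lookups_enabled_)
            return std::nullopt;
        return labeled_lookup_.count(label) != 0;
    }
};
```

With this design `contains` stays in the API regardless of the flag, so misuse shows up as a runtime result to check rather than as a compile error.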

@@ -609,12 +638,14 @@ class index_punned_dense_gt {
        auto typed_result = typed_->copy(config);
        if (!typed_result)
            return result.failed(std::move(typed_result.error));
#if USEARCH_LOOKUP_LABEL
        if (!result.index.free_ids_.reserve(free_ids_.size()))
            return result.failed(std::move(typed_result.error));
        for (std::size_t i = 0; i != free_ids_.size(); ++i)
            result.index.free_ids_.push(free_ids_[i]);

        result.index.labeled_lookup_ = labeled_lookup_;
Contributor Author

Where would `copy` be used?
Can the old index be used during or after copying?
If so, should this not also take the old lookup table's lock into account?
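
If the source index can indeed be modified while it is being copied, one possible approach is to hold the source's lock for the duration of the copy. The sketch below is an assumption-laden illustration: `guarded_lookup_t`, its mutex, and its methods are hypothetical and do not reflect how usearch guards `labeled_lookup_`.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical lock-guarded lookup table, NOT the real usearch type.
class guarded_lookup_t {
    mutable std::mutex mutex_;
    std::unordered_map<std::uint64_t, std::size_t> table_;

  public:
    guarded_lookup_t() = default;

    // Copying locks the SOURCE, so a concurrent insert into `other`
    // cannot mutate the map while it is being read.
    guarded_lookup_t(guarded_lookup_t const& other) {
        std::lock_guard<std::mutex> lock(other.mutex_);
        table_ = other.table_;
    }

    guarded_lookup_t& operator=(guarded_lookup_t const&) = delete;

    void insert(std::uint64_t label, std::size_t slot) {
        std::lock_guard<std::mutex> lock(mutex_);
        table_[label] = slot;
    }

    bool contains(std::uint64_t label) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return table_.count(label) != 0;
    }
};
```

The copy gets its own fresh mutex, so the two tables are independent once the copy constructor returns.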

@ashvardanian ashvardanian changed the base branch from main to main-dev July 31, 2023 08:42
@ashvardanian
Contributor

Thanks, @Ngalstyan4! It’s a valid suggestion. We can also solve it differently, by adding a single template argument. Will discuss that today ;)
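
The comment above does not say what the template argument would look like. One plausible reading, sketched here with entirely hypothetical names (`templated_index_t`, `with_lookups_ak`, `no_lookup_t`), is a boolean parameter that swaps the lookup table for an empty placeholder. It assumes C++17.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <unordered_map>

struct no_lookup_t {}; // empty placeholder used when lookups are disabled

// Hypothetical sketch, NOT the real usearch class.
template <bool with_lookups_ak = true> class templated_index_t {
    using lookup_t = std::conditional_t<with_lookups_ak,                                //
                                        std::unordered_map<std::uint64_t, std::size_t>, //
                                        no_lookup_t>;
    lookup_t labeled_lookup_;
    std::size_t count_ = 0;

  public:
    void add(std::uint64_t label) {
        // The discarded branch is never instantiated for `with_lookups_ak == false`.
        if constexpr (with_lookups_ak)
            labeled_lookup_.emplace(label, count_);
        ++count_;
    }

    std::size_t size() const { return count_; }

    // Member functions of class templates are instantiated only when used,
    // so this compiles for the disabled variant as long as nobody calls it.
    bool contains(std::uint64_t label) const {
        static_assert(with_lookups_ak, "lookups are disabled for this index type");
        return labeled_lookup_.count(label) != 0;
    }
};
```

Unlike a preprocessor gate, both variants can coexist in one binary, and calling `contains` on `templated_index_t<false>` is still rejected at compile time.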

@ashvardanian
Contributor

@Ngalstyan4, just to clarify: are the memory savings from removing the lookup table more important to you than the ability to remove entries?

@Ngalstyan4
Contributor Author

Right

ashvardanian pushed a commit that referenced this pull request Aug 5, 2023
# [0.23.0](v0.22.3...v0.23.0) (2023-08-05)

### Add

* `Matches` and `BatchMatches` simple API ([1b40f13](1b40f13))
* Add node offsets in a serialized file ([c600ffd](c600ffd))
* Batch add ([74860d6](74860d6))
* Batch add test ([5f99b05](5f99b05))
* Changing the metric at runtime ([d7bfac7](d7bfac7))
* Compactions ([434c1da](434c1da))
* efficiency estimate in `recall_members` ([64a60b4](64a60b4))
* Exact search shortcut ([a005084](a005084)), closes [#176](#176)
* Multi-`Index` lookups ([c5b7ccd](c5b7ccd))
* Parallel View ([ed3f845](ed3f845))
* Prefetching functionality for external memory ([b544ddb](b544ddb)), closes [#170](#170) [#171](#171)
* Streaming and in-memory serialization in C++ ([7da44a2](7da44a2))
* Vector alignment ([ea230e0](ea230e0))

### Break

* Final patches for 1.0 release ([8d557e2](8d557e2))

### Docs

* add descriptions of match-related classes ([637e5ef](637e5ef))
* Annotating C99 and GoLang interfaces ([4b910a8](4b910a8))
* Documenting Python tests ([1f89e0a](1f89e0a))
* Shorten name ([9a6a01c](9a6a01c))
* Spelling and details ([6f25ed9](6f25ed9))
* Spelling and links ([20566e0](20566e0))
* TypeScript docs factual errors ([fe8103c](fe8103c))
* Update benchmarking sections ([96baa09](96baa09))

### Fix

* `reset` and serialization code ([11d7844](11d7844))
* Avoid exception if `.xbin` file is missing ([4863bea](4863bea))
* Avoid spawning needless threads ([9dff0fb](9dff0fb))
* Concurrent file access issues in tests ([5ae6db1](5ae6db1))
* Dead-lock on post-removal insertions ([284b058](284b058)), closes [#175](#175)
* Exception handling for `index_dense_metadata` ([d9627ba](d9627ba))
* Heap overflow for fractional-size scalars ([459abcd](459abcd))
* Imports in Python benchmarks ([cffe507](cffe507))
* Inferring OS-dependent file path in Python ([7743709](7743709)), closes [#174](#174)
* JavaScript bindings ([ee04856](ee04856))
* JS keys should be `bigint` ([e1fbec4](e1fbec4)), closes [#178](#178)
* Memory leak and multi-index lookup overflow ([597b0d5](597b0d5))
* Narrowing conversions for WASM 32-bit builds ([79add97](79add97))
* Portable way of matching 32-bit builds ([604e634](604e634))
* Progress reporting issue ([b2565e5](b2565e5))
* Reclaiming file descriptor ([05e908f](05e908f))
* Report error if `reserve` hasn't been called ([f94f358](f94f358))
* Typo in metric name ([34f5530](34f5530))
* Undefined behaviour on duplicate labels ([c04a5cc](c04a5cc))

### Improve

* `usearch_remove` C99 interface ([2072540](2072540))
* Align allocations to page size by default ([134a6f0](134a6f0))
* Broader types support in `usearch.io` ([b1a1439](b1a1439))
* Exposing search stats to users ([2779ffc](2779ffc))
* Feature-complete GoLang bindings ([e2058d1](e2058d1))
* More flexibility for Python args ([6aa06cb](6aa06cb))
* Out-of-bounds checks ([54cecb6](54cecb6))
* Task scheduling with STL threads ([9131287](9131287))

### Make

* Add CMake for C builds ([4d2127b](4d2127b))
* All targets enabled for debugging ([ea0f835](ea0f835))
* Build only WASM tests ([372738b](372738b))
* Typescript ([dacfbed](dacfbed))
* Upgrade to the newest SimSIMD ([368d853](368d853))

### Refactor

* `label_t` to `key_t` ([0d6c800](0d6c800))
* Add ([5d62180](5d62180))
* Index serialization in a file ([ba72585](ba72585))
* JS and GoLang tests ([a45fc40](a45fc40))
* Keep only batch requests in CPython ([44c0318](44c0318))
* Rename `f8` to `i8` to match IEEE ([c37f80b](c37f80b))
* Revert `Matches` ([5731e70](5731e70))
* Splitting proximity-graphs and vectors ([e996b38](e996b38))
* Use Executor instead of std::thread ([c3a3693](c3a3693))
* Vector alignment issue ([b02d0ad](b02d0ad))

### Test

* Set vector alignment ([0acb54a](0acb54a))
* Wrong buffer size caused illegal access ([830e280](830e280))
@ashvardanian ashvardanian force-pushed the main-dev branch 5 times, most recently from 9163d79 to bc26b4e Compare August 10, 2023 11:10
@ashvardanian ashvardanian force-pushed the main-dev branch 5 times, most recently from 871b788 to 6461df2 Compare September 30, 2023 21:39
@ashvardanian ashvardanian force-pushed the main-dev branch 10 times, most recently from 06c4d75 to 74e98fa Compare October 24, 2023 21:26
@ashvardanian ashvardanian force-pushed the main-dev branch 2 times, most recently from d79b296 to 5ff8e83 Compare October 24, 2023 21:39
@ashvardanian ashvardanian force-pushed the main-dev branch 6 times, most recently from ff5cf1d to 7d17eab Compare November 13, 2023 20:35
@Ngalstyan4 Ngalstyan4 closed this Mar 3, 2024