Add compile time feature-gate for lookups and removals #170

Ngalstyan4 · 2023-07-31T08:34:37Z

I am not sure you want this in upstream usearch, but here is what we use it for:

For our use-case, we only need vector insertions and search.
Currently, usearch always builds a lookup table for vector labels at index creation time (e.g. here) to support `contains`, `get`, and `remove` queries. I think many use-cases do not need these (including ours), so it would be good not to pay the CPU and memory cost of creating this lookup table.

This PR feature-gates the index creation functions and all the methods depending on them so one can easily disable them and remove their runtime cost.

Perhaps the one-time lookup table creation cost and the small maintenance cost are trivial for most use-cases. In our case, however, we need to re-initialize the usearch context for each search query, so it is crucial to minimize its memory footprint. This is likely not the typical case for usearch, but it is one case to consider.

I considered leaving the code in and just adding a runtime flag to enable lookups, but I preferred this approach because the compiler and the IDE do more to help ensure that only available APIs are used.
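
For readers unfamiliar with the pattern, here is a minimal sketch of a compile-time gate. It is not the code from this PR: the class `tiny_index`, the macro `SKETCH_LOOKUP_LABEL`, and all member names are hypothetical and exist only for illustration (the actual diff uses `USEARCH_LOOKUP_LABEL`). With the macro set to 0, the lookup table and `contains` are not compiled at all, so calling `contains` is a compile error rather than a runtime failure.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical gate for this sketch; lookups are disabled unless it is set to 1.
#ifndef SKETCH_LOOKUP_LABEL
#define SKETCH_LOOKUP_LABEL 0
#endif

// Toy one-dimensional "index": insertion and search are always available,
// label lookups exist only when the gate is enabled.
class tiny_index {
  public:
    using label_t = std::uint64_t;

    std::size_t add(label_t label, float value) {
        std::size_t slot = values_.size();
        labels_.push_back(label);
        values_.push_back(value);
#if SKETCH_LOOKUP_LABEL
        lookup_[label] = slot; // maintained only when lookups are compiled in
#endif
        return slot;
    }

    // Linear scan for the stored value closest to the query.
    label_t search(float query) const {
        assert(!values_.empty() && "search requires at least one vector");
        std::size_t best = 0;
        for (std::size_t i = 1; i != values_.size(); ++i)
            if (std::fabs(values_[i] - query) < std::fabs(values_[best] - query))
                best = i;
        return labels_[best];
    }

    std::size_t size() const { return values_.size(); }

#if SKETCH_LOOKUP_LABEL
    bool contains(label_t label) const { return lookup_.count(label) != 0; }
#endif

  private:
    std::vector<label_t> labels_;
    std::vector<float> values_;
#if SKETCH_LOOKUP_LABEL
    std::unordered_map<label_t, std::size_t> lookup_;
#endif
};
```

Building with `-DSKETCH_LOOKUP_LABEL=1` brings back the table and `contains`; the default build carries neither.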

Ngalstyan4 · 2023-07-31T08:36:05Z

include/usearch/index_punned_dense.hpp

@@ -609,12 +638,14 @@ class index_punned_dense_gt {
        auto typed_result = typed_->copy(config);
        if (!typed_result)
            return result.failed(std::move(typed_result.error));
+#if USEARCH_LOOKUP_LABEL
        if (!result.index.free_ids_.reserve(free_ids_.size()))
            return result.failed(std::move(typed_result.error));
        for (std::size_t i = 0; i != free_ids_.size(); ++i)
            result.index.free_ids_.push(free_ids_[i]);

        result.index.labeled_lookup_ = labeled_lookup_;


Where would `copy` be used?
Can the old index be used during or after copying?
If so, shouldn't this also keep track of the old lookup table lock?

ashvardanian · 2023-07-31T08:46:25Z

Thanks, @Ngalstyan4! It’s a valid suggestion. We can also solve it differently, by adding a single template argument. Will discuss that today ;)
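
One way the template-argument alternative could look is sketched below. This is an assumption about the general shape, not the design that was discussed or adopted: `no_lookup`, `hash_lookup`, and `tiny_index_gt` are hypothetical names. It relies on the fact that member functions of a class template are only instantiated when they are used, so `contains` compiles away for the `no_lookup` policy unless someone calls it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical policy that stores nothing and offers no queries.
struct no_lookup {
    void insert(std::uint64_t, std::size_t) {}
};

// Hypothetical policy backed by a hash table from label to slot.
struct hash_lookup {
    std::unordered_map<std::uint64_t, std::size_t> slots;
    void insert(std::uint64_t label, std::size_t slot) { slots[label] = slot; }
    bool contains(std::uint64_t label) const { return slots.count(label) != 0; }
};

template <typename lookup_at = hash_lookup>
class tiny_index_gt {
  public:
    std::size_t add(std::uint64_t label) {
        std::size_t slot = labels_.size();
        labels_.push_back(label);
        lookup_.insert(label, slot); // a no-op for no_lookup
        return slot;
    }

    std::uint64_t label_at(std::size_t slot) const {
        assert(slot < labels_.size());
        return labels_[slot];
    }

    // Instantiated only if called; calling it with no_lookup fails to compile.
    bool contains(std::uint64_t label) const { return lookup_.contains(label); }

    std::size_t size() const { return labels_.size(); }

  private:
    std::vector<std::uint64_t> labels_;
    lookup_at lookup_;
};
```

Here `tiny_index_gt<no_lookup>` supports `add`, `label_at`, and `size` without building a table, while `tiny_index_gt<hash_lookup>` additionally supports `contains`. Compared with a preprocessor gate, both variants can coexist in one binary.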

See also: unum-cloud#170, unum-cloud#171

ashvardanian · 2023-08-04T15:03:12Z

@Ngalstyan4, just to clarify, memory savings from removing the lookup table are more important than the ability to remove entries?

Ngalstyan4 · 2023-08-04T15:49:40Z

Right

# [0.23.0](v0.22.3...v0.23.0) (2023-08-05)

### Add

* `Matches` and `BatchMatches` simple API ([1b40f13](1b40f13))
* Add node offsets in a serialized file ([c600ffd](c600ffd))
* Batch add ([74860d6](74860d6))
* Batch add test ([5f99b05](5f99b05))
* Changing the metric at runtime ([d7bfac7](d7bfac7))
* Compactions ([434c1da](434c1da))
* efficiency estimate in `recall_members` ([64a60b4](64a60b4))
* Exact search shortcut ([a005084](a005084)), closes [#176](#176)
* Multi-`Index` lookups ([c5b7ccd](c5b7ccd))
* Parallel View ([ed3f845](ed3f845))
* Prefetching functionality for external memory ([b544ddb](b544ddb)), closes [#170](#170) [#171](#171)
* Streaming and in-memory serialization in C++ ([7da44a2](7da44a2))
* Vector alignment ([ea230e0](ea230e0))

### Break

* Final patches for 1.0 release ([8d557e2](8d557e2))

### Docs

* add descriptions of match-related classes ([637e5ef](637e5ef))
* Annotating C 99 and GoLang interfaces ([4b910a8](4b910a8))
* Documenting Python tests ([1f89e0a](1f89e0a))
* Shorten name ([9a6a01c](9a6a01c))
* Spelling and details ([6f25ed9](6f25ed9))
* Spelling and links ([20566e0](20566e0))
* TypeScript docs factual errors ([fe8103c](fe8103c))
* Update benchmarking sections ([96baa09](96baa09))

### Fix

* `reset` and serialization code ([11d7844](11d7844))
* Avoid exception in `.xbin` file is missing ([4863bea](4863bea))
* Avoid spawning needless threads ([9dff0fb](9dff0fb))
* Concurrent file access issues in tests ([5ae6db1](5ae6db1))
* Dead-lock on post-removal insertions ([284b058](284b058)), closes [#175](#175)
* Excpetion handling for `index_dense_metadata` ([d9627ba](d9627ba))
* Heap overflow for fractional-size scalars ([459abcd](459abcd))
* Imports in Python benchmarks ([cffe507](cffe507))
* Inferring OS-dependant file path in Python ([7743709](7743709)), closes [#174](#174)
* JavaScript bindings ([ee04856](ee04856))
* JS keys should be `bigint` ([e1fbec4](e1fbec4)), closes [#178](#178)
* Memory leak and multi-index lookup overflow ([597b0d5](597b0d5))
* Narrowing conversions for WASM 32-bit builds ([79add97](79add97))
* Portable way of matching 32-bit builds ([604e634](604e634))
* Progress reporting issue ([b2565e5](b2565e5))
* Reclaiming file descriptor ([05e908f](05e908f))
* Report error if `reserve` hasn't been called ([f94f358](f94f358))
* Typo in metric name ([34f5530](34f5530))
* Undefined behaviour on duplicate labels ([c04a5cc](c04a5cc))

### Improve

* `usearch_remove` C99 interface ([2072540](2072540))
* Align allocations to page size by default ([134a6f0](134a6f0))
* Broader types support in `usearch.io` ([b1a1439](b1a1439))
* Exposing search stats to users ([2779ffc](2779ffc))
* Feature-complete GoLang bindings ([e2058d1](e2058d1))
* More flexibility for Python args ([6aa06cb](6aa06cb))
* Out-of-bounds checks ([54cecb6](54cecb6))
* Task scheduling with STL threads ([9131287](9131287))

### Make

* Add CMake for C builds ([4d2127b](4d2127b))
* All targets enabled for debugging ([ea0f835](ea0f835))
* Build only WASM tests ([372738b](372738b))
* Typescript ([dacfbed](dacfbed))
* Upgrade to the newest SimSIMD ([368d853](368d853))

### Refactor

* `label_t` to `key_t` ([0d6c800](0d6c800))
* Add ([5d62180](5d62180))
* Index serialization in a file ([ba72585](ba72585))
* JS and GoLang tests ([a45fc40](a45fc40))
* Keep only batch requests in CPython ([44c0318](44c0318))
* Rename `f8` to `i8` to match IEEE ([c37f80b](c37f80b))
* Revert `Matches` ([5731e70](5731e70))
* Splitting proximity-graphs and vectors ([e996b38](e996b38))
* Use Executor instead of std::thread ([c3a3693](c3a3693))
* Vector alignment issue ([b02d0ad](b02d0ad))

### Test

* Set vector alignment ([0acb54a](0acb54a))
* Wrong buffer size caused illegal access ([830e280](830e280))

Add compile time feature-gate for lookups and removals

95e4cf1

ashvardanian changed the base branch from main to main-dev July 31, 2023 08:42

Ngalstyan4 mentioned this pull request Jul 31, 2023

Add external retriever to usearch so vector nodes can be externally stored and managed #171

Closed

3 tasks

ashvardanian added a commit to ashvardanian/usearch that referenced this pull request Aug 4, 2023

Add: Prefetching functionality for external memory

b544ddb

See also: unum-cloud#170, unum-cloud#171

ashvardanian force-pushed the main-dev branch 5 times, most recently from 9163d79 to bc26b4e Compare August 10, 2023 11:10

ashvardanian force-pushed the main-dev branch from d880374 to 2306ff2 Compare September 10, 2023 10:34

ashvardanian force-pushed the main-dev branch 5 times, most recently from 871b788 to 6461df2 Compare September 30, 2023 21:39

ashvardanian force-pushed the main-dev branch 10 times, most recently from 06c4d75 to 74e98fa Compare October 24, 2023 21:26

ashvardanian force-pushed the main-dev branch 2 times, most recently from d79b296 to 5ff8e83 Compare October 24, 2023 21:39

ashvardanian force-pushed the main-dev branch 6 times, most recently from ff5cf1d to 7d17eab Compare November 13, 2023 20:35

Ngalstyan4 closed this Mar 3, 2024
