
Buffers, IP, Factories & more #40

Merged: 16 commits, Sep 20, 2023


@asilvas (Contributor) commented Sep 5, 2023

Resolves #37 and much more:

  • Considerable changes to accommodate new features and future growth (which wouldn't have made sense had I not added the new features in the same PR)
  • Support for IndexFlatIP
  • Support for serialization via toBuffer & fromBuffer (10x+ faster than re-adding vectors, but still somehow 2x slower than reading from disk)
  • Support for factory indexes to unlock most of faiss's potential
  • Support for training

Sorry for the big sweeping changes. I don't like to put this much into one PR, but due to considerable complications with NAPI they were necessary.
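To make the IndexFlatIP and toBuffer/fromBuffer items above concrete without depending on faiss-node itself, here is a minimal, self-contained sketch of what an exact ("flat") index does under the two metrics, plus a byte-buffer round trip. The class ToyFlatIndex and its binary layout (three little-endian uint32 header fields followed by float32 data) are hypothetical and are not faiss-node's API or faiss's serialization format; the metric codes 0 and 1 follow the MetricType values quoted later in this thread.

```javascript
// Hypothetical toy index for illustration only: NOT faiss-node's API and
// NOT faiss's serialization format.
// Metric codes follow faiss::MetricType: 0 = inner product, 1 = L2.
const METRIC_INNER_PRODUCT = 0;
const METRIC_L2 = 1;

class ToyFlatIndex {
  constructor(d, metric = METRIC_L2) {
    this.d = d;
    this.metric = metric;
    this.data = []; // all vectors concatenated, d floats each
  }

  ntotal() {
    return this.data.length / this.d;
  }

  add(vectors) {
    if (vectors.length % this.d !== 0) {
      throw new Error('vectors.length must be a multiple of d');
    }
    for (const x of vectors) this.data.push(x);
  }

  // Exact (brute-force) search for one query vector.
  // Inner product: higher is better. L2: lower squared distance is better
  // (faiss's IndexFlatL2 likewise reports squared distances).
  search(query, k) {
    const useIp = this.metric === METRIC_INNER_PRODUCT;
    const scored = [];
    for (let i = 0; i < this.ntotal(); i++) {
      let s = 0;
      for (let j = 0; j < this.d; j++) {
        const x = this.data[i * this.d + j];
        s += useIp ? x * query[j] : (x - query[j]) ** 2;
      }
      scored.push({ label: i, score: s });
    }
    scored.sort((a, b) => (useIp ? b.score - a.score : a.score - b.score));
    const top = scored.slice(0, k);
    return { labels: top.map((r) => r.label), distances: top.map((r) => r.score) };
  }

  // Layout: d, metric, n as little-endian uint32, then float32 values.
  toBuffer() {
    const bytes = new Uint8Array(12 + 4 * this.data.length);
    const view = new DataView(bytes.buffer);
    view.setUint32(0, this.d, true);
    view.setUint32(4, this.metric, true);
    view.setUint32(8, this.ntotal(), true);
    this.data.forEach((x, i) => view.setFloat32(12 + 4 * i, x, true));
    return bytes;
  }

  static fromBuffer(bytes) {
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const index = new ToyFlatIndex(view.getUint32(0, true), view.getUint32(4, true));
    const count = view.getUint32(8, true) * index.d;
    for (let i = 0; i < count; i++) index.data.push(view.getFloat32(12 + 4 * i, true));
    return index;
  }
}

// Same two vectors, different metrics, different nearest neighbour.
const ip = new ToyFlatIndex(2, METRIC_INNER_PRODUCT);
ip.add([1, 0, 0, 3]);
console.log(ip.search([1, 1], 1).labels); // [ 1 ]: [0, 3] has the larger inner product
const l2 = new ToyFlatIndex(2, METRIC_L2);
l2.add([1, 0, 0, 3]);
console.log(l2.search([1, 1], 1).labels); // [ 0 ]: [1, 0] is closer in L2
const restored = ToyFlatIndex.fromBuffer(ip.toBuffer());
console.log(restored.ntotal()); // 2
```

Restoring from a buffer skips re-adding vectors one call at a time, which is the kind of speedup the PR description reports for the real toBuffer/fromBuffer; the toy format here says nothing about how faiss lays out its bytes.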

@ewfian (Owner) commented Sep 5, 2023

@asilvas Thanks a lot for this great PR, please allow me some time to review.

src/faiss.cc Outdated
{
Napi::Env env = info.Env();
if (info[0].IsExternal())

if (!info.IsConstructCall())
asilvas (Contributor Author):

Greatly simplified the constructor's responsibilities and removed the need to pass around external data. Responsibility for each feature now lives in a separate function.

@asilvas (Contributor Author) commented Sep 9, 2023

Let me know if there's anything I can do to move this PR forward. Thanks!

@ewfian (Owner) commented Sep 11, 2023

@asilvas Sorry for the slow reply; I've been busy with work recently. I'll deal with it later this week.

});

it('Flat /w IP', () => {
const index = Index.fromFactory(2, 'Flat', 0 /* METRIC_INNER_PRODUCT */);
ewfian (Owner):

Add a MetricType enum to lib/index.js so that we can use the enum directly, e.g.:

const faiss = require('bindings')('faiss-node');

faiss.MetricType = void 0;
var MetricType;
(function (MetricType) {
    MetricType[MetricType["METRIC_INNER_PRODUCT"] = 0] = "METRIC_INNER_PRODUCT";
    MetricType[MetricType["METRIC_L2"] = 1] = "METRIC_L2";
    MetricType[MetricType["METRIC_L1"] = 2] = "METRIC_L1";
    MetricType[MetricType["METRIC_Linf"] = 3] = "METRIC_Linf";
    MetricType[MetricType["METRIC_Lp"] = 4] = "METRIC_Lp";
    MetricType[MetricType["METRIC_Canberra"] = 20] = "METRIC_Canberra";
    MetricType[MetricType["METRIC_BrayCurtis"] = 21] = "METRIC_BrayCurtis";
    MetricType[MetricType["METRIC_JensenShannon"] = 22] = "METRIC_JensenShannon";
    MetricType[MetricType["METRIC_Jaccard"] = 23] = "METRIC_Jaccard";
})(MetricType || (faiss.MetricType = MetricType = {}));

module.exports = faiss;
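Since lib/index.js is plain JavaScript, the TypeScript-style IIFE above is one option; a frozen object literal is another. A minimal sketch under that assumption, reusing the values above (metricName is a hypothetical helper that recovers the value-to-name lookup the IIFE provides through MetricType[value]):

```javascript
// Values mirror faiss::MetricType as listed in the snippet above.
// In lib/index.js this would be exported with `faiss.MetricType = MetricType;`
// (not done here, so the sketch runs without the native binding).
const MetricType = Object.freeze({
  METRIC_INNER_PRODUCT: 0,
  METRIC_L2: 1,
  METRIC_L1: 2,
  METRIC_Linf: 3,
  METRIC_Lp: 4,
  METRIC_Canberra: 20,
  METRIC_BrayCurtis: 21,
  METRIC_JensenShannon: 22,
  METRIC_Jaccard: 23,
});

// Hypothetical helper: reverse lookup from numeric value to member name.
function metricName(value) {
  const entry = Object.entries(MetricType).find(([, v]) => v === value);
  return entry ? entry[0] : undefined;
}

console.log(MetricType.METRIC_INNER_PRODUCT); // 0
console.log(metricName(1)); // METRIC_L2
```

Either form would let a test write Index.fromFactory(2, 'Flat', MetricType.METRIC_INNER_PRODUCT) instead of a bare 0. The frozen object gives up the IIFE's built-in reverse mapping in exchange for immutability: assignments to its members are ignored, or throw in strict mode.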

src/faiss.cc Outdated
std::string fname;
if (info[0].IsExternal())
{
fname = *info[0].As<Napi::External<std::string>>().Data();
ewfian (Owner):

It seems this branch will never be reached. Maybe we can get fname directly:
std::string fname = info[0].As<Napi::String>().Utf8Value();

src/faiss.cc Outdated
std::string description;
if (info[1].IsExternal())
{
description = *info[1].As<Napi::External<std::string>>().Data();
ewfian (Owner):

Same as the previous comment.

src/faiss.cc Outdated
static Napi::Object NewInstance(Napi::Env env, const std::vector<napi_value> &args)
{
Napi::EscapableHandleScope scope(env);
Napi::Object obj = GetInstanceData(env, T::CLASS_NAME).New(args);
ewfian (Owner), Sep 17, 2023:

I found earlier, in this experimental branch, that it is simpler to store the pointer to the constructor like this:
https://github.com/ewfian/faiss-node/blob/feat/support-IndexFlatIP/src/faiss.cc#L71

inline static Napi::FunctionReference *constructor = new Napi::FunctionReference();

This eliminates the need for GetInstanceData and AttachInstanceData.

Could you modify it as above?

asilvas (Contributor Author), Sep 17, 2023:

My original changes used that pattern, but I ran into stability issues with the Bun runtime, so I opted for the more complex pattern. That resolved one of the bugs, but stability issues remain, so I'll revert for now until Bun is more reliable with NAPI packages.

src/faiss.cc Outdated

static Napi::Object Init(Napi::Env env, Napi::Object exports)
{
Napi::HandleScope scope(env);
ewfian (Owner):

It seems that HandleScope is no longer needed.

});
// clang-format on

constructor = new Napi::FunctionReference();
asilvas (Contributor Author):

Not as clean as constructing within Base, but that was causing stability issues. I don't fully understand the difference, but this works around whatever was going on.

ewfian (Owner):

What kind of stability issues are they, and how can I reproduce them?

asilvas (Contributor Author):

The Bun runtime (https://bun.sh/) segfaults under bun test.

ewfian (Owner):

bun test still fails in test/IndexFlatL2.test.js:
[screenshot]

After removing this case, it works:
[screenshot]

In my testing, the following code is never reached:
[screenshot]

and the error was thrown internally by napi:
[screenshot]

asilvas (Contributor Author):

Yeah, I'm fine merging with that issue for now, since I've reported it on the Bun side: oven-sh/bun#4526

ewfian (Owner):

Thank you. I will remove this test case and the unreachable code after it.

ewfian (Owner):

It seems this case still fails under bun test:

[screenshot]

asilvas (Contributor Author), Sep 19, 2023:

Yes, no test that relies on throwing will work in Bun until the issue I linked is resolved. That's fine, and not this package's problem. Technically we could work around it by switching all the exception handling to native exceptions, but I can open another PR for that later.

ewfian (Owner):

The "Invalid argument" error is not thrown by our code; it comes from napi. Maybe Bun does not handle this case well.

@ewfian (Owner) commented Sep 19, 2023

Another point I'm a little concerned about: Index is an abstract class in faiss. It has only a few methods, but we wrapped many things that are not originally on Index. Maybe we should have an IndexFlat class and a pure Index class?

BTW, sorry, it's time for me to go to bed. I will reply tomorrow.

@asilvas (Contributor Author) commented Sep 19, 2023


Index is abstract, and any index you've created that does not implement a function (e.g. add_with_ids) will throw. This matches the behavior of the native implementation. faiss-node only needs to be a wrapper: let the native library throw when it is used incorrectly.
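The "thin wrapper, let the native side throw" design can be sketched in plain JavaScript. Everything below is hypothetical: NativeIndex and NativeFlat stand in for faiss's C++ classes, the method names are simplified, and the error strings are illustrative rather than quoted from faiss.

```javascript
// Stand-ins for the native library (hypothetical, for illustration only).
// In faiss, add() is pure virtual and add_with_ids() has a default that
// throws; JS has no abstract methods, so both defaults throw here.
class NativeIndex {
  constructor(d) {
    this.d = d;
    this.ntotal = 0;
  }
  add(n, x) {
    throw new Error('add not implemented for this type of index');
  }
  addWithIds(n, x, ids) {
    throw new Error('add_with_ids not implemented for this type of index');
  }
}

// A flat index overrides add() but not addWithIds().
class NativeFlat extends NativeIndex {
  add(n, x) {
    this.ntotal += n;
  }
}

// The wrapper forwards every call and adds no per-type checks of its own,
// so misuse surfaces as the underlying library's error.
class Index {
  constructor(native) {
    this.native = native;
  }
  add(vectors) {
    this.native.add(vectors.length / this.native.d, vectors);
  }
  addWithIds(vectors, ids) {
    this.native.addWithIds(vectors.length / this.native.d, vectors, ids);
  }
  ntotal() {
    return this.native.ntotal;
  }
}

const index = new Index(new NativeFlat(2));
index.add([1, 2, 3, 4]);
console.log(index.ntotal()); // 2
try {
  index.addWithIds([5, 6], [42]);
} catch (err) {
  console.log(err.message); // add_with_ids not implemented for this type of index
}
```

The trade-off is that one wrapper class exposes methods some index types cannot honor, in exchange for never duplicating the native library's capability rules in JavaScript.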

@ewfian (Owner) commented Sep 20, 2023


I double-checked, and it's as you said. However, I will remove fromFactory from IndexFlatIP and IndexFlatL2.

ewfian merged commit 0171f4f into ewfian:main on Sep 20, 2023 (36 checks passed).
asilvas deleted the buffers-base-factory branch on September 20, 2023 12:51.

Successfully merging this pull request may close these issues.

Ability to encode/decode buffers when FS not available?