Is ArrayLake a VectorDB? #6

alxmrs · 2024-01-14T04:42:26Z

Not that EarthMover would necessarily want to be known as yet another startup within this very competitive space. However, I think it would be cool to see a comparison between other Vector DBs and what a managed Zarr dataset could do. It seems like it would be easy to put a proof of concept together with faiss or annoy.

I think approximate similarity search algorithms could be interesting for scientific use cases (can it provide better lookups than metadata based search?). Further, I like that ArrayLake + Zarr address the Cloud and State Management shaped problems while stepping aside so the ML practitioner can choose their preferred tool for similarity search.

rabernat · 2024-01-15T13:14:04Z

Thanks for the suggestion Alex! You're correct that it's fairly easy to create a vector search interface on top of Xarray + Zarr. Here's an example: https://gist.github.com/rabernat/40f53bba3a81aeb420e14872388c6fc1

In contrast to most vector DBs on the market today, all of the index building and search happen on the client side--Arraylake doesn't provide any server-side implementation of either. So I'd be hesitant to characterize Arraylake as a VectorDB.
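
To make "client side" concrete, here is a minimal sketch (not the code from the gist). It assumes the embeddings have already been read from the Zarr store into a NumPy array, and it uses an exact brute-force cosine search where a real setup would more likely build a faiss or annoy index. The function name `top_k_cosine` and the random data are purely illustrative.

```python
import numpy as np


def top_k_cosine(embeddings, query, k=5):
    """Return (indices, scores) of the k rows most similar to `query`.

    Exact cosine similarity, computed entirely on the client.
    `embeddings` stands in for an array loaded from a Zarr store.
    """
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = emb @ q
    k = min(k, scores.shape[0])
    # argpartition finds the top k without sorting the full score vector
    idx = np.argpartition(-scores, k - 1)[:k]
    # sort just those k candidates by descending similarity
    idx = idx[np.argsort(-scores[idx])]
    return idx, scores[idx]


# Illustrative data: 1000 random 64-dimensional embeddings (an assumption)
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64)).astype("float32")
# A query that is a slightly perturbed copy of row 42
query = embeddings[42] + 0.01 * rng.normal(size=64).astype("float32")

idx, scores = top_k_cosine(embeddings, query, k=5)
```

Nothing here touches a server beyond reading the array, which is the point: the store handles persistence and versioning, and the search tool is whatever the client chooses.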

alxmrs · 2024-01-17T10:48:58Z

You’re totally right. And, that’s why I like it so much! Like, you let the user bring in faiss themselves to tune an index while making the hard stuff, like transactions and concurrency, easy. Your caution makes sense, but I do see the appeal of a more DIY VectorDB. Similar to how you’ve written that the best data API is a cloud-optimized store in a bucket, I like the appeal of a simple, “serverless” embedding store.

rabernat · 2024-01-17T12:41:34Z

😍

Would you like to help turn my gist into a proper Python package? Could be a good project for your sabbatical? 😉

alxmrs · 2024-01-17T13:28:25Z

That sounds like a fun project. I’ll consider it, but I don’t expect to have the time 😉.

ljstrnadiii · 2024-04-18T17:42:23Z

That would be a cool package. Lots to figure out, like how many chunks to put in each (potentially distributed/sharded) index and how we would reduce the results. I have had great success with the ResultHeap class in faiss to "reduce" searches over sharded indexes. I have thought, though, that xarray could be well suited to this type of problem.
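
A rough sketch of the "reduce" step, assuming each shard is searched independently and returns its own top-k. This is plain NumPy with exact L2 distances standing in for per-shard faiss indexes; it does not use faiss, and the helper names (`search_shard`, `sharded_search`) are hypothetical.

```python
import numpy as np


def search_shard(shard, query, k):
    """Exact squared-L2 search within one shard.

    Returns (distances, local row ids), sorted by ascending distance.
    """
    d = np.sum((shard - query) ** 2, axis=1)
    k = min(k, d.shape[0])
    ids = np.argpartition(d, k - 1)[:k]
    ids = ids[np.argsort(d[ids])]
    return d[ids], ids


def sharded_search(shards, query, k):
    """Search every shard, then merge the per-shard top-k into a global top-k."""
    all_d, all_ids = [], []
    offset = 0
    for shard in shards:
        d, ids = search_shard(shard, query, k)
        all_d.append(d)
        all_ids.append(ids + offset)  # convert local ids to global row ids
        offset += shard.shape[0]
    d = np.concatenate(all_d)
    ids = np.concatenate(all_ids)
    order = np.argsort(d)[:k]  # the reduce: keep the k best across shards
    return d[order], ids[order]


# Illustrative data: 900 vectors split into 3 equal shards (an assumption)
rng = np.random.default_rng(1)
data = rng.normal(size=(900, 32))
shards = np.split(data, 3)
query = rng.normal(size=32)

merged_d, merged_ids = sharded_search(shards, query, k=10)
full_d, full_ids = search_shard(data, query, k=10)
```

Because each shard returns at least its own best k, the merged result matches a search over the unsharded array. With approximate per-shard indexes that guarantee no longer holds exactly, which is part of what would need figuring out.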
