Batch support #1

Open
asg017 opened this issue Jun 4, 2024 · 2 comments

Comments

@asg017
Owner

asg017 commented Jun 4, 2024

Currently the rembed() function makes a new HTTP request for every item. For example, for this query:

```sql
select rembed('myModel', field)
from my_table;
```

If my_table has 100,000 rows, then 100,000 sequential HTTP requests would be sent.

This isn't ideal: most of these providers accept multiple inputs in a single request, which should help with both rate limits and speed. But finding a good SQL API for batching that works with SQLite can be tricky.
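To make the cost concrete: with N rows and up to B inputs per request, the request count drops from N to ceil(N / B), so 100,000 rows at B = 100 would need 1,000 requests instead of 100,000. Below is a minimal client-side sketch of that batching using Python's `sqlite3`. It assumes a hypothetical `embed_batch(texts)` callable that sends one request for a list of inputs and returns one vector per input. The default batch size of 100 is also an assumption, because each provider sets its own per-request limit.

```python
import math
import sqlite3
from typing import Callable, Dict, List, Tuple

# Hypothetical provider call: one HTTP request embeds a whole list of inputs.
EmbedBatch = Callable[[List[str]], List[List[float]]]

def expected_requests(n_rows: int, batch_size: int) -> int:
    """Requests needed when each request carries up to `batch_size` inputs."""
    return math.ceil(n_rows / batch_size)

def embed_table(conn: sqlite3.Connection, embed_batch: EmbedBatch,
                batch_size: int = 100) -> Tuple[Dict[int, List[float]], int]:
    """Embed my_table.field in batches; return {rowid: vector} and the request count."""
    rows = conn.execute("select rowid, field from my_table order by rowid").fetchall()
    results: Dict[int, List[float]] = {}
    requests = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        vectors = embed_batch([field for _, field in batch])  # one request per batch
        requests += 1
        if len(vectors) != len(batch):
            raise ValueError("provider returned a different number of vectors")
        # Map each vector back to the row it came from, by position in the batch.
        for (rowid, _), vector in zip(batch, vectors):
            results[rowid] = vector
    return results, requests

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table my_table(field text)")
    conn.executemany("insert into my_table(field) values (?)",
                     [(f"row {i}",) for i in range(250)])
    # Stand-in for a real provider: a 1-dimensional "embedding" of each text's length.
    fake = lambda texts: [[float(len(t))] for t in texts]
    embeddings, n = embed_table(conn, fake, batch_size=100)
    print(len(embeddings), n)               # 250 3
    print(expected_requests(100_000, 100))  # 1000
```

Results are matched back to rows by their position in each batch. This relies on the provider returning vectors in input order, which the sketch assumes rather than guarantees.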

A few different options:

Option 1: Table function with JSON array input

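```sql
-- rembed_each() is a proposed table function, not an existing one.
with subset as (
  -- json_group_array() takes a single argument, so each row is wrapped in json_object()
  select json_group_array(
    json_object('id', rowid, 'contents', my_table.field)
  ) as value
  from my_table
)
select
  rowid,
  embedding
from subset
join rembed_each('myModel', json(subset.value));
```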
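Option 1 serializes the input set into a JSON array. For a large table this would probably have to be split into several arrays anyway, because providers limit how many inputs one request may carry. The minimal sketch below uses Python's `sqlite3` and only SQLite's built-in JSON functions. It shows how the `subset` input could be built per batch, and how a table function might unpack each array the way `json_each` does while keeping the `id` next to each input. `rembed_each` is only proposed, so it is never called. The helpers `build_batches` and `unpack_batch` and the batch size of 2 are hypothetical.

```python
import sqlite3
from typing import List, Tuple

def build_batches(conn: sqlite3.Connection, batch_size: int) -> List[str]:
    """Return one JSON array per group of `batch_size` consecutive rowids.

    Each array has the shape Option 1 would pass to the proposed rembed_each():
    [{"id": ..., "contents": ...}, ...]
    """
    rows = conn.execute(
        """
        select json_group_array(json_object('id', rowid, 'contents', field))
        from my_table
        group by (rowid - 1) / ?   -- integer division buckets rowids 1..B, B+1..2B, ...
        order by min(rowid)
        """,
        (batch_size,),
    ).fetchall()
    return [batch_json for (batch_json,) in rows]

def unpack_batch(conn: sqlite3.Connection, batch_json: str) -> List[Tuple[int, str]]:
    """Unpack one batch with json_each, keeping each input's id alongside it."""
    return conn.execute(
        """
        select json_extract(value, '$.id'), json_extract(value, '$.contents')
        from json_each(?)
        """,
        (batch_json,),
    ).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table my_table(field text)")
    conn.executemany("insert into my_table(field) values (?)",
                     [("alpha",), ("beta",), ("gamma",), ("delta",), ("epsilon",)])
    for batch in build_batches(conn, batch_size=2):  # hypothetical per-request limit
        print(unpack_batch(conn, batch))
```

Bucketing on `(rowid - 1) / ?` assumes rowids are roughly contiguous. If the table has gaps, some batches simply come out smaller than the limit.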

Option 2: input in (...) with serialized rembed_item()

```sql
select
  rowid,
  embedding
from rembed('myModel')
where inputs in (select rembed_item(id, field) from my_table);
```

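Option 2 depends on the `rembed` virtual table seeing the whole `IN (...)` list at once instead of one value per `xFilter` call. SQLite 3.38.0 added `sqlite3_vtab_in()`, which lets `xBestIndex` request that an `IN` constraint be handled all at once. `xFilter` can then walk the right-hand values with `sqlite3_vtab_in_first()` and `sqlite3_vtab_in_next()`. Python's `sqlite3` cannot define virtual tables, so the minimal sketch below only emulates that `xFilter` step in plain Python. The JSON encoding used for `rembed_item`, the `filter_in_values` helper, and the `embed_batch` stand-in are all hypothetical. Because the demo table has no `id` column, it passes `rowid` as the id.

```python
import json
import sqlite3
from typing import Callable, List, Tuple

Vector = List[float]

def rembed_item(item_id: int, contents: str) -> str:
    """Hypothetical rembed_item(): serialize one (id, input) pair as JSON text."""
    return json.dumps({"id": item_id, "contents": contents})

def filter_in_values(values: List[str],
                     embed_batch: Callable[[List[str]], List[Vector]],
                     batch_size: int) -> List[Tuple[int, Vector]]:
    """Emulate a rembed virtual table that has received every IN (...) value at once.

    Decodes the serialized items, sends them `batch_size` at a time through
    `embed_batch` (a stand-in for one provider request), and returns (id, vector) rows.
    """
    items = [json.loads(v) for v in values]
    rows: List[Tuple[int, Vector]] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        vectors = embed_batch([item["contents"] for item in batch])
        rows.extend((item["id"], vec) for item, vec in zip(batch, vectors))
    return rows

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.create_function("rembed_item", 2, rembed_item)
    conn.execute("create table my_table(field text)")
    conn.executemany("insert into my_table(field) values (?)", [("a",), ("bb",), ("ccc",)])
    # The right-hand side of `inputs in (...)`, as the virtual table would receive it.
    values = [v for (v,) in conn.execute("select rembed_item(rowid, field) from my_table")]
    fake = lambda texts: [[float(len(t))] for t in texts]
    print(filter_in_values(values, fake, batch_size=2))
```

Serializing the id together with the input is what allows the results to be joined back to `my_table` once the virtual table returns its rows.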
@alexpaden

alexpaden commented Jul 24, 2024

I'd like to see batching fixed, since I'm trying in-memory SQLite search.

Edit: my bad, I'm using TS right now.

@ajram23

ajram23 commented Sep 6, 2024

+n to this please!

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants