Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JS/TS + Misc Feedback #254

Closed
transitive-bullshit opened this issue Mar 29, 2023 · 7 comments
Closed

JS/TS + Misc Feedback #254

transitive-bullshit opened this issue Mar 29, 2023 · 7 comments

Comments

@transitive-bullshit
Copy link

Hey @jeffchuber, I'm just going to throw a lot of random feedback at you mainly from a JS/TS dev's perspective focused on DX.

Feel free to ignore my suggestions – I just want to try and give you an honest brain dump that will hopefully be helpful. The more important stuff is bolded.

JS/TS SDK

  • remove axios as a dependency; fetch is a web standard and is the default for node.js >= 18 and all serverless/edge environments. here is a good resource that I've linked several other repos to for how to handle fetch as a library author that is future proof for where the node.js community is headed
  • add support for keyword-based arguments
    • this isn't just a subjective thing; most quality JS/TS packages avoid positional arguments except for really simple use cases because the added verbosity and typing tends to make function invocations much clearer)
    • if I have to pass an undefined as a positional argument, there's something wrong w/ the SDK 😉
    • (super nit) pretty sure metadata is already plural, so metadatas is a lil awkward

e.g.,

# python
collection.add(
    embeddings=[[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2], ...
    metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}, ...],
    ids=["id1", "id2", "id3", ...]
)
# TS
await collection.add(
    ["id1", "id2", "id3", ...], 
    [[1.1, 2.3, 3.2], [4.5, 6.9, 4.4], [1.1, 2.3, 3.2], ...], 
    [{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}, {"chapter": "29", "verse": "11"}, ...], 
)
  • make it clear that the add function upserts documents instead of strictly creating new documents
  • I would've expected add to take in a single typed array, with an optional second argument for optional params
  • allow getCollection to support an optional typescript generic so all collection methods are properly typed

Some example TS pseudocode that shows an idealized version of the package in action:

import { ChromaClient } from 'chromadb'

interface MyMetadata { foo: string; bar: number }

const client = new ChromaClient()
const collection = await client.getCollection<MyMetadata>({
  indexName: "my_collection", 
  embeddingSize: 1536,
  embeddingFunction: () => { ... }
})

// now collection is properly typed for my particular documents

await collection.upsert([
  {
    id: 'foo-1',
    embedding: [1.1, 2.3, 3.2],
    metadata: { foo: 'foo', bar: 5 }
  },
  // ...
])

// or if you want to pass documents instead of embeddings

await collection.upsertDocuments([
  {
    id: 'foo-1',
    document: "example text",
    metadata: { foo: 'foo', bar: 5 }
  },
  // ...
])

// ideally these upsert / add methods would automatically batch things for you with a configurable batchSize

Here's a solid Pinecone TS wrapper by @rileytomasek which does a lot of similar things: https://github.com/rileytomasek/pinecone-client

Docs

  • some of the JS examples still use python (e.g., let collection2 = await client.getCollection("my_collection", embedding_function=emb_fn))
  • in the JS examples, prefer ESM import vs commonjs require which are more or less deprecated going forwards (e.g., the embeddings page)
  • some of the JS examples use snake_case which is very pythonic and 99% of the JS/TS community uses camelCase (api_key for instance)
  • it's currently really unclear how auth is handled connecting to the db; I would expect to be able to pass an api key to the ChromaClient constructor, but I'm guessing this feature doesn't exist yet? maybe add a note to the docs for now?
  • in general, I find the handling of documents vs embeddings to be a little clunky right now. e.g., trying to support both with the same functions really doesn't make sense to me since afaict they seem to be mutually exclusive.
    • I'd consider making collection.query({ embeddings: [...] }) separate from collection.queryDocuments({ documents: [...] }) or something to make the difference explicit, since right now it's confusing that I could pass both and unclear which one would actually be used
    • same with where vs where_document – I like that it's explicit here about the difference
    • but in general, the concept of a document versus metadata seems a bit conflated. in mongodb, for instance (as I'm sure you're well aware), the document is the entire record, not just the part it's indexed based on. whereas in chroma, the document is like a shorthand for an implicit pre-embedding but the metadata not being part of the document is a bit weird. sorry I know this isn't the best explanation, but getting your core public abstractions to make sense at this early stage is imho really critical
  • add cmd+k search to docs 😄
  • is there an AWS AMI for chroma? I hate cloudformation, but I'd gladly self-host on AWS if I had a one-click install via the AMI marketplace

Hope this feedback is helpful && again, feel free to ignore & prioritize as you wish 🙏

@jeffchuber
Copy link
Contributor

jeffchuber commented Mar 29, 2023

@transitive-bullshit this is really great, thanks for writing this up!

Here is a quick set of reactions

remove axios

🫡 great!

keyword based arguments

we have been discussing either moving to, or also supporting row-based insert in addition to column-based insert. This because the ergonomics are much more natural for a non-data-science and app-dev focused use case. This +1s that!

nits on the docs

really great, thank you, we will update

auth

right now chroma does not handle auth. What do you think is best to do here? Have "db-like" auth similar to how you would do something with postgres?

query being overloaded

that's really interesting feedback... i hadnt heard that before! going to need to think about that. :)

AMI

none yet! but after the refactor, #214, lands - something we could def do!

@jeffchuber
Copy link
Contributor

@transitive-bullshit do you think its better to support named and positional args (non-breaking change) or hard break change over to named? Not sure what the std convention is here in the JS world.

@transitive-bullshit
Copy link
Author

@transitive-bullshit do you think its better to support named and positional args (non-breaking change) or hard break change over to named? Not sure what the std convention is here in the JS world.

Ideally, the only version you'd support would be named args. It's simpler, and at imho at this stage of your roadmap, this level of breaking change will be welcomed by JS/TS devs as long as it's communicated properly.

Positional args for anything more than one or maybe 2 positions in JS/TS are pretty frowned upon imho. Feel free to double check w/ @sw-yx

@rileytomasek
Copy link

I agree with @transitive-bullshit. Named args will also make it easier to avoid future breaking changes or awkward function calls.

@jeffchuber
Copy link
Contributor

@transitive-bullshit @rileytomasek @sw-yx named args it is! thanks everyone :)

@swyxio
Copy link
Contributor

swyxio commented May 11, 2023

ONE ARG TO RULE THEM

AND IN DARKNESS BIND THEM

jeffchuber added a commit that referenced this issue May 17, 2023
In this PR
- docstrings! 
- types!
- named params! 

This is a breaking change to the JS Client - it switches from positional
to named arguments originally brought up by @transitive-bullshit in
#254

Example: 

```
// before
const collection = await chroma.createCollection("test");

//after
const collection = await chroma.createCollection({ name: "test" });
```

This is especially nice when you are only using a few params. 

This PR does not bump the `npm` version number - we will cut a new
release on Monday.

We will need to land an update to our docs at that time as well. TBD
@jeffchuber
Copy link
Contributor

Implemented basically all of this! Thanks again @transitive-bullshit for the awesome issue 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants