update main github readme #508
@@ -27,71 +25,121 @@ Lance makes machine learning workflows with ML data easy (images, videos, point

* Version, compare and diff ML datasets easily.

* Search for nearest neighbors in under 1 millisecond.
The example does not make it clear that the per-query latency is under 1 millisecond.
README.md
Outdated
1. For 100 randomly sampled query vectors, we get <1 ms average response time (on a 2023 M2 MacBook Air)
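To make the per-query figure explicit, each query can be timed on its own instead of timing the whole batch. The sketch below uses only the standard library. `brute_force_nn` is a hypothetical stand-in for the real search call, not Lance's API, and the vector count and dimension are arbitrary, so the numbers it prints say nothing about Lance's actual performance.

```python
import random
import statistics
import time


def brute_force_nn(query, vectors):
    """Hypothetical stand-in search: index of the closest vector (squared L2)."""
    return min(
        range(len(vectors)),
        key=lambda i: sum((q - v) ** 2 for q, v in zip(query, vectors[i])),
    )


def per_query_latency_ms(search, queries):
    """Time each query separately and return one latency (in ms) per query."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        search(query)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


rng = random.Random(0)
# Small synthetic data set; sizes are illustrative assumptions only.
vectors = [[rng.random() for _ in range(16)] for _ in range(500)]
queries = rng.sample(vectors, 100)  # 100 randomly sampled query vectors

latencies = per_query_latency_ms(lambda q: brute_force_nn(q, vectors), queries)
print(
    f"mean {statistics.mean(latencies):.3f} ms, "
    f"max {max(latencies):.3f} ms over {len(latencies)} queries"
)
```

Reporting the mean and the maximum over the individual timings shows whether the sub-millisecond claim holds per query or only on average.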
![img.png](img.png)
In the image, change the dataset file name to sift_1m.lance?
I think the directory in the tarball is vec_data.lance, so that's what people will see by default.
Oh, let me make a new tarball. This is the SIFT dataset, right?
**Converting to Lance**

```python
import lance
```
The import appears before any installation step; add `pip install pylance` above this.
README.md
Outdated
**Fast updates** (ROADMAP): Updates will be supported via write-ahead logs.

**Vector index**: Vector index for similarity search over embedding space
Perhaps move this to the top, as the key value proposition?
You can easily import a DataFrame or a Parquet file to Lance using Apache Arrow-first APIs: | ||
**Vector search**

Download an indexed [sift dataset](https://eto-public.s3.us-west-2.amazonaws.com/datasets/sift/sift_ivf256_pq16.tar.gz),
Could this be done with wget/curl plus unzip in the shell? Or do you prefer this approach so it works on any OS, Windows included?
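If OS independence is the goal, one option is a short Python snippet that downloads and unpacks the archive with the standard library only. This is a sketch, not something taken from the README: the helper name `fetch_and_extract` and the `data` target directory are hypothetical, and the URL is the one from the line under review.

```python
import tarfile
import tempfile
import urllib.request
from pathlib import Path

# URL copied from the README line under review.
SIFT_URL = (
    "https://eto-public.s3.us-west-2.amazonaws.com/"
    "datasets/sift/sift_ivf256_pq16.tar.gz"
)


def fetch_and_extract(url: str, dest_dir: str) -> list[str]:
    """Download a .tar.gz archive and unpack it into dest_dir.

    Returns the archive member names so the caller can see which
    directory the dataset landed in. Only use this on archives you trust.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "archive.tar.gz"
        urllib.request.urlretrieve(url, str(archive))
        with tarfile.open(archive, "r:gz") as tar:
            names = tar.getnames()
            tar.extractall(dest)
    return names
```

Calling `fetch_and_extract(SIFT_URL, "data")` would behave the same on Windows, macOS, and Linux, at the cost of being longer than a one-line shell command.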
No description provided.