update main github readme #508

Merged
merged 3 commits into from
Feb 2, 2023
Conversation

changhiskhan
Contributor

No description provided.

@@ -27,71 +25,121 @@ Lance makes machine learning workflows with ML data easy (images, videos, point

* Version, compare and diff ML datasets easily.

* Search for nearest neighbors in under 1 millisecond.
Contributor

The example does not make it clear that the per-query latency is under 1 millisecond.
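One way to make a per-query figure explicit is to time each query on its own and report the mean and the maximum, rather than timing the whole batch. The sketch below is only an illustration under stated assumptions: `time_queries` and `brute_force_search` are hypothetical helpers, and the brute-force search over small random vectors is a stand-in for the real index lookup, not Lance's API.

```python
import random
import statistics
import time


def time_queries(search_fn, queries):
    """Return one latency measurement in milliseconds per query."""
    latencies_ms = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def brute_force_search(vectors, k):
    """Hypothetical stand-in for an index lookup: exact top-k by squared L2 distance."""
    def search(q):
        dists = [
            (sum((a - b) ** 2 for a, b in zip(v, q)), i)
            for i, v in enumerate(vectors)
        ]
        return sorted(dists)[:k]
    return search


# Small synthetic data (assumption: not the SIFT dataset from the README).
rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(200)]
queries = rng.sample(vectors, 100)  # 100 randomly sampled query vectors

latencies = time_queries(brute_force_search(vectors, k=10), queries)
print(f"mean: {statistics.mean(latencies):.3f} ms, max: {max(latencies):.3f} ms")
```

Swapping `brute_force_search(...)` for a function that calls the real nearest-neighbor query would give the number the README claims, measured per query.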

README.md Outdated

1. For 100 randomly sampled query vectors, we get <1ms average response time (on a 2023 m2 macbook air)

![img.png](img.png)
Contributor

In the image, change the dataset file name to sift_1m.lance?

Contributor Author

I think in the tarball the directory is vec_data.lance, so that's what people will see by default.

Contributor

Oh, let me make a new tarball. This is the SIFT dataset, right?

**Converting to Lance**

```python
import lance
```
Contributor

The import appears before the installation step; the README needs `pip install pylance` above this.
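The ordering issue can also be softened on the code side. The following is a minimal sketch, assuming only what the thread states (the module is imported as `lance` and installed as `pylance`); `missing_install_hint` is a hypothetical helper, not part of any library.

```python
import importlib.util


def missing_install_hint(module_name, pip_name):
    """Return a pip hint if module_name cannot be found, otherwise None."""
    if importlib.util.find_spec(module_name) is None:
        return f"Module '{module_name}' not found; run: pip install {pip_name}"
    return None


# Check before importing, so a missing install yields an actionable message.
hint = missing_install_hint("lance", "pylance")
if hint:
    print(hint)
```

This does not replace putting the install command first in the README; it only makes the failure mode clearer for readers who skip it.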

README.md Outdated

**Fast updates** (ROADMAP): Updates will be supported via write-ahead logs.

**Vector index**: Vector index for similarity search over embedding space
Contributor

Perhaps move this to the first position, as the key value prop?

You can easily import a DataFrame or a Parquet file to Lance using Apache Arrow-first APIs:
**Vector search**

Download an indexed [sift dataset](https://eto-public.s3.us-west-2.amazonaws.com/datasets/sift/sift_ivf256_pq16.tar.gz),
Contributor

Could this be moved to wget/curl + unzip via shell? Or do you prefer this for general OS support, Windows included?

@changhiskhan changhiskhan merged commit be10c4d into main Feb 2, 2023
@changhiskhan changhiskhan deleted the changhiskhan/root-readme branch February 2, 2023 05:05
3 participants