From ff16ce1db88bbac14cb19c05266b1286c224e65d Mon Sep 17 00:00:00 2001
From: Alex Cureton-Griffiths
-Embedding image and sentence into fixed-length vectors via CLIP
+Embed images and sentences into fixed-length vectors with CLIP
@@ -18,7 +18,7 @@
-CLIP-as-service is a low-latency high-scalability service for embedding images and texts. It can be easily integrated as a microservice into neural search solutions.
+CLIP-as-service is a low-latency, high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions.
⚡ **Fast**: Serve CLIP models with ONNX Runtime and PyTorch JIT at 800 QPS[*]. Non-blocking duplex streaming of requests and responses, designed for large data and long-running tasks.
@@ -26,9 +26,9 @@ CLIP-as-service is a low-latency high-scalability service for embedding images a
🐥 **Easy-to-use**: No learning curve, minimalist design on client and server. Intuitive and consistent API for image and sentence embedding.
-👒 **Modern**: Async client support. Easily switch between gRPC, HTTP, Websocket protocols with TLS and compressions.
+👒 **Modern**: Async client support. Easily switch between gRPC, HTTP, WebSocket protocols with TLS and compression (see the sketch below).
-🍱 **Integration**: Smoothly integrated with neural search ecosystem including [Jina](https://github.com/jina-ai/jina) and [DocArray](https://github.com/jina-ai/docarray). Build cross-modal and multi-modal solution in no time.
+🍱 **Integration**: Smooth integration with the neural search ecosystem, including [Jina](https://github.com/jina-ai/jina) and [DocArray](https://github.com/jina-ai/docarray). Build cross-modal and multi-modal solutions in no time.
[*] with default config (single replica, PyTorch no JIT) on GeForce RTX 3090.
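As the **Modern** bullet above notes, the protocol is chosen by the URI scheme alone. A minimal sketch, assuming a server already listening on `0.0.0.0:51000` (the default address used elsewhere in this README); the TLS variants (`grpcs://`, `https://`, `wss://`) follow the same pattern:

```python
from clip_client import Client

# The URI scheme picks the transport; the client API stays identical.
c_grpc = Client('grpc://0.0.0.0:51000')
c_http = Client('http://0.0.0.0:51000')
c_ws = Client('ws://0.0.0.0:51000')

# The same call works regardless of the protocol underneath.
r = c_grpc.encode(['hello, world!'])
print(r.shape)  # (1, 512) with the default ViT-B/32 model
```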
@@ -138,15 +138,15 @@ You can change `0.0.0.0` to the intranet or public IP address to test the connec
print(r.shape) # [3, 512]
```
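For context, the `[3, 512]` above comes from encoding three inputs. The client sniffs whether each string is a sentence or an image URI, so text and images can be mixed in a single call; a sketch with placeholder inputs:

```python
from clip_client import Client

c = Client('grpc://0.0.0.0:51000')

# Sentences and image URIs (local paths or URLs) share one endpoint;
# the image inputs below are placeholders.
r = c.encode(
    [
        'First do it',
        'apple.png',  # a local image
        'https://clip-as-service.jina.ai/_static/favicon.png',  # a remote one
    ]
)
print(r.shape)  # [3, 512]
```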
-More comprehensive server & client configs can be found in the docs.
+More comprehensive server and client configuration can be found in the [docs](https://clip-as-service.jina.ai/).
-### Text-to-image cross-modal search in 10 Lines
+### Text-to-image cross-modal search in 10 lines
-Let's build a text-to-image search using CLIP-as-service. Namely, user input a sentence and the program returns the matched images. We will use [Totally Looks Like](https://sites.google.com/view/totally-looks-like-dataset) dataset and [DocArray](https://github.com/jina-ai/docarray) package. Note that DocArray is included within `clip-client` as an upstream dependency, so you don't need to install it separately.
+Let's build a text-to-image search using CLIP-as-service. Namely, a user can input a sentence and the program returns matching images. We'll use the [Totally Looks Like](https://sites.google.com/view/totally-looks-like-dataset) dataset and [DocArray](https://github.com/jina-ai/docarray) package. Note that DocArray is included within `clip-client` as an upstream dependency, so you don't need to install it separately.
#### Load images
-First we load images. You can simply pull it from Jina Cloud:
+First we load images. You can simply pull them from Jina Cloud:
```python
from docarray import DocumentArray
@@ -157,7 +157,7 @@ da = DocumentArray.pull('ttl-original', show_progress=True, local_cache=True)
or download the TTL dataset, unzip, and load it manually
-Alternatively, you can go to [Totally Looks Like](https://sites.google.com/view/totally-looks-like-dataset) official website, unzip and load images as follows:
+Alternatively, you can go to the [Totally Looks Like](https://sites.google.com/view/totally-looks-like-dataset) official website, download the dataset, unzip it, and load the images:
```python
from docarray import DocumentArray
@@ -167,21 +167,22 @@ da = DocumentArray.from_files(['left/*.jpg', 'right/*.jpg'])
@@ -262,21 +263,21 @@ Now you can input arbitrary English sentences and view the top-9 matched images.
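A sketch of that interactive loop, under the pre-0.30 DocArray API this README targets. It assumes the images pulled in the loading step have been embedded with the same `c.encode(da, show_progress=True)` pattern used for the sentences later on; `find` ranks by cosine similarity and `plot_image_sprites` renders the nine hits as one grid:

```python
from clip_client import Client
from docarray import DocumentArray

c = Client('grpc://0.0.0.0:51000')

# Image Documents from the loading step; encode() fills .embedding
# for each Document (this step can take a while over the network).
da = DocumentArray.pull('ttl-original', show_progress=True, local_cache=True)
da = c.encode(da, show_progress=True)

while True:
    # Embed the query sentence, then rank all images against it.
    vec = c.encode([input('sentence> ')])
    r = da.find(query=vec, limit=9)
    r[0].plot_image_sprites()  # show the top-9 matches as one image grid
```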
@@ -327,7 +328,7 @@ da.summary()
#### Encode sentences
-Now encode these 6403 sentences, it may take 10s or less depending on your GPU and network:
+Now encode these 6,403 sentences; this may take 10 seconds or less depending on your GPU and network:
```python
from clip_client import Client
@@ -340,7 +341,7 @@ r = c.encode(da, show_progress=True)
@@ -401,7 +402,7 @@ Fun time! Note, unlike the previous example, here the input is an image, the sen
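The "Fun time!" hunk above reverses the direction: the query is an image and the index is the 6,403 encoded sentences. A minimal sketch of that matching step, assuming `da` is the sentence DocumentArray returned by the `c.encode(...)` call above and the pre-0.30 DocArray `match` API (cosine metric by default); the query image path is a placeholder:

```python
from clip_client import Client
from docarray import Document, DocumentArray

c = Client('grpc://0.0.0.0:51000')

# `da` is the encoded 6,403-sentence DocumentArray from the step above.
query = DocumentArray([Document(uri='left/00001.jpg')])  # placeholder path
query = c.encode(query)

query.match(da, limit=5)  # nearest sentences by cosine similarity
for m in query[0].matches:
    print(m.text)
```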