This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

docs: add tiny model #763

Merged · 6 commits · Jul 25, 2023
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -18,6 +18,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Docs

- Add tiny model and citation to Readme and docs. ([#763](https://github.com/jina-ai/finetuner/pull/763))

- Fix huggingface link of jina embeddings. ([#761](https://github.com/jina-ai/finetuner/pull/761))

- Remove redundant text in jina embedding page. ([#762](https://github.com/jina-ai/finetuner/pull/762))
25 changes: 25 additions & 0 deletions README.md
@@ -43,6 +43,15 @@ without worrying about resource availability, complex integration, or infrastruc

## [Documentation](https://finetuner.jina.ai/)

## Pretrained Text Embedding Models

| Name                   | Parameters | Dimension | Huggingface                                                  |
|------------------------|------------|-----------|--------------------------------------------------------------|
| jina-embedding-t-en-v1 | 14M        | 312       | [link](https://huggingface.co/jinaai/jina-embedding-t-en-v1) |
| jina-embedding-s-en-v1 | 35M        | 512       | [link](https://huggingface.co/jinaai/jina-embedding-s-en-v1) |
| jina-embedding-b-en-v1 | 110M       | 768       | [link](https://huggingface.co/jinaai/jina-embedding-b-en-v1) |
| jina-embedding-l-en-v1 | 330M       | 1024      | [link](https://huggingface.co/jinaai/jina-embedding-l-en-v1) |

> **Member:** is 312 correct? weird number
>
> **Member Author:** yes it is 312

## Benchmarks

<table>
@@ -172,6 +181,22 @@ Check out our published blogposts and tutorials to see Finetuner in action!

<!-- end finetuner-articles -->

<!-- start citations -->
If you find Jina Embeddings useful in your research, please cite the following paper:

```text
@misc{günther2023jina,
title={Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models},
author={Michael Günther and Louis Milliken and Jonathan Geuter and Georgios Mastrapas and Bo Wang and Han Xiao},
year={2023},
eprint={2307.11224},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
<!-- end citations -->

<!-- start support-pitch -->
## Support

28 changes: 23 additions & 5 deletions docs/get-started/pretrained.md
@@ -6,6 +6,7 @@ we have introduced a suite of pre-trained text embedding models licensed under A
These models have a variety of use cases, including information retrieval, semantic textual similarity, text reranking, and more.
The suite consists of the following models:

- `jina-embedding-t-en-v1` [**[Huggingface](https://huggingface.co/jinaai/jina-embedding-t-en-v1)**]: The fastest embedding model in the world with 14 million parameters.
- `jina-embedding-s-en-v1` [**[Huggingface](https://huggingface.co/jinaai/jina-embedding-s-en-v1)**]: This is a compact model with just 35 million parameters that performs lightning-fast inference while delivering impressive performance.
- `jina-embedding-b-en-v1` [**[Huggingface](https://huggingface.co/jinaai/jina-embedding-b-en-v1)**]: This model has 110 million parameters, performs fast inference, and delivers better performance than our smaller models.
- `jina-embedding-l-en-v1` [**[Huggingface](https://huggingface.co/jinaai/jina-embedding-l-en-v1)**]: This is a relatively large model with 330 million parameters that performs single-GPU inference and delivers better performance than the other models.
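
To make the semantic textual similarity use case above concrete, here is a minimal sketch of how two embedding vectors are typically compared; the vectors below are dummy placeholders, not real model output:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Dummy 4-dimensional "embeddings" standing in for real model output.
query = np.array([0.1, 0.3, -0.2, 0.7])
doc = np.array([0.2, 0.25, -0.1, 0.6])

# Semantically similar texts produce vectors with similarity close to 1.0.
print(cosine_similarity(query, doc))
```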
@@ -36,12 +37,29 @@ Each Jina embedding model can encode up to 512 tokens,
with any further tokens being truncated.
The models have different output dimensionalities, as shown in the table below:

| Name                   | Parameters | Context | Dimension |
|------------------------|------------|---------|-----------|
| jina-embedding-t-en-v1 | 14M        | 512     | 312       |
| jina-embedding-s-en-v1 | 35M        | 512     | 512       |
| jina-embedding-b-en-v1 | 110M       | 512     | 768       |
| jina-embedding-l-en-v1 | 330M       | 512     | 1024      |
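
The 512-token truncation behaviour described above can be sketched with a toy example; a naive whitespace split stands in for the real subword tokenizer, which segments text differently in practice:

```python
MAX_TOKENS = 512  # context length shared by all models in the table

def truncate_tokens(text: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Keep only the first `max_tokens` tokens; anything beyond is dropped."""
    tokens = text.split()  # toy whitespace tokenizer, NOT the real subword one
    return tokens[:max_tokens]

long_text = "word " * 1000              # 1000 whitespace-separated tokens
print(len(truncate_tokens(long_text)))  # 512
```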

## Performance

Please refer to the [Huggingface](https://huggingface.co/jinaai/jina-embedding-s-en-v1) page.

## Citations

If you find Jina Embeddings useful in your research, please cite the following paper:

```text
@misc{günther2023jina,
title={Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models},
author={Michael Günther and Louis Milliken and Jonathan Geuter and Georgios Mastrapas and Bo Wang and Han Xiao},
year={2023},
eprint={2307.11224},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```