docs: add tiny model (#763)
* docs: add tiny model

* docs: add tiny model

* chore: update readme

* docs: add paper to pretrained models

* chore: add changelog

* chore: update readme
bwanglzu committed Jul 25, 2023
1 parent c387752 commit b232b3f
Showing 3 changed files with 50 additions and 5 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -18,6 +18,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Docs

- Add tiny model and citation to Readme and docs. ([#763](https://github.com/jina-ai/finetuner/pull/763))

- Fix huggingface link of jina embeddings. ([#761](https://github.com/jina-ai/finetuner/pull/761))

- Remove redundant text in jina embedding page. ([#762](https://github.com/jina-ai/finetuner/pull/762))
25 changes: 25 additions & 0 deletions README.md
@@ -43,6 +43,15 @@ without worrying about resource availability, complex integration, or infrastructure

## [Documentation](https://finetuner.jina.ai/)

## Pretrained Text Embedding Models

| Name                   | Parameters | Dimension | Huggingface |
|------------------------|------------|-----------|-------------|
| jina-embedding-t-en-v1 | 14M        | 312       | [link](https://huggingface.co/jinaai/jina-embedding-t-en-v1) |
| jina-embedding-s-en-v1 | 35M        | 512       | [link](https://huggingface.co/jinaai/jina-embedding-s-en-v1) |
| jina-embedding-b-en-v1 | 110M       | 768       | [link](https://huggingface.co/jinaai/jina-embedding-b-en-v1) |
| jina-embedding-l-en-v1 | 330M       | 1024      | [link](https://huggingface.co/jinaai/jina-embedding-l-en-v1) |
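The name-to-dimension mapping in the table above can be sketched in code. This is a hedged illustration only: the `jinaai/<name>` model IDs follow the Huggingface links above, but compatibility with the `sentence-transformers` API is an assumption, so the download-requiring helper is defined without being called.

```python
# Output dimension of each model, taken from the table above.
MODEL_DIMS = {
    'jinaai/jina-embedding-t-en-v1': 312,
    'jinaai/jina-embedding-s-en-v1': 512,
    'jinaai/jina-embedding-b-en-v1': 768,
    'jinaai/jina-embedding-l-en-v1': 1024,
}


def encode_example(model_name: str = 'jinaai/jina-embedding-t-en-v1'):
    """Hypothetical usage sketch; assumes sentence-transformers compatibility
    and a network download, so it is not invoked in this snippet."""
    from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
    model = SentenceTransformer(model_name)
    embeddings = model.encode(['How is the weather today?'])
    assert embeddings.shape[-1] == MODEL_DIMS[model_name]
    return embeddings


print(MODEL_DIMS['jinaai/jina-embedding-t-en-v1'])  # 312
```

Picking a model is then a trade-off between the parameter count (inference speed) and the output dimension (index size and downstream accuracy).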

## Benchmarks

<table>
@@ -172,6 +181,22 @@ Check out our published blog posts and tutorials to see Finetuner in action!

<!-- end finetuner-articles -->

<!-- start citations -->
If you find Jina Embeddings useful in your research, please cite the following paper:

```text
@misc{günther2023jina,
title={Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models},
author={Michael Günther and Louis Milliken and Jonathan Geuter and Georgios Mastrapas and Bo Wang and Han Xiao},
year={2023},
eprint={2307.11224},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
<!-- end citations -->

<!-- start support-pitch -->
## Support

28 changes: 23 additions & 5 deletions docs/get-started/pretrained.md
@@ -6,6 +6,7 @@ we have introduced a suite of pre-trained text embedding models licensed under Apache 2.0
These models have a variety of use cases, including information retrieval, semantic textual similarity, text reranking, and more.
The suite consists of the following models:

- `jina-embedding-t-en-v1` [**[Huggingface](https://huggingface.co/jinaai/jina-embedding-t-en-v1)**]: Our fastest embedding model, with just 14 million parameters.
- `jina-embedding-s-en-v1` [**[Huggingface](https://huggingface.co/jinaai/jina-embedding-s-en-v1)**]: A compact model with 35 million parameters that performs very fast inference while delivering strong performance.
- `jina-embedding-b-en-v1` [**[Huggingface](https://huggingface.co/jinaai/jina-embedding-b-en-v1)**]: A model with 110 million parameters that performs fast inference and delivers better performance than our smaller models.
- `jina-embedding-l-en-v1` [**[Huggingface](https://huggingface.co/jinaai/jina-embedding-l-en-v1)**]: A relatively large model with 330 million parameters that runs inference on a single GPU and delivers the best performance of the suite.
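For the semantic textual similarity use case mentioned above, embeddings from any of these models are typically compared with cosine similarity. A minimal numpy sketch, using random stand-in vectors in place of real model outputs:

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Dot product of the two vectors divided by the product of their norms.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Stand-ins for two 312-dimensional jina-embedding-t-en-v1 outputs.
rng = np.random.default_rng(0)
a = rng.normal(size=312)
b = rng.normal(size=312)

print(round(cosine_similarity(a, a), 6))  # identical vectors score 1.0
print(cosine_similarity(a, b))            # unrelated random vectors score near 0
```

The same score can rank documents against a query embedding for the information-retrieval and reranking use cases.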
@@ -36,12 +37,29 @@ Each Jina embedding model can encode up to 512 tokens,
with any further tokens being truncated.
The models have different output dimensionalities, as shown in the table below:

| Name                   | Parameters | Context | Dimension |
|------------------------|------------|---------|-----------|
| jina-embedding-t-en-v1 | 14M        | 512     | 312       |
| jina-embedding-s-en-v1 | 35M        | 512     | 512       |
| jina-embedding-b-en-v1 | 110M       | 512     | 768       |
| jina-embedding-l-en-v1 | 330M       | 512     | 1024      |
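The 512-token context window and truncation behaviour described above can be illustrated with a toy sketch. Plain numpy arrays stand in for token embeddings, and mean pooling stands in for the models' internal pooling, so this is schematic only, not the actual model code:

```python
import numpy as np

MAX_TOKENS = 512  # context length shared by all four models


def truncate_and_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Drop tokens beyond the context window, then mean-pool the rest
    into a single fixed-size sentence vector."""
    truncated = token_embeddings[:MAX_TOKENS]
    return truncated.mean(axis=0)


# 600 fake token embeddings at dimension 312 (jina-embedding-t-en-v1's size);
# the last 88 tokens fall outside the window and are ignored.
tokens = np.random.rand(600, 312)
sentence_vector = truncate_and_pool(tokens)
print(sentence_vector.shape)  # (312,)
```

Whatever the input length, the output dimension is fixed by the model, as listed in the table above.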

## Performance

Please refer to the [Huggingface](https://huggingface.co/jinaai/jina-embedding-s-en-v1) page.

## Citations

If you find Jina Embeddings useful in your research, please cite the following paper:

```text
@misc{günther2023jina,
title={Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models},
author={Michael Günther and Louis Milliken and Jonathan Geuter and Georgios Mastrapas and Bo Wang and Han Xiao},
year={2023},
eprint={2307.11224},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
