Semantic search engine written in Java as a university project
Warning
This project is absolutely not production-ready. It was developed as a proof of concept for a university project. If you see this message, I am already working on a new version of this project, which is based on a microservice architecture and is much more optimized.
- Define the following environment variables:
export MILVUS_HOST=localhost MILVUS_PORT=19530 \
  MONGODB_URI=mongodb://localhost:27017/ \
  RABBITMQ_HOST=localhost RABBITMQ_USERNAME=user RABBITMQ_PASSWORD=pass \
  MODEL_PATH=models/model.onnx
- You need an embedding model in ONNX format. I used cointegrated/LaBSE-en-ru and converted it with the utility described in the article "Export to ONNX". You can also use any other ONNX embedding model that produces 768-dimensional vectors. Put the model in the models folder so that its path matches MODEL_PATH:
models/model.onnx
- Build and run the project with:
./gradlew run
- The API will be available on port 4567.
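A minimal sketch, assuming Java 17 or newer, of how a Java service could collect the environment variables from the first step into one typed config object and fail fast when one is missing. `AppConfig`, `Settings`, and `load` are hypothetical names; the README does not show how the project actually reads its configuration.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of the project: gathers the required env vars in one place. */
public final class AppConfig {
    public record Settings(String milvusHost, int milvusPort, String mongoUri,
                           String rabbitHost, String rabbitUser, String rabbitPassword,
                           String modelPath) {}

    // The variable names listed in the README's setup step.
    static final List<String> REQUIRED = List.of(
            "MILVUS_HOST", "MILVUS_PORT", "MONGODB_URI",
            "RABBITMQ_HOST", "RABBITMQ_USERNAME", "RABBITMQ_PASSWORD", "MODEL_PATH");

    /** Reads settings from the given map; pass System.getenv() in real use. */
    public static Settings load(Map<String, String> env) {
        List<String> missing = REQUIRED.stream()
                .filter(k -> env.get(k) == null || env.get(k).isBlank())
                .toList();
        if (!missing.isEmpty()) {
            // Fail at startup instead of on the first Milvus/Mongo/RabbitMQ call.
            throw new IllegalStateException("Missing environment variables: " + missing);
        }
        return new Settings(
                env.get("MILVUS_HOST"),
                Integer.parseInt(env.get("MILVUS_PORT")), // throws NumberFormatException if not numeric
                env.get("MONGODB_URI"),
                env.get("RABBITMQ_HOST"),
                env.get("RABBITMQ_USERNAME"),
                env.get("RABBITMQ_PASSWORD"),
                env.get("MODEL_PATH"));
    }

    public static void main(String[] args) {
        try {
            Settings s = load(System.getenv());
            System.out.println("Milvus at " + s.milvusHost() + ":" + s.milvusPort());
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Taking a `Map` instead of calling `System.getenv()` directly keeps the loader testable without touching the real environment.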
Note
The project was designed so that you can run as many indexing workers as you want. However, due to tight deadlines there was no time for optimization, so each indexing worker loads its own copy of the model into memory. Run with caution!
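The note describes indexing workers as competing consumers of one task queue, each holding its own model copy. A minimal, self-contained sketch of that pattern, assuming Java 17 or newer: an in-memory `LinkedBlockingQueue` stands in for RabbitMQ, a `ConcurrentHashMap` stands in for MongoDB, and a naive word split replaces real keyword extraction. The README does not say what an indexing task contains, so here a task is hypothetically a page id, and "generating more indexing tasks" means enqueueing that page's links. `IndexingWorkersSketch`, `Page`, `run`, and `MODEL_LOADS` are all hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public final class IndexingWorkersSketch {
    /** Hypothetical document source entry: page text plus outgoing links. */
    public record Page(String text, List<String> links) {}

    /** Counts simulated model loads: one per worker, mirroring the note. */
    public static final AtomicInteger MODEL_LOADS = new AtomicInteger();

    /** Runs `workers` competing consumers over one shared queue; returns keyword -> page ids. */
    public static Map<String, Set<String>> run(Map<String, Page> pages, String seed, int workers)
            throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();  // stand-in for RabbitMQ
        Map<String, Set<String>> index = new ConcurrentHashMap<>(); // stand-in for MongoDB
        Set<String> seen = ConcurrentHashMap.newKeySet();           // never enqueue a page twice
        AtomicInteger pending = new AtomicInteger();                // queued + in-flight tasks

        seen.add(seed);
        pending.incrementAndGet();
        queue.add(seed);

        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            threads[w] = new Thread(() -> {
                MODEL_LOADS.incrementAndGet(); // each worker "loads" its own model copy
                try {
                    while (pending.get() > 0) {
                        String id = queue.poll(50, TimeUnit.MILLISECONDS);
                        if (id == null) continue; // queue momentarily empty; re-check pending
                        Page page = pages.get(id);
                        if (page != null) {
                            // "Store keywords": naive lowercase split instead of real extraction
                            for (String word : page.text().toLowerCase().split("\\W+")) {
                                if (!word.isEmpty()) {
                                    index.computeIfAbsent(word, k -> ConcurrentHashMap.newKeySet()).add(id);
                                }
                            }
                            // "Generate more indexing tasks": children are counted before the parent finishes
                            for (String link : page.links()) {
                                if (seen.add(link)) {
                                    pending.incrementAndGet();
                                    queue.add(link);
                                }
                            }
                        }
                        pending.decrementAndGet(); // this task is done
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) t.join();
        return index;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, Page> pages = Map.of(
                "a", new Page("Semantic search in Java", List.of("b", "c")),
                "b", new Page("Vector search with Milvus", List.of("a")),
                "c", new Page("Keyword search with MongoDB", List.of()));
        Map<String, Set<String>> index = run(pages, "a", 3);
        System.out.println(index.get("search")); // a, b and c, in unspecified order
        System.out.println("model copies loaded: " + MODEL_LOADS.get()); // 3: one per worker
    }
}
```

Because every worker increments `MODEL_LOADS` once, model memory in this pattern grows linearly with the number of workers, which is exactly what the note warns about.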
flowchart TD
U(User)
A(User API)
S(Search Service)
I(Indexing Service)
E(Embedding Model)
DM[(Mongo)]
DV[(Milvus)]
R[(RabbitMQ)]
U -->|API Request| A
A -->|Send indexing task to queue| R
R -->|Receive task| I
I -->|Store keywords| DM
I -->|Generate more indexing tasks| R
I -->|Extract from text| E
E -->|Store embedding| DV
A -->|Search request| S
S -->|Extract from query| E
S -->|Query by keywords| DM
S -->|Query by embedding| DV
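In the diagram, the Search Service queries MongoDB by keywords and Milvus by embedding, but the README does not say how the two result lists are combined. One common technique for this is reciprocal rank fusion (RRF), where each document scores the sum of 1 / (k + rank) over the lists that contain it, usually with k = 60. The sketch below is a hedged illustration of RRF, assuming Java 17 or newer and plain string document ids; it is not necessarily what this project does, and `HybridMerge` and `fuse` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class HybridMerge {
    /**
     * Reciprocal rank fusion: every list adds 1 / (k + rank) to each document it contains
     * (rank starts at 1). Documents are returned by descending fused score.
     */
    public static List<String> fuse(List<List<String>> rankedLists, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> list : rankedLists) {
            for (int i = 0; i < list.size(); i++) {
                scores.merge(list.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> result = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken by id so the output is deterministic.
        result.sort(Comparator.comparing((String id) -> scores.get(id)).reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return result;
    }

    public static void main(String[] args) {
        List<String> keywordHits = List.of("doc3", "doc1", "doc7"); // e.g. results from MongoDB
        List<String> vectorHits = List.of("doc1", "doc5", "doc3");  // e.g. results from Milvus
        System.out.println(fuse(List.of(keywordHits, vectorHits), 60)); // [doc1, doc3, doc5, doc7]
    }
}
```

RRF only uses ranks, so it sidesteps the problem that keyword scores and vector distances live on incomparable scales; documents found by both searches (here `doc1` and `doc3`) rise to the top.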