Semantic search engine

Semantic search engine written in Java as a university project

Warning

This project is not production-ready. It was developed as a university project and a proof of concept. If you see this message, I am already working on a new version that is based on a microservice architecture and is much better optimized.

How to run

Define the following environment variables:

export MILVUS_HOST=localhost MILVUS_PORT=19530 MONGODB_URI=mongodb://localhost:27017/ RABBITMQ_HOST=localhost RABBITMQ_USERNAME=user RABBITMQ_PASSWORD=pass MODEL_PATH=models/model.onnx
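
For illustration, here is a minimal sketch of how an application could read and validate these variables at startup. The `EnvConfig` record and its method names are hypothetical and are not taken from the project's code; only the variable names come from the command above.

```java
import java.util.Map;

// Hypothetical holder for the settings listed above; the project's real
// configuration code may look different.
record EnvConfig(String milvusHost, int milvusPort, String mongoUri,
                 String rabbitHost, String rabbitUser, String rabbitPassword,
                 String modelPath) {

    // Reads the variables from a map so the logic can be exercised without
    // touching the real process environment.
    static EnvConfig fromMap(Map<String, String> env) {
        return new EnvConfig(
                require(env, "MILVUS_HOST"),
                Integer.parseInt(require(env, "MILVUS_PORT")),
                require(env, "MONGODB_URI"),
                require(env, "RABBITMQ_HOST"),
                require(env, "RABBITMQ_USERNAME"),
                require(env, "RABBITMQ_PASSWORD"),
                require(env, "MODEL_PATH"));
    }

    // Convenience entry point for the real environment.
    static EnvConfig fromSystem() {
        return fromMap(System.getenv());
    }

    // Fails fast with a readable message when a variable is missing or blank.
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value;
    }
}
```

Calling `EnvConfig.fromSystem()` before connecting to anything would report a forgotten variable by name instead of failing later with a connection error.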

You need an embedding model in ONNX format. I used cointegrated/LaBSE-en-ru and converted it with the utility from this article: Export to ONNX. Any other embedding model in ONNX format with a vector dimension of 768 will also work. Put the model in the models folder, for example models/model.onnx

Build and run the project with:
```
./gradlew run
```
The API will then be available on port 4567.

Note

The project was designed so that you can run as many indexing workers as you want. However, because of tight deadlines there was no time for optimization, so each indexing worker loads its own copy of the model into memory. Run with caution!
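
To make the memory concern concrete, here is a sketch of one possible mitigation that this project does not implement: if several workers ran as threads inside a single JVM (an assumption, the README does not say how workers are started), they could share one lazily loaded model. The `SharedModel` class is hypothetical, and the loader is any function that produces the model object.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: loads a value once and shares it between threads.
final class SharedModel<M> {
    private final Supplier<M> loader;
    private final AtomicInteger loadCount = new AtomicInteger();
    private volatile M model;

    SharedModel(Supplier<M> loader) {
        this.loader = loader;
    }

    // Double-checked locking: only the first caller pays for loading,
    // later callers get the already loaded instance.
    M get() {
        M local = model;
        if (local == null) {
            synchronized (this) {
                local = model;
                if (local == null) {
                    local = loader.get();
                    loadCount.incrementAndGet();
                    model = local;
                }
            }
        }
        return local;
    }

    // How many times the loader actually ran (useful for checking the sketch).
    int loadCount() {
        return loadCount.get();
    }
}
```

Workers started as separate processes would still each hold their own copy, so this only helps in the single-JVM case.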

Shitty project architecture

flowchart TD
    U(User)

    A(User API)
    S(Search Service)
    I(Indexing Service)
    E(Embedding Model)

    DM[(Mongo)]
    DV[(Milvus)]
    R[(RabbitMQ)]

    U -->|API Request| A
    A -->|Send indexing task to queue| R
    R -->|Receive task| I
    I -->|Store keywords| DM
    I -->|Generate more indexing tasks| R
    I -->|Extract from text| E
    E -->|Store embedding| DV

    A -->|Search request| S
    S -->|Extract from query| E
    S -->|Query by keywords| DM
    S -->|Query by embedding| DV
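
The data flow in the diagram can be sketched with in-memory stand-ins. Everything here is hypothetical and only mirrors the arrows above: a queue replaces RabbitMQ, a map of keywords replaces Mongo, a map of vectors replaces Milvus, and a toy word-hashing function replaces the ONNX embedding model (8 dimensions instead of 768). The step where the indexer generates further indexing tasks is left out, and the two search paths are kept separate because the README does not describe how their results are combined.

```java
import java.util.*;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-memory stand-ins for the components in the diagram (all names hypothetical).
final class PipelineSketch {
    record Task(String docId, String text) {}

    static final int DIM = 8; // the real model produces 768-dimensional vectors

    final BlockingQueue<Task> queue = new LinkedBlockingQueue<>();   // stands in for RabbitMQ
    final Map<String, Set<String>> keywordIndex = new HashMap<>();   // stands in for Mongo
    final Map<String, float[]> vectorIndex = new HashMap<>();        // stands in for Milvus

    // Lower-cased words of a text; used for keywords and for the toy embedding.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    // Toy embedding: counts words per hash bucket. Not semantic at all,
    // it only gives the sketch a fixed-size vector to store and compare.
    static float[] embed(String text) {
        float[] v = new float[DIM];
        for (String w : tokens(text)) v[Math.floorMod(w.hashCode(), DIM)] += 1f;
        return v;
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // User API: send an indexing task to the queue.
    void submit(String docId, String text) {
        queue.add(new Task(docId, text));
    }

    // Indexing service: receive tasks, store keywords and embeddings.
    void runIndexer() {
        Task t;
        while ((t = queue.poll()) != null) {
            for (String w : tokens(t.text())) {
                keywordIndex.computeIfAbsent(w, k -> new HashSet<>()).add(t.docId());
            }
            vectorIndex.put(t.docId(), embed(t.text()));
        }
    }

    // Search service, keyword path: documents containing any query word.
    Set<String> searchByKeywords(String query) {
        Set<String> hits = new TreeSet<>();
        for (String w : tokens(query)) hits.addAll(keywordIndex.getOrDefault(w, Set.of()));
        return hits;
    }

    // Search service, embedding path: document with the highest cosine similarity.
    Optional<String> searchByEmbedding(String query) {
        float[] q = embed(query);
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, float[]> e : vectorIndex.entrySet()) {
            double s = cosine(q, e.getValue());
            if (s > bestScore) { bestScore = s; best = e.getKey(); }
        }
        return Optional.ofNullable(best);
    }
}
```

A caller would `submit` a few documents, call `runIndexer()`, and then query either path. In the real system the indexer and the search service run separately and talk to the external stores.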

potat-dev/semantic-search-coursework

Folders and files

Latest commit

History

Repository files navigation

Semantic search engine

How to run

Shitty project architecture

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages