This repo contains a sample distributed architecture built with Neum AI, Celery, and Redis queues. By design, the Neum AI framework provides constructs to parallelize workloads in order to process larger data sets.
To use this repo, you will first need to install its dependencies:

```bash
pip install -r requirements.txt
```
In addition:
- Install the redis CLI to run Redis locally.
- You will need an OpenAI API key for the OpenAI embedding model. To get an API key, visit OpenAI. Make sure you have configured billing for the account.
- You will need a Weaviate Cloud Service URL and API key for the Weaviate vector database. To get a URL and API key, visit Weaviate Cloud Service.
In the `main.py` file, you need to configure the OpenAI and Weaviate connectors. Alternatively, you can re-configure your pipeline to use any of the other available Neum AI connectors.
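For reference, the connector setup in `main.py` should look something like the following sketch, which follows the Neum AI connector interfaces; the API keys, URL, and class name are placeholders you need to fill in with your own values.

```python
# Sketch of the connector configuration in main.py.
# All credential values below are placeholders.
from neumai.EmbedConnectors import OpenAIEmbed
from neumai.SinkConnectors import WeaviateSink

# OpenAI embedding connector: needs the API key from your OpenAI account.
openai_embed = OpenAIEmbed(
    api_key="<YOUR_OPENAI_API_KEY>",
)

# Weaviate sink connector: needs your Weaviate Cloud Service URL and key.
# The class name is a hypothetical collection name; pick your own.
weaviate_sink = WeaviateSink(
    url="<YOUR_WEAVIATE_CLOUD_URL>",
    api_key="<YOUR_WEAVIATE_API_KEY>",
    class_name="NeumDemo",
)
```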
To get everything ready to run our solution, we first need to get our redis queues running. To do this, start the local Redis server:

```bash
sudo service redis-server start
```
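To confirm the server is up and accepting connections before starting the workers, you can ping it with the redis CLI:

```bash
redis-cli ping
# expected output: PONG
```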
Once we have the redis queues running, we can start our Celery-based workers, each in its own command line.
**data_extraction worker**

```bash
celery -A tasks worker --concurrency 1 -Q data_extraction
```

**data_processing worker**

```bash
celery -A tasks worker --concurrency 1 -Q data_processing
```

**data_embed_ingest worker**

```bash
celery -A tasks worker --concurrency 1 -Q data_embed_ingest
```
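To make the fan-out concrete, here is an illustrative sketch of how `tasks.py` could wire the three queues together; the task names match the queues above, but the routing setup, payload shapes, and chaining details are assumptions rather than the repo's exact code.

```python
# tasks.py (illustrative sketch; payload shapes are assumptions)
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

# Route each task to a dedicated queue so each worker owns one stage.
app.conf.task_routes = {
    "tasks.data_extraction": {"queue": "data_extraction"},
    "tasks.data_processing": {"queue": "data_processing"},
    "tasks.data_embed_ingest": {"queue": "data_embed_ingest"},
}

@app.task
def data_extraction(pipeline: dict):
    # Stage 1: list the documents exposed by the source connector and
    # fan out one processing task per document.
    documents = []  # placeholder for the source connector listing
    for document in documents:
        data_processing.delay(pipeline, document)

@app.task
def data_processing(pipeline: dict, document: dict):
    # Stage 2: load and chunk the document, then forward the chunks.
    chunks = []  # placeholder for loader + chunker output
    data_embed_ingest.delay(pipeline, chunks)

@app.task
def data_embed_ingest(pipeline: dict, chunks: list):
    # Stage 3: embed the chunks with OpenAI and ingest the vectors
    # into Weaviate via the sink connector.
    ...
```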
Once everything is running, we can trigger our pipeline. This will distribute its tasks into the different queues as the data is processed.

```bash
python main.py
```
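Under the hood, `main.py` only needs to enqueue the first stage; the workers chain the remaining stages through Redis. A minimal trigger could look like the sketch below, where the `Pipeline` import follows the Neum AI docs, `source`, `openai_embed`, and `weaviate_sink` are the connectors configured earlier in `main.py`, and the `.dict()` serialization handed to the task is an assumption about what this repo's tasks expect.

```python
# Trigger sketch for main.py. `source`, `openai_embed`, and
# `weaviate_sink` are the connectors configured above; the .dict()
# payload is an assumed serialization, not necessarily the repo's.
from neumai.Pipelines import Pipeline
from tasks import data_extraction

pipeline = Pipeline(
    sources=[source],
    embed=openai_embed,
    sink=weaviate_sink,
)

# Enqueue only the first stage; Celery routes it to the
# data_extraction queue and the workers take it from there.
data_extraction.delay(pipeline.dict())
```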