- Create a sample project that highlights the power of Timescale's pgai extension, demonstrating the ability to vectorize data within PostgreSQL.
- Create a sample project that highlights the power of Hasura DDN, allowing realtime authorized queries across data sources.
Below, you'll find some information to help get you started with this project. The steps will ensure you've installed all dependencies and will show you a finished API that leverages the various data sources running in containers.
For the first part of the workshop, we'll focus on the TimescaleDB instance and running LLMs directly on data using SQL. After that, we'll take a look at how you can use LLMs directly via your API 🤙
- Docker
- Hasura DDN CLI
- A totally-free-forever Hasura Cloud account
- Ollama installed and running the llama3.1 model locally
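If you want to confirm the last prerequisite before starting, here is a minimal sketch that asks a local Ollama server which models it has pulled. It assumes Ollama is listening on its default address (http://localhost:11434) and uses its /api/tags endpoint; the helper names are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def has_model(tags_payload: dict, model: str) -> bool:
    """Return True if `model` appears in an /api/tags response payload."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    # Ollama reports names with a tag suffix, e.g. "llama3.1:latest".
    return any(n == model or n.split(":", 1)[0] == model for n in names)


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> dict:
    """GET /api/tags, which lists the models available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def check_ollama(model: str = "llama3.1") -> str:
    """Return a human-readable status line for the given model."""
    try:
        found = has_model(fetch_tags(), model)
    except OSError as exc:  # connection refused, timeout, and similar
        return f"Ollama is not reachable: {exc}"
    if found:
        return f"{model} is available"
    return f"{model} not found; try: ollama pull {model}"
```

Calling `print(check_ollama())` from a Python shell reports whether the model has been pulled. It does not tell you whether the model is currently loaded in memory.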
git clone https://github.com/Birmingham-AI/realtime-vector.git
From the root of the project, and with the Docker daemon running, build the images and start them up in the background using the start.sh script.
First, make it executable:
chmod +x ./start.sh
Then, run it:
./start.sh
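The containers can take a moment to come up. As a rough sketch, the snippet below polls the host ports published by the Compose file in this README (11435 for Ollama, 5432 for TimescaleDB, 27017 for MongoDB). The function names are hypothetical, and an open port only means something is listening, not that the service has finished initializing.

```python
import socket
import time

# Host ports published by the Compose file in this README.
SERVICES = {"ollama": 11435, "timescaledb": 5432, "mongodb": 27017}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for(services: dict, host: str = "localhost",
             attempts: int = 30, delay: float = 1.0) -> dict:
    """Poll each port until it opens or attempts run out; return name -> bool."""
    status = {name: False for name in services}
    for _ in range(attempts):
        for name, port in services.items():
            if not status[name]:
                status[name] = port_open(host, port)
        if all(status.values()):
            break
        time.sleep(delay)
    return status
```

For example, `wait_for(SERVICES)` returns a dictionary such as `{"ollama": True, ...}` once every port accepts connections, or after roughly 30 seconds of trying.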
Click here to open the development console (Hasura's GUI) to explore the API.
When you're ready to bring everything down, press ctrl + c in your terminal to kill the active process. Then, either manually stop all the Docker containers or, if you'd rather be efficient, use the kill.sh script. First, make it executable:
chmod +x ./kill.sh
Then, execute it:
./kill.sh
The Docker Compose file provides the best overview of what we'll be building:
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11435:11434"
    volumes:
      - ollama_data:/root/.ollama
  timescaledb:
    image: timescale/timescaledb:latest-pg16
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: changelog_db
    volumes:
      - timescale_data:/var/lib/postgresql/data
      - ./init-scripts/timescaledb:/docker-entrypoint-initdb.d
    depends_on:
      - ollama
  mongodb:
    image: mongo:latest
    container_name: mongodb
    ports:
      - "27017:27017"
    volumes:
      - mongo_data:/data/db
  mongo-seed:
    build:
      context: ./init-scripts/mongodb/
      dockerfile: Dockerfile
    links:
      - mongodb
    depends_on:
      - mongodb
volumes:
  ollama_data:
  timescale_data:
  mongo_data:
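One detail worth noticing is the Ollama port mapping: the container listens on 11434 but is published on the host as 11435, presumably so it doesn't collide with an Ollama you already run locally. Here is a small sketch of the addresses that follow from the file above. The helper functions are hypothetical; only the ports, credentials, and service names come from the Compose file.

```python
# Values copied from the Compose file above; the helpers are illustrative only.
OLLAMA_HOST_PORT = 11435       # left side of "11435:11434"
OLLAMA_CONTAINER_PORT = 11434  # right side of "11435:11434"


def ollama_base_url(inside_compose: bool) -> str:
    """Containers reach Ollama by service name; the host uses the published port."""
    if inside_compose:
        return f"http://ollama:{OLLAMA_CONTAINER_PORT}"
    return f"http://localhost:{OLLAMA_HOST_PORT}"


def timescale_dsn(host: str = "localhost") -> str:
    """Connection URI built from POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB."""
    return f"postgresql://postgres:postgres@{host}:5432/changelog_db"


def mongo_uri(host: str = "localhost") -> str:
    """Connection URI for the mongodb service (no credentials are configured)."""
    return f"mongodb://{host}:27017"
```

From your laptop you would use `ollama_base_url(False)`. From inside another container, such as a SQL function running in TimescaleDB, `ollama_base_url(True)` is the address that resolves.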
This will also be our directory structure:
realtime-vector/
├── README.md
├── docker-compose.yaml
├── hasura
│   ├── app
│   ├── compose.yaml
│   ├── engine
│   ├── globals
│   ├── hasura.yaml
│   ├── otel-collector-config.yaml
│   └── supergraph.yaml
├── init-scripts
│   ├── mongodb
│   └── timescaledb
└── start.sh
Timescale is Postgres made powerful.
3.2M+ Timescale databases power apps across IoT, sensors, AI, dev tools, crypto, and finance—all built on PostgreSQL. We use PostgreSQL for everything; we built our cloud so you can too.
- Timescale is cloud-hosted Postgres.
- TimescaleDB is a Postgres extension for time-series, events, and analytics workloads.
- pgai is a stack of Postgres extensions for AI workloads:
  - pgvectorscale - powerful vector index/search building on pgvector
  - pgai - makes working with LLMs directly from SQL possible and easy
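To make "LLMs directly from SQL" a little more concrete, here is a hedged sketch of a query that asks Ollama to summarize rows from the commit table defined in this README. It assumes the pgai extension is installed in the database (`CREATE EXTENSION IF NOT EXISTS ai CASCADE;`) and exposes `ai.ollama_generate` with a `host` argument, as current pgai releases document; older releases may name things differently. The Python function only builds the SQL and its parameters for a driver such as psycopg, and its name is made up for this example.

```python
def summarize_commits_query(model: str = "llama3.1",
                            host: str = "http://ollama:11434",
                            limit: int = 5):
    """Return (sql, params) for a pgai call that summarizes recent commits.

    Assumptions: pgai provides ai.ollama_generate(model, prompt, host => ...),
    which returns JSON with the generated text under the 'response' key.
    """
    sql = """
        SELECT c.hash,
               ai.ollama_generate(
                   %s,
                   'Summarize this commit: ' || c.message || ' ' || c.description,
                   host => %s
               )->>'response' AS summary
        FROM commit c
        ORDER BY c.commit_time DESC
        LIMIT %s
    """
    # The host is the Ollama service name as seen from inside the Compose network.
    return sql, (model, host, limit)
```

With a database driver you would pass both values along, for example `cursor.execute(*summarize_commits_query())`. The LLM call then happens inside PostgreSQL, one row at a time.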
The Hasura Data Delivery Network (DDN) is an open-sourced method for developing composite APIs. You can create a GraphQL API on top of nearly any data source and connect multiple types of data sources together seamlessly.
Why are we talking about it at an AI meet-up? Well, because you can also incorporate TypeScript (or Python) functions directly into your API. This means you can call LLMs (through OpenAI or, in this case, Ollama) and transform or enrich data from your API before it's returned to a client.
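As a rough idea of what such a function could look like, here is a minimal Python sketch that sends a commit to Ollama's /api/generate endpoint and attaches the answer as a new field. It is plain Python, not Hasura's connector API, so how the function gets registered with DDN is not shown. The function names, the prompt, and the `summary` field are assumptions made for illustration.

```python
import json
import urllib.request
from typing import Callable

# Assumption: a local Ollama on its default port. The Compose container in this
# README is published on port 11435 instead.
OLLAMA_URL = "http://localhost:11434"


def ollama_generate(prompt: str, model: str = "llama3.1",
                    base_url: str = OLLAMA_URL) -> str:
    """POST to /api/generate with streaming disabled and return the text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]


def enrich_commit(commit: dict,
                  generate: Callable[[str], str] = ollama_generate) -> dict:
    """Return a copy of `commit` with an LLM-written `summary` field added."""
    prompt = (
        "Summarize this commit in one sentence.\n"
        f"Message: {commit['message']}\n"
        f"Description: {commit['description']}"
    )
    return {**commit, "summary": generate(prompt).strip()}
```

Passing the generator in as an argument keeps the enrichment logic testable without a running model: `enrich_commit(row, generate=lambda p: "stub")` works offline.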
CREATE TABLE developer (
    id BIGINT NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    name TEXT NOT NULL,
    email TEXT UNIQUE NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE repository (
    id INT NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    name TEXT UNIQUE NOT NULL,
    description TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE commit (
    id INT NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    developer_id INTEGER REFERENCES developer(id),
    repository_id INTEGER REFERENCES repository(id),
    hash TEXT UNIQUE NOT NULL,
    message TEXT NOT NULL,
    description TEXT NOT NULL,
    commit_time TIMESTAMPTZ NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
MongoDB holds other collections as well, but the two we care about for the purpose of this application store pull requests and commits.
An example document from each:
{
  "_id": "64dbbcf123456789abcde012",
  "created_at": "2024-08-09T12:34:56Z",
  "description": "Updated the code to prevent Homer from eating the nuclear power plant's donuts.",
  "developer": "lenny.leonard@sprinfield.com",
  "pull_request_id": "PR-742",
  "repository": "springfield_power_plant",
  "status": "merged",
  "title": "Fix Donut Consumption Bug",
  "updated_at": "2024-08-09T14:00:00Z"
}
{
  "_id": "64dbcdf223456789abcde345",
  "commit_time": "2024-08-09T15:45:00Z",
  "description": "Refactored the codebase to optimize Springfield's traffic light system.",
  "developer": "lisa.simpson@sprinfield.com",
  "hash": "abcd1234efgh5678ijkl9101",
  "message": "Optimized traffic light timings to reduce delays",
  "pull_request_id": "PR-555",
  "repository": "springfield_infrastructure"
}
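Notice that these documents identify a developer by email and a repository by name, while the PostgreSQL tables declare developer.email and repository.name as UNIQUE. Those values could therefore act as join keys between the two sources. The sketch below assumes that is the intended relationship, which this README does not state. The lookup dictionaries and the ids in the usage example are hypothetical.

```python
def link_commit(doc: dict, developers_by_email: dict,
                repositories_by_name: dict) -> dict:
    """Resolve a MongoDB commit document to PostgreSQL-style foreign keys.

    Unknown emails or repository names resolve to None instead of raising.
    """
    return {
        "hash": doc["hash"],
        "pull_request_id": doc.get("pull_request_id"),
        "developer_id": developers_by_email.get(doc["developer"]),
        "repository_id": repositories_by_name.get(doc["repository"]),
    }


# Usage with the commit document shown above; the ids 7 and 3 are made up.
commit_doc = {
    "developer": "lisa.simpson@sprinfield.com",
    "hash": "abcd1234efgh5678ijkl9101",
    "pull_request_id": "PR-555",
    "repository": "springfield_infrastructure",
}
linked = link_commit(
    commit_doc,
    developers_by_email={"lisa.simpson@sprinfield.com": 7},
    repositories_by_name={"springfield_infrastructure": 3},
)
```

In the finished API, a relationship defined in Hasura DDN would presumably do this matching for you. The function only shows which fields line up.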