Vector Data with Timescale and Hasura DDN

Outcomes

Create a sample project that highlights the power of Timescale's pgai extension, demonstrating the ability to vectorize data within PostgreSQL.
Create a sample project that highlights the power of Hasura DDN, allowing realtime authorized queries across data sources.

Getting started

Below, you'll find some information to help get you started with this project. The steps will ensure you've installed all dependencies and will show you a finished API that leverages the various data sources running in containers.

For the first part of the workshop, we'll focus on the TimescaleDB instance and running LLMs directly on data using SQL. After that, we'll take a look at how you can use LLMs directly via your API 🤙

Step 1. Install dependencies

Docker
Hasura DDN CLI
A totally-free-forever Hasura Cloud account
Ollama installed and running the llama3.1 model locally
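
To confirm that last dependency before moving on, a small check along these lines may help. It is a minimal sketch, assuming Ollama is listening on its default local address (http://localhost:11434) and using its /api/tags endpoint, which lists installed models. The helper names has_model and fetch_tags are illustrative and not part of this project.

```python
import json
from urllib.request import urlopen

# Assumption: Ollama's default local address; adjust if yours differs.
OLLAMA_URL = "http://localhost:11434"


def has_model(tags_payload: dict, wanted: str) -> bool:
    """Return True if `wanted` is installed, with or without a tag suffix."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    # Installed names usually carry a tag, e.g. "llama3.1:latest".
    return any(n == wanted or n.startswith(wanted + ":") for n in names)


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Ask the local Ollama server which models are installed."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        installed = has_model(fetch_tags(), "llama3.1")
    except OSError as exc:  # connection refused, timeout, and similar
        print(f"Ollama is not reachable: {exc}")
    else:
        print("llama3.1 is installed" if installed else "run: ollama pull llama3.1")
```

If the script reports that the model is missing, pulling it with the Ollama CLI should resolve it.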

Step 2. Clone the repo

git clone https://github.com/Birmingham-AI/realtime-vector.git

Step 3. Build and run the images

From the root of the project, and with the Docker daemon running, build the images and start them up in the background using the start.sh script.

First, make it executable:

chmod +x ./start.sh

Then, run it:

./start.sh

Step 4. Explore the API

Open the development console (Hasura's GUI) to explore the API.

Step 5. Clean up

When you're ready to bring everything down, press Ctrl+C in your terminal to kill the active process. Then either stop all the Docker containers manually or, if you're lazy (read: efficient), use the kill.sh script. First, make it executable:

chmod +x ./kill.sh

Then, execute it:

./kill.sh

Project architecture

The Docker Compose file provides the best overview of what we'll be building:

services:
  ollama:
    image: ollama/ollama
    ports:
      - "11435:11434"
    volumes:
      - ollama_data:/root/.ollama

  timescaledb:
    image: timescale/timescaledb:latest-pg16
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: changelog_db
    volumes:
      - timescale_data:/var/lib/postgresql/data
      - ./init-scripts/timescaledb:/docker-entrypoint-initdb.d
    depends_on:
      - ollama

  mongodb:
    image: mongo:latest
    container_name: mongodb
    ports:
      - "27017:27017"
    volumes:
      - mongo_data:/data/db

  mongo-seed:
    build:
      context: ./init-scripts/mongodb/
      dockerfile: Dockerfile
    links:
      - mongodb
    depends_on:
      - mongodb

volumes:
  ollama_data:
  timescale_data:
  mongo_data:
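
To make the port mappings concrete, here is a minimal sketch that derives the addresses you would use from your host machine. The values are copied from the Compose file above; the helper names and the ENDPOINTS dictionary are illustrative assumptions, not something the project ships.

```python
from urllib.parse import quote


def host_port(mapping: str) -> int:
    """Return the host-side port of a Compose "HOST:CONTAINER" mapping."""
    host, _container = mapping.split(":")
    return int(host)


def postgres_url(user: str, password: str, db: str, mapping: str) -> str:
    """Build a PostgreSQL connection URL for a service published on localhost."""
    return (
        f"postgresql://{quote(user)}:{quote(password)}"
        f"@localhost:{host_port(mapping)}/{db}"
    )


# Port mappings and credentials as written in the Compose file above.
ENDPOINTS = {
    # The containerized Ollama is published on 11435, not the default 11434.
    "ollama": f"http://localhost:{host_port('11435:11434')}",
    "timescaledb": postgres_url("postgres", "postgres", "changelog_db", "5432:5432"),
    "mongodb": f"mongodb://localhost:{host_port('27017:27017')}",
}

if __name__ == "__main__":
    for service, url in ENDPOINTS.items():
        print(f"{service}: {url}")
```

The left-hand number in each mapping is the port on your machine. Publishing the containerized Ollama on 11435 presumably keeps it from clashing with a locally installed Ollama on 11434.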

The directory structure looks like this:

realtime-vector/
├── README.md
├── docker-compose.yaml
├── hasura
│   ├── app
│   ├── compose.yaml
│   ├── engine
│   ├── globals
│   ├── hasura.yaml
│   ├── otel-collector-config.yaml
│   └── supergraph.yaml
├── init-scripts
│   ├── mongodb
│   └── timescaledb
└── start.sh

TimescaleDB with pgai

Timescale is Postgres made powerful.

3.2M+ Timescale databases power apps across IoT, sensors, AI, dev tools, crypto, and finance—all built on PostgreSQL. We use PostgreSQL for everything; we built our cloud so you can too.

Timescale is cloud-hosted Postgres.
TimescaleDB is a Postgres extension for time-series, events, and analytics workloads.
pgai is a stack of Postgres extensions for AI workloads:
- pgvectorscale - powerful vector index/search building on pgvector
- pgai - makes working with LLMs directly from SQL possible and easy
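
To give a feel for what vector search does with embedded text, the sketch below computes cosine distance in plain Python, the same measure that pgvector exposes through its <=> operator. It only illustrates the idea and is not how pgvector or pgvectorscale are implemented. The three-dimensional "embeddings" are made-up toy values; real embeddings have hundreds of dimensions.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query: list[float], rows: list[tuple[str, list[float]]], k: int = 1):
    """Return the k rows whose embeddings are closest to the query."""
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))[:k]


# Toy data: titles from this README paired with invented embeddings.
ROWS = [
    ("Fix Donut Consumption Bug", [0.9, 0.1, 0.0]),
    ("Optimized traffic light timings", [0.1, 0.9, 0.2]),
]

if __name__ == "__main__":
    print(nearest([0.8, 0.2, 0.0], ROWS))
```

In the database, the equivalent lookup would be an ORDER BY on the distance between a stored vector column and the query vector.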

Hasura DDN

The Hasura Data Delivery Network (DDN) is an open-sourced method for developing composite APIs. You can create a GraphQL API on top of nearly any data source. And, you can connect multiple types of data sources together seamlessly.

Why are we talking about it at an AI meet-up? Well, because you can also incorporate TypeScript (or Python) functions directly into your API. This means you can call LLMs (such as OpenAI or, in this case, Ollama) and transform or enrich data from your API before it's returned to a client.
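
As a rough idea of what such a function might do, here is a minimal sketch of a plain Python function that asks Ollama to summarize a commit message through its /api/generate endpoint. It does not use Hasura's connector SDK, so how the function is registered with DDN is left out. The function names, the prompt wording, and the default address are assumptions.

```python
import json
from typing import Callable
from urllib.request import Request, urlopen

# Assumption: a locally installed Ollama on its default port.
OLLAMA_URL = "http://localhost:11434"


def build_payload(message: str, model: str = "llama3.1") -> dict:
    """Build a non-streaming request body for Ollama's /api/generate."""
    prompt = f"Summarize this commit message in one sentence:\n\n{message}"
    return {"model": model, "prompt": prompt, "stream": False}


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON reply."""
    request = Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=60) as resp:
        return json.load(resp)


def summarize_commit(
    message: str, send: Callable[[str, dict], dict] = post_json
) -> str:
    """Return the model's one-sentence summary of a commit message."""
    reply = send(f"{OLLAMA_URL}/api/generate", build_payload(message))
    # With "stream": False, the generated text arrives in the "response" field.
    return reply["response"].strip()
```

Passing the transport in as send keeps the function easy to exercise without a running model, which is handy when wiring it into an API.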

PostgreSQL

Developer table

CREATE TABLE developer (
    id BIGINT NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    name TEXT NOT NULL,
    email TEXT UNIQUE NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Repository table

CREATE TABLE repository (
    id INT NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    name TEXT UNIQUE NOT NULL,
    description TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Commit table

CREATE TABLE commit (
    id INT NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    developer_id INTEGER REFERENCES developer(id),
    repository_id INTEGER REFERENCES repository(id),
    hash TEXT UNIQUE NOT NULL,
    message TEXT NOT NULL,
    description TEXT NOT NULL,
    commit_time TIMESTAMPTZ NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
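
To show how the three tables relate, the sketch below rebuilds a simplified version of the schema in an in-memory SQLite database and joins each commit to its developer and repository. SQLite is only a stand-in so the example runs without the containers: identity columns and TIMESTAMPTZ are simplified, and the created_at columns are omitted. The sample row reuses values from the MongoDB example in this README, and the developer name is an assumption.

```python
import sqlite3

# SQLite stand-in for the PostgreSQL schema above. "commit" is quoted because
# COMMIT is a keyword in SQLite.
SCHEMA = """
CREATE TABLE developer (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT UNIQUE NOT NULL
);
CREATE TABLE repository (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    description TEXT
);
CREATE TABLE "commit" (
    id INTEGER PRIMARY KEY,
    developer_id INTEGER REFERENCES developer(id),
    repository_id INTEGER REFERENCES repository(id),
    hash TEXT UNIQUE NOT NULL,
    message TEXT NOT NULL,
    description TEXT NOT NULL,
    commit_time TEXT NOT NULL
);
"""

# Follow both foreign keys to resolve who committed what, and where.
JOIN_SQL = """
SELECT d.name, r.name, c.message
FROM "commit" AS c
JOIN developer AS d ON d.id = c.developer_id
JOIN repository AS r ON r.id = c.repository_id
ORDER BY c.commit_time DESC
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the tables in memory and insert one illustrative commit."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO developer (id, name, email) VALUES (1, ?, ?)",
        ("Lisa Simpson", "lisa.simpson@sprinfield.com"),
    )
    conn.execute(
        "INSERT INTO repository (id, name) VALUES (1, ?)",
        ("springfield_infrastructure",),
    )
    conn.execute(
        'INSERT INTO "commit" (developer_id, repository_id, hash, message,'
        " description, commit_time) VALUES (1, 1, ?, ?, ?, ?)",
        (
            "abcd1234efgh5678ijkl9101",
            "Optimized traffic light timings to reduce delays",
            "Refactored the codebase to optimize Springfield's traffic light system.",
            "2024-08-09T15:45:00Z",
        ),
    )
    return conn


def commits_with_authors(conn: sqlite3.Connection) -> list[tuple]:
    """Return (developer name, repository name, commit message) rows."""
    return conn.execute(JOIN_SQL).fetchall()


if __name__ == "__main__":
    for row in commits_with_authors(build_demo_db()):
        print(row)
```

The same join should work against the TimescaleDB container, where commit does not need quoting.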

MongoDB

There are other collections present, but these are the two we care about for the purpose of this application:

Pull Requests collection

An example:

{
  "_id": "64dbbcf123456789abcde012",
  "created_at": "2024-08-09T12:34:56Z",
  "description": "Updated the code to prevent Homer from eating the nuclear power plant's donuts.",
  "developer": "lenny.leonard@sprinfield.com",
  "pull_request_id": "PR-742",
  "repository": "springfield_power_plant",
  "status": "merged",
  "title": "Fix Donut Consumption Bug",
  "updated_at": "2024-08-09T14:00:00Z"
}

Commits collection

{
  "_id": "64dbcdf223456789abcde345",
  "commit_time": "2024-08-09T15:45:00Z",
  "description": "Refactored the codebase to optimize Springfield's traffic light system.",
  "developer": "lisa.simpson@sprinfield.com",
  "hash": "abcd1234efgh5678ijkl9101",
  "message": "Optimized traffic light timings to reduce delays",
  "pull_request_id": "PR-555",
  "repository": "springfield_infrastructure"
}
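
Judging by the two examples, pull_request_id appears to be the field that ties a commit to its pull request. The sketch below groups commit hashes under their pull requests using plain dictionaries shaped like the documents above. The function names are illustrative, the documents are trimmed to the relevant fields, and the second commit is a hypothetical entry added so the grouping has something to match.

```python
from collections import defaultdict


def commits_by_pull_request(commits: list[dict]) -> dict[str, list[str]]:
    """Map each pull_request_id to the hashes of its commits."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for commit in commits:
        grouped[commit["pull_request_id"]].append(commit["hash"])
    return dict(grouped)


def attach_commits(pull_requests: list[dict], commits: list[dict]) -> list[dict]:
    """Return copies of the pull requests with a commit_hashes list added."""
    grouped = commits_by_pull_request(commits)
    return [
        {**pr, "commit_hashes": grouped.get(pr["pull_request_id"], [])}
        for pr in pull_requests
    ]


# Trimmed versions of the example documents above.
PULL_REQUESTS = [
    {"pull_request_id": "PR-742", "title": "Fix Donut Consumption Bug", "status": "merged"},
]
COMMITS = [
    {"hash": "abcd1234efgh5678ijkl9101", "pull_request_id": "PR-555"},
    # Hypothetical commit, not present in the sample data.
    {"hash": "hypothetical0001", "pull_request_id": "PR-742"},
]

if __name__ == "__main__":
    for pr in attach_commits(PULL_REQUESTS, COMMITS):
        print(pr["pull_request_id"], pr["commit_hashes"])
```

A relationship like this is the kind of cross-collection link an API layer could expose as a nested field.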
