This repository is for our August 2024 meet-up wherein we'll learn how you can quickly and easily vectorize data using Timescale and query it in realtime with Hasura.

Birmingham-AI/realtime-vector

Vector Data with Timescale and Hasura DDN

Outcomes

  • Create a sample project that highlights the power of Timescale's pgai extension, demonstrating the ability to vectorize data within PostgreSQL.
  • Create a sample project that highlights the power of Hasura DDN, allowing realtime authorized queries across data sources.

Getting started

The steps below walk you through installing the dependencies and bringing up a finished API that draws on the data sources running in containers.

For the first part of the workshop, we'll focus on the TimescaleDB instance and running LLMs directly on data using SQL. After that, we'll take a look at how you can use LLMs directly via your API 🤙

Step 1. Install dependencies

Step 2. Clone the repo

git clone https://github.com/Birmingham-AI/realtime-vector.git

Step 3. Build and run the images

From the root of the project, and with the Docker daemon running, build the images and start them up in the background using the start.sh script.

First, make it executable:

chmod +x ./start.sh

Then, run it:

./start.sh

Step 4. Explore the API

Click here to open the development console (Hasura's GUI) to explore the API.

Step 5. Clean up

When you're ready to bring everything down, press ctrl + c in your terminal to kill the active process. Then, either stop all the Docker containers manually or, if you're efficient, use the kill.sh script. First, make it executable:

chmod +x ./kill.sh

Then, execute it:

./kill.sh

Project architecture

The Docker Compose file (docker-compose.yaml) provides the best overview of what we'll be building:

services:
  ollama:
    image: ollama/ollama
    ports:
      - "11435:11434"
    volumes:
      - ollama_data:/root/.ollama

  timescaledb:
    image: timescale/timescaledb:latest-pg16
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: changelog_db
    volumes:
      - timescale_data:/var/lib/postgresql/data
      - ./init-scripts/timescaledb:/docker-entrypoint-initdb.d
    depends_on:
      - ollama

  mongodb:
    image: mongo:latest
    container_name: mongodb
    ports:
      - "27017:27017"
    volumes:
      - mongo_data:/data/db

  mongo-seed:
    build:
      context: ./init-scripts/mongodb/
      dockerfile: Dockerfile
    links:
      - mongodb
    depends_on:
      - mongodb

volumes:
  ollama_data:
  timescale_data:
  mongo_data:
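
One detail in the compose file is easy to miss: Ollama is published on host port 11435, while containers on the compose network still reach it on 11434 under its service name. The sketch below derives both kinds of address from the port mappings above. The helper names (`parsePorts`, `hostAddress`, `internalAddress`) are hypothetical, and `localhost` assumes the containers run on your own machine.

```typescript
// Port mappings copied from the compose file above ("host:container").
const portMappings = {
  ollama: "11435:11434",
  timescaledb: "5432:5432",
  mongodb: "27017:27017",
} as const;

type Service = keyof typeof portMappings;

// Split a "host:container" mapping into its two numeric ports.
function parsePorts(mapping: string): { host: number; container: number } {
  const parts = mapping.split(":").map(Number);
  const host = parts[0];
  const container = parts[1];
  if (
    parts.length !== 2 ||
    host === undefined ||
    container === undefined ||
    !Number.isInteger(host) ||
    !Number.isInteger(container)
  ) {
    throw new Error(`Unexpected port mapping: ${mapping}`);
  }
  return { host, container };
}

// Address to use from the host machine (published port on localhost).
function hostAddress(service: Service): string {
  return `localhost:${parsePorts(portMappings[service]).host}`;
}

// Address to use from another container on the compose network
// (service name as hostname, container-side port).
function internalAddress(service: Service): string {
  return `${service}:${parsePorts(portMappings[service]).container}`;
}

// Connection strings built from the credentials in the compose file.
const postgresUrl = `postgres://postgres:postgres@${hostAddress("timescaledb")}/changelog_db`;
const mongoUrl = `mongodb://${hostAddress("mongodb")}`;
const ollamaFromHost = `http://${hostAddress("ollama")}`;
const ollamaFromTimescale = `http://${internalAddress("ollama")}`;
```

The last two constants are the ones to watch: a tool on your laptop talks to Ollama through the first, while SQL running inside the timescaledb container would use the second.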

The repository's directory structure looks like this:

realtime-vector/
├── README.md
├── docker-compose.yaml
├── hasura
│   ├── app
│   ├── compose.yaml
│   ├── engine
│   ├── globals
│   ├── hasura.yaml
│   ├── otel-collector-config.yaml
│   └── supergraph.yaml
├── init-scripts
│   ├── mongodb
│   └── timescaledb
└── start.sh

TimescaleDB with pgai

Timescale is Postgres made powerful.

3.2M+ Timescale databases power apps across IoT, sensors, AI, dev tools, crypto, and finance—all built on PostgreSQL. We use PostgreSQL for everything; we built our cloud so you can too.

  • Timescale is cloud-hosted Postgres.
  • TimescaleDB is a Postgres extension for time-series, events, and analytics workloads.
  • pgai is a stack of Postgres extensions for AI workloads:
    • pgvectorscale - powerful vector index/search building on pgvector
    • pgai - makes working with LLMs directly from SQL possible and easy
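
To make "LLMs directly from SQL" concrete, here is a minimal sketch of a parameterized query that asks pgai to embed a piece of text through Ollama. It assumes the installed pgai release exposes `ai.ollama_embed` with a `host` argument; the function's schema and argument names may differ between pgai releases, so check the version you are running. The `embedQuery` helper and the `nomic-embed-text` model name are illustrative assumptions, not part of this repository.

```typescript
// Shape of a parameterized query: SQL text plus positional values.
interface ParameterizedQuery {
  text: string;
  values: string[];
}

// Build a query that asks pgai to embed one piece of text via Ollama.
// Assumptions: `ai.ollama_embed` exists in the installed pgai release,
// and the named model has already been pulled into Ollama.
function embedQuery(
  model: string,
  input: string,
  ollamaHost: string
): ParameterizedQuery {
  return {
    text: "SELECT ai.ollama_embed($1, $2, host => $3) AS embedding",
    values: [model, input, ollamaHost],
  };
}

// From inside the timescaledb container, Ollama is reachable by its
// compose service name on the container-side port.
const query = embedQuery(
  "nomic-embed-text", // hypothetical model choice
  "Fix Donut Consumption Bug",
  "http://ollama:11434"
);
```

The `{ text, values }` shape keeps user-supplied text out of the SQL string itself, so the same query can be reused for any row you want to embed.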

Hasura DDN

The Hasura Data Delivery Network (DDN) is an open-source method for developing composite APIs. You can create a GraphQL API on top of nearly any data source, and you can connect multiple types of data sources together seamlessly.

Why are we talking about it at an AI meet-up? Because you can also incorporate TypeScript (or Python) functions directly into your API. This means you can call LLMs (such as OpenAI or, in this case, Ollama) and transform or enrich data from your API before it's returned to a client.
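
As a rough illustration of that enrichment step, the sketch below asks Ollama for a one-sentence summary of a commit message through its `/api/generate` endpoint with streaming disabled. It is not the workshop's actual function: the names (`makeSummarizer`, `buildGenerateRequest`, `extractResponse`, `PostJson`), the prompt wording, and the `llama3` model used in the usage note are assumptions, and the HTTP transport is injected so the logic can be run without a live Ollama instance. How such a function gets registered with Hasura is not shown here.

```typescript
// Minimal description of the HTTP call, so the transport can be swapped out.
type PostJson = (url: string, body: unknown) => Promise<unknown>;

interface GenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
}

// Request body for Ollama's /api/generate endpoint (non-streaming).
function buildGenerateRequest(
  model: string,
  commitMessage: string
): GenerateRequest {
  return {
    model,
    prompt: `Summarize this commit in one sentence: ${commitMessage}`,
    stream: false,
  };
}

// Pull the generated text out of Ollama's JSON reply, which carries it
// in a `response` field when streaming is disabled.
function extractResponse(reply: unknown): string {
  if (typeof reply === "object" && reply !== null && "response" in reply) {
    const text = (reply as { response: unknown }).response;
    if (typeof text === "string") {
      return text.trim();
    }
  }
  throw new Error("Unexpected reply from Ollama");
}

// Bind the transport, base URL, and model once; the returned function
// takes only the commit message, which suits an API-facing function.
function makeSummarizer(postJson: PostJson, baseUrl: string, model: string) {
  return async (commitMessage: string): Promise<string> => {
    const reply = await postJson(
      `${baseUrl}/api/generate`,
      buildGenerateRequest(model, commitMessage)
    );
    return extractResponse(reply);
  };
}
```

From the host machine, the base URL would be `http://localhost:11435` given the compose port mapping, and `postJson` could be a thin wrapper that POSTs JSON and parses the JSON reply. A call would then look like `makeSummarizer(postJson, "http://localhost:11435", "llama3")("Optimized traffic light timings to reduce delays")`.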

PostgreSQL

Developer table

CREATE TABLE developer (
    id BIGINT NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    name TEXT NOT NULL,
    email TEXT UNIQUE NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Repository table

CREATE TABLE repository (
    id INT NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    name TEXT UNIQUE NOT NULL,
    description TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Commit table

CREATE TABLE commit (
    id INT NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    developer_id INTEGER REFERENCES developer(id),
    repository_id INTEGER REFERENCES repository(id),
    hash TEXT UNIQUE NOT NULL,
    message TEXT NOT NULL,
    description TEXT NOT NULL,
    commit_time TIMESTAMPTZ NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

MongoDB

There are other collections present, but these are the two we care about for the purpose of this application:

Pull Requests collection

An example:

{
  "_id": "64dbbcf123456789abcde012",
  "created_at": "2024-08-09T12:34:56Z",
  "description": "Updated the code to prevent Homer from eating the nuclear power plant's donuts.",
  "developer": "lenny.leonard@sprinfield.com",
  "pull_request_id": "PR-742",
  "repository": "springfield_power_plant",
  "status": "merged",
  "title": "Fix Donut Consumption Bug",
  "updated_at": "2024-08-09T14:00:00Z"
}

Commits collection

{
  "_id": "64dbcdf223456789abcde345",
  "commit_time": "2024-08-09T15:45:00Z",
  "description": "Refactored the codebase to optimize Springfield's traffic light system.",
  "developer": "lisa.simpson@sprinfield.com",
  "hash": "abcd1234efgh5678ijkl9101",
  "message": "Optimized traffic light timings to reduce delays",
  "pull_request_id": "PR-555",
  "repository": "springfield_infrastructure"
}
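
Both collections carry a `pull_request_id` field, which suggests commits can be grouped under the pull request they belong to. The sketch below types the two document shapes shown above (omitting `_id`, the description, and the timestamps for brevity) and does that grouping in memory. It assumes the field means the same thing in both collections, which the examples imply but do not state; `groupCommitsByPullRequest` is a hypothetical helper, not code from this repository.

```typescript
// Subset of a document in the pull requests collection.
interface PullRequest {
  pull_request_id: string;
  title: string;
  status: string;
  repository: string;
  developer: string;
}

// Subset of a document in the commits collection.
interface Commit {
  hash: string;
  message: string;
  pull_request_id: string;
  repository: string;
  developer: string;
}

// Group commits under the pull request they reference. Commits whose
// pull_request_id matches no known pull request are returned separately.
function groupCommitsByPullRequest(
  pullRequests: PullRequest[],
  commits: Commit[]
): { byPullRequest: Map<string, Commit[]>; unmatched: Commit[] } {
  const byPullRequest = new Map<string, Commit[]>();
  for (const pr of pullRequests) {
    byPullRequest.set(pr.pull_request_id, []);
  }
  const unmatched: Commit[] = [];
  for (const commit of commits) {
    const bucket = byPullRequest.get(commit.pull_request_id);
    if (bucket) {
      bucket.push(commit);
    } else {
      unmatched.push(commit);
    }
  }
  return { byPullRequest, unmatched };
}
```

Run against the two example documents above, the commit for PR-555 would land in `unmatched`, because the only pull request shown is PR-742.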
