Java-based AI on Kubernetes

Demo repository for Java-based AI on Kubernetes from Development to Deployment.

Setup

# create the Kubernetes cluster in GCP with GPU support
# bootstrap AI platform components and services using Flux2
make create-gke-cluster
make bootstrap-flux2

# required so that Config Connector knows which Google Cloud project ID to use
# (replace cloud-native-experience-lab with your own project ID)
kubectl annotate namespace default cnrm.cloud.google.com/project-id="cloud-native-experience-lab"

Building a chat service with Quarkus and OpenAI

# use the Quarkus starter to create a service skeleton
# select the desired build system and dependencies
open https://code.quarkus.io

# for local development, use the following commands
cd openai-chat-service
export QUARKUS_LANGCHAIN4J_OPENAI_API_KEY=$OPENAI_API_KEY
./gradlew quarkusDev

# interact with the service locally
http get localhost:8080/api/ask q=="Was macht QAware?"
http get localhost:8080/api/ask q=="What does QAware do?"
http get localhost:8080/api/ask q=="Was macht Microsoft?"
http get localhost:8080/api/ask q=="What is the sum of 40 and 2?"
http get localhost:8080/api/ask q=="What does QAware do? Send email to mlr@qaware.de with subject Information and response as message."

# in the cluster, this is managed by Flux2; to apply it manually:
kubectl apply -k infrastructure/services/openai-chat-service/
kubectl get all
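Instead of HTTPie, the /api/ask endpoint can also be called from plain Java using the JDK's built-in java.net.http.HttpClient. The following is a minimal sketch, assuming Java 17, the service running locally on localhost:8080 (as with ./gradlew quarkusDev above), and that the endpoint returns the answer in the response body. The class name AskClient is hypothetical and not part of the repository.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

public class AskClient {

    // Builds the GET URI for /api/ask. URLEncoder uses form encoding (space -> '+'),
    // so spaces are rewritten to %20; a literal '+' in the question is already %2B.
    static URI askUri(String baseUrl, String question) {
        String q = URLEncoder.encode(question, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(baseUrl + "/api/ask?q=" + q);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // assumption: the service runs locally via ./gradlew quarkusDev
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8080";
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(askUri(baseUrl, "What does QAware do?"))
                .timeout(Duration.ofSeconds(60)) // LLM round trips can take a while
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Encoding spaces as %20 rather than + keeps the query unambiguous regardless of whether the server decodes + as a space.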

Building an OpenAI Proxy using Envoy

Access to the OpenAI API is provided through a cluster-internal, Envoy-based proxy.

# for the proxy to work, you have to manually create a Kubernetes secret
# that provides the OPENAI_API_KEY environment variable
kubectl create secret generic openai-api-key --from-literal=OPENAI_API_KEY=$OPENAI_API_KEY

# to test the proxy, issue the following curl command
# replace localhost with the actual LoadBalancer IP
curl http://localhost:10000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
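The same chat completion request can be sent to the proxy from Java. This is a sketch under stated assumptions: like the curl command above, it sends no Authorization header, assuming the proxy adds the API key from the openai-api-key secret; the class and helper names are hypothetical, and the JSON is built by hand only to avoid a library dependency.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ProxyChatClient {

    // Escapes backslashes, quotes and common whitespace for a JSON string value;
    // other control characters are not handled in this sketch.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t");
    }

    // Same payload as the curl example: a single user message plus a temperature.
    static String chatCompletionBody(String model, String userMessage, double temperature) {
        return "{\"model\":\"" + escape(model) + "\",\"messages\":[{\"role\":\"user\",\"content\":\""
                + escape(userMessage) + "\"}],\"temperature\":" + temperature + "}";
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // assumption: pass the proxy's LoadBalancer address as the first argument
        String proxy = args.length > 0 ? args[0] : "http://localhost:10000";
        HttpRequest request = HttpRequest.newBuilder(URI.create(proxy + "/v1/chat/completions"))
                .header("Content-Type", "application/json")
                // no Authorization header: the proxy is assumed to inject the API key
                .POST(HttpRequest.BodyPublishers.ofString(
                        chatCompletionBody("gpt-4o-mini", "Say this is a test!", 0.7)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```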

Building a chat service with Quarkus and Ollama

# this is nearly identical to the Quarkus and OpenAI instructions
# the only difference is to use
#    'io.quarkiverse.langchain4j:quarkus-langchain4j-ollama:0.22.0'
# instead of
#    'io.quarkiverse.langchain4j:quarkus-langchain4j-openai:0.22.0'

# for local development, use the following commands
# (ollama serve runs in the foreground, so run the next command in a second terminal)
ollama serve
ollama run llama3.1

cd ollama-chat-service
./gradlew quarkusDev

# interact with the service locally
http get localhost:8080/api/ask q=="Was macht die QAware GmbH?"

# in the cluster, this is managed by Flux2; to apply it manually:
kubectl apply -k infrastructure/services/ollama-chat-service/
kubectl get all

Deploying custom LLMs using Ollama Operator

# model deployment using CLI
kollama deploy llama3.1
kollama expose llama3.1 --service-name=ollama-model-llama31-lb --service-type=LoadBalancer

# model deployment via CRD
kubectl apply -f infrastructure/models/phi3.yaml
kollama expose phi3 --service-type LoadBalancer

# to start a chat with Ollama
# replace localhost with the actual LoadBalancer IP
OLLAMA_HOST=localhost:11434 ollama run phi3
OLLAMA_HOST=localhost:11434 ollama run llama3.1

# call the native Ollama chat API
# (for the OpenAI-compatible API, use this endpoint instead)
# curl http://localhost:11434/v1/chat/completions
curl http://localhost:11434/api/chat  \
  -H "Content-Type: application/json"  \
  -d '{
    "model": "llama3.1",
    "messages": [
      {
        "role": "user",
        "content": "Say this is a test!"
      }
    ]
  }'
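Note that /api/chat streams its answer by default: Ollama returns one JSON object per line until a final object with "done": true. A minimal Java sketch that prints these lines as they arrive, assuming Java 17 and reading the host:port from OLLAMA_HOST like the ollama run commands above (default localhost:11434); the class and helper names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.stream.Stream;

public class OllamaChatClient {

    // Resolves the base URL from an OLLAMA_HOST-style value such as "localhost:11434";
    // a value without a scheme gets "http://" prepended.
    static String baseUrl(String ollamaHost) {
        String host = (ollamaHost == null || ollamaHost.isBlank()) ? "localhost:11434" : ollamaHost;
        return host.startsWith("http://") || host.startsWith("https://") ? host : "http://" + host;
    }

    // Escapes backslashes, quotes and common whitespace; other control characters are not handled.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t");
    }

    // Request body for POST /api/chat with a single user message.
    static String chatBody(String model, String prompt, boolean stream) {
        return "{\"model\":\"" + escape(model) + "\",\"messages\":[{\"role\":\"user\",\"content\":\""
                + escape(prompt) + "\"}],\"stream\":" + stream + "}";
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String base = baseUrl(System.getenv("OLLAMA_HOST"));
        HttpRequest request = HttpRequest.newBuilder(URI.create(base + "/api/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(chatBody("llama3.1", "Say this is a test!", true)))
                .build();
        // ofLines() exposes the newline-delimited JSON response as a lazily filled stream
        HttpResponse<Stream<String>> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofLines());
        try (Stream<String> lines = response.body()) {
            lines.forEach(System.out::println);
        }
    }
}
```

Passing false as the stream argument makes Ollama reply with a single JSON object instead.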

Deploying LangChain4j Easy RAG Chatbot

# see https://docs.quarkiverse.io/quarkus-langchain4j/dev/easy-rag.html
# see https://github.com/quarkiverse/quarkus-langchain4j/tree/main/samples/chatbot-easy-rag

Planning Kubernetes Cluster Topology

# useful information to set up a K8s cluster topology with GPUs

# see https://cloud.google.com/compute/docs/gpus?hl=de
# see https://cloud.google.com/compute/all-pricing?hl=de
# see https://cloud.google.com/compute/gpus-pricing?hl=de#other-gpu-models

# see https://cloud.google.com/compute/docs/gpus/create-gpu-vm-general-purpose?hl=de
# see https://cloud.google.com/compute/docs/disks/local-ssd?hl=de

Maintainer

M.-Leander Reimer (@lreimer), mario-leander.reimer@qaware.de

License

This software is provided under the MIT open source license, read the LICENSE file for details.
