☁️ Google's Vertex AI has expanded its capabilities with Generative AI. This advanced technology includes an in-console studio experience, a dedicated API, and a Python SDK for deploying and managing instances of Google's powerful PaLM language models. With a focus on text generation, summarization, chat completion, and embedding creation, PaLM models are pushing the boundaries of natural language processing and machine learning.
⚡ Redis Enterprise offers robust vector database features, with an API for index creation and management, distance metric selection, similarity search, and hybrid filtering. Coupled with its versatile data structures - including lists, hashes, JSON, and sets - Redis Enterprise is an excellent choice for crafting high-quality Large Language Model (LLM)-based applications. Its streamlined, "shared-nothing" architecture and exceptional SLAs make it an instrumental tool for production environments.
This repo provides a foundational architecture for building LLM applications with Redis and GCP services.
- Primary Storage >>> GCP BigQuery
- Foundation Models >>> GCP Vertex AI
  - PaLM API for text embedding creation
  - PaLM API for text generation
  - PaLM API for chat completion
- High-Performance Data Layer >>> Redis Enterprise
  - Vector database for semantic search + context retrieval
  - LLM Cache
  - LLM Memory for application chat history and session metadata (see the sketch after this list)
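As a quick taste of the "LLM Memory" role above, here is a minimal sketch of per-session chat history stored in a Redis list via redis-py. The endpoint, key scheme, and message format are illustrative assumptions, not conventions from this repo.

```python
import json

import redis

r = redis.Redis(host="localhost", port=6379)  # hypothetical Redis Enterprise endpoint

# Append each conversation turn to a per-session list
session_key = "session:abc123:history"  # hypothetical key scheme
r.rpush(session_key, json.dumps({"role": "user", "content": "What can Redis cache?"}))
r.rpush(session_key, json.dumps({"role": "assistant", "content": "LLM responses, ..."}))

# Replay the most recent turns as chat context for the next prompt
history = [json.loads(m) for m in r.lrange(session_key, -10, -1)]
```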
- Load Libraries and Tools: Before building language modeling applications, we install the required Python libraries, connect to the datastores, and authenticate with GCP (see the setup sketch after this list).
- Create BigQuery Table: Drawing data from one or more sources, we populate an enriched table in BigQuery that holds the primary data for building language model applications (sketched below). This could be a custom knowledge base, domain-specific proprietary data, or customer records (typically, any kind of data with text fields).
- Generate Embeddings: Leveraging Google's Vertex AI PaLM API, we generate semantic text embeddings that represent "chunks" of the underlying text (sketched below). These embeddings are lists of numbers that capture meaning and context, enabling similarity search between a user's question or prompt and the source text.
- Load Embeddings: We store the embeddings in Redis Enterprise as an additional low-latency data layer on top of BigQuery (sketched below).
- Create Vector Index: We create a search index in Redis Enterprise that enables real-time semantic search (sketched below). While BigQuery holds the primary data, Redis holds the embeddings.
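The sketches below flesh out each step in order. First, setup and authentication: a minimal sketch assuming a Colab runtime; the project ID and library list are placeholders, not pinned by this repo.

```python
# Install the client libraries used throughout the walkthrough, e.g.:
#   pip install google-cloud-aiplatform google-cloud-bigquery redis numpy

import vertexai
from google.colab import auth  # Colab-only credential helper

auth.authenticate_user()  # launches the standard Colab/GCP auth flow
vertexai.init(project="my-gcp-project", location="us-central1")  # hypothetical project
```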
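Next, the BigQuery table. A minimal sketch using google-cloud-bigquery; the dataset, table name, and schema are hypothetical stand-ins for your own source data.

```python
from google.cloud import bigquery

bq = bigquery.Client(project="my-gcp-project")  # hypothetical project

# Hypothetical table of text "chunks" to embed later
bq.create_dataset("llm_demo", exists_ok=True)
schema = [
    bigquery.SchemaField("doc_id", "STRING"),
    bigquery.SchemaField("text_chunk", "STRING"),
]
table = bigquery.Table("my-gcp-project.llm_demo.documents", schema=schema)
bq.create_table(table, exists_ok=True)

# Read the rows back when it's time to generate embeddings
rows = bq.query(
    "SELECT doc_id, text_chunk FROM `my-gcp-project.llm_demo.documents`"
).result()
```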
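Embedding generation with the Vertex AI SDK. textembedding-gecko was the PaLM-era embedding model, returning 768-dimensional vectors; the exact model version below is an assumption.

```python
from vertexai.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")

# Each result carries a .values list of 768 floats
embeddings = model.get_embeddings(["a chunk of source text ..."])
vector = embeddings[0].values
```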
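Loading embeddings into Redis. A sketch with redis-py; the endpoint and key scheme are placeholders. Vectors are packed as raw float32 bytes, the layout the Redis vector index expects for HASH fields.

```python
import numpy as np
import redis

r = redis.Redis(host="localhost", port=6379)  # hypothetical Redis Enterprise endpoint

# Store each embedding next to its BigQuery key, vector as raw float32 bytes
r.hset(
    "doc:123",
    mapping={
        "doc_id": "123",
        "embedding": np.asarray(vector, dtype=np.float32).tobytes(),
    },
)
```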
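Finally, the vector index, reusing the `r` client from the previous sketch. This issues FT.CREATE through redis-py's search commands; the index name, dimensions, and distance metric are assumptions matching the embedding sketch above.

```python
from redis.commands.search.field import TagField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

r.ft("doc_idx").create_index(
    fields=[
        TagField("doc_id"),
        VectorField(
            "embedding",
            "HNSW",  # approximate nearest-neighbor; "FLAT" gives exact KNN
            {"TYPE": "FLOAT32", "DIM": 768, "DISTANCE_METRIC": "COSINE"},
        ),
    ],
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH),
)
```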
This architecture contains many of the essential elements required to build real-world LLM applications that can enhance your business.
Open the code tutorial in a Colab notebook (recommended) to get your hands dirty with LLMs on GCP. The guide is a step-by-step walkthrough of setting up the required data, services, and databases to build LLM applications, and then highlights a few Redis & LLM design principles, including: Semantic Search, Retrieval, Caching, and Memory storage.
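To illustrate the semantic search principle, here is a minimal KNN query sketch against the hypothetical index built above (reusing `model`, `r`, and `np` from the earlier sketches): embed the user's question with the same model, then let Redis return the closest chunks.

```python
from redis.commands.search.query import Query

# Embed the user's question with the same PaLM embedding model
q_vec = np.asarray(
    model.get_embeddings(["How do I reset my password?"])[0].values,
    dtype=np.float32,
)

# Return the 3 nearest chunks by cosine distance
query = (
    Query("*=>[KNN 3 @embedding $vec AS score]")
    .sort_by("score")
    .return_fields("doc_id", "score")
    .dialect(2)
)
results = r.ft("doc_idx").search(query, query_params={"vec": q_vec.tobytes()})
```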