A multilingual chat assistant for Kelvyn Park Junior and Senior High School that supports English and Spanish.
This chatbot is implemented using Amazon Bedrock Knowledge Base with Claude 3 Haiku as the foundation model to facilitate real-time information exchange. The system architecture is designed to provide up-to-date information about the school through various data sources and a user-friendly interface.
-
Data Sources:
- Amazon S3: Stores PDF and XML files such as handbooks and newsletter summaries.
- Web Crawler: Extracts information from the school website.
-
Automatic Data Ingestion:
- School documents (handbooks, newsletters) are automatically ingested and synced to the data source using Amazon Simple Email Service.
-
Knowledge Base:
- Uses Amazon Bedrock Knowledge Base to store and manage information.
- Embeddings are stored in OpenSearch so that queries can be matched to relevant content.
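To make the role of embeddings more concrete, here is a minimal conceptual sketch of similarity-based ranking. It is an illustration only: Bedrock Knowledge Base and OpenSearch handle this internally, and the `Chunk` type, the `topK` helper, and the choice of cosine similarity are assumptions made for this example, not details taken from this project.

```typescript
// Conceptual illustration only. The Chunk type and both helpers are
// hypothetical; the managed services do this ranking for you.
interface Chunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector has no direction, so treat it as unrelated
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks whose embeddings are most similar to the query embedding.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return [...chunks]
    .sort(
      (x, y) =>
        cosineSimilarity(query, y.embedding) -
        cosineSimilarity(query, x.embedding)
    )
    .slice(0, k);
}
```

In practice the query embedding and the chunk embeddings would come from the embedding model configured for the knowledge base; the sketch only shows why semantically close text ends up ranked first.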
-
Language Model:
- Claude 3 Haiku serves as the foundation model for natural language processing to understand user queries and generate responses.
- Claude 3.5 Sonnet is used to create a rich summary of the school newsletters in XML format, which is used as one of the data sources for the Bedrock Knowledge Base.
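The exact XML schema of the newsletter summaries is not described here. As a rough illustration of what such a data source document might look like, the sketch below wraps a summary in XML and escapes special characters. The `NewsletterSummary` shape and the element names are hypothetical.

```typescript
// Hypothetical summary shape; the real schema is not given in this README.
interface NewsletterSummary {
  title: string;
  date: string;
  highlights: string[];
}

// Escape the five characters that are special in XML text and attributes.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Serialize a summary into a small XML document (element names are assumptions).
function toSummaryXml(summary: NewsletterSummary): string {
  const items = summary.highlights
    .map((h) => `    <highlight>${escapeXml(h)}</highlight>`)
    .join("\n");
  return [
    `<newsletter title="${escapeXml(summary.title)}" date="${escapeXml(summary.date)}">`,
    "  <highlights>",
    items,
    "  </highlights>",
    "</newsletter>",
  ].join("\n");
}
```

Escaping matters because model-generated text can contain characters such as `&` or `<`, which would otherwise produce malformed XML.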
-
User Interface:
- Web interface created using React.
- Deployed using AWS Amplify for a seamless user experience.
-
Backend Communication:
- Implements WebSocket API using Amazon API Gateway for real-time communication between frontend and backend.
- Upload Client sends an email with PDF to Simple Email Service.
- Email is saved in S3 Bucket for emails.
- Email Handler Lambda extracts PDF and generates its summary using Claude 3.5 Sonnet.
- Content is uploaded to S3 Data Source.
- This triggers ingestion into Bedrock Knowledge Base.
- Embeddings are stored in OpenSearch.
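The ingestion steps above can be sketched as a single handler. This is a minimal outline, not the project's Email Handler Lambda: every name here (`IngestionSteps`, `handleIncomingEmail`, the object keys) is hypothetical, each AWS interaction is injected as a function so the flow can be read without AWS access, and the explicit `startIngestion` step is an assumption about how the knowledge base sync is triggered.

```typescript
// All names are hypothetical. Each step stands in for an AWS call.
interface IngestionSteps {
  // Read the raw email that SES saved to the email bucket.
  loadEmail(bucket: string, key: string): Promise<Uint8Array>;
  // Pull the PDF attachment out of the email; null when there is none.
  extractPdf(rawEmail: Uint8Array): Promise<Uint8Array | null>;
  // Summarize the PDF, for example with Claude 3.5 Sonnet.
  summarize(pdf: Uint8Array): Promise<string>;
  // Write an object to the S3 data source bucket.
  uploadToDataSource(key: string, body: Uint8Array | string): Promise<void>;
  // Ask the knowledge base to re-sync its data source.
  startIngestion(): Promise<void>;
}

async function handleIncomingEmail(
  steps: IngestionSteps,
  bucket: string,
  key: string
): Promise<"ingested" | "skipped"> {
  const rawEmail = await steps.loadEmail(bucket, key);
  const pdf = await steps.extractPdf(rawEmail);
  if (pdf === null) {
    return "skipped"; // nothing to ingest without an attachment
  }
  const summary = await steps.summarize(pdf);
  await steps.uploadToDataSource(`${key}.pdf`, pdf);
  await steps.uploadToDataSource(`${key}-summary.xml`, summary);
  await steps.startIngestion();
  return "ingested";
}
```

Injecting the steps keeps the ordering of the flow visible and lets it be exercised with in-memory fakes before wiring in real S3 and Bedrock clients.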
- User posts a question through the web interface.
- Query is sent to Bedrock Knowledge Base.
- Relevant embeddings are retrieved from OpenSearch.
- Bedrock Caller generates an answer using Claude 3 Haiku.
- Answer is streamed back to the user through API Gateway and Amplify.
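The question-answering steps above could be outlined roughly as follows. This is a sketch under stated assumptions rather than the project's Bedrock Caller: `QuerySteps`, `buildPrompt`, `answerQuestion`, the prompt wording, and the JSON frame format are all hypothetical, and retrieval and generation are shown as separate injected steps even though the real implementation may combine them.

```typescript
// All names and message shapes are hypothetical.
interface Passage {
  text: string;
}

interface QuerySteps {
  // Look up passages relevant to the question in the knowledge base.
  retrieve(question: string): Promise<Passage[]>;
  // Generate an answer, calling onToken for each streamed piece of text
  // (for example from a streaming Claude 3 Haiku call).
  generate(prompt: string, onToken: (token: string) => void): Promise<void>;
  // Push one frame to the WebSocket client.
  send(message: string): Promise<void>;
}

// Combine the retrieved passages and the question into one prompt.
function buildPrompt(question: string, passages: Passage[]): string {
  const context = passages.map((p, i) => `[${i + 1}] ${p.text}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

async function answerQuestion(steps: QuerySteps, question: string): Promise<void> {
  const passages = await steps.retrieve(question);
  const prompt = buildPrompt(question, passages);
  const pending: Promise<void>[] = [];
  await steps.generate(prompt, (token) => {
    // Forward each token as soon as it arrives so the user sees a live answer.
    pending.push(steps.send(JSON.stringify({ type: "token", text: token })));
  });
  await Promise.all(pending);
  // Tell the client that the answer is complete.
  await steps.send(JSON.stringify({ type: "end" }));
}
```

Sending tokens as individual frames is what makes a WebSocket API a natural fit here: the client can render the answer incrementally instead of waiting for the full response.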
- AWS CLI (version 2.15.57 Python/3.11.8 Windows/10 exe/AMD64)
- AWS Account with Administrative User configured, region us-west-2
- Node.js (version 20.15.1)
- TypeScript 3.8 or later
- Docker
- An IDE such as VS Code (recommended)
- AWS CDK Toolkit (version 2.148.1)
- GitHub access token with repo access
- AWS SES domain identity
- Claude 3 Haiku and Claude 3.5 Sonnet models in Amazon Bedrock
- Clone this repository
git clone https://github.com/ASUCICREPO/kelvyn-park-chat-assistant.git
- Go into the repository directory
cd kelvyn-park-chat-assistant
- Install dependencies
npm install
- Deploy to the default environment, passing the GitHub token and the SES domain identity name as context variables.
cdk deploy -c githubtoken=<your_github_access_token> -c domain=<your_domain>
- In your AWS account, go to Amazon Bedrock -> Knowledge Base -> -> Data Sources -> Add. Select Web Crawler -> Next. In the Data Source Name field, enter "kp-website-datasource", and in Source URLs, enter "https://kphermosa.org/". Under Sync Scope, set Website Domain Range to "Default", then continue through the wizard to add the data source to your knowledge base. Select the newly added data source and run a sync job.
- Go to AWS Amplify and open your Amplify app. Run a deployment on the "main" branch.
- Go to the SES Rule set and verify that it is set to "Active".
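The `cdk deploy` command in the steps above passes `githubtoken` and `domain` as context variables. As a hedged sketch of how a CDK app might validate such values before use, the hypothetical `requireContext` helper below fails fast when one is missing. In a real CDK app the lookup would typically be `(key) => app.node.tryGetContext(key)`; here a plain object stands in for it so the example runs on its own.

```typescript
// Hypothetical helper; not taken from this repository.
type ContextLookup = (key: string) => unknown;

// Return a required context value, or throw a helpful error if it is absent.
function requireContext(lookup: ContextLookup, key: string): string {
  const value = lookup(key);
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(
      `Missing context value "${key}". Pass it with: cdk deploy -c ${key}=<value>`
    );
  }
  return value;
}

// Stand-in for the values supplied via `-c githubtoken=... -c domain=...`.
const exampleContext: Record<string, unknown> = {
  githubtoken: "<your_github_access_token>",
  domain: "<your_domain>",
};
const lookup: ContextLookup = (key) => exampleContext[key];

const githubToken = requireContext(lookup, "githubtoken");
const domain = requireContext(lookup, "domain");
```

Failing at synthesis time with a clear message is usually easier to debug than a deployment that breaks later because a token or domain was undefined.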
Developers: Priyam Bansal, Aryan Khanna
Architect: Arun Arunachalam
This project is designed and developed with guidance and support from the ASU Cloud Innovation Center.