Hello Bhaiya 👋

A chatbot for migrant workers to understand their employment rights

Problem statement

Singapore has 1,427,500 migrant workers, comprising 38% of its labour force (MOM 2019a). According to Transient Workers Count Too (TWC2), migrant workers are commonly exploited by employers (e.g. unpaid salaries and illegal wage deductions) and are disadvantaged by factors such as language barriers and cultural displacement. It is important that migrant workers are aware of their employment rights so that they can seek appropriate legal recourse. While information is available online, many articles are hard to understand for readers with limited English proficiency. Workers may also lack the time to read through long documents to find the clause relevant to their situation.

Our solution

We integrate a chatbot with a Retrieval-Augmented Generation (RAG) model whose retrieval database indexes the content of the Employment Act 1968. Combined with prompt engineering, this enables the chatbot to give detailed answers to questions about the Act in an easily comprehensible manner.
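
To make the retrieve-then-prompt idea concrete, here is a minimal, stdlib-only sketch. It is not the project's implementation: a toy bag-of-words cosine similarity stands in for the embedding model and Chromadb, the function names (`retrieve`, `build_prompt`) and the prompt wording are assumptions, and the chunks are placeholders rather than text from the Act.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (toy retriever)."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a prompt that asks for a simple answer grounded in the context."""
    joined = "\n\n".join(context)
    return (
        "Answer in short, simple English using only the context below.\n"
        "If the context does not cover the question, say so.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )


# Placeholder chunks (hypothetical, not quotes from the Employment Act).
chunks = [
    "Placeholder chunk about salary payment and pay periods.",
    "Placeholder chunk about hours of work and overtime.",
    "Placeholder chunk about annual leave.",
]
question = "When must my salary be paid?"
print(build_prompt(question, retrieve(question, chunks, k=1)))
```

In the real pipeline the assembled prompt would be sent to the chat model; the sketch stops at the prompt so that it runs without an API key.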

Tech stack

LangChain
OpenAI / Azure OpenAI
Llama-parse
GitLab
Streamlit
ChromaDB

Workflow

Challenges

Working with a legal document
- Singapore Employment Act: Well-structured, consistent formatting
- Each section covers a “topic”
- How to chunk and retain the structure of the document:
  - Converted the Employment Act PDF into markdown format using Llama-parse
  - Removed the repetitive headers in the document
  - Used LangChain’s RecursiveCharacterTextSplitter with separators=['(?<=\n)##', '(?<=\n)###'] to chunk by sections (topics)
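
The section-based chunking above can be illustrated with a small stdlib-only sketch. It does not call LangChain: a zero-width regular-expression split on level-2 and level-3 headings stands in for the splitter, and the function name `split_by_section` and the sample text are hypothetical. When using LangChain itself, note that separators containing lookbehinds such as `(?<=\n)##` are regular expressions, so the splitter would need `is_separator_regex=True`.

```python
import re

# Break before any line that starts a level-2 ("## ") or level-3 ("### ") heading.
# The match is zero-width, so each heading stays at the top of its own chunk.
SECTION_START = re.compile(r"(?m)^(?=#{2,3} )")


def split_by_section(markdown: str) -> list[str]:
    """Return one chunk per level-2/level-3 section, dropping empty pieces."""
    pieces = SECTION_START.split(markdown)
    return [piece.strip() for piece in pieces if piece.strip()]


# Hypothetical sample; headings and text are placeholders, not quotes from the Act.
sample = (
    "# Example Act\n\n"
    "## Part A\nPlaceholder text for part A.\n\n"
    "### Section A1\nPlaceholder text for section A1.\n\n"
    "## Part B\nPlaceholder text for part B.\n"
)

for chunk in split_by_section(sample):
    # Print the first line of each chunk to show where the boundaries fall.
    print(repr(chunk.splitlines()[0]))
```

Keeping the heading inside its chunk means each retrieved passage carries its own section title, which helps the model say which part of the document an answer comes from.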

Curveball: Azure OpenAI tokens not available

| Tradeoff analysis | Purchase OpenAI tokens | Deploy Mistral-7B-v0.1 on local machine |
| --- | --- | --- |
| Cost | For occasional or small-scale usage, purchasing tokens may be more cost-effective than deploying and managing infrastructure locally. For larger-scale usage, the cost of purchasing tokens may become significant compared to deploying and running models locally. | Free of charge. However, deploying and managing Mistral-7B-v0.1 on a local machine requires more technical expertise and resources for maintenance. |
| Scalability | OpenAI tokens offer scalability, allowing users to access more powerful models and APIs as needed without the hassle of managing hardware. | Local deployment may have scalability limitations compared to cloud-based solutions, especially for resource-intensive models or high-concurrency scenarios. |
| Learning curve | Purchasing OpenAI tokens allows easy access to pre-trained models and APIs without the need for managing infrastructure. | Initial setup may involve configuring hardware, software dependencies, and model deployment frameworks, which can be time-consuming and complex. |

Limitations & Future Work

Currently, inputs and outputs are in English only. Future work: add translation for both user inputs and chatbot outputs when other languages are used.
Requires access to an API key for GPT-3.5 (ChatGPT 3.5).

Setup

1. conda environment

Create the required environment: conda env create --file=conda.yml

2. create a .env file with the following params

openai_api_key = #your token here
filepath=data/employment_act_markdown.txt
deployment_name_model=gpt-35-turbo-0613
deployment_name_text_embedding=text-embedding-ada-002
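
As a quick sanity check before launching the app, the sketch below parses `.env`-style text and reports which of the four parameters are still empty. It is an illustration only: the project may load its settings differently (for example with a dotenv library), and the helper names `parse_env` and `missing_keys` are hypothetical. The parser assumes that values never contain a `#` character.

```python
REQUIRED_KEYS = (
    "openai_api_key",
    "filepath",
    "deployment_name_model",
    "deployment_name_text_embedding",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and full-line comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "#your token here".
        # Assumption: real values do not contain "#".
        settings[key.strip()] = value.split("#", 1)[0].strip()
    return settings


def missing_keys(settings: dict[str, str]) -> list[str]:
    """List required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


# The template from this README, with the token left unfilled.
template = """\
openai_api_key = #your token here
filepath=data/employment_act_markdown.txt
deployment_name_model=gpt-35-turbo-0613
deployment_name_text_embedding=text-embedding-ada-002
"""

print(missing_keys(parse_env(template)))
```

Run against the unfilled template, the check flags `openai_api_key`, which is the one value that has to be supplied by hand.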

3a. run chatbot

To run the chatbot locally: streamlit run src/hellobaya.py

3b. run chatbot using Docker

i. Build the Docker image: docker build -t hello-baya:v1 .
ii. Run the container: docker run --rm -it -p 8501:8501 hello-baya:v1

Acknowledgements

This was a group project completed with three other AI Engineer Apprentices at AISG for AIAP Batch 15:
i. Jasmine Ng
ii. Jaymes Lee
iii. Lee Pei Yueng

rosamundlim/hello-bhaiya
