This project is a LangChain prompt-engineering application. The vision: load several real analysis HTML pages from a website, extract their text with LangChain's Beautiful Soup-based HTML loader, split the material into chunks, and use ChatOpenAI to interact with it. I will try to restrict ChatOpenAI from using its own knowledge of real analysis and instead rely only on the documents. Can it come up with proofs? Let's find out.
- ChatOpenAI: Underlying LLM that the user asks questions to in QA format
- LangChain: Framework that facilitates document chunking, retrieval, and construction of the LLM pipeline
- BSHTMLLoader: Extracts clean text from raw HTML, which helps the LLM understand the context
- RecursiveCharacterTextSplitter: Splits documents into smaller logical chunks, decreasing the context required to be fed into the LLM
- OpenAI Embeddings: Converts document texts to vectors for fast similarity querying
- Chroma DB Vectorstore: Stores document embeddings in memory for quick retrieval
- ConversationalRetrievalChain: LangChain-provided chain that combines a retriever with the chat model to answer questions over the documents
- LangChain Playground: Allows easy testing of the chain in a browser
- Docker: facilitates deployment
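For reference, normalizing a downloaded page to UTF-8 might look like the snippet below. This is a sketch, not the project's actual Containerfile: the URL is a placeholder, and ISO-8859-1 is an assumed source encoding.

```shell
#!/bin/sh
# In the Containerfile, each page would be fetched first, e.g.:
#   curl -fsSL "https://example.com/real-analysis/ch1.html" -o raw.html
# (placeholder URL, not the project's actual source)

# Re-encode a downloaded page as UTF-8 so the HTML loader hits no decode
# errors. ISO-8859-1 is an assumption about the source encoding.
to_utf8() {
  iconv -f ISO-8859-1 -t UTF-8 "$1" > "$2"
}
```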
- Containerfile curls 4 HTML real analysis pages and converts them to UTF-8 to avoid character-encoding errors
- UTF-8 files are loaded via Beautiful Soup into LangChain documents
- LangChain documents are broken up into smaller documents
- Smaller documents are converted into vectors via OpenAI Embeddings
- Embeddings are stored into in-memory ChromaDB
- ChromaDB is converted to a retriever object, which is then fed into LangChain's ConversationalRetrievalChain
- ConversationalRetrievalChain is exposed to the user via LangChain FastAPI playground
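The flow above can be sketched in a few lines of LangChain code. This is a sketch, not the project's actual code: the chunk size, overlap, and model settings are assumptions, and the imports target the classic `langchain` package layout (deferred into the function so the file can be read without the package installed).

```python
def build_chain(html_paths):
    """Build a retrieval QA chain over the given UTF-8 HTML files.

    Sketch only: chunk_size/chunk_overlap and temperature are assumed
    values, not the project's actual settings.
    """
    # langchain imports are deferred so the module can be inspected
    # without the package installed
    from langchain.chains import ConversationalRetrievalChain
    from langchain.chat_models import ChatOpenAI
    from langchain.document_loaders import BSHTMLLoader
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.vectorstores import Chroma

    # Load each UTF-8 HTML file into LangChain documents
    docs = []
    for path in html_paths:
        docs.extend(BSHTMLLoader(path).load())

    # Break the documents into smaller chunks
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(docs)

    # Embed the chunks and store them in an in-memory Chroma vectorstore
    vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings())

    # Wrap the retriever and chat model in a conversational QA chain
    return ConversationalRetrievalChain.from_llm(
        llm=ChatOpenAI(temperature=0),
        retriever=vectorstore.as_retriever(),
    )
```

Running the chain requires an `OPENAI_API_KEY`; a question is asked with `chain({"question": "...", "chat_history": []})`.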
- ConversationalRetrievalChain will only answer questions based on the retrieved context, so its power is quite limited. If we want more expressiveness, we need to use a different class or write our own. Still, it has a clear use case: when you want the LLM to answer strictly from the context, this is the right class, since you don't want it hallucinating text that isn't in the documents.
- I had to choose the right chunk size and overlap for the RecursiveCharacterTextSplitter. If chunks are too small, they won't carry enough information for the LLM to understand; if too large, retrieval becomes less precise and more irrelevant text gets fed into the context.
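To make the trade-off concrete, here is a minimal character-window splitter. It is not the real RecursiveCharacterTextSplitter (which also prefers paragraph and sentence boundaries), but it shows how chunk size and overlap interact:

```python
def split_with_overlap(text, chunk_size, overlap):
    """Split text into fixed-size windows that overlap by `overlap` chars.

    Overlap keeps a sentence that straddles a chunk boundary visible in
    both neighboring chunks, so retrieval doesn't lose it.
    """
    step = chunk_size - overlap  # how far each window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_with_overlap("abcdefghij", chunk_size=4, overlap=2)
# Each chunk shares its last 2 characters with the next chunk's first 2.
```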
I asked it to answer question 1.4.5, and it could not do so because the answer was not directly in the text.
However, it did understand the question, which is great: retrieval found the right document chunk.
Then, I asked it a question directly from the text, about equivalence relations. Here is the passage I wanted it to find.
Success!
It couldn't do it, since the information was not in the context.
- Clone the repository.
- Create a .env file with OPENAI_API_KEY set to your OpenAI API key
- Run `docker-compose up --build`
- Open `http://0.0.0.0:8000/test/playground/` in your browser