CALMS : Context-Aware Language Model for Science

CALMS is a retrieval- and tool-augmented large language model (LLM) that assists scientists in designing experiments around, and performing science with, complex scientific instrumentation.


Getting started

conda create --name calms python=3.11.5
git clone https://github.com/mcherukara/CALMS
Navigate to the folder, activate your conda environment, then:

pip install -r requirements_H100.txt
Start the app:

The VERY FIRST time you run each model, you will have to compute embeddings over the document stores. To do this, set init_docs = True in params.py before starting the chat app. This will take a LONG time but only needs to be done once.
python chat_app.py --openai

for OpenAI models (choose which one, e.g. GPT-3.5 or GPT-4, in params.py)

(OR)

python chat_app.py --hf

for open-source models (choose which one, e.g. Vicuna, in params.py)
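
The one-time embedding step mentioned above (init_docs = True on the first run, then reuse) follows a common compute-once-and-cache pattern. The sketch below illustrates only that pattern and is not the CALMS implementation: `toy_embed`, `load_or_build_embeddings`, and the pickle cache file are hypothetical stand-ins, and a real setup would call an actual embedding model and vector store.

```python
import pickle
import tempfile
from pathlib import Path


def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: a tiny two-number vector."""
    return [len(text), sum(map(ord, text)) % 997]


def load_or_build_embeddings(docs, cache_path, init_docs, embed=toy_embed):
    """Recompute embeddings only when init_docs is True or no cache exists.

    Returns (vectors, rebuilt) where rebuilt says whether the slow path ran.
    """
    cache_path = Path(cache_path)
    if init_docs or not cache_path.exists():
        # Slow path: embed every document and store the result on disk.
        vectors = {name: embed(text) for name, text in docs.items()}
        cache_path.write_bytes(pickle.dumps(vectors))
        return vectors, True
    # Fast path on later runs: reuse what was stored earlier.
    return pickle.loads(cache_path.read_bytes()), False


if __name__ == "__main__":
    docs = {"manual.txt": "Align the detector before each scan."}
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "embeddings.pkl"
        _, rebuilt_first = load_or_build_embeddings(docs, cache, init_docs=True)
        _, rebuilt_again = load_or_build_embeddings(docs, cache, init_docs=False)
        print("first run rebuilt:", rebuilt_first, "| second run rebuilt:", rebuilt_again)
```

Under this pattern, leaving the flag on after the first run would simply repeat the slow step every time, which is presumably why it only needs to be set once.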

We recommend at least 50 GB of GPU memory for the LLAMA family of models.

Please note that you will have to provide your own OpenAI and Materials Project API keys.
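
One common way to supply such keys without writing them into source files is through environment variables. The sketch below assumes the variable names `OPENAI_API_KEY` and `MP_API_KEY`; these names and the helper `load_api_keys` are assumptions for illustration, and CALMS itself may expect the keys to be configured elsewhere.

```python
import os


def load_api_keys(env=os.environ):
    """Return both API keys, failing early with a clear message if one is missing."""
    # Both variable names are assumptions for this sketch, not confirmed CALMS settings.
    names = {"openai": "OPENAI_API_KEY", "materials_project": "MP_API_KEY"}
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing API key(s): " + ", ".join(missing))
    return {service: env[var] for service, var in names.items()}


if __name__ == "__main__":
    try:
        keys = load_api_keys()
        print("Found keys for:", ", ".join(sorted(keys)))
    except RuntimeError as err:
        print(err)
```

Checking for the keys before launching the app surfaces a missing key immediately instead of at the first model or database call.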

Navigate to localhost:2023 for the open-source model or localhost:2024 for the OpenAI model.

The ports can be changed in chat_app.py.
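
If the page does not load, it can help to check whether anything is listening on the expected port. The following is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper, and the port numbers are the defaults quoted above, so adjust them if you changed the ports.

```python
import socket


def port_is_open(port, host="localhost", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    for label, port in (("open-source", 2023), ("OpenAI", 2024)):
        state = "up" if port_is_open(port) else "not reachable"
        print(f"{label} app on localhost:{port}: {state}")
```

This only confirms that a process accepts connections on the port, not that the process is the chat app.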

DISCLAIMER

The content presented in this paper has been generated using pre-trained Large Language Models (LLMs), specifically GPT 3.5 and Vicuna, by injecting contextual prompts into these LLM pipelines through a retrieval and augmentation tool. The generated content is reported as is, without any manipulation or alteration of the LLM outputs. The authors acknowledge that LLM-generated content may contain errors, biases, or inaccuracies, which could significantly impact the scientific workflows in which they are incorporated. It is important to note that the current code base is not production-ready and requires additional checks and balances before being used for large-scale deployment. Furthermore, the authors disclaim any responsibility or liability for the accuracy, completeness, or reliability of LLM-generated content presented in this paper.

