
🦉 Athena - Research Companion

Athena is an AI-assistant prototype powered by Cohere and its Embed v3 model to facilitate scientific research. Its key differentiating features include:

  • Advanced Semantic Search: Outperforms traditional keyword search with state-of-the-art embeddings, retrieving results that capture the intent and nuance of scientific queries.
  • Human-AI Collaboration: Eases review of research literature by highlighting key topics and augmenting human understanding.
  • Admin Support: Assists with tasks such as categorizing research articles, drafting e-mails, and generating tweets.
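
For illustration, a minimal sketch of an embedding-based query, assuming the Cohere and Weaviate v3 Python clients and the ArxivArticle class described below (the endpoint and query text are placeholders, not the app's exact code):

```python
import cohere
import weaviate

co = cohere.Client("YOUR_COHERE_API_KEY")
client = weaviate.Client("https://your-weaviate-instance")  # hypothetical endpoint

# Embed the query with Embed v3; 'search_query' pairs with documents
# that were embedded with input_type='search_document'.
query_vector = co.embed(
    texts=["transformer architectures for long-context retrieval"],
    model="embed-english-v3.0",
    input_type="search_query",
).embeddings[0]

# Nearest-neighbor search over the indexed articles.
results = (
    client.query.get("ArxivArticle", ["title", "abstract"])
    .with_near_vector({"vector": query_vector})
    .with_limit(5)
    .do()
)
```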

📚 Overview

Data Pipeline

As part of this project we have created two datasets of 50,000 arXiv articles related to AI and NLP using Cohere Embed v3.

Steps:

  1. Retrieve Articles' Metadata from arXiv. See ./data_pipeline/retrieve_arxiv.py
  2. Embed Articles' Title and Abstract using Embedv3. See ./data_pipeline/embed_arxiv.py
  3. Store Articles' Metadata and Embeddings in Weaviate. See ./data_pipeline/index_arxiv.py
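
A condensed sketch of steps 2 and 3, assuming the Cohere and Weaviate v3 Python clients (the actual scripts in ./data_pipeline/ may differ):

```python
import cohere
import weaviate

co = cohere.Client("YOUR_COHERE_API_KEY")
client = weaviate.Client("https://your-weaviate-instance")  # hypothetical endpoint

articles = [
    {"title": "Attention Is All You Need",
     "abstract": "The dominant sequence transduction models are based on..."},
]

# Step 2: embed title and abstract with Embed v3.
embeddings = co.embed(
    texts=[f"{a['title']} {a['abstract']}" for a in articles],
    model="embed-english-v3.0",
    input_type="search_document",
).embeddings

# Step 3: batch-import metadata and vectors into Weaviate.
client.batch.configure(batch_size=100)
with client.batch as batch:
    for article, vector in zip(articles, embeddings):
        batch.add_data_object(
            data_object=article,
            class_name="ArxivArticle",
            vector=vector,
        )
```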

Prompt Templates, Output Formatting, and Validation

Some of our tasks, such as enriching abstracts with Wikipedia links, crafting a glossary, composing e-mails, and generating tweets, rely on a set of prompt templates, output formatting instructions, and validation steps.

These prompts are then composed into a LangChain chain, as in the following code snippet:
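
A minimal sketch of such a chain with LangChain and a Pydantic output parser, using the tweet-generation task as an example (template wording, model settings, and the Tweet schema are illustrative; the repository's actual prompts may differ):

```python
from pydantic import BaseModel, Field
from langchain.chains import LLMChain
from langchain.llms import Cohere
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate

class Tweet(BaseModel):
    """Validated output schema for the tweet-generation task."""
    text: str = Field(description="tweet announcing the article, max 280 characters")

parser = PydanticOutputParser(pydantic_object=Tweet)

prompt = PromptTemplate(
    template=(
        "Compose a tweet announcing this research article.\n"
        "{format_instructions}\n"
        "Title: {title}\nAbstract: {abstract}\n"
    ),
    input_variables=["title", "abstract"],
    # The parser injects JSON-schema formatting instructions into the prompt.
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Assumes COHERE_API_KEY is set in the environment.
chain = LLMChain(llm=Cohere(), prompt=prompt)
tweet = parser.parse(chain.run(title="...", abstract="..."))
```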

Weaviate Schema

See the ArxivArticle class.
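
An illustrative approximation of that class definition with the Weaviate v3 Python client (the property list is an assumption; the authoritative schema lives in the repository). Since embeddings are computed client-side with Embed v3, no server-side vectorizer is configured:

```python
import weaviate

client = weaviate.Client("https://your-weaviate-instance")  # hypothetical endpoint

arxiv_article = {
    "class": "ArxivArticle",
    "vectorizer": "none",  # vectors are supplied at import time
    "properties": [
        {"name": "title",    "dataType": ["text"]},
        {"name": "abstract", "dataType": ["text"]},
        {"name": "authors",  "dataType": ["text[]"]},
        {"name": "url",      "dataType": ["text"]},
    ],
}

client.schema.create_class(arxiv_article)
```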

Cohere Engine

The coral.py module provides an abstraction layer over the Cohere endpoints.
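
A sketch of what such an abstraction layer can look like (class and method names are assumptions; see coral.py for the actual implementation):

```python
import cohere

class CohereEngine:
    """Thin wrapper around the Cohere endpoints used by the app."""

    def __init__(self, api_key: str):
        self.client = cohere.Client(api_key)

    def embed(self, texts: list[str], input_type: str = "search_document") -> list[list[float]]:
        """Embed texts with Embed v3 for indexing or querying."""
        response = self.client.embed(
            texts=texts, model="embed-english-v3.0", input_type=input_type
        )
        return response.embeddings

    def chat(self, message: str) -> str:
        """Single-turn chat completion."""
        return self.client.chat(message=message).text
```

Centralizing the endpoint calls in one module keeps model names and API details out of the rest of the codebase.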

Streamlit App

See app.py.
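
A minimal sketch of such an entry point (the real app wires in the search and generation features sketched above; this placeholder only shows the structure):

```python
import streamlit as st

st.set_page_config(page_title="Athena - Research Companion", page_icon="🦉")
st.title("🦉 Athena - Research Companion")

query = st.text_input("Search arXiv articles")
if query:
    # The real app forwards the query to the Cohere/Weaviate layer; here we only echo it.
    st.write(f"Searching for: {query}")
```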

🚀 Quickstart

  1. Clone the repository:

     git clone git@github.com:dcarpintero/athena.git

  2. Create and activate a virtual environment:

     Windows:

     py -m venv .venv
     .venv\scripts\activate

     macOS/Linux:

     python3 -m venv .venv
     source .venv/bin/activate

  3. Install dependencies:

     pip install -r requirements.txt

  4. Run the data pipeline (optional):

     python retrieve_arxiv.py
     python embed_arxiv.py
     python index_arxiv.py

  5. Launch the web application:

     streamlit run ./app.py

🔗 References