PyPI memary integration (#29)
* Create setup.py

Requirement for PyPi Integration - setup.py

* Create __init__.py

Requirement for PyPi - reviewers please suggest edits if needed

* Create tests folder 

For PyPi integration - see setup.py

* init: .toml file and pypi foundation

* fix: pip installing local dist/ directory

* fix: update test folder

* fix: version number

* docs: add diagram images for README

* docs: fix diagram image locations

* fix: prepare for PyPI

---------

Co-authored-by: kevinl424 <kevin.li.20742@gmail.com>
Co-authored-by: seyeong-han <h960213@gmail.com>
3 people authored May 9, 2024
1 parent b07bed9 commit d87ebd4
Showing 31 changed files with 215 additions and 1,440 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,4 +1,6 @@
*.env
.venv
.DS_Store
dist/
README_hidden.md
**/__pycache__/
48 changes: 22 additions & 26 deletions README.md
@@ -50,14 +50,17 @@ ALPHA_VANTAGE_API_KEY="YOUR_API_KEY"
```

2. Remove the quotations "" because python may read escape characters '\' and skip characters

```
DO:
CORRECT:
OPENAI_API_KEY=SKxxxxxxxx
NOT:
INCORRECT:
OPENAI_API_KEY="SKxxxxxxxx"
```

3. How to get API keys:

```
OpenAI key: https://openai.com/index/openai-api
@@ -76,27 +79,18 @@ Alpha Vantage: (this key is for getting real time stock data)
Recommend using https://10minutemail.com/ to generate a temporary email address
```

2. Run:
```
streamlit run streamlit_app/app.py
```

Note: If 'src' not found error, change line 15 in /streamlit_app/app.py
```
From:
parent_dir = os.path.dirname(curr_dir)
4. Run:

To:
parent_dir = os.path.dirname(curr_dir) + '/memary'
```

Also move the '/streamlit_app/data' folder to the 'memary' folder, outside the 'src' folder.
```
cd streamlit_app
streamlit run app.py
```

## Detailed Component Breakdown

### Routing Agent

![agent diagram](https://github.com/kingjulio8238/memary/assets/120517860/e5be38db-8c7a-4df2-8b1d-b578fa9c827f)
![agent diagram](diagrams/routing_agent.png)

- Uses the [ReAct agent](https://react-lm.github.io/) to plan and execute a query given the tools provided. This type of agent can reason over which of the tools to use next to further the response, feed inputs into the selected tool, and repeat the process with the output until it determines that the answer is satisfactory.
- Current tool suite:
@@ -140,7 +134,7 @@ Alpha Vantage: (this key is for getting real time stock data)

### Memory Module

![Memory Module](https://github.com/kingjulio8238/memary/assets/120517860/5bf361a4-84e5-4a93-bc9b-aa9e42c3dac3)
![Memory Module](diagrams/memory_module.png)

- What is the memory module?

@@ -153,7 +147,7 @@ The memory module comprises the Memory Stream and Entity Knowledge Store. The me
- Rank Entities by Relevance: Use both frequency and recency to rank entities. An entity frequently mentioned (high count) and referenced recently is likely of high importance, and the user is well aware of this concept.
- Categorize Entities: Group entities into categories based on their nature or the context in which they're mentioned (e.g., technical terms, personal interests). This categorization aids in quickly accessing relevant information tailored to the user's inquiries.
- Highlight Changes Over Time: Identify any significant changes in the entities' ranking or categorization over time. A shift in the most frequently mentioned entities could indicate a change in the user's interests or knowledge.
- Additional information on the memory modules can be found [here](https://github.com/seyeong-han/KnowledgeGraphRAG)
- Additional information on the memory modules can be found [here](https://github.com/seyeong-han/KnowledgeGraphRAG)

- Purpose in larger system
- Compress/summarize the top N ranked entities in the entity knowledge store and pass to the LLM’s finite context window alongside the agent's response and chat history for inference.
@@ -162,24 +156,25 @@ The memory module comprises the Memory Stream and Entity Knowledge Store. The me
- Future contributions
- We currently extract the top N entities from the entity knowledge store and pass these entities into the context window for inference. memary can further benefit from more advanced memory compression techniques such as passing only entities that are in the agent's response to the context window. We look forward to related community contributions.

![Memory Compression](https://github.com/kingjulio8238/memary/assets/120517860/eb911941-9ec0-492f-a47d-5b4196508a1b)
![Memory Compression](diagrams/memory_compression.png)

## Future Integrations

As mentioned above, memary will benefit from the following integrations:
- Create an LLM Judge that scores the ReACT agent forming a feedback loop. See [Zooter](https://arxiv.org/abs/2311.08692) for insights.
As mentioned above, memary will benefit from the following integrations:

- Create an LLM Judge that scores the ReACT agent forming a feedback loop. See [Zooter](https://arxiv.org/abs/2311.08692) for insights.
- Expand the knowledge graph’s capabilities to support multiple modalities, i.e., images.
- Optimize the graph to reduce latency of search times.
- Instead of extracting the top N entities from the entity knowledge store deploy more advanced memory compression techniques such as extracting only the entities included in the agent’s response.
- Create an intuitive UI to switch between models easily. We aim to setup memary so that users can use it for free without any costly API integrations.
- Create an intuitive UI to switch between models easily. We aim to setup memary so that users can use it for free without any costly API integrations.

Currently memary is structured so that the ReAct agent can only process one query at a time. We hope to see **multiprocessing** integrated so that the agent can process many subqueries simultaneously. We expect this to improve the relevancy and accuracy of responses. The source code for both decomposing the query and reranking the many agent responses has been provided, and once multiprocessing has been added to the system, these components can easily be integrated into the main `ChatAgent` class. The diagram below shows how the newly integrated system would work.

![Future Integrations](diagrams/final.png)

### Query Decomposition

![QD Diagram](https://github.com/kingjulio8238/memary/assets/120517860/e8663b07-66c4-4c08-82d3-cef8eb9c2554)
![QD Diagram](diagrams/query_decomposition.png)

- What is query decomposition?
- A preprocessing technique that breaks down complex queries into simpler queries to expedite the LLM’s ability to answer the prompt. It is important to note that this process leaves simple queries unchanged.
@@ -204,7 +199,7 @@ Currently memary is structured so that the ReAct agent can only process one quer

### Reranking

![Reranking Diagram](https://github.com/kingjulio8238/memary/assets/120517860/3f15b40f-c591-43ab-aa10-727b6997727d)
![Reranking Diagram](diagrams/reranking_diagram.png)

- What is reranking?
- Reranking is the process of scoring nodes based on their relevancy.
@@ -226,5 +221,6 @@ We welcome contributions from the community and hope to see memary advance as ag

Initial Contributors: [Julian Saks](https://www.linkedin.com/in/juliansaks/), [Kevin Li](https://www.linkedin.com/in/kevin-li8/), [Seyeong Han](https://github.com/seyeong-han), [Arnav Chopra](https://www.linkedin.com/in/arnav-chopra/), [Aishwarya Balaji](https://www.linkedin.com/in/aishwarya--balaji/), [Anshu Siripurapu](https://www.linkedin.com/in/anshusiripurapu/) (Hook 'em!)

## License
## License

memary is released under the MIT License.
Binary file added diagrams/memory_compression.png
Binary file added diagrams/memory_module.png
Binary file added diagrams/query_decomposition.png
Binary file added diagrams/reranking_diagram.png
50 changes: 50 additions & 0 deletions pyproject.toml
@@ -0,0 +1,50 @@
[build-system]
requires = [
"hatchling",
]
build-backend = "hatchling.build"

[project]
name = "memary"
version = "0.1.1"
authors = [
{ name="memary Labs", email="hello@memarylabs.com" },
]
description = "Longterm Memory for Autonomous Agents"
readme = "README_hidden.md"
requires-python = ">=3.8, <=3.11.9"
classifiers = [
"Programming Language :: Python :: 3",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
]
dependencies = [
"neo4j==5.17.0",
"python-dotenv==1.0.1",
"pyvis==0.3.2",
"streamlit==1.31.1",
"llama-index==0.10.11",
"llama-index-agent-openai==0.1.5",
"llama-index-core==0.10.12",
"llama-index-embeddings-openai==0.1.5",
"llama-index-graph-stores-nebula==0.1.2",
"llama-index-graph-stores-neo4j==0.1.1",
"llama-index-legacy==0.9.48",
"llama-index-llms-openai==0.1.5",
"llama-index-multi-modal-llms-openai==0.1.3",
"llama-index-program-openai==0.1.3",
"llama-index-question-gen-openai==0.1.2",
"llama-index-readers-file==0.1.4",
"langchain==0.1.12",
"langchain-openai==0.0.8",
"llama-index-llms-perplexity==0.1.3",
"pandas",
"geocoder",
"googlemaps",
"ansistrip",
"numpy",
]

[project.urls]
Homepage = "https://github.com/kingjulio8238/memary"
Issues = "https://github.com/kingjulio8238/memary/issues"
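The `requires-python = ">=3.8, <=3.11.9"` bound above is narrower than it may look: as written, it appears to exclude later 3.11 patch releases (such as 3.11.10) as well as 3.12. A minimal sketch of that range check, assuming plain tuple comparison is a close enough stand-in for PEP 440 ordering on final releases; `is_supported` is a hypothetical helper, not part of memary:

```python
import sys

# Bounds copied from requires-python = ">=3.8, <=3.11.9" in pyproject.toml.
MIN_VERSION = (3, 8, 0)
MAX_VERSION = (3, 11, 9)


def is_supported(version=None):
    """Return True if a (major, minor, micro) tuple lies inside the declared range."""
    if version is None:
        # Default to the interpreter running this script.
        version = tuple(sys.version_info[:3])
    return MIN_VERSION <= tuple(version) <= MAX_VERSION
```

If the intent is "any 3.11 release", a bound such as `<3.12` would express that more directly; whether that matches the maintainers' intent is not stated in this commit.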
3 changes: 2 additions & 1 deletion requirements.txt
@@ -20,4 +20,5 @@ llama-index-llms-perplexity==0.1.3
pandas
geocoder
googlemaps
ansistrip
ansistrip
numpy
Empty file added src/memary/__init__.py
Empty file.
Empty file added src/memary/agent/__init__.py
Empty file.
8 changes: 4 additions & 4 deletions src/agent/base_agent.py → src/memary/agent/base_agent.py
@@ -25,10 +25,10 @@
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
import requests

from src.agent.data_types import Message
from src.agent.llm_api.tools import openai_chat_completions_request
from src.memory import EntityKnowledgeStore, MemoryStream
from src.synonym_expand.synonym import custom_synonym_expand_fn
from memary.agent.data_types import Message
from memary.agent.llm_api.tools import openai_chat_completions_request
from memary.memory import EntityKnowledgeStore, MemoryStream
from memary.synonym_expand.synonym import custom_synonym_expand_fn

MAX_ENTITIES_FROM_KG = 5
ENTITY_EXCEPTIONS = ["Unknown relation"]
4 changes: 2 additions & 2 deletions src/agent/chat_agent.py → src/memary/agent/chat_agent.py
@@ -1,8 +1,8 @@
import tiktoken
from typing import Optional, List

from src.agent.base_agent import Agent
from src.agent.data_types import Context
from memary.agent.base_agent import Agent
from memary.agent.data_types import Context


class ChatAgent(Agent):
File renamed without changes.
File renamed without changes.
5 changes: 5 additions & 0 deletions src/memary/memory/__init__.py
@@ -0,0 +1,5 @@
from memary.memory.base_memory import BaseMemory
from memary.memory.memory_stream import MemoryStream
from memary.memory.entity_knowledge_store import EntityKnowledgeStore

__all__ = ["BaseMemory", "MemoryStream", "EntityKnowledgeStore"]
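The import changes throughout this commit follow one pattern: the leading `src.` in each import becomes `memary.`. A minimal sketch of applying that rewrite mechanically to a module's source text, assuming imports sit at the start of a line; `rewrite_imports` is a hypothetical helper for illustration and is not how the authors made the change:

```python
import re

# Matches "from src.<...>" or "import src.<...>" at the start of a line,
# keeping any leading indentation in group 1.
_SRC_IMPORT = re.compile(r"^(\s*)(from|import)\s+src\.", flags=re.MULTILINE)


def rewrite_imports(source: str, package: str = "memary") -> str:
    """Replace the leading `src.` in import statements with `<package>.`."""
    return _SRC_IMPORT.sub(
        lambda m: f"{m.group(1)}{m.group(2)} {package}.", source
    )
```

Strings or comments that merely mention `src.` in the middle of a line are left untouched, since the pattern is anchored to the start of the line.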
File renamed without changes.
@@ -1,8 +1,8 @@
import json
import logging

from src.memory import BaseMemory
from src.memory.types import KnowledgeMemoryItem, MemoryItem
from memary.memory import BaseMemory
from memary.memory.types import KnowledgeMemoryItem, MemoryItem


class EntityKnowledgeStore(BaseMemory):
@@ -2,8 +2,8 @@
import logging
from datetime import datetime

from src.memory import BaseMemory
from src.memory.types import MemoryItem
from memary.memory import BaseMemory
from memary.memory.types import MemoryItem

logging.basicConfig(level=logging.INFO)

File renamed without changes.
Empty file.
File renamed without changes.
@@ -3,7 +3,7 @@
from langchain_core.output_parsers import JsonOutputParser
from typing import List
import os
from src.synonym_expand.output import Output
from memary.synonym_expand.output import Output
from dotenv import load_dotenv

def custom_synonym_expand_fn(keywords: str) -> List[str]:
5 changes: 0 additions & 5 deletions src/memory/__init__.py

This file was deleted.

5 changes: 2 additions & 3 deletions streamlit_app/app.py
@@ -16,10 +16,9 @@
parent_dir = os.path.dirname(curr_dir)
#parent_dir = os.path.dirname(curr_dir) + '/memary' #Use this if error: src not found. Also move the '/streamlit_app/data' folder to the 'memary' folder, outside the 'src' folder.

print(parent_dir)
sys.path.append(parent_dir)
sys.path.append(parent_dir + '/src')

from src.agent.chat_agent import ChatAgent
from memary.agent.chat_agent import ChatAgent

load_dotenv()
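
The hunk above puts the repository's `src` directory on `sys.path` instead of the repository root, so that `memary` resolves as a top-level package when running from a source checkout. A minimal sketch of the same idea with `pathlib`, assuming the layout `<repo>/streamlit_app/app.py` and `<repo>/src/memary`; `src_dir_for` and `ensure_on_path` are hypothetical helper names, not part of the repository:

```python
import sys
from pathlib import Path


def src_dir_for(app_file: str) -> Path:
    """Return <repo>/src for an app file located at <repo>/streamlit_app/app.py."""
    repo_root = Path(app_file).resolve().parent.parent
    return repo_root / "src"


def ensure_on_path(directory: Path) -> None:
    """Append the directory to sys.path once, so repeated calls add no duplicates."""
    entry = str(directory)
    if entry not in sys.path:
        sys.path.append(entry)
```

In `app.py` this would be called as `ensure_on_path(src_dir_for(__file__))` before the `memary` import. When the package is installed with pip instead, the path manipulation should not be needed.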

58 changes: 57 additions & 1 deletion streamlit_app/data/entity_knowledge_store.json
@@ -1 +1,57 @@
[]
[
{
"entity": "Walt disney pictures",
"count": 1,
"date": "2024-05-08T19:17:52"
},
{
"entity": "Disney",
"count": 1,
"date": "2024-05-08T19:17:52"
},
{
"entity": "Voices of tim allen",
"count": 1,
"date": "2024-05-08T19:17:52"
},
{
"entity": "Pixar",
"count": 1,
"date": "2024-05-08T19:17:52"
},
{
"entity": "Voices of don rickles",
"count": 1,
"date": "2024-05-08T19:17:52"
},
{
"entity": "Pixar animation studios",
"count": 1,
"date": "2024-05-08T19:17:52"
},
{
"entity": "American computer-animated film",
"count": 1,
"date": "2024-05-08T19:17:52"
},
{
"entity": "Toy story",
"count": 1,
"date": "2024-05-08T19:17:52"
},
{
"entity": "1995 american computer-animated buddy comedy film",
"count": 1,
"date": "2024-05-08T19:17:52"
},
{
"entity": "Voices of tom hanks",
"count": 1,
"date": "2024-05-08T19:17:52"
},
{
"entity": "Entirely computer-animated feature film",
"count": 1,
"date": "2024-05-08T19:17:52"
}
]
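Each record above carries a `count` and a `date`, which are the two signals the README names for ranking entities (frequency and recency). A minimal sketch of one possible ranking, assuming count takes priority and the date breaks ties; `rank_entities` is a hypothetical function, and the default of 5 only mirrors the `MAX_ENTITIES_FROM_KG` constant seen in `base_agent.py` without claiming that the agent uses it this way:

```python
from datetime import datetime


def rank_entities(items, top_n=5):
    """Sort entity records by count, then by most recent date, and keep top_n."""
    def sort_key(item):
        # Higher count first; among equal counts, the later timestamp first.
        return (item["count"], datetime.fromisoformat(item["date"]))

    return sorted(items, key=sort_key, reverse=True)[:top_n]
```

With the sample data above every entity has a count of 1 and the same timestamp, so the ranking simply preserves file order; differences only appear once entities are mentioned repeatedly.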
1 change: 1 addition & 0 deletions streamlit_app/data/external_response.txt
@@ -0,0 +1 @@
Toy Story is a 1995 American computer-animated buddy comedy film produced by Pixar Animation Studios and released by Walt Disney Pictures. The feature film directorial debut of John Lasseter, it was the first entirely computer-animated feature film, as well as the first feature film from Pixar.
47 changes: 46 additions & 1 deletion streamlit_app/data/memory_stream.json
@@ -1 +1,46 @@
[]
[
{
"entity": "American computer-animated film",
"date": "2024-05-08T19:17:52"
},
{
"entity": "Disney",
"date": "2024-05-08T19:17:52"
},
{
"entity": "Voices of tim allen",
"date": "2024-05-08T19:17:52"
},
{
"entity": "1995 american computer-animated buddy comedy film",
"date": "2024-05-08T19:17:52"
},
{
"entity": "Pixar",
"date": "2024-05-08T19:17:52"
},
{
"entity": "Voices of don rickles",
"date": "2024-05-08T19:17:52"
},
{
"entity": "Pixar animation studios",
"date": "2024-05-08T19:17:52"
},
{
"entity": "Entirely computer-animated feature film",
"date": "2024-05-08T19:17:52"
},
{
"entity": "Toy story",
"date": "2024-05-08T19:17:52"
},
{
"entity": "Walt disney pictures",
"date": "2024-05-08T19:17:52"
},
{
"entity": "Voices of tom hanks",
"date": "2024-05-08T19:17:52"
}
]
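The memory stream above holds one entry per entity mention, while `entity_knowledge_store.json` holds one record per entity with a `count`. A minimal sketch of how the second could be derived from the first, assuming the store counts mentions per entity and keeps the latest timestamp; `build_knowledge_store` is a hypothetical function and may differ from what `EntityKnowledgeStore` actually does:

```python
from datetime import datetime


def build_knowledge_store(memory_stream):
    """Collapse memory-stream entries into one record per entity."""
    store = {}
    for entry in memory_stream:
        name, date = entry["entity"], entry["date"]
        record = store.setdefault(name, {"entity": name, "count": 0, "date": date})
        record["count"] += 1
        # Keep the most recent timestamp seen for this entity.
        if datetime.fromisoformat(date) > datetime.fromisoformat(record["date"]):
            record["date"] = date
    return list(store.values())
```

Records come back in order of first mention, because Python dictionaries preserve insertion order.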
11 changes: 10 additions & 1 deletion streamlit_app/data/past_chat.json
@@ -1 +1,10 @@
[]
[
{
"role": "user",
"content": "tell me about toy story"
},
{
"role": "user",
"content": "rag: Toy Story is a 1995 American computer-animated buddy comedy film produced by Pixar Animation Studios and released by Walt Disney Pictures. The feature film directorial debut of John Lasseter, it was the first entirely computer-animated feature film, as well as the first feature film from Pixar."
}
]