PyPI memary integration (#29)

* Create setup.py
  Requirement for PyPi Integration - setup.py
* Create __init__.py
  Requirement for PyPi - reviewers please suggest edits if needed
* Create tests folder
  For PyPi integration - see setup.py
* init: .toml file and pypi foundation
* fix: pip installing local dist/ directory
* fix: update test folder
* fix: version number
* docs: add diagram images for README
* docs: fix diagram image locations
* fix: prepare for PyPI

---------

Co-authored-by: kevinl424 <kevin.li.20742@gmail.com>
Co-authored-by: seyeong-han <h960213@gmail.com>
kingjulio8238 · May 9, 2024 · d87ebd4
1 parent b07bed9

Showing 31 changed files with 215 additions and 1,440 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,4 +1,6 @@
 *.env
 .venv
 .DS_Store
+dist/
+README_hidden.md
 **/__pycache__/
diff --git a/README.md b/README.md
@@ -50,14 +50,17 @@ ALPHA_VANTAGE_API_KEY="YOUR_API_KEY"
 ```
 
 2. Remove the quotation marks ("") because Python may interpret '\' as an escape character and skip characters
+
 ```
-DO:
+CORRECT:
 OPENAI_API_KEY=SKxxxxxxxx
-NOT:
+
+INCORRECT:
 OPENAI_API_KEY="SKxxxxxxxx"
 ```
 
 3. How to get API keys:
+
 ```
 OpenAI key: https://openai.com/index/openai-api
 
@@ -76,27 +79,18 @@ Alpha Vantage: (this key is for getting real time stock data)
   Recommend using https://10minutemail.com/ to generate a temporary email address
 ```
 
-2. Run:
-   ```
-   streamlit run streamlit_app/app.py
-   ```
-
-    Note: If 'src' not found error, change line 15 in /streamlit_app/app.py
-    ```
-    From:
-    parent_dir = os.path.dirname(curr_dir)
+4. Run:
 
-    To:
-    parent_dir = os.path.dirname(curr_dir) + '/memary'
-    ```
-
-    Also move the '/streamlit_app/data' folder to the 'memary' folder, outside the 'src' folder.
+```
+cd streamlit_app
+streamlit run app.py
+```
 
 ## Detailed Component Breakdown
 
 ### Routing Agent
 
-![agent diagram](https://github.com/kingjulio8238/memary/assets/120517860/e5be38db-8c7a-4df2-8b1d-b578fa9c827f)
+![agent diagram](diagrams/routing_agent.png)
 
 - Uses the [ReAct agent](https://react-lm.github.io/) to plan and execute a query given the tools provided. This type of agent can reason over which of the tools to use next to further the response, feed inputs into the selected tool, and repeat the process with the output until it determines that the answer is satisfactory.
 - Current tool suite:
@@ -140,7 +134,7 @@ Alpha Vantage: (this key is for getting real time stock data)
 
 ### Memory Module
 
-![Memory Module](https://github.com/kingjulio8238/memary/assets/120517860/5bf361a4-84e5-4a93-bc9b-aa9e42c3dac3)
+![Memory Module](diagrams/memory_module.png)
 
 - What is the memory module?
 
@@ -153,7 +147,7 @@ The memory module comprises the Memory Stream and Entity Knowledge Store. The me
    - Rank Entities by Relevance: Use both frequency and recency to rank entities. An entity frequently mentioned (high count) and referenced recently is likely of high importance, and the user is well aware of this concept.
    - Categorize Entities: Group entities into categories based on their nature or the context in which they're mentioned (e.g., technical terms, personal interests). This categorization aids in quickly accessing relevant information tailored to the user's inquiries.
    - Highlight Changes Over Time: Identify any significant changes in the entities' ranking or categorization over time. A shift in the most frequently mentioned entities could indicate a change in the user's interests or knowledge.
-   - Additional information on the memory modules can be found [here](https://github.com/seyeong-han/KnowledgeGraphRAG) 
+   - Additional information on the memory modules can be found [here](https://github.com/seyeong-han/KnowledgeGraphRAG)
 
 - Purpose in larger system
   - Compress/summarize the top N ranked entities in the entity knowledge store and pass to the LLM’s finite context window alongside the agent's response and chat history for inference.
@@ -162,24 +156,25 @@ The memory module comprises the Memory Stream and Entity Knowledge Store. The me
 - Future contributions
   - We currently extract the top N entities from the entity knowledge store and pass these entities into the context window for inference. memary can further benefit from more advanced memory compression techniques, such as passing only entities that are in the agent's response to the context window. We look forward to related community contributions.
 
-![Memory Compression](https://github.com/kingjulio8238/memary/assets/120517860/eb911941-9ec0-492f-a47d-5b4196508a1b)
+![Memory Compression](diagrams/memory_compression.png)
 
 ## Future Integrations
 
-As mentioned above, memary will benefit from the following integrations: 
-- Create an LLM Judge that scores the ReACT agent forming a feedback loop. See [Zooter](https://arxiv.org/abs/2311.08692) for insights. 
+As mentioned above, memary will benefit from the following integrations:
+
+- Create an LLM Judge that scores the ReACT agent forming a feedback loop. See [Zooter](https://arxiv.org/abs/2311.08692) for insights.
 - Expand the knowledge graph’s capabilities to support multiple modalities, i.e., images.
 - Optimize the graph to reduce latency of search times.
 - Instead of extracting the top N entities from the entity knowledge store, deploy more advanced memory compression techniques such as extracting only the entities included in the agent’s response.
-- Create an intuitive UI to switch between models easily. We aim to setup memary so that users can use it for free without any costly API integrations. 
+- Create an intuitive UI to switch between models easily. We aim to setup memary so that users can use it for free without any costly API integrations.
 
 Currently memary is structured so that the ReAct agent can only process one query at a time. We hope to see **multiprocessing** integrated so that the agent can process many subqueries simultaneously. We expect this to improve the relevancy and accuracy of responses. The source code for both decomposing the query and reranking the many agent responses has been provided, and once multiprocessing has been added to the system, these components can easily be integrated into the main `ChatAgent` class. The diagram below shows how the newly integrated system would work.
 
 ![Future Integrations](diagrams/final.png)
 
 ### Query Decomposition
 
-![QD Diagram](https://github.com/kingjulio8238/memary/assets/120517860/e8663b07-66c4-4c08-82d3-cef8eb9c2554)
+![QD Diagram](diagrams/query_decomposition.png)
 
 - What is query decomposition?
   - A preprocessing technique that breaks down complex queries into simpler queries to expedite the LLM’s ability to answer the prompt. It is important to note that this process leaves simple queries unchanged.
@@ -204,7 +199,7 @@ Currently memary is structured so that the ReAct agent can only process one quer
 
 ### Reranking
 
-![Reranking Diagram](https://github.com/kingjulio8238/memary/assets/120517860/3f15b40f-c591-43ab-aa10-727b6997727d)
+![Reranking Diagram](diagrams/reranking_diagram.png)
 
 - What is reranking?
   - Reranking is the process of scoring nodes based on their relevancy.
@@ -226,5 +221,6 @@ We welcome contributions from the community and hope to see memary advance as ag
 
 Initial Contributors: [Julian Saks](https://www.linkedin.com/in/juliansaks/), [Kevin Li](https://www.linkedin.com/in/kevin-li8/), [Seyeong Han](https://github.com/seyeong-han), [Arnav Chopra](https://www.linkedin.com/in/arnav-chopra/), [Aishwarya Balaji](https://www.linkedin.com/in/aishwarya--balaji/), [Anshu Siripurapu](https://www.linkedin.com/in/anshusiripurapu/) (Hook 'em!)
 
-## License 
+## License
+
 memary is released under the MIT License.
diff --git a/diagrams/memory_compression.png b/diagrams/memory_compression.png
diff --git a/diagrams/memory_module.png b/diagrams/memory_module.png
diff --git a/diagrams/query_decomposition.png b/diagrams/query_decomposition.png
diff --git a/diagrams/reranking_diagram.png b/diagrams/reranking_diagram.png
diff --git a/pyproject.toml b/pyproject.toml
@@ -0,0 +1,50 @@
+[build-system]
+requires = [
+    "hatchling",
+]
+build-backend = "hatchling.build"
+
+[project]
+name = "memary"
+version = "0.1.1"
+authors = [
+  { name="memary Labs", email="hello@memarylabs.com" },
+]
+description = "Longterm Memory for Autonomous Agents"
+readme = "README_hidden.md"
+requires-python = ">=3.8, <=3.11.9"
+classifiers = [
+    "Programming Language :: Python :: 3",
+    "License :: OSI Approved :: MIT License",
+    "Operating System :: OS Independent",
+]
+dependencies = [
+    "neo4j==5.17.0",
+    "python-dotenv==1.0.1",
+    "pyvis==0.3.2",
+    "streamlit==1.31.1",
+    "llama-index==0.10.11", 
+    "llama-index-agent-openai==0.1.5",
+    "llama-index-core==0.10.12",
+    "llama-index-embeddings-openai==0.1.5",
+    "llama-index-graph-stores-nebula==0.1.2",
+    "llama-index-graph-stores-neo4j==0.1.1",
+    "llama-index-legacy==0.9.48",
+    "llama-index-llms-openai==0.1.5",
+    "llama-index-multi-modal-llms-openai==0.1.3",
+    "llama-index-program-openai==0.1.3",
+    "llama-index-question-gen-openai==0.1.2",
+    "llama-index-readers-file==0.1.4",
+    "langchain==0.1.12",
+    "langchain-openai==0.0.8",
+    "llama-index-llms-perplexity==0.1.3",
+    "pandas",
+    "geocoder",
+    "googlemaps",
+    "ansistrip",
+    "numpy",
+]
+
+[project.urls]
+Homepage = "https://github.com/kingjulio8238/memary"
+Issues = "https://github.com/kingjulio8238/memary/issues"
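
The `requires-python` bound above is inclusive at both ends, so interpreters newer than 3.11.9 (for example 3.11.10 or 3.12) fall outside the declared range. The sketch below illustrates that comparison with plain tuples. `is_supported` is a hypothetical helper, not part of memary, and real installers apply the full PEP 440 rules rather than this simplification.

```python
import sys

# Bounds copied from requires-python = ">=3.8, <=3.11.9" in pyproject.toml.
MIN_VERSION = (3, 8, 0)
MAX_VERSION = (3, 11, 9)


def is_supported(version):
    """Return True if a (major, minor, micro) tuple lies within the declared range."""
    return MIN_VERSION <= tuple(version[:3]) <= MAX_VERSION


if __name__ == "__main__":
    current = sys.version_info[:3]
    print(current, "supported" if is_supported(current) else "not supported")
```

If the intent is to allow every 3.11 patch release, an exclusive upper bound such as `<3.12` may be a better fit than pinning the micro version.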
diff --git a/requirements.txt b/requirements.txt
@@ -20,4 +20,5 @@ llama-index-llms-perplexity==0.1.3
 pandas
 geocoder
 googlemaps
-ansistrip
+ansistrip
+numpy
diff --git a/src/memary/__init__.py b/src/memary/__init__.py
diff --git a/src/memary/agent/__init__.py b/src/memary/agent/__init__.py
diff --git a/src/agent/base_agent.py b/src/memary/agent/base_agent.py
@@ -25,10 +25,10 @@
 from llama_index.multi_modal_llms.openai import OpenAIMultiModal
 import requests
 
-from src.agent.data_types import Message
-from src.agent.llm_api.tools import openai_chat_completions_request
-from src.memory import EntityKnowledgeStore, MemoryStream
-from src.synonym_expand.synonym import custom_synonym_expand_fn
+from memary.agent.data_types import Message
+from memary.agent.llm_api.tools import openai_chat_completions_request
+from memary.memory import EntityKnowledgeStore, MemoryStream
+from memary.synonym_expand.synonym import custom_synonym_expand_fn
 
 MAX_ENTITIES_FROM_KG = 5
 ENTITY_EXCEPTIONS = ["Unknown relation"]

diff --git a/src/agent/chat_agent.py b/src/memary/agent/chat_agent.py
@@ -1,8 +1,8 @@
 import tiktoken
 from typing import Optional, List
 
-from src.agent.base_agent import Agent
-from src.agent.data_types import Context
+from memary.agent.base_agent import Agent
+from memary.agent.data_types import Context
 
 
 class ChatAgent(Agent):

diff --git a/src/agent/data_types.py b/src/memary/agent/data_types.py
diff --git a/src/agent/llm_api/tools.py b/src/memary/agent/llm_api/tools.py
diff --git a/src/memary/memory/__init__.py b/src/memary/memory/__init__.py
@@ -0,0 +1,5 @@
+from memary.memory.base_memory import BaseMemory
+from memary.memory.memory_stream import MemoryStream
+from memary.memory.entity_knowledge_store import EntityKnowledgeStore
+
+__all__ = ["BaseMemory", "MemoryStream", "EntityKnowledgeStore"]
diff --git a/src/memory/base_memory.py b/src/memary/memory/base_memory.py
diff --git a/src/memory/entity_knowledge_store.py b/src/memary/memory/entity_knowledge_store.py
@@ -1,8 +1,8 @@
 import json
 import logging
 
-from src.memory import BaseMemory
-from src.memory.types import KnowledgeMemoryItem, MemoryItem
+from memary.memory import BaseMemory
+from memary.memory.types import KnowledgeMemoryItem, MemoryItem
 
 
 class EntityKnowledgeStore(BaseMemory):

diff --git a/src/memory/memory_stream.py b/src/memary/memory/memory_stream.py
@@ -2,8 +2,8 @@
 import logging
 from datetime import datetime
 
-from src.memory import BaseMemory
-from src.memory.types import MemoryItem
+from memary.memory import BaseMemory
+from memary.memory.types import MemoryItem
 
 logging.basicConfig(level=logging.INFO)
 

diff --git a/src/memory/types.py b/src/memary/memory/types.py
diff --git a/src/memary/synonym_expand/__init__.py b/src/memary/synonym_expand/__init__.py
diff --git a/src/synonym_expand/output.py b/src/memary/synonym_expand/output.py
diff --git a/src/synonym_expand/synonym.py b/src/memary/synonym_expand/synonym.py
@@ -3,7 +3,7 @@
 from langchain_core.output_parsers import JsonOutputParser
 from typing import List
 import os
-from src.synonym_expand.output import Output
+from memary.synonym_expand.output import Output
 from dotenv import load_dotenv
 
 def custom_synonym_expand_fn(keywords: str) -> List[str]:
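
The hunks above all make the same mechanical change: the `src.` prefix in import statements becomes `memary.`. A minimal sketch of how such a rewrite could be automated follows. `rewrite_imports` is a hypothetical helper written for illustration and is not how this commit was produced.

```python
import re

# Matches "from src.<module>" or "import src.<module>" at the start of a line,
# allowing leading spaces or tabs for indented imports.
_SRC_IMPORT = re.compile(r"^([ \t]*)(from|import)[ \t]+src\.", flags=re.MULTILINE)


def rewrite_imports(source, package="memary"):
    """Replace the top-level `src.` prefix in import statements with `package.`."""
    return _SRC_IMPORT.sub(lambda m: f"{m.group(1)}{m.group(2)} {package}.", source)


before = (
    "from src.agent.base_agent import Agent\n"
    "from src.memory import BaseMemory\n"
    "import os\n"
)
print(rewrite_imports(before))
```

Lines that do not import from `src` pass through unchanged, so the function can be applied to whole files.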

diff --git a/src/memory/__init__.py b/src/memory/__init__.py
diff --git a/streamlit_app/app.py b/streamlit_app/app.py
@@ -16,10 +16,9 @@
 parent_dir = os.path.dirname(curr_dir)
 #parent_dir = os.path.dirname(curr_dir) + '/memary' #Use this if error: src not found. Also move the '/streamlit_app/data' folder to the 'memary' folder, outside the 'src' folder.
 
-print(parent_dir)
-sys.path.append(parent_dir)
+sys.path.append(parent_dir + '/src')
 
-from src.agent.chat_agent import ChatAgent
+from memary.agent.chat_agent import ChatAgent
 
 load_dotenv()
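
The change to `sys.path` above relies on the `src/` layout: once the `src` directory is on the import path, the package inside it can be imported by its own name. The sketch below reproduces that mechanism with a throwaway package. `demo_pkg` and the temporary directory are assumptions made for illustration, and `os.path.join` is used here as a portable alternative to concatenating `'/src'`.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway project that mirrors the layout <root>/src/demo_pkg/__init__.py.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "src", "demo_pkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
    fh.write("GREETING = 'hello from demo_pkg'\n")

# Counterpart of sys.path.append(parent_dir + '/src'), built with os.path.join.
sys.path.append(os.path.join(root, "src"))
importlib.invalidate_caches()

# The package is now importable by name, without a "src." prefix.
demo_pkg = importlib.import_module("demo_pkg")
print(demo_pkg.GREETING)
```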
 

diff --git a/streamlit_app/data/entity_knowledge_store.json b/streamlit_app/data/entity_knowledge_store.json
@@ -1 +1,57 @@
-[]
+[
+    {
+        "entity": "Walt disney pictures",
+        "count": 1,
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Disney",
+        "count": 1,
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Voices of tim allen",
+        "count": 1,
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Pixar",
+        "count": 1,
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Voices of don rickles",
+        "count": 1,
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Pixar animation studios",
+        "count": 1,
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "American computer-animated film",
+        "count": 1,
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Toy story",
+        "count": 1,
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "1995 american computer-animated buddy comedy film",
+        "count": 1,
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Voices of tom hanks",
+        "count": 1,
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Entirely computer-animated feature film",
+        "count": 1,
+        "date": "2024-05-08T19:17:52"
+    }
+]
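
The README describes ranking entities by both frequency and recency and passing the top N to the context window. The sketch below shows one way to rank records of the shape stored in `entity_knowledge_store.json`. The sample values are invented, `top_entities` is a hypothetical helper, and ordering by count with date as a tie-breaker is an assumption; the weighting memary actually uses is not shown in this diff.

```python
import json
from datetime import datetime

# Same shape as the entries in entity_knowledge_store.json; values here are made up.
sample = json.loads("""
[
  {"entity": "Pixar", "count": 3, "date": "2024-05-08T19:17:52"},
  {"entity": "Disney", "count": 1, "date": "2024-05-09T08:00:00"},
  {"entity": "Toy story", "count": 3, "date": "2024-05-09T10:30:00"}
]
""")


def top_entities(items, n):
    """Sort by count (descending), break ties by most recent date, keep the first n."""
    ranked = sorted(
        items,
        key=lambda item: (item["count"], datetime.fromisoformat(item["date"])),
        reverse=True,
    )
    return ranked[:n]


for item in top_entities(sample, 2):
    print(item["entity"], item["count"], item["date"])
```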
diff --git a/streamlit_app/data/external_response.txt b/streamlit_app/data/external_response.txt
@@ -0,0 +1 @@
+Toy Story is a 1995 American computer-animated buddy comedy film produced by Pixar Animation Studios and released by Walt Disney Pictures. The feature film directorial debut of John Lasseter, it was the first entirely computer-animated feature film, as well as the first feature film from Pixar.
diff --git a/streamlit_app/data/memory_stream.json b/streamlit_app/data/memory_stream.json
@@ -1 +1,46 @@
-[]
+[
+    {
+        "entity": "American computer-animated film",
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Disney",
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Voices of tim allen",
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "1995 american computer-animated buddy comedy film",
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Pixar",
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Voices of don rickles",
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Pixar animation studios",
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Entirely computer-animated feature film",
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Toy story",
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Walt disney pictures",
+        "date": "2024-05-08T19:17:52"
+    },
+    {
+        "entity": "Voices of tom hanks",
+        "date": "2024-05-08T19:17:52"
+    }
+]
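
The two data files in this commit contain the same eleven entities: `memory_stream.json` stores one dated entry per mention, while `entity_knowledge_store.json` adds a `count` per entity. The sketch below shows how a stream of that shape could be collapsed into per-entity counts. `summarize` is a hypothetical helper and the repeated "Pixar" entry is invented for illustration; memary's own aggregation code is not part of this diff.

```python
from datetime import datetime

# Same shape as memory_stream.json entries; the second "Pixar" entry is invented.
stream = [
    {"entity": "Pixar", "date": "2024-05-08T19:17:52"},
    {"entity": "Disney", "date": "2024-05-08T19:17:52"},
    {"entity": "Pixar", "date": "2024-05-09T09:00:00"},
]


def summarize(stream_items):
    """Collapse stream entries into one record per entity with a count and latest date."""
    summary = {}
    for item in stream_items:
        record = summary.setdefault(
            item["entity"], {"entity": item["entity"], "count": 0, "date": item["date"]}
        )
        record["count"] += 1
        # Keep the most recent timestamp seen for this entity.
        if datetime.fromisoformat(item["date"]) > datetime.fromisoformat(record["date"]):
            record["date"] = item["date"]
    return list(summary.values())


for record in summarize(stream):
    print(record)
```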
diff --git a/streamlit_app/data/past_chat.json b/streamlit_app/data/past_chat.json
@@ -1 +1,10 @@
-[]
+[
+    {
+        "role": "user",
+        "content": "tell me about toy story"
+    },
+    {
+        "role": "user",
+        "content": "rag: Toy Story is a 1995 American computer-animated buddy comedy film produced by Pixar Animation Studios and released by Walt Disney Pictures. The feature film directorial debut of John Lasseter, it was the first entirely computer-animated feature film, as well as the first feature film from Pixar."
+    }
+]