GraphRAG QA Example #1036

Closed
wants to merge 3 commits into from
akollegger

Description

An example of GraphRAG using the SEC Edgar dataset with tool-based access over unstructured, structured, and semistructured data.

Issues

n/a

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

  • custom Streamlit UI
  • calls out to OpenAI for embedding + chat completion

Tests

Describe the tests that you ran to verify your changes.

@rbrugaro
Collaborator

@akollegger great to see your contribution! Looking forward to trying it out.

I have been working on the PRs below to integrate GraphRAG (the Microsoft pipeline: graph extraction from text, clustering, community summaries, partial answers, final answer…) using llama-index and Neo4j. Please check them out!
#1007
opea-project/GenAIComps#793

What do you think about using these two approaches jointly with an agent in real deployments? I am thinking that the agent could decide, based on query type, whether a question is better answered by a KG query or by "Microsoft GraphRAG" for broader corpus-theme or multi-hop queries. This could also be an efficient cost-optimization strategy, since "Microsoft GraphRAG" is more expensive, with many more LLM calls.
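To make the routing idea concrete, here is a minimal sketch of how an agent might pick a path from the query text. It is illustrative only and not code from this PR or the linked PRs: the cue words, the route names `kg_query` and `community_summary`, and the `handlers` mapping are all hypothetical, and a real deployment would more likely let an LLM classify the query than rely on keywords.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical cue words that suggest a broad, corpus-level question.
GLOBAL_CUES = {"theme", "themes", "overall", "across", "summarize", "trend", "trends"}


def choose_route(query: str) -> str:
    """Return 'community_summary' for broad questions, else 'kg_query'."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    return "community_summary" if words & GLOBAL_CUES else "kg_query"


def answer(query: str, handlers: Dict[str, Callable[[str], str]]) -> Tuple[str, str]:
    """Dispatch the query to the handler registered for the chosen route."""
    route = choose_route(query)
    return route, handlers[route](query)


if __name__ == "__main__":
    # Stand-in handlers; real ones would call the graph DB or the GraphRAG pipeline.
    handlers = {
        "kg_query": lambda q: "answered via direct graph query",
        "community_summary": lambda q: "answered via community summaries",
    }
    print(answer("Who is the CEO of Apple?", handlers))
    print(answer("What are the main themes across all filings?", handlers))
```

Defaulting to the cheaper graph query and escalating only on explicit cues reflects the cost concern above: the summary-based path is taken only when the question appears to need it.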

@arun-gupta

@chensuyue
Collaborator

We can't distribute a dataset in this repo. It is better to ask users to download it from a public source. If it's a new dataset, we need to distribute it somewhere else after the legal process is done.

@akollegger
Author

We can't distribute a dataset in this repo. It is better to ask users to download it from a public source. If it's a new dataset, we need to distribute it somewhere else after the legal process is done.

Understood. This data is the final stage of processing a public data set from https://www.sec.gov/search-filings/edgar-search-assistance/accessing-edgar-data. I will look for a reasonable public host for the files.

@rbrugaro
Collaborator

@akollegger have you had a chance to try integrating your Cypher and company tools into the AgentQnA example? We should create a PR against that example instead of a completely new example, since there will be a lot of overlap.

  1. Add a new graph worker agent (worker_agent_tools.py and worker_agent_tools.yaml) with the Cypher and company tools: https://github.com/opea-project/GenAIExamples/tree/main/AgentQnA/tools
  2. Add a new compose file that includes the graph DB and the new graph_worker_agent: https://github.com/opea-project/GenAIExamples/tree/main/AgentQnA/docker_compose/intel/cpu/xeon . We are looking into your suggestion of a compose file based on selected tools, but until then let's just add a new compose file for your GraphRAG use case.

Once we get this working with an OpenAI model, we can swap in the model inference and embedding microservices on CPU/Gaudi.
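For illustration, a company tool in such a worker agent could look roughly like the sketch below. This is an assumption-laden example, not the code from this PR: the `Company` label, its `name` property, the function name `company_tool`, and the injected `run_query` callable are all hypothetical. Injecting the query runner keeps the tool independent of any particular graph driver, so it can be exercised without a running database.

```python
from typing import Any, Callable, Dict, List

# A runner takes a Cypher string plus parameters and returns rows as dicts.
Runner = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]

# Hypothetical schema: nodes labeled Company with a `name` property.
COMPANY_QUERY = (
    "MATCH (c:Company) "
    "WHERE toLower(c.name) CONTAINS toLower($name) "
    "RETURN c.name AS name LIMIT $limit"
)


def company_tool(name: str, run_query: Runner, limit: int = 5) -> str:
    """Look up companies by partial name and return a text answer for the agent."""
    name = name.strip()
    if not name:
        return "No company name given."
    # Parameters are passed separately so user text is never spliced into Cypher.
    rows = run_query(COMPANY_QUERY, {"name": name, "limit": limit})
    if not rows:
        return f"No company matching '{name}' found."
    return "\n".join(row["name"] for row in rows)


if __name__ == "__main__":
    # Fake runner standing in for a real graph database call.
    def fake_runner(cypher: str, params: Dict[str, Any]) -> List[Dict[str, Any]]:
        return [{"name": "Acme Corp"}] if params["name"] == "acme" else []

    print(company_tool(" acme ", fake_runner))
    print(company_tool("zzz", fake_runner))
```

Returning plain text rather than raw rows is a design choice for agent use: the tool output goes straight back into the LLM context.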

@akollegger closed this Dec 4, 2024