The graphrag-toolkit is a Python toolkit for building GraphRAG applications. It provides a framework for automating the construction of a graph from unstructured data, and for composing question-answering strategies that query this graph when answering user questions.
The toolkit uses low-level LlamaIndex components – data connectors, metadata extractors, and transforms – to implement much of the graph construction process. By default, the toolkit uses Amazon Neptune Analytics or Amazon Neptune Database for its graph store, and Neptune Analytics or Amazon OpenSearch Serverless for its vector store, but it also provides extensibility points for adding alternative graph stores and vector stores. The default backend for LLMs and embedding models is Amazon Bedrock, but, as with the stores, the toolkit can be configured for other LLM and embedding model backends using LlamaIndex abstractions.
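For example, if the toolkit picks up LlamaIndex's global Settings object (an assumption here – consult the configuration section of the documentation for the toolkit's supported configuration mechanism), swapping in a different backend might look like the following sketch, where the OpenAI model identifiers are purely illustrative:

# A sketch only: assumes the toolkit honours LlamaIndex's global Settings, and that the
# llama-index-llms-openai and llama-index-embeddings-openai packages are installed.
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model='gpt-4o-mini')                              # illustrative model choice
Settings.embed_model = OpenAIEmbedding(model='text-embedding-3-small')  # illustrative model choice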
If you're running on AWS, there's a quick-start AWS CloudFormation template in the examples directory. Note that you must run your application in an AWS Region that contains the Amazon Bedrock foundation models used by the toolkit (see the configuration section in the documentation for details of the default models), and you must enable access to these models before running any part of the solution.
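If your default AWS Region differs from the one where you've enabled these models, one simple approach is to set the region explicitly before running the examples. AWS_DEFAULT_REGION is a standard AWS SDK environment variable, and us-east-1 below is just a placeholder:

import os

# Placeholder region: use whichever region has the required Bedrock models enabled
os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'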
The graphrag-toolkit requires Python and pip to install. You can install it using pip:
$ pip install https://github.com/awslabs/graphrag-toolkit/archive/refs/tags/v1.1.2.zip
The graphrag-toolkit requires Python 3.10 or greater.
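The indexing example below fetches several pages from the Amazon Neptune documentation and builds a lexical graph from them. It uses LlamaIndex's SimpleWebPageReader, which is distributed as a separate package; if it isn't already present in your environment, you will likely need to install it as well:

$ pip install llama-index-readers-web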
import os
from graphrag_toolkit import LexicalGraphIndex
from graphrag_toolkit.storage import GraphStoreFactory
from graphrag_toolkit.storage import VectorStoreFactory
from llama_index.readers.web import SimpleWebPageReader
import nest_asyncio
nest_asyncio.apply()
def run_extract_and_build():
    # Replace with your Neptune Database (or Neptune Analytics) graph store endpoint
    graph_store = GraphStoreFactory.for_graph_store(
        'neptune-db://my-graph.cluster-abcdefghijkl.us-east-1.neptune.amazonaws.com'
    )

    # Replace with your Amazon OpenSearch Serverless (or Neptune Analytics) vector store endpoint
    vector_store = VectorStoreFactory.for_vector_store(
        'aoss://https://abcdefghijkl.us-east-1.aoss.amazonaws.com'
    )

    graph_index = LexicalGraphIndex(
        graph_store,
        vector_store
    )

    doc_urls = [
        'https://docs.aws.amazon.com/neptune/latest/userguide/intro.html',
        'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/what-is-neptune-analytics.html',
        'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/neptune-analytics-features.html',
        'https://docs.aws.amazon.com/neptune-analytics/latest/userguide/neptune-analytics-vs-neptune-database.html'
    ]

    # Load the web pages as LlamaIndex documents, attaching each source URL as metadata
    docs = SimpleWebPageReader(
        html_to_text=True,
        metadata_fn=lambda url: {'url': url}
    ).load_data(doc_urls)

    # Extract content from the documents and build the graph and vector indexes
    graph_index.extract_and_build(docs, show_progress=True)

if __name__ == '__main__':
    run_extract_and_build()
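The web page reader above is just one way of producing LlamaIndex documents; other LlamaIndex data connectors should work in the same way, provided they return documents that can be passed to extract_and_build(). As a sketch, loading local files instead might look like this (SimpleDirectoryReader is a standard LlamaIndex reader, and './data' is a placeholder path):

from llama_index.core import SimpleDirectoryReader

# Load every file under ./data (placeholder path) as LlamaIndex documents,
# then index them with the same call shown above
docs = SimpleDirectoryReader('./data').load_data()
graph_index.extract_and_build(docs, show_progress=True)

Once the graph and vector stores have been populated, you can query them with a LexicalGraphQueryEngine: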
from graphrag_toolkit import LexicalGraphQueryEngine
from graphrag_toolkit.storage import GraphStoreFactory
from graphrag_toolkit.storage import VectorStoreFactory
import nest_asyncio
nest_asyncio.apply()
def run_query():
    # Replace with your Neptune Database (or Neptune Analytics) graph store endpoint
    graph_store = GraphStoreFactory.for_graph_store(
        'neptune-db://my-graph.cluster-abcdefghijkl.us-east-1.neptune.amazonaws.com'
    )

    # Replace with your Amazon OpenSearch Serverless (or Neptune Analytics) vector store endpoint
    vector_store = VectorStoreFactory.for_vector_store(
        'aoss://https://abcdefghijkl.us-east-1.aoss.amazonaws.com'
    )

    # Create a query engine that uses the toolkit's traversal-based search
    query_engine = LexicalGraphQueryEngine.for_traversal_based_search(
        graph_store,
        vector_store
    )

    response = query_engine.query('''What are the differences between Neptune Database
    and Neptune Analytics?''')

    print(response.response)

if __name__ == '__main__':
    run_query()
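If you want to see which retrieved content an answer was based on, and assuming the query engine returns a standard LlamaIndex-style response object (an assumption here), the retrieved sources can typically be inspected via response.source_nodes:

# Assumes a LlamaIndex-style response whose source_nodes carry the retrieved
# content and any metadata (such as the url attached during indexing)
for node in response.source_nodes:
    print(node.metadata.get('url'), node.text[:200])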
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.