This solution accelerator is designed as an end-to-end example of a Legal Research Copilot application. It demonstrates the implementation of three information retrieval techniques (vector search, semantic ranking, and GraphRAG) on Azure Database for PostgreSQL, and illustrates how they can be combined to deliver high-quality responses to legal research questions. The app uses the U.S. Case Law dataset of 0.5 million legal cases as its source of factual data. For more details on these concepts, see the blog post that accompanies this solution accelerator.
As the architecture diagram shows, this solution accelerator brings together vector search, semantic ranking, and GraphRAG. Here are some highlights, including the information retrieval pipeline:
Semantic Ranking
Enhances vector search accuracy by re-ranking results with a semantic ranker model, significantly improving the relevance of the top results (e.g., up to a 10–20% boost in NDCG@10 accuracy). The semantic ranker is available as a standalone solution accelerator, detailed in the blog: Introducing Semantic Ranker Solution Accelerator for Azure Database for PostgreSQL.
GraphRAG
An advanced RAG technique proposed by Microsoft Research to improve the quality of RAG system responses by extracting a knowledge graph from the source data and leveraging it to provide better context to the LLM. The GraphRAG technique consists of three high-level steps:
1. Graph extraction
2. Entity summarization
3. Graph query generation at query time
Information Retrieval Pipeline
We leverage the structure of the citation graph at query time by using a specialized graph query. The graph query is designed to use the prominence of the legal cases as a signal to improve the accuracy of the information retrieval pipeline. It is expressed as a mixture of a traditional relational query and an OpenCypher graph query, and is executed on Postgres using the Apache AGE extension. The resulting information retrieval pipeline is shown below.
For related solution accelerators and articles, please see the following:
- Introducing GraphRAG Solution for Azure Database for PostgreSQL
- Semantic Ranker Solution Accelerator for Azure Database for PostgreSQL
- GraphRAG: Unlocking LLM discovery on narrative private data
- Reciprocal Rank Fusion (RRF) explained in 4 mins
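To make the pipeline highlights above more concrete, here is a minimal Python sketch of how several ranked lists (for example, one from vector search, one from the semantic ranker, and one ordered by citation-graph prominence) could be merged with Reciprocal Rank Fusion, the technique covered in the last article listed. This is an illustrative assumption, not the accelerator's actual implementation: the function name `rrf_fuse`, the case IDs, and the constant `k = 60` (a commonly used default) are all hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Merge ranked ID lists with Reciprocal Rank Fusion.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Hypothetical helper, not the accelerator's code.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, case_id in enumerate(ranking, start=1):
            scores[case_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Made-up case IDs, ordered best-first by each hypothetical signal.
vector_hits = ["case_a", "case_b", "case_c"]
semantic_hits = ["case_b", "case_a", "case_d"]
graph_hits = ["case_b", "case_d", "case_a"]  # e.g. most-cited first

fused = rrf_fuse([vector_hits, semantic_hits, graph_hits])
print(fused)
```

A case that ranks well in several lists rises to the top even if no single signal puts it first everywhere, which is why rank-based fusion is a convenient way to combine signals whose raw scores are not comparable.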
The steps below guide you through deploying the Azure services necessary for this solution accelerator into your Azure subscription.
👉 Follow the steps in the repo above first before moving on to the deployment steps below.
👉 Once you deploy this accelerator, note the "/score" REST endpoint URI and the key. You will need these in the deployment steps below.
- Enter the following to clone the GitHub repo containing exercise resources:
git clone https://github.com/Azure-Samples/graphrag-legalcases-postgres.git
cd graphrag-legalcases-postgres
- Use the sample .env to create your own .env
cp .env.sample .env
- Edit your new .env file to add your Azure ML Semantic Ranker endpoints
- Use the values obtained during the Prerequisite Steps above.
- Replace the values between the {} with your values for each.
AZURE_ML_SCORING_ENDPOINT={YOUR-AZURE-ML-ENDPOINT}
AZURE_ML_ENDPOINT_KEY={YOUR-AZURE-ML-ENDPOINT-KEY}
- Log in to your Azure account
azd auth login
- Provision the resources
azd up
- Enter a name that will be used for the resource group.
- This will provision Azure resources and deploy this sample to those resources, including Azure Database for PostgreSQL Flexible Server, Azure OpenAI Service, and Azure Container Apps.
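Before running `azd up`, it can help to confirm that the two semantic ranker settings in `.env` were actually filled in. The following is a minimal sketch, assuming the simple KEY=VALUE format shown in the steps above; the helper names `parse_env` and `missing_env_values` are hypothetical and are not part of the repo.

```python
REQUIRED_KEYS = ("AZURE_ML_SCORING_ENDPOINT", "AZURE_ML_ENDPOINT_KEY")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_env_values(text, required=REQUIRED_KEYS):
    """Return required keys that are absent, empty, or still {PLACEHOLDER}s."""
    values = parse_env(text)
    missing = []
    for key in required:
        value = values.get(key, "")
        if not value or (value.startswith("{") and value.endswith("}")):
            missing.append(key)
    return missing


if __name__ == "__main__":
    from pathlib import Path

    env_path = Path(".env")
    if env_path.exists():
        problems = missing_env_values(env_path.read_text())
        print("Missing or placeholder values:", problems or "none")
    else:
        print(".env not found; run `cp .env.sample .env` first")
```

Run from the repo root, the script lists any of the two settings that are missing or still wrapped in `{}`; it only reads the file and does not contact Azure.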
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties’ policies.