📢 Overview
We are very excited to release Michelin's open-source library: LLMInspector! 🚀 This is our first major step towards building responsible AI.
LLMInspector is an open-source library for end-to-end evaluation of AI applications, from test set creation to evaluation of LLMs.
🔥 Features
- Generation of prompts from a golden dataset by expanding the prompts with tag augmentation and paraphrasing.
- Generation of prompts with various perturbations applied, to test the robustness of the LLM application.
- Generation of questions and ground truth from documents, which can be used to test RAG-based applications.
- Evaluation of RAG-based LLM applications using LLM-based evaluation metrics.
- Evaluation of the LLM application through various accuracy-based metrics, sentiment analysis, emotion analysis, PII detection, and readability scores.
- Adversarial red-team testing using curated datasets to probe for risks and vulnerabilities in LLM applications.
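To illustrate the idea behind perturbation-based robustness testing, here is a minimal, self-contained sketch. The transformations shown (case changes, adjacent-character swaps, whitespace noise) are generic examples of prompt perturbations and do not reflect LLMInspector's actual API or perturbation set:

```python
import random

def perturb_prompt(prompt: str, seed: int = 0) -> list[str]:
    """Generate simple perturbed variants of a prompt for robustness testing.

    Illustrative only: these are common generic perturbations, not
    LLMInspector's implementation.
    """
    rng = random.Random(seed)
    variants = []

    # 1. Case perturbation: uppercase the whole prompt.
    variants.append(prompt.upper())

    # 2. Typo perturbation: swap two adjacent characters at a random position.
    if len(prompt) > 2:
        i = rng.randrange(len(prompt) - 1)
        chars = list(prompt)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        variants.append("".join(chars))

    # 3. Whitespace perturbation: double the spaces between words.
    variants.append("  ".join(prompt.split(" ")))

    return variants

if __name__ == "__main__":
    base = "What is the recommended tire pressure?"
    for variant in perturb_prompt(base):
        print(variant)
```

Each variant is sent to the LLM application alongside the original prompt; a robust application should produce consistent answers across such superficial changes.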