
LLMInspector: Comprehensive Evaluation and Testing for LLM Applications

@KiranPrasath-26 released this 23 May 06:26
· 15 commits to main since this release
4641ac0

📢 Overview

We are very excited to release Michelin's open-source library: LLMInspector! 🚀 This is our first major step towards building responsible AI.

LLMInspector is an open-source library for evaluating end-to-end AI applications, from test set creation to the evaluation of LLMs.


🔥 Features

  • Generation of prompts from a golden dataset by expanding the prompts with tag augmentation and paraphrasing.
  • Generation of prompts with various perturbations applied, to test the robustness of the LLM application.
  • Generation of questions and ground truth from documents, which can be used to test RAG-based applications.
  • Evaluation of RAG-based LLM applications using LLM-based evaluation metrics.
  • Evaluation of the LLM application through various accuracy-based metrics, sentiment analysis, emotion analysis, PII detection, and readability scores.
  • Adversarial red-team testing using curated datasets to probe for risks and vulnerabilities in LLM applications.
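To illustrate the kind of robustness testing described above, here is a minimal, self-contained sketch of prompt perturbation. It is not LLMInspector's API: the function name, the perturbation choices, and the example prompt are all hypothetical, intended only to show how a single seed prompt can be expanded into perturbed variants for stress-testing an LLM application.

```python
import random

def perturb_prompt(prompt: str, seed: int = 0) -> list[str]:
    """Generate perturbed variants of a prompt for robustness testing.

    Hypothetical helper (not part of LLMInspector): applies three simple
    text perturbations -- casing, whitespace, and an adjacent-character
    swap -- and returns the variants.
    """
    rng = random.Random(seed)  # seeded so test runs are reproducible
    variants = []

    # Case perturbation: upper-case the whole prompt.
    variants.append(prompt.upper())

    # Whitespace perturbation: double the spaces between words.
    variants.append(prompt.replace(" ", "  "))

    # Typo perturbation: swap two adjacent characters at a random position.
    chars = list(prompt)
    if len(chars) > 1:
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    variants.append("".join(chars))

    return variants

if __name__ == "__main__":
    base = "What is the recommended tyre pressure for wet roads?"
    for variant in perturb_prompt(base):
        print(variant)
```

Each variant is then sent to the application under test, and the responses are compared against the response to the original prompt; large divergences flag prompts where the application is brittle.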