📢 Overview
We are very excited to release Michelin's open-source library: LLMInspector! 🚀 This is our first major step towards building responsible AI.
LLMInspector is an open-source library for end-to-end evaluation of AI applications, from test set creation to evaluation of LLMs.
🔥 Features
- Generation of prompts from a golden dataset by expanding the prompts with tag augmentation and paraphrasing.
- Generation of prompts with various perturbations applied, to test the robustness of the LLM application.
- Generation of questions and ground truth from documents, which can be used to test RAG-based applications.
- Evaluation of RAG-based LLM applications using LLM-based evaluation metrics.
- Evaluation of the LLM application through various accuracy-based metrics, sentiment analysis, emotion analysis, PII detection, and readability scores.
- Adversarial red-team testing using curated datasets to probe for risks and vulnerabilities in LLM applications.
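To illustrate the idea behind perturbation-based robustness testing, here is a minimal, self-contained sketch. The transformations shown (case changes, adjacent-character swaps, whitespace noise) are generic examples of prompt perturbations and do not reflect LLMInspector's actual API or perturbation set:

```python
import random

def perturb_prompt(prompt: str, seed: int = 0) -> list[str]:
    """Generate simple perturbed variants of a prompt for robustness testing.

    Illustrative only: these are common generic perturbations, not
    LLMInspector's implementation.
    """
    rng = random.Random(seed)
    variants = []

    # 1. Case perturbation: uppercase the whole prompt.
    variants.append(prompt.upper())

    # 2. Typo perturbation: swap two adjacent characters at a random position.
    if len(prompt) > 2:
        i = rng.randrange(len(prompt) - 1)
        chars = list(prompt)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
        variants.append("".join(chars))

    # 3. Whitespace perturbation: double the spaces between words.
    variants.append("  ".join(prompt.split(" ")))

    return variants

if __name__ == "__main__":
    base = "What is the recommended tire pressure?"
    for variant in perturb_prompt(base):
        print(variant)
```

Each variant is sent to the LLM application alongside the original prompt; a robust application should produce consistent answers across such superficial changes.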