Helm

Helm is an open-source application for feature-based steering of large language models (LLMs), inspired by Goodfire. It clamps sparse autoencoder (SAE) features to steer and modulate the model's responses, giving direct control over AI-generated content.

Introduction

Helm lets users apply multiple features at the same time, which allows precise steering. By clamping several features at once, users can refine a language model's output and orient its responses to their needs.
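
As a rough formalization of what clamping means here (the notation is an assumption for illustration and is not taken from Helm's code): let x be a residual stream activation, f_i(x) the activation of SAE feature i on x, and d_i that feature's decoder direction. Clamping a set S of features to target values c_i can then be written as:

```latex
x' = x + \sum_{i \in S} \bigl(c_i - f_i(x)\bigr)\, d_i
```

Under this formulation only the directions of the clamped features are modified, and any number of features can be clamped in one step. Helm's actual implementation may differ, for example by decoding the full edited feature vector instead.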

Usage

To get started with Helm, follow the steps below:

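# Clone the repository and enter it
git clone git@github.com:morgancmartin/helm.git
cd helm/
# Activate the Python virtual environment and install dependencies
# (if the clone has no venv/ directory, create one first: python3 -m venv venv)
source venv/bin/activate
pip install -r requirements.txt
# Start the frontend dev server
# (run `npm install` first if the frontend dependencies are not installed yet)
cd ./frontend/
npm run dev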

Implementation Details

Helm is built on the following components:

  • Model: Uses the GPT2-small model to generate responses.
  • Frontend: Built with Remix.
  • Transformer Hooking: Uses the HookedTransformer class from the TransformerLens library.
  • Sparse Autoencoders: Uses Joseph Bloom's open-source sparse autoencoders for all residual stream layers of GPT2-small. More details are available on Neuronpedia.
  • Feature Search and Explanation: Uses Neuronpedia's feature search and explanation API.

Because Helm can apply several features at the same time, it suits applications that need fine-grained control over language model outputs.
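
The clamping step itself can be illustrated independently of TransformerLens. The following is a minimal sketch in plain Python, assuming a toy SAE with a ReLU encoder and one decoder direction per feature. The names `encode`, `clamp_features`, `w_enc`, `b_enc`, `w_dec`, and `clamps` are hypothetical and are not taken from Helm's code. In a real setup this logic would presumably run inside a forward hook on the residual stream.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def encode(x: Vector, w_enc: Matrix, b_enc: Vector) -> Vector:
    """Toy SAE encoder: feature_i = ReLU(w_enc[i] . x + b_enc[i])."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_enc, b_enc)
    ]


def clamp_features(
    x: Vector,
    w_enc: Matrix,
    b_enc: Vector,
    w_dec: Matrix,
    clamps: Dict[int, float],
) -> Vector:
    """Clamp several features at once and return the steered activation.

    For each clamped feature i, (target_i - current_i) * w_dec[i] is added
    to x, so only the clamped features' decoder directions are changed.
    This is an assumed formulation, not necessarily the one Helm uses.
    """
    feats = encode(x, w_enc, b_enc)  # activations measured on the original x
    steered = list(x)
    for idx, target in clamps.items():
        delta = target - feats[idx]
        steered = [s + delta * d for s, d in zip(steered, w_dec[idx])]
    return steered


# Toy example: two features whose directions are the coordinate axes.
identity = [[1.0, 0.0], [0.0, 1.0]]
activation = [0.5, 2.0]
# Boost feature 0 to 3.0 and switch feature 1 off in the same call.
print(clamp_features(activation, identity, [0.0, 0.0], identity, {0: 3.0, 1: 0.0}))
# prints [3.0, 0.0]
```

Feature activations are read once from the original activation before any edit is applied, so the result does not depend on the order in which the clamps are listed.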

Demo

For a live demonstration of Helm in action, please check out our video demo here.

Links

Here are some useful resources and technologies associated with Helm:

Future Improvements

Planned enhancements for future updates:

  • Support for Multiple Models: Extend support beyond GPT2-small to other models, broadening flexibility and application scope.
  • Max Activating Examples for Feature Cards: Show the maximum activating examples on feature cards to give better insight into, and control over, each feature.

Helm continues to evolve, with the aim of making precise steering of large language model outputs easy. Contributions and feedback are always welcome!