Helm

Helm is an open-source application for feature-based steering of large language models (LLMs), inspired by Goodfire. It uses sparse autoencoder (SAE) feature clamping to steer model outputs, giving users direct control over AI-generated content.

Introduction

Helm lets users apply multiple features at once for precise steering. Clamping several SAE features to chosen values shifts the model's responses toward the behavior the user wants.
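As a rough illustration of what clamping means, the sketch below uses a toy SAE written in plain Python. It is not Helm's code: the names `encode`, `decode`, and `clamp_features`, the weight layout (one row per feature), and the choice to add back the SAE's reconstruction error are all assumptions made for the example.

```python
from typing import Dict, List

Vector = List[float]


def encode(resid: Vector, w_enc: List[Vector], b_enc: Vector) -> Vector:
    """Map a residual-stream vector to non-negative SAE feature activations (ReLU)."""
    return [
        max(0.0, sum(w * r for w, r in zip(row, resid)) + b)
        for row, b in zip(w_enc, b_enc)
    ]


def decode(acts: Vector, w_dec: List[Vector]) -> Vector:
    """Reconstruct a residual vector as a weighted sum of decoder directions."""
    d_model = len(w_dec[0])
    return [sum(a * row[i] for a, row in zip(acts, w_dec)) for i in range(d_model)]


def clamp_features(
    resid: Vector,
    w_enc: List[Vector],
    b_enc: Vector,
    w_dec: List[Vector],
    clamps: Dict[int, float],
) -> Vector:
    """Return the residual vector with the chosen features forced to fixed values."""
    acts = encode(resid, w_enc, b_enc)
    # Keep whatever the SAE fails to reconstruct, so only the clamped features change.
    error = [r - x for r, x in zip(resid, decode(acts, w_dec))]
    steered = list(acts)
    for index, value in clamps.items():
        steered[index] = value  # several features can be clamped in one pass
    return [x + e for x, e in zip(decode(steered, w_dec), error)]


# Toy weights (hypothetical): 3 features over a 2-dimensional residual stream.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B_ENC = [0.0, 0.0, -1.0]
W_DEC = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# Boost feature 1 to 3.0 and switch feature 2 off, in a single call.
print(clamp_features([2.0, 0.5], W_ENC, B_ENC, W_DEC, {1: 3.0, 2: 0.0}))  # [1.25, 2.25]
```

With an empty `clamps` dictionary the function returns the original vector, which is why the error term is carried through: the edit then touches only the features the user selected.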

Usage

To get started with Helm, follow the steps below:

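git clone git@github.com:morgancmartin/helm.git
cd helm/
python3 -m venv venv  # create the virtual environment if it does not exist yet
source venv/bin/activate
pip install -r requirements.txt
cd ./frontend/
npm install  # install frontend dependencies before the first run
npm run dev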

Implementation Details

Helm is built on the following components:

Model: Utilizes the GPT2-small model for generating responses.
Frontend: Built using Remix for a seamless user interface experience.
Transformer Hooking: Employs the HookedTransformer from the remarkable TransformerLens library.
Sparse Autoencoders: Utilizes Joseph Bloom's Open Source Sparse Autoencoders across all Residual Stream Layers of GPT2-small. More details are available on Neuronpedia.
Feature Search and Explanation: Uses Neuronpedia's feature search and explanation API.

Because multiple features can be applied at once, Helm is well suited to applications that need fine-grained control over language model outputs.
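One way to organize simultaneous feature applications is to group them by the layer they act on, so that each hook point receives every clamp for its layer in one pass. The sketch below is an assumption about how this could be structured, not Helm's actual code: `FeatureClamp`, `hook_name`, and `group_by_hook` are hypothetical names, and the use of the `hook_resid_pre` hook point (the TransformerLens name for the residual stream entering a block) is assumed rather than confirmed by the project.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FeatureClamp:
    """One steering request: clamp a single SAE feature to a fixed activation."""
    layer: int    # residual-stream layer of the SAE (0-11 for GPT2-small)
    feature: int  # index of the feature within that layer's SAE
    value: float  # activation the feature is clamped to


def hook_name(layer: int) -> str:
    """TransformerLens-style name for the residual stream entering a block."""
    return f"blocks.{layer}.hook_resid_pre"


def group_by_hook(clamps: List[FeatureClamp]) -> Dict[str, Dict[int, float]]:
    """Collect clamps per hook point as {hook name: {feature index: value}}."""
    grouped: Dict[str, Dict[int, float]] = defaultdict(dict)
    for clamp in clamps:
        # A later clamp on the same feature overrides an earlier one.
        grouped[hook_name(clamp.layer)][clamp.feature] = clamp.value
    return dict(grouped)


requests = [
    FeatureClamp(layer=8, feature=10, value=5.0),
    FeatureClamp(layer=8, feature=42, value=0.0),
    FeatureClamp(layer=3, feature=7, value=2.5),
]
for name, features in group_by_hook(requests).items():
    print(name, features)
```

Each entry in the result could then back one forward hook that edits the residual stream at that layer, with the inner dictionary listing which features to clamp there.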

Demo

For a live demonstration of Helm in action, please check out our video demo here.

Links

Here are some useful resources and technologies associated with Helm:

Goodfire
TransformerLens
Remix
GPT2-Small SAEs on Hugging Face
Neuronpedia: GPT2-Small Residual Stream SAEs

Future Improvements

In future updates, Helm aims to introduce the following enhancements:

Support for Multiple Models: Extend support beyond GPT2-small to incorporate a variety of models, enhancing flexibility and application scope.
Max Activating Examples for Feature Cards: Enable the system to present maximum activating examples for feature cards, thereby providing improved insights and control.

Helm continues to evolve and aims to become an integral tool in steering the output of Large Language Models with unparalleled precision and ease. Contributions and feedback are always welcome in this ongoing journey of innovation!
