Helm is an open-source application for feature-based steering of large language models (LLMs), inspired by Goodfire. It uses sparse autoencoder (SAE) feature clamping, in which selected SAE feature activations are fixed to chosen values, to steer the model's responses.
Helm lets users apply multiple features simultaneously, which allows precise steering. By clamping several features at once, users can orient the model's responses according to their needs.
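To make the idea of clamping several features at once concrete, here is a minimal sketch in plain Python. It is not Helm's actual implementation: `ToySAE`, `clamp_features`, and the 3-dimensional identity weights are hypothetical names and values chosen for illustration. It assumes one common way of clamping, where each chosen feature's decoder direction is added to the residual vector, scaled by the gap between the target value and the feature's current activation.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


class ToySAE:
    """Hypothetical toy sparse autoencoder: features = ReLU(W_enc @ x + b_enc)."""

    def __init__(self, w_enc: Matrix, b_enc: Vector, w_dec: Matrix) -> None:
        self.w_enc = w_enc  # one encoder row per feature, each of length d_model
        self.b_enc = b_enc  # one encoder bias per feature
        self.w_dec = w_dec  # one decoder direction per feature, length d_model

    def encode(self, x: Vector) -> Vector:
        """Return the feature activations for a residual vector x."""
        acts = []
        for row, bias in zip(self.w_enc, self.b_enc):
            pre = sum(w * xi for w, xi in zip(row, x)) + bias
            acts.append(max(0.0, pre))  # ReLU
        return acts


def clamp_features(x: Vector, sae: ToySAE, clamps: Dict[int, float]) -> Vector:
    """Shift x so that each feature in `clamps` moves toward its target value.

    Assumption: clamping is done by adding (target - current) times the
    feature's decoder direction, leaving the rest of x untouched.
    """
    acts = sae.encode(x)
    steered = list(x)
    for idx, target in clamps.items():
        shift = target - acts[idx]
        direction = sae.w_dec[idx]
        steered = [s + shift * d for s, d in zip(steered, direction)]
    return steered


# Toy weights: each feature reads and writes exactly one residual dimension.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sae = ToySAE(w_enc=identity, b_enc=[0.0, 0.0, 0.0], w_dec=identity)

residual = [1.0, 0.0, 2.0]
# Clamp feature 0 up to 5.0 and feature 2 down to 0.0 in a single call.
steered = clamp_features(residual, sae, {0: 5.0, 2: 0.0})
print(steered)  # [5.0, 0.0, 0.0]
```

With real SAEs the decoder directions are not orthogonal, so clamping one feature can nudge others; the toy identity weights hide that interaction to keep the arithmetic easy to follow.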
To get started with Helm, follow the steps below:
git clone git@github.com:morgancmartin/helm.git
cd helm/
source venv/bin/activate
pip install -r requirements.txt
cd ./frontend/
npm run dev
Helm is built on the following technologies and frameworks:
- Model: Utilizes the GPT2-small model for generating responses.
- Frontend: Built using Remix for a seamless user interface experience.
- Transformer Hooking: Employs the HookedTransformer from the remarkable TransformerLens library.
- Sparse Autoencoders: Utilizes Joseph Bloom's Open Source Sparse Autoencoders across all Residual Stream Layers of GPT2-small. More details are available on Neuronpedia.
- Feature Search and Explanation: Utilizes Neuronpedia's feature search and explanation API.
Because Helm can apply several features simultaneously, it is well suited to applications that require fine-grained control over language model outputs.
For a live demonstration of Helm in action, please check out our video demo here.
Here are some useful resources and technologies associated with Helm:
- Goodfire
- TransformerLens
- Remix
- GPT2-Small SAEs on Hugging Face
- Neuronpedia: GPT2-Small Residual Stream SAEs
In future updates, Helm aims to introduce the following enhancements:
- Support for Multiple Models: Extend support beyond GPT2-small to incorporate a variety of models, enhancing flexibility and application scope.
- Max Activating Examples for Feature Cards: Enable the system to present maximum activating examples for feature cards, thereby providing improved insights and control.
Helm continues to evolve, with the goal of making precise steering of large language model outputs easy. Contributions and feedback are always welcome!