Mechanistic Interpretability

This folder contains instructions to run LLM360 models with packages implementing mechanistic interpretability methods and visualizations, such as TransformerLens.

Table of Contents

Installation

Note

This folder is a work in progress; more demos and instructions are coming soon.

Installation

Note that this installation is more involved than the other analyses in this repository: it requires cloning and modifying the code within TransformerLens, since we must extend it to support the LLM360 models.

To install TransformerLens for LLM360 models, please follow the steps below.

1. Clone the TransformerLens repository and install dependencies.

Clone TransformerLens from GitHub and install all of its dependencies via:

git clone https://github.com/TransformerLensOrg/TransformerLens.git
cd TransformerLens
pip install -e .
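
Optionally, you can sanity-check that the editable install resolves to your clone, for example:

python -c "import transformer_lens; print(transformer_lens.__file__)"

This should print a path inside the TransformerLens directory you just cloned.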

2. Edit TransformerLens code to add LLM360 models.

Next, edit the file: transformer_lens/loading_from_pretrained.py.

In the OFFICIAL_MODEL_NAMES list, add the line:

    "LLM360/Amber",

In the get_pretrained_state_dict function, around line 1730 (inside the chain of branches that dispatch on official_model_name), add the code block:

            elif official_model_name.startswith("LLM360/Amber"):
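                # Load the requested Amber training checkpoint from the Hugging Face Hub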
                hf_model = AutoModelForCausalLM.from_pretrained(
                    official_model_name,
                    revision=f"ckpt_{cfg.checkpoint_value}",
                    torch_dtype=dtype,
                    token=huggingface_token,
                    **kwargs,
                )

3. Run TransformerLens.

Following the TransformerLens Getting Started guide and the Main Demo notebook, import transformer_lens from your modified clone and run the example code; a minimal sketch is shown below.
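
The sketch below loads LLM360/Amber through the modified loader and caches its activations. It assumes the step 2 edits are in place; the prompt is arbitrary, checkpoint_value=358 is an illustrative checkpoint index, and depending on your TransformerLens version you may need additional checkpoint handling for it to be accepted.

from transformer_lens import HookedTransformer

# Load LLM360/Amber via the modified loader. checkpoint_value feeds the
# revision string added in step 2; the value 358 is illustrative.
model = HookedTransformer.from_pretrained(
    "LLM360/Amber",
    checkpoint_value=358,
)

# Run a forward pass and cache all intermediate activations.
prompt = "Mechanistic interpretability studies how"
logits, cache = model.run_with_cache(prompt)

print(logits.shape)                               # [batch, seq_len, d_vocab]
print(cache["blocks.0.attn.hook_pattern"].shape)  # layer-0 attention patterns

From here, the hooks and visualizations from the Main Demo notebook can be applied to Amber in the same way as to the models that ship with TransformerLens.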