This repository contains code for training a sparse autoencoder on LLM activations, as in Anthropic's "Towards Monosemanticity". It also includes an analysis of how feature directions depend on both co-occurrence and LLM output similarity.
See analysis.ipynb.
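
For orientation, here is a minimal sketch of the kind of sparse autoencoder described in "Towards Monosemanticity": a ReLU encoder applied to bias-centered activations, a linear decoder whose rows act as feature directions, and a loss that adds an L1 sparsity penalty to the reconstruction error. It is not this repository's implementation. The names `init_params`, `sae_forward`, and `l1_coeff`, the dimensions, and the random stand-in activations are all assumptions made for illustration.

```python
import numpy as np


def init_params(d_model, d_hidden, seed=0):
    """Create hypothetical SAE parameters (illustrative, not the repo's init)."""
    rng = np.random.default_rng(seed)
    w_dec = rng.normal(size=(d_hidden, d_model))
    # Normalize each decoder row so every feature direction has unit length.
    w_dec /= np.linalg.norm(w_dec, axis=1, keepdims=True)
    return {
        "W_enc": w_dec.T.copy(),      # (d_model, d_hidden), starts as decoder transpose
        "b_enc": np.zeros(d_hidden),
        "W_dec": w_dec,               # (d_hidden, d_model)
        "b_dec": np.zeros(d_model),
    }


def sae_forward(params, x, l1_coeff=1e-3):
    """Return feature activations, reconstruction, and the scalar loss."""
    # Encoder: subtract the decoder bias, project, then apply ReLU.
    f = np.maximum(0.0, (x - params["b_dec"]) @ params["W_enc"] + params["b_enc"])
    # Decoder: weighted sum of feature directions plus the decoder bias.
    x_hat = f @ params["W_dec"] + params["b_dec"]
    # Loss: mean squared reconstruction error plus an L1 penalty on the features.
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(f), axis=1))
    return f, x_hat, recon + l1_coeff * sparsity


# Random data standing in for a batch of 8 LLM activation vectors of width 16.
x = np.random.default_rng(1).normal(size=(8, 16))
params = init_params(d_model=16, d_hidden=64)
f, x_hat, loss = sae_forward(params, x)
```

In a real training run the loss would be minimized with a gradient-based optimizer in an autodiff framework; the sketch only shows the forward pass and objective.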