This list focuses on understanding the internal mechanisms of large language models (LLMs). The works in this list were either accepted at top conferences (e.g., ICML, NeurIPS, ICLR, ACL, EMNLP, NAACL) or written by leading research institutions.
Other paper lists focus on SAEs and neurons.
To recommend a paper (accepted at a conference), please contact me.
- Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
  - [EMNLP 2024] [2024.9] [neuron] [arithmetic] [fine-tune]
- NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals
  - [2024.7]
- Scaling and evaluating sparse autoencoders
  - [OpenAI] [2024.6] [SAE]
-
  - [EMNLP 2024] [2024.6] [in-context learning]
- Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries
  - [EMNLP 2024] [2024.6] [knowledge] [reasoning]
- Neuron-Level Knowledge Attribution in Large Language Models
  - [EMNLP 2024] [2024.6] [neuron] [knowledge]
- Knowledge Circuits in Pretrained Transformers
  - [NeurIPS 2024] [2024.5] [circuit] [knowledge]
- Locating and Editing Factual Associations in Mamba
  - [COLM 2024] [2024.4] [causal] [knowledge]
- Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
  - [COLM 2024] [2024.3] [circuit]
- Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines
  - [ACL 2024] [2024.3] [logit lens] [multimodal]
- Chain-of-Thought Reasoning Without Prompting
  - [DeepMind] [2024.2] [chain-of-thought]
- Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
  - [EMNLP 2024] [2024.2] [logit lens]
- Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
  - [ICLR 2024] [2024.2] [fine-tune]
- TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space
  - [ACL 2024] [2024.2] [hallucination]
- Understanding and Patching Compositional Reasoning in LLMs
  - [ACL 2024] [2024.2] [reasoning]
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
  - [ACL 2024] [2024.2] [knowledge] [reasoning]
- Long-form evaluation of model editing
  - [NAACL 2024] [2024.2] [model editing]
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
  - [ICML 2024] [2024.1] [toxicity] [fine-tune]
- What does the Knowledge Neuron Thesis Have to do with Knowledge?
  - [ICLR 2024] [2023.11] [knowledge] [neuron]
- Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
  - [ICLR 2024] [2023.11] [fine-tune]
- Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
  - [Anthropic] [2024.5] [SAE]
- Interpreting CLIP's Image Representation via Text-Based Decomposition
  - [ICLR 2024] [2023.10] [multimodal]
- Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
  - [ICLR 2024] [2023.10] [causal] [circuit]
- Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level
  - [DeepMind] [2023.12] [neuron]
- Successor Heads: Recurring, Interpretable Attention Heads In The Wild
  - [ICLR 2024] [2023.12] [circuit]
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
  - [Anthropic] [2023.10] [SAE]
- Impact of Co-occurrence on Factual Knowledge of Large Language Models
  - [EMNLP 2023] [2023.10] [knowledge]
- Function vectors in large language models
  - [ICLR 2024] [2023.10] [in-context learning]
- Neurons in Large Language Models: Dead, N-gram, Positional
  - [ACL 2024] [2023.9] [neuron]
- Sparse Autoencoders Find Highly Interpretable Features in Language Models
  - [ICLR 2024] [2023.9] [SAE]
- Do Machine Learning Models Memorize or Generalize?
  - [2023.8] [grokking]
- Overthinking the Truth: Understanding how Language Models Process False Demonstrations
  - [TACL 2024] [2023.7] [circuit]
- Evaluating the ripple effects of knowledge editing in language models
  - [2023.7] [knowledge] [model editing]
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
  - [NeurIPS 2023] [2023.6] [hallucination]
- VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers
  - [EMNLP 2023] [2023.5] [logit lens]
- Finding Neurons in a Haystack: Case Studies with Sparse Probing
  - [TMLR 2024] [2023.5] [neuron]
- Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
  - [EMNLP 2023] [2023.5] [in-context learning]
-
  - [ICLR 2024] [2023.5] [chain-of-thought]
- What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning
  - [ACL 2023] [2023.5] [in-context learning]
- Language models can explain neurons in language models
  - [OpenAI] [2023.5] [neuron]
-
  - [EMNLP 2023] [2023.5] [causal] [arithmetic]
- Dissecting Recall of Factual Associations in Auto-Regressive Language Models
  - [EMNLP 2023] [2023.4] [causal] [knowledge]
- The Internal State of an LLM Knows When It's Lying
  - [EMNLP 2023] [2023.4] [hallucination]
- Are Emergent Abilities of Large Language Models a Mirage?
  - [NeurIPS 2023] [2023.4] [grokking]
- Towards automated circuit discovery for mechanistic interpretability
  - [NeurIPS 2023] [2023.4] [circuit]
-
  - [NeurIPS 2023] [2023.4] [circuit] [arithmetic]
- Larger language models do in-context learning differently
  - [Google Research] [2023.3] [in-context learning]
-
  - [NeurIPS 2023] [2023.1] [knowledge] [model editing]
- Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters
  - [ACL 2023] [2022.12] [chain-of-thought]
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
  - [ICLR 2023] [2022.11] [arithmetic] [circuit]
- Inverse scaling can become U-shaped
  - [EMNLP 2023] [2022.11] [grokking]
- Mass-Editing Memory in a Transformer
  - [ICLR 2023] [2022.10] [model editing]
- Polysemanticity and Capacity in Neural Networks
  - [2022.10] [neuron] [SAE]
- Analyzing Transformers in Embedding Space
  - [ACL 2023] [2022.9] [logit lens]
-
  - [Anthropic] [2022.9] [neuron] [SAE]
- Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango
  - [Google Research] [2022.9] [chain-of-thought]
- Emergent Abilities of Large Language Models
  - [Google Research] [2022.6] [grokking]
- Towards Tracing Factual Knowledge in Language Models Back to the Training Data
  - [EMNLP 2022] [2022.5] [knowledge] [data]
- Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations
  - [EMNLP 2022] [2022.5] [in-context learning]
- Large Language Models are Zero-Shot Reasoners
  - [NeurIPS 2022] [2022.5] [chain-of-thought]
- Scaling Laws and Interpretability of Learning from Repeated Data
  - [Anthropic] [2022.5] [grokking] [data]
- Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
  - [EMNLP 2022] [2022.3] [neuron] [logit lens]
- In-context Learning and Induction Heads
  - [Anthropic] [2022.3] [circuit] [in-context learning]
- Locating and Editing Factual Associations in GPT
  - [NeurIPS 2022] [2022.2] [causal] [knowledge]
- Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
  - [EMNLP 2022] [2022.2] [in-context learning]
- Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
  - [OpenAI & Google] [2022.1] [grokking]
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  - [NeurIPS 2022] [2022.1] [chain-of-thought]
- A Mathematical Framework for Transformer Circuits
  - [Anthropic] [2021.12] [circuit]
- Towards a Unified View of Parameter-Efficient Transfer Learning
  - [ICLR 2022] [2021.10] [fine-tune]
- Deduplicating Training Data Makes Language Models Better
  - [ACL 2022] [2021.7] [fine-tune] [data]
- Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
  - [ACL 2022] [2021.4] [in-context learning]
- Calibrate Before Use: Improving Few-Shot Performance of Language Models
  - [ICML 2021] [2021.2] [in-context learning]
- Transformer Feed-Forward Layers Are Key-Value Memories
  - [EMNLP 2021] [2020.12] [neuron]
- Mechanistic Interpretability for AI Safety - A Review
  - [2024.8] [safety]
- A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
  - [2024.7] [interpretability]
- Internal Consistency and Self-Feedback in Large Language Models: A Survey
  - [2024.7]
- A Primer on the Inner Workings of Transformer-based Language Models
  - [2024.5] [interpretability]
- Usable XAI: 10 strategies towards exploiting explainability in the LLM era
  - [2024.3] [interpretability]
- A Comprehensive Overview of Large Language Models
  - [2023.12] [LLM]
-
  - [2023.11] [hallucination]
- A Survey of Large Language Models
  - [2023.11] [LLM]
- Explainability for Large Language Models: A Survey
  - [2023.11] [interpretability]
- A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
  - [2023.10] [chain-of-thought]
- Instruction tuning for large language models: A survey
  - [2023.10] [instruction tuning]
-
  - [2023.9] [instruction tuning]
- Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models
  - [2023.9] [hallucination]
- Reasoning with language model prompting: A survey
  - [2023.9] [reasoning]
- Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
  - [2023.8] [interpretability]
- A Survey on In-context Learning
  - [2023.6] [in-context learning]
- Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning
  - [2023.3] [parameter-efficient fine-tuning]
- https://github.com/ruizheliUOA/Awesome-Interpretability-in-Large-Language-Models (interpretability)
- https://github.com/cooperleong00/Awesome-LLM-Interpretability?tab=readme-ov-file (interpretability)
- https://github.com/JShollaj/awesome-llm-interpretability (interpretability)
- https://github.com/IAAR-Shanghai/Awesome-Attention-Heads (attention)
- https://github.com/zjunlp/KnowledgeEditingPapers (model editing)
- From Insights to Actions: The Impact of Interpretability and Analysis Research on NLP