-
agi-survey - ulab-uiuc
-
A Survey on Self-Evolution of Large Language Models,
arXiv, 2404.14387
, arxiv, pdf, cication: -1Zhengwei Tao, Ting-En Lin, Xiancai Chen, Hangyu Li, Yuchuan Wu, Yongbin Li, Zhi Jin, Fei Huang, Dacheng Tao, Jingren Zhou
-
State Space Model for New-Generation Network Alternative to Transformers: A Survey,
arXiv, 2404.09516
, arxiv, pdf, cication: -1Xiao Wang, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, Yao Rong, Weizhe Kong, Ju Huang, Shihao Li, Haoxiang Yang
-
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers,
arXiv, 2404.04925
, arxiv, pdf, cication: -1Libo Qin, Qiguang Chen, Yuhang Zhou, Zhi Chen, Yinghui Li, Lizi Liao, Min Li, Wanxiang Che, Philip S. Yu
-
A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias,
arXiv, 2404.00929
, arxiv, pdf, cication: -1Yuemei Xu, Ling Hu, Jiayi Zhao, Zihan Qiu, Yuqi Ye, Hanwen Gu
-
ChatGPT Alternative Solutions: Large Language Models Survey,
arXiv, 2403.14469
, arxiv, pdf, cication: -1Hanieh Alipour, Nick Pendar, Kohinoor Roy
-
Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey,
arXiv, 2403.09606
, arxiv, pdf, cication: -1Xiaoyu Liu, Paiheng Xu, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang Zhou, Fuxiao Liu, Tianrui Guan, Haoliang Wang, Tong Yu
-
Knowledge Conflicts for LLMs: A Survey,
arXiv, 2403.08319
, arxiv, pdf, cication: -1Rongwu Xu, Zehan Qi, Cunxiang Wang, Hongru Wang, Yue Zhang, Wei Xu
-
Large Language Models on Tabular Data -- A Survey,
arXiv, 2402.17944
, arxiv, pdf, cication: -1Xi Fang, Weijie Xu, Fiona Anting Tan, Jiani Zhang, Ziqing Hu, Yanjun Qi, Scott Nickleach, Diego Socolinsky, Srinivasan Sengamedu, Christos Faloutsos
-
Large Language Models and Games: A Survey and Roadmap,
arXiv, 2402.18659
, arxiv, pdf, cication: -1Roberto Gallotta, Graham Todd, Marvin Zammit, Sam Earle, Antonios Liapis, Julian Togelius, Georgios N. Yannakakis
-
A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems,
arXiv, 2402.18013
, arxiv, pdf, cication: -1Zihao Yi, Jiarui Ouyang, Yuwen Liu, Tianhao Liao, Zhe Xu, Ying Shen
-
A Survey of Large Language Models in Cybersecurity,
arXiv, 2402.16968
, arxiv, pdf, cication: -1Gabriel de Jesus Coelho da Silva, Carlos Becker Westphall
-
Large Language Models for Data Annotation: A Survey,
arXiv, 2402.13446
, arxiv, pdf, cication: -1Zhen Tan, Alimohammad Beigi, Song Wang, Ruocheng Guo, Amrita Bhattacharjee, Bohan Jiang, Mansooreh Karami, Jundong Li, Lu Cheng, Huan Liu
-
Large Language Models: A Survey,
arXiv, 2402.06196
, arxiv, pdf, cication: -1Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao
-
Continual Learning for Large Language Models: A Survey,
arXiv, 2402.01364
, arxiv, pdf, cication: -1Tongtong Wu, Linhao Luo, Yuan-Fang Li, Shirui Pan, Thuy-Trang Vu, Gholamreza Haffari
-
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape,
arXiv, 2312.10868
, arxiv, pdf, cication: -1Timothy R. McIntosh, Teo Susnjak, Tong Liu, Paul Watters, Malka N. Halgamuge
-
A Survey of Large Language Models Attribution,
arXiv, 2311.03731
, arxiv, pdf, cication: -1Dongfang Li, Zetian Sun, Xinshuo Hu, Zhenyu Liu, Ziyang Chen, Baotian Hu, Aiguo Wu, Min Zhang · (awesome-llm-attributions - HITsz-TMG)
-
On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models,
arXiv, 2307.09793
, arxiv, pdf, cication: 1Sarah Gao, Andrew Kean Gao · (constellation.sites.stanford)
-
A Survey of Large Language Models,
arXiv, 2303.18223
, arxiv, pdf, cication: 285Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong · (LLMSurvey - RUCAIBox)
-
The History of Open-Source LLMs: Imitation and Alignment (Part Three)
-
Transformer Taxonomy (the last lit review) | kipply's blog
· (jiqizhixin)
-
chat-langchain - langchain-ai
-
ChatGPT4 - yuntian-deng 🤗
-
amazing-openai-api - soulteary
Convert different model APIs into the OpenAI API format out of the box.
-
jan - janhq
Jan is an open source alternative to ChatGPT that runs 100% offline on your computer
-
GPT_API_free - chatanywhere
Free ChatGPT API keys; a free ChatGPT API with GPT-4 support, forwarded so it is reachable from mainland China without a proxy. Works with clients such as ChatBox, greatly reducing API cost, with unrestricted chat from within China.
-
BricksLLM - bricks-cloud
Simplifying LLM ops in production
-
skypilot - skypilot-org
SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
· (blog.skypilot)
-
vllm - vllm-project
A high-throughput and memory-efficient inference and serving engine for LLMs
-
langflow - logspace-ai
⛓️ LangFlow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment with and prototype flows.
-
torchscale - microsoft
Foundation Architecture for (M)LLMs
-
LLM-As-Chatbot - deep-diver
LLM as a Chatbot Service
-
Llama-2-Open-Source-LLM-CPU-Inference - kennethleungty
Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A
-
ollama - jmorganca
Get up and running with large language models locally
-
OpenLLM - bentoml
An open platform for operating large language models (LLMs) in production. Fine-tune, serve, deploy, and monitor any LLMs with ease.
-
litellm - BerriAI
Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
-
gpu_poor - RahulSChand
Calculate GPU memory requirement & breakdown for training/inference of LLM models. Supports ggml/bnb quantization
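The kind of estimate that gpu_poor automates can also be sketched by hand. The following is a rough, illustrative calculation (not gpu_poor's exact breakdown): inference memory is roughly parameter bytes plus KV-cache bytes. The default layer/head sizes are hypothetical stand-ins for a Llama-7B-shaped model.

```python
def estimate_inference_gb(n_params_b, bytes_per_param=2,
                          n_layers=32, n_kv_heads=32, head_dim=128,
                          batch=1, seq_len=4096, kv_bytes=2):
    """Back-of-envelope LLM inference memory (illustrative only).

    weights : n_params * bytes_per_param  (fp16 = 2, int4 ~ 0.5)
    KV cache: 2 (K and V) * layers * kv_heads * head_dim * batch * seq * bytes
    """
    weights = n_params_b * 1e9 * bytes_per_param
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * batch * seq_len * kv_bytes
    return (weights + kv_cache) / 2**30

# e.g. a 7B model in fp16 with a 4k-token KV cache: ~15 GiB
print(round(estimate_inference_gb(7), 1))
```

Quantizing the weights (smaller `bytes_per_param`) shrinks the first term but leaves the KV cache untouched, which is why long contexts dominate memory at low precision.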
-
leptonai - leptonai
A Pythonic framework to simplify AI service building
-
exllamav2 - turboderp
A fast inference library for running LLMs locally on modern consumer-class GPUs
-
outlines - normal-computing
Generative Model Programming
-
one-api - songquanpeng
OpenAI API management & redistribution system supporting Azure, Anthropic Claude, Google PaLM 2, Zhipu ChatGLM, Baidu ERNIE Bot, iFlytek Spark, and Alibaba Tongyi Qianwen. Can redistribute and manage keys; ships as a single executable with a prebuilt Docker image for one-click deployment, works out of the box, and features an English UI.
-
LLaMA2-Accessory - Alpha-VLLM
An Open-source Toolkit for LLM Development
-
Flowise - FlowiseAI
Drag & drop UI to build your customized LLM flow
-
simpleaichat - minimaxir
Python package for easily interfacing with chat apps, with robust features and minimal code complexity.
-
TypeChat - Microsoft
TypeChat is a library that makes it easy to build natural language interfaces using types.
-
petals - bigscience-workshop
🌸 Run large language models at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
-
chatbox - Bin-Huang
Chatbox is a desktop app for GPT/LLM that supports Windows, Mac, Linux & Web Online
-
h2o-llmstudio - h2oai
H2O LLM Studio - a framework and no-code GUI for fine-tuning LLMs
-
LMFlow - OptimalScale
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Model for All.
-
FlagAI - FlagAI-Open
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.
-
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models,
arXiv, 2407.01920
, arxiv, pdf, cication: -1Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang
· (KnowUnDo - zjunlp)
-
UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI,
arXiv, 2407.00106
, arxiv, pdf, cication: -1Ilia Shumailov, Jamie Hayes, Eleni Triantafillou, Guillermo Ortiz-Jimenez, Nicolas Papernot, Matthew Jagielski, Itay Yona, Heidi Howard, Eugene Bagdasaryan
-
What makes unlearning hard and what to do about it,
arXiv, 2406.01257
, arxiv, pdf, cication: -1Kairan Zhao, Meghdad Kurmanji, George-Octavian Bărbulescu, Eleni Triantafillou, Peter Triantafillou
-
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning,
arXiv, 2403.03218
, arxiv, pdf, cication: -1Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan
The WMDP benchmark is a curated dataset of over 4,000 questions designed to gauge and mitigate LLMs' knowledge in areas with misuse potential, such as biosecurity and cybersecurity.
-
Machine Unlearning of Pre-trained Large Language Models,
arXiv, 2402.15159
, arxiv, pdf, cication: -1Jin Yao, Eli Chien, Minxin Du, Xinyao Niu, Tianhao Wang, Zezhou Cheng, Xiang Yue · (Unlearning_LLM - yaojin17)
-
TOFU: A Task of Fictitious Unlearning for LLMs,
arXiv, 2401.06121
, arxiv, pdf, cication: -1Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, J. Zico Kolter
-
Large Language Model Unlearning,
arXiv, 2310.10683
, arxiv, pdf, cication: -1Yuanshun Yao, Xiaojun Xu, Yang Liu
· (jiqizhixin) · (llm_unlearn - kevinyaobytedance)
-
Improving Language Plasticity via Pretraining with Active Forgetting,
arXiv, 2307.01163
, arxiv, pdf, cication: -1Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe
-
Announcing the first Machine Unlearning Challenge – Google Research Blog
-
Large Language Models Understand and Can be Enhanced by Emotional Stimuli,
arXiv, 2307.11760
, arxiv, pdf, cication: 6Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie
-
When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities,
arXiv, 2307.16376
, arxiv, pdf, cication: 7Jin Chen, Zheng Liu, Xu Huang, Chenwang Wu, Qi Liu, Gangwei Jiang, Yuanhao Pu, Yuxuan Lei, Xiaolong Chen, Xingmei Wang
-
Personality Traits in Large Language Models,
arXiv, 2307.00184
, arxiv, pdf, cication: 17Greg Serapio-García, Mustafa Safdari, Clément Crepy, Luning Sun, Stephen Fitz, Peter Romero, Marwa Abdulhai, Aleksandra Faust, Maja Matarić
-
Efficient World Models with Context-Aware Tokenization,
arXiv, 2406.19320
, arxiv, pdf, cication: -1Vincent Micheli, Eloi Alonso, François Fleuret
· (delta-iris - vmicheli)
-
Can Language Models Serve as Text-Based World Simulators?,
arXiv, 2406.06485
, arxiv, pdf, cication: -1Ruoyao Wang, Graham Todd, Ziang Xiao, Xingdi Yuan, Marc-Alexandre Côté, Peter Clark, Peter Jansen
-
Cognitively Inspired Energy-Based World Models,
arXiv, 2406.08862
, arxiv, pdf, cication: -1Alexi Gladstone, Ganesh Nanduru, Md Mofijul Islam, Aman Chadha, Jundong Li, Tariq Iqbal
-
Pandora - maitrix-org
Pandora: Towards General World Model with Natural Language Actions and Video States · (world-model.maitrix)
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models,
arXiv, 2405.15223
, arxiv, pdf, cication: -1Jialong Wu, Shaofeng Yin, Ningya Feng, Xu He, Dong Li, Jianye Hao, Mingsheng Long
-
Diffusion for World Modeling: Visual Details Matter in Atari,
arXiv, 2405.12399
, arxiv, pdf, cication: -1Eloi Alonso, Adam Jelley, Vincent Micheli, Anssi Kanervisto, Amos Storkey, Tim Pearce, François Fleuret · (diamond - eloialonso)
-
Robust agents learn causal world models,
arXiv, 2402.10877
, arxiv, pdf, cication: -1Jonathan Richens, Tom Everitt
-
Learning and Leveraging World Models in Visual Representation Learning,
arXiv, 2403.00504
, arxiv, pdf, cication: -1Quentin Garrido, Mahmoud Assran, Nicolas Ballas, Adrien Bardes, Laurent Najman, Yann LeCun
-
Video as the New Language for Real-World Decision Making,
arXiv, 2402.17139
, arxiv, pdf, cication: -1Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans
· (mp.weixin.qq)
-
Genie: Generative Interactive Environments,
arXiv, 2402.15391
, arxiv, pdf, cication: -1Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps
-
Diffusion World Model,
arXiv, 2402.03570
, arxiv, pdf, cication: -1Zihan Ding, Amy Zhang, Yuandong Tian, Qinqing Zheng
-
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets,
arXiv, 2310.06824
, arxiv, pdf, cication: -1Samuel Marks, Max Tegmark · (mp.weixin.qq)
-
Language Models Represent Space and Time,
arXiv, 2310.02207
, arxiv, pdf, cication: 2Wes Gurnee, Max Tegmark · (world-models - wesg52)
· (mp.weixin.qq)
-
Chronos: Learning the Language of Time Series,
arXiv, 2403.07815
, arxiv, pdf, cication: -1Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor
-
Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting,
arXiv, 2310.08278
, arxiv, pdf, cication: 11Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Hena Ghonia, Rishika Bhagwatkar, Arian Khorasani, Mohammad Javad Darvishi Bayazi, George Adamopoulos, Roland Riachi, Nadhir Hassen · (lag-llama - time-series-foundation-models)
-
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models,
arXiv, 2310.01728
, arxiv, pdf, cication: 17Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y. Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan · (time-llm - kimmeen)
-
A decoder-only foundation model for time-series forecasting,
arXiv, 2310.10688
, arxiv, pdf, cication: 2Abhimanyu Das, Weihao Kong, Rajat Sen, Yichen Zhou · (jiqizhixin)
-
No-code LLM fine-tuning and evaluation at scale – Airtrain.ai
-
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference,
arXiv, 2403.04132
, arxiv, pdf, cication: -1Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez
-
GodMode - smol-ai
AI Chat Browser: Fast, Full webapp access to ChatGPT / Claude / Bard / Bing / Llama2! I use this 20 times a day.
-
ChatALL - sunner
Concurrently chat with ChatGPT, Bing Chat, Bard, Alpaca, Vicuna, Claude, ChatGLM, MOSS, iFlytek Spark, ERNIE Bot and more, to discover the best answers
-
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling,
arXiv, 2406.07522
, arxiv, pdf, cication: -1Liliang Ren, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, Weizhu Chen
-
An Empirical Study of Mamba-based Language Models,
arXiv, 2406.07887
, arxiv, pdf, cication: -1Roger Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, Vijay Korthikanti, Tri Dao, Albert Gu, Ali Hatamizadeh, Sudhakar Singh, Deepak Narayanan
-
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality,
arXiv, 2405.21060
, arxiv, pdf, cication: -1Tri Dao, Albert Gu · (mamba - state-spaces)
· (goombalab.github)
-
Zamba: A Compact 7B SSM Hybrid Model,
arXiv, 2405.16712
, arxiv, pdf, cication: -1Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Adam Ibrahim, Beren Millidge
-
mamba-7b-rw - TRI-ML 🤗
-
The Illusion of State in State-Space Models,
arXiv, 2404.08819
, arxiv, pdf, cication: -1William Merrill, Jackson Petty, Ashish Sabharwal
· (twitter)
-
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection,
arXiv, 2403.19888
, arxiv, pdf, cication: -1Ali Behrouz, Michele Santacatterina, Ramin Zabih
-
Jamba: A Hybrid Transformer-Mamba Language Model,
arXiv, 2403.19887
, arxiv, pdf, cication: -1Opher Lieber, Barak Lenz, Hofit Bata, Gal Cohen, Jhonathan Osin, Itay Dalmedigos, Erez Safahi, Shaked Meirom, Yonatan Belinkov, Shai Shalev-Shwartz · (ai21) · (huggingface)
-
VideoMamba: State Space Model for Efficient Video Understanding,
arXiv, 2403.06977
, arxiv, pdf, cication: -1Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, Limin Wang, Yu Qiao · (VideoMamba - OpenGVLab)
-
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models,
arXiv, 2403.00818
, arxiv, pdf, cication: -1Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang
-
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks,
arXiv, 2402.04248
, arxiv, pdf, cication: -1Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos
-
Repeat After Me: Transformers are Better than State Space Models at Copying,
arXiv, 2402.01032
, arxiv, pdf, cication: -1Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach
-
BlackMamba: Mixture of Experts for State-Space Models,
arXiv, 2402.01771
, arxiv, pdf, cication: -1Quentin Anthony, Yury Tokpanov, Paolo Glorioso, Beren Millidge · (BlackMamba - Zyphra)
· (zyphra) · (static1.squarespace)
-
MambaByte: Token-free Selective State Space Model,
arXiv, 2401.13660
, arxiv, pdf, cication: -1Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M Rush
-
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts,
arXiv, 2401.04081
, arxiv, pdf, cication: -1Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski, Sebastian Jaszczur
-
Mamba: Linear-Time Sequence Modeling with Selective State Spaces,
arXiv, 2312.00752
, arxiv, pdf, cication: -1Albert Gu, Tri Dao · (mamba - state-spaces)
- Do we need Attention? A Mamba Primer - YouTube
- Mamba Explained
- Recent Mamba Papers - a julien-c Collection
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Paper Explained) - YouTube
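At the heart of the SSM papers above is a linear recurrence over a hidden state. The sketch below shows only the basic time-invariant scan; Mamba additionally makes the matrices input-dependent ("selective"), and all sizes here are illustrative.

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Time-invariant linear state-space recurrence:
        h_t = A @ h_{t-1} + B @ x_t,   y_t = C @ h_t
    (Mamba makes A, B, C functions of the input; this is the plain scan.)
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # sequential scan: O(seq_len), O(1) state
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)               # stable state transition
B = rng.normal(size=(4, 2))       # input projection
C = rng.normal(size=(1, 4))       # output projection
y = ssm_scan(A, B, C, rng.normal(size=(16, 2)))
print(y.shape)                    # (16, 1)
```

The fixed-size state `h` is what gives SSMs constant memory per step at inference time, in contrast to a transformer's growing KV cache.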
-
Learning to (Learn at Test Time): RNNs with Expressive Hidden States,
arXiv, 2407.04620
, arxiv, pdf, cication: -1Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, Genghan Zhang, Yann Dubois, Xinlei Chen, Xiaolong Wang, Sanmi Koyejo
-
Simple and Effective Masked Diffusion Language Models,
arXiv, 2406.07524
, arxiv, pdf, cication: -1Subham Sekhar Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, Volodymyr Kuleshov · (mdlm - kuleshov-group)
-
A Unified Implicit Attention Formulation for Gated-Linear Recurrent Sequence Models,
arXiv, 2405.16504
, arxiv, pdf, cication: -1Itamar Zimerman, Ameen Ali, Lior Wolf · (UnifiedImplicitAttnRepr - Itamarzimm)
-
Attention as an RNN,
arXiv, 2405.13956
, arxiv, pdf, cication: -1Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Mohamed Osama Ahmed, Yoshua Bengio, Greg Mori · (jiqizhixin)
-
linear_open_lm - tri-ml
A repository for research on medium sized language models.
-
xLSTM: Extended Long Short-Term Memory,
arXiv, 2405.04517
, arxiv, pdf, cication: -1Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter
-
HGRN2: Gated Linear RNNs with State Expansion,
arXiv, 2404.07904
, arxiv, pdf, cication: -1Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong
-
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence,
arXiv, 2404.05892
, arxiv, pdf, cication: -1Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou · (RWKV-LM - RWKV)
· (ChatRWKV - RWKV)
-
RWKV-LM - RWKV
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
-
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens,
arXiv, 2401.17377
, arxiv, pdf, cication: -1Jiacheng Liu, Sewon Min, Luke Zettlemoyer, Yejin Choi, Hannaneh Hajishirzi
-
Transfer Learning for Text Diffusion Models,
arXiv, 2401.17181
, arxiv, pdf, cication: -1Kehang Han, Kathleen Kenealy, Aditya Barua, Noah Fiedel, Noah Constant
-
🦅 Eagle 7B: Soaring past Transformers with 1 Trillion Tokens Across 100+ Languages (RWKV-v5)
-
TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing,
arXiv, 2312.05605
, arxiv, pdf, cication: -1Aleksandar Terzic, Michael Hersche, Geethan Karunaratne, Luca Benini, Abu Sebastian, Abbas Rahimi
-
GIVT: Generative Infinite-Vocabulary Transformers,
arXiv, 2312.02116
, arxiv, pdf, cication: -1Michael Tschannen, Cian Eastwood, Fabian Mentzer
-
Text Rendering Strategies for Pixel Language Models,
arXiv, 2311.00522
, arxiv, pdf, cication: -1Jonas F. Lotz, Elizabeth Salesky, Phillip Rust, Desmond Elliott
-
Retentive Network: A Successor to Transformer for Large Language Models,
arXiv, 2307.08621
, arxiv, pdf, cication: 14Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
-
Copy Is All You Need,
arXiv, 2307.06962
, arxiv, pdf, cication: 217Tian Lan, Deng Cai, Yan Wang, Heyan Huang, Xian-Ling Mao
-
BiPhone: Modeling Inter Language Phonetic Influences in Text,
arXiv, 2307.03322
, arxiv, pdf, cication: -1Abhirut Gupta, Ananya B. Sai, Richard Sproat, Yuri Vasilevski, James S. Ren, Ambarish Jash, Sukhdeep S. Sodhi, Aravindan Raghuveer
-
Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference,
arXiv, 2306.12509
, arxiv, pdf, cication: 4Alessandro Sordoni, Xingdi Yuan, Marc-Alexandre Côté, Matheus Pereira, Adam Trischler, Ziang Xiao, Arian Hosseini, Friederike Niedtner, Nicolas Le Roux
-
Backpack Language Models,
arXiv, 2305.16765
, arxiv, pdf, cication: 4John Hewitt, John Thickstun, Christopher D. Manning, Percy Liang · (jiqizhixin) · (mp.weixin.qq)
-
MarkLLM: An Open-Source Toolkit for LLM Watermarking,
arXiv, 2405.10051
, arxiv, pdf, cication: -1Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, Shuliang Liu, Xuming Hu, Lijie Wen · (markllm - thu-bpm)
-
Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models,
arXiv, 2404.02936
, arxiv, pdf, cication: -1Jingyang Zhang, Jingwei Sun, Eric Yeats, Yang Ouyang, Martin Kuo, Jianyi Zhang, Hao Yang, Hai Li · (zjysteven.github) · (mink-plus-plus - zjysteven)
-
Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text,
arXiv, 2403.05750
, arxiv, pdf, cication: -1Sara Abdali, Richard Anarfi, CJ Barberan, Jia He
-
Watermarking Makes Language Models Radioactive,
arXiv, 2402.14904
, arxiv, pdf, cication: -1Tom Sander, Pierre Fernandez, Alain Durmus, Matthijs Douze, Teddy Furon
-
HuRef: HUman-REadable Fingerprint for Large Language Models,
arXiv, 2312.04828
, arxiv, pdf, cication: -1Boyi Zeng, Chenghu Zhou, Xinbing Wang, Zhouhan Lin · (jiqizhixin)
-
LLM-generated-text-detection - thunlp
-
Adaptive Text Watermark for Large Language Models,
arXiv, 2401.13927
, arxiv, pdf, cication: -1Yepeng Liu, Yuheng Bu
-
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text,
arXiv, 2401.12070
, arxiv, pdf, cication: -1Abhimanyu Hans, Avi Schwarzschild, Valeriia Cherepanova, Hamid Kazemi, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein
-
LLM-as-a-Coauthor: The Challenges of Detecting LLM-Human Mixcase,
arXiv, 2401.05952
, arxiv, pdf, cication: -1Chujie Gao, Dongping Chen, Qihui Zhang, Yue Huang, Yao Wan, Lichao Sun · (MixSet - Dongping-Chen)
-
A Survey of Text Watermarking in the Era of Large Language Models,
arXiv, 2312.07913
, arxiv, pdf, cication: -1Aiwei Liu, Leyi Pan, Yijian Lu, Jingjing Li, Xuming Hu, Xi Zhang, Lijie Wen, Irwin King, Hui Xiong, Philip S. Yu · (jiqizhixin)
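A concrete toy version of what these watermarking papers formalize: in the common "green list" scheme, the previous token pseudo-randomly partitions the vocabulary, generation favors green tokens, and detection computes a z-score on the green count. The hash construction and vocabulary size below are toy assumptions, not any specific paper's scheme.

```python
import hashlib
import math

VOCAB, GAMMA = 1000, 0.5   # toy vocabulary size; GAMMA = green fraction

def is_green(prev_token: int, token: int) -> bool:
    # Toy PRF: the previous token seeds the green/red split of the vocabulary.
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return digest[0] < 128          # green with probability ~GAMMA

def z_score(tokens):
    """Count green transitions; without a watermark the count is ~Binomial(T, GAMMA)."""
    T = len(tokens) - 1
    greens = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (greens - GAMMA * T) / math.sqrt(T * GAMMA * (1 - GAMMA))

def watermarked_text(length, start=0):
    # A maximally biased "generator" that always emits some green token.
    toks = [start]
    for _ in range(length):
        toks.append(next(t for t in range(VOCAB) if is_green(toks[-1], t)))
    return toks

print(z_score(watermarked_text(200)) > 4)   # True: every transition is green
```

Real schemes only softly bias the logits toward green tokens, trading detection strength for text quality; the z-test is the same.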
-
Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature,
arXiv, 2310.05130
, arxiv, pdf, cication: 17Guangsheng Bao, Yanbin Zhao, Zhiyang Teng, Linyi Yang, Yue Zhang · (fast-detect-gpt - baoguangsheng)
· (jiqizhixin)
-
Ghostbuster: Detecting Text Ghostwritten by Large Language Models,
arXiv, 2305.15047
, arxiv, pdf, cication: 6Vivek Verma, Eve Fleisig, Nicholas Tomlin, Dan Klein · (bair.berkeley)
-
‘ChatGPT detector’ catches AI-generated papers with unprecedented accuracy
-
GPT detectors are biased against non-native English writers,
arXiv, 2304.02819
, arxiv, pdf, cication: 42Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, James Zou
-
Can LLM-Generated Misinformation Be Detected?,
arXiv, 2309.13788
, arxiv, pdf, cication: -1Canyu Chen, Kai Shu · (llm-misinformation - llm-misinformation)
-
Three Bricks to Consolidate Watermarks for Large Language Models,
arXiv, 2308.00113
, arxiv, pdf, cication: 3Pierre Fernandez, Antoine Chaffin, Karim Tit, Vivien Chappelier, Teddy Furon
-
Robust Distortion-free Watermarks for Language Models,
arXiv, 2307.15593
, arxiv, pdf, cication: 9Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, Percy Liang
-
Can AI-Generated Text be Reliably Detected?,
arXiv, 2303.11156
, arxiv, pdf, cication: 93Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, Soheil Feizi · (mp.weixin.qq)
-
Digital tool spots academic text spawned by ChatGPT with 99% accuracy | The University of Kansas
· (mp.weixin.qq)
-
Can LLMs Learn by Teaching? A Preliminary Study,
arXiv, 2406.14629
, arxiv, pdf, cication: -1Xuefei Ning, Zifu Wang, Shiyao Li, Zinan Lin, Peiran Yao, Tianyu Fu, Matthew B. Blaschko, Guohao Dai, Huazhong Yang, Yu Wang
· (lbt - imagination-research)
-
The Missing Curve Detectors of InceptionV1: Applying Sparse Autoencoders to InceptionV1 Early Vision,
arXiv, 2406.03662
, arxiv, pdf, cication: -1Liv Gorton
· (livgorton)
-
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries,
arXiv, 2406.12824
, arxiv, pdf, cication: -1Hitesh Wadhwa, Rahul Seetharaman, Somyaa Aggarwal, Reshmi Ghosh, Samyadeep Basu, Soundararajan Srinivasan, Wenlong Zhao, Shreyas Chaudhari, Ehsan Aghazadeh
-
How Do Large Language Models Acquire Factual Knowledge During Pretraining?,
arXiv, 2406.11813
, arxiv, pdf, cication: -1Hoyeon Chang, Jinho Park, Seonghyeon Ye, Sohee Yang, Youngkyung Seo, Du-Seong Chang, Minjoon Seo
-
What Do Neural Networks Really Learn? Exploring the Brain of an AI Model - YouTube
-
Large Language Model Confidence Estimation via Black-Box Access,
arXiv, 2406.04370
, arxiv, pdf, cication: -1Tejaswini Pedapati, Amit Dhurandhar, Soumya Ghosh, Soham Dan, Prasanna Sattigeri
-
Not All Language Model Features Are Linear,
arXiv, 2405.14860
, arxiv, pdf, cication: -1Joshua Engels, Isaac Liao, Eric J. Michaud, Wes Gurnee, Max Tegmark
-
Your Transformer is Secretly Linear,
arXiv, 2405.12250
, arxiv, pdf, cication: -1Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Nikolai Gerasimenko, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov
-
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
· (anthropic)
-
The Platonic Representation Hypothesis,
arXiv, 2405.07987
, arxiv, pdf, cication: -1Minyoung Huh, Brian Cheung, Tongzhou Wang, Phillip Isola · (platonic-rep - minyoungg)
· (phillipi.github)
-
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models,
arXiv, 2405.05417
, arxiv, pdf, cication: -1Sander Land, Max Bartolo · (magikarp - cohere-ai)
-
A Primer on the Inner Workings of Transformer-based Language Models,
arXiv, 2405.00208
, arxiv, pdf, cication: -1Javier Ferrando, Gabriele Sarti, Arianna Bisazza, Marta R. Costa-jussà
-
Understanding Emergent Abilities of Language Models from the Loss Perspective,
arXiv, 2403.15796
, arxiv, pdf, cication: -1Zhengxiao Du, Aohan Zeng, Yuxiao Dong, Jie Tang
-
Transformers Can Represent $n$-gram Language Models,
arXiv, 2404.14994
, arxiv, pdf, cication: -1Anej Svete, Ryan Cotterell
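A minimal concrete instance of the $n$-gram models in question: a maximum-likelihood bigram LM over a tiny made-up corpus (purely illustrative).

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Maximum-likelihood bigram LM: P(w | prev) = count(prev, w) / count(prev)."""
    counts = defaultdict(Counter)
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for prev, w in zip(toks, toks[1:]):
            counts[prev][w] += 1
    return {prev: {w: c / sum(ctr.values()) for w, c in ctr.items()}
            for prev, ctr in counts.items()}

lm = train_bigram(["the cat sat", "the cat ran", "the dog sat"])
print(lm["cat"])         # {'sat': 0.5, 'ran': 0.5}
print(lm["<s>"]["the"])  # 1.0
```

The representational question in the paper is whether a transformer's attention and MLP layers can encode exactly such conditional tables for fixed $n$.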
-
A Multimodal Automated Interpretability Agent,
arXiv, 2404.14394
, arxiv, pdf, cication: -1Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang, Achyuta Rajaram, Evan Hernandez, Jacob Andreas, Antonio Torralba
-
llm-transparency-tool - facebookresearch
-
Compression Represents Intelligence Linearly,
arXiv, 2404.09937
, arxiv, pdf, cication: -1Yuzhen Huang, Jinghan Zhang, Zifei Shan, Junxian He
· (huggingface) · (llm-compression-intelligence - hkust-nlp)
-
color-coded-text-generation - joaogante 🤗
-
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models,
arXiv, 2404.03118
, arxiv, pdf, cication: -1Gabriela Ben Melech Stan, Raanan Yehezkel Rohekar, Yaniv Gurwicz, Matthew Lyle Olson, Anahita Bhiwandiwalla, Estelle Aflalo, Chenfei Wu, Nan Duan, Shao-Yen Tseng, Vasudev Lal
-
Source-Aware Training Enables Knowledge Attribution in Language Models,
arXiv, 2404.01019
, arxiv, pdf, cication: -1Muhammad Khalifa, David Wadden, Emma Strubell, Honglak Lee, Lu Wang, Iz Beltagy, Hao Peng
-
Future Lens: Anticipating Subsequent Tokens from a Single Hidden State,
arXiv, 2311.04897
, arxiv, pdf, cication: -1Koyena Pal, Jiuding Sun, Andrew Yuan, Byron C. Wallace, David Bau
-
Localizing Paragraph Memorization in Language Models,
arXiv, 2403.19851
, arxiv, pdf, cication: -1Niklas Stoehr, Mitchell Gordon, Chiyuan Zhang, Owen Lewis
-
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions,
arXiv, 2403.07809
, arxiv, pdf, cication: -1Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts · (pyvene - stanfordnlp)
-
transformer-debugger - openai
· (jiqizhixin)
-
Logits of API-Protected LLMs Leak Proprietary Information,
arXiv, 2403.09539
, arxiv, pdf, cication: -1Matthew Finlayson, Xiang Ren, Swabha Swayamdipta
· (qbitai)
-
Stealing Part of a Production Language Model,
arXiv, 2403.06634
, arxiv, pdf, cication: -1Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy
· (qbitai)
Extracts information from black-box language models such as OpenAI's ChatGPT and Google's PaLM-2, revealing for the first time the hidden dimensions of these models.
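The core observation behind these API-extraction attacks: every full logit vector equals W·h for some hidden state h, so all outputs lie in a subspace of dimension at most the hidden size, and the rank of a stack of collected outputs reveals it. A toy numpy simulation with made-up sizes (not the papers' actual query procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 512, 64            # toy sizes; real models have vocab >> hidden

# Simulated model: logits are always W @ h for some hidden state h,
# so every logit vector lies in the column space of W (rank <= hidden).
W = rng.normal(size=(vocab, hidden))
queries = rng.normal(size=(hidden, 300))   # 300 simulated "API calls"
logits = W @ queries                       # what a logits-exposing API returns

# Attack: the numerical rank of the collected logit matrix is the hidden size.
print(np.linalg.matrix_rank(logits.T))     # 64
```

The papers' harder contribution is recovering (approximate) logits from APIs that expose only top-k log-probs or logit-bias knobs; once full vectors are available, the rank argument above does the rest.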
-
AtP*: An efficient and scalable method for localizing LLM behaviour to components,
arXiv, 2403.00745
, arxiv, pdf, cication: -1János Kramár, Tom Lieberum, Rohin Shah, Neel Nanda
-
A phase transition between positional and semantic learning in a solvable model of dot-product attention,
arXiv, 2402.03902
, arxiv, pdf, cication: -1Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborová
-
fractal - sohl-dickstein
The boundary of neural network trainability is fractal
-
Rethinking Interpretability in the Era of Large Language Models,
arXiv, 2402.01761
, arxiv, pdf, cication: -1Chandan Singh, Jeevana Priya Inala, Michel Galley, Rich Caruana, Jianfeng Gao
-
Can Large Language Models Understand Context?,
arXiv, 2402.00858
, arxiv, pdf, cication: -1Yilun Zhu, Joel Ruben Antony Moniz, Shruti Bhargava, Jiarui Lu, Dhivya Piraviperumal, Site Li, Yuan Zhang, Hong Yu, Bo-Hsiang Tseng
-
Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models,
arXiv, 2401.06102
, arxiv, pdf, cication: -1Asma Ghandeharioun, Avi Caciularu, Adam Pearce, Lucas Dixon, Mor Geva
-
Vayu Robotics Blog - Interpretable End-to-End Robot Navigation
-
awesome-llm-interpretability - JShollaj
A curated list of Large Language Model (LLM) Interpretability resources.
-
Challenges with unsupervised LLM knowledge discovery,
arXiv, 2312.10029
, arxiv, pdf, cication: -1Sebastian Farquhar, Vikrant Varma, Zachary Kenton, Johannes Gasteiger, Vladimir Mikulik, Rohin Shah
-
Using Captum to Explain Generative Language Models,
arXiv, 2312.05491
, arxiv, pdf, cication: -1Vivek Miglani, Aobo Yang, Aram H. Markosyan, Diego Garcia-Olano, Narine Kokhlikyan
-
Beyond Surface: Probing LLaMA Across Scales and Layers,
arXiv, 2312.04333
, arxiv, pdf, cication: -1Nuo Chen, Ning Wu, Shining Liang, Ming Gong, Linjun Shou, Dongmei Zhang, Jia Li
-
llm-viz - bbycroft
3D Visualization of a GPT-style LLM
-
White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?,
arXiv, 2311.13110
, arxiv, pdf, cication: -1Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Hao Bai, Yuexiang Zhai, Benjamin D. Haeffele, Yi Ma · (mp.weixin.qq)
-
Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models,
arXiv, 2311.00871
, arxiv, pdf, cication: -1Steve Yadlowsky, Lyric Doshi, Nilesh Tripuraneni · (jiqizhixin)
-
The Generative AI Paradox: "What It Can Create, It May Not Understand",
arXiv, 2311.00059
, arxiv, pdf, cication: -1Peter West, Ximing Lu, Nouha Dziri, Faeze Brahman, Linjie Li, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu
-
The Impact of Depth and Width on Transformer Language Model Generalization,
arXiv, 2310.19956
, arxiv, pdf, cication: -1Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Dan Garrette, Tal Linzen
-
Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations,
arXiv, 2310.11207
, arxiv, pdf, cication: -1Shiyuan Huang, Siddarth Mamidanna, Shreedhar Jangam, Yilun Zhou, Leilani H. Gilpin
-
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
· (qbitai)
-
Representation Engineering: A Top-Down Approach to AI Transparency,
arXiv, 2310.01405
, arxiv, pdf, cication: 5Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski · (representation-engineering - andyzoujm)
· (mp.weixin.qq)
-
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models,
arXiv, 2309.15098
, arxiv, pdf, cication: -1Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi
-
Language Modeling Is Compression,
arXiv, 2309.10668
, arxiv, pdf, cication: 7Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau
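The core observation of the paper above is that a predictive model doubles as a compressor: an arithmetic coder driven by the model's next-token probabilities approaches a code length of -log2 p(token) bits per token. A minimal sketch with invented toy probabilities (not from the paper):

```python
import math

# Toy "language model": fixed next-token probabilities (hypothetical values).
model_probs = {"the": 0.5, "cat": 0.2, "sat": 0.2, "zzz": 0.1}

def ideal_code_length_bits(tokens, probs):
    # Shannon code length an arithmetic coder approaches when driven by
    # the model's probabilities: sum of -log2 p over the sequence.
    return sum(-math.log2(probs[t]) for t in tokens)

sequence = ["the", "cat", "sat"]
bits = ideal_code_length_bits(sequence, model_probs)
```

The better the model predicts (higher p), the fewer bits the sequence costs, which is the sense in which "language modeling is compression".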
-
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT),
arXiv, 2309.08968
, arxiv, pdf, cication: -1Parsa Kavehzadeh, Mojtaba Valipour, Marzieh Tahaei, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh
-
Sparse Autoencoders Find Highly Interpretable Features in Language Models,
arXiv, 2309.08600
, arxiv, pdf, cication: 5Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, Lee Sharkey
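The sparse-autoencoder approach in the entry above maps a model activation into a wider, mostly-zero feature vector and reconstructs the activation from it. A minimal sketch of the forward pass only; the weights below are arbitrary toy values, and the L1-regularized training loop is omitted:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec):
    # Encode into an overcomplete feature vector, then reconstruct:
    # f = ReLU(W_enc x + b_enc), x_hat = W_dec f.
    f = relu([h + b for h, b in zip(matvec(W_enc, x), b_enc)])
    x_hat = matvec(W_dec, f)
    return f, x_hat

x = [1.0, 0.0]                                # toy "residual stream" activation
W_enc = [[1, 0], [0, 1], [-1, 0], [1, 1]]     # 4 features from 2 dims (toy)
b_enc = [-0.5] * 4                            # negative bias pushes toward sparsity
W_dec = [[1, 0, 0, 0.5], [0, 1, 0, 0.5]]
f, x_hat = sae_forward(x, W_enc, b_enc, W_dec)
```

Only two of the four features fire here; in the papers, individual features of a trained SAE are often far more interpretable than raw neurons.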
-
Human Language Understanding & Reasoning
· (mp.weixin.qq)
-
Do Machine Learning Models Memorize or Generalize?
· (mp.weixin.qq) · (qbitai)
-
CIMI - Daftstone
· (jiqizhixin)
-
Studying Large Language Model Generalization with Influence Functions,
arXiv, 2308.03296
, arxiv, pdf, cication: 12Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez
-
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer,
arXiv, 2305.16380
, arxiv, pdf, cication: 6Yuandong Tian, Yiping Wang, Beidi Chen, Simon Du · (mp.weixin.qq)
-
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning,
arXiv, 2305.14160
, arxiv, pdf, cication: -1Lean Wang, Lei Li, Damai Dai, Deli Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun · (qbitai)
-
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers,
arXiv, 2211.03495
, arxiv, pdf, cication: 16Michael Hassid, Hao Peng, Daniel Rotem, Jungo Kasai, Ivan Montero, Noah A. Smith, Roy Schwartz
-
Time is Encoded in the Weights of Finetuned Language Models,
arXiv, 2312.13401
, arxiv, pdf, cication: -1Kai Nylund, Suchin Gururangan, Noah A. Smith
-
Composable Interventions for Language Models,
arXiv, 2407.06483
, arxiv, pdf, cication: -1Arinbjorn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen
-
Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance,
arXiv, 2406.11139
, arxiv, pdf, cication: -1Somnath Banerjee, Avik Halder, Rajarshi Mandal, Sayan Layek, Ian Soboroff, Rima Hazra, Animesh Mukherjee
-
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3,
arXiv, 2405.00664
, arxiv, pdf, cication: -1Junsang Yoon, Akshat Gupta, Gopala Anumanchipalli
-
Robust and Scalable Model Editing for Large Language Models,
arXiv, 2403.17431
, arxiv, pdf, cication: -1Yingfa Chen, Zhengyan Zhang, Xu Han, Chaojun Xiao, Zhiyuan Liu, Chen Chen, Kuai Li, Tao Yang, Maosong Sun · (EREN - thunlp)
-
Editing Conceptual Knowledge for Large Language Models,
arXiv, 2403.06259
, arxiv, pdf, cication: -1Xiaohan Wang, Shengyu Mao, Ningyu Zhang, Shumin Deng, Yunzhi Yao, Yue Shen, Lei Liang, Jinjie Gu, Huajun Chen
-
A Comprehensive Study of Knowledge Editing for Large Language Models,
arXiv, 2401.01286
, arxiv, pdf, cication: -1Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni
-
Evaluating the Ripple Effects of Knowledge Editing in Language Models,
arXiv, 2307.12976
, arxiv, pdf, cication: 5Roi Cohen, Eden Biran, Ori Yoran, Amir Globerson, Mor Geva
-
Editing Large Language Models: Problems, Methods, and Opportunities,
arXiv, 2305.13172
, arxiv, pdf, cication: 12Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, Ningyu Zhang · (easyedit - zjunlp)
-
ModelEditingPapers - zjunlp
Must-read Papers on Model Editing.
-
Open-Endedness is Essential for Artificial Superhuman Intelligence,
arXiv, 2406.04268
, arxiv, pdf, cication: -1Edward Hughes, Michael Dennis, Jack Parker-Holder, Feryal Behbahani, Aditi Mavalankar, Yuge Shi, Tom Schaul, Tim Rocktaschel
-
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning,
arXiv, 2406.00392
, arxiv, pdf, cication: -1Jonathan Cook, Chris Lu, Edward Hughes, Joel Z. Leibo, Jakob Foerster
-
My AI Timelines Have Sped Up (Again)
· (mp.weixin.qq)
-
Computing Power and the Governance of Artificial Intelligence,
arXiv, 2402.08797
, arxiv, pdf, cication: -1Girish Sastry, Lennart Heim, Haydn Belfield, Markus Anderljung, Miles Brundage, Julian Hazell, Cullen O'Keefe, Gillian K. Hadfield, Richard Ngo, Konstantin Pilz
-
Perspectives on the State and Future of Deep Learning -- 2023,
arXiv, 2312.09323
, arxiv, pdf, cication: -1Micah Goldblum, Anima Anandkumar, Richard Baraniuk, Tom Goldstein, Kyunghyun Cho, Zachary C Lipton, Melanie Mitchell, Preetum Nakkiran, Max Welling, Andrew Gordon Wilson
-
AI and Open Source in 2023 - by Sebastian Raschka, PhD
· (mp.weixin.qq)
-
Role play with large language models | Nature
· (qbitai)
-
Levels of AGI: Operationalizing Progress on the Path to AGI,
arXiv, 2311.02462
, arxiv, pdf, cication: -1Meredith Ringel Morris, Jascha Sohl-dickstein, Noah Fiedel, Tris Warkentin, Allan Dafoe, Aleksandra Faust, Clement Farabet, Shane Legg
-
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness,
arXiv, 2308.08708
, arxiv, pdf, cication: 15Patrick Butlin, Robert Long, Eric Elmoznino, Yoshua Bengio, Jonathan Birch, Axel Constant, George Deane, Stephen M. Fleming, Chris Frith, Xu Ji · (jiqizhixin)
-
Collective Intelligence for Deep Learning: A Survey of Recent Developments | 大トロ
- Integrating RL and LLM ideas: exploring world models on the way to AGI (Parts 2 & 3)
- Integrating RL and LLM ideas: exploring world models on the way to AGI (Part 1)
- Richard Sutton, father of reinforcement learning: another possible path to AGI
- Turing Award winner and father of neural networks Geoffrey Hinton's latest public talk: Will digital intelligence replace biological intelligence? (full text and slides)
- Good questions matter more than good answers | Harry Shum's five questions on large models
-
Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation,
arXiv, 2311.08877
, arxiv, pdf, cication: -1Vaishnavi Shrivastava, Percy Liang, Ananya Kumar
-
Do Large Language Models Know What They Don't Know?,
arXiv, 2305.18153
, arxiv, pdf, cication: 16Zhangyue Yin, Qiushi Sun, Qipeng Guo, Jiawen Wu, Xipeng Qiu, Xuanjing Huang
-
Tokenization Falling Short: The Curse of Tokenization,
arXiv, 2406.11687
, arxiv, pdf, cication: -1Yekun Chai, Yewei Fang, Qiwei Peng, Xuhong Li
-
Zero-Shot Tokenizer Transfer,
arXiv, 2405.07883
, arxiv, pdf, cication: -1
-
Toward a Theory of Tokenization in LLMs,
arXiv, 2404.08335
, arxiv, pdf, cication: -1Nived Rajaraman, Jiantao Jiao, Kannan Ramchandran
-
Training LLMs over Neurally Compressed Text,
arXiv, 2404.03626
, arxiv, pdf, cication: -1Brian Lester, Jaehoon Lee, Alex Alemi, Jeffrey Pennington, Adam Roberts, Jascha Sohl-Dickstein, Noah Constant
-
Greed is All You Need: An Evaluation of Tokenizer Inference Methods,
arXiv, 2403.01289
, arxiv, pdf, cication: -1Omri Uzan, Craig W. Schmidt, Chris Tanner, Yuval Pinter
-
xT: Nested Tokenization for Larger Context in Large Images,
arXiv, 2403.01915
, arxiv, pdf, cication: -1Ritwik Gupta, Shufan Li, Tyler Zhu, Jitendra Malik, Trevor Darrell, Karttikeya Mangalam · (xT - bair-climate-initiative)
-
Prompt2Model: Generating Deployable Models from Natural Language Instructions,
arXiv, 2308.12261
, arxiv, pdf, cication: -1Vijay Viswanathan, Chenyang Zhao, Amanda Bertsch, Tongshuang Wu, Graham Neubig · (mp.weixin.qq)
-
xVal: A Continuous Number Encoding for Large Language Models,
arXiv, 2310.02989
, arxiv, pdf, cication: -1Siavash Golkar, Mariel Pettee, Michael Eickenberg, Alberto Bietti, Miles Cranmer, Geraud Krawezik, Francois Lanusse, Michael McCabe, Ruben Ohana, Liam Parker · (mp.weixin.qq)
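The xVal entry above replaces digit-by-digit tokenization with a single shared [NUM] embedding that is scaled by the numeric value itself, so magnitude lives in the vector rather than in a digit string. A minimal sketch; the 3-dim embedding and the normalization scale are invented here for illustration:

```python
# Toy learned [NUM] direction (hypothetical values).
num_embedding = [0.1, -0.3, 0.2]

def encode_number(value, scale=100.0):
    # Normalize the value into a bounded range, then scale the shared
    # [NUM] embedding by it: one continuous vector per number.
    v = value / scale
    return [v * e for e in num_embedding]
```

Because encoding is a linear scaling, nearby numbers get nearby vectors, which is the paper's motivation for better numeric continuity than subword digits.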
-
GraphGPT: Graph Instruction Tuning for Large Language Models,
arXiv, 2310.13023
, arxiv, pdf, cication: 2Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Lixin Su, Suqi Cheng, Dawei Yin, Chao Huang · (mp.weixin.qq)
-
A taxonomy and review of generalization research in NLP | Nature Machine Intelligence
-
Neurons in Large Language Models: Dead, N-gram, Positional,
arXiv, 2309.04827
, arxiv, pdf, cication: -1Elena Voita, Javier Ferrando, Christoforos Nalmpantis
-
On the Societal Impact of Open Foundation Models,
arXiv, 2403.07918
, arxiv, pdf, cication: -1Sayash Kapoor, Rishi Bommasani, Kevin Klyman, Shayne Longpre, Ashwin Ramaswami, Peter Cihon, Aspen Hopkins, Kevin Bankston, Stella Biderman, Miranda Bogen · (crfm.stanford)
-
llama3-from-scratch - naklecha
llama3 implementation one matrix multiplication at a time
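The repo above builds Llama 3 "one matrix multiplication at a time"; the heart of that walkthrough is single-head scaled dot-product attention, which can be sketched in plain Python (toy dimensions, no learned weights) as:

```python
import math

def softmax(row):
    m = max(row)
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]

def attention(Q, K, V):
    # One head of scaled dot-product attention, matmul by matmul:
    # scores = Q K^T / sqrt(d), weights = softmax(scores), out = weights V.
    d = len(Q[0])
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d) for kr in K]
              for qr in Q]
    weights = [softmax(row) for row in scores]
    out = [[sum(w * V[j][c] for j, w in enumerate(wr)) for c in range(len(V[0]))]
           for wr in weights]
    return out
```

Each output row is a convex combination of the value rows, weighted by how strongly the query matches each key.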
-
cookbook - learn 🤗
-
minbpe - karpathy
Minimal, clean, code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
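The BPE algorithm that minbpe implements can be sketched in a few lines: repeatedly count adjacent token pairs over the byte sequence and fuse the most frequent pair into a fresh token id (this is a simplified sketch, not minbpe's actual code):

```python
from collections import Counter

def most_common_pair(ids):
    # Most frequent adjacent pair in the current token sequence.
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    # Replace every non-overlapping occurrence of `pair` with `new_id`.
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    # Start from raw UTF-8 bytes (ids 0-255) and learn `num_merges`
    # merge rules, assigning new ids from 256 upward.
    ids, merges = list(text.encode("utf-8")), {}
    for new_id in range(256, 256 + num_merges):
        pair = most_common_pair(ids)
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return ids, merges
```

Decoding inverts the merge table; real tokenizers add regex pre-splitting and special tokens on top of this core loop.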
-
LLMs-from-scratch - rasbt
Implementing a ChatGPT-like LLM from scratch, step by step
-
MachineLearning-QandAI-book - rasbt
Machine Learning Q and AI book
-
ML-YouTube-Courses - dair-ai
📺 Discover the latest machine learning / AI courses on YouTube.
-
llm-course - mlabonne
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
-
[1hr Talk] Intro to Large Language Models - YouTube
· (drive.google) · (drive.google)
· (mp.weixin.qq)
-
Stanford CS224N: Natural Language Processing with Deep Learning | 2023 - YouTube
-
lectures - cuda-mode
Material for cuda-mode lectures
-
Introducing CUDA in a way that is accessible to Python folks
· (youtu) · (twitter)
-
Deep Learning Foundations by Soheil Feizi : Large Language Models - YouTube
-
But what is a GPT? Visual intro to Transformers | Deep learning, chapter 5 - YouTube
-
Unsupervised Learning: Redpoint's AI Podcast - YouTube
· (youtube)
-
Making AI accessible with Andrej Karpathy and Stephanie Zhan - YouTube
· (mp.weixin.qq)
-
Deep Dive into Transformers by Hand ✍︎ | by Srijanie Dey, PhD | Towards Data Science
-
Transformers-Tutorials - NielsRogge
This repository contains demos I made with the Transformers library by HuggingFace.
-
mamba_state_space_model_paper_list - event-ahu
[Mamba-Survey-2024] Paper list for State-Space-Model/Mamba and its applications
-
Awesome-Mamba-Papers - yyyujintang
Awesome Papers related to Mamba.
-
awesome-generative-ai-guide - aishwaryanr
A one stop repository for generative AI research updates, interview resources, notebooks and much more!
-
awesome-local-ai - janhq
An awesome repository of local AI tools
-
how-to-optim-algorithm-in-cuda - BBuf
How to optimize some algorithms in CUDA.