Awesome Long-Context LLM

Survey

Evaluation

  • Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems, arXiv, 2407.01370, arxiv, pdf, cication: -1

    Philippe Laban, Alexander R. Fabbri, Caiming Xiong, Chien-Sheng Wu

  • Counting-Stars: A Simple, Efficient, and Reasonable Strategy for Evaluating Long-Context Large Language Models, arXiv, 2403.11802, arxiv, pdf, cication: -1

    Mingyang Song, Mao Zheng, Xuan Luo · (Counting-Stars - nick7nlp) Star · (mp.weixin.qq)

  • Evaluating Very Long-Term Conversational Memory of LLM Agents, arXiv, 2402.17753, arxiv, pdf, cication: -1

    Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, Yuwei Fang

Papers

  • Associative Recurrent Memory Transformer, arXiv, 2407.04841, arxiv, pdf, cication: -1

    Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Mikhail Burtsev

  • Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP, arXiv, 2407.00402, arxiv, pdf, cication: -1

    Omer Goldman, Alon Jacovi, Aviv Slobodkin, Aviya Maimon, Ido Dagan, Reut Tsarfaty

  • Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers, arXiv, 2406.16747, arxiv, pdf, cication: -1

    Chao Lou, Zixia Jia, Zilong Zheng, Kewei Tu

  • Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon, arXiv, 2406.17746, arxiv, pdf, cication: -1

    USVSN Sai Prashanth, Alvin Deng, Kyle O'Brien, Jyothir S V, Mohammad Aflah Khan, Jaydeep Borkar, Christopher A. Choquette-Choo, Jacob Ray Fuehne, Stella Biderman, Tracy Ke

  • Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?, arXiv, 2406.13121, arxiv, pdf, cication: -1

    Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia · (loft - google-deepmind) Star

  • THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation, arXiv, 2406.10996, arxiv, pdf, cication: -1

    Seo Hyun Kim, Kai Tzu-iunn Ong, Taeyoon Kwon, Namyoung Kim, Keummin Ka, SeongHyeon Bae, Yohan Jo, Seung-won Hwang, Dongha Lee, Jinyoung Yeo · (theanine-693b0.web)

  • Recurrent Context Compression: Efficiently Expanding the Context Window of LLM, arXiv, 2406.06110, arxiv, pdf, cication: -1

    Chensen Huang, Guibo Zhu, Xuepeng Wang, Yifei Luo, Guojing Ge, Haoran Chen, Dong Yi, Jinqiao Wang · (RCC_Transformer - WUHU-G) Star

  • Contextual Position Encoding: Learning to Count What's Important, arXiv, 2405.18719, arxiv, pdf, cication: -1

    Olga Golovneva, Tianlu Wang, Jason Weston, Sainbayar Sukhbaatar

  • Are Long-LLMs A Necessity For Long-Context Tasks?, arXiv, 2405.15318, arxiv, pdf, cication: -1

    Hongjin Qian, Zheng Liu, Peitian Zhang, Kelong Mao, Yujia Zhou, Xu Chen, Zhicheng Dou

  • Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis, arXiv, 2405.08944, arxiv, pdf, cication: -1

    Yao Fu

  • Extending Llama-3's Context Ten-Fold Overnight, arXiv, 2404.19553, arxiv, pdf, cication: -1

    Peitian Zhang, Ninglu Shao, Zheng Liu, Shitao Xiao, Hongjin Qian, Qiwei Ye, Zhicheng Dou · (FlagEmbedding - FlagOpen) Star

  • Make Your LLM Fully Utilize the Context, arXiv, 2404.16811, arxiv, pdf, cication: -1

    Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou · (FILM - microsoft) Star

  • SnapKV: LLM Knows What You are Looking for Before Generation, arXiv, 2404.14469, arxiv, pdf, cication: -1

    Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen · (SnapKV - FasterDecoding) Star

  • LongEmbed: Extending Embedding Models for Long Context Retrieval, arXiv, 2404.12096, arxiv, pdf, cication: -1

    Dawei Zhu, Liang Wang, Nan Yang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li · (LongEmbed - dwzhu-pku) Star

  • Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length, arXiv, 2404.08801, arxiv, pdf, cication: -1

    Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou · (megalodon - XuezheMax) Star

  • TransformerFAM: Feedback attention is working memory, arXiv, 2404.09173, arxiv, pdf, cication: -1

    Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar

  • LLoCO: Learning Long Contexts Offline, arXiv, 2404.07979, arxiv, pdf, cication: -1

    Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa · (lloco - jeffreysijuntan) Star

  • RULER: What's the Real Context Size of Your Long-Context Language Models?, arXiv, 2404.06654, arxiv, pdf, cication: -1

    Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Boris Ginsburg

  • Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention, arXiv, 2404.07143, arxiv, pdf, cication: -1

    Tsendsuren Munkhdalai, Manaal Faruqui, Siddharth Gopal

  • Long-context LLMs Struggle with Long In-context Learning, arXiv, 2404.02060, arxiv, pdf, cication: -1

    Tianle Li, Ge Zhang, Quy Duc Do, Xiang Yue, Wenhu Chen

    • LLMs perform relatively well up to a token length of 20K; once the context window exceeds 20K, the performance of most LLMs except GPT-4 drops dramatically.
  • BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences, arXiv, 2403.09347, arxiv, pdf, cication: -1

    Sun Ao, Weilin Zhao, Xu Han, Cheng Yang, Zhiyuan Liu, Chuan Shi, Maosong Sun, Shengnan Wang, Teng Su

    • optimizes distributed attention in Transformer-based models for long sequences, cutting communication overhead by 40% and doubling processing speed on GPUs.
  • Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context, arXiv, 2403.05530, arxiv, pdf, cication: -1

    Machel Reid, Nikolay Savinov, Denis Teplyashin, Dmitry Lepikhin, Timothy Lillicrap, Jean-baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser

  • Resonance RoPE: Improving Context Length Generalization of Large Language Models, arXiv, 2403.00071, arxiv, pdf, cication: -1

    Suyuchen Wang, Ivan Kobyzev, Peng Lu, Mehdi Rezagholizadeh, Bang Liu

  • Long-Context Language Modeling with Parallel Context Encoding, arXiv, 2402.16617, arxiv, pdf, cication: -1

    Howard Yen, Tianyu Gao, Danqi Chen · (qbitai)

    · (cepe - princeton-nlp) Star

  • A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts, arXiv, 2402.09727, arxiv, pdf, cication: -1

    Kuang-Huei Lee, Xinyun Chen, Hiroki Furuta, John Canny, Ian Fischer · (mp.weixin.qq)

  • Training-Free Long-Context Scaling of Large Language Models, arXiv, 2402.17463, arxiv, pdf, cication: -1

    Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, Lingpeng Kong · (ChunkLlama - HKUNLP) Star

  • LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens, arXiv, 2402.13753, arxiv, pdf, cication: -1

    Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, Mao Yang

  • ∞Bench: Extending Long Context Evaluation Beyond 100K Tokens, arXiv, 2402.13718, arxiv, pdf, cication: -1

    Xinrong Zhang, Yingfa Chen, Shengding Hu, Zihang Xu, Junhao Chen, Moo Khai Hao, Xu Han, Zhen Leng Thai, Shuo Wang, Zhiyuan Liu

  • LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration, arXiv, 2402.11550, arxiv, pdf, cication: -1

    Jun Zhao, Can Zu, Hao Xu, Yi Lu, Wei He, Yiwen Ding, Tao Gui, Qi Zhang, Xuanjing Huang

  • InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory, arXiv, 2402.04617, arxiv, pdf, cication: -1

    Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Song Han, Maosong Sun

    · (InfLLM - thunlp) Star · (mp.weixin.qq)

  • In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss, arXiv, 2402.10790, arxiv, pdf, cication: -1

    Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev

  • Data Engineering for Scaling Language Models to 128K Context, arXiv, 2402.10171, arxiv, pdf, cication: -1

    Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, Hao Peng · (long-context-data-engineering - franxyao) Star

  • Transformers Can Achieve Length Generalization But Not Robustly, arXiv, 2402.09371, arxiv, pdf, cication: -1

    Yongchao Zhou, Uri Alon, Xinyun Chen, Xuezhi Wang, Rishabh Agarwal, Denny Zhou

  • KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization, arXiv, 2401.18079, arxiv, pdf, cication: -1

    Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami

  • Long-Context-Data-Engineering - FranxYao Star

    Implementation of the paper Data Engineering for Scaling Language Models to 128K Context

  • LongAlign: A Recipe for Long Context Alignment of Large Language Models, arXiv, 2401.18058, arxiv, pdf, cication: -1

    Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, Juanzi Li · (LongAlign - THUDM) Star

  • With Greater Text Comes Greater Necessity: Inference-Time Training Helps Long Text Generation, arXiv, 2401.11504, arxiv, pdf, cication: -1

    Y. Wang, D. Ma, D. Cai · (zhuanlan.zhihu)

  • E^2-LLM: Efficient and Extreme Length Extension of Large Language Models, arXiv, 2401.06951, arxiv, pdf, cication: -1

    Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su

  • Extending LLMs' Context Window with 100 Samples, arXiv, 2401.07004, arxiv, pdf, cication: -1

    Yikai Zhang, Junlong Li, Pengfei Liu · (Entropy-ABF - GAIR-NLP) Star

  • Transformers are Multi-State RNNs, arXiv, 2401.06104, arxiv, pdf, cication: -1

    Matanel Oren, Michael Hassid, Yossi Adi, Roy Schwartz · (TOVA - schwartz-lab-NLP) Star

  • Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models, arXiv, 2401.04658, arxiv, pdf, cication: -1

    Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong · (lightning-attention - OpenNLPLab) Star

  • Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache, arXiv, 2401.02669, arxiv, pdf, cication: -1

    Bin Lin, Tao Peng, Chen Zhang, Minmin Sun, Lanbo Li, Hanyu Zhao, Wencong Xiao, Qi Xu, Xiafei Qiu, Shen Li

  • LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning, arXiv, 2401.01325, arxiv, pdf, cication: -1

    Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Zirui Liu, Chia-Yuan Chang, Huiyuan Chen, Xia Hu · (qbitai)

  • Cached Transformers: Improving Transformers with Differentiable Memory Cache, arXiv, 2312.12742, arxiv, pdf, cication: -1

    Zhaoyang Zhang, Wenqi Shao, Yixiao Ge, Xiaogang Wang, Jinwei Gu, Ping Luo

  • Extending Context Window of Large Language Models via Semantic Compression, arXiv, 2312.09571, arxiv, pdf, cication: -1

    Weizhi Fei, Xueyan Niu, Pingyi Zhou, Lu Hou, Bo Bai, Lei Deng, Wei Han

  • Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention, arXiv, 2312.08618, arxiv, pdf, cication: -1

    Kaiqiang Song, Xiaoyang Wang, Sangwoo Cho, Xiaoman Pan, Dong Yu

  • Ultra-Long Sequence Distributed Transformer, arXiv, 2311.02382, arxiv, pdf, cication: -1

    Xiao Wang, Isaac Lyngaas, Aristeidis Tsaris, Peng Chen, Sajal Dash, Mayanka Chandra Shekar, Tao Luo, Hong-Jun Yoon, Mohamed Wahib, John Gouley

  • HyperAttention: Long-context Attention in Near-Linear Time, arXiv, 2310.05869, arxiv, pdf, cication: 2

    Insu Han, Rajesh Jayaram, Amin Karbasi, Vahab Mirrokni, David P. Woodruff, Amir Zandieh

  • CLEX: Continuous Length Extrapolation for Large Language Models, arXiv, 2310.16450, arxiv, pdf, cication: -1

    Guanzheng Chen, Xin Li, Zaiqiao Meng, Shangsong Liang, Lidong Bing

  • TRAMS: Training-free Memory Selection for Long-range Language Modeling, arXiv, 2310.15494, arxiv, pdf, cication: -1

    Haofei Yu, Cunxiang Wang, Yue Zhang, Wei Bi

  • Ring Attention with Blockwise Transformers for Near-Infinite Context, arXiv, 2310.01889, arxiv, pdf, cication: 6

    Hao Liu, Matei Zaharia, Pieter Abbeel · (RingAttention - lhao499) Star

  • Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading, arXiv, 2310.05029, arxiv, pdf, cication: -1

    Howard Chen, Ramakanth Pasunuru, Jason Weston, Asli Celikyilmaz · (mp.weixin.qq)

  • Scaling Laws of RoPE-based Extrapolation, arXiv, 2310.05209, arxiv, pdf, cication: -1

    Xiaoran Liu, Hang Yan, Shuo Zhang, Chenxin An, Xipeng Qiu, Dahua Lin · (qbitai)

  • EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation, arXiv, 2310.08185, arxiv, pdf, cication: -1

    Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei

  • CoCA: Fusing position embedding with Collinear Constrained Attention for fine-tuning free context window extending, arXiv, 2309.08646, arxiv, pdf, cication: -1

    Shiyi Zhu, Jing Ye, Wei Jiang, Qi Zhang, Yifan Wu, Jianguo Li · (Collinear-Constrained-Attention - codefuse-ai) Star · (jiqizhixin)

  • PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training, arXiv, 2309.10400, arxiv, pdf, cication: -1

    Dawei Zhu, Nan Yang, Liang Wang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li · (PoSE - dwzhu-pku) Star

  • Effective Long-Context Scaling of Foundation Models, arXiv, 2309.16039, arxiv, pdf, cication: 1

    Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz · (qbitai)

  • LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models, arXiv, 2308.16137, arxiv, pdf, cication: 3

    Chi Han, Qifan Wang, Wenhan Xiong, Yu Chen, Heng Ji, Sinong Wang

  • DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models, arXiv, 2309.14509, arxiv, pdf, cication: -1

    Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He

  • YaRN: Efficient Context Window Extension of Large Language Models, arXiv, 2309.00071, arxiv, pdf, cication: 9

    Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole · (yarn - jquesnelle) Star · (jiqizhixin)

  • In-context Autoencoder for Context Compression in a Large Language Model, arXiv, 2307.06945, arxiv, pdf, cication: 4

    Tao Ge, Jing Hu, Lei Wang, Xun Wang, Si-Qing Chen, Furu Wei

  • Focused Transformer: Contrastive Training for Context Scaling, arXiv, 2307.03170, arxiv, pdf, cication: 12

    Szymon Tworkowski, Konrad Staniszewski, Mikołaj Pacek, Yuhuai Wu, Henryk Michalewski, Piotr Miłoś

  • Lost in the Middle: How Language Models Use Long Contexts, arXiv, 2307.03172, arxiv, pdf, cication: 64

    Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang

  • LongNet: Scaling Transformers to 1,000,000,000 Tokens, arXiv, 2307.02486, arxiv, pdf, cication: 15

    Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei

  • Extending Context Window of Large Language Models via Positional Interpolation, arXiv, 2306.15595, arxiv, pdf, cication: 36

    Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian · (qbitai)

  • The Impact of Positional Encoding on Length Generalization in Transformers, arXiv, 2305.19466, arxiv, pdf, cication: 5

    Amirhossein Kazemnejad, Inkit Padhi, Karthikeyan Natesan Ramamurthy, Payel Das, Siva Reddy

  • Long-range Language Modeling with Self-retrieval, arXiv, 2306.13421, arxiv, pdf, cication: 3

    Ohad Rubin, Jonathan Berant

  • Block-State Transformers, arXiv, 2306.09539, arxiv, pdf, cication: 2

    Mahan Fathi, Jonathan Pilault, Orhan Firat, Christopher Pal, Pierre-Luc Bacon, Ross Goroshin

  • LeanDojo: Theorem Proving with Retrieval-Augmented Language Models, arXiv, 2306.15626, arxiv, pdf, cication: 14

    Kaiyu Yang, Aidan M. Swope, Alex Gu, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil, Ryan Prenger, Anima Anandkumar

  • GLIMMER: generalized late-interaction memory reranker, arXiv, 2306.10231, arxiv, pdf, cication: 1

    Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Sumit Sanghai, William W. Cohen, Joshua Ainslie

  • Augmenting Language Models with Long-Term Memory, arXiv, 2306.07174, arxiv, pdf, cication: 7

    Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei · (aka)

  • Sequence Parallelism: Long Sequence Training from System Perspective, arXiv, 2105.13120, arxiv, pdf, cication: 2

    Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li, Yang You

Projects

  • EasyContext - jzhang38 Star

    Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. · (twitter)

  • LLMLingua - microsoft Star

    Speeds up LLM inference and enhances the model's perception of key information by compressing the prompt and KV cache, achieving up to 20x compression with minimal performance loss.
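
    A hedged usage sketch (added here, not from the original list), assuming `pip install llmlingua`; the class, method, and result-key names follow the project's README as recalled and may differ across versions.

    ```python
    # Hedged sketch of prompt compression with microsoft/LLMLingua.
    # API names (PromptCompressor, compress_prompt, "compressed_prompt") follow
    # the project's README from memory and may have changed in newer releases.
    from llmlingua import PromptCompressor

    compressor = PromptCompressor()  # loads the default compressor model

    long_context = [
        "Paragraph 1 of a very long document ...",
        "Paragraph 2 of a very long document ...",
    ]

    result = compressor.compress_prompt(
        long_context,                               # long context as a list of strings
        instruction="Answer using only the context.",
        question="What does paragraph 2 describe?",
        target_token=200,                           # rough token budget after compression
    )

    print(result["compressed_prompt"])              # feed this to the target LLM
    ```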

  • long-context - abacusai Star

    This repository contains code and tooling for the Abacus.AI LLM Context Expansion project. Also included are evaluation scripts and benchmark tasks that evaluate a model’s information retrieval capabilities with context expansion. We also include key experimental results and instructions for reproducing and building on them.

  • LLaMA rope_scaling
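
    A hedged configuration sketch (added here, not from the original list): Hugging Face transformers exposes a `rope_scaling` option for LLaMA-family models (added around v4.31); the accepted keys and scaling types vary by version, and the model id below is only illustrative.

    ```python
    # Hedged sketch: extending a LLaMA-style model's context window with RoPE scaling
    # via Hugging Face transformers. The model id is illustrative; the rope_scaling
    # schema follows transformers ~v4.31 and may differ in later versions.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-hf"  # any LLaMA-family checkpoint

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        # Linear positional interpolation: factor 2.0 roughly doubles the usable
        # context (4096 -> ~8192 for Llama-2); "dynamic" (NTK-aware) is another type.
        rope_scaling={"type": "linear", "factor": 2.0},
    )
    ```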

  • long_llama - cstankonrad Star

    LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transformer (FoT) method.
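
    A hedged loading sketch (added here, not from the original list): the checkpoint id and the need for `trust_remote_code=True` are taken from memory of the repository's README and should be verified there.

    ```python
    # Hedged sketch of loading LongLLaMA with Hugging Face transformers.
    # The checkpoint id is an assumption recalled from the repo's README;
    # trust_remote_code pulls the custom FoT (Focused Transformer) modeling code.
    import torch
    from transformers import AutoModelForCausalLM, LlamaTokenizer

    checkpoint = "syzymon/long_llama_3b"  # assumed checkpoint id; verify in the repo

    tokenizer = LlamaTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint,
        torch_dtype=torch.float32,
        trust_remote_code=True,
    )

    inputs = tokenizer("A very long document ...", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
    ```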

Other

Extra reference