
Awesome LLM Misc

Survey

  • agi-survey - ulab-uiuc Star

  • A Survey on Self-Evolution of Large Language Models, arXiv, 2404.14387, arxiv, pdf, cication: -1

    Zhengwei Tao, Ting-En Lin, Xiancai Chen, Hangyu Li, Yuchuan Wu, Yongbin Li, Zhi Jin, Fei Huang, Dacheng Tao, Jingren Zhou

  • State Space Model for New-Generation Network Alternative to Transformers: A Survey, arXiv, 2404.09516, arxiv, pdf, cication: -1

    Xiao Wang, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, Yao Rong, Weizhe Kong, Ju Huang, Shihao Li, Haoxiang Yang

  • Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers, arXiv, 2404.04925, arxiv, pdf, cication: -1

    Libo Qin, Qiguang Chen, Yuhang Zhou, Zhi Chen, Yinghui Li, Lizi Liao, Min Li, Wanxiang Che, Philip S. Yu

  • A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias, arXiv, 2404.00929, arxiv, pdf, cication: -1

    Yuemei Xu, Ling Hu, Jiayi Zhao, Zihan Qiu, Yuqi Ye, Hanwen Gu

  • ChatGPT Alternative Solutions: Large Language Models Survey, arXiv, 2403.14469, arxiv, pdf, cication: -1

    Hanieh Alipour, Nick Pendar, Kohinoor Roy

  • Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey, arXiv, 2403.09606, arxiv, pdf, cication: -1

    Xiaoyu Liu, Paiheng Xu, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang Zhou, Fuxiao Liu, Tianrui Guan, Haoliang Wang, Tong Yu

  • Knowledge Conflicts for LLMs: A Survey, arXiv, 2403.08319, arxiv, pdf, cication: -1

    Rongwu Xu, Zehan Qi, Cunxiang Wang, Hongru Wang, Yue Zhang, Wei Xu

  • Large Language Models on Tabular Data -- A Survey, arXiv, 2402.17944, arxiv, pdf, cication: -1

    Xi Fang, Weijie Xu, Fiona Anting Tan, Jiani Zhang, Ziqing Hu, Yanjun Qi, Scott Nickleach, Diego Socolinsky, Srinivasan Sengamedu, Christos Faloutsos

  • Large Language Models and Games: A Survey and Roadmap, arXiv, 2402.18659, arxiv, pdf, cication: -1

    Roberto Gallotta, Graham Todd, Marvin Zammit, Sam Earle, Antonios Liapis, Julian Togelius, Georgios N. Yannakakis

  • A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems, arXiv, 2402.18013, arxiv, pdf, cication: -1

    Zihao Yi, Jiarui Ouyang, Yuwen Liu, Tianhao Liao, Zhe Xu, Ying Shen

  • A Survey of Large Language Models in Cybersecurity, arXiv, 2402.16968, arxiv, pdf, cication: -1

    Gabriel de Jesus Coelho da Silva, Carlos Becker Westphall

  • Large Language Models for Data Annotation: A Survey, arXiv, 2402.13446, arxiv, pdf, cication: -1

    Zhen Tan, Alimohammad Beigi, Song Wang, Ruocheng Guo, Amrita Bhattacharjee, Bohan Jiang, Mansooreh Karami, Jundong Li, Lu Cheng, Huan Liu

  • Large Language Models: A Survey, arXiv, 2402.06196, arxiv, pdf, cication: -1

    Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao

  • Continual Learning for Large Language Models: A Survey, arXiv, 2402.01364, arxiv, pdf, cication: -1

    Tongtong Wu, Linhao Luo, Yuan-Fang Li, Shirui Pan, Thuy-Trang Vu, Gholamreza Haffari

  • From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape, arXiv, 2312.10868, arxiv, pdf, cication: -1

    Timothy R. McIntosh, Teo Susnjak, Tong Liu, Paul Watters, Malka N. Halgamuge

  • A Survey of Large Language Models Attribution, arXiv, 2311.03731, arxiv, pdf, cication: -1

    Dongfang Li, Zetian Sun, Xinshuo Hu, Zhenyu Liu, Ziyang Chen, Baotian Hu, Aiguo Wu, Min Zhang · (awesome-llm-attributions - HITsz-TMG) Star

  • On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models, arXiv, 2307.09793, arxiv, pdf, cication: 1

    Sarah Gao, Andrew Kean Gao · (constellation.sites.stanford)

  • A Survey of Large Language Models, arXiv, 2303.18223, arxiv, pdf, cication: 285

    Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong · (LLMSurvey - RUCAIBox) Star

Blogs

Toolkits

  • chat-langchain - langchain-ai Star

  • ChatGPT4 - yuntian-deng 🤗

  • amazing-openai-api - soulteary Star

    Convert different model APIs into the OpenAI API format out of the box.

  • jan - janhq Star

    Jan is an open source alternative to ChatGPT that runs 100% offline on your computer

  • GPT_API_free - chatanywhere Star

    Free ChatGPT API keys: a free ChatGPT API with (free) GPT-4 support, plus a forwarding API reachable from mainland China without a proxy. Can be paired with clients/plugins such as ChatBox, greatly reducing API cost and enabling unrestricted chat from within China.

  • BricksLLM - bricks-cloud Star

    Simplifying LLM ops in production

  • skypilot - skypilot-org Star

    SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.

    · (blog.skypilot)

  • vllm - vllm-project Star

    A high-throughput and memory-efficient inference and serving engine for LLMs

  • langflow - logspace-ai Star

    ⛓️ LangFlow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows.

  • torchscale - microsoft Star

    Foundation Architecture for (M)LLMs

  • LLM-As-Chatbot - deep-diver Star

    LLM as a Chatbot Service

  • Llama-2-Open-Source-LLM-CPU-Inference - kennethleungty Star

    Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A

  • ollama - jmorganca Star

    Get up and running with large language models locally

  • OpenLLM - bentoml Star

    An open platform for operating large language models (LLMs) in production. Fine-tune, serve, deploy, and monitor any LLMs with ease.

  • litellm - BerriAI Star

    Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)

  • gpu_poor - RahulSChand Star

    Calculate GPU memory requirement & breakdown for training/inference of LLM models. Supports ggml/bnb quantization

  • leptonai - leptonai Star

    A Pythonic framework to simplify AI service building

  • exllamav2 - turboderp Star

    A fast inference library for running LLMs locally on modern consumer-class GPUs

  • outlines - normal-computing Star

    Generative Model Programming

  • one-api - songquanpeng Star

    OpenAI API management & distribution system supporting Azure, Anthropic Claude, Google PaLM 2, Zhipu ChatGLM, Baidu ERNIE Bot, iFLYTEK Spark, and Alibaba Qwen. Useful for redistributing and managing keys via a single API for all LLMs; ships as a single executable with a prebuilt Docker image for one-click deployment, and features an English UI.

  • LLaMA2-Accessory - Alpha-VLLM Star

    An Open-source Toolkit for LLM Development

  • Flowise - FlowiseAI Star

    Drag & drop UI to build your customized LLM flow

  • simpleaichat - minimaxir Star

    Python package for easily interfacing with chat apps, with robust features and minimal code complexity.

  • TypeChat - Microsoft Star

    TypeChat is a library that makes it easy to build natural language interfaces using types.

  • petals - bigscience-workshop Star

    🌸 Run large language models at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

  • chatbox - Bin-Huang Star

    Chatbox is a desktop app for GPT/LLM that supports Windows, Mac, Linux & Web Online

  • h2o-llmstudio - h2oai Star

    H2O LLM Studio - a framework and no-code GUI for fine-tuning LLMs

  • LMFlow - OptimalScale Star

    An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Model for All.

  • FlagAI - FlagAI-Open Star

    FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use, and extensible toolkit for large-scale models.
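Several toolkits above (e.g. gpu_poor) estimate GPU memory requirements for LLM training and inference. A back-of-envelope sketch of the arithmetic involved — these are simplifying assumptions for illustration, not gpu_poor's actual formula:

```python
def estimate_gpu_memory_gb(n_params_b, mode="inference", dtype_bytes=2,
                           kv_cache_gb=0.0):
    """Back-of-envelope GPU memory estimate for a dense LLM.

    n_params_b: parameter count in billions (1B params * 1 byte ~= 1 GB).
    mode: "inference" (weights + KV cache) or "training"
          (weights + fp32 gradients + Adam optimizer states).
    """
    weights = n_params_b * dtype_bytes  # GB
    if mode == "inference":
        return weights + kv_cache_gb
    # Mixed-precision training with Adam: fp32 gradients (4 B/param) plus
    # two fp32 moment buffers (8 B/param) on top of the weights; ignores
    # activations, which depend on batch size and sequence length.
    return weights + n_params_b * (4 + 8)

# A 7B model in fp16 needs ~14 GB just for weights at inference time.
print(estimate_gpu_memory_gb(7))              # 14.0
print(estimate_gpu_memory_gb(7, "training"))  # 98.0
```

Quantization tools listed above (ggml/bnb) shrink the `dtype_bytes` term, which is why 4-bit inference of a 7B model fits on consumer GPUs.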

Unlearning

  • To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models, arXiv, 2407.01920, arxiv, pdf, cication: -1

    Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang

    · (KnowUnDo - zjunlp) Star

  • UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI, arXiv, 2407.00106, arxiv, pdf, cication: -1

    Ilia Shumailov, Jamie Hayes, Eleni Triantafillou, Guillermo Ortiz-Jimenez, Nicolas Papernot, Matthew Jagielski, Itay Yona, Heidi Howard, Eugene Bagdasaryan

  • What makes unlearning hard and what to do about it, arXiv, 2406.01257, arxiv, pdf, cication: -1

    Kairan Zhao, Meghdad Kurmanji, George-Octavian Bărbulescu, Eleni Triantafillou, Peter Triantafillou

  • The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning, arXiv, 2403.03218, arxiv, pdf, cication: -1

    Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan

    • The WMDP benchmark is a curated dataset of over 4,000 questions designed to gauge and mitigate LLMs' knowledge in areas with misuse potential, such as biosecurity and cybersecurity.

  • Machine Unlearning of Pre-trained Large Language Models, arXiv, 2402.15159, arxiv, pdf, cication: -1

    Jin Yao, Eli Chien, Minxin Du, Xinyao Niu, Tianhao Wang, Zezhou Cheng, Xiang Yue · (Unlearning_LLM - yaojin17) Star

  • TOFU: A Task of Fictitious Unlearning for LLMs, arXiv, 2401.06121, arxiv, pdf, cication: -1

    Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, J. Zico Kolter

  • Large Language Model Unlearning, arXiv, 2310.10683, arxiv, pdf, cication: -1

    Yuanshun Yao, Xiaojun Xu, Yang Liu

    · (jiqizhixin) · (llm_unlearn - kevinyaobytedance) Star

  • Improving Language Plasticity via Pretraining with Active Forgetting, arXiv, 2307.01163, arxiv, pdf, cication: -1

    Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe

  • Announcing the first Machine Unlearning Challenge – Google Research Blog

Personality

  • Large Language Models Understand and Can be Enhanced by Emotional Stimuli, arXiv, 2307.11760, arxiv, pdf, cication: 6

    Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie

  • When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities, arXiv, 2307.16376, arxiv, pdf, cication: 7

    Jin Chen, Zheng Liu, Xu Huang, Chenwang Wu, Qi Liu, Gangwei Jiang, Yuanhao Pu, Yuxuan Lei, Xiaolong Chen, Xingmei Wang

  • Personality Traits in Large Language Models, arXiv, 2307.00184, arxiv, pdf, cication: 17

    Greg Serapio-García, Mustafa Safdari, Clément Crepy, Luning Sun, Stephen Fitz, Peter Romero, Marwa Abdulhai, Aleksandra Faust, Maja Matarić

World Model

Forecasting

  • Chronos: Learning the Language of Time Series, arXiv, 2403.07815, arxiv, pdf, cication: -1

    Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor

  • Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting, arXiv, 2310.08278, arxiv, pdf, cication: 11

    Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Hena Ghonia, Rishika Bhagwatkar, Arian Khorasani, Mohammad Javad Darvishi Bayazi, George Adamopoulos, Roland Riachi, Nadhir Hassen · (lag-llama - time-series-foundation-models) Star

  • Time-LLM: Time Series Forecasting by Reprogramming Large Language Models, arXiv, 2310.01728, arxiv, pdf, cication: 17

    Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y. Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan · (time-llm - kimmeen) Star

  • A decoder-only foundation model for time-series forecasting, arXiv, 2310.10688, arxiv, pdf, cication: 2

    Abhimanyu Das, Weihao Kong, Rajat Sen, Yichen Zhou · (jiqizhixin)

Chat arena

  • No-code LLM fine-tuning and evaluation at scale – Airtrain.ai

  • Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference, arXiv, 2403.04132, arxiv, pdf, cication: -1

    Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez

  • GodMode - smol-ai Star

    AI Chat Browser: Fast, Full webapp access to ChatGPT / Claude / Bard / Bing / Llama2! I use this 20 times a day.

  • ChatALL - sunner Star

    Concurrently chat with ChatGPT, Bing Chat, Bard, Alpaca, Vicuna, Claude, ChatGLM, MOSS, 讯飞星火, 文心一言 and more, discover the best answers

State Space Model

  • Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling, arXiv, 2406.07522, arxiv, pdf, cication: -1

    Liliang Ren, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, Weizhu Chen

  • An Empirical Study of Mamba-based Language Models, arXiv, 2406.07887, arxiv, pdf, cication: -1

    Roger Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, Vijay Korthikanti, Tri Dao, Albert Gu, Ali Hatamizadeh, Sudhakar Singh, Deepak Narayanan

  • Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality, arXiv, 2405.21060, arxiv, pdf, cication: -1

    Tri Dao, Albert Gu · (mamba - state-spaces) Star · (goombalab.github)

  • Zamba: A Compact 7B SSM Hybrid Model, arXiv, 2405.16712, arxiv, pdf, cication: -1

    Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Adam Ibrahim, Beren Millidge

  • mamba-7b-rw - TRI-ML 🤗

  • The Illusion of State in State-Space Models, arXiv, 2404.08819, arxiv, pdf, cication: -1

    William Merrill, Jackson Petty, Ashish Sabharwal

  • Zamba — Zyphra

    · (twitter)

  • MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection, arXiv, 2403.19888, arxiv, pdf, cication: -1

    Ali Behrouz, Michele Santacatterina, Ramin Zabih

    · (mambamixer.github)

  • Jamba: A Hybrid Transformer-Mamba Language Model, arXiv, 2403.19887, arxiv, pdf, cication: -1

    Opher Lieber, Barak Lenz, Hofit Bata, Gal Cohen, Jhonathan Osin, Itay Dalmedigos, Erez Safahi, Shaked Meirom, Yonatan Belinkov, Shai Shalev-Shwartz · (ai21) · (huggingface)

  • VideoMamba: State Space Model for Efficient Video Understanding, arXiv, 2403.06977, arxiv, pdf, cication: -1

    Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, Limin Wang, Yu Qiao · (VideoMamba - OpenGVLab) Star

  • DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models, arXiv, 2403.00818, arxiv, pdf, cication: -1

    Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang

  • Mamba: The Easy Way

  • Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks, arXiv, 2402.04248, arxiv, pdf, cication: -1

    Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos

  • Repeat After Me: Transformers are Better than State Space Models at Copying, arXiv, 2402.01032, arxiv, pdf, cication: -1

    Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach

  • BlackMamba: Mixture of Experts for State-Space Models, arXiv, 2402.01771, arxiv, pdf, cication: -1

    Quentin Anthony, Yury Tokpanov, Paolo Glorioso, Beren Millidge · (BlackMamba - Zyphra) Star · (zyphra) · (static1.squarespace)

  • MambaByte: Token-free Selective State Space Model, arXiv, 2401.13660, arxiv, pdf, cication: -1

    Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M Rush

  • MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts, arXiv, 2401.04081, arxiv, pdf, cication: -1

    Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski, Sebastian Jaszczur

  • The Annotated S4

  • Mamba: Linear-Time Sequence Modeling with Selective State Spaces, arXiv, 2312.00752, arxiv, pdf, cication: -1

    Albert Gu, Tri Dao · (mamba - state-spaces) Star
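The papers in this section share the same discrete linear state-space recurrence, h_t = A·h_{t-1} + B·x_t, y_t = C·h_t; selective SSMs such as Mamba additionally make A, B, C functions of the input. A minimal scalar-state sketch of the recurrence (illustrative only — real SSMs use vector states and learned, often input-dependent, parameters):

```python
def ssm_scan(xs, A=0.9, B=1.0, C=1.0, h0=0.0):
    """Run the discrete linear SSM recurrence over a 1-D input sequence:
        h_t = A * h_{t-1} + B * x_t
        y_t = C * h_t
    With |A| < 1 the state is an exponentially decaying summary of the
    past, which is why inference is O(1) memory per step (unlike the
    growing KV cache of a Transformer).
    """
    h, ys = h0, []
    for x in xs:
        h = A * h + B * x
        ys.append(C * h)
    return ys

# An impulse input decays geometrically through the state.
print(ssm_scan([1.0, 0.0, 0.0]))
```

Because the recurrence is linear in h, it can also be computed in parallel over the sequence as a prefix scan — the key to training these models efficiently.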


New model

  • Learning to (Learn at Test Time): RNNs with Expressive Hidden States, arXiv, 2407.04620, arxiv, pdf, cication: -1

    Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, Genghan Zhang, Yann Dubois, Xinlei Chen, Xiaolong Wang, Sanmi Koyejo

  • Simple and Effective Masked Diffusion Language Models, arXiv, 2406.07524, arxiv, pdf, cication: -1

    Subham Sekhar Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, Volodymyr Kuleshov · (mdlm - kuleshov-group) Star

  • A Unified Implicit Attention Formulation for Gated-Linear Recurrent Sequence Models, arXiv, 2405.16504, arxiv, pdf, cication: -1

    Itamar Zimerman, Ameen Ali, Lior Wolf · (UnifiedImplicitAttnRepr - Itamarzimm) Star

  • Attention as an RNN, arXiv, 2405.13956, arxiv, pdf, cication: -1

    Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Mohamed Osama Ahmed, Yoshua Bengio, Greg Mori · (jiqizhixin)

  • linear_open_lm - tri-ml Star

    A repository for research on medium sized language models.

  • xLSTM: Extended Long Short-Term Memory, arXiv, 2405.04517, arxiv, pdf, cication: -1

    Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter

  • HGRN2: Gated Linear RNNs with State Expansion, arXiv, 2404.07904, arxiv, pdf, cication: -1

    Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong

  • Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence, arXiv, 2404.05892, arxiv, pdf, cication: -1

    Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou · (RWKV-LM - RWKV) Star · (ChatRWKV - RWKV) Star

  • RWKV-LM - RWKV Star

    RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

  • Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens, arXiv, 2401.17377, arxiv, pdf, cication: -1

    Jiacheng Liu, Sewon Min, Luke Zettlemoyer, Yejin Choi, Hannaneh Hajishirzi

  • Transfer Learning for Text Diffusion Models, arXiv, 2401.17181, arxiv, pdf, cication: -1

    Kehang Han, Kathleen Kenealy, Aditya Barua, Noah Fiedel, Noah Constant

  • 🦅 Eagle 7B : Soaring past Transformers with 1 Trillion Tokens Across 100+ Languages (RWKV-v5)

  • Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers

  • TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing, arXiv, 2312.05605, arxiv, pdf, cication: -1

    Aleksandar Terzic, Michael Hersche, Geethan Karunaratne, Luca Benini, Abu Sebastian, Abbas Rahimi

  • GIVT: Generative Infinite-Vocabulary Transformers, arXiv, 2312.02116, arxiv, pdf, cication: -1

    Michael Tschannen, Cian Eastwood, Fabian Mentzer

  • Text Rendering Strategies for Pixel Language Models, arXiv, 2311.00522, arxiv, pdf, cication: -1

    Jonas F. Lotz, Elizabeth Salesky, Phillip Rust, Desmond Elliott

  • Retentive Network: A Successor to Transformer for Large Language Models, arXiv, 2307.08621, arxiv, pdf, cication: 14

    Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei

  • Copy Is All You Need, arXiv, 2307.06962, arxiv, pdf, cication: 217

    Tian Lan, Deng Cai, Yan Wang, Heyan Huang, Xian-Ling Mao

  • BiPhone: Modeling Inter Language Phonetic Influences in Text, arXiv, 2307.03322, arxiv, pdf, cication: -1

    Abhirut Gupta, Ananya B. Sai, Richard Sproat, Yuri Vasilevski, James S. Ren, Ambarish Jash, Sukhdeep S. Sodhi, Aravindan Raghuveer

  • Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference, arXiv, 2306.12509, arxiv, pdf, cication: 4

    Alessandro Sordoni, Xingdi Yuan, Marc-Alexandre Côté, Matheus Pereira, Adam Trischler, Ziang Xiao, Arian Hosseini, Friederike Niedtner, Nicolas Le Roux

  • Backpack Language Models, arXiv, 2305.16765, arxiv, pdf, cication: 4

    John Hewitt, John Thickstun, Christopher D. Manning, Percy Liang · (jiqizhixin) · (mp.weixin.qq)
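Infini-gram (listed above) answers unbounded-length n-gram queries by backing off to the longest context suffix that occurs in the corpus and counting its continuations. The real engine does this over trillions of tokens with suffix arrays; the naive scan below is only a sketch of the query semantics:

```python
def infgram_next_token_counts(corpus, context):
    """Infinity-gram backoff: find the longest suffix of `context` that
    occurs in `corpus`, and count which tokens follow each occurrence.
    Naive O(n*m) scan for illustration; Infini-gram answers the same
    query in milliseconds via suffix arrays."""
    for start in range(len(context)):  # try longest suffix first
        suffix = context[start:]
        counts = {}
        for i in range(len(corpus) - len(suffix)):
            if corpus[i:i + len(suffix)] == suffix:
                nxt = corpus[i + len(suffix)]
                counts[nxt] = counts.get(nxt, 0) + 1
        if counts:
            return suffix, counts
    return [], {}

corpus = "the cat sat on the mat the cat ate".split()
print(infgram_next_token_counts(corpus, "on the cat".split()))
```

Here "on the cat" never occurs, so the query backs off to the bigram "the cat", whose continuations ("sat", "ate") define the next-token distribution.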

LLM detection

  • MarkLLM: An Open-Source Toolkit for LLM Watermarking, arXiv, 2405.10051, arxiv, pdf, cication: -1

    Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, Shuliang Liu, Xuming Hu, Lijie Wen · (markllm - thu-bpm) Star

  • Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models, arXiv, 2404.02936, arxiv, pdf, cication: -1

    Jingyang Zhang, Jingwei Sun, Eric Yeats, Yang Ouyang, Martin Kuo, Jianyi Zhang, Hao Yang, Hai Li · (zjysteven.github) · (mink-plus-plus - zjysteven) Star

  • Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text, arXiv, 2403.05750, arxiv, pdf, cication: -1

    Sara Abdali, Richard Anarfi, CJ Barberan, Jia He

  • AI Watermarking 101: Tools and Techniques

  • Watermarking Makes Language Models Radioactive, arXiv, 2402.14904, arxiv, pdf, cication: -1

    Tom Sander, Pierre Fernandez, Alain Durmus, Matthijs Douze, Teddy Furon

  • HuRef: HUman-REadable Fingerprint for Large Language Models, arXiv, 2312.04828, arxiv, pdf, cication: -1

    Boyi Zeng, Chenghu Zhou, Xinbing Wang, Zhouhan Lin · (jiqizhixin)

  • LLM-generated-text-detection - thunlp Star

  • Adaptive Text Watermark for Large Language Models, arXiv, 2401.13927, arxiv, pdf, cication: -1

    Yepeng Liu, Yuheng Bu

  • Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text, arXiv, 2401.12070, arxiv, pdf, cication: -1

    Abhimanyu Hans, Avi Schwarzschild, Valeriia Cherepanova, Hamid Kazemi, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein

  • LLM-as-a-Coauthor: The Challenges of Detecting LLM-Human Mixcase, arXiv, 2401.05952, arxiv, pdf, cication: -1

    Chujie Gao, Dongping Chen, Qihui Zhang, Yue Huang, Yao Wan, Lichao Sun · (MixSet - Dongping-Chen) Star

  • A Survey of Text Watermarking in the Era of Large Language Models, arXiv, 2312.07913, arxiv, pdf, cication: -1

    Aiwei Liu, Leyi Pan, Yijian Lu, Jingjing Li, Xuming Hu, Xi Zhang, Lijie Wen, Irwin King, Hui Xiong, Philip S. Yu · (jiqizhixin)

  • Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature, arXiv, 2310.05130, arxiv, pdf, cication: 17

    Guangsheng Bao, Yanbin Zhao, Zhiyang Teng, Linyi Yang, Yue Zhang · (fast-detect-gpt - baoguangsheng) Star · (jiqizhixin)

  • Ghostbuster: Detecting Text Ghostwritten by Large Language Models, arXiv, 2305.15047, arxiv, pdf, cication: 6

    Vivek Verma, Eve Fleisig, Nicholas Tomlin, Dan Klein · (bair.berkeley)

  • ‘ChatGPT detector’ catches AI-generated papers with unprecedented accuracy

  • GPT detectors are biased against non-native English writers, arXiv, 2304.02819, arxiv, pdf, cication: 42

    Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, James Zou

  • Can LLM-Generated Misinformation Be Detected?, arXiv, 2309.13788, arxiv, pdf, cication: -1

    Canyu Chen, Kai Shu · (llm-misinformation - llm-misinformation) Star

  • Three Bricks to Consolidate Watermarks for Large Language Models, arXiv, 2308.00113, arxiv, pdf, cication: 3

    Pierre Fernandez, Antoine Chaffin, Karim Tit, Vivien Chappelier, Teddy Furon

  • Robust Distortion-free Watermarks for Language Models, arXiv, 2307.15593, arxiv, pdf, cication: 9

    Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, Percy Liang

  • Can AI-Generated Text be Reliably Detected?, arXiv, 2303.11156, arxiv, pdf, cication: 93

    Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, Soheil Feizi · (mp.weixin.qq)

  • Digital tool spots academic text spawned by ChatGPT with 99% accuracy | The University of Kansas

    · (mp.weixin.qq)
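Many of the watermarking papers above detect a green-list watermark the same basic way: count how many tokens fall in the (seeded) "green" vocabulary partition and run a one-proportion z-test against the null hypothesis that a fraction γ would land there by chance. An illustrative sketch of that test (standard notation, not any one paper's exact detector):

```python
import math

def watermark_z_score(num_green, num_tokens, gamma=0.5):
    """One-proportion z-test for green-list watermark detection.
    Under H0 (unwatermarked text) each token is green with probability
    gamma, so:
        z = (num_green - gamma * T) / sqrt(T * gamma * (1 - gamma))
    A large z (e.g. > 4) is strong evidence the text is watermarked.
    """
    expected = gamma * num_tokens
    std = math.sqrt(num_tokens * gamma * (1 - gamma))
    return (num_green - expected) / std

# 90 of 100 tokens green with gamma = 0.5: z = (90 - 50) / 5 = 8.0
print(watermark_z_score(90, 100))
```

The test needs no model access, only the seeding scheme — which is also why paraphrasing attacks (see "Can AI-Generated Text be Reliably Detected?") work by pushing the green fraction back toward γ.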

Interpretability

  • Can LLMs Learn by Teaching? A Preliminary Study, arXiv, 2406.14629, arxiv, pdf, cication: -1

    Xuefei Ning, Zifu Wang, Shiyao Li, Zinan Lin, Peiran Yao, Tianyu Fu, Matthew B. Blaschko, Guohao Dai, Huazhong Yang, Yu Wang

    · (lbt - imagination-research) Star

  • The Missing Curve Detectors of InceptionV1: Applying Sparse Autoencoders to InceptionV1 Early Vision, arXiv, 2406.03662, arxiv, pdf, cication: -1

    Liv Gorton

    · (livgorton)

  • From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries, arXiv, 2406.12824, arxiv, pdf, cication: -1

    Hitesh Wadhwa, Rahul Seetharaman, Somyaa Aggarwal, Reshmi Ghosh, Samyadeep Basu, Soundararajan Srinivasan, Wenlong Zhao, Shreyas Chaudhari, Ehsan Aghazadeh

  • How Do Large Language Models Acquire Factual Knowledge During Pretraining?, arXiv, 2406.11813, arxiv, pdf, cication: -1

    Hoyeon Chang, Jinho Park, Seonghyeon Ye, Sohee Yang, Youngkyung Seo, Du-Seong Chang, Minjoon Seo

  • What Do Neural Networks Really Learn? Exploring the Brain of an AI Model - YouTube

  • Scaling interpretability - YouTube

  • Large Language Model Confidence Estimation via Black-Box Access, arXiv, 2406.04370, arxiv, pdf, cication: -1

    Tejaswini Pedapati, Amit Dhurandhar, Soumya Ghosh, Soham Dan, Prasanna Sattigeri

  • sparse-autoencoders.pdf

  • llm.c by Hand

  • Not All Language Model Features Are Linear, arXiv, 2405.14860, arxiv, pdf, cication: -1

    Joshua Engels, Isaac Liao, Eric J. Michaud, Wes Gurnee, Max Tegmark

  • Your Transformer is Secretly Linear, arXiv, 2405.12250, arxiv, pdf, cication: -1

    Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Nikolai Gerasimenko, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

  • Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

    · (anthropic)

  • The Platonic Representation Hypothesis, arXiv, 2405.07987, arxiv, pdf, cication: -1

    Minyoung Huh, Brian Cheung, Tongzhou Wang, Phillip Isola · (platonic-rep - minyoungg) Star · (phillipi.github)

  • Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models, arXiv, 2405.05417, arxiv, pdf, cication: -1

    Sander Land, Max Bartolo · (magikarp - cohere-ai) Star

  • A Primer on the Inner Workings of Transformer-based Language Models, arXiv, 2405.00208, arxiv, pdf, cication: -1

    Javier Ferrando, Gabriele Sarti, Arianna Bisazza, Marta R. Costa-jussà

  • Understanding Emergent Abilities of Language Models from the Loss Perspective, arXiv, 2403.15796, arxiv, pdf, cication: -1

    Zhengxiao Du, Aohan Zeng, Yuxiao Dong, Jie Tang

  • Circuits Updates - April 2024

  • Transformers Can Represent $n$-gram Language Models, arXiv, 2404.14994, arxiv, pdf, cication: -1

    Anej Svete, Ryan Cotterell

  • A Multimodal Automated Interpretability Agent, arXiv, 2404.14394, arxiv, pdf, cication: -1

    Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang, Achyuta Rajaram, Evan Hernandez, Jacob Andreas, Antonio Torralba

  • llm-transparency-tool - facebookresearch Star

  • Compression Represents Intelligence Linearly, arXiv, 2404.09937, arxiv, pdf, cication: -1

    Yuzhen Huang, Jinghan Zhang, Zifei Shan, Junxian He

    · (huggingface) · (llm-compression-intelligence - hkust-nlp) Star

  • color-coded-text-generation - joaogante 🤗

  • LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models, arXiv, 2404.03118, arxiv, pdf, cication: -1

    Gabriela Ben Melech Stan, Raanan Yehezkel Rohekar, Yaniv Gurwicz, Matthew Lyle Olson, Anahita Bhiwandiwalla, Estelle Aflalo, Chenfei Wu, Nan Duan, Shao-Yen Tseng, Vasudev Lal

  • Source-Aware Training Enables Knowledge Attribution in Language Models, arXiv, 2404.01019, arxiv, pdf, cication: -1

    Muhammad Khalifa, David Wadden, Emma Strubell, Honglak Lee, Lu Wang, Iz Beltagy, Hao Peng

  • Future Lens: Anticipating Subsequent Tokens from a Single Hidden State, arXiv, 2311.04897, arxiv, pdf, cication: -1

    Koyena Pal, Jiuding Sun, Andrew Yuan, Byron C. Wallace, David Bau

  • Localizing Paragraph Memorization in Language Models, arXiv, 2403.19851, arxiv, pdf, cication: -1

    Niklas Stoehr, Mitchell Gordon, Chiyuan Zhang, Owen Lewis

  • SAE-VIS: Announcement Post — LessWrong

  • Circuits Updates - March 2024

  • pyvene: A Library for Understanding and Improving PyTorch Models via Interventions, arXiv, 2403.07809, arxiv, pdf, cication: -1

    Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts · (pyvene - stanfordnlp) Star

  • transformer-debugger - openai Star

    · (jiqizhixin)

  • Logits of API-Protected LLMs Leak Proprietary Information, arXiv, 2403.09539, arxiv, pdf, cication: -1

    Matthew Finlayson, Xiang Ren, Swabha Swayamdipta

    · (qbitai)

  • Stealing Part of a Production Language Model, arXiv, 2403.06634, arxiv, pdf, cication: -1

    Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy

    · (qbitai)

    • extracting information from black-box language models like OpenAI's ChatGPT and Google's PaLM-2 (revealing for the first time the hidden dimensions of these models)

  • Reflections on Qualitative Research

  • Claude-3's uncanny "awareness"

  • AtP*: An efficient and scalable method for localizing LLM behaviour to components, arXiv, 2403.00745, arxiv, pdf, cication: -1

    János Kramár, Tom Lieberum, Rohin Shah, Neel Nanda

  • Circuits Updates - February 2024

  • A phase transition between positional and semantic learning in a solvable model of dot-product attention, arXiv, 2402.03902, arxiv, pdf, cication: -1

    Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborová

  • fractal - sohl-dickstein Star

    The boundary of neural network trainability is fractal

  • Rethinking Interpretability in the Era of Large Language Models, arXiv, 2402.01761, arxiv, pdf, cication: -1

    Chandan Singh, Jeevana Priya Inala, Michel Galley, Rich Caruana, Jianfeng Gao

  • Can Large Language Models Understand Context?, arXiv, 2402.00858, arxiv, pdf, cication: -1

    Yilun Zhu, Joel Ruben Antony Moniz, Shruti Bhargava, Jiarui Lu, Dhivya Piraviperumal, Site Li, Yuan Zhang, Hong Yu, Bo-Hsiang Tseng

  • Circuits Updates - January 2024

  • Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models, arXiv, 2401.06102, arxiv, pdf, cication: -1

    Asma Ghandeharioun, Avi Caciularu, Adam Pearce, Lucas Dixon, Mor Geva

  • Vayu Robotics Blog - Interpretable End-to-End Robot Navigation

  • Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1) — AI Alignment Forum

  • deep learning does approximate Solomonoff induction

  • awesome-llm-interpretability - JShollaj Star

    A curated list of Large Language Model (LLM) Interpretability resources.

  • Challenges with unsupervised LLM knowledge discovery, arXiv, 2312.10029, arxiv, pdf, cication: -1

    Sebastian Farquhar, Vikrant Varma, Zachary Kenton, Johannes Gasteiger, Vladimir Mikulik, Rohin Shah

  • Using Captum to Explain Generative Language Models, arXiv, 2312.05491, arxiv, pdf, cication: -1

    Vivek Miglani, Aobo Yang, Aram H. Markosyan, Diego Garcia-Olano, Narine Kokhlikyan

  • Beyond Surface: Probing LLaMA Across Scales and Layers, arXiv, 2312.04333, arxiv, pdf, cication: -1

    Nuo Chen, Ning Wu, Shining Liang, Ming Gong, Linjun Shou, Dongmei Zhang, Jia Li

  • llm-viz - bbycroft Star

    3D visualization of a GPT-style LLM

  • White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?, arXiv, 2311.13110, arxiv, pdf, cication: -1

    Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Hao Bai, Yuexiang Zhai, Benjamin D. Haeffele, Yi Ma · (mp.weixin.qq)

  • Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models, arXiv, 2311.00871, arxiv, pdf, cication: -1

    Steve Yadlowsky, Lyric Doshi, Nilesh Tripuraneni · (jiqizhixin)

  • The Generative AI Paradox: "What It Can Create, It May Not Understand", arXiv, 2311.00059, arxiv, pdf, cication: -1

    Peter West, Ximing Lu, Nouha Dziri, Faeze Brahman, Linjie Li, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu

  • The Impact of Depth and Width on Transformer Language Model Generalization, arXiv, 2310.19956, arxiv, pdf, cication: -1

    Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Dan Garrette, Tal Linzen

  • Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations, arXiv, 2310.11207, arxiv, pdf, cication: -1

    Shiyuan Huang, Siddarth Mamidanna, Shreedhar Jangam, Yilun Zhou, Leilani H. Gilpin

  • Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

    · (qbitai)

  • Representation Engineering: A Top-Down Approach to AI Transparency, arXiv, 2310.01405, arxiv, pdf, cication: 5

    Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski · (representation-engineering - andyzoujm) Star · (mp.weixin.qq)

  • Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models, arXiv, 2309.15098, arxiv, pdf, cication: -1

    Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi

  • Language Modeling Is Compression, arXiv, 2309.10668, arxiv, pdf, cication: 7

    Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau

  • Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT), arXiv, 2309.08968, arxiv, pdf, cication: -1

    Parsa Kavehzadeh, Mojtaba Valipour, Marzieh Tahaei, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh

  • Sparse Autoencoders Find Highly Interpretable Features in Language Models, arXiv, 2309.08600, arxiv, pdf, cication: 5

    Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, Lee Sharkey

  • Human Language Understanding & Reasoning

    · (mp.weixin.qq)

  • Do Machine Learning Models Memorize or Generalize?

    · (mp.weixin.qq)

  • CIMI - Daftstone Star

    · (jiqizhixin)


  • Studying Large Language Model Generalization with Influence Functions, arXiv, 2308.03296, arxiv, pdf, cication: 12

    Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez

  • Can foundation models label data like humans?

  • Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer, arXiv, 2305.16380, arxiv, pdf, cication: 6

    Yuandong Tian, Yiping Wang, Beidi Chen, Simon Du · (mp.weixin.qq)

  • Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning, arXiv, 2305.14160, arxiv, pdf, cication: -1

    Lean Wang, Lei Li, Damai Dai, Deli Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun · (qbitai)

  • How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers, arXiv, 2211.03495, arxiv, pdf, cication: 16

    Michael Hassid, Hao Peng, Daniel Rotem, Jungo Kasai, Ivan Montero, Noah A. Smith, Roy Schwartz
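Several entries above (Towards Monosemanticity; Sparse Autoencoders Find Highly Interpretable Features) center on dictionary learning over model activations. A minimal NumPy sketch of the core objective, with toy dimensions and random data standing in for real residual-stream activations (all names and sizes are illustrative assumptions, not any paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_dict, n = 16, 64, 128          # toy sizes (assumptions)
acts = rng.normal(size=(n, d_model))      # stand-in for residual-stream activations

W_enc = rng.normal(scale=0.1, size=(d_model, d_dict))
b_enc = np.zeros(d_dict)
W_dec = rng.normal(scale=0.1, size=(d_dict, d_model))

# Encoder: sparse, non-negative feature activations via ReLU
f = np.maximum(acts @ W_enc + b_enc, 0.0)
# Decoder: reconstruct activations as a combination of dictionary directions
recon = f @ W_dec

l2 = np.mean((recon - acts) ** 2)         # reconstruction term
l1 = np.mean(np.abs(f))                   # L1 sparsity penalty
loss = l2 + 1e-3 * l1
print(f"loss={loss:.4f}, fraction of active features={np.mean(f > 0):.2%}")
```

Training this loss (which the sketch omits) drives each dictionary row toward a single interpretable feature; the papers above study how well that works at scale.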

Generalization

  • Time is Encoded in the Weights of Finetuned Language Models, arXiv, 2312.13401, arxiv, pdf, cication: -1

    Kai Nylund, Suchin Gururangan, Noah A. Smith

  • Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models, arXiv, 2311.00871, arxiv, pdf, cication: -1

    Steve Yadlowsky, Lyric Doshi, Nilesh Tripuraneni

LLM editing

  • Composable Interventions for Language Models, arXiv, 2407.06483, arxiv, pdf, cication: -1

    Arinbjorn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen

  • Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance, arXiv, 2406.11139, arxiv, pdf, cication: -1

    Somnath Banerjee, Avik Halder, Rajarshi Mandal, Sayan Layek, Ian Soboroff, Rima Hazra, Animesh Mukherjee

  • Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3, arXiv, 2405.00664, arxiv, pdf, cication: -1

    Junsang Yoon, Akshat Gupta, Gopala Anumanchipalli

  • Robust and Scalable Model Editing for Large Language Models, arXiv, 2403.17431, arxiv, pdf, cication: -1

    Yingfa Chen, Zhengyan Zhang, Xu Han, Chaojun Xiao, Zhiyuan Liu, Chen Chen, Kuai Li, Tao Yang, Maosong Sun · (EREN - thunlp) Star

  • Editing Conceptual Knowledge for Large Language Models, arXiv, 2403.06259, arxiv, pdf, cication: -1

    Xiaohan Wang, Shengyu Mao, Ningyu Zhang, Shumin Deng, Yunzhi Yao, Yue Shen, Lei Liang, Jinjie Gu, Huajun Chen

    · (zjukg) · (EasyEdit - zjunlp) Star

  • A Comprehensive Study of Knowledge Editing for Large Language Models, arXiv, 2401.01286, arxiv, pdf, cication: -1

    Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni

  • Evaluating the Ripple Effects of Knowledge Editing in Language Models, arXiv, 2307.12976, arxiv, pdf, cication: 5

    Roi Cohen, Eden Biran, Ori Yoran, Amir Globerson, Mor Geva

  • Editing Large Language Models: Problems, Methods, and Opportunities, arXiv, 2305.13172, arxiv, pdf, cication: 12

    Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, Ningyu Zhang · (easyedit - zjunlp) Star

  • ModelEditingPapers - zjunlp Star

    Must-read Papers on Model Editing.
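Many of the editing methods surveyed above (e.g. the ROME-style approaches implemented in EasyEdit) reduce to a low-rank update that rewrites one key-value association stored in a weight matrix. A toy NumPy sketch of the rank-one idea, with made-up dimensions and random vectors standing in for the real "subject key" and "new fact value":

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 8
W = rng.normal(size=(d_out, d_in))   # stand-in for an MLP projection matrix

k = rng.normal(size=d_in)            # "key": representation of the edited subject
v_new = rng.normal(size=d_out)       # "value": representation of the new fact

# Rank-one update so that W_edited @ k == v_new exactly,
# while any direction orthogonal to k is left untouched.
W_edited = W + np.outer(v_new - W @ k, k) / (k @ k)

assert np.allclose(W_edited @ k, v_new)

# Sanity check: a direction orthogonal to k is unaffected by the edit.
q = rng.normal(size=d_in)
q -= (q @ k) / (k @ k) * k
assert np.allclose(W_edited @ q, W @ q)
```

Real methods add a covariance-weighted constraint over other keys so unrelated facts survive; the "ripple effects" and batch-size papers above probe where that breaks down.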

AGI insights


Calibration

  • Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation, arXiv, 2311.08877, arxiv, pdf, cication: -1

    Vaishnavi Shrivastava, Percy Liang, Ananya Kumar

  • Do Large Language Models Know What They Don't Know?, arXiv, 2305.18153, arxiv, pdf, cication: 16

    Zhangyue Yin, Qiushi Sun, Qipeng Guo, Jiawen Wu, Xipeng Qiu, Xuanjing Huang
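The calibration papers above ask whether a model's stated confidence tracks its actual accuracy. A standard baseline is temperature scaling, which softens the softmax over logits; a minimal sketch with toy logits (the numbers are made up, not from any model):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.5])   # toy logits for a single prediction

# Higher temperature -> flatter distribution -> lower stated confidence.
for T in (1.0, 2.0, 5.0):
    probs = softmax(logits / T)
    print(f"T={T}: confidence={probs.max():.3f}")
```

In practice the single scalar T is fit on a held-out set so that confidence matches empirical accuracy; an overconfident model gets T > 1.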

Tokenization

  • Tokenization Falling Short: The Curse of Tokenization, arXiv, 2406.11687, arxiv, pdf, cication: -1

    Yekun Chai, Yewei Fang, Qiwei Peng, Xuhong Li

  • Zero-Shot Tokenizer Transfer, arXiv, 2405.07883, arxiv, pdf, cication: -1

    Benjamin Minixhofer, Edoardo Maria Ponti, Ivan Vulić

  • Toward a Theory of Tokenization in LLMs, arXiv, 2404.08335, arxiv, pdf, cication: -1

    Nived Rajaraman, Jiantao Jiao, Kannan Ramchandran

  • Training LLMs over Neurally Compressed Text, arXiv, 2404.03626, arxiv, pdf, cication: -1

    Brian Lester, Jaehoon Lee, Alex Alemi, Jeffrey Pennington, Adam Roberts, Jascha Sohl-Dickstein, Noah Constant

  • Greed is All You Need: An Evaluation of Tokenizer Inference Methods, arXiv, 2403.01289, arxiv, pdf, cication: -1

    Omri Uzan, Craig W. Schmidt, Chris Tanner, Yuval Pinter

  • xT: Nested Tokenization for Larger Context in Large Images, arXiv, 2403.01915, arxiv, pdf, cication: -1

    Ritwik Gupta, Shufan Li, Tyler Zhu, Jitendra Malik, Trevor Darrell, Karttikeya Mangalam · (xT - bair-climate-initiative) Star
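Several entries above (notably "Greed is All You Need") concern tokenizer inference: how a trained vocabulary is applied to new text. A minimal sketch of greedy longest-prefix-match inference against a fixed vocabulary, with a toy vocabulary that is purely illustrative (real tokenizers use learned merges, byte fallback, etc.):

```python
def greedy_tokenize(text, vocab):
    """Greedy longest-prefix-match tokenization against a fixed vocabulary.

    Falls back to single characters, so it never gets stuck.
    """
    tokens, i = [], 0
    while i < len(text):
        # Try the longest possible match first, shrinking until one fits.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

# Toy vocabulary (an assumption, not any real tokenizer's merges)
vocab = {"token", "iza", "tion", "t", "o", "k", "e", "n"}
print(greedy_tokenize("tokenization", vocab))  # → ['token', 'iza', 'tion']
```

The paper above compares this greedy strategy against merge-based and likelihood-based inference and finds the choice matters even with the vocabulary held fixed.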

Books

Privacy

Misc

Impacts

Course & Tutorial

CUDA

Videos

Blogs

Extra reference