-
agi-survey - ulab-uiuc
-
A Survey on Self-Evolution of Large Language Models,
arXiv, 2404.14387
, arxiv, pdf, cication: -1Zhengwei Tao, Ting-En Lin, Xiancai Chen, Hangyu Li, Yuchuan Wu, Yongbin Li, Zhi Jin, Fei Huang, Dacheng Tao, Jingren Zhou
-
State Space Model for New-Generation Network Alternative to Transformers: A Survey,
arXiv, 2404.09516
, arxiv, pdf, cication: -1Xiao Wang, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, Yao Rong, Weizhe Kong, Ju Huang, Shihao Li, Haoxiang Yang
-
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers,
arXiv, 2404.04925
, arxiv, pdf, cication: -1Libo Qin, Qiguang Chen, Yuhang Zhou, Zhi Chen, Yinghui Li, Lizi Liao, Min Li, Wanxiang Che, Philip S. Yu
-
A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias,
arXiv, 2404.00929
, arxiv, pdf, cication: -1Yuemei Xu, Ling Hu, Jiayi Zhao, Zihan Qiu, Yuqi Ye, Hanwen Gu
-
ChatGPT Alternative Solutions: Large Language Models Survey,
arXiv, 2403.14469
, arxiv, pdf, cication: -1Hanieh Alipour, Nick Pendar, Kohinoor Roy
-
Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey,
arXiv, 2403.09606
, arxiv, pdf, cication: -1Xiaoyu Liu, Paiheng Xu, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang Zhou, Fuxiao Liu, Tianrui Guan, Haoliang Wang, Tong Yu
-
Knowledge Conflicts for LLMs: A Survey,
arXiv, 2403.08319
, arxiv, pdf, cication: -1Rongwu Xu, Zehan Qi, Cunxiang Wang, Hongru Wang, Yue Zhang, Wei Xu
-
Large Language Models on Tabular Data -- A Survey,
arXiv, 2402.17944
, arxiv, pdf, cication: -1Xi Fang, Weijie Xu, Fiona Anting Tan, Jiani Zhang, Ziqing Hu, Yanjun Qi, Scott Nickleach, Diego Socolinsky, Srinivasan Sengamedu, Christos Faloutsos
-
Large Language Models and Games: A Survey and Roadmap,
arXiv, 2402.18659
, arxiv, pdf, cication: -1Roberto Gallotta, Graham Todd, Marvin Zammit, Sam Earle, Antonios Liapis, Julian Togelius, Georgios N. Yannakakis
-
A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems,
arXiv, 2402.18013
, arxiv, pdf, cication: -1Zihao Yi, Jiarui Ouyang, Yuwen Liu, Tianhao Liao, Zhe Xu, Ying Shen
-
A Survey of Large Language Models in Cybersecurity,
arXiv, 2402.16968
, arxiv, pdf, cication: -1Gabriel de Jesus Coelho da Silva, Carlos Becker Westphall
-
Large Language Models for Data Annotation: A Survey,
arXiv, 2402.13446
, arxiv, pdf, cication: -1Zhen Tan, Alimohammad Beigi, Song Wang, Ruocheng Guo, Amrita Bhattacharjee, Bohan Jiang, Mansooreh Karami, Jundong Li, Lu Cheng, Huan Liu
-
Large Language Models: A Survey,
arXiv, 2402.06196
, arxiv, pdf, cication: -1Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao
-
Continual Learning for Large Language Models: A Survey,
arXiv, 2402.01364
, arxiv, pdf, cication: -1Tongtong Wu, Linhao Luo, Yuan-Fang Li, Shirui Pan, Thuy-Trang Vu, Gholamreza Haffari
-
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape,
arXiv, 2312.10868
, arxiv, pdf, cication: -1Timothy R. McIntosh, Teo Susnjak, Tong Liu, Paul Watters, Malka N. Halgamuge
-
A Survey of Large Language Models Attribution,
arXiv, 2311.03731
, arxiv, pdf, cication: -1Dongfang Li, Zetian Sun, Xinshuo Hu, Zhenyu Liu, Ziyang Chen, Baotian Hu, Aiguo Wu, Min Zhang · (awesome-llm-attributions - HITsz-TMG)
-
On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models,
arXiv, 2307.09793
, arxiv, pdf, cication: 1Sarah Gao, Andrew Kean Gao · (constellation.sites.stanford)
-
A Survey of Large Language Models,
arXiv, 2303.18223
, arxiv, pdf, cication: 285Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong · (LLMSurvey - RUCAIBox)
-
The History of Open-Source LLMs: Imitation and Alignment (Part Three)
-
Transformer Taxonomy (the last lit review) | kipply's blog
· (jiqizhixin)
-
chat-langchain - langchain-ai
-
ChatGPT4 - yuntian-deng 🤗
-
amazing-openai-api - soulteary
Convert different model APIs into the OpenAI API format out of the box.
-
jan - janhq
Jan is an open source alternative to ChatGPT that runs 100% offline on your computer
-
GPT_API_free - chatanywhere
Free ChatGPT API keys; a free ChatGPT API with GPT-4 support, forwarded so it is reachable from mainland China without a proxy. Works with clients such as ChatBox, greatly reducing API cost, with unrestricted chat from within China.
-
BricksLLM - bricks-cloud
Simplifying LLM ops in production
-
skypilot - skypilot-org
SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.
· (blog.skypilot)
-
vllm - vllm-project
A high-throughput and memory-efficient inference and serving engine for LLMs
-
langflow - logspace-ai
⛓️ LangFlow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment with and prototype flows.
-
torchscale - microsoft
Foundation Architecture for (M)LLMs
-
LLM-As-Chatbot - deep-diver
LLM as a Chatbot Service
-
Llama-2-Open-Source-LLM-CPU-Inference - kennethleungty
Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A
-
ollama - jmorganca
Get up and running with large language models locally
-
OpenLLM - bentoml
An open platform for operating large language models (LLMs) in production. Fine-tune, serve, deploy, and monitor any LLMs with ease.
-
litellm - BerriAI
Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
-
gpu_poor - RahulSChand
Calculate GPU memory requirement & breakdown for training/inference of LLM models. Supports ggml/bnb quantization
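The kind of estimate that gpu_poor automates can also be sketched by hand. The following is a rough, illustrative calculation (not gpu_poor's exact breakdown): inference memory is roughly parameter bytes plus KV-cache bytes. The default layer/head sizes are hypothetical stand-ins for a Llama-7B-shaped model.

```python
def estimate_inference_gb(n_params_b, bytes_per_param=2,
                          n_layers=32, n_kv_heads=32, head_dim=128,
                          batch=1, seq_len=4096, kv_bytes=2):
    """Back-of-envelope LLM inference memory (illustrative only).

    weights : n_params * bytes_per_param  (fp16 = 2, int4 ~ 0.5)
    KV cache: 2 (K and V) * layers * kv_heads * head_dim * batch * seq * bytes
    """
    weights = n_params_b * 1e9 * bytes_per_param
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * batch * seq_len * kv_bytes
    return (weights + kv_cache) / 2**30

# e.g. a 7B model in fp16 with a 4k-token KV cache: ~15 GiB
print(round(estimate_inference_gb(7), 1))
```

Quantizing the weights (smaller `bytes_per_param`) shrinks the first term but leaves the KV cache untouched, which is why long contexts dominate memory at low precision.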
-
leptonai - leptonai
A Pythonic framework to simplify AI service building
-
exllamav2 - turboderp
A fast inference library for running LLMs locally on modern consumer-class GPUs
-
outlines - normal-computing
Generative Model Programming
-
one-api - songquanpeng
OpenAI API management & redistribution system supporting Azure, Anthropic Claude, Google PaLM 2, Zhipu ChatGLM, Baidu ERNIE Bot, iFlytek Spark, and Alibaba Tongyi Qianwen. Can redistribute and manage keys; ships as a single executable with a prebuilt Docker image for one-click deployment, works out of the box, and features an English UI.
-
LLaMA2-Accessory - Alpha-VLLM
An Open-source Toolkit for LLM Development
-
Flowise - FlowiseAI
Drag & drop UI to build your customized LLM flow
-
simpleaichat - minimaxir
Python package for easily interfacing with chat apps, with robust features and minimal code complexity.
-
TypeChat - Microsoft
TypeChat is a library that makes it easy to build natural language interfaces using types.
-
petals - bigscience-workshop
🌸 Run large language models at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
-
chatbox - Bin-Huang
Chatbox is a desktop app for GPT/LLM that supports Windows, Mac, Linux & Web Online
-
h2o-llmstudio - h2oai
H2O LLM Studio - a framework and no-code GUI for fine-tuning LLMs
-
LMFlow - OptimalScale
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Model for All.
-
FlagAI - FlagAI-Open
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale models.
-
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models,
arXiv, 2407.01920
, arxiv, pdf, cication: -1Bozhong Tian, Xiaozhuan Liang, Siyuan Cheng, Qingbin Liu, Mengru Wang, Dianbo Sui, Xi Chen, Huajun Chen, Ningyu Zhang
· (KnowUnDo - zjunlp)
-
UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI,
arXiv, 2407.00106
, arxiv, pdf, cication: -1Ilia Shumailov, Jamie Hayes, Eleni Triantafillou, Guillermo Ortiz-Jimenez, Nicolas Papernot, Matthew Jagielski, Itay Yona, Heidi Howard, Eugene Bagdasaryan
-
What makes unlearning hard and what to do about it,
arXiv, 2406.01257
, arxiv, pdf, cication: -1Kairan Zhao, Meghdad Kurmanji, George-Octavian Bărbulescu, Eleni Triantafillou, Peter Triantafillou
-
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning,
arXiv, 2403.03218
, arxiv, pdf, cication: -1Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan
The WMDP benchmark is a curated dataset of over 4,000 questions designed to gauge and mitigate LLMs' knowledge in areas with misuse potential, such as biosecurity and cybersecurity.
-
Machine Unlearning of Pre-trained Large Language Models,
arXiv, 2402.15159
, arxiv, pdf, cication: -1Jin Yao, Eli Chien, Minxin Du, Xinyao Niu, Tianhao Wang, Zezhou Cheng, Xiang Yue · (Unlearning_LLM - yaojin17)
-
TOFU: A Task of Fictitious Unlearning for LLMs,
arXiv, 2401.06121
, arxiv, pdf, cication: -1Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, J. Zico Kolter
-
Large Language Model Unlearning,
arXiv, 2310.10683
, arxiv, pdf, cication: -1Yuanshun Yao, Xiaojun Xu, Yang Liu
· (jiqizhixin) · (llm_unlearn - kevinyaobytedance)
-
Improving Language Plasticity via Pretraining with Active Forgetting,
arXiv, 2307.01163
, arxiv, pdf, cication: -1Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe
-
Announcing the first Machine Unlearning Challenge – Google Research Blog
-
Large Language Models Understand and Can be Enhanced by Emotional Stimuli,
arXiv, 2307.11760
, arxiv, pdf, cication: 6Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, Xing Xie
-
When Large Language Models Meet Personalization: Perspectives of Challenges and Opportunities,
arXiv, 2307.16376
, arxiv, pdf, cication: 7Jin Chen, Zheng Liu, Xu Huang, Chenwang Wu, Qi Liu, Gangwei Jiang, Yuanhao Pu, Yuxuan Lei, Xiaolong Chen, Xingmei Wang
-
Personality Traits in Large Language Models,
arXiv, 2307.00184
, arxiv, pdf, cication: 17Greg Serapio-García, Mustafa Safdari, Clément Crepy, Luning Sun, Stephen Fitz, Peter Romero, Marwa Abdulhai, Aleksandra Faust, Maja Matarić
-
Efficient World Models with Context-Aware Tokenization,
arXiv, 2406.19320
, arxiv, pdf, cication: -1Vincent Micheli, Eloi Alonso, François Fleuret
· (delta-iris - vmicheli)
-
Can Language Models Serve as Text-Based World Simulators?,
arXiv, 2406.06485
, arxiv, pdf, cication: -1Ruoyao Wang, Graham Todd, Ziang Xiao, Xingdi Yuan, Marc-Alexandre Côté, Peter Clark, Peter Jansen
-
Cognitively Inspired Energy-Based World Models,
arXiv, 2406.08862
, arxiv, pdf, cication: -1Alexi Gladstone, Ganesh Nanduru, Md Mofijul Islam, Aman Chadha, Jundong Li, Tariq Iqbal
-
Pandora - maitrix-org
Pandora: Towards General World Model with Natural Language Actions and Video States · (world-model.maitrix)
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models,
arXiv, 2405.15223
, arxiv, pdf, cication: -1Jialong Wu, Shaofeng Yin, Ningya Feng, Xu He, Dong Li, Jianye Hao, Mingsheng Long
-
Diffusion for World Modeling: Visual Details Matter in Atari,
arXiv, 2405.12399
, arxiv, pdf, cication: -1Eloi Alonso, Adam Jelley, Vincent Micheli, Anssi Kanervisto, Amos Storkey, Tim Pearce, François Fleuret · (diamond - eloialonso)
-
Robust agents learn causal world models,
arXiv, 2402.10877
, arxiv, pdf, cication: -1Jonathan Richens, Tom Everitt
-
Learning and Leveraging World Models in Visual Representation Learning,
arXiv, 2403.00504
, arxiv, pdf, cication: -1Quentin Garrido, Mahmoud Assran, Nicolas Ballas, Adrien Bardes, Laurent Najman, Yann LeCun
-
Video as the New Language for Real-World Decision Making,
arXiv, 2402.17139
, arxiv, pdf, cication: -1Sherry Yang, Jacob Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans
· (mp.weixin.qq)
-
Genie: Generative Interactive Environments,
arXiv, 2402.15391
, arxiv, pdf, cication: -1Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps
-
Diffusion World Model,
arXiv, 2402.03570
, arxiv, pdf, cication: -1Zihan Ding, Amy Zhang, Yuandong Tian, Qinqing Zheng
-
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets,
arXiv, 2310.06824
, arxiv, pdf, cication: -1Samuel Marks, Max Tegmark · (mp.weixin.qq)
-
Language Models Represent Space and Time,
arXiv, 2310.02207
, arxiv, pdf, cication: 2Wes Gurnee, Max Tegmark · (world-models - wesg52)
· (mp.weixin.qq)
-
Chronos: Learning the Language of Time Series,
arXiv, 2403.07815
, arxiv, pdf, cication: -1Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor
-
Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting,
arXiv, 2310.08278
, arxiv, pdf, cication: 11Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Hena Ghonia, Rishika Bhagwatkar, Arian Khorasani, Mohammad Javad Darvishi Bayazi, George Adamopoulos, Roland Riachi, Nadhir Hassen · (lag-llama - time-series-foundation-models)
-
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models,
arXiv, 2310.01728
, arxiv, pdf, cication: 17Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y. Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan · (time-llm - kimmeen)
-
A decoder-only foundation model for time-series forecasting,
arXiv, 2310.10688
, arxiv, pdf, cication: 2Abhimanyu Das, Weihao Kong, Rajat Sen, Yichen Zhou · (jiqizhixin)
-
No-code LLM fine-tuning and evaluation at scale – Airtrain.ai
-
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference,
arXiv, 2403.04132
, arxiv, pdf, cication: -1Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez
-
GodMode - smol-ai
AI Chat Browser: Fast, Full webapp access to ChatGPT / Claude / Bard / Bing / Llama2! I use this 20 times a day.
-
ChatALL - sunner
Concurrently chat with ChatGPT, Bing Chat, Bard, Alpaca, Vicuna, Claude, ChatGLM, MOSS, iFlytek Spark, ERNIE Bot and more, to discover the best answers
-
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling,
arXiv, 2406.07522
, arxiv, pdf, cication: -1Liliang Ren, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, Weizhu Chen
-
An Empirical Study of Mamba-based Language Models,
arXiv, 2406.07887
, arxiv, pdf, cication: -1Roger Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, Vijay Korthikanti, Tri Dao, Albert Gu, Ali Hatamizadeh, Sudhakar Singh, Deepak Narayanan
-
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality,
arXiv, 2405.21060
, arxiv, pdf, cication: -1Tri Dao, Albert Gu · (mamba - state-spaces)
· (goombalab.github)
-
Zamba: A Compact 7B SSM Hybrid Model,
arXiv, 2405.16712
, arxiv, pdf, cication: -1Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Adam Ibrahim, Beren Millidge
-
mamba-7b-rw - TRI-ML 🤗
-
The Illusion of State in State-Space Models,
arXiv, 2404.08819
, arxiv, pdf, cication: -1William Merrill, Jackson Petty, Ashish Sabharwal
· (twitter)
-
MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection,
arXiv, 2403.19888
, arxiv, pdf, cication: -1Ali Behrouz, Michele Santacatterina, Ramin Zabih
-
Jamba: A Hybrid Transformer-Mamba Language Model,
arXiv, 2403.19887
, arxiv, pdf, cication: -1Opher Lieber, Barak Lenz, Hofit Bata, Gal Cohen, Jhonathan Osin, Itay Dalmedigos, Erez Safahi, Shaked Meirom, Yonatan Belinkov, Shai Shalev-Shwartz · (ai21) · (huggingface)
-
VideoMamba: State Space Model for Efficient Video Understanding,
arXiv, 2403.06977
, arxiv, pdf, cication: -1Kunchang Li, Xinhao Li, Yi Wang, Yinan He, Yali Wang, Limin Wang, Yu Qiao · (VideoMamba - OpenGVLab)
-
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models,
arXiv, 2403.00818
, arxiv, pdf, cication: -1Wei He, Kai Han, Yehui Tang, Chengcheng Wang, Yujie Yang, Tianyu Guo, Yunhe Wang
-
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks,
arXiv, 2402.04248
, arxiv, pdf, cication: -1Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos
-
Repeat After Me: Transformers are Better than State Space Models at Copying,
arXiv, 2402.01032
, arxiv, pdf, cication: -1Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach
-
BlackMamba: Mixture of Experts for State-Space Models,
arXiv, 2402.01771
, arxiv, pdf, cication: -1Quentin Anthony, Yury Tokpanov, Paolo Glorioso, Beren Millidge · (BlackMamba - Zyphra)
· (zyphra) · (static1.squarespace)
-
MambaByte: Token-free Selective State Space Model,
arXiv, 2401.13660
, arxiv, pdf, cication: -1Junxiong Wang, Tushaar Gangavarapu, Jing Nathan Yan, Alexander M Rush
-
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts,
arXiv, 2401.04081
, arxiv, pdf, cication: -1Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski, Sebastian Jaszczur
-
Mamba: Linear-Time Sequence Modeling with Selective State Spaces,
arXiv, 2312.00752
, arxiv, pdf, cication: -1Albert Gu, Tri Dao · (mamba - state-spaces)
- Do we need Attention? A Mamba Primer - YouTube
- Mamba Explained
- Recent Mamba Papers - a julien-c Collection
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Paper Explained) - YouTube
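At the heart of the SSM papers above is a linear recurrence over a hidden state. The sketch below shows only the basic time-invariant scan; Mamba additionally makes the matrices input-dependent ("selective"), and all sizes here are illustrative.

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Time-invariant linear state-space recurrence:
        h_t = A @ h_{t-1} + B @ x_t,   y_t = C @ h_t
    (Mamba makes A, B, C functions of the input; this is the plain scan.)
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # sequential scan: O(seq_len), O(1) state
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)               # stable state transition
B = rng.normal(size=(4, 2))       # input projection
C = rng.normal(size=(1, 4))       # output projection
y = ssm_scan(A, B, C, rng.normal(size=(16, 2)))
print(y.shape)                    # (16, 1)
```

The fixed-size state `h` is what gives SSMs constant memory per step at inference time, in contrast to a transformer's growing KV cache.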
-
Learning to (Learn at Test Time): RNNs with Expressive Hidden States,
arXiv, 2407.04620
, arxiv, pdf, cication: -1Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, Genghan Zhang, Yann Dubois, Xinlei Chen, Xiaolong Wang, Sanmi Koyejo
-
Simple and Effective Masked Diffusion Language Models,
arXiv, 2406.07524
, arxiv, pdf, cication: -1Subham Sekhar Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T Chiu, Alexander Rush, Volodymyr Kuleshov · (mdlm - kuleshov-group)
-
A Unified Implicit Attention Formulation for Gated-Linear Recurrent Sequence Models,
arXiv, 2405.16504
, arxiv, pdf, cication: -1Itamar Zimerman, Ameen Ali, Lior Wolf · (UnifiedImplicitAttnRepr - Itamarzimm)
-
Attention as an RNN,
arXiv, 2405.13956
, arxiv, pdf, cication: -1Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Mohamed Osama Ahmed, Yoshua Bengio, Greg Mori · (jiqizhixin)
-
linear_open_lm - tri-ml
A repository for research on medium sized language models.
-
xLSTM: Extended Long Short-Term Memory,
arXiv, 2405.04517
, arxiv, pdf, cication: -1Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter
-
HGRN2: Gated Linear RNNs with State Expansion,
arXiv, 2404.07904
, arxiv, pdf, cication: -1Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong
-
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence,
arXiv, 2404.05892
, arxiv, pdf, cication: -1Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou · (RWKV-LM - RWKV)
· (ChatRWKV - RWKV)
-
RWKV-LM - RWKV
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
-
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens,
arXiv, 2401.17377
, arxiv, pdf, cication: -1Jiacheng Liu, Sewon Min, Luke Zettlemoyer, Yejin Choi, Hannaneh Hajishirzi
-
Transfer Learning for Text Diffusion Models,
arXiv, 2401.17181
, arxiv, pdf, cication: -1Kehang Han, Kathleen Kenealy, Aditya Barua, Noah Fiedel, Noah Constant
-
🦅 Eagle 7B: Soaring past Transformers with 1 Trillion Tokens Across 100+ Languages (RWKV-v5)
-
TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing,
arXiv, 2312.05605
, arxiv, pdf, cication: -1Aleksandar Terzic, Michael Hersche, Geethan Karunaratne, Luca Benini, Abu Sebastian, Abbas Rahimi
-
GIVT: Generative Infinite-Vocabulary Transformers,
arXiv, 2312.02116
, arxiv, pdf, cication: -1Michael Tschannen, Cian Eastwood, Fabian Mentzer
-
Text Rendering Strategies for Pixel Language Models,
arXiv, 2311.00522
, arxiv, pdf, cication: -1Jonas F. Lotz, Elizabeth Salesky, Phillip Rust, Desmond Elliott
-
Retentive Network: A Successor to Transformer for Large Language Models,
arXiv, 2307.08621
, arxiv, pdf, cication: 14Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
-
Copy Is All You Need,
arXiv, 2307.06962
, arxiv, pdf, cication: 217Tian Lan, Deng Cai, Yan Wang, Heyan Huang, Xian-Ling Mao
-
BiPhone: Modeling Inter Language Phonetic Influences in Text,
arXiv, 2307.03322
, arxiv, pdf, cication: -1Abhirut Gupta, Ananya B. Sai, Richard Sproat, Yuri Vasilevski, James S. Ren, Ambarish Jash, Sukhdeep S. Sodhi, Aravindan Raghuveer
-
Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference,
arXiv, 2306.12509
, arxiv, pdf, cication: 4Alessandro Sordoni, Xingdi Yuan, Marc-Alexandre Côté, Matheus Pereira, Adam Trischler, Ziang Xiao, Arian Hosseini, Friederike Niedtner, Nicolas Le Roux
-
Backpack Language Models,
arXiv, 2305.16765
, arxiv, pdf, cication: 4John Hewitt, John Thickstun, Christopher D. Manning, Percy Liang · (jiqizhixin) · (mp.weixin.qq)
-
MarkLLM: An Open-Source Toolkit for LLM Watermarking,
arXiv, 2405.10051
, arxiv, pdf, cication: -1Leyi Pan, Aiwei Liu, Zhiwei He, Zitian Gao, Xuandong Zhao, Yijian Lu, Binglin Zhou, Shuliang Liu, Xuming Hu, Lijie Wen · (markllm - thu-bpm)
-
Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models,
arXiv, 2404.02936
, arxiv, pdf, cication: -1Jingyang Zhang, Jingwei Sun, Eric Yeats, Yang Ouyang, Martin Kuo, Jianyi Zhang, Hao Yang, Hai Li · (zjysteven.github) · (mink-plus-plus - zjysteven)
-
Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text,
arXiv, 2403.05750
, arxiv, pdf, cication: -1Sara Abdali, Richard Anarfi, CJ Barberan, Jia He
-
Watermarking Makes Language Models Radioactive,
arXiv, 2402.14904
, arxiv, pdf, cication: -1Tom Sander, Pierre Fernandez, Alain Durmus, Matthijs Douze, Teddy Furon
-
HuRef: HUman-REadable Fingerprint for Large Language Models,
arXiv, 2312.04828
, arxiv, pdf, cication: -1Boyi Zeng, Chenghu Zhou, Xinbing Wang, Zhouhan Lin · (jiqizhixin)
-
LLM-generated-text-detection - thunlp
-
Adaptive Text Watermark for Large Language Models,
arXiv, 2401.13927
, arxiv, pdf, cication: -1Yepeng Liu, Yuheng Bu
-
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text,
arXiv, 2401.12070
, arxiv, pdf, cication: -1Abhimanyu Hans, Avi Schwarzschild, Valeriia Cherepanova, Hamid Kazemi, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein
-
LLM-as-a-Coauthor: The Challenges of Detecting LLM-Human Mixcase,
arXiv, 2401.05952
, arxiv, pdf, cication: -1Chujie Gao, Dongping Chen, Qihui Zhang, Yue Huang, Yao Wan, Lichao Sun · (MixSet - Dongping-Chen)
-
A Survey of Text Watermarking in the Era of Large Language Models,
arXiv, 2312.07913
, arxiv, pdf, cication: -1Aiwei Liu, Leyi Pan, Yijian Lu, Jingjing Li, Xuming Hu, Xi Zhang, Lijie Wen, Irwin King, Hui Xiong, Philip S. Yu · (jiqizhixin)
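A concrete toy version of what these watermarking papers formalize: in the common "green list" scheme, the previous token pseudo-randomly partitions the vocabulary, generation favors green tokens, and detection computes a z-score on the green count. The hash construction and vocabulary size below are toy assumptions, not any specific paper's scheme.

```python
import hashlib
import math

VOCAB, GAMMA = 1000, 0.5   # toy vocabulary size; GAMMA = green fraction

def is_green(prev_token: int, token: int) -> bool:
    # Toy PRF: the previous token seeds the green/red split of the vocabulary.
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return digest[0] < 128          # green with probability ~GAMMA

def z_score(tokens):
    """Count green transitions; without a watermark the count is ~Binomial(T, GAMMA)."""
    T = len(tokens) - 1
    greens = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (greens - GAMMA * T) / math.sqrt(T * GAMMA * (1 - GAMMA))

def watermarked_text(length, start=0):
    # A maximally biased "generator" that always emits some green token.
    toks = [start]
    for _ in range(length):
        toks.append(next(t for t in range(VOCAB) if is_green(toks[-1], t)))
    return toks

print(z_score(watermarked_text(200)) > 4)   # True: every transition is green
```

Real schemes only softly bias the logits toward green tokens, trading detection strength for text quality; the z-test is the same.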
-
Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature,
arXiv, 2310.05130
, arxiv, pdf, cication: 17Guangsheng Bao, Yanbin Zhao, Zhiyang Teng, Linyi Yang, Yue Zhang · (fast-detect-gpt - baoguangsheng)
· (jiqizhixin)
-
Ghostbuster: Detecting Text Ghostwritten by Large Language Models,
arXiv, 2305.15047
, arxiv, pdf, cication: 6Vivek Verma, Eve Fleisig, Nicholas Tomlin, Dan Klein · (bair.berkeley)
-
‘ChatGPT detector’ catches AI-generated papers with unprecedented accuracy
-
GPT detectors are biased against non-native English writers,
arXiv, 2304.02819
, arxiv, pdf, cication: 42Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, James Zou
-
Can LLM-Generated Misinformation Be Detected?,
arXiv, 2309.13788
, arxiv, pdf, cication: -1Canyu Chen, Kai Shu · (llm-misinformation - llm-misinformation)
-
Three Bricks to Consolidate Watermarks for Large Language Models,
arXiv, 2308.00113
, arxiv, pdf, cication: 3Pierre Fernandez, Antoine Chaffin, Karim Tit, Vivien Chappelier, Teddy Furon
-
Robust Distortion-free Watermarks for Language Models,
arXiv, 2307.15593
, arxiv, pdf, cication: 9Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, Percy Liang
-
Can AI-Generated Text be Reliably Detected?,
arXiv, 2303.11156
, arxiv, pdf, cication: 93Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, Soheil Feizi · (mp.weixin.qq)
-
Digital tool spots academic text spawned by ChatGPT with 99% accuracy | The University of Kansas
· (mp.weixin.qq)
-
Can LLMs Learn by Teaching? A Preliminary Study,
arXiv, 2406.14629
, arxiv, pdf, cication: -1Xuefei Ning, Zifu Wang, Shiyao Li, Zinan Lin, Peiran Yao, Tianyu Fu, Matthew B. Blaschko, Guohao Dai, Huazhong Yang, Yu Wang
· (lbt - imagination-research)
-
The Missing Curve Detectors of InceptionV1: Applying Sparse Autoencoders to InceptionV1 Early Vision,
arXiv, 2406.03662
, arxiv, pdf, cication: -1Liv Gorton
· (livgorton)
-
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries,
arXiv, 2406.12824
, arxiv, pdf, cication: -1Hitesh Wadhwa, Rahul Seetharaman, Somyaa Aggarwal, Reshmi Ghosh, Samyadeep Basu, Soundararajan Srinivasan, Wenlong Zhao, Shreyas Chaudhari, Ehsan Aghazadeh
-
How Do Large Language Models Acquire Factual Knowledge During Pretraining?,
arXiv, 2406.11813
, arxiv, pdf, cication: -1Hoyeon Chang, Jinho Park, Seonghyeon Ye, Sohee Yang, Youngkyung Seo, Du-Seong Chang, Minjoon Seo
-
What Do Neural Networks Really Learn? Exploring the Brain of an AI Model - YouTube
-
Large Language Model Confidence Estimation via Black-Box Access,
arXiv, 2406.04370
, arxiv, pdf, cication: -1Tejaswini Pedapati, Amit Dhurandhar, Soumya Ghosh, Soham Dan, Prasanna Sattigeri
-
Not All Language Model Features Are Linear,
arXiv, 2405.14860
, arxiv, pdf, cication: -1Joshua Engels, Isaac Liao, Eric J. Michaud, Wes Gurnee, Max Tegmark
-
Your Transformer is Secretly Linear,
arXiv, 2405.12250
, arxiv, pdf, cication: -1Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Nikolai Gerasimenko, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov
-
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
· (anthropic)
-
The Platonic Representation Hypothesis,
arXiv, 2405.07987
, arxiv, pdf, cication: -1Minyoung Huh, Brian Cheung, Tongzhou Wang, Phillip Isola · (platonic-rep - minyoungg)
· (phillipi.github)
-
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models,
arXiv, 2405.05417
, arxiv, pdf, cication: -1Sander Land, Max Bartolo · (magikarp - cohere-ai)
-
A Primer on the Inner Workings of Transformer-based Language Models,
arXiv, 2405.00208
, arxiv, pdf, cication: -1Javier Ferrando, Gabriele Sarti, Arianna Bisazza, Marta R. Costa-jussà
-
Understanding Emergent Abilities of Language Models from the Loss Perspective,
arXiv, 2403.15796
, arxiv, pdf, cication: -1Zhengxiao Du, Aohan Zeng, Yuxiao Dong, Jie Tang
-
Transformers Can Represent $n$-gram Language Models,
arXiv, 2404.14994
, arxiv, pdf, cication: -1Anej Svete, Ryan Cotterell
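A minimal concrete instance of the $n$-gram models in question: a maximum-likelihood bigram LM over a tiny made-up corpus (purely illustrative).

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Maximum-likelihood bigram LM: P(w | prev) = count(prev, w) / count(prev)."""
    counts = defaultdict(Counter)
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for prev, w in zip(toks, toks[1:]):
            counts[prev][w] += 1
    return {prev: {w: c / sum(ctr.values()) for w, c in ctr.items()}
            for prev, ctr in counts.items()}

lm = train_bigram(["the cat sat", "the cat ran", "the dog sat"])
print(lm["cat"])         # {'sat': 0.5, 'ran': 0.5}
print(lm["<s>"]["the"])  # 1.0
```

The representational question in the paper is whether a transformer's attention and MLP layers can encode exactly such conditional tables for fixed $n$.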
-
A Multimodal Automated Interpretability Agent,
arXiv, 2404.14394
, arxiv, pdf, cication: -1Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang, Achyuta Rajaram, Evan Hernandez, Jacob Andreas, Antonio Torralba
-
llm-transparency-tool - facebookresearch
-
Compression Represents Intelligence Linearly,
arXiv, 2404.09937
, arxiv, pdf, cication: -1Yuzhen Huang, Jinghan Zhang, Zifei Shan, Junxian He
· (huggingface) · (llm-compression-intelligence - hkust-nlp)
-
color-coded-text-generation - joaogante 🤗
-
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models,
arXiv, 2404.03118
, arxiv, pdf, cication: -1Gabriela Ben Melech Stan, Raanan Yehezkel Rohekar, Yaniv Gurwicz, Matthew Lyle Olson, Anahita Bhiwandiwalla, Estelle Aflalo, Chenfei Wu, Nan Duan, Shao-Yen Tseng, Vasudev Lal
-
Source-Aware Training Enables Knowledge Attribution in Language Models,
arXiv, 2404.01019
, arxiv, pdf, cication: -1Muhammad Khalifa, David Wadden, Emma Strubell, Honglak Lee, Lu Wang, Iz Beltagy, Hao Peng
-
Future Lens: Anticipating Subsequent Tokens from a Single Hidden State,
arXiv, 2311.04897
, arxiv, pdf, cication: -1Koyena Pal, Jiuding Sun, Andrew Yuan, Byron C. Wallace, David Bau
-
Localizing Paragraph Memorization in Language Models,
arXiv, 2403.19851
, arxiv, pdf, cication: -1Niklas Stoehr, Mitchell Gordon, Chiyuan Zhang, Owen Lewis
-
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions,
arXiv, 2403.07809
, arxiv, pdf, cication: -1Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts · (pyvene - stanfordnlp)
-
transformer-debugger - openai
· (jiqizhixin)
-
Logits of API-Protected LLMs Leak Proprietary Information,
arXiv, 2403.09539
, arxiv, pdf, cication: -1Matthew Finlayson, Xiang Ren, Swabha Swayamdipta
· (qbitai)
-
Stealing Part of a Production Language Model,
arXiv, 2403.06634
, arxiv, pdf, cication: -1Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy
· (qbitai)
Extracts information from black-box language models such as OpenAI's ChatGPT and Google's PaLM-2, revealing for the first time the hidden dimensions of these models.
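The core observation behind these API-extraction attacks: every full logit vector equals W·h for some hidden state h, so all outputs lie in a subspace of dimension at most the hidden size, and the rank of a stack of collected outputs reveals it. A toy numpy simulation with made-up sizes (not the papers' actual query procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 512, 64            # toy sizes; real models have vocab >> hidden

# Simulated model: logits are always W @ h for some hidden state h,
# so every logit vector lies in the column space of W (rank <= hidden).
W = rng.normal(size=(vocab, hidden))
queries = rng.normal(size=(hidden, 300))   # 300 simulated "API calls"
logits = W @ queries                       # what a logits-exposing API returns

# Attack: the numerical rank of the collected logit matrix is the hidden size.
print(np.linalg.matrix_rank(logits.T))     # 64
```

The papers' harder contribution is recovering (approximate) logits from APIs that expose only top-k log-probs or logit-bias knobs; once full vectors are available, the rank argument above does the rest.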
-
AtP*: An efficient and scalable method for localizing LLM behaviour to components,
arXiv, 2403.00745
, arxiv, pdf, cication: -1János Kramár, Tom Lieberum, Rohin Shah, Neel Nanda
-
A phase transition between positional and semantic learning in a solvable model of dot-product attention,
arXiv, 2402.03902
, arxiv, pdf, cication: -1Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborová
-
fractal - sohl-dickstein
The boundary of neural network trainability is fractal
-
Rethinking Interpretability in the Era of Large Language Models,
arXiv, 2402.01761
, arxiv, pdf, cication: -1Chandan Singh, Jeevana Priya Inala, Michel Galley, Rich Caruana, Jianfeng Gao
-
Can Large Language Models Understand Context?,
arXiv, 2402.00858
, arxiv, pdf, cication: -1Yilun Zhu, Joel Ruben Antony Moniz, Shruti Bhargava, Jiarui Lu, Dhivya Piraviperumal, Site Li, Yuan Zhang, Hong Yu, Bo-Hsiang Tseng
-
Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models,
arXiv, 2401.06102
, arxiv, pdf, cication: -1Asma Ghandeharioun, Avi Caciularu, Adam Pearce, Lucas Dixon, Mor Geva
-
Vayu Robotics Blog - Interpretable End-to-End Robot Navigation
-
awesome-llm-interpretability - JShollaj
A curated list of Large Language Model (LLM) Interpretability resources.
-
Challenges with unsupervised LLM knowledge discovery,
arXiv, 2312.10029
, arxiv, pdf, cication: -1Sebastian Farquhar, Vikrant Varma, Zachary Kenton, Johannes Gasteiger, Vladimir Mikulik, Rohin Shah
-
Using Captum to Explain Generative Language Models,
arXiv, 2312.05491
, arxiv, pdf, cication: -1Vivek Miglani, Aobo Yang, Aram H. Markosyan, Diego Garcia-Olano, Narine Kokhlikyan
-
Beyond Surface: Probing LLaMA Across Scales and Layers,
arXiv, 2312.04333
, arxiv, pdf, cication: -1Nuo Chen, Ning Wu, Shining Liang, Ming Gong, Linjun Shou, Dongmei Zhang, Jia Li
-
llm-viz - bbycroft
3D Visualization of a GPT-style LLM
-
White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?,
arXiv, 2311.13110
, arxiv, pdf, cication: -1Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Hao Bai, Yuexiang Zhai, Benjamin D. Haeffele, Yi Ma · (mp.weixin.qq)
-
Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models,
arXiv, 2311.00871
, arxiv, pdf, cication: -1Steve Yadlowsky, Lyric Doshi, Nilesh Tripuraneni · (jiqizhixin)
-
The Generative AI Paradox: "What It Can Create, It May Not Understand",
arXiv, 2311.00059
, arxiv, pdf, cication: -1Peter West, Ximing Lu, Nouha Dziri, Faeze Brahman, Linjie Li, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu
-
The Impact of Depth and Width on Transformer Language Model Generalization,
arXiv, 2310.19956
, arxiv, pdf, cication: -1Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Dan Garrette, Tal Linzen
-
Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations,
arXiv, 2310.11207
, arxiv, pdf, cication: -1Shiyuan Huang, Siddarth Mamidanna, Shreedhar Jangam, Yilun Zhou, Leilani H. Gilpin
-
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
· (qbitai)
-
Representation Engineering: A Top-Down Approach to AI Transparency,
arXiv, 2310.01405
, arxiv, pdf, cication: 5Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski · (representation-engineering - andyzoujm)
· (mp.weixin.qq)
-
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models,
arXiv, 2309.15098
, arxiv, pdf, cication: -1Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi
-
Language Modeling Is Compression,
arXiv, 2309.10668
, arxiv, pdf, cication: 7Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau
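The core observation of the paper above is that a predictive model doubles as a compressor: an arithmetic coder driven by the model's next-token probabilities approaches a code length of -log2 p(token) bits per token. A minimal sketch with invented toy probabilities (not from the paper):

```python
import math

# Toy "language model": fixed next-token probabilities (hypothetical values).
model_probs = {"the": 0.5, "cat": 0.2, "sat": 0.2, "zzz": 0.1}

def ideal_code_length_bits(tokens, probs):
    # Shannon code length an arithmetic coder approaches when driven by
    # the model's probabilities: sum of -log2 p over the sequence.
    return sum(-math.log2(probs[t]) for t in tokens)

sequence = ["the", "cat", "sat"]
bits = ideal_code_length_bits(sequence, model_probs)
```

The better the model predicts (higher p), the fewer bits the sequence costs, which is the sense in which "language modeling is compression".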
-
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT),
arXiv, 2309.08968
, arxiv, pdf, cication: -1Parsa Kavehzadeh, Mojtaba Valipour, Marzieh Tahaei, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh
-
Sparse Autoencoders Find Highly Interpretable Features in Language Models,
arXiv, 2309.08600
, arxiv, pdf, cication: 5Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, Lee Sharkey
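The sparse-autoencoder approach in the entry above maps a model activation into a wider, mostly-zero feature vector and reconstructs the activation from it. A minimal sketch of the forward pass only; the weights below are arbitrary toy values, and the L1-regularized training loop is omitted:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec):
    # Encode into an overcomplete feature vector, then reconstruct:
    # f = ReLU(W_enc x + b_enc), x_hat = W_dec f.
    f = relu([h + b for h, b in zip(matvec(W_enc, x), b_enc)])
    x_hat = matvec(W_dec, f)
    return f, x_hat

x = [1.0, 0.0]                                # toy "residual stream" activation
W_enc = [[1, 0], [0, 1], [-1, 0], [1, 1]]     # 4 features from 2 dims (toy)
b_enc = [-0.5] * 4                            # negative bias pushes toward sparsity
W_dec = [[1, 0, 0, 0.5], [0, 1, 0, 0.5]]
f, x_hat = sae_forward(x, W_enc, b_enc, W_dec)
```

Only two of the four features fire here; in the papers, individual features of a trained SAE are often far more interpretable than raw neurons.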
-
Human Language Understanding & Reasoning
· (mp.weixin.qq)
-
Do Machine Learning Models Memorize or Generalize?
· (mp.weixin.qq) · (qbitai)
-
CIMI - Daftstone
· (jiqizhixin)
-
Studying Large Language Model Generalization with Influence Functions,
arXiv, 2308.03296
, arxiv, pdf, cication: 12Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez
-
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer,
arXiv, 2305.16380
, arxiv, pdf, cication: 6Yuandong Tian, Yiping Wang, Beidi Chen, Simon Du · (mp.weixin.qq)
-
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning,
arXiv, 2305.14160
, arxiv, pdf, cication: -1Lean Wang, Lei Li, Damai Dai, Deli Chen, Hao Zhou, Fandong Meng, Jie Zhou, Xu Sun · (qbitai)
-
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers,
arXiv, 2211.03495
, arxiv, pdf, cication: 16Michael Hassid, Hao Peng, Daniel Rotem, Jungo Kasai, Ivan Montero, Noah A. Smith, Roy Schwartz
-
Time is Encoded in the Weights of Finetuned Language Models,
arXiv, 2312.13401
, arxiv, pdf, cication: -1Kai Nylund, Suchin Gururangan, Noah A. Smith
-
Composable Interventions for Language Models,
arXiv, 2407.06483
, arxiv, pdf, cication: -1Arinbjorn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen
-
Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance,
arXiv, 2406.11139
, arxiv, pdf, cication: -1Somnath Banerjee, Avik Halder, Rajarshi Mandal, Sayan Layek, Ian Soboroff, Rima Hazra, Animesh Mukherjee
-
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3,
arXiv, 2405.00664
, arxiv, pdf, cication: -1Junsang Yoon, Akshat Gupta, Gopala Anumanchipalli
-
Robust and Scalable Model Editing for Large Language Models,
arXiv, 2403.17431
, arxiv, pdf, cication: -1Yingfa Chen, Zhengyan Zhang, Xu Han, Chaojun Xiao, Zhiyuan Liu, Chen Chen, Kuai Li, Tao Yang, Maosong Sun · (EREN - thunlp)
-
Editing Conceptual Knowledge for Large Language Models,
arXiv, 2403.06259
, arxiv, pdf, cication: -1Xiaohan Wang, Shengyu Mao, Ningyu Zhang, Shumin Deng, Yunzhi Yao, Yue Shen, Lei Liang, Jinjie Gu, Huajun Chen
-
A Comprehensive Study of Knowledge Editing for Large Language Models,
arXiv, 2401.01286
, arxiv, pdf, cication: -1Ningyu Zhang, Yunzhi Yao, Bozhong Tian, Peng Wang, Shumin Deng, Mengru Wang, Zekun Xi, Shengyu Mao, Jintian Zhang, Yuansheng Ni
-
Evaluating the Ripple Effects of Knowledge Editing in Language Models,
arXiv, 2307.12976
, arxiv, pdf, cication: 5Roi Cohen, Eden Biran, Ori Yoran, Amir Globerson, Mor Geva
-
Editing Large Language Models: Problems, Methods, and Opportunities,
arXiv, 2305.13172
, arxiv, pdf, cication: 12Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, Ningyu Zhang · (easyedit - zjunlp)
-
ModelEditingPapers - zjunlp
Must-read Papers on Model Editing.
-
Open-Endedness is Essential for Artificial Superhuman Intelligence,
arXiv, 2406.04268
, arxiv, pdf, cication: -1Edward Hughes, Michael Dennis, Jack Parker-Holder, Feryal Behbahani, Aditi Mavalankar, Yuge Shi, Tom Schaul, Tim Rocktaschel
-
Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning,
arXiv, 2406.00392
, arxiv, pdf, cication: -1Jonathan Cook, Chris Lu, Edward Hughes, Joel Z. Leibo, Jakob Foerster
-
My AI Timelines Have Sped Up (Again)
· (mp.weixin.qq)
-
Computing Power and the Governance of Artificial Intelligence,
arXiv, 2402.08797
, arxiv, pdf, cication: -1Girish Sastry, Lennart Heim, Haydn Belfield, Markus Anderljung, Miles Brundage, Julian Hazell, Cullen O'Keefe, Gillian K. Hadfield, Richard Ngo, Konstantin Pilz
-
Perspectives on the State and Future of Deep Learning -- 2023,
arXiv, 2312.09323
, arxiv, pdf, cication: -1Micah Goldblum, Anima Anandkumar, Richard Baraniuk, Tom Goldstein, Kyunghyun Cho, Zachary C Lipton, Melanie Mitchell, Preetum Nakkiran, Max Welling, Andrew Gordon Wilson
-
AI and Open Source in 2023 - by Sebastian Raschka, PhD
· (mp.weixin.qq)
-
Role play with large language models | Nature
· (qbitai)
-
Levels of AGI: Operationalizing Progress on the Path to AGI,
arXiv, 2311.02462
, arxiv, pdf, cication: -1Meredith Ringel Morris, Jascha Sohl-dickstein, Noah Fiedel, Tris Warkentin, Allan Dafoe, Aleksandra Faust, Clement Farabet, Shane Legg
-
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness,
arXiv, 2308.08708
, arxiv, pdf, cication: 15Patrick Butlin, Robert Long, Eric Elmoznino, Yoshua Bengio, Jonathan Birch, Axel Constant, George Deane, Stephen M. Fleming, Chris Frith, Xu Ji · (jiqizhixin)
-
Collective Intelligence for Deep Learning: A Survey of Recent Developments | 大トロ
- Integrating RL and LLM ideas: exploring world models on the way to AGI (Parts 2 & 3)
- Integrating RL and LLM ideas: exploring world models on the way to AGI (Part 1)
- Richard Sutton, father of reinforcement learning: another possible path to AGI
- Turing Award winner and father of neural networks Geoffrey Hinton's latest public talk: Will digital intelligence replace biological intelligence? (full text and slides)
- Good questions matter more than good answers | Harry Shum's five questions on large models
-
Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation,
arXiv, 2311.08877
, arxiv, pdf, cication: -1Vaishnavi Shrivastava, Percy Liang, Ananya Kumar
-
Do Large Language Models Know What They Don't Know?,
arXiv, 2305.18153
, arxiv, pdf, cication: 16Zhangyue Yin, Qiushi Sun, Qipeng Guo, Jiawen Wu, Xipeng Qiu, Xuanjing Huang
-
Tokenization Falling Short: The Curse of Tokenization,
arXiv, 2406.11687
, arxiv, pdf, cication: -1Yekun Chai, Yewei Fang, Qiwei Peng, Xuhong Li
-
Zero-Shot Tokenizer Transfer,
arXiv, 2405.07883
, arxiv, pdf, cication: -1
-
Toward a Theory of Tokenization in LLMs,
arXiv, 2404.08335
, arxiv, pdf, cication: -1Nived Rajaraman, Jiantao Jiao, Kannan Ramchandran
-
Training LLMs over Neurally Compressed Text,
arXiv, 2404.03626
, arxiv, pdf, cication: -1Brian Lester, Jaehoon Lee, Alex Alemi, Jeffrey Pennington, Adam Roberts, Jascha Sohl-Dickstein, Noah Constant
-
Greed is All You Need: An Evaluation of Tokenizer Inference Methods,
arXiv, 2403.01289
, arxiv, pdf, cication: -1Omri Uzan, Craig W. Schmidt, Chris Tanner, Yuval Pinter
-
xT: Nested Tokenization for Larger Context in Large Images,
arXiv, 2403.01915
, arxiv, pdf, cication: -1Ritwik Gupta, Shufan Li, Tyler Zhu, Jitendra Malik, Trevor Darrell, Karttikeya Mangalam · (xT - bair-climate-initiative)
-
Prompt2Model: Generating Deployable Models from Natural Language Instructions,
arXiv, 2308.12261
, arxiv, pdf, cication: -1Vijay Viswanathan, Chenyang Zhao, Amanda Bertsch, Tongshuang Wu, Graham Neubig · (mp.weixin.qq)
-
xVal: A Continuous Number Encoding for Large Language Models,
arXiv, 2310.02989
, arxiv, pdf, cication: -1Siavash Golkar, Mariel Pettee, Michael Eickenberg, Alberto Bietti, Miles Cranmer, Geraud Krawezik, Francois Lanusse, Michael McCabe, Ruben Ohana, Liam Parker · (mp.weixin.qq)
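The xVal entry above replaces digit-by-digit tokenization with a single shared [NUM] embedding that is scaled by the numeric value itself, so magnitude lives in the vector rather than in a digit string. A minimal sketch; the 3-dim embedding and the normalization scale are invented here for illustration:

```python
# Toy learned [NUM] direction (hypothetical values).
num_embedding = [0.1, -0.3, 0.2]

def encode_number(value, scale=100.0):
    # Normalize the value into a bounded range, then scale the shared
    # [NUM] embedding by it: one continuous vector per number.
    v = value / scale
    return [v * e for e in num_embedding]
```

Because encoding is a linear scaling, nearby numbers get nearby vectors, which is the paper's motivation for better numeric continuity than subword digits.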
-
GraphGPT: Graph Instruction Tuning for Large Language Models,
arXiv, 2310.13023
, arxiv, pdf, cication: 2Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Lixin Su, Suqi Cheng, Dawei Yin, Chao Huang · (mp.weixin.qq)
-
A taxonomy and review of generalization research in NLP | Nature Machine Intelligence
-
Neurons in Large Language Models: Dead, N-gram, Positional,
arXiv, 2309.04827
, arxiv, pdf, cication: -1Elena Voita, Javier Ferrando, Christoforos Nalmpantis
-
On the Societal Impact of Open Foundation Models,
arXiv, 2403.07918
, arxiv, pdf, cication: -1Sayash Kapoor, Rishi Bommasani, Kevin Klyman, Shayne Longpre, Ashwin Ramaswami, Peter Cihon, Aspen Hopkins, Kevin Bankston, Stella Biderman, Miranda Bogen · (crfm.stanford)
-
llama3-from-scratch - naklecha
llama3 implementation one matrix multiplication at a time
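The repo above builds Llama 3 "one matrix multiplication at a time"; the heart of that walkthrough is single-head scaled dot-product attention, which can be sketched in plain Python (toy dimensions, no learned weights) as:

```python
import math

def softmax(row):
    m = max(row)
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]

def attention(Q, K, V):
    # One head of scaled dot-product attention, matmul by matmul:
    # scores = Q K^T / sqrt(d), weights = softmax(scores), out = weights V.
    d = len(Q[0])
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d) for kr in K]
              for qr in Q]
    weights = [softmax(row) for row in scores]
    out = [[sum(w * V[j][c] for j, w in enumerate(wr)) for c in range(len(V[0]))]
           for wr in weights]
    return out
```

Each output row is a convex combination of the value rows, weighted by how strongly the query matches each key.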
-
cookbook - learn 🤗
-
minbpe - karpathy
Minimal, clean, code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
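The BPE algorithm that minbpe implements can be sketched in a few lines: repeatedly count adjacent token pairs over the byte sequence and fuse the most frequent pair into a fresh token id (this is a simplified sketch, not minbpe's actual code):

```python
from collections import Counter

def most_common_pair(ids):
    # Most frequent adjacent pair in the current token sequence.
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    # Replace every non-overlapping occurrence of `pair` with `new_id`.
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    # Start from raw UTF-8 bytes (ids 0-255) and learn `num_merges`
    # merge rules, assigning new ids from 256 upward.
    ids, merges = list(text.encode("utf-8")), {}
    for new_id in range(256, 256 + num_merges):
        pair = most_common_pair(ids)
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return ids, merges
```

Decoding inverts the merge table; real tokenizers add regex pre-splitting and special tokens on top of this core loop.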
-
LLMs-from-scratch - rasbt
Implementing a ChatGPT-like LLM from scratch, step by step
-
MachineLearning-QandAI-book - rasbt
Machine Learning Q and AI book
-
ML-YouTube-Courses - dair-ai
📺 Discover the latest machine learning / AI courses on YouTube.
-
llm-course - mlabonne
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
-
[1hr Talk] Intro to Large Language Models - YouTube
· (drive.google) · (drive.google)
· (mp.weixin.qq)
-
Stanford CS224N: Natural Language Processing with Deep Learning | 2023 - YouTube
-
lectures - cuda-mode
Material for cuda-mode lectures
-
Introducing CUDA in a way that is accessible to Python folks
· (youtu) · (twitter)
-
Deep Learning Foundations by Soheil Feizi : Large Language Models - YouTube
-
But what is a GPT? Visual intro to Transformers | Deep learning, chapter 5 - YouTube
-
Unsupervised Learning: Redpoint's AI Podcast - YouTube
· (youtube)
-
Making AI accessible with Andrej Karpathy and Stephanie Zhan - YouTube
· (mp.weixin.qq)
-
Deep Dive into Transformers by Hand ✍︎ | by Srijanie Dey, PhD | Towards Data Science
-
Transformers-Tutorials - NielsRogge
This repository contains demos I made with the Transformers library by HuggingFace.
-
mamba_state_space_model_paper_list - event-ahu
[Mamba-Survey-2024] Paper list for State-Space-Model/Mamba and its applications
-
Awesome-Mamba-Papers - yyyujintang
Awesome Papers related to Mamba.
-
awesome-generative-ai-guide - aishwaryanr
A one stop repository for generative AI research updates, interview resources, notebooks and much more!
-
awesome-local-ai - janhq
An awesome repository of local AI tools
-
how-to-optim-algorithm-in-cuda - BBuf
How to optimize some algorithms in CUDA.