A primer on large language models (LLMs) and ChatGPT, as of April 2023
- A special edition prepared for 谷雨书苑
- A new side project to brainstorm how one might train better chatbots than ChatGPT
Intro: Building blocks & capabilities
- LM and LLM
- Transformer
- How are LLMs trained?
- LLM decoding (see the decoding sketch after this list)
- LLM training in parallel
- LLM capabilities, advanced capabilities, and insane capabilities
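To make the "LLM decoding" item above concrete, here is a minimal sketch of greedy vs. temperature-sampled decoding. It uses the openly available GPT-2 checkpoint via Hugging Face `transformers` as a stand-in for any causal LLM; the model choice, prompt, and temperature are illustrative assumptions, not how ChatGPT itself decodes.

```python
# Minimal decoding sketch: greedy (temperature=0) vs. temperature sampling.
# GPT-2 is used only as a small, open stand-in for a causal LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def decode(prompt: str, max_new_tokens: int = 20, temperature: float = 0.0) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(ids).logits[0, -1]  # logits for the next token
        if temperature == 0.0:
            next_id = torch.argmax(logits)  # greedy: always pick the top token
        else:
            probs = torch.softmax(logits / temperature, dim=-1)
            next_id = torch.multinomial(probs, num_samples=1)[0]  # sample
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    return tokenizer.decode(ids[0])

print(decode("The Transformer architecture", temperature=0.8))
```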
Core: Models, players, concepts, toolings & applications
- Selected LLMs
- BERT
- GPT family
- T5
- GLM
- LLM Players
- big companies
- institutes and startups
- LLM concepts
- Pretraining, finetuning, prompt tuning
- Scaling laws (see the worked example after this list)
- Prompt engineering
- Prompt tuning (soft prompt)
- "Emergent abilities"
- Chain-of-thought (CoT)
- Least-to-most prompting
- Hallucination
- Retrieval LLM
- RLHF for LLM
- Mixture of Experts (MoE) LLM
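As a worked example for the "Scaling laws" item above, the sketch below combines two widely cited rules of thumb: training compute C ≈ 6·N·D FLOPs for N parameters and D tokens (Kaplan et al., [2001.08361]) and the Chinchilla compute-optimal heuristic of roughly 20 training tokens per parameter ([2203.15556]). Treat all constants as order-of-magnitude approximations.

```python
# Back-of-envelope scaling-law arithmetic (rules of thumb, not exact figures).

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute: C ~= 6 * N * D FLOPs."""
    return 6 * n_params * n_tokens

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Chinchilla heuristic: ~20 training tokens per parameter."""
    return 20 * n_params

# GPT-3 scale: 175B parameters trained on ~300B tokens
print(f"GPT-3-scale training compute: {train_flops(175e9, 300e9):.2e} FLOPs")  # ~3.2e23

# Chinchilla suggests spending similar compute on a smaller model and more data:
print(f"Compute-optimal tokens for a 70B model: {chinchilla_optimal_tokens(70e9):.2e}")  # 1.4e12
```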
- LLM Tooling
- Hugging Face
- TF Hub, PyTorch-NLP, PaddleNLP
- Transformers lib, Colossal-AI, Ray, and nanoGPT (see the pipeline sketch after this list)
- Other tooling
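A minimal usage sketch of the Hugging Face `transformers` pipeline API mentioned above, which hides tokenization, model loading, and decoding behind a single call; the model and generation settings are illustrative.

```python
# Text generation in a few lines via the transformers pipeline API.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
outputs = generator(
    "Large language models are",
    max_new_tokens=20,       # length of the continuation
    do_sample=True,          # sampling is needed for multiple distinct outputs
    num_return_sequences=2,  # ask for two candidate continuations
)
for out in outputs:
    print(out["generated_text"])
```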
- LLM Applications
Bonus: Deep dive into ChatGPT
- ChatGPT model evolution
- Research (InstructGPT) overview
- Possible next steps for ChatGPT?
- Engineering discussion
- Rough cost estimate to train/serve ChatGPT (see the training-cost sketch after this list)
- My thoughts on technical challenges to reproduce ChatGPT
- Which suboptimal choices by Google delayed its release of a ChatGPT-like product?
- Fun facts
- ChatGPT challenges
- Final question: Will ChatGPT become next iPhone, or next Alexa, or next ClubHouse?
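For the rough training-cost estimate above, here is a hedged back-of-envelope under stated assumptions; none of these figures are OpenAI's, and the utilization and price numbers are guesses in a plausible range.

```python
# Rough GPT-3-scale training cost estimate. All inputs are assumptions.
PARAMS = 175e9   # GPT-3-scale parameter count
TOKENS = 300e9   # GPT-3's reported training token count
FLOPS = 6 * PARAMS * TOKENS  # C ~= 6ND  =>  ~3.15e23 FLOPs

A100_PEAK = 312e12  # NVIDIA A100 peak BF16 throughput, FLOPs/s
UTIL = 0.35         # assumed real-world utilization (30-50% is typical)

gpu_hours = FLOPS / (A100_PEAK * UTIL) / 3600
print(f"A100-hours: {gpu_hours:.2e}")                       # ~8e5
print(f"Days on 1024 A100s: {gpu_hours / 1024 / 24:.0f}")   # ~33
print(f"Cost at $1.5/A100-hour: ${gpu_hours * 1.5:,.0f}")   # ~$1.2M
```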
- LLM basics
- RL basics
- ChatGPT
- Societal Impact
- V2.1 slides, April 2023
- Appendix: 6-page slides explaining BERT and GPT training, their embedding outputs, and the output sizes
- V2 slides, Jan 2023
- Old V1 slides, July 2022
- Draft notes to prepare V2 topics
- 张俊林: The Road to AGI: Technical Essentials of Large Language Models (LLMs)
- Yao Fu:How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources
- Stephen Wolfram:What Is ChatGPT Doing and Why Does It Work?
- 李宏毅: How ChatGPT Was (Possibly) Made - The Socialization Process of GPT
- 李沐: An In-Depth Reading of the InstructGPT Paper [Paper Reading]
- [1706.03741] Deep reinforcement learning from human preferences
- [1706.03762] Attention Is All You Need
- [1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- [1904.00962] Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
- [1907.11692] RoBERTa: A Robustly Optimized BERT Pretraining Approach
- [1910.10683] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- [2001.08361] Scaling Laws for Neural Language Models
- [2005.14165] Language Models are Few-Shot Learners
- [2103.00823] M6: A Chinese Multimodal Pretrainer
- [2104.08691] The Power of Scale for Parameter-Efficient Prompt Tuning
- [2104.09864] RoFormer: Enhanced Transformer with Rotary Position Embedding
- [2106.04554] A Survey of Transformers
- [2111.06377] Masked Autoencoders Are Scalable Vision Learners
- [2112.12731] ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
- [2201.08239] LaMDA: Language Models for Dialog Applications
- [2201.11903] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- [2203.15556] Training Compute-Optimal Large Language Models
- [2204.05862] Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
- [2205.01068] OPT: Open Pre-trained Transformer Language Models
- [2205.05198] Reducing Activation Recomputation in Large Transformer Models
- [2205.10625] Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
- [2205.11916] Large Language Models are Zero-Shot Reasoners
- [2206.07682] Emergent Abilities of Large Language Models
- [2207.01780] CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
- [2208.03299] Atlas: Few-shot Learning with Retrieval Augmented Language Models
- [2210.02414] GLM-130B: An Open Bilingual Pre-trained Model
- [D] GPT-3, The $4,600,000 Language Model : r/MachineLearning
- Alibaba Cloud Launches 'ModelScope,' An Open-Source Model-as-a-Service (MaaS) Platform that Comes with Hundreds of Artificial Intelligence (AI) Models - MarkTechPost
- Aligning Language Models to Follow Instructions
- AlphaGo
- Anthropic
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | Facebook AI Research
- Better Language Models and Their Implications
- BigScience
- BlenderBot 3: An AI Chatbot That Improves Through Conversation | Meta
- Building safer dialogue agents
- How ChatGPT Was (Possibly) Made - The Socialization Process of GPT
- ChatGPT cheats? Triangle professors grapple with viral AI technology as semester starts
- ChatGPT produces made-up nonexistent references | Hacker News
- ChatGPT, Open AI's Chatbot, Is Spitting Out Biased, Sexist Results - Bloomberg
- ChatGPT: Optimizing Language Models for Dialogue
- Code for CodeT5: a new code-aware pre-trained encoder-decoder model.
- Colossal-AI
- DeepMind’s AlphaCode AI writes code at a competitive level | TechCrunch
- Democratizing access to large-scale language models with OPT-175B
- Don’t Ban ChatGPT in Schools. Teach With It. - The New York Times
- EleutherAI
- EleutherAI/gpt-neox-20b · Hugging Face
- Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer – Google AI Blog
- Galactica research model by Meta
- Generative adversarial network - Wikipedia
- GitHub - f/awesome-chatgpt-prompts: This repo includes ChatGPT prompt curation to use ChatGPT better.
- GitHub - google/BIG-bench: Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
- GitHub - huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
- GitHub - karpathy/nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs.
- GitHub - openai/openai-cookbook: Techniques to improve reliability
- GitHub - PaddlePaddle/PaddleNLP
- GitHub Copilot · Your AI pair programmer
- GitHub: Jianlin Su (bojone)
- GitHub's AI Coding Assistant Copilot Launches - Voicebot.ai
- GLM-130B: An Open Bilingual Pre-Trained Model
- gluebenchmark Leaderboard
- Google AI Introduces Minerva: A Natural Language Processing (NLP) Model That Solves Mathematical Questions - MarkTechPost
- Google Sidelines Engineer Who Claims Its A.I. Is Sentient - The New York Times
- Google's Massive New Language Model Can Explain Jokes
- Got It AI creates truth checker for ChatGPT 'hallucinations' | VentureBeat
- GPT-3 Powers the Next Generation of Apps
- GSM8K Dataset | Papers With Code
- How come GPT can seem so brilliant one minute and so breathtakingly dumb the next?
- How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources
- HUAWEI Noah's Ark Lab · GitHub
- HuggingFace: Deploy Hugging Face models easily with Amazon SageMaker
- HuggingFace: Fine-tune a pretrained model
- HuggingFace: How to generate text: using different decoding methods for language generation with Transformers
- HuggingFace: Models
- HuggingFace: Pipelines
- HuggingFace: Tokenizer
- HuggingFace: Uploading models
- Improving language models by retrieving from trillions of tokens
- Improving Language Understanding by Generative Pre-Training
- Introducing FLAN: More generalizable Language Models with Instruction Fine-Tuning
- Introducing Pathways: A next-generation AI architecture
- Jonathan Hui: How much do I like ChatGPT?
- LaMDA and the Sentient AI Trap | WIRED
- LaMDA: our breakthrough conversation technology
- Language Model – AI2 Blog
- Large Language Models and Where to Use Them: Part 1
- M6 by Alibaba: MultiModality-to-MultiModality Multitask Mega-transformer
- Microsoft dumping ton of cash into ChatGPT Office infusion | AppleInsider
- Microsoft Set To Integrate ChatGPT With Bing | CDOTrends
- Minerva: Solving Quantitative Reasoning Problems with Language Models – Google AI Blog
- Mosaic LLMs (Part 2): GPT-3 quality for <$500k
- nanoGPT/scaling_laws.ipynb at master
- New and Improved Content Moderation Tooling
- New York City Department of Education Bans ChatGPT
- OpenAI ‘GPT-f’ Delivers SOTA Performance in Automated Mathematical Theorem Proving | Synced
- OpenAI begins piloting ChatGPT Professional, a premium version of its viral chatbot | TechCrunch
- OpenAI Codex
- OpenAI Just Released the AI It Said Was Too Dangerous to Share
- OpenAI Model index for researchers
- OpenAI says its text-generating algorithm GPT-2 is too dangerous to release.
- OpenAI's new ChatGPT bot: 10 dangerous things it's capable of
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer – Google Research
- Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance
- PyTorch-NLP
- Ray Distributed Computing - Anyscale
- Research | Stanford HAI
- Researcher Tells AI to Write a Paper About Itself, Then Submits It to Academic Journal
- Salesforce’s CodeRL Achieves SOTA Code Generation Results With Strong Zero-Shot Transfer Capabilities | Synced
- Solving (Some) Formal Math Olympiad Problems
- Stable Diffusion 2-1 - a Hugging Face Space by stabilityai
- SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
- Techniques for Training Large Neural Networks
- Temporary policy: ChatGPT is banned - Meta Stack Overflow
- TensorFlow Hub
- The Annotated Transformer
- Twitter @goodside as of Jan 2023
- Twitter @Grady_Booch as of Jan 2023
- Twitter @sama as of Jan 2023
- What is GPT-3? Everything your business needs to know about OpenAI’s breakthrough AI language program | ZDNET
- Who Ultimately Owns Content Generated By ChatGPT And Other AI Platforms?
- Why Meta’s latest large language model only survived three days online | MIT Technology Review
- Wu Dao 2.0: China’s Answer To GPT-3. Only Better
- Zhuiyi Technology