This repository is dedicated to the aggregation and discussion of groundbreaking research papers in the field of Generative AI.
Generative AI, or GenAI, refers to the subset of artificial intelligence focused on creating new content, ranging from text and images to code and beyond. The collection of papers included herein spans a variety of topics within GenAI, such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Transformer-based models.
This compendium serves as a resource for scholars, practitioners, and enthusiasts seeking to advance the state of the art in AI-driven content generation.
The primary goals of this repository are:
- Knowledge Consolidation: To centralize seminal and cutting-edge research papers that define and advance the GenAI field.
- Community Collaboration: To foster a collaborative environment where ideas and findings can be shared, discussed, and critiqued by the GenAI research community.
- Innovation Promotion: To inspire and guide new research initiatives and practical applications of GenAI technologies.
- Interdisciplinary Integration: To encourage the cross-pollination of ideas from diverse fields such as computer science, cognitive psychology, and digital arts to enrich GenAI research.
This repository encompasses a wide array of research within GenAI, including but not limited to:
- Theoretical foundations of generative models
- Technical advancements in algorithm design
- Applications of GenAI in various domains (e.g., art, healthcare, software development)
- Ethical considerations and societal impacts of GenAI
The GenAI field is situated at the intersection of multiple disciplines. It leverages deep learning, statistical modeling, and computational creativity to generate novel outputs that can mimic or even surpass human-level creativity in certain aspects. With the rapid pace of advancement in AI, it is crucial to maintain a clear and organized overview of the progress in this area, which this repository aims to provide.
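As a concrete, minimal illustration of the Transformer-based text generation studied in many of the papers below, the sketch here samples a short continuation from a pretrained causal language model. It assumes the Hugging Face `transformers` library and the publicly available `gpt2` checkpoint; neither is part of this repository, and any causal LM checkpoint could be substituted.

```python
# Minimal sketch: sampling a continuation from a Transformer-based language model.
# Assumes the Hugging Face `transformers` library and the public `gpt2` checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Generative AI refers to"
inputs = tokenizer(prompt, return_tensors="pt")

# do_sample=True draws tokens from the model's predicted distribution
# (with top-p nucleus sampling) instead of greedy decoding, which is what
# makes the output novel rather than deterministic.
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```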
📝 Note: The papers are not listed in any particular order.
Category | Papers | Description |
---|---|---|
Language Models & General AI | 1, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 31, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 48, 54, 56, 58, 60, 66, 69, 74, 76, 79, 80, 82, 84, 86, 87, 89, 90, 92, 93, 95, 98, 99, 101, 103, 104 | Papers related to language models, their applications, ethical considerations, and improvements in training or functionality. |
Vision & Language Integration | 3, 4, 29, 30, 33, 64 | Focusing on the integration of visual data with language models, including vision transformers and text-to-image personalization. |
Attention Mechanisms & Transformers | 8, 9, 25, 28, 73 | Discussing the theory of attention in deep learning and optimization of transformer models. |
Music & Creative AI | 5 | A single paper on music generation with AI. |
High-Resolution Image Synthesis | 6, 7, 63 | Discussing high-resolution image synthesis using diffusion models and vision transformers. |
Efficiency & Scaling in AI | 2, 25, 26, 27, 28, 59, 61, 71, 72, 83, 88, 97 | Covering AI efficiency in terms of memory, inference, and scaling. |
Environmental Impact of AI | 12 | A single paper focusing on the environmental impact of AI systems. |
Dialog & Interaction-Focused AI | 13, 24, 34, 35, 36, 37, 39, 53, 67, 81, 91 | Involving dialogue applications and platforms for interactive language agents. |
AI Enhancement & Meta-Learning | 27, 31, 32, 37, 46, 47, 49, 55, 57, 62, 65, 68, 70, 75, 78, 96 | On improving AI capabilities through self-improvement, preference optimization, and distillation. |
Miscellaneous AI Applications | 29, 30, 33, 50, 52, 77, 85, 94, 100, 102 | Discussing niche AI applications like commonsense norms and visual instruction tuning. |
- Enhancing CLIP with GPT-4: Harnessing Visual Descriptions as Prompts
- EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
- Key-Locked Rank One Editing for Text-to-Image Personalization
- ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders
- Simple and Controllable Music Generation
- High-Resolution Image Synthesis with Latent Diffusion Models
- All are Worth Words: A ViT Backbone for Diffusion Models
- Attention Is All You Need
- A Mathematical View of Attention Models in Deep Learning
- Improving Language Understanding by Generative Pre-Training
- Large Language Models and the Reverse Turing Test
- Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model
- LaMDA: Language Models for Dialog Applications
- Gorilla: Large Language Model Connected with Massive APIs
- Foundation Models for Decision Making: Problems, Methods, and Opportunities
- Continual Pre-training of Language Models
- How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
- AlpaGasus: Training a Better Alpaca with Fewer Data
- Ethical and social risks of harm from Language Models
- Holistic Evaluation of Language Models
- On the Risk of Misinformation Pollution with Large Language Models
- The Capacity for Moral Self-Correction in Large Language Models
- HONEST: Measuring Hurtful Sentence Completion in Language Models
- ReAct: Synergizing Reasoning and Acting in Language Models
- Efficiently Scaling Transformer Inference
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models
- Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution
- Efficient Streaming Language Models with Attention Sinks
- Visual Instruction Tuning
- Improved Baselines with Visual Instruction Tuning
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
- Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms
- TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
- InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
- OpenAgents: An Open Platform for Language Agents in the Wild
- Large Language Models Understand and Can be Enhanced by Emotional Stimuli
- Communicative Agents for Software Development
- Large Language Models Are Human-Level Prompt Engineers
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
- Self-Consistency Improves Chain of Thought Reasoning in Language Models
- Language Models can be Logical Solvers
- Lost in the Middle: How Language Models Use Long Contexts
- Contrastive Chain-of-Thought Prompting
- RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
- PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
- Human-Centered Loss Functions (HALOs)
- A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise
- Distributed Inference and Fine-tuning of Large Language Models Over The Internet
- GAIA: Zero-shot Talking Avatar Generation
- Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
- LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
- Foundations of Vector Retrieval
- Self-Rewarding Language Models
- BloombergGPT: A Large Language Model for Finance
- Mistral 7B
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
- MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
- Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
- Orca 2: Teaching Small Language Models How to Reason
- ConvNets Match Vision Transformers at Scale
- Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- QLoRA: Efficient Finetuning of Quantized LLMs
- RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
- Training language models to follow instructions with human feedback
- Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
- Sparse Networks from Scratch: Faster Training without Losing Performance
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
- Code Llama: Open Foundation Models for Code
- LLaMA Pro: Progressive LLaMA with Block Expansion
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
- Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
- Large Language Model based Multi-Agents: A Survey of Progress and Challenges
- Retrieval-Augmented Generation for Large Language Models: A Survey
- ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
- RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
- Datasets for Large Language Models: A Comprehensive Survey
- An LLM Compiler for Parallel Function Calling
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
- ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
- StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
- A Critical Evaluation of AI Feedback for Aligning Large Language Models
- Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems
- Are Emergent Abilities of Large Language Models a Mirage?
- Yi: Open Foundation Models by 01.AI
- ORPO: Monolithic Preference Optimization without Reference Model
- Do Large Language Models Understand Logic or Just Mimick Context?
- Evaluating Large Language Models Trained on Code
- Self-Refine: Iterative Refinement with Self-Feedback
- Reflexion: Language Agents with Verbal Reinforcement Learning
- MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
- A survey of Generative AI Applications
- MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL
- Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
- MetaGPT: Meta Programming For A Multi-Agent Collaborative Framework
- Understanding Transformer Reasoning Capabilities via Graph Algorithms
- Banishing LLM Hallucinations Requires Rethinking Generalization
- Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers
- LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples
- Memory^3: Language Modeling with Explicit Memory
- NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints
- LOTUS: Enabling Semantic Queries with LLMs Over Tables of Unstructured and Structured Data
- Text2SQL is Not Enough: Unifying AI and Databases with TAG
- Chain-of-Thought Reasoning Without Prompting
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
- Premise Order Matters in Reasoning with Large Language Models
- Teaching Large Language Models to Self-Debug
- SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
- Agentic Retrieval-Augmented Generation for Time Series Analysis
- Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning
- OLMoE: Open Mixture-of-Experts Language Models
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
- Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
- Let's Verify Step by Step
- Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
- V-STaR: Training Verifiers for Self-Taught Reasoners
- Agent Workflow Memory
- Conversational Text-To-SQL: An Odyssey Into State-of-the-art And Challenges Ahead
- Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation
- Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies
- Efficient schema-less text-to-SQL conversion using large language models
- SQL-to-Schema Enhances Schema Linking in Text-to-SQL
- CodeS: Towards Building Open-source Language Models for Text-to-SQL
- A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
- Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
- Steering Large Language Models between Code Execution and Textual Reasoning
- Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions
Date | Learning |
---|---|