Property | Data |
---|---|
Created | 2023-07-02 |
Updated | 2023-09-22 |
Author | @Aiden |
Tags | #spike |
Research Area | Contribution/Proposal | Contributor/Team
---|---|---
Efficiency | LoRA | Edward J. Hu
Efficiency | LOMO | Kai Lv
Efficiency | SmoothQuant | Guangxuan Xiao
Efficiency | QLoRA | Tim Dettmers
Large Model Capabilities | HumanEval dataset | Mark Chen
Large Model Capabilities | Causal reasoning ability study | Zhijing Jin
Large Model Capabilities | Multi-step reasoning analysis | Nouha Dziri
Large Model Capabilities | Emergent abilities exploration | Jason Wei
Large Model Capabilities | Doubts on emergent abilities | Rylan Schaeffer
Large Model Capabilities | Input length expansion | Aydar Bulatov
Multimodal Learning | LLaMA-Adapter | Renrui Zhang
Multimodal Learning | LLaVA | Haotian Liu
The year 2023 marks an important milestone for deep learning generative models. Stable Diffusion and ChatGPT have fully demonstrated the enormous application potential of generative models in computer vision (CV) and natural language processing (NLP), and research on generative models is springing up like bamboo shoots after rain, especially now that Meta has decided to open-source its large language model, LLaMA.
Research on deep generative models now spans many major directions. Despite their excellent performance, however, these large models still consume significant computational and memory resources, so many researchers focus on improving the efficiency of model resource utilization. For example, Edward J. Hu and his team proposed Low-Rank Adaptation (LoRA), which freezes the weights of the pre-trained model and injects trainable rank-decomposition matrices into each layer of the Transformer architecture, drastically reducing the number of parameters that must be trained for downstream tasks. In addition, Kai Lv and his team proposed a new optimizer, LOw-Memory Optimization (LOMO), to further reduce the memory usage of fine-tuning large language models. Meanwhile, quantization is considered an effective solution because it reduces memory usage and speeds up inference, but existing quantization methods often struggle to balance accuracy and hardware efficiency. Guangxuan Xiao and his team proposed SmoothQuant, which smooths activation outliers through a mathematical transformation that migrates the quantization difficulty from activations to weights, speeding up computation and significantly reducing memory usage. Tim Dettmers and his team proposed QLoRA, which backpropagates gradients through a frozen, 4-bit-quantized pre-trained language model into LoRA adapters, enabling fine-tuning of language models with up to 65B parameters on a single 48GB GPU. These innovative methods allow even researchers with limited resources to participate in the training and study of large generative models.
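To make the LoRA idea concrete, here is a minimal PyTorch-style sketch (an illustration, not the authors' implementation): the pretrained weight matrix is frozen and only a pair of low-rank matrices is trained, so the trainable parameter count drops from d_out × d_in to r × (d_in + d_out). Class and variable names are assumptions for the example.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style layer: the pretrained weight W is frozen and a
    low-rank update B @ A (rank r << min(d_in, d_out)) is trained instead."""

    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)                   # freeze pretrained weights
        self.lora_A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # down-projection
        self.lora_B = nn.Parameter(torch.zeros(d_out, r))        # up-projection, zero-init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output = frozen path + scaled low-rank path; only A and B receive gradients.
        return self.base(x) + self.scale * (x @ self.lora_A.T) @ self.lora_B.T

layer = LoRALinear(d_in=768, d_out=768, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")  # 12,288 trainable vs. 589,824 frozen
```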
Given ChatGPT's huge potential on the path toward Artificial General Intelligence (AGI), many studies have begun to examine what large language models such as GPT and LLaMA can actually do on real tasks. For example, Mark Chen and his team investigated whether large language models can generate functionally correct code and released the HumanEval dataset, which measures functional correctness with the pass@k metric. Zhijing Jin and her team explored whether LLMs truly possess causal reasoning ability and proposed a new benchmark dataset to test models' inference skills. Nouha Dziri and her team analyzed how large language models perform on tasks that require multi-step compositional reasoning, emphasizing that the performance of Transformer-based models declines rapidly as task complexity increases. Researchers such as Jason Wei are exploring the phenomenon of emergent abilities in large language models, while Rylan Schaeffer and his team have cast doubt on it: they proposed a mathematical model and compared different evaluation metrics to examine whether apparent emergent abilities are merely an artifact of the chosen metrics. Moreover, researchers such as Aydar Bulatov are tackling the input-length limitation of language models; using a recurrent memory mechanism, they extended the effective context length of a BERT-based model to two million tokens while maintaining high memory-retrieval accuracy. This not only enhances the model's ability to handle long-range dependencies but also enables large-scale context processing for memory-intensive applications.
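As a concrete reference point for how HumanEval is scored, below is a small sketch of the unbiased pass@k estimator described by Chen et al. (2021): n samples are generated per problem, c of them pass the unit tests, and pass@k estimates the probability that at least one of k sampled completions is correct. The function name and example numbers are illustrative.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples,
    drawn from n generations of which c are correct, passes the tests."""
    if n - c < k:
        return 1.0
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable running product
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# e.g. 200 samples per problem, 37 of which pass: estimate pass@10
print(round(pass_at_k(n=200, c=37, k=10), 4))
```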
Going further, researchers have also started to improve the multimodal learning ability of generative models so that they can handle image, speech, and text data. Renrui Zhang and his team proposed LLaMA-Adapter, which fine-tunes efficiently without updating the LLaMA model's parameters. Meanwhile, Haotian Liu and his team proposed LLaVA, which connects a vision encoder to a LLaMA-based LLM and fine-tunes the combination on a multimodal language-image instruction dataset generated with (text-only) GPT-4. Both methods effectively establish general-purpose multimodal language models for visual and language understanding.
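As a rough illustration of the LLaVA-style recipe (a sketch under assumed dimensions, not the released code), patch features from a frozen vision encoder are mapped by a small trainable projection into the LLM's embedding space and prepended to the text token embeddings:

```python
import torch
import torch.nn as nn

class MultimodalProjector(nn.Module):
    """Illustrative bridge between a vision encoder and an LLM: a trainable
    projection maps visual features into the language model's embedding space."""

    def __init__(self, d_vision: int = 1024, d_llm: int = 4096):
        super().__init__()
        self.proj = nn.Linear(d_vision, d_llm)  # the trainable bridge

    def forward(self, image_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # image_feats: (batch, num_patches, d_vision) from the vision encoder
        # text_embeds: (batch, seq_len, d_llm) from the LLM's token embedding table
        visual_tokens = self.proj(image_feats)                 # (batch, num_patches, d_llm)
        return torch.cat([visual_tokens, text_embeds], dim=1)  # combined input sequence

projector = MultimodalProjector()
img = torch.randn(1, 256, 1024)   # e.g. 256 patch features from a CLIP-like encoder
txt = torch.randn(1, 32, 4096)    # 32 text-token embeddings
print(projector(img, txt).shape)  # torch.Size([1, 288, 4096])
```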
These research advances not only open a new phase for generative AI but also have a profound impact on natural language processing and computer vision, spawning many new research topics. From these milestones, we can foresee that future research will continue to push deeper into efficiency, usability, and modality diversity.
- Rombach, Robin, et al. "High-resolution image synthesis with latent diffusion models." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
- Touvron, Hugo, et al. "LLaMA: Open and efficient foundation language models." arXiv preprint arXiv:2302.13971 (2023).
- Hu, Edward J., et al. "LoRA: Low-rank adaptation of large language models." arXiv preprint arXiv:2106.09685 (2021).
- Lv, Kai, et al. "Full Parameter Fine-tuning for Large Language Models with Limited Resources." arXiv preprint arXiv:2306.09782 (2023).
- Xiao, Guangxuan, et al. "SmoothQuant: Accurate and efficient post-training quantization for large language models." arXiv preprint arXiv:2211.10438 (2022).
- Dettmers, Tim, et al. "QLoRA: Efficient finetuning of quantized LLMs." arXiv preprint arXiv:2305.14314 (2023).
- Chen, Mark, et al. "Evaluating large language models trained on code." arXiv preprint arXiv:2107.03374 (2021).
- Jin, Zhijing, et al. "Can Large Language Models Infer Causation from Correlation?." arXiv preprint arXiv:2306.05836 (2023).
- Dziri, Nouha, et al. "Faith and Fate: Limits of Transformers on Compositionality." arXiv preprint arXiv:2305.18654 (2023).
- Bulatov, Aydar, Yuri Kuratov, and Mikhail S. Burtsev. "Scaling Transformer to 1M tokens and beyond with RMT." arXiv preprint arXiv:2304.11062 (2023).
- Wei, Jason, et al. "Emergent abilities of large language models." arXiv preprint arXiv:2206.07682 (2022).
- Schaeffer, Rylan, Brando Miranda, and Sanmi Koyejo. "Are emergent abilities of Large Language Models a mirage?." arXiv preprint arXiv:2304.15004 (2023).
- Zhang, Renrui, et al. "LLaMA-Adapter: Efficient fine-tuning of language models with zero-init attention." arXiv preprint arXiv:2303.16199 (2023).
- Liu, Haotian, et al. "Visual instruction tuning." arXiv preprint arXiv:2304.08485 (2023).