📢 Open-source ChatGPT replica implementing the end-to-end training pipeline from SFT to RLHF.
- 🔥 Step 1) SFT: Supervised Fine-tuning
- 🔥 Step 2) RM: Reward Model
- 🔥 Step 3) PPO: Proximal Policy Optimization (the three training objectives are sketched below)
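As a rough orientation, here is a minimal PyTorch sketch of the loss each step optimizes, using toy tensors. All names and shapes are illustrative assumptions, not the actual nextgpt API.

```python
# Toy sketch of the three RLHF-pipeline objectives (illustrative only).
import torch
import torch.nn.functional as F

vocab, batch, seq = 100, 2, 8

# Step 1) SFT: next-token cross-entropy on curated demonstrations.
logits = torch.randn(batch, seq, vocab)          # language-model logits
labels = torch.randint(0, vocab, (batch, seq))   # target tokens
sft_loss = F.cross_entropy(logits.view(-1, vocab), labels.view(-1))

# Step 2) RM: pairwise ranking loss on (chosen, rejected) response pairs.
r_chosen = torch.randn(batch)    # scalar reward for the preferred response
r_rejected = torch.randn(batch)  # scalar reward for the dispreferred response
rm_loss = -F.logsigmoid(r_chosen - r_rejected).mean()

# Step 3) PPO: clipped surrogate objective on responses sampled from the policy.
logp_new = torch.randn(batch)                              # log-probs under current policy
logp_old = (logp_new + 0.1 * torch.randn(batch)).detach()  # log-probs under behavior policy
advantages = torch.randn(batch)  # from RM scores (minus KL penalty) and a value baseline
ratio = torch.exp(logp_new - logp_old)
ppo_loss = -torch.min(ratio * advantages,
                      torch.clamp(ratio, 0.8, 1.2) * advantages).mean()  # eps = 0.2

print(f"sft={sft_loss.item():.3f}  rm={rm_loss.item():.3f}  ppo={ppo_loss.item():.3f}")
```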
$ pip install nextgpt
Or install from the git repo to always get the latest version:
$ git clone https://github.com/louiezzang/next-gpt.git
$ cd next-gpt/
$ pip install .
$ cd ../
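To confirm the install worked, a quick import check can help. The module name `nextgpt` is an assumption based on the pip package name; adjust it if the package exposes a different import path.

```python
# Post-install sanity check (module name assumed from the pip package name).
import nextgpt

print(nextgpt.__file__)  # shows where the package was installed
```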
See the ChatGPT example.
What is RLHF?
The RLHF (Reinforcement Learning from Human Feedback) implementation is powered by Colossal-AI. More details can be found in the blog.
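For reference, the fine-tuning stage of RLHF commonly maximizes the reward-model score while a KL penalty keeps the policy close to the SFT model (the InstructGPT-style objective); the exact formulation used here may differ:

```latex
\max_{\pi_\theta}\;
\mathbb{E}_{x \sim D,\; y \sim \pi_\theta(\cdot \mid x)}
\left[\, r_\phi(x, y)
  - \beta \,\log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{SFT}}(y \mid x)} \right]
```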
The RLHF implementation was forked and modified from these git repos:
- https://github.com/airobotlab/KoChatGPT/tree/main/colossalai_ChatGPT_230319
- https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat
- https://github.com/airobotlab/KoChatGPT
- https://github.com/juncongmoo/chatllama
- https://github.com/huggingface/peft
- https://github.com/dredwardhyde/gpt-neo-fine-tuning-example/blob/main/gpt_neo.py
- https://github.com/databrickslabs/dolly