Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Topics: reinforcement-learning, transformers, transformer, safety, llama, gpt, datasets, beaver, alpaca, ai-safety, safe-reinforcement-learning, vicuna, deepspeed, large-language-models, llm, llms, rlhf, reinforcement-learning-from-human-feedback, safe-rlhf, safe-reinforcement-learning-from-human-feedback
Updated Jun 13, 2024 · Python