PKU-Alignment

16 repositories

align-anything
Public
Align Anything: Training All-modality Model with Feedback
chameleon multimodal dpo large-language-models rlhf vision-language-model
Python • Apache License 2.0 • 120 • 675 • 6 • 1 • Updated Jan 22, 2025
aligner
Public
[NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct
alignment aligner interpretability aisafety llm rlhf weak-to-strong mecinterp
Python • 8 • 139 • 0 • 0 • Updated Jan 16, 2025
.github
Public
0 • 0 • 0 • 0 • Updated Jan 16, 2025
ProgressGym
Public
Alignment with a millennium of moral progress. Spotlight@NeurIPS 2024 Track on Datasets and Benchmarks.
Python • MIT License • 3 • 20 • 0 • 0 • Updated Jan 2, 2025
Aligner2024.github.io
Public
HTML • 1 • 0 • 0 • 0 • Updated Oct 31, 2024
omnisafe
Public
JMLR: OmniSafe is an infrastructural framework for accelerating SafeRL research.
benchmark machine-learning reinforcement-learning deep-learning deep-reinforcement-learning constraint-satisfaction-problem pytorch safety-critical saferl safe-reinforcement-learning
Python • Apache License 2.0 • 119 • 888 • 11 • 3 • Updated Oct 15, 2024
safe-sora
Public
SafeSora is a human preference dataset designed to support safety alignment research in the text-to-video generation field, aiming to enhance the helpfulness and harmlessness of Large Vision Models (LVMs).
alignment human-preferences text-to-video-generation large-vision-models
Python • 5 • 29 • 1 • 0 • Updated Aug 20, 2024
safe-rlhf
Public
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
reinforcement-learning transformers transformer safety llama gpt datasets beaver alpaca ai-safety
Python • Apache License 2.0 • 118 • 1.4k • 15 • 0 • Updated Jun 13, 2024
llms-resist-alignment
Public
Repo for paper "Language Models Resist Alignment"
alignment llama safe alpaca ai-safety vicuna llm llms rlhf safe-rlhf
Python • 0 • 6 • 0 • 0 • Updated Jun 9, 2024
safety-gymnasium
Public
NeurIPS 2023: Safety-Gymnasium: A Unified Safe Reinforcement Learning Benchmark
reinforcement-learning constraint-satisfaction-problem safety-critical safety-critical-systems safe-reinforcement-learning safe-reinforcement-learning-environments constraint-rl safe-policy-optimization
Python • Apache License 2.0 • 53 • 416 • 4 • 0 • Updated May 14, 2024
ProAgent
Public
ProAgent: Building Proactive Cooperative Agents with Large Language Models
language-model cooperative human-ai overcooked human-ai-interaction cooperative-ai llm-agent
JavaScript • MIT License • 8 • 69 • 1 • 0 • Updated Apr 8, 2024
SafeDreamer
Public
ICLR 2024: SafeDreamer: Safe Reinforcement Learning with World Models
reinforcement-learning constraint-satisfaction-problem safety-critical-systems safe-reinforcement-learning constraint-rl safe-policy-optimization
Python • Apache License 2.0 • 5 • 54 • 1 • 0 • Updated Apr 8, 2024
Safe-Policy-Optimization
Public
NeurIPS 2023: Safe Policy Optimization: A benchmark repository for safe reinforcement learning algorithms
benchmarks reinforcement-learning-algorithms safe safe-reinforcement-learning constrained-reinforcement-learning
Python • Apache License 2.0 • 46 • 336 • 1 • 0 • Updated Mar 20, 2024
AlignmentSurvey
Public
AI Alignment: A Comprehensive Survey
awesome reinforcement-learning ai deep-learning survey alignment papers interpretability red-teaming large-language-models
0 • 133 • 0 • 0 • Updated Nov 2, 2023
beavertails
Public
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
safety llama gpt datasets language-model beaver ai-safety human-feedback-data llm llms
Makefile • Apache License 2.0 • 5 • 118 • 3 • 0 • Updated Oct 27, 2023
ReDMan
Public
ReDMan is an open-source simulation platform that provides a standardized implementation of safe RL algorithms for Reliable Dexterous Manipulation.
Python • Apache License 2.0 • 2 • 16 • 0 • 0 • Updated May 2, 2023