Richard Ren (notrichardren)

10 followers · 20 following

Zürich, CH
https://huggingface.co/notrichardren

notrichardren/README.md

👋 Hi, I’m Richard Ren. I work on large language models and specialize in evaluations, adversarial robustness, and model transparency.

📫 Email | 🎓 Google Scholar

Pinned

centerforaisafety/safetywashing (Public)

Measuring correlations between safety benchmarks and general AI capabilities benchmarks.

Python

representation-engineering (Public)

Forked from andyzoujm/representation-engineering

Representation Engineering: A Top-Down Approach to AI Transparency

Jupyter Notebook

magikarp01/iti_capstone (Public)

Analyzing truth representations in LLMs across different kinds of truth and intervening on their hidden states to make LLMs more truthful.

Jupyter Notebook · ★ 5 · 1 fork

jam3scampbell/llama-lying (Public)

Code for our paper "Localizing Lying in Llama"

Jupyter Notebook · ★ 10 · 2 forks