Oliver Jaffe (ojaffe)

openai/mle-bench (Public)

MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering.

Python, 546 stars, 60 forks

openai/evals (Public)

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Python, 15.2k stars, 2.6k forks