Release 2.0.3
Release 2.0.3
john-b-yang committed Jul 2, 2024
1 parent 8198707 commit 4498af9
Showing 17 changed files with 33 additions and 32 deletions.
Empty file removed: .gitmodules
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,27 @@ All notable changes to the PyPI package for SWE-bench ([`swebench`](https://pypi

Prior to version 1.1.0, not all deployed versions are listed, as the PyPI package was still under development and testing; only the noteworthy versions and the changes each introduced are included. All versions from 1.1.0 onwards are fully listed.

## [2.0.3] - 7/2/2024
* #149 Interface fix: `run_id` is required
* #151 Fix: support JSON datasets (avoid loading JSON twice)
* #152 Add a very simple CI
* #153 Various nitpicks
* #155 Fix link to collection tutorial
* #161 Fix path to image in docs
* #162 Fix evaluation hanging issue and improve patch application
* #164 Fix crash when there are no environment images to build
* #166 Fix newline outputs for Django's log parser
* #168 Update reporting and skip empty model patch predictions
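
As a hedged illustration of #151, the sketch below loads benchmark tasks from a local JSON file with the `datasets` library instead of from the Hugging Face Hub. This is a minimal sketch, not the harness's actual loading code, and the file name is hypothetical.

```python
# Minimal sketch for #151: read SWE-bench-style tasks from a local JSON file
# in a single pass. "swe_bench_tasks.json" is a hypothetical file name.
from datasets import load_dataset

tasks = load_dataset("json", data_files="swe_bench_tasks.json", split="train")
print(f"Loaded {len(tasks)} task instances")
```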

## [2.0.0] - 6/27/2024
Major release: the SWE-bench evaluation harness has been upgraded to run evaluations in containerized, sandboxed execution environments based on Docker. This results in several changes to the API:
* Removal of the `swebench.metrics` module
* Updates to the API of `swebench.harness` functionality
* Significant modifications to the underlying evaluation logic
* Minor updates to installation specifications for different repositories and versions

Read the full report [here](https://github.com/princeton-nlp/SWE-bench/tree/main/docs/20240627_docker).
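
For context, a typical invocation of the new harness looks roughly like the sketch below, driven from Python via `subprocess`. The flag names follow the 2.x CLI as we understand it, but the dataset name, predictions path, and run id are placeholders; check `run_evaluation.py` for the authoritative interface.

```python
# Hedged sketch: invoking the containerized evaluation harness.
# Paths and run_id are placeholders; --run_id is required as of #149.
import subprocess

subprocess.run(
    [
        "python", "-m", "swebench.harness.run_evaluation",
        "--dataset_name", "princeton-nlp/SWE-bench_Lite",
        "--predictions_path", "preds.json",  # placeholder predictions file
        "--max_workers", "4",
        "--run_id", "demo_run",
    ],
    check=True,
)
```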

## [1.1.5] - 5/15/2024
* Add support for HumanEvalFix (Python, JS, Go, Java) ([source](https://huggingface.co/datasets/bigcode/humanevalpack))

4 changes: 2 additions & 2 deletions README.md
@@ -1,6 +1,6 @@
<p align="center">
<a href="https://github.com/princeton-nlp/Llamao">
<img src="assets/swellama_banner.png" width="50%" alt="Kawi the SWE-Llama" />
<img src="assets/figures/swellama_banner.png" width="50%" alt="Kawi the SWE-Llama" />
</a>
</p>

@@ -39,7 +39,7 @@ Please refer to our [website](http://swe-bench.github.io) for the public leaderboard
SWE-bench is a benchmark for evaluating large language models on real-world software issues collected from GitHub.
Given a *codebase* and an *issue*, a language model is tasked with generating a *patch* that resolves the described problem.

<img src="assets/teaser.png">
<img src="assets/figures/teaser.png">

To access SWE-bench, copy and run the following code:
```python
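# The diff view truncates this block; the published README loads the dataset
# from the Hugging Face Hub roughly as follows (a likely completion, not
# verbatim from this commit):
from datasets import load_dataset
swebench = load_dataset('princeton-nlp/SWE-bench', split='test')
```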
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
4 changes: 2 additions & 2 deletions docs/README_CN.md
@@ -1,6 +1,6 @@
<p align="center">
<a href="https://github.com/princeton-nlp/Llamao">
<img src="assets/swellama_banner.png" width="50%" alt="Kawi the SWE-Llama" />
<img src="assets/figures/swellama_banner.png" width="50%" alt="Kawi the SWE-Llama" />
</a>
</p>

@@ -33,7 +33,7 @@
SWE-bench is a benchmark for evaluating large language models on real-world software issues collected from GitHub.
Given a *codebase* and an *issue*, a language model is tasked with generating a *patch* that resolves the described problem.

<img src="assets/teaser.png">
<img src="assets/figures/teaser.png">

## 🚀 Setup
To build SWE-bench from source, follow these steps:
4 changes: 2 additions & 2 deletions docs/README_JP.md
@@ -4,7 +4,7 @@
<p align="center">
<a href="https://github.com/princeton-nlp/Llamao">
<img src="https://raw.githubusercontent.com/Sunwood-ai-labs/SWE-bench/main/assets/swellama_banner.png" width="50%" alt="Kawi the SWE-Llama" />
<img src="https://raw.githubusercontent.com/Sunwood-ai-labs/SWE-bench/main/assets/figures/swellama_banner.png" width="50%" alt="Kawi the SWE-Llama" />
</a>
</p>

@@ -34,7 +34,7 @@ The ICLR 2024 paper <a href="http://swe-bench.github.io/paper.pdf">SWE-bench: Ca
SWE-bench is a benchmark for evaluating large language models on real-world software issues collected from GitHub.
Given a *codebase* and an *issue*, a language model is tasked with generating a *patch* that resolves the described problem.

<img src="https://raw.githubusercontent.com/Sunwood-ai-labs/SWE-bench/main/assets/teaser.png">
<img src="https://raw.githubusercontent.com/Sunwood-ai-labs/SWE-bench/main/assets/figures/teaser.png">

## 🚀 Setup
To build SWE-bench from source, follow these steps:
4 changes: 2 additions & 2 deletions docs/README_TW.md
@@ -1,6 +1,6 @@
<p align="center">
<a href="https://github.com/princeton-nlp/Llamao">
<img src="assets/swellama_banner.png" width="50%" alt="Kawi the SWE-Llama" />
<img src="assets/figures/swellama_banner.png" width="50%" alt="Kawi the SWE-Llama" />
</a>
</p>

@@ -33,7 +33,7 @@
SWE-bench is a benchmark for evaluating large language models on real-world software issues collected from GitHub.
Given a *codebase* and an *issue*, a language model is tasked with generating a *patch* that resolves the described problem.

<img src="assets/teaser.png">
<img src="assets/figures/teaser.png">

## 🚀 Setup
To build SWE-bench from source, follow these steps:
20 changes: 0 additions & 20 deletions environment.yml

This file was deleted.

2 changes: 1 addition & 1 deletion swebench/__init__.py
@@ -1,4 +1,4 @@
__version__ = "2.0.2"
__version__ = "2.0.3"

from swebench.collect.build_dataset import main as build_dataset
from swebench.collect.get_tasks_pipeline import main as get_tasks_pipeline
2 changes: 1 addition & 1 deletion swebench/collect/README.md
@@ -5,7 +5,7 @@ We include a comprehensive [tutorial](https://github.com/princeton-nlp/SWE-bench

> SWE-bench's collection pipeline is currently designed to target PyPI packages. We hope to expand SWE-bench to more repositories and languages in the future.
<img src="../../assets/collection.png">
<img src="../../assets/figures/collection.png">

## Collection Procedure
To run collection on your own repositories, run the `run_get_tasks_pipeline.sh` script. Given a repository or list of repositories (formatted as `owner/name`), for each repository this command will generate...
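
A hedged sketch of that invocation, driven from Python, is below. The repository name and output directories are placeholders, and the flag names are assumptions to be checked against the script and `get_tasks_pipeline.py` themselves.

```python
# Hedged sketch: run the collection pipeline for one repository.
# The repo uses the owner/name format described above; flag names and
# output paths are assumptions, not verified against the script.
import subprocess

subprocess.run(
    [
        "python", "swebench/collect/get_tasks_pipeline.py",
        "--repos", "scikit-learn/scikit-learn",  # owner/name format
        "--path_prs", "data/prs",      # hypothetical output dir for PR data
        "--path_tasks", "data/tasks",  # hypothetical output dir for tasks
    ],
    check=True,
)
```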
2 changes: 1 addition & 1 deletion swebench/collect/collection.md
@@ -6,7 +6,7 @@ In this tutorial, we explain how to use the SWE-Bench repository to collect eval
> SWE-bench's collection pipeline is currently designed to target PyPI packages. We hope to expand SWE-bench to more repositories and languages in the future.
<div align="center">
<img style="width:70%" src="../assets/collection.png">
<img style="width:70%" src="../assets/figures/collection.png">
</div>

## 🔍 Selecting a Repository
2 changes: 1 addition & 1 deletion swebench/harness/evaluation.md
@@ -33,7 +33,7 @@ python run_evaluation.py \
Additional arguments are defined in `run_evaluation.py`. The following diagram captures, at a high level, what `run_evaluation.py` does. More details are provided in `harness/` and the Appendix of the main paper.

<div align="center">
<img style="width:70%" src="../../assets/evaluation.png">
<img style="width:70%" src="../../assets/figures/evaluation.png">
</div>
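
Since the exact flags live in the script itself, a safe way to enumerate them is to ask the script directly; a minimal sketch, assuming the script exposes a standard argparse `--help`:

```python
# Minimal sketch: print run_evaluation.py's argument list instead of guessing it.
import subprocess

subprocess.run(["python", "run_evaluation.py", "--help"], check=True)
```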

## 📈 Metrics
