Commit: refine readme
wshiqi-aws authored Jan 2, 2023
1 parent be73870 commit 58e7707
Showing 1 changed file (README.md) with 3 additions and 3 deletions.
We provide general APIs for running our benchmark in `run_robust.py`.

Overall, the benchmark consists of three steps, described in detail in the following sections: (1) [perturb] create perturbed datasets, (2) [exec] run the models on the nominal/perturbed datasets, and (3) [report_coarse] collect and summarize the results according to our proposed robustness metrics.

### Step 1: Create perturbed datasets [perturb]

The [perturb] option creates perturbed datasets. One can run the following commands to perturb one's own nominal datasets (paths are configured in `config.json`).

To debug and customize perturbations, one can use the low-level APIs. Turn on `--print_sample` to print perturbed samples:
```
python perturb.py --method format --aug_method 0 --print_sample
```
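For intuition about what a formatting perturbation does, here is a minimal, purely illustrative sketch in Python. The function name and the specific transformation (rewriting 4-space indentation as tabs) are hypothetical and are not the repository's actual perturbation code; the point is that only surface formatting changes while program semantics are preserved.

```python
# Hypothetical sketch of a "format"-style perturbation: change surface
# formatting (indentation style) without changing program semantics.
# Illustrative only -- not the benchmark's actual implementation.

def perturb_format(code: str) -> str:
    """Replace each level of 4-space indentation with a tab."""
    out = []
    for line in code.splitlines():
        stripped = line.lstrip(" ")
        levels = (len(line) - len(stripped)) // 4
        out.append("\t" * levels + stripped)
    return "\n".join(out)

nominal = "def add(a, b):\n    return a + b"
print(perturb_format(nominal))  # same program, tab-indented
```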

### Step 2: Run on perturbed datasets [exec]

The [exec] option evaluates target models on perturbed datasets. To evaluate models with our benchmark, please configure the target nominal/perturbed datasets and model paths correctly in `config.json`. One can then run:
```
python run_robust.py perturb func_name --datasets humaneval mbpp --models codegen-350M-multi codegen-350M-mono # create func_name perturbed datasets for humaneval and mbpp
python run_robust.py exec func_name --datasets humaneval mbpp --models codegen-350M-multi codegen-350M-mono # evaluate model on dataset humaneval mbpp on codegen-350M-multi and codegen-350M-mono
```
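The exact schema of `config.json` is defined by the repository; purely as a hypothetical illustration of the kind of paths one configures (every key and path below is made up), an entry might pair datasets with model checkpoints along these lines:

```json
{
  "datasets": {
    "humaneval": {
      "nominal": "data/humaneval/nominal.jsonl",
      "perturbed": "data/humaneval/perturbed/"
    }
  },
  "models": {
    "codegen-350M-mono": "models/codegen-350M-mono"
  }
}
```

Consult the repository's own `config.json` for the actual field names.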

### Step 3: Summarize running results [report_coarse]

In our paper, we propose three main robustness metrics: robust pass@k, robust drop@k, and robust relative@k. To summarize and collect the evaluation results, run `run_robust.py` with the reporting options. Specifically, the `report_coarse` option summarizes the robustness numbers for all three metrics (the main tables in the paper), while the `report` option summarizes the detailed robustness results into csv (the detailed tables in the appendix of the paper). The results are saved as tables under `csv_coarse` and `csv`.
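For intuition about the shape of these metrics, the sketch below shows the standard unbiased pass@k estimator (from the HumanEval evaluation methodology) together with a drop-style robustness ratio. The `robust_drop` helper is only an illustration of a relative-drop metric; the precise definitions of robust pass@k, robust drop@k, and robust relative@k are the ones given in the paper.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def robust_drop(nominal_pass_k: float, robust_pass_k: float) -> float:
    """Illustrative relative drop of robust pass@k vs. nominal pass@k."""
    if nominal_pass_k == 0.0:
        return 0.0
    return (nominal_pass_k - robust_pass_k) / nominal_pass_k
```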