Commit: refine readme
wshiqi-aws authored Jan 2, 2023
1 parent be73870 commit 58e7707
Showing 1 changed file (README.md) with 3 additions and 3 deletions.
We provide general APIs for running our benchmark in `run_robust.py`.

Overall, the benchmark consists of three steps, described in detail in the following sections: (1) [perturb] create perturbed datasets, (2) [exec] run the models on the nominal/perturbed datasets, and (3) [report_coarse] collect and summarize the results according to our proposed robustness metrics.

### Step 1: Create perturbed datasets [perturb]

The [perturb] option creates perturbed datasets. One can run the following commands to perturb one's own nominal datasets (paths are configured in `config.json`).

To debug and customize perturbations, one can use the low-level APIs. Turn on `--print_sample` to print perturbed samples:
```
python perturb.py --method format --aug_method 0 --print_sample
```
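For intuition about what a formatting perturbation does, here is a minimal, purely illustrative sketch in Python. The function name and the specific transformation (rewriting 4-space indentation as tabs) are hypothetical and are not the repository's actual perturbation code; the point is that only surface formatting changes while program semantics are preserved.

```python
# Hypothetical sketch of a "format"-style perturbation: change surface
# formatting (indentation style) without changing program semantics.
# Illustrative only -- not the benchmark's actual implementation.

def perturb_format(code: str) -> str:
    """Replace each level of 4-space indentation with a tab."""
    out = []
    for line in code.splitlines():
        stripped = line.lstrip(" ")
        levels = (len(line) - len(stripped)) // 4
        out.append("\t" * levels + stripped)
    return "\n".join(out)

nominal = "def add(a, b):\n    return a + b"
print(perturb_format(nominal))  # same program, tab-indented
```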

### Step 2: Run on perturbed datasets [exec]

The [exec] option evaluates target models on perturbed datasets. To evaluate models with our benchmark, please configure the target nominal/perturbed datasets and model paths correctly in `config.json`. One can then run:
```
python run_robust.py perturb func_name --datasets humaneval mbpp --models codegen-350M-multi codegen-350M-mono # create func_name perturbed datasets for humaneval and mbpp
python run_robust.py exec func_name --datasets humaneval mbpp --models codegen-350M-multi codegen-350M-mono # evaluate model on dataset humaneval mbpp on codegen-350M-multi and codegen-350M-mono
```
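The exact schema of `config.json` is defined by the repository; purely as a hypothetical illustration of the kind of paths one configures (every key and path below is made up), an entry might pair datasets with model checkpoints along these lines:

```json
{
  "datasets": {
    "humaneval": {
      "nominal": "data/humaneval/nominal.jsonl",
      "perturbed": "data/humaneval/perturbed/"
    }
  },
  "models": {
    "codegen-350M-mono": "models/codegen-350M-mono"
  }
}
```

Consult the repository's own `config.json` for the actual field names.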

### Step 3: Summarize running results [report_coarse]

In our paper, we propose three main robustness metrics: robust pass@k, robust drop@k, and robust relative@k. To summarize and collect the evaluation results, run `run_robust.py` with the reporting options. Specifically, the `report_coarse` option summarizes the robustness numbers for all three metrics (the main tables in the paper), while the `report` option summarizes the detailed robustness results into csv (the detailed tables in the appendix of the paper). The results are saved as tables under `csv_coarse` and `csv`.
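For intuition about the shape of these metrics, the sketch below shows the standard unbiased pass@k estimator (from the HumanEval evaluation methodology) together with a drop-style robustness ratio. The `robust_drop` helper is only an illustration of a relative-drop metric; the precise definitions of robust pass@k, robust drop@k, and robust relative@k are the ones given in the paper.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples drawn, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def robust_drop(nominal_pass_k: float, robust_pass_k: float) -> float:
    """Illustrative relative drop of robust pass@k vs. nominal pass@k."""
    if nominal_pass_k == 0.0:
        return 0.0
    return (nominal_pass_k - robust_pass_k) / nominal_pass_k
```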