
Fine-tuning-free Shapley value (FreeShap) for instance attribution


JTWang2000/FreeShap


FreeShap: Helpful or Harmful Data? Fine-tuning-free Shapley Attribution for Explaining Language Model Predictions [ICML 2024]

This is the official implementation of the ICML 2024 paper "Helpful or Harmful Data? Fine-tuning-free Shapley Attribution for Explaining Language Model Predictions".

Prepare the conda environment

conda create --name freeshap python=3.10
conda activate freeshap
pip install -r requirements.txt

Quick start: Test Prediction Explanation

The beginner-friendly Jupyter notebook vinfo/SST2_explanation.ipynb shows how to compute FreeShap values and how to use them to explain test predictions in terms of training examples.

Code structure

  • /configs: An example configuration file is provided at configs/dshap/sst2/ntk_prompt.yaml. YAML files whose names start with ntk specify the hyperparameters for the NTK; YAML files whose names start with finetune specify the hyperparameters for prompt-based fine-tuning. The code structure is based on the public cords repo.
  • /vinfo
    • /dvutils: Data Shapley code. The core file is Data_Shaley.py.
    • /entks: NTK kernel building and regression. The NTK module is built on the public empirical-ntks repo.
      • nlpmodels.py: NTK model classes.
      • ntk.py: NTK kernel building.
      • ntk_regression.py: Kernel regression.
    • dataset.py: construction of dataset classes.
    • probe.py: computation of utility function and prompt-based fine-tuning model classes.
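
The way these pieces fit together (a kernel, kernel regression as the utility, and Data Shapley on top) can be illustrated with a toy sketch. This is not the repository's implementation: the RBF kernel is only a stand-in for an empirical NTK, and all function names, the regularization value, the chance-level utility for the empty set, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Stand-in kernel (assumption); FreeShap itself uses an empirical NTK.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def utility(idx, K_tt, K_vt, Y, y_val, reg=1e-3):
    # Validation accuracy of kernel ridge regression fit on the subset idx.
    if len(idx) == 0:
        return 1.0 / Y.shape[1]  # assumed convention: chance level for the empty set
    K_s = K_tt[np.ix_(idx, idx)]
    alpha = np.linalg.solve(K_s + reg * np.eye(len(idx)), Y[idx])
    pred = K_vt[:, idx] @ alpha
    return float(np.mean(pred.argmax(axis=1) == y_val))

def monte_carlo_shapley(K_tt, K_vt, Y, y_val, n_perms=20, seed=0):
    # Permutation-sampling estimate of each training point's Shapley value.
    rng = np.random.default_rng(seed)
    n = K_tt.shape[0]
    values = np.zeros(n)
    for _ in range(n_perms):
        perm = rng.permutation(n)
        prev = utility(perm[:0], K_tt, K_vt, Y, y_val)
        for k in range(n):
            cur = utility(perm[: k + 1], K_tt, K_vt, Y, y_val)
            values[perm[k]] += cur - prev  # marginal contribution of perm[k]
            prev = cur
    return values / n_perms

# Synthetic two-class data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n_train, n_val = 12, 20
y_train = rng.integers(0, 2, size=n_train)
y_val = rng.integers(0, 2, size=n_val)
X_train = rng.normal(size=(n_train, 2)) + 2.0 * y_train[:, None]
X_val = rng.normal(size=(n_val, 2)) + 2.0 * y_val[:, None]

K_tt = rbf_kernel(X_train, X_train)  # train-train kernel
K_vt = rbf_kernel(X_val, X_train)    # validation-train kernel
Y = np.eye(2)[y_train]               # one-hot training labels

values = monte_carlo_shapley(K_tt, K_vt, Y, y_val)
full_u = utility(np.arange(n_train), K_tt, K_vt, Y, y_val)
empty_u = utility(np.arange(0), K_tt, K_vt, Y, y_val)
print(np.round(values, 3))
```

Because the kernel is computed once and only sub-blocks are re-solved for each subset, no model is retrained inside the permutation loop. Positive values mark training points that help validation accuracy in this toy setup, and negative values mark points that hurt it.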

How to run the code

python vinfo/ntk.py --yaml_path={YAML_PATH} --dataset_name={DATASET} --file_path={PATH}

For instance, to run on the SST-2 dataset with 5000 points:

python vinfo/ntk.py --yaml_path="configs/dshap/sst2/ntk_prompt.yaml" --dataset_name=sst2 --file_path={PATH}
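
For scripted runs, the same command can be assembled in Python. The sketch below is a hypothetical helper, not part of the repository; it only uses the three flags shown above, and results/sst2 is a placeholder standing in for {PATH}.

```python
import shlex
import sys

def build_ntk_command(yaml_path, dataset_name, file_path, python=sys.executable):
    # Hypothetical helper: assembles the argument list for vinfo/ntk.py
    # using only the flags documented in this README.
    return [
        python,
        "vinfo/ntk.py",
        f"--yaml_path={yaml_path}",
        f"--dataset_name={dataset_name}",
        f"--file_path={file_path}",
    ]

# "results/sst2" is a placeholder output folder (assumption), standing in for {PATH}.
cmd = build_ntk_command("configs/dshap/sst2/ntk_prompt.yaml", "sst2", "results/sst2")
print(shlex.join(cmd))

# To launch from the repository root:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues when the output path contains spaces.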

Alternatively, run the code through the ntk_shapley.sh script:

bash ntk_shapley.sh "sst2" "0" "5000" "True" "{PATH}"

The results are stored in the {PATH} folder.

We also provide a kernel on Drive; save it in {PATH} to experiment with explanations.

BibTeX

@inproceedings{wang2024helpful,
        title={Helpful or Harmful Data? Fine-tuning-free Shapley Attribution for Explaining Language Model Predictions},
        author={Jingtan Wang and Xiaoqiang Lin and Rui Qiao and Chuan-Sheng Foo and Bryan Kian Hsiang Low},
        year={2024},
        booktitle={Proc. ICML}
}
