Tested using Python 3.11.0, miniconda 23.1.0, git 2.25.1
- Set up a conda environment called `fairness`:
  ```bash
  conda env create -f environment.yml
  ```
- Set the environment variables `HF_ACCESS_TOKEN` to your Hugging Face API token and `OPENAI_API_KEY` to your OpenAI API key
- Optional: set `TRANSFORMERS_CACHE` to your lab's shared transformers cache, especially on HPC environments! (A bash example covering all three follows this list.)
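A minimal bash sketch of these variables; the values below are placeholders you must replace with your own:

```bash
# Placeholders only -- substitute your real tokens and cache path.
export HF_ACCESS_TOKEN="hf_..."                   # Hugging Face API token
export OPENAI_API_KEY="sk-..."                    # OpenAI API key
export TRANSFORMERS_CACHE="/shared/lab/hf_cache"  # optional, shared cache on HPC
```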
We use three datasets:
- Bias in Bios
- HateXplain
- TwitterAAE
To set up Bias in Bios, run these commands:
```bash
wget https://storage.googleapis.com/ai2i/nullspace/biasbios/train.pickle -P path/to/data/folder/
wget https://storage.googleapis.com/ai2i/nullspace/biasbios/dev.pickle -P path/to/data/folder/
wget https://storage.googleapis.com/ai2i/nullspace/biasbios/test.pickle -P path/to/data/folder/
```
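To confirm the download worked, you can peek at a split. This sketch assumes only that each file is a standard Python pickle; the exact record fields are whatever the pickles contain:

```python
# Sanity check: load one Bias in Bios split and inspect it.
import pickle

with open("path/to/data/folder/train.pickle", "rb") as f:
    train = pickle.load(f)

print(type(train), len(train))  # e.g. a list of examples
print(train[0])                 # inspect the fields of one record
```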
To set up HateXplain, run this command:
```bash
git clone https://github.com/hate-alert/HateXplain.git
```
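For a quick look at what you cloned, the snippet below assumes the upstream repo keeps its main annotation file at `Data/dataset.json` (keyed by post id); adjust the path if the repo layout has changed:

```python
# Peek at the HateXplain annotations.
import json

with open("HateXplain/Data/dataset.json") as f:
    posts = json.load(f)

print(len(posts))                    # number of annotated posts
post_id, post = next(iter(posts.items()))
print(post_id, sorted(post.keys()))  # fields available per post
```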
There are a few steps to set up TwitterAAE. We follow the steps from the `demog-text-removal` repository and reproduce them here for your convenience:
- Download TwitterAAE:
  ```bash
  wget http://slanglab.cs.umass.edu/TwitterAAE/TwitterAAE-full-v1.zip
  ```
- Clone `demog-text-removal` to prepare the data:
  ```bash
  git clone https://github.com/yanaiela/demog-text-removal.git
  ```
- Set up the environment for `demog-text-removal` (requires Python 2.7):
  ```bash
  conda create -n adv-demog-text python==2.7 anaconda
  source activate adv-demog-text
  pip install -r requirements.txt
  ```
- Run `make_data.py` (found in `demog-text-removal/src/data`) with the `adv-demog-text` environment activated:
  ```bash
  python make_data.py /path/to/downloaded/twitteraae_all /path/to/project/data/processed/sentiment_race sentiment race
  ```
We use a TOML config (WIP) to run the main function. You can take a look at the provided example config to get a feel for how to use it; a rough sketch of the shape follows.
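As a loose illustration only: every key below is hypothetical, and the provided example config is the source of truth.

```toml
# Hypothetical sketch -- the real keys are defined by the example config
# shipped with the repo, not by this README.
[model]
name = "gpt2"                 # hypothetical: which model to run

[data]
dataset = "biasbios"          # hypothetical: one of the three datasets above
path = "/path/to/data/folder"
```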
To run this program, activate the `fairness` conda environment and run:
```bash
python -m src --config /path/to/config.toml
```
We include tests for sanity checking. To run them:
```bash
python -m pytest
```
LLaMA and Alpaca models have been erroring out recently, but we are still going to experiment with them. To fit them into the API, I have modified the `hfoffline.py` file to compensate for their quirks.
I have successfully:
- Integrated them into the API
- Loaded their models
- Loaded them onto GPUs using a device map (don't change it from `balanced_low_0`; it is good for generation)
- Resized the model embeddings to fit the tokenizer length
- Loaded their tokenizer
- Set their tokenizer padding tokens accordingly
- Set the generation parameters (see the loading sketch after this list)
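For orientation, here is a hedged sketch of those loading steps using the Hugging Face `transformers` API; `hfoffline.py` is the actual implementation, and the model path is a placeholder:

```python
# Sketch of the LLaMA/Alpaca loading steps listed above; hfoffline.py is the
# source of truth. device_map requires the `accelerate` package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/path/to/llama"  # placeholder: local checkpoint or hub id

tokenizer = AutoTokenizer.from_pretrained(model_name)
# LLaMA-style tokenizers ship without a padding token, so set one explicitly.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,    # one of the OOM mitigations tried below
    device_map="balanced_low_0",  # leaves GPU 0 headroom for generation
)
# Keep the embedding matrix in sync with the tokenizer's vocabulary size.
model.resize_token_embeddings(len(tokenizer))
```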
I have not successfully generated anything yet because of CUDA OOM and "CUBLAS not initialized" errors, without resorting to more GPUs :( What I have tried so far:
- Converting the models to float16 to fit on the GPUs
- Lowering `batch_size`
- Setting `CUDA_LAUNCH_BLOCKING=0`
I have not tried:
- Being greedy with GPUs :)