This repository holds the code associated with the prompt engineering for large language models project, including reference implementations and demo notebooks.

The static code checker and all implementations run on Python 3.9.
- All reference implementations are housed in `src/reference_implementations/`. Datasets are placed in `resources/datasets` or in the relevant `resources` subfolder for the implementation. There is also some information about JAX for those curious to learn more about that framework, along with Google's implementation of prompt tuning (included but not supported in the prompt engineering lab due to issues with their current implementation).
- As part of your cluster account, you have been allocated a scratch folder where checkpoints, training artifacts, and other files will be stored. It should be located at the path `/scratch/ssd004/scratch/<cluster_username>`. If you don't see this path, please let your facilitator know, and we will ensure that it exists.
- If you are running any experiments in `prompt_zoo/`, it is best to use an A40 GPU. This can be done by following the instructions in `src/reference_implementations/prompt_zoo/README.md`. Note: Using JupyterHub to directly access a GPU is limited to T4V2 GPUs, which are generally insufficient for running `prompt_zoo` experiments.
- We have two pre-constructed environments for running experiments. They are not interchangeable.
  - `/ssd003/projects/aieng/public/prompt_zoo` is used to run the experiments in the `prompt_zoo` directory only.
  - `/ssd003/projects/aieng/public/prompt_engineering` is used to run all of the other code in this repository.
- We have provided some exploration guidance in the markdown file `Exploration_Guide.md`. This guide suggests directions to explore in each hands-on session based on the concepts covered in the preceding lectures. Note: This guide is simply a suggestion. You should feel free to explore whatever is most interesting to you.
Below is a brief description of the contents of each folder in the reference implementations directory. In addition, each directory contains at least a few READMEs with more in-depth discussions. Finally, many of the notebooks are heavily documented.

This repository is organized as follows.
Automatic prompt tuning methods are implemented under `src/reference_implementations/prompt_zoo/`. Currently supported methods are:
There are also several alternatives to prompt optimization implemented, including full model tuning and partial fine-tuning (classifier layer, input layer).
For more information about using and running the prompt tuning experiments with the T5 language model, please see `README.md`. The README describes the steps to source the environment and access GPUs on Vector's cluster for the experiments on different prompting techniques.
These reference implementations are housed in `src/reference_implementations/prompting_vector_llms/`.

This folder contains notebooks and implementations for prompting large language models hosted on Vector's compute cluster. There are notebooks demonstrating various prompted downstream tasks and the effects of prompts on tasks like aspect-based sentiment analysis, text classification, summarization, and translation, along with prompt ensembling, activation fine-tuning, and experiments on whether discrete prompts are transferable across architectures.
These reference implementations reside in `src/reference_implementations/fairness_measurement/`.

This folder contains implementations for measuring fairness in language models. One implementation assesses fairness through fine-tuning or prompting to complete a sentiment classification task. We also consider LLM performance on the CrowS-Pairs and BBQ tasks as a means of probing model bias and fairness.
These reference implementations are in `src/reference_implementations/hugging_face_basics/`.
The reference implementations here are of two kinds. The first is a collection of examples of using HuggingFace for basic ML tasks. The second is a discussion of some important metrics associated with NLP, specifically generative NLP.
In the folder `src/reference_implementations/llama_llm/`, we have scripts that facilitate using one of the newer large language models, LLaMA. The model has been trained for much longer than traditional LLMs and, while much smaller than OPT-175B, can demonstrate equivalent or better performance.
These implementations exist in `src/reference_implementations/t5x` and `src/reference_implementations/google_prompt_tuning`, respectively. These folders contain scripts for fine-tuning a JAX implementation of T5 and for prompt tuning T5 in JAX as well. They offer a good idea of how you might use JAX to perform large model training and prompt tuning. However, they are not fully supported by this laboratory because their implementation is currently broken on Google's side of the repositories.
From any of the v-login nodes, run the following. This will reserve an A40 GPU and provide you with a terminal to run commands on that node.

```bash
srun --gres=gpu:1 -c 8 --mem 16G -p a40 --pty bash
```
Note that `-p a40` requests an A40 GPU. You can also access smaller `t4v2` and `rtx6000` GPUs this way. The `-c 8` flag requests 8 supporting CPUs, and `--mem 16G` requests 16 GB of CPU memory.
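As a sketch of how these flags compose, the snippet below builds the same reservation command for one of the smaller partitions; swap the `PARTITION` value (a hypothetical variable used here for illustration) for whichever partition you need:

```shell
# Build the srun command for a smaller partition; t4v2 and rtx6000 are the
# alternative partitions mentioned above. Same CPU and memory flags as before.
PARTITION=t4v2   # or: rtx6000, a40
SRUN_CMD="srun --gres=gpu:1 -c 8 --mem 16G -p ${PARTITION} --pty bash"
echo "${SRUN_CMD}"
```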
As mentioned above, we offer two pre-built environments for running different parts of the code.
- `/ssd003/projects/aieng/public/prompt_zoo` is used to run the experiments in the `prompt_zoo` directory only.
- `/ssd003/projects/aieng/public/prompt_engineering` is used to run all of the other code in this repository.
Before starting a notebook or running code, you should source one of these two environments with the command:
```bash
source /ssd003/projects/aieng/public/prompt_zoo/bin/activate
```

or

```bash
source /ssd003/projects/aieng/public/prompt_engineering/bin/activate
```
When using the pre-built environments, you cannot pip install into them. If you would like to set up your own environment, see the section Installing Custom Dependencies.
Once an interactive session has been started, we can run a Jupyter notebook on the GPU node.

We start the notebook on the example port `8888`. If port `8888` is taken, try another random port between 1024 and 65000. Also note the URL output by the command, to be used later (e.g. `http://127.0.0.1:8888/?token=7ba0ba5c3e9f5668f92518e4c5e723fea8b69aca065b4d57`).

```bash
jupyter notebook --ip 0.0.0.0 --port 8888
```
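Rather than guessing a random port, one quick way to find a free one is to ask the operating system (this assumes `python3` is on your PATH, which it will be once an environment is sourced):

```shell
# Binding to port 0 asks the OS to assign any unused port; print it, then
# pass that number to jupyter's --port flag instead of 8888.
PORT=$(python3 -c 'import socket; s = socket.socket(); s.bind(("", 0)); print(s.getsockname()[1]); s.close()')
echo "free port: ${PORT}"
```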
In a new terminal window on our personal laptop, we create an SSH tunnel to that specific port of the GPU node. Note that `gpu001` is the name of the GPU node we reserved at the beginning. Remember that the port needs to be the same as your Jupyter notebook port above.

```bash
ssh username@v.vectorinstitute.ai -L 8888:gpu001:8888
```
Keep the new connection alive by starting a tmux session in the new local terminal:

```bash
tmux
```
Now we can access the notebooks from our local browser. Copy the URL given by the Jupyter notebook server into your local web browser (example token shown):

```
http://127.0.0.1:8888/?token=7ba0ba5c3e9f5668f92518e4c5e723fea8b69aca065b4d57
```
You should now be able to navigate to the notebooks and run them.
Don't close the local terminal windows on your personal laptop!
Rather than working through hosted Jupyter Notebooks, you can also connect directly to a VS Code instance on the GPU. After the cluster has fulfilled your request for a GPU session, run the following to set up a VSCode Server on the GPU node.
This command downloads and saves VSCode in your home folder on the cluster. You need to do this only once:
```bash
cd ~/
curl -Lk 'https://code.visualstudio.com/sha/download?build=stable&os=cli-alpine-x64' --output vscode_cli.tar.gz
tar -xf vscode_cli.tar.gz
rm vscode_cli.tar.gz
```
Please verify the beginning of the command prompt to make sure that you are running this command from a GPU node (e.g., `user@gpu001`) and not the login node (`user@v`). After that, you can spin up a tunnel to the GPU node using the following command:

```bash
~/code tunnel
```
You will be prompted to authenticate via GitHub. On the first run, you might also need to review Microsoft's terms of service. After that, you can access the tunnel through your browser. If you've logged into GitHub in your VS Code desktop app, you can also connect from there by installing the `ms-vscode.remote-server` extension, pressing Shift-Command-P (Shift-Control-P), and entering `Remote-Tunnels: Connect to Tunnel`.
Note that you will need to keep the SSH connection running while using the tunnel. After you are done with the work, stop your session by pressing Control-C.
Note: The following instructions are for anyone who would like to create their own Python virtual environment to run experiments in `prompt_zoo`. If you would just like to run the code, you can use one of our pre-built virtual environments by following the instructions in the section Virtual Environments, above. Instructions for creating environments for experiments outside of `prompt_zoo` are contained in the relevant subfolders.
If you wish to install the package on macOS for local development, call the following script to install Python 3.9 and set up the virtual environment for the module you want to install. This approach only installs the ML libraries (`pytorch`, `tensorflow`, `jax`) for the CPU. If you also want to install the package in editable mode with all the development requirements, use the flag `DEV=true` when you run the script; otherwise use `DEV=false`.
```bash
bash setup.sh OS=mac ENV_NAME=env_name DEV=true
```
You can call `setup.sh` with the `OS=vcluster` flag. This installs Python on Vector's Linux cluster and installs the ML libraries for the GPU cards.
```bash
bash setup.sh OS=vcluster ENV_NAME=env_name DEV=true
```
The `setup.sh` script takes an `ENV_NAME` argument value of `prompt_torch`. The value `prompt_torch` should be used for our `prompt_zoo` experiments.
Many of the experiments in this repository, especially those in `prompt_zoo`, will write to your scratch directory. An example path is `/scratch/ssd004/scratch/snajafi/`, where `snajafi` is replaced with your cluster username. This directory has a maximum capacity of 50GB. If you run multiple hyperparameter sweeps, you may fill it with model checkpoints. If this directory fills, it may interrupt your jobs or cause them to fail. Please be cognizant of the space and clean up old runs if you begin to fill the directory.
To check your code at commit time:

```bash
pre-commit install
```

You can also have pre-commit fix your code:

```bash
pre-commit run
```