ShareBERT: Embeddings Are Capable of Learning Hidden Layers

Implementation of the work "ShareBERT: Embeddings Are Capable of Learning Hidden Layers" [Pdf], presented at the 38th Annual AAAI Conference on Artificial Intelligence (AAAI-38).

Requirements

Software:

  • python (>= 3.7)
  • torch
  • wandb
  • deepspeed

More details can be found in requirements.txt.
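
For a quick setup, a minimal sketch assuming a standard pip-based environment (the versions listed in requirements.txt take precedence over this example):

# optional: create an isolated environment, then install the dependencies
python -m venv sharebert-env
source sharebert-env/bin/activate
pip install -r requirements.txt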

Disk:

  • We recommend at least 100 GB of free SSD space for data preparation.

GPU:

  • Our models were trained on an NVIDIA A100 (40 GB); if your GPU has less memory, we suggest decreasing --train_micro_batch_size_per_gpu (see the sketch below).
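
A minimal sketch of a reduced-memory run, assuming (as in typical DeepSpeed setups) that the number of gradient-accumulation steps is derived from the ratio --train_batch_size / --train_micro_batch_size_per_gpu, so lowering the micro batch preserves the effective batch size; the value 125 below is only an illustrative guess for a smaller GPU:

# hypothetical lower-memory variant of the pre-training command in the Usage section:
# halve the per-GPU micro batch while keeping the global batch size unchanged
python run_pretraining.py \
    ... \
    --train_batch_size 4000 \
    --train_micro_batch_size_per_gpu 125 \
    ...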

Data preparation details can be found in ./dataset/README.md.

Disclaimer

At the moment, the code has been tested only on a single GPU. Multi-GPU support will be added in the future; feel free to open an issue if you need it urgently.

Usage

Pre-training

Assuming a single GPU, the command for training ShareBERT Base is:

python run_pretraining.py \
    --local_rank=0 \
    --model_type bert-mlm \
    --tokenizer_name bert-base-uncased \
    --hidden_act gelu \
    --hidden_size 2048 \
    --factor_size 384 \
    --num_hidden_layers 12 \
    --num_attention_heads 16 \
    --intermediate_size 4096 \
    --hidden_dropout_prob 0.1 \
    --attention_probs_dropout_prob 0.1 \
    --encoder_ln_mode pre-ln \
    --lr 1e-3 \
    --train_batch_size 4000 \
    --train_micro_batch_size_per_gpu 250 \
    --lr_schedule step \
    --curve linear \
    --warmup_proportion 0.06 \
    --gradient_clipping 0.5 \
    --optimizer_type adamw \
    --weight_decay 0.01 \
    --adam_beta1 0.9 \
    --adam_beta2 0.98 \
    --adam_eps 1e-6 \
    --max_steps 23000 \
    --dataset_path <hdf5_dir> \
    --output_dir <save_path> \
    --print_steps 100 \
    --num_epochs_between_checkpoints 3000 \
    --job_name pretraining_experiment \
    --project_name sharebert-pretraining \
    --validation_epochs 6 \
    --validation_epochs_begin 1 \
    --validation_epochs_end 1 \
    --validation_begin_proportion 0.05 \
    --validation_end_proportion 0.01 \
    --validation_micro_batch 16 \
    --deepspeed \
    --data_loader_type dist \
    --do_validation \
    --seed 42 \
    --fp16 \
    --fp16_backend ds \
    --layer_norm_type pytorch \
    --total_training_time 3000.0 \
    --early_exit_time_marker 3000.0 &> output.txt &

Training will use the samples located in <hdf5_dir>, and the final model will be saved in <save_path>. Set --factor_size to 128 for ShareBERT Small, 384 for ShareBERT Base, and 768 for ShareBERT Large; in the latter case, also set --num_hidden_layers to 6 (see the sketch below).
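
For reference, a minimal sketch of the flags that differ between variants; every other flag stays as in the command above (the layer count of 12 for Small and Base matches the base command above):

# ShareBERT Small
python run_pretraining.py ... --factor_size 128 --num_hidden_layers 12 ...

# ShareBERT Base (as in the full command above)
python run_pretraining.py ... --factor_size 384 --num_hidden_layers 12 ...

# ShareBERT Large
python run_pretraining.py ... --factor_size 768 --num_hidden_layers 6 ...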

In some environments it might be necessary to prefix the command with the device selection, e.g. CUDA_VISIBLE_DEVICES=0 python run_pretraining.py ....

Fine-Tuning

Fine-Tuning command:

python run_glue.py \
    --local_rank=0 \
    --model_name_or_path <save_path>/epoch... \
    --task_name <task_name> \
    --max_seq_length 128 \
    --output_dir ./finetuning-out/ \
    --overwrite_output_dir \
    --do_train --do_eval --do_predict \
    --evaluation_strategy steps \
    --per_device_train_batch_size 64 --gradient_accumulation_steps 1 \
    --per_device_eval_batch_size 64 \
    --learning_rate 5e-5 \
    --weight_decay 0.01 \
    --eval_steps 50 \
    --max_grad_norm 1.0 \
    --num_train_epochs 5 \
    --fp16 \
    --fp16_backend apex \
    --lr_scheduler_type polynomial \
    --warmup_steps 50 &> fine_output.txt &

where <task_name> can be sst2, mnli, qqp, mrpc, cola, stsb, rte, or qnli (see the loop sketch below).
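
As an optional convenience, a minimal sketch that fine-tunes each GLUE task sequentially; the per-task output directories and log names are hypothetical, while the run_glue.py flags are the ones from the command above:

# hypothetical loop over the GLUE tasks listed above
for task in sst2 mnli qqp mrpc cola stsb rte qnli; do
    python run_glue.py \
        --local_rank=0 \
        --model_name_or_path <save_path>/epoch... \
        --task_name ${task} \
        --max_seq_length 128 \
        --output_dir ./finetuning-out/${task}/ \
        --overwrite_output_dir \
        --do_train --do_eval --do_predict \
        --evaluation_strategy steps \
        --per_device_train_batch_size 64 --gradient_accumulation_steps 1 \
        --per_device_eval_batch_size 64 \
        --learning_rate 5e-5 \
        --weight_decay 0.01 \
        --eval_steps 50 \
        --max_grad_norm 1.0 \
        --num_train_epochs 5 \
        --fp16 \
        --fp16_backend apex \
        --lr_scheduler_type polynomial \
        --warmup_steps 50 &> fine_output_${task}.txt
done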

Pretrained models

Pretrained models can be found in the following drive.

Acknowledgments

If you find this repository useful, please consider citing (no obligation):

@inproceedings{hu2024sharebert,
  title={ShareBERT: Embeddings Are Capable of Learning Hidden Layers},
  author={Hu, Jia Cheng and Cavicchioli, Roberto and Berardinelli, Giulia and Capotondi, Alessandro},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={16},
  pages={18225--18233},
  year={2024}
}

This repository is based on academy-budget-bert; we thank the authors for their beautiful code and for sharing their work.
