confused with the pooling strategy? #5

rxqy · 2024-06-14T03:00:33Z

Hi, I'm confused about the pooling strategy you use here.

For training, you use `avg` pooling:

BeLLM/README.md

Line 52 in 9da9269

--pooling_strategy avg \

For evaluation, however, you don't specify any pooling flag:

BeLLM/README.md

Lines 99 to 105 in 9da9269

    
2) evaluate on STS benchmark

```bash
BiLLM_START_INDEX=31 CUDA_VISIBLE_DEVICES=0 python eval_sts.py \
--model_name_or_path NousResearch/Llama-2-7b-hf \
--lora_name_or_path SeanLee97/bellm-llama-7b-nli \
--apply_bfloat16 0
```

So evaluation falls back to the default value, `cls`, right?

BeLLM/eval_sts.py

Line 57 in 9da9269

parser.add_argument("--pooling_strategy", type=str, default='cls')
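To double-check my reading of that line, here is a minimal sketch that reuses only the quoted `add_argument` call. The rest of `eval_sts.py` is not reproduced, so this only shows how `argparse` resolves the flag, not what the script then does with it.

```python
import argparse

# Only this one argument is copied from eval_sts.py; everything else is omitted.
parser = argparse.ArgumentParser()
parser.add_argument("--pooling_strategy", type=str, default='cls')

# README-style invocation: the flag is absent, so the default applies.
no_flag = parser.parse_args([])
# Passing the flag explicitly, as the training command does.
with_flag = parser.parse_args(["--pooling_strategy", "avg"])

print(no_flag.pooling_strategy)    # cls
print(with_flag.pooling_strategy)  # avg
```

If this reading is right, the README evaluation command pools with `cls` while training pools with `avg`.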

In the paper, you mention using the representative word as the pivot, so that should be the last non-padding token, right? So which token should I use for evaluation? Or does the choice make no difference in a decoder-based model like Llama?
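To make the question concrete, here is a toy sketch of the three candidates as I understand them. It is pure Python on made-up numbers and is not the repository's implementation. The helper `pool` and the strategy name `last` are hypothetical. I am assuming that `cls` means the first position, that `avg` means a mean over non-padding tokens, and that sequences are right-padded.

```python
def pool(hidden, mask, strategy):
    """Pool one sequence of token vectors into a single vector.

    hidden: list of token vectors (one list of floats per position)
    mask:   list of 0/1 flags, 1 for a real token, 0 for padding
    """
    if strategy == "cls":
        # Assumed meaning: the vector at the first position.
        return list(hidden[0])
    if strategy == "avg":
        # Assumed meaning: mean over non-padding positions only.
        kept = [h for h, m in zip(hidden, mask) if m]
        return [sum(col) / len(kept) for col in zip(*kept)]
    if strategy == "last":
        # Hypothetical name: the vector at the last non-padding position.
        last_index = max(i for i, m in enumerate(mask) if m)
        return list(hidden[last_index])
    raise ValueError(f"unknown strategy: {strategy}")


# Toy sequence: three real tokens followed by one padding position.
hidden = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

print(pool(hidden, mask, "cls"))   # [1.0, 0.0]
print(pool(hidden, mask, "avg"))   # [3.0, 2.0]
print(pool(hidden, mask, "last"))  # [5.0, 4.0]
```

The three strategies return different vectors even on this toy input. In a purely causal decoder the first position cannot attend to later tokens, which is why I suspect the choice matters.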
