Hi, I'm confused about the pooling strategy you used here.
For training, you use avg pooling:
BeLLM/README.md, line 52 (commit 9da9269)
While for evaluation, you don't specify any pooling flag here:
BeLLM/README.md, lines 99 to 105 (commit 9da9269)
so it should fall back to the default value, [cls], right?
BeLLM/eval_sts.py, line 57 (commit 9da9269)
As for the paper, you mentioned that you use the representative word as the pivot, so that should be the last non-padding token, right? So I'm wondering which token I should use, or does it make no difference in a decoder-based model like LLaMA?
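To make the question concrete, here is a minimal sketch (my own illustration, not code from this repo) of the three pooling strategies I'm comparing: the first token ([cls]-style), average over non-padding tokens, and the last non-padding token. The function name and signature are hypothetical; it just assumes right-padded sequences and hidden states from the final layer.

```python
import torch

def pool_hidden_states(hidden_states: torch.Tensor,
                       attention_mask: torch.Tensor,
                       strategy: str = "avg") -> torch.Tensor:
    """Pool (batch, seq_len, hidden) states into (batch, hidden) embeddings.

    attention_mask is (batch, seq_len) with 1 for real tokens, 0 for padding.
    Right padding is assumed for the "last" strategy.
    """
    if strategy == "cls":
        # first token of each sequence
        return hidden_states[:, 0]
    if strategy == "avg":
        # mean over non-padding tokens only
        mask = attention_mask.unsqueeze(-1).float()
        return (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    if strategy == "last":
        # last non-padding token, i.e. the final position the decoder actually attended to
        last_idx = attention_mask.sum(dim=1) - 1
        batch_idx = torch.arange(hidden_states.size(0), device=hidden_states.device)
        return hidden_states[batch_idx, last_idx]
    raise ValueError(f"unknown pooling strategy: {strategy}")
```

My intuition is that for a causal decoder the choice matters, since the first token has seen no context, so I'd like to confirm which of these the evaluation script is actually using.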