MASS-summarization how to generate unmasked output? #129
Comments
Can you show me your input for generation?
Here is what I wrote to run the generation: where
Is the model the pre-trained model, or has it been fine-tuned on your dataset? If it has been fine-tuned on your dataset, can you show me part of your dataset or the log from binarizing it? In my setting, [UNK] just represents unknown (UNK) tokens.
I'm sorry for the misunderstanding.
According to the log of your preprocessing stage, nearly 20% of the tokens have been replaced by [UNK]. I am sure your dataset is not tokenized into sub-words. You first need to tokenize your dataset at the word-piece level. We provide a script to tokenize a dataset into word-pieces. A demo looks like:
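To illustrate why word-piece tokenization reduces the number of [UNK] tokens, here is a minimal sketch of greedy longest-match-first word-piece splitting. It is not the script the maintainers provide: the function names, the `##` continuation prefix, and `TOY_VOCAB` are assumptions made up for illustration, and a real run would use the vocabulary shipped with the pre-trained model.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one word into word-pieces by greedy longest-match-first lookup."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched: the whole word becomes [UNK]
        pieces.append(piece)
        start = end
    return pieces


def tokenize_line(line, vocab):
    """Lowercase, split on whitespace, then split each word into pieces."""
    tokens = []
    for word in line.lower().split():
        tokens.extend(wordpiece_tokenize(word, vocab))
    return tokens


def unk_rate(tokens, vocab, unk="[UNK]"):
    """Fraction of tokens that are [UNK] or missing from the vocabulary."""
    if not tokens:
        return 0.0
    misses = sum(1 for t in tokens if t == unk or t not in vocab)
    return misses / len(tokens)


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"place", "the", "tray", "in", "oven",
             "pre", "##heat", "##ed", "[UNK]"}

if __name__ == "__main__":
    line = "Place the tray in the preheated oven"
    word_level = line.lower().split()
    piece_level = tokenize_line(line, TOY_VOCAB)
    print(word_level, unk_rate(word_level, TOY_VOCAB))
    print(piece_level, unk_rate(piece_level, TOY_VOCAB))
```

With word-level tokens, "preheated" is missing from the toy vocabulary and would be binarized as [UNK]; after splitting it becomes "pre ##heat ##ed" and every token is covered. A high [UNK] percentage in the preprocessing log is the symptom of skipping this step.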
Thank you so much for your advice. I have tokenised the dataset using the code you suggested. However, it appears to me that the model does not really 'summarise' the text. Output generated by the
Do you have any suggestions about the cause?
Have you tried fine-tuning the model on your data, or did you just test with the pre-trained model?
I am using the WikiHow dataset to test the performance of MASS. The output I get from fairseq-generate is:
S-5945 [UNK] now ready for [UNK] and placing on [UNK]
T-5945 <[UNK]>
H-5945 -1.701836347579956 remove the [UNK] from the oven and place it in the [UNK] [UNK] and allow it to cool for a few minutes before it is ready to dry for [UNK] to cool and then [UNK] it on the oven for the [UNK] to cook for about 15 minutes before [UNK]
P-5945 -2.3607 -1.3653 -0.8389 -1.6368 -0.2720 -1.8948 -0.3451 -2.1066 -0.8099 -1.2254 -0.6529 -0.5549 -2.3445 -4.6659 -3.0849 -0.8033 -0.0442 -0.7458 -0.7507 -1.7043 -0.4074 -1.3986 -0.4831 -3.4457 -1.3255 -2.6874 -0.1853 -3.0822 -1.1561 -0.5247 -4.2249 -2.3640 -1.9811 -2.1335 -0.6717 -4.4360 -1.9174 -0.8858 -3.1737 -1.4359 -3.6464 -0.5433 -3.6553 -4.3460 -1.1267 -2.0195 -2.6686 -1.1864 -0.7702 -0.5627 -0.1414
Where I believe the UNK is the mask. The question is: how do I get an unmasked output?