Testing #17
Yes.
I'm trying to understand the code. In the summary results of the training you did, why do the results contain only 3 words while the maximum summary length is 31?
I don't think you can directly control how many words are generated. The
maximum length is just that: an upper bound. In practice, the model
is trained to predict a special token called the end-of-sequence (eos) token
after the summary. The eos token marks the end of the generation. During
inference I ignore every token after eos. If you are getting only 3 words,
it means the model is generating eos after those 3 words.
The reason it is mostly around 3 words could be partly that a lot
of the training data has similar 3-4 word summaries. Otherwise, it may be a
model issue, a hyperparameter issue, or a lack of training. If the
data mostly has short summaries and you generally want longer ones,
then it's probably best to try a different dataset. I think there are also
some papers available that attempt more length-controlled generation if
you want more control. But yeah, in general you can't do much here.
You can change the code where I filter tokens after the eos token to print
all the generated tokens if you want to see the whole generated sequence.
I think you will find that code around the section where I print the
generated text.
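A minimal sketch of the truncate-at-eos behavior described above. The function name, the token ids, and the eos id are all illustrative assumptions, not taken from the repository's actual code:

```python
# Hypothetical helper illustrating how tokens after the first eos are
# ignored during inference; the ids below are made up for the example.
def truncate_at_eos(token_ids, eos_id):
    """Keep only the tokens generated before the first eos token."""
    if eos_id in token_ids:
        return token_ids[:token_ids.index(eos_id)]
    return token_ids

generated = [42, 7, 19, 3, 55, 61]  # suppose 3 is the eos id
print(truncate_at_eos(generated, eos_id=3))  # -> [42, 7, 19]
```

Removing this truncation (or printing `generated` directly) would show the full generated sequence, including everything after eos.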
…On Wed, Aug 18, 2021, 4:08 AM Karima Marwazia Shaliha <***@***.***> wrote:
I'm trying to understand the code. In the summary results of the training
you did, why do the results contain only 3 words while the maximum
summary length is 31?
I tried to use longer data and only 3 words were produced.
Are there any parameters that must be changed other than the maximum
summary length?
Please answer; thanks in advance.
Umm, okay, I understand that problem. Maybe because the data I use is not so good, the accuracy produced is not as good as the accuracy you produced.
Yes. You can also try other models from GitHub.
…On Thu, Aug 19, 2021, 4:38 AM Karima Marwazia Shaliha <***@***.***> wrote:
Umm, okay, I understand that problem. Maybe because the data I use is
not so good, the accuracy produced is not as good as the accuracy you
produced.
This text summarization model can be used with datasets in other languages,
right? Maybe I just need to replace the pre-trained GloVe embeddings.
Vp = tf.get_variable("Vp", shape=[128,1],
It's the number of neurons in the layer used to predict the local attention window position (https://arxiv.org/pdf/1508.04025.pdf, eqn. 9). If Wp transforms some vector of dimension d to 128, then Vp transforms the 128 dimensions to 1. It's a hyperparameter.
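A minimal numpy sketch of that predicted position from Luong et al.'s local attention (eqn. 9), p_t = S * sigmoid(Vp^T tanh(Wp h_t)). The decoder state size d and source length S are assumed values for illustration; only the 128 and the [128, 1] shape of Vp come from the discussion above:

```python
import numpy as np

# Illustrative sketch, not the repository's actual code.
rng = np.random.default_rng(0)
d = 256   # decoder hidden state size (assumed)
S = 31    # source sequence length (assumed)
Wp = rng.normal(size=(128, d))  # maps h_t from d dims down to 128
Vp = rng.normal(size=(128, 1))  # maps 128 dims down to 1 (the [128, 1] variable)
h_t = rng.normal(size=(d,))     # current decoder hidden state

z = np.tanh(Wp @ h_t)                            # shape (128,)
p_t = (S / (1.0 + np.exp(-(Vp.T @ z)))).item()   # sigmoid scaled by S
print(p_t)  # a scalar position strictly between 0 and S
```

The sigmoid keeps the predicted window center inside the source sentence, and the 128 is just the width of the intermediate tanh layer, which is why it is a tunable hyperparameter.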
Why use the value 128? Thanks for all the answers.
I don't remember; I probably chose 128 arbitrarily. Ideally, we are supposed to hyperparameter-tune it.
In your code there is no testing yet, but I see the test data already exists. How do I implement it? Is it the same as validation?
Can you help me with the steps I need to do?