
Testing #17

Open
karimarwah opened this issue May 24, 2021 · 9 comments
@karimarwah

In your code there is no testing yet, but I see the testing data already exists. How do I implement it? Is it the same as validation?
Can you help me with the steps I need to follow?

@JRC1995
Owner

JRC1995 commented May 24, 2021

> is it the same as validation

Yes.
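A minimal sketch of what "same as validation" could mean in TF1-style code: reuse the validation loop body, but iterate over the test split. The names model.loss, model.inputs, model.targets, and test_batches below are hypothetical placeholders, not this repo's actual API.

# Hypothetical evaluation loop; identical to validation except for the data split.
def evaluate(sess, model, test_batches):
    losses = []
    for batch_x, batch_y in test_batches:
        loss_val = sess.run(model.loss,
                            feed_dict={model.inputs: batch_x,
                                       model.targets: batch_y})
        losses.append(float(loss_val))
    return sum(losses) / max(len(losses), 1)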

@karimarwah
Author

I'm trying to understand the code. In the summaries from the training you ran, why are the results only 3 words long, while the maximum summary length is 31?
I tried using longer data and still only 3 words were produced.
Are there any parameters that must be changed other than the maximum summary length?
Please answer, thanks in advance.

@JRC1995
Owner

JRC1995 commented Aug 18, 2021 via email

@karimarwah
Author

Umm, okay, I understand that problem now. Maybe the data I use is not very good, so the accuracy I get is not as good as the accuracy you got.
This text summarization model can be used with datasets in other languages, right? Maybe I just need to replace the pre-trained GloVe embeddings.
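For illustration, swapping the embeddings usually comes down to parsing the other language's GloVe-format text file (one word per line, followed by its vector) into the embedding matrix. This is a hedged sketch with made-up names; the file "cc.id.300.vec" and the vocab format are examples, not part of this repo.

import numpy as np

# Hypothetical loader: `vocab` maps word -> row index; words missing
# from the pre-trained file keep a small random initialization.
def load_embeddings(path, vocab, dim=300):
    matrix = np.random.uniform(-0.05, 0.05,
                               (len(vocab), dim)).astype(np.float32)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in vocab and len(values) == dim:
                matrix[vocab[word]] = np.asarray(values, dtype=np.float32)
    return matrix

# e.g. embeddings = load_embeddings("cc.id.300.vec", vocab, dim=300)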

@JRC1995
Owner

JRC1995 commented Aug 19, 2021 via email

@karimarwah
Author

Vp = tf.get_variable("Vp", shape=[128, 1],
                     dtype=tf.float32,
                     trainable=True,
                     initializer=tf.glorot_uniform_initializer())
Okay. I'm still curious about the number 128 in the code above. What is that size for?

@JRC1995
Owner

JRC1995 commented Sep 1, 2021

It's the number of neurons in the layer used to predict the local attention window position; see https://arxiv.org/pdf/1508.04025.pdf (eqn. 9). If Wp transforms some vector of dimension d to 128, then Vp transforms the 128-dimensional result to 1. It's a hyperparameter.
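A minimal sketch of that equation (p_t = S * sigmoid(Vp^T tanh(Wp h_t))), assuming a decoder state of size d; the value 300 below is a made-up example, and the scope and function names are illustrative, not the repo's.

import tensorflow as tf  # TF 1.x

d = 300       # assumed decoder state size, for illustration only
hidden = 128  # the hyperparameter in question

with tf.variable_scope("local_attention"):
    Wp = tf.get_variable("Wp", shape=[d, hidden], dtype=tf.float32,
                         initializer=tf.glorot_uniform_initializer())
    Vp = tf.get_variable("Vp", shape=[hidden, 1], dtype=tf.float32,
                         initializer=tf.glorot_uniform_initializer())

def predict_window_center(h_t, S):
    # h_t: [batch, d] decoder state; S: source sequence length.
    # eqn. 9: p_t = S * sigmoid(Vp^T tanh(Wp h_t)), so p_t lies in [0, S].
    return S * tf.sigmoid(tf.matmul(tf.tanh(tf.matmul(h_t, Wp)), Vp))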

@karimarwah
Author

Why use the value 128?
And maybe this is my last question: I don't quite understand the value 5 in gradient clipping yet. Can you explain a little?
This is the code:

capped_gvs = [(tf.clip_by_norm(grad, 5), var) for grad, var in gvs]

Thanks for all the answers.

@JRC1995
Owner

JRC1995 commented Sep 1, 2021

I don't remember; I probably chose 128 arbitrarily. Ideally, it should be tuned as a hyperparameter.
Same for the 5: I have seen 1 or 5 used as reasonable values for gradient clipping, and I just picked 5.
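For context, that clipping line plugs into the standard TF1 compute/apply-gradients pattern. The optimizer and learning rate below are assumed placeholders, and `loss` stands for whatever the graph defines.

import tensorflow as tf  # TF 1.x

optimizer = tf.train.AdamOptimizer(learning_rate=0.001)  # assumed settings
gvs = optimizer.compute_gradients(loss)  # `loss` defined elsewhere in the graph
# tf.clip_by_norm rescales any gradient whose L2 norm exceeds 5 so its
# norm becomes exactly 5; gradients already below the threshold pass
# through unchanged. This bounds the update size and guards against
# exploding gradients.
capped_gvs = [(tf.clip_by_norm(grad, 5), var)
              for grad, var in gvs if grad is not None]
train_op = optimizer.apply_gradients(capped_gvs)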
