
Testing #17

Open
karimarwah opened this issue May 24, 2021 · 9 comments
@karimarwah

In your code there is no testing yet, but I see the testing data already exists. How do I implement it? Is it the same as validation?
Can you help me with the steps I need to follow?

@JRC1995
Owner

JRC1995 commented May 24, 2021

> is it the same as validation

Yes.
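A minimal sketch of what "same as validation" could mean in TF1-style code: reuse the validation loop body, but iterate over the test split. The names model.loss, model.inputs, model.targets, and test_batches below are hypothetical placeholders, not this repo's actual API.

# Hypothetical evaluation loop; identical to validation except for the data split.
def evaluate(sess, model, test_batches):
    losses = []
    for batch_x, batch_y in test_batches:
        loss_val = sess.run(model.loss,
                            feed_dict={model.inputs: batch_x,
                                       model.targets: batch_y})
        losses.append(float(loss_val))
    return sum(losses) / max(len(losses), 1)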

@karimarwah
Author

I'm trying to understand the code. In the summaries from the training you ran, why are the results only 3 words long, while the maximum summary length is 31?
I tried using longer data and still only 3 words were produced.
Are there any parameters that must be changed other than the maximum summary length?
Please answer, thanks in advance.

@JRC1995
Owner

JRC1995 commented Aug 18, 2021 via email

@karimarwah
Author

Umm, okay, I understand that problem now. Maybe the data I use is not very good, so the accuracy I get is not as good as the accuracy you got.
This text summarization model can be used with datasets in other languages, right? Maybe I just need to replace the pre-trained GloVe embeddings.
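For illustration, swapping the embeddings usually comes down to parsing the other language's GloVe-format text file (one word per line, followed by its vector) into the embedding matrix. This is a hedged sketch with made-up names; the file "cc.id.300.vec" and the vocab format are examples, not part of this repo.

import numpy as np

# Hypothetical loader: `vocab` maps word -> row index; words missing
# from the pre-trained file keep a small random initialization.
def load_embeddings(path, vocab, dim=300):
    matrix = np.random.uniform(-0.05, 0.05,
                               (len(vocab), dim)).astype(np.float32)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, values = parts[0], parts[1:]
            if word in vocab and len(values) == dim:
                matrix[vocab[word]] = np.asarray(values, dtype=np.float32)
    return matrix

# e.g. embeddings = load_embeddings("cc.id.300.vec", vocab, dim=300)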

@JRC1995
Owner

JRC1995 commented Aug 19, 2021 via email

@karimarwah
Author

Vp = tf.get_variable("Vp", shape=[128, 1],
                     dtype=tf.float32,
                     trainable=True,
                     initializer=tf.glorot_uniform_initializer())
Okay. I'm still curious about the number 128 in the code above. What is that size for?

@JRC1995
Owner

JRC1995 commented Sep 1, 2021

It's the number of neurons in the layer used to predict the local attention window position; see https://arxiv.org/pdf/1508.04025.pdf (eqn. 9). If Wp transforms some vector of dimension d to 128, then Vp transforms the 128-dimensional result to 1. It's a hyperparameter.
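A minimal sketch of that equation (p_t = S * sigmoid(Vp^T tanh(Wp h_t))), assuming a decoder state of size d; the value 300 below is a made-up example, and the scope and function names are illustrative, not the repo's.

import tensorflow as tf  # TF 1.x

d = 300       # assumed decoder state size, for illustration only
hidden = 128  # the hyperparameter in question

with tf.variable_scope("local_attention"):
    Wp = tf.get_variable("Wp", shape=[d, hidden], dtype=tf.float32,
                         initializer=tf.glorot_uniform_initializer())
    Vp = tf.get_variable("Vp", shape=[hidden, 1], dtype=tf.float32,
                         initializer=tf.glorot_uniform_initializer())

def predict_window_center(h_t, S):
    # h_t: [batch, d] decoder state; S: source sequence length.
    # eqn. 9: p_t = S * sigmoid(Vp^T tanh(Wp h_t)), so p_t lies in [0, S].
    return S * tf.sigmoid(tf.matmul(tf.tanh(tf.matmul(h_t, Wp)), Vp))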

@karimarwah
Author

Why use the value 128?
And maybe this is my last question: I don't quite understand the value 5 in gradient clipping yet. Can you explain a little?
This is the code:

capped_gvs = [(tf.clip_by_norm(grad, 5), var) for grad, var in gvs]

Thanks for all the answers.

@JRC1995
Owner

JRC1995 commented Sep 1, 2021

I don't remember; I probably chose 128 arbitrarily. Ideally, it should be tuned as a hyperparameter.
Same for the 5: I have seen 1 or 5 used as reasonable values for gradient clipping, and I just picked 5.
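For context, that clipping line plugs into the standard TF1 compute/apply-gradients pattern. The optimizer and learning rate below are assumed placeholders, and `loss` stands for whatever the graph defines.

import tensorflow as tf  # TF 1.x

optimizer = tf.train.AdamOptimizer(learning_rate=0.001)  # assumed settings
gvs = optimizer.compute_gradients(loss)  # `loss` defined elsewhere in the graph
# tf.clip_by_norm rescales any gradient whose L2 norm exceeds 5 so its
# norm becomes exactly 5; gradients already below the threshold pass
# through unchanged. This bounds the update size and guards against
# exploding gradients.
capped_gvs = [(tf.clip_by_norm(grad, 5), var)
              for grad, var in gvs if grad is not None]
train_op = optimizer.apply_gradients(capped_gvs)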
