Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dy2stat] Add word2vec as unittest #24944

Merged
merged 2 commits into from
Jun 9, 2020

Conversation

zhhsplendid
Copy link
Member

@zhhsplendid zhhsplendid commented Jun 5, 2020

PR types

Others

PR changes

Others

Describe

Add word2vec as ProgramTranslator unit test。
The word2vec model is referred from https://github.com/PaddlePaddle/models/blob/856c428fe0f1f451f8a22acee50c2e3af33c834e/dygraph/word2vec/word2vec.py#L238

@paddle-bot-old
Copy link

paddle-bot-old bot commented Jun 5, 2020

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot-old
Copy link

paddle-bot-old bot commented Jun 5, 2020

❌This PR is not created using PR's template. You can refer to this Demo.
Please use PR's template, it helps save our maintainers' time so that more developers get helped.

Copy link
Contributor

@Aurelius84 Aurelius84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the rest.

indices = indices[np.argsort(-flat[indices])]
for i in indices: # Remove the input words
print('for word %s, the similar word is %s' %
(query_token, str(id2word_dict[i])))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that get_similar_tokens is not used in the unittest.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copied code so I didn't care much about this. The get_similar_tokens is used in the original code but it now cannot run because of API compatibility issue. I will remove it in this unittest.



#for _, batch in zip(range(10), build_batch(dataset, 128, 3)):
# print(batch)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is at the original code. I commented it so that we don't print much info. I will remove line 208-209.


for epoch in range(epoch_num):

#random.shuffle(dataset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this commented code necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the commented code is in original code. I commented because the random.shuffle can cause the input to dygraph/static be different thus our check can fail.

@zhhsplendid zhhsplendid merged commit 97708ad into PaddlePaddle:develop Jun 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants