
Finding the synset of a word #1

Open
miranthajayatilake opened this issue Jun 26, 2019 · 6 comments
Labels
question Further information is requested

Comments

@miranthajayatilake

Hi,

Thanks for sharing this work!

I have a question about the 60k words chosen from WordNet. I see that in WordNet one word can belong to several synsets. For example, dog (noun) has synsets annotated as dog.n.01, dog.n.02, and so on. With regard to this, is there a way to find exactly which synset of a word is chosen to calculate the final embeddings?

Thanks in advance!

@RubenBranco RubenBranco added the question Further information is requested label Jun 27, 2019
@RubenBranco
Collaborator

Hello,

You are absolutely right: in WordNet a word can belong to several senses, since the meaning of a word depends on the sense (context) it appears in.

To answer your question: in this method, all senses (and therefore all synsets) a word is found in are taken into account to calculate the final embeddings, so you cannot pinpoint any single synset, because all of them are chosen!

When building the adjacency matrix, which holds the relation weights between words, the chosen words are iterated over, and all the synset relations each word takes part in (the previously mentioned senses) are accumulated to compute the total relation weight between words.
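A minimal sketch of that accumulation, using a small hand-made sense inventory in place of the real WordNet data (the synset names and weighting scheme here are purely illustrative, not the authors' exact code):

```python
from itertools import combinations

# Hypothetical toy sense inventory: each synset lists its member words.
# In the real method this information comes from WordNet.
synsets = {
    "dog.n.01": ["dog", "domestic_dog"],
    "dog.n.03": ["dog", "cad", "bounder"],
    "frank.n.02": ["dog", "frank", "hotdog"],
}

# Accumulate relation weights over ALL synsets a word appears in:
# every co-membership in a synset adds to the pair's total weight,
# so "dog" contributes edges from each of its senses.
adjacency = {}
for members in synsets.values():
    for w1, w2 in combinations(sorted(set(members)), 2):
        adjacency[(w1, w2)] = adjacency.get((w1, w2), 0) + 1
```

Because every sense of "dog" contributes, the final matrix links "dog" to both "cad" (from dog.n.03) and "frank" (from frank.n.02), which is why no single synset can be singled out afterwards.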

Let me know if you still have questions regarding this!

@miranthajayatilake
Author

Thanks for the feedback.

So in the final list of calculated embeddings, each word has only one embedding. Does that mean that all relations, in all synsets that word is a part of, affect that embedding?

I'm curious because the meanings of different synsets differ from each other, even for the same word, and that cannot be considered a relation. Please correct me if I'm wrong; I would like to clarify this.

Thanks!

@RubenBranco
Collaborator

Hi,

Sorry for the delay in the response.

You are right in assuming that all relations, in all synsets the word is a part of, affect the embedding, but that does not invalidate them as relations. If you look at other embedding methods that work on text corpora, the contexts in which a word appears change its embedding accordingly, and that's a good thing: it captures the semantic meaning of the word across different contexts.

That's precisely what happens here, and why using WordNet might be even better in this case: since it's a semantic lexicon built with linguistic precision, when you take relations into account you are connecting words across different senses. This is important for the next stages of the algorithm, where you perform a walk across the whole graph.

After the walk you no longer have the original adjacency matrix; you have a matrix resulting from a walk over the graph (an infinite walk, due to the equation described in the article), i.e. a matrix of weights derived from the paths you can take from any word to every other word.

The important concept here is: the more paths connecting two words, and the shorter those paths are, the more semantically affine the words are.
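To make that concrete, here is a sketch of a walk over a tiny 3-node graph, assuming the walk matrix is the geometric series I + aA + (aA)^2 + ... (the article's exact formulation may differ; a sufficiently small decay a guarantees convergence). Two words with no direct edge still accumulate weight through intermediate paths:

```python
# All names here are illustrative; pure-Python matrices to stay dependency-free.

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def walk_matrix(A, alpha, steps=50):
    """Approximate the infinite walk sum_k (alpha*A)^k by truncation."""
    n = len(A)
    aA = [[alpha * v for v in row] for row in A]
    total = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    power = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(steps):
        power = mat_mul(power, aA)   # (alpha*A)^k
        total = mat_add(total, power)
    return total

# Toy adjacency: 0-1 and 1-2 are linked; 0 and 2 share no direct edge.
A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
W = walk_matrix(A, alpha=0.3)
```

In the result, W[0][2] is positive even though A[0][2] is zero (weight flows through node 1), and W[0][1] stays larger than W[0][2], matching the idea that shorter paths mean stronger affinity.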

Does that make sense?

@fatmalearningphd

Could you help me load the pre-trained model (wn2vec.txt) with Python? I would like to use it like word2vec.
Thanks
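(For reference: assuming wn2vec.txt follows the standard word2vec text format, i.e. a header line with vocabulary size and dimension followed by one word and its vector per line, a minimal dependency-free loader would look like this. The demo file written below is purely illustrative.)

```python
def load_word2vec_text(path):
    """Load embeddings from a word2vec-style text file: the first line
    holds 'vocab_size dim'; each following line is 'word v1 v2 ... vdim'."""
    vectors = {}
    with open(path, encoding="utf-8") as fh:
        vocab_size, dim = map(int, fh.readline().split())
        for line in fh:
            parts = line.rstrip().split(" ")
            word, values = parts[0], [float(v) for v in parts[1:]]
            assert len(values) == dim, f"bad vector length for {word!r}"
            vectors[word] = values
    assert len(vectors) == vocab_size
    return vectors

# Demo with a tiny hypothetical file (two 3-dimensional vectors).
with open("demo_vecs.txt", "w", encoding="utf-8") as fh:
    fh.write("2 3\n")
    fh.write("dog 0.1 0.2 0.3\n")
    fh.write("bank 0.4 0.5 0.6\n")

vecs = load_word2vec_text("demo_vecs.txt")
```

If the file is indeed in that format, gensim's `KeyedVectors.load_word2vec_format("wn2vec.txt", binary=False)` gives the same result plus the familiar word2vec similarity queries.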

@mbarbouch

mbarbouch commented Nov 2, 2020

This is actually an interesting point. Currently I am looking at how to deal with word-sense disambiguation. So for a word like 'bank', which can have multiple meanings: is it treated by this method as just one word, with one embedding determined by all relations of the corresponding synsets?

If true, I think we are introducing unnecessary bias: in a sentence like "the man is walking along the river bank" we only want to consider the embedding of 'bank' as the raised land along the river, not one that also includes the financial-institution sense.

@RubenBranco, @miranthajayatilake

@Alezas

Alezas commented Jun 26, 2022

Hi, I have a question. Why did you choose to use the inverse matrix, PMI, L2 normalization, and PCA? Were there previous experiments, or a mathematical justification?
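(For context on the PMI part of that question: PMI re-weights raw co-occurrence counts so that globally frequent words do not dominate, via PMI(x, y) = log(p(x, y) / (p(x) p(y))). A hand-rolled sketch on a toy count matrix, purely to illustrate the definition rather than the authors' exact pipeline:)

```python
from math import log

# Hypothetical symmetric co-occurrence counts between three words.
words = ["dog", "cat", "bank"]
counts = [[0, 8, 1],
          [8, 0, 1],
          [1, 1, 0]]

total = sum(sum(row) for row in counts)
row_sums = [sum(row) for row in counts]

def pmi(i, j):
    """PMI(i, j) = log( p(i, j) / (p(i) * p(j)) )."""
    p_ij = counts[i][j] / total
    p_i = row_sums[i] / total
    p_j = row_sums[j] / total
    return log(p_ij / (p_i * p_j))
```

Here "dog"/"cat" co-occur far more often than chance predicts, so their PMI is positive and larger than the "dog"/"bank" score; applying this to every cell turns a count matrix into an association-strength matrix.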
