Finding the synset of a word #1
Comments
Hello,

You are absolutely right: in WordNet a word can belong to several senses, since the meaning of a word depends on the sense (context) it appears in. To answer your question: in this method, all senses, and therefore all synsets, that the word is found in are taken into account when calculating the final embeddings, so you cannot pinpoint any single synset, as all of them are used. When building the adjacency matrix, which holds the relation weights between words, the chosen words are iterated over, and all the synset relations each word appears in (the previously mentioned senses) are accumulated to obtain the total relation weight between word pairs. Let me know if you still have questions regarding this!
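To make the accumulation concrete, here is a minimal sketch using NLTK's WordNet interface. The tiny vocabulary, the choice of hypernym/hyponym relations, and the unit weights are illustrative assumptions, not the repository's actual code:

```python
import numpy as np
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

words = ["dog", "wolf", "cat"]          # illustrative vocabulary
index = {w: i for i, w in enumerate(words)}
adj = np.zeros((len(words), len(words)))

for w in words:
    for synset in wn.synsets(w):        # every sense `w` participates in
        # the synset itself plus some directly related synsets
        related = [synset] + synset.hypernyms() + synset.hyponyms()
        for rel in related:
            for lemma in rel.lemma_names():
                if lemma in index and lemma != w:
                    # weights accumulate across all senses of `w`
                    adj[index[w], index[lemma]] += 1.0
```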
Thanks for the feedback. So in the final list of calculated embeddings, each word has only one embedding. Does that mean that all relations in all synsets the word is a part of affect that embedding? I'm curious because the meanings of different synsets differ from each other, even for the same word, and that cannot be considered a relation. Please correct me if I'm wrong; I would like to clarify this. Thanks!
Hi,

Sorry for the delay in responding. You are right in assuming that all relations in all synsets the word is a part of affect the embedding, but that does not invalidate them as relations. If you look at other embedding methods that work on text corpora, the contexts in which a word appears change its embedding accordingly, and that's a good thing: it captures the semantic meaning of the word across different contexts. That's precisely what happens here, and why using WordNet might be even better in this case, since it's a semantic lexicon built with linguistic precision; when you take relations into account, you are connecting words across different senses. This is important for the next stages of the algorithm, where you perform a walk across the whole graph. After the walk you no longer have the original adjacency matrix; you have a matrix resulting from a walk over the graph (an infinite one, due to the equation described in the article), that is, a matrix of weights derived from the paths that can be taken from any word to every other word. The important concept here is: the more paths connect two words, and the shorter those paths are, the more semantically related the words are. Does that make sense?
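For intuition, an "infinite" walk can be written in closed form as a Neumann series: the sum over all path lengths, `sum_{k>=1} alpha^k * A^k`, equals `(I - alpha*A)^{-1} - I` whenever `alpha` is below the reciprocal of `A`'s spectral radius. Whether the article's equation is exactly this Katz-style form is my assumption; the sketch below only illustrates the idea:

```python
import numpy as np

def infinite_walk(adj, alpha=0.3):
    """Closed form of sum_{k>=1} alpha^k * adj^k: every path of every
    length between two nodes contributes, with longer paths damped by
    alpha. Converges only when alpha < 1 / spectral_radius(adj)."""
    eye = np.eye(adj.shape[0])
    return np.linalg.inv(eye - alpha * adj) - eye

adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
walked = infinite_walk(adj)
# walked[0, 2] > 0 even though adj[0, 2] == 0: the two-step path
# 0 -> 1 -> 2 now contributes weight.
```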
Could you help me load the pre-trained model (wn2vec.txt) in Python? I would like to use it like word2vec.
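If wn2vec.txt follows the standard word2vec text format, gensim should load it directly; whether the file actually uses that format is an assumption, so check its first line if loading fails:

```python
from gensim.models import KeyedVectors

# Assumes wn2vec.txt is in the plain-text word2vec format
# (a "vocab_size dim" header line, then one word + vector per line).
vectors = KeyedVectors.load_word2vec_format("wn2vec.txt", binary=False)

print(vectors["dog"])               # the embedding vector for "dog"
print(vectors.most_similar("dog"))  # nearest neighbours by cosine similarity
```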
This is actually an interesting point. Currently I am looking into how to deal with word-sense disambiguation. So a word like 'bank', which can have multiple meanings, is considered by this method as just one word with one embedding, determined by all relations of the corresponding synsets? If true, I think we are introducing unnecessary bias: in a sentence like "the man is walking along the river bank" we only want to consider the embedding of bank as the raised land bordering the river, and not include the financial-institution sense as well.
Hi, I have a question. Why did you choose to use the inverse matrix, PMI, L2 normalization, and PCA? Was that based on previous experiments, or on a mathematical definition?
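For readers who want to ground the terms in this question, here is a generic count-based pipeline (positive PMI weighting, L2 row normalisation, PCA via SVD). This is the textbook construction from count-based embedding work, offered only as a guess at what the steps refer to, not the repository's actual implementation:

```python
import numpy as np

def ppmi(counts, eps=1e-12):
    """Positive pointwise mutual information of a co-occurrence matrix."""
    total = counts.sum()
    p_xy = counts / total
    p_x = counts.sum(axis=1, keepdims=True) / total
    p_y = counts.sum(axis=0, keepdims=True) / total
    pmi = np.log((p_xy + eps) / (p_x * p_y + eps))
    return np.maximum(pmi, 0.0)

def embed(counts, dim=2):
    m = ppmi(counts)
    m /= np.linalg.norm(m, axis=1, keepdims=True) + 1e-12  # L2-normalise rows
    m -= m.mean(axis=0)                                    # centre columns for PCA
    u, s, _ = np.linalg.svd(m, full_matrices=False)        # PCA via SVD
    return u[:, :dim] * s[:dim]                            # low-dimensional embeddings
```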
Hi,
Thanks for sharing this work!
I have a question about the 60k words chosen from WordNet. I see that in WordNet, one word can belong to several synsets. For example, dog (noun) has synsets annotated as dog.n.01, dog.n.02, and so on. With regard to this, is there a way to find exactly which synset of a word is chosen to calculate the final embeddings?
Thanks in advance!
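For context on the multiple senses mentioned above, they can be listed directly with NLTK (assuming the WordNet corpus is downloaded):

```python
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

# Every sense of "dog" registered in WordNet, with its gloss:
for synset in wn.synsets("dog"):
    print(synset.name(), "-", synset.definition())
# each printed line is one sense, e.g. dog.n.01 followed by its definition
```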