Hierarchical Embeddings for Hypernymy Detection and Directionality
Requirements:
- spaCy (version 2.0.11), for parsing
- a plain-text corpus, e.g. a Wikipedia dump
Create the feature files:
python create_features.py -input corpus-file.txt -output output-file-name -pos pos_tag
where pos_tag is either NN (for the noun features) or VB (for the verb features).
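As an illustration of this preprocessing step, the sketch below parses a corpus with spaCy and collects lemmatized context words for every noun or verb. The function name, window size, and exact feature definition are assumptions for illustration, not the actual logic of create_features.py.

    import spacy

    # Illustrative sketch only; the real extraction is in create_features.py.
    nlp = spacy.load("en_core_web_sm")

    def extract_features(corpus_path, pos_tag, window=5):
        """Collect lemmatized context words for each token whose coarse
        POS matches pos_tag ('NN' -> NOUN, 'VB' -> VERB)."""
        target_pos = {"NN": "NOUN", "VB": "VERB"}[pos_tag]
        features = {}
        with open(corpus_path, encoding="utf-8") as f:
            for line in f:
                doc = nlp(line.strip())
                for tok in doc:
                    if tok.pos_ == target_pos:
                        lo, hi = max(0, tok.i - window), tok.i + window + 1
                        ctx = [t.lemma_ for t in doc[lo:hi]
                               if t.i != tok.i and t.is_alpha]
                        features.setdefault(tok.lemma_, []).extend(ctx)
        return features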
See config.cfg to set the arguments for the model, then train the embeddings:
java -jar HyperVec.jar config.cfg vector-size window-size
For example, to train embeddings with 100 dimensions and a window size of 5:
java -jar HyperVec.jar config.cfg 100 5
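For orientation, a config.cfg might contain entries like the hypothetical sketch below. All key names here are made up for illustration; the authoritative arguments are those documented in the config.cfg shipped with the repository.

    # Hypothetical example; consult the repository's config.cfg for the real keys.
    corpus_file = corpus-file.txt
    noun_features = output-file-name.nn
    verb_features = output-file-name.vb
    output_vectors = hypervec.txt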
The embeddings used in our paper can be downloaded with the script get-pretrainedHyperVecEmbeddings/download_embeddings.sh. Note that the script downloads 9 files and concatenates them into a single file (hypervec.txt.gz). The format is the default word2vec text format: the first line contains header information (vocabulary size and vector dimensionality), and every following line contains a word followed by its whitespace-separated vector.
Information about the embeddings: created from the ENCOW14A corpus (14.5 billion tokens), 100 dimensions, symmetric window of 5, 15 negative samples, learning rate 0.025, threshold set to 0.05. The resulting vocabulary contains about 2.7 million words.
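Because the file is plain word2vec text, it can be loaded without special tooling. The loader below is a minimal sketch following the format described above (the function name is ours):

    import gzip
    import numpy as np

    def load_embeddings(path):
        """Read word2vec text format: a "vocab_size dim" header line,
        then one word plus a whitespace-separated vector per line."""
        vectors = {}
        with gzip.open(path, "rt", encoding="utf-8") as f:
            vocab_size, dim = map(int, f.readline().split())
            for line in f:
                parts = line.rstrip().split(" ")
                vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
        return vectors

    embeddings = load_embeddings("hypervec.txt.gz")

Alternatively, gensim's KeyedVectors.load_word2vec_format("hypervec.txt.gz", binary=False) reads the same format directly.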
To reproduce our experiments from Table 3, use the code in datasets_classification/, assuming your vector file is located in the same folder and is named hypervec.txt.gz.
java -jar eval-dir.jar hypervec.txt.gz
(Evaluate directionality on BLESS.txt using hyperscore)
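The authoritative scoring is inside eval-dir.jar; purely as an illustration, one plausible hyperscore in the spirit of the model (HyperVec is trained so that hypernyms obtain larger vector norms than their hyponyms) is sketched below.

    import numpy as np

    def hyperscore(hypo_vec, hyper_vec):
        """Illustrative score (an assumption, not the jar's exact measure):
        cosine similarity scaled by the hypernym/hyponym norm ratio, so the
        score is larger in the hyponym -> hypernym direction."""
        cos = np.dot(hypo_vec, hyper_vec) / (
            np.linalg.norm(hypo_vec) * np.linalg.norm(hyper_vec))
        return cos * np.linalg.norm(hyper_vec) / np.linalg.norm(hypo_vec)

    # For a pair (x, y), predict "y is the hypernym" when
    # hyperscore(x, y) > hyperscore(y, x).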
java -jar eval-bless.jar hypervec.txt.gz 2 1000
(Evaluate classification on BIBLESS.txt and AWBLESS.txt, using 2% of the training data and 1000 random iterations)
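The exact protocol is implemented in eval-bless.jar; as a rough, assumed reading of "2% of the training data and 1000 random iterations", the sketch below repeatedly samples 2% of the labeled pairs to tune a score threshold, classifies the rest, and averages accuracy over the iterations.

    import random

    def evaluate(pairs, scores, iterations=1000, train_frac=0.02):
        """Hypothetical protocol sketch. pairs: list of (pair_id, label) with
        label 1 = hypernymy, 0 = other; scores: pair_id -> hyperscore."""
        accuracies = []
        for _ in range(iterations):
            shuffled = random.sample(pairs, len(pairs))
            k = max(1, int(train_frac * len(shuffled)))
            train, test = shuffled[:k], shuffled[k:]
            # Tune the threshold as the midpoint between the class means.
            pos = [scores[p] for p, y in train if y == 1] or [0.0]
            neg = [scores[p] for p, y in train if y == 0] or [0.0]
            thr = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
            correct = sum((scores[p] > thr) == (y == 1) for p, y in test)
            accuracies.append(correct / len(test))
        return sum(accuracies) / len(accuracies)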
If you use the code or the created feature norms, please cite our paper (BibTeX). The paper is available here: PDF; the poster from EMNLP is available here: Poster.