Deep sentence embedding using Sequence to Sequence learning

Installing

Install Torch.
Install the following additional Lua libs:
```
luarocks install nn
luarocks install rnn
luarocks install penlight
```
To train with CUDA, install the latest CUDA drivers and toolkit, then run:
```
luarocks install cutorch
luarocks install cunn
```
To train with OpenCL, install the latest OpenCL Torch libraries:
```
luarocks install cltorch
luarocks install clnn
```
Download the Cornell Movie-Dialogs Corpus and extract all the files into data/cornell_movie_dialogs.
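
Before training, it can help to confirm that the corpus landed in the right place. The following is a minimal sketch, not part of this repository: check_corpus is a hypothetical helper, and the two file names are an assumption based on how the Cornell Movie-Dialogs Corpus archive is usually laid out.

```shell
# Hypothetical helper (not part of this repo): verify the corpus files are in place.
# Assumption: the archive contains movie_lines.txt and movie_conversations.txt.
check_corpus() {
    dir="${1:-data/cornell_movie_dialogs}"
    missing=0
    for f in movie_lines.txt movie_conversations.txt; do
        if [ ! -f "$dir/$f" ]; then
            # Report every missing file instead of stopping at the first one.
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    # Exit status 0 when all files were found, 1 otherwise.
    return "$missing"
}
```

For example, `check_corpus && echo "corpus ready"` prints the confirmation only when both files are found in the default directory.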

Training

th train.lua [-h / options]

Use the --dataset NUMBER option to control the size of the dataset. Training on the full dataset takes about 5 hours per epoch.

The model is saved to data/model.t7 after each epoch in which the error decreases.

Getting a pretrained model

Download:

The pretrained model: model.t7
The vocabulary: vocab.t7

Put them into the data directory.

Extracting embeddings from sentences

Run the following command:

th -i extract_embeddings.lua --model_file data/model.t7 --input_file data/test_sentences.txt --output_file data/embeddings.t7 --cuda
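
Since the command depends on several files, a quick check beforehand can save a failed run. This is a sketch under stated assumptions: preflight is a hypothetical helper, the default paths mirror the command above, and the assumption that the script also needs data/vocab.t7 comes only from the download instructions, not from the script itself.

```shell
# Hypothetical pre-flight check (not part of this repo).
# Assumption: the extraction needs the model, the vocabulary, and the input file.
preflight() {
    model="${1:-data/model.t7}"
    vocab="${2:-data/vocab.t7}"
    input="${3:-data/test_sentences.txt}"
    for f in "$model" "$vocab" "$input"; do
        # -s is true only when the file exists and is not empty.
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    echo "ok"
}
```

It could be chained in front of the extraction command, for example `preflight && th -i extract_embeddings.lua ...` with the same options as above.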

To visualize 2D projections of the embeddings, see example.ipynb.

Acknowledgments

This implementation uses code from Marc-André Cournoyer's repository.

License

MIT License
