
Models for Natural Language Inference (NLI)

We reproduce several classical models from the literature for Natural Language Inference and report their performance on the Stanford Natural Language Inference (SNLI) dataset.

Models

Environment

  • TensorFlow 1.3 or higher
  • Python 3.5
  • NumPy
  • scikit-learn

Data preparation

nliutils.py provides the helpers for data preparation (a usage sketch follows this list).

  • build_vocab(): Build the vocabulary from the training data.
  • load_vocab(): Load the vocabulary from file.
  • convert_data(): Convert NLI data from JSON format to the following TXT format: gold_label ||| sentence1 ||| sentence2.
  • process_file(): Prepare data for the model: convert words into indexes according to the vocabulary, pad sentences to a fixed length, create the corresponding mask arrays, and load the classification labels into a 1-D array.
  • batch_iter(): Generate mini-batches of data.
  • convert_embeddings(): Convert embeddings from TXT (one word embedding per line) to an easy-to-use format in Python: a 2-D NumPy array for the embeddings and a dictionary for the vocabulary.
  • Pre-trained word embeddings: You can download pre-trained word embeddings from GloVe and use convert_embeddings() to produce the format the code needs (see the second sketch below).
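
A minimal end-to-end sketch of how these helpers fit together is below. The exact signatures in nliutils.py are not documented here, so the argument names, return values, and file names are assumptions for illustration.

```python
# Hedged sketch of the preparation pipeline; signatures, return values,
# and file names are assumptions, not the documented nliutils.py API.
import nliutils

# JSON -> 'gold_label ||| sentence1 ||| sentence2' lines, e.g.
# entailment ||| A man is eating. ||| A person is consuming food.
nliutils.convert_data('snli_1.0_train.jsonl', 'snli_train.txt')

# Build the vocabulary from the converted training data, then reload it.
nliutils.build_vocab('snli_train.txt', 'vocab.txt')
vocab = nliutils.load_vocab('vocab.txt')

# Word indexes, fixed-length padding, masks, and a 1-D label array.
s1, s2, mask1, mask2, labels = nliutils.process_file(
    'snli_train.txt', vocab, max_len=100)

# Iterate over mini-batches during training.
for batch in nliutils.batch_iter(list(zip(s1, s2, mask1, mask2, labels)),
                                 batch_size=4):
    pass  # feed the batch to the model here
```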

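For the embedding conversion, the sketch below shows the kind of transformation convert_embeddings() performs, based only on the description above; the function body and file name are illustrative, not the actual implementation.

```python
import numpy as np

# Illustrative re-implementation of the conversion described above:
# one word embedding per line of GloVe-style text is turned into a
# 2-D NumPy array plus a word -> row-index dictionary.
def convert_glove_txt(txt_path):
    vocab, vectors = {}, []
    with open(txt_path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            vocab[parts[0]] = len(vectors)  # word -> row index
            vectors.append([float(x) for x in parts[1:]])
    return np.array(vectors, dtype=np.float32), vocab

embeddings, vocab = convert_glove_txt('glove.840B.300d.txt')
```
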
Hyper-parameters

  • decompose:

Train model: python3 decompose/train.py --embeddings ../../res/embeddings/glove.840B.300d.we --train_em 0 -op adagrad -lr 0.05 --require_improvement 50000000 --vocab ../cdata/snli/vocab.txt -ep 300 --normalize 1 -l2 0.0 -bs 4 --report 16000 --save_per_batch 16000 -cl 100

Test model: python3 decompose/test.py -m modelfile -d testdata

Results

Model | Acc reported in paper | Our Acc
----- | --------------------- | -------
decompose | 86.3% | 86.28%