This repository contains the Torch implementation of our ECIR 2017 work.
Download the user profile attribute dataset from here.
Download the GloVe word vectors trained on a large Twitter corpus.
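The GloVe vectors ship as plain-text files (one word per line followed by its space-separated values). The helper below is a minimal sketch, not part of this repository, showing how they could be read into a word-to-tensor table; the file name in the usage comment is an assumption:

```lua
-- Hypothetical helper (not in this repo): read GloVe vectors from a
-- plain-text file into a table mapping word -> torch.Tensor.
require 'torch'

local function load_glove(path, dim)
  local vectors = {}
  for line in io.lines(path) do
    -- split the line on whitespace: first token is the word,
    -- the remaining `dim` tokens are the vector components
    local parts = {}
    for token in line:gmatch('%S+') do table.insert(parts, token) end
    local vec = torch.Tensor(dim)
    for i = 1, dim do vec[i] = tonumber(parts[i + 1]) end
    vectors[parts[1]] = vec
  end
  return vectors
end

-- e.g., the 200-dimensional Twitter vectors (assumed file name):
-- local glove = load_glove('data/glove.twitter.27B.200d.txt', 200)
```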
To train our model, run:
th main.lua
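Assuming the options listed below are exposed as standard Torch `CmdLine` flags (single-dash style; this is our assumption, not verified against main.lua), a run on the spouse data might look like:

th main.lua -data_dir data/spouse/ -glove_dir data/ -wdim 200 -max_epochs 25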
Dependencies:

- Torch
- xlua
- tds
- optim
- nnx
- cutorch
- cunn
- cunnx
All packages except Torch can be installed using:
luarocks install <package-name>
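If you prefer, the seven luarocks packages can be installed in one shot:

for pkg in xlua tds optim nnx cutorch cunn cunnx; do luarocks install $pkg; done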
Options:

- `data_dir`: directory containing the user profile prediction data for an attribute (spouse, education, or job) [data/spouse/]
- `glove_dir`: directory containing the pre-trained GloVe word embeddings [data/]
- `pred_dir`: directory for storing the output (i.e., word, tweet, and user embeddings) [predictions/]
- `to_lower`: should we lowercase the words? [1=yes (default), 0=no]
- `wdim`: dimensionality of the word embeddings [200]
- `wwin`: size of the context window for the word context model; add 1 for the target word [21]
- `twin`: size of the context window for the tweet context model; add 1 for the target tweet [21]
- `min_freq`: words occurring fewer than this many times are excluded from training [5]
- `pad_tweet`: should we pad the tweet? [1=yes (default), 0=no]
- `is_word_center_target`: should the center word be the target? If 0, the last word is the target [0]
- `is_tweet_center_target`: should the center tweet be the target? If 0, the last tweet is the target [1]
- `pre_train`: should we initialize the word embeddings with pre-trained vectors? [1=yes (default), 0=no]
- `wc_mode`: how to compute the hidden representation in the word context model (see the sketch after this list) [1=concatenation, 2=sum (default), 3=average, 4=attention-based average of the context embeddings]
- `tc_mode`: how to compute the hidden representation in the tweet context model [1=concatenation, 2=sum, 3=average, 4=attention-based average (default) of the context embeddings]
- `tweet`: should we also use the tweet-based model? [1=yes (default), 0=no]
- `user`: should we also use the user-based model? [1=yes, 0=no (default)]
- `wpred`: which softmax to use for the final prediction in the word context model [1=normal (time-consuming for large datasets), 2=hierarchical (default), 3=Brown softmax]
- `tpred`: which softmax to use for the final prediction in the tweet context model [1=normal (time-consuming for large datasets), 2=hierarchical (default), 3=Brown softmax]
- `learning_rate`: learning rate for the gradient descent algorithm [0.001]
- `batch_size`: number of sequences to train on in parallel [128]
- `max_epochs`: number of full passes through the training data [25]
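To make the `wc_mode`/`tc_mode` choices concrete, here is a minimal Lua/Torch sketch (our illustration, not the code in this repository) of how the first three aggregation modes map onto standard `nn` modules. Mode 4 (attention) additionally learns input-dependent weights over the context embeddings and is omitted for brevity:

```lua
-- Sketch: combine a table of n context embeddings (each a 1-D tensor of
-- size dim) into a single hidden representation, per the mode flag.
require 'nn'

local function context_combiner(mode, n)
  if mode == 1 then                       -- concatenation: (n * dim) vector
    return nn.JoinTable(1)
  elseif mode == 2 then                   -- element-wise sum
    return nn.CAddTable()
  elseif mode == 3 then                   -- element-wise average
    return nn.Sequential()
      :add(nn.CAddTable())
      :add(nn.MulConstant(1 / n))
  end
end

-- Usage: combine 20 context word embeddings by summation (wc_mode = 2)
-- local combiner = context_combiner(2, 20)
-- local hidden = combiner:forward(context_table)
```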
License: MIT