-
Notifications
You must be signed in to change notification settings - Fork 23
/
README
30 lines (16 loc) · 1.56 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
TA-Seq2Seq
This project is built on Theano 0.9, python 2.7 and Blocks(https://github.com/mila-udem/blocks). Please make sure they are installed before running this project.
Step 1: preparing the data
This project requires 3 vocabularies, the query vocabulary, response vocabulary and topic vocabulary. You should build every vocabulary as a dictionary like {'I': 0, 'UNK': 1, 'a': 2, 'student': 3, '</s>': 4} and save it as an pkl file.
This project also requires a query file, a response file and a topic word file, in which the query, response and topic word list attached of a case are saved separately in the same line of the three files.
Step 2: checking the configurations
Please refer to the function topicAawareJPData() in configurations_base.py as an example of how to write configuration of your experiment.
Let me explain some important features:
The 'topic_vocab_output' and 'topic_vocab_output' are set as the same topic vocabulary built beforehand.
'topic_embeddings' is the embedding matrix of all topic words in which the i-th row is the embedding of the i-th word in the topic word vocabulary.
'topical_word_num' is the number of topic words attached for every query (number of words in every line of topic word file).
'tw_vocab_overlap' is a one-hot matrix that maps topic words with their numbers in the response vocabulary. A simple case is as follows,
I UNK a student </s>
student(topic word)[[0 0 0 1 0]
a(topic word) [0 0 1 0 0]]
Step 3: Run!