Trying to achieve same results as "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF" paper #13

ayrtondenner · 2018-04-30T19:34:51Z

Hello

I am trying to reproduce the results of the "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF" paper, but my results do not match those the paper reports after 50 epochs. I've also read issue #8.

Because I'm using Windows, I took the hyper-parameters from the .sh script and wrote them directly into the NERCRF.py code.

After 50 epochs, using the 100-dimensional GloVe embeddings and the CoNLL-2003 corpus (which I downloaded from https://github.com/synalp/NER/tree/master/corpus/CoNLL-2003), I've only managed an 84.76% F1 score on my dev data and an 80.32% F1 score on my test data. Are the hyper-parameters right? Did you use eng.testa for dev data and eng.testb for test data, or did you use different files? Should I pay attention to anything else?

Thanks.

XuezheMax · 2018-04-30T20:18:01Z

Hi. The hyper-parameters seem reasonable, but the results are surprisingly low. I used the standard train/dev/test data in CoNLL-2003. I am not familiar with PyTorch on Windows, but I guess you need to use PyTorch 0.4, right? In that case, please switch to the branch 'pytorch4.0'.

ayrtondenner · 2018-04-30T20:38:47Z

Hello. I'm actually using 0.3.1.post2 of PyTorch. Should I update it to 0.4? Could a different version produce a different performance outcome as well? Seems weird...

XuezheMax · 2018-04-30T20:53:34Z

No, I just wanted to make sure that you used the correct version, because there are some major changes from PyTorch 0.3 to 0.4 which may cause some weird issues.

pvcastro · 2018-06-06T14:46:35Z

Hi @XuezheMax, I'm also running the run_ner_crf script and I'm having problems getting to the results reported in your paper. I'm getting results similar to the ones @ayrtondenner got.
I'm using your pytorch0.4 branch with the following settings:

Anaconda 4.5.1 with python 3.6.3
pytorch 0.4.0
gensim 3.4.0
glove embeddings glove.6B.100d.gz
train, test and dev data are the ones I got from https://github.com/glample/tagger/tree/master/dataset. I adapted your code in my fork to disregard those starting numbers at each line. Should this make any difference?
The rest of the hyperparameters are the default ones that are set in the examples/run_ner_crf.sh script.
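
An alternative to patching the reader would be to give the data the leading numbers it expects. The sketch below assumes those "starting numbers" are a 1-based token index that restarts at every sentence; that is a guess from the description above, not something confirmed in this thread, and `add_token_index` is a hypothetical helper name.

```python
def add_token_index(lines):
    """Prepend a 1-based token index to each token line.

    Assumption: the index restarts after every blank line (sentence break).
    """
    index = 0
    for line in lines:
        stripped = line.strip()
        if not stripped:
            # Blank line: the sentence is over, restart the counter.
            index = 0
            yield ""
        else:
            index += 1
            yield "{} {}".format(index, stripped)


# Made-up sample in the "word POS chunk NER" column layout.
sample = [
    "EU NNP I-NP I-ORG",
    "rejects VBZ I-VP O",
    "",
    "Peter NNP I-NP I-PER",
]
indexed = list(add_token_index(sample))
for row in indexed:
    print(row)
```

Keeping the reader untouched and changing only the data makes it easier to rule out the fork's code changes as the source of the gap.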

What could be wrong?

Thanks!

XuezheMax · 2018-06-06T17:07:19Z

Hi,
I am not sure what the problem is. One possible cause is the tagging scheme (BIO). If you are using the original CoNLL-2003 data, you need to convert it to the standard BIO scheme or to the more elaborate BIOES scheme (which gives a marginal improvement).
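
A quick way to check whether a tag sequence is already in the standard BIO (IOB2) scheme is to look for I- tags that do not continue an entity of the same type. This is a minimal sketch; `find_bio_violations` is a hypothetical helper, not part of the repository.

```python
def find_bio_violations(tags):
    """Return the positions of I- tags that do not continue a same-type entity.

    In standard BIO (IOB2) every entity starts with B-, so an I-X tag is only
    valid right after B-X or I-X.
    """
    violations = []
    prev = "O"
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and (prev == "O" or prev[2:] != tag[2:]):
            violations.append(i)
        prev = tag
    return violations


# Original CoNLL-2003 style (IOB1): entities usually start with I-.
iob1 = ["I-ORG", "O", "I-PER", "I-PER", "B-PER"]
# The same sentence in standard BIO (IOB2).
iob2 = ["B-ORG", "O", "B-PER", "I-PER", "B-PER"]

print(find_bio_violations(iob1))
print(find_bio_violations(iob2))
```

An empty result for every sentence suggests the file is already in BIO; any reported position points at a token that still carries the original tagging.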

pvcastro · 2018-06-06T17:31:35Z

I see, I noticed that the annotation scheme is really messed up. The LSTM-CRF from Lample fixes this in memory, but the training file is the same, that's why it doesn't matter for his code.
Do you know where I could get this conll 2003 corpus annotated the proper way? In either BIO or BIOES scheme.

XuezheMax · 2018-06-06T17:51:24Z

Here is the code I used to convert it to BIO

def transform(ifile, ofile):
    """Convert CoNLL-2003 NER tags from the original IOB1 scheme to BIO (IOB2)."""
    with open(ifile, 'r') as reader, open(ofile, 'w') as writer:
        prev = 'O'  # original (unconverted) tag of the previous token
        for line in reader:
            line = line.strip()
            if len(line) == 0:
                # A blank line ends the sentence, so reset the previous tag.
                prev = 'O'
                writer.write('\n')
                continue

            tokens = line.split()
            label = tokens[-1]
            if label != 'O' and label != prev:
                # An entity starts after 'O' or after an entity of another type.
                if prev == 'O' or label[2:] != prev[2:]:
                    label = 'B-' + label[2:]
            writer.write(" ".join(tokens[:-1]) + " " + label)
            writer.write('\n')
            prev = tokens[-1]
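
For the BIOES variant mentioned earlier, a conversion from BIO could look like the sketch below. It is not the script used for the experiments in this thread; `bio_to_bioes` is a hypothetical helper that assumes its input is already valid BIO (IOB2).

```python
def bio_to_bioes(tags):
    """Convert one sentence of BIO (IOB2) tags to BIOES.

    Assumption: the input is valid BIO, so every entity starts with B-.
    """
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, etype = tag[0], tag[2:]
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same type.
        continues = next_tag == "I-" + etype
        if prefix == "B":
            out.append(("B-" if continues else "S-") + etype)
        else:  # prefix == "I"
            out.append(("I-" if continues else "E-") + etype)
    return out


bio = ["B-ORG", "O", "B-PER", "I-PER", "O", "B-LOC", "I-LOC", "I-LOC"]
bioes = bio_to_bioes(bio)
print(bioes)
```

Single-token entities become S-, and the last token of a longer entity becomes E-, so the label set grows compared with BIO.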

pvcastro · 2018-06-06T17:53:15Z

Great, thanks @XuezheMax !

pvcastro · 2018-06-06T18:31:18Z

Strangely, it doesn't seem to have made any difference 🤔
I don't suppose using those starting numbers is relevant to determine where each document or sentence finishes, is it?
Can you confirm that the exact parameters in run_ner_crf.sh should be enough to reach a 90% F1 score on the test set? Some of them differ from what you report in your paper, but maybe the difference doesn't matter.

XuezheMax · 2018-06-06T18:45:53Z

Yes, I am sure that using the exact parameters in run_ner_crf.sh should give around 91% F1 score on test set.

XuezheMax · 2018-06-06T18:48:38Z

Would you please paste your log here so I can check for possible issues?
Again, make sure to remove the alphabets folder in data/ so that new vocabulary files are created.
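
Removing the cached alphabets can be scripted before each run. This is a minimal sketch that assumes the cache lives in an `alphabets` directory directly under the data directory; `clear_alphabets` is a hypothetical helper, and the demo runs against a temporary directory rather than the real data/ folder.

```python
import os
import shutil
import tempfile


def clear_alphabets(data_dir):
    """Delete the 'alphabets' directory under data_dir; return True if it existed."""
    path = os.path.join(data_dir, "alphabets")
    existed = os.path.isdir(path)
    # ignore_errors=True makes this a no-op when the directory is absent.
    shutil.rmtree(path, ignore_errors=True)
    return existed


# Demo on a throwaway directory (assumed layout: <data_dir>/alphabets/ner_crf).
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "alphabets", "ner_crf"))
removed = clear_alphabets(demo_root)
print(removed)
```

With the real project the call would be `clear_alphabets("data")`, run from the repository root before training.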

pvcastro · 2018-06-06T19:00:39Z

Yes, I did remove the alphabets folder 👍

I'm running a new training now with the latest adjustments. Fixed another place in the code that was referring to the word token with the wrong index (after removing the starting numbers). Here's the log so far:

/home/pedro/virtualenv/pytorch/bin/python /home/pedro/pycharm-community-2017.3.2/helpers/pydev/pydevd.py --multiproc --qt-support=auto --client 127.0.0.1 --port 37531 --file /home/pedro/repositorios/NeuroNLP2/examples/NERCRF.py --cuda --mode LSTM --num_epochs 200 --batch_size 16 --hidden_size 256 --num_layers 1 --char_dim 30 --num_filters 30 --tag_space 128 --learning_rate 0.01 --decay_rate 0.05 --schedule 1 --gamma 0.0 --dropout std --p_in 0.33 --p_rnn 0.33 0.5 --p_out 0.5 --unk_replace 0.0 --bigram --embedding glove --embedding_dict /media/discoD/embeddings/English/Glove/glove.6B/glove.6B.100d.gz --train data/conll2003/english/eng.train.bios --dev data/conll2003/english/eng.testa.bios --test data/conll2003/english/eng.testb.bios
Connected to pydev debugger (build 181.4203.547)
pydev debugger: process 4141 is connecting

loading embedding: glove from /media/discoD/embeddings/English/Glove/glove.6B/glove.6B.100d.gz
2018-06-06 15:49:10,504 - NERCRF - INFO - Creating Alphabets
2018-06-06 15:49:10,504 - Create Alphabets - INFO - Creating Alphabets: data/alphabets/ner_crf/
2018-06-06 15:49:11,628 - Create Alphabets - INFO - Total Vocabulary Size: 20102
2018-06-06 15:49:11,628 - Create Alphabets - INFO - Total Singleton Size: 9178
2018-06-06 15:49:11,630 - Create Alphabets - INFO - Total Vocabulary Size (w.o rare words): 19046
2018-06-06 15:49:12,295 - Create Alphabets - INFO - Word Alphabet Size (Singleton): 23598 (8122)
2018-06-06 15:49:12,296 - Create Alphabets - INFO - Character Alphabet Size: 86
2018-06-06 15:49:12,296 - Create Alphabets - INFO - POS Alphabet Size: 47
2018-06-06 15:49:12,296 - Create Alphabets - INFO - Chunk Alphabet Size: 19
2018-06-06 15:49:12,296 - Create Alphabets - INFO - NER Alphabet Size: 10
2018-06-06 15:49:12,296 - NERCRF - INFO - Word Alphabet Size: 23598
2018-06-06 15:49:12,296 - NERCRF - INFO - Character Alphabet Size: 86
2018-06-06 15:49:12,296 - NERCRF - INFO - POS Alphabet Size: 47
2018-06-06 15:49:12,296 - NERCRF - INFO - Chunk Alphabet Size: 19
2018-06-06 15:49:12,296 - NERCRF - INFO - NER Alphabet Size: 10
2018-06-06 15:49:12,296 - NERCRF - INFO - Reading Data
Reading data from data/conll2003/english/eng.train.bios
reading data: 10000
Total number of data: 14987
Reading data from data/conll2003/english/eng.testa.bios
Total number of data: 3466
Reading data from data/conll2003/english/eng.testb.bios
Total number of data: 3684
oov: 339
2018-06-06 15:53:01,370 - NERCRF - INFO - constructing network...
/home/pedro/virtualenv/pytorch/lib/python3.6/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
"num_layers={}".format(dropout, num_layers))
2018-06-06 15:53:01,387 - NERCRF - INFO - Network: LSTM, num_layer=1, hidden=256, filter=30, tag_space=128, crf=bigram
2018-06-06 15:53:01,387 - NERCRF - INFO - training: l2: 0.000000, (#training data: 14987, batch: 16, unk replace: 0.00)
2018-06-06 15:53:01,387 - NERCRF - INFO - dropout(in, out, rnn): (0.33, 0.50, (0.33, 0.5))
Epoch 1 (LSTM(std), learning rate=0.0100, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 11.1595, time left (estimated): 15.22s
train: 200/937 loss: 7.2109, time left (estimated): 12.09s
train: 300/937 loss: 5.8057, time left (estimated): 10.10s
train: 400/937 loss: 5.0669, time left (estimated): 8.42s
train: 500/937 loss: 4.5988, time left (estimated): 6.86s
train: 600/937 loss: 4.2958, time left (estimated): 5.30s
train: 700/937 loss: 4.0640, time left (estimated): 3.72s
train: 800/937 loss: 3.8781, time left (estimated): 2.16s
train: 900/937 loss: 3.7093, time left (estimated): 0.59s
train: 937 loss: 3.6504, time: 14.58s
dev acc: 97.02%, precision: 79.24%, recall: 75.75%, F1: 77.45%
best dev acc: 97.02%, precision: 79.24%, recall: 75.75%, F1: 77.45% (epoch: 1)
best test acc: 96.35%, precision: 74.47%, recall: 71.87%, F1: 73.15% (epoch: 1)
Epoch 2 (LSTM(std), learning rate=0.0095, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 2.3227, time left (estimated): 12.82s
train: 200/937 loss: 2.4067, time left (estimated): 11.64s
train: 300/937 loss: 2.4593, time left (estimated): 10.47s
train: 400/937 loss: 2.4737, time left (estimated): 8.83s
train: 500/937 loss: 2.4559, time left (estimated): 7.14s
train: 600/937 loss: 2.4435, time left (estimated): 5.52s
train: 700/937 loss: 2.4438, time left (estimated): 3.89s
train: 800/937 loss: 2.4204, time left (estimated): 2.26s
train: 900/937 loss: 2.3705, time left (estimated): 0.61s
train: 937 loss: 2.3726, time: 15.26s
dev acc: 97.55%, precision: 80.98%, recall: 79.55%, F1: 80.26%
best dev acc: 97.55%, precision: 80.98%, recall: 79.55%, F1: 80.26% (epoch: 2)
best test acc: 96.58%, precision: 75.61%, recall: 74.53%, F1: 75.07% (epoch: 2)
Epoch 3 (LSTM(std), learning rate=0.0091, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 2.1304, time left (estimated): 13.11s
train: 200/937 loss: 2.1364, time left (estimated): 11.71s
train: 300/937 loss: 2.2066, time left (estimated): 10.44s
train: 400/937 loss: 2.1977, time left (estimated): 8.77s
train: 500/937 loss: 2.1580, time left (estimated): 7.15s
train: 600/937 loss: 2.1675, time left (estimated): 5.62s
train: 700/937 loss: 2.1589, time left (estimated): 3.94s
train: 800/937 loss: 2.1703, time left (estimated): 2.29s
train: 900/937 loss: 2.1547, time left (estimated): 0.62s
train: 937 loss: 2.1668, time: 15.58s
dev acc: 97.69%, precision: 81.49%, recall: 79.97%, F1: 80.72%
best dev acc: 97.69%, precision: 81.49%, recall: 79.97%, F1: 80.72% (epoch: 3)
best test acc: 96.99%, precision: 77.07%, recall: 75.80%, F1: 76.43% (epoch: 3)
Epoch 4 (LSTM(std), learning rate=0.0087, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.8794, time left (estimated): 12.88s
train: 200/937 loss: 1.9610, time left (estimated): 11.79s
train: 300/937 loss: 1.9138, time left (estimated): 9.95s
train: 400/937 loss: 1.8985, time left (estimated): 8.52s
train: 500/937 loss: 1.9170, time left (estimated): 7.04s
train: 600/937 loss: 1.8895, time left (estimated): 5.45s
train: 700/937 loss: 1.8744, time left (estimated): 3.83s
train: 800/937 loss: 1.8929, time left (estimated): 2.23s
train: 900/937 loss: 1.8825, time left (estimated): 0.61s
train: 937 loss: 1.8929, time: 15.16s
dev acc: 98.00%, precision: 82.79%, recall: 81.04%, F1: 81.91%
best dev acc: 98.00%, precision: 82.79%, recall: 81.04%, F1: 81.91% (epoch: 4)
best test acc: 97.13%, precision: 77.70%, recall: 76.02%, F1: 76.85% (epoch: 4)
Epoch 5 (LSTM(std), learning rate=0.0083, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.6122, time left (estimated): 12.56s
train: 200/937 loss: 1.7545, time left (estimated): 11.42s
train: 300/937 loss: 1.8272, time left (estimated): 10.19s
train: 400/937 loss: 1.8695, time left (estimated): 8.71s
train: 500/937 loss: 1.8206, time left (estimated): 6.98s
train: 600/937 loss: 1.8122, time left (estimated): 5.43s
train: 700/937 loss: 1.7974, time left (estimated): 3.80s
train: 800/937 loss: 1.7895, time left (estimated): 2.21s
train: 900/937 loss: 1.7844, time left (estimated): 0.60s
train: 937 loss: 1.7592, time: 14.92s
dev acc: 98.03%, precision: 82.51%, recall: 82.19%, F1: 82.35%
best dev acc: 98.03%, precision: 82.51%, recall: 82.19%, F1: 82.35% (epoch: 5)
best test acc: 97.14%, precision: 77.33%, recall: 77.21%, F1: 77.27% (epoch: 5)
Epoch 6 (LSTM(std), learning rate=0.0080, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.7967, time left (estimated): 13.76s
train: 200/937 loss: 1.7380, time left (estimated): 12.11s
train: 300/937 loss: 1.7062, time left (estimated): 10.29s
train: 400/937 loss: 1.7048, time left (estimated): 8.64s
train: 500/937 loss: 1.7066, time left (estimated): 7.05s
train: 600/937 loss: 1.7288, time left (estimated): 5.49s
train: 700/937 loss: 1.7400, time left (estimated): 3.88s
train: 800/937 loss: 1.7497, time left (estimated): 2.23s
train: 900/937 loss: 1.7627, time left (estimated): 0.61s
train: 937 loss: 1.7641, time: 15.22s
dev acc: 98.04%, precision: 82.49%, recall: 82.88%, F1: 82.69%
best dev acc: 98.04%, precision: 82.49%, recall: 82.88%, F1: 82.69% (epoch: 6)
best test acc: 97.07%, precision: 76.98%, recall: 78.21%, F1: 77.59% (epoch: 6)
Epoch 7 (LSTM(std), learning rate=0.0077, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.6099, time left (estimated): 12.96s
train: 200/937 loss: 1.7350, time left (estimated): 11.93s
train: 300/937 loss: 1.7129, time left (estimated): 10.28s
train: 400/937 loss: 1.7469, time left (estimated): 8.82s
train: 500/937 loss: 1.7572, time left (estimated): 7.17s
train: 600/937 loss: 1.7370, time left (estimated): 5.55s
train: 700/937 loss: 1.7093, time left (estimated): 3.89s
train: 800/937 loss: 1.6880, time left (estimated): 2.23s
train: 900/937 loss: 1.6875, time left (estimated): 0.61s
train: 937 loss: 1.6810, time: 15.08s
dev acc: 98.21%, precision: 83.37%, recall: 82.10%, F1: 82.73%
best dev acc: 98.21%, precision: 83.37%, recall: 82.10%, F1: 82.73% (epoch: 7)
best test acc: 97.24%, precision: 78.31%, recall: 77.17%, F1: 77.74% (epoch: 7)
Epoch 8 (LSTM(std), learning rate=0.0074, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.4601, time left (estimated): 12.74s
train: 200/937 loss: 1.7144, time left (estimated): 12.05s
train: 300/937 loss: 1.6738, time left (estimated): 10.41s
train: 400/937 loss: 1.6353, time left (estimated): 8.70s
train: 500/937 loss: 1.6488, time left (estimated): 7.16s
train: 600/937 loss: 1.6255, time left (estimated): 5.44s
train: 700/937 loss: 1.6026, time left (estimated): 3.82s
train: 800/937 loss: 1.5943, time left (estimated): 2.20s
train: 900/937 loss: 1.5904, time left (estimated): 0.60s
train: 937 loss: 1.5851, time: 15.00s
dev acc: 98.16%, precision: 83.43%, recall: 81.04%, F1: 82.22%
best dev acc: 98.21%, precision: 83.37%, recall: 82.10%, F1: 82.73% (epoch: 7)
best test acc: 97.24%, precision: 78.31%, recall: 77.17%, F1: 77.74% (epoch: 7)
Epoch 9 (LSTM(std), learning rate=0.0071, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.5425, time left (estimated): 12.59s
train: 200/937 loss: 1.6459, time left (estimated): 11.54s
train: 300/937 loss: 1.6891, time left (estimated): 10.43s
train: 400/937 loss: 1.6785, time left (estimated): 8.78s
train: 500/937 loss: 1.6821, time left (estimated): 7.16s
train: 600/937 loss: 1.6776, time left (estimated): 5.53s
train: 700/937 loss: 1.6908, time left (estimated): 3.96s
train: 800/937 loss: 1.6926, time left (estimated): 2.29s
train: 900/937 loss: 1.6696, time left (estimated): 0.62s
train: 937 loss: 1.6775, time: 15.54s
dev acc: 98.28%, precision: 83.63%, recall: 82.79%, F1: 83.21%
best dev acc: 98.28%, precision: 83.63%, recall: 82.79%, F1: 83.21% (epoch: 9)
best test acc: 97.40%, precision: 78.73%, recall: 78.42%, F1: 78.58% (epoch: 9)
Epoch 10 (LSTM(std), learning rate=0.0069, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.5739, time left (estimated): 14.88s
train: 200/937 loss: 1.4929, time left (estimated): 12.41s
train: 300/937 loss: 1.4723, time left (estimated): 10.52s
train: 400/937 loss: 1.5359, time left (estimated): 8.98s
train: 500/937 loss: 1.4927, time left (estimated): 7.15s
train: 600/937 loss: 1.4833, time left (estimated): 5.50s
train: 700/937 loss: 1.4559, time left (estimated): 3.83s
train: 800/937 loss: 1.4410, time left (estimated): 2.18s
train: 900/937 loss: 1.4595, time left (estimated): 0.60s
train: 937 loss: 1.4702, time: 15.02s
dev acc: 98.34%, precision: 83.74%, recall: 83.01%, F1: 83.37%
best dev acc: 98.34%, precision: 83.74%, recall: 83.01%, F1: 83.37% (epoch: 10)
best test acc: 97.48%, precision: 78.92%, recall: 78.56%, F1: 78.74% (epoch: 10)
Epoch 11 (LSTM(std), learning rate=0.0067, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.5810, time left (estimated): 14.03s
train: 200/937 loss: 1.5853, time left (estimated): 12.40s
train: 300/937 loss: 1.5423, time left (estimated): 10.69s
train: 400/937 loss: 1.5091, time left (estimated): 8.81s
train: 500/937 loss: 1.4996, time left (estimated): 7.09s
train: 600/937 loss: 1.4911, time left (estimated): 5.46s
train: 700/937 loss: 1.4757, time left (estimated): 3.83s
train: 800/937 loss: 1.4645, time left (estimated): 2.21s
train: 900/937 loss: 1.4694, time left (estimated): 0.61s
train: 937 loss: 1.4674, time: 15.13s
dev acc: 98.36%, precision: 83.55%, recall: 83.36%, F1: 83.46%
best dev acc: 98.36%, precision: 83.55%, recall: 83.36%, F1: 83.46% (epoch: 11)
best test acc: 97.57%, precision: 78.82%, recall: 79.07%, F1: 78.94% (epoch: 11)
Epoch 12 (LSTM(std), learning rate=0.0065, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.1637, time left (estimated): 12.80s
train: 200/937 loss: 1.2805, time left (estimated): 11.64s
train: 300/937 loss: 1.3509, time left (estimated): 10.32s
train: 400/937 loss: 1.3464, time left (estimated): 8.69s
train: 500/937 loss: 1.3561, time left (estimated): 7.01s
train: 600/937 loss: 1.3453, time left (estimated): 5.38s
train: 700/937 loss: 1.3587, time left (estimated): 3.78s
train: 800/937 loss: 1.3513, time left (estimated): 2.19s
train: 900/937 loss: 1.3726, time left (estimated): 0.61s
train: 937 loss: 1.3741, time: 15.10s
dev acc: 98.16%, precision: 83.13%, recall: 83.11%, F1: 83.12%
best dev acc: 98.36%, precision: 83.55%, recall: 83.36%, F1: 83.46% (epoch: 11)
best test acc: 97.57%, precision: 78.82%, recall: 79.07%, F1: 78.94% (epoch: 11)
Epoch 13 (LSTM(std), learning rate=0.0062, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.5685, time left (estimated): 15.13s
train: 200/937 loss: 1.5330, time left (estimated): 13.42s
train: 300/937 loss: 1.5295, time left (estimated): 11.47s
train: 400/937 loss: 1.4667, time left (estimated): 9.38s
train: 500/937 loss: 1.5124, time left (estimated): 7.85s
train: 600/937 loss: 1.5023, time left (estimated): 6.03s
train: 700/937 loss: 1.4821, time left (estimated): 4.17s
train: 800/937 loss: 1.4831, time left (estimated): 2.41s
train: 900/937 loss: 1.4986, time left (estimated): 0.66s
train: 937 loss: 1.4936, time: 16.46s
dev acc: 98.51%, precision: 84.16%, recall: 83.58%, F1: 83.87%
best dev acc: 98.51%, precision: 84.16%, recall: 83.58%, F1: 83.87% (epoch: 13)
best test acc: 97.72%, precision: 79.45%, recall: 79.14%, F1: 79.30% (epoch: 13)
Epoch 14 (LSTM(std), learning rate=0.0061, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.2822, time left (estimated): 12.56s
train: 200/937 loss: 1.3552, time left (estimated): 11.52s
train: 300/937 loss: 1.3195, time left (estimated): 9.87s
train: 400/937 loss: 1.3449, time left (estimated): 8.48s
train: 500/937 loss: 1.3591, time left (estimated): 6.98s
train: 600/937 loss: 1.3216, time left (estimated): 5.32s
train: 700/937 loss: 1.3230, time left (estimated): 3.79s
train: 800/937 loss: 1.3476, time left (estimated): 2.21s
train: 900/937 loss: 1.3365, time left (estimated): 0.60s
train: 937 loss: 1.3412, time: 14.99s
dev acc: 98.42%, precision: 83.90%, recall: 83.42%, F1: 83.66%
best dev acc: 98.51%, precision: 84.16%, recall: 83.58%, F1: 83.87% (epoch: 13)
best test acc: 97.72%, precision: 79.45%, recall: 79.14%, F1: 79.30% (epoch: 13)
Epoch 15 (LSTM(std), learning rate=0.0059, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.5639, time left (estimated): 14.34s
train: 200/937 loss: 1.5256, time left (estimated): 12.75s
train: 300/937 loss: 1.5398, time left (estimated): 11.06s
train: 400/937 loss: 1.5272, time left (estimated): 9.35s
train: 500/937 loss: 1.5028, time left (estimated): 7.52s
train: 600/937 loss: 1.4775, time left (estimated): 5.78s
train: 700/937 loss: 1.4980, time left (estimated): 4.12s
train: 800/937 loss: 1.4719, time left (estimated): 2.37s
train: 900/937 loss: 1.4516, time left (estimated): 0.64s
train: 937 loss: 1.4439, time: 15.85s
dev acc: 98.45%, precision: 83.76%, recall: 82.96%, F1: 83.36%
best dev acc: 98.51%, precision: 84.16%, recall: 83.58%, F1: 83.87% (epoch: 13)
best test acc: 97.72%, precision: 79.45%, recall: 79.14%, F1: 79.30% (epoch: 13)
Epoch 16 (LSTM(std), learning rate=0.0057, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.0337, time left (estimated): 11.95s
train: 200/937 loss: 1.2146, time left (estimated): 11.50s
train: 300/937 loss: 1.2163, time left (estimated): 10.00s
train: 400/937 loss: 1.2734, time left (estimated): 8.66s
train: 500/937 loss: 1.3102, time left (estimated): 7.14s
train: 600/937 loss: 1.3274, time left (estimated): 5.56s
train: 700/937 loss: 1.3259, time left (estimated): 3.90s
train: 800/937 loss: 1.3224, time left (estimated): 2.24s
train: 900/937 loss: 1.3096, time left (estimated): 0.61s
train: 937 loss: 1.3034, time: 15.19s
dev acc: 98.43%, precision: 83.86%, recall: 83.54%, F1: 83.70%
best dev acc: 98.51%, precision: 84.16%, recall: 83.58%, F1: 83.87% (epoch: 13)
best test acc: 97.72%, precision: 79.45%, recall: 79.14%, F1: 79.30% (epoch: 13)
Epoch 17 (LSTM(std), learning rate=0.0056, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.5186, time left (estimated): 13.96s
train: 200/937 loss: 1.4127, time left (estimated): 11.87s
train: 300/937 loss: 1.3337, time left (estimated): 9.97s
train: 400/937 loss: 1.3327, time left (estimated): 8.48s
train: 500/937 loss: 1.3473, time left (estimated): 6.99s
train: 600/937 loss: 1.3244, time left (estimated): 5.42s
train: 700/937 loss: 1.3301, time left (estimated): 3.82s
train: 800/937 loss: 1.3322, time left (estimated): 2.22s
train: 900/937 loss: 1.3217, time left (estimated): 0.61s
train: 937 loss: 1.3175, time: 15.08s
dev acc: 98.46%, precision: 84.10%, recall: 83.46%, F1: 83.78%
best dev acc: 98.51%, precision: 84.16%, recall: 83.58%, F1: 83.87% (epoch: 13)
best test acc: 97.72%, precision: 79.45%, recall: 79.14%, F1: 79.30% (epoch: 13)
Epoch 18 (LSTM(std), learning rate=0.0054, decay rate=0.0500 (schedule=1)):
train: 100/937 loss: 1.2661, time left (estimated): 13.31s
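
The learning rates printed in this log (0.0100, 0.0095, 0.0091, ...) are consistent with an inverse-time decay of the base rate. The sketch below reproduces the printed values; the formula is inferred from the log rather than taken from NERCRF.py, and the meaning given to `schedule` (decay once every `schedule` epochs) is an assumption.

```python
def decayed_lr(base_lr, decay_rate, epoch, schedule=1):
    """Inverse-time decay: base_lr / (1 + steps * decay_rate).

    Assumptions: epoch is 1-based and one decay step is applied every
    `schedule` epochs.
    """
    steps = (epoch - 1) // schedule
    return base_lr / (1.0 + steps * decay_rate)


# Values from the command line above: --learning_rate 0.01 --decay_rate 0.05 --schedule 1
rates = ["{:.4f}".format(decayed_lr(0.01, 0.05, epoch)) for epoch in range(1, 7)]
print(rates)
```

Under this formula the rate after 50 epochs is still about a third of the initial value, so the decay alone is unlikely to explain an early plateau.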

XuezheMax · 2018-06-06T19:50:05Z

Here is my log. You are using Python 3.6, right? What is your PyTorch version?
Could you try re-running your experiments with Python 2.7 and PyTorch 0.3.1, to see whether this is a version issue?
loading embedding: glove from data/glove/glove.6B/glove.6B.100d.gz
2018-06-06 15:44:55,126 - NERCRF - INFO - Creating Alphabets
2018-06-06 15:44:55,126 - Create Alphabets - INFO - Creating Alphabets: data/alphabets/ner_crf/
2018-06-06 15:44:56,115 - Create Alphabets - INFO - Total Vocabulary Size: 20102
2018-06-06 15:44:56,116 - Create Alphabets - INFO - Total Singleton Size: 9178
2018-06-06 15:44:56,120 - Create Alphabets - INFO - Total Vocabulary Size (w.o rare words): 19046
2018-06-06 15:44:56,499 - Create Alphabets - INFO - Word Alphabet Size (Singleton): 23598 (8122)
2018-06-06 15:44:56,499 - Create Alphabets - INFO - Character Alphabet Size: 86
2018-06-06 15:44:56,499 - Create Alphabets - INFO - POS Alphabet Size: 47
2018-06-06 15:44:56,499 - Create Alphabets - INFO - Chunk Alphabet Size: 19
2018-06-06 15:44:56,499 - Create Alphabets - INFO - NER Alphabet Size: 10
2018-06-06 15:44:56,499 - NERCRF - INFO - Word Alphabet Size: 23598
2018-06-06 15:44:56,500 - NERCRF - INFO - Character Alphabet Size: 86
2018-06-06 15:44:56,500 - NERCRF - INFO - POS Alphabet Size: 47
2018-06-06 15:44:56,500 - NERCRF - INFO - Chunk Alphabet Size: 19
2018-06-06 15:44:56,500 - NERCRF - INFO - NER Alphabet Size: 10
2018-06-06 15:44:56,500 - NERCRF - INFO - Reading Data
Reading data from data/conll2003/english/eng.train.bio.conll
reading data: 10000
Total number of data: 14987
Reading data from data/conll2003/english/eng.dev.bio.conll
Total number of data: 3466
Reading data from data/conll2003/english/eng.test.bio.conll
Total number of data: 3684
oov: 339
2018-06-06 15:45:01,810 - NERCRF - INFO - constructing network...
2018-06-06 15:45:02,979 - NERCRF - INFO - Network: LSTM, num_layer=1, hidden=256, filter=30, tag_space=128, crf=bigram
2018-06-06 15:45:02,980 - NERCRF - INFO - training: l2: 0.000000, (#training data: 14987, batch: 16, unk replace: 0.00)
2018-06-06 15:45:02,980 - NERCRF - INFO - dropout(in, out, rnn): (0.33, 0.50, (0.33, 0.5))
Epoch 1 (LSTM(std), learning rate=0.0100, decay rate=0.0500 (schedule=1)):
train: 937 loss: 3.6320, time: 23.30s
dev acc: 96.81%, precision: 86.45%, recall: 83.52%, F1: 84.96%
best dev acc: 96.81%, precision: 86.45%, recall: 83.52%, F1: 84.96% (epoch: 1)
best test acc: 95.90%, precision: 81.77%, recall: 80.05%, F1: 80.90% (epoch: 1)
Epoch 2 (LSTM(std), learning rate=0.0095, decay rate=0.0500 (schedule=1)):
train: 937 loss: 2.3164, time: 19.93s
dev acc: 97.53%, precision: 89.47%, recall: 87.39%, F1: 88.42%
best dev acc: 97.53%, precision: 89.47%, recall: 87.39%, F1: 88.42% (epoch: 2)
best test acc: 96.79%, precision: 85.61%, recall: 84.37%, F1: 84.98% (epoch: 2)
Epoch 3 (LSTM(std), learning rate=0.0091, decay rate=0.0500 (schedule=1)):
train: 937 loss: 2.0166, time: 20.60s
dev acc: 97.56%, precision: 89.06%, recall: 87.24%, F1: 88.14%
best dev acc: 97.53%, precision: 89.47%, recall: 87.39%, F1: 88.42% (epoch: 2)
best test acc: 96.79%, precision: 85.61%, recall: 84.37%, F1: 84.98% (epoch: 2)
Epoch 4 (LSTM(std), learning rate=0.0087, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.9072, time: 21.19s
dev acc: 97.81%, precision: 91.33%, recall: 88.66%, F1: 89.97%
best dev acc: 97.81%, precision: 91.33%, recall: 88.66%, F1: 89.97% (epoch: 4)
best test acc: 97.20%, precision: 88.10%, recall: 85.98%, F1: 87.03% (epoch: 4)
Epoch 5 (LSTM(std), learning rate=0.0083, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.8425, time: 20.10s
dev acc: 98.05%, precision: 92.23%, recall: 90.04%, F1: 91.12%
best dev acc: 98.05%, precision: 92.23%, recall: 90.04%, F1: 91.12% (epoch: 5)
best test acc: 97.27%, precision: 88.23%, recall: 86.63%, F1: 87.42% (epoch: 5)
Epoch 6 (LSTM(std), learning rate=0.0080, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.7096, time: 20.70s
dev acc: 97.79%, precision: 92.15%, recall: 89.13%, F1: 90.62%
best dev acc: 98.05%, precision: 92.23%, recall: 90.04%, F1: 91.12% (epoch: 5)
best test acc: 97.27%, precision: 88.23%, recall: 86.63%, F1: 87.42% (epoch: 5)
Epoch 7 (LSTM(std), learning rate=0.0077, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.7420, time: 22.69s
dev acc: 98.16%, precision: 91.95%, recall: 90.91%, F1: 91.43%
best dev acc: 98.16%, precision: 91.95%, recall: 90.91%, F1: 91.43% (epoch: 7)
best test acc: 97.38%, precision: 88.18%, recall: 87.82%, F1: 88.00% (epoch: 7)
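
Both logs show token accuracy near 98% while the F1 scores differ by almost nine points. Assuming precision, recall and F1 are the usual CoNLL span-level metrics (an assumption, not confirmed in this thread), a single wrong tag can break a whole entity while barely changing accuracy. The sketch below illustrates this on a made-up sentence; `extract_spans` and `span_f1` are hypothetical helpers, not the evaluation script used here.

```python
def extract_spans(tags):
    """Return the set of (start, end, type) entity spans in a BIO sequence."""
    spans = set()
    start, etype = None, None
    # A trailing "O" sentinel closes a span that runs to the end of the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and etype == tag[2:]
        if not inside:
            if etype is not None:
                spans.add((start, i, etype))
            if tag == "O":
                start, etype = None, None
            else:
                # A B- tag (or a stray I- tag) opens a new span.
                start, etype = i, tag[2:]
    return spans


def span_f1(gold, pred):
    """Span-level precision, recall and F1: a span counts only on exact match."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Made-up example: one wrong tag out of six.
gold = ["B-PER", "I-PER", "O", "B-LOC", "O", "O"]
pred = ["B-PER", "O", "O", "B-LOC", "O", "O"]
token_acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
precision, recall, f1 = span_f1(gold, pred)
print(token_acc, precision, recall, f1)
```

This is why a tagging-scheme or boundary mismatch can cost many F1 points even when accuracy looks healthy.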

pvcastro · 2018-06-06T20:13:50Z

Yes, I'm running Anaconda 4.5.1 with python 3.6.3, pytorch 0.4.0 (using your pytorch0.4 branch) and gensim 3.4.0.
I'll set up the python 2 environment and will verify the results.

XuezheMax · 2018-06-06T21:41:20Z

FYI, here are the first 35 epochs for Python 2.7 with PyTorch 0.4. It seems to converge more slowly than with PyTorch 0.3, but it still approaches 90% F1 after 35 epochs.
loading embedding: glove from data/glove/glove.6B/glove.6B.100d.gz
2018-06-06 16:25:56,009 - NERCRF - INFO - Creating Alphabets
2018-06-06 16:25:56,057 - Create Alphabets - INFO - Word Alphabet Size (Singleton): 23598 (8122)
2018-06-06 16:25:56,058 - Create Alphabets - INFO - Character Alphabet Size: 86
2018-06-06 16:25:56,058 - Create Alphabets - INFO - POS Alphabet Size: 47
2018-06-06 16:25:56,058 - Create Alphabets - INFO - Chunk Alphabet Size: 19
2018-06-06 16:25:56,058 - Create Alphabets - INFO - NER Alphabet Size: 18
2018-06-06 16:25:56,058 - NERCRF - INFO - Word Alphabet Size: 23598
2018-06-06 16:25:56,058 - NERCRF - INFO - Character Alphabet Size: 86
2018-06-06 16:25:56,058 - NERCRF - INFO - POS Alphabet Size: 47
2018-06-06 16:25:56,058 - NERCRF - INFO - Chunk Alphabet Size: 19
2018-06-06 16:25:56,058 - NERCRF - INFO - NER Alphabet Size: 18
2018-06-06 16:25:56,058 - NERCRF - INFO - Reading Data
Reading data from data/conll2003/english/eng.train.bioes.conll
reading data: 10000
Total number of data: 14987
Reading data from data/conll2003/english/eng.dev.bioes.conll
Total number of data: 3466
Reading data from data/conll2003/english/eng.test.bioes.conll
Total number of data: 3684
oov: 339
2018-06-06 16:25:59,294 - NERCRF - INFO - constructing network...
/home/max/.local/lib/python2.7/site-packages/torch/nn/modules/rnn.py:38: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
"num_layers={}".format(dropout, num_layers))
2018-06-06 16:25:59,314 - NERCRF - INFO - Network: LSTM, num_layer=1, hidden=256, filter=30, tag_space=128, crf=bigram
2018-06-06 16:25:59,315 - NERCRF - INFO - training: l2: 0.000000, (#training data: 14987, batch: 16, unk replace: 0.00)
2018-06-06 16:25:59,315 - NERCRF - INFO - dropout(in, out, rnn): (0.33, 0.50, (0.33, 0.5))
Epoch 1 (LSTM(std), learning rate=0.0100, decay rate=0.0500 (schedule=1)):
train: 937 loss: 11.5858, time: 116.24s
dev acc: 94.64%, precision: 77.99%, recall: 71.49%, F1: 74.60%
best dev acc: 94.64%, precision: 77.99%, recall: 71.49%, F1: 74.60% (epoch: 1)
best test acc: 93.82%, precision: 76.13%, recall: 70.41%, F1: 73.16% (epoch: 1)
Epoch 2 (LSTM(std), learning rate=0.0095, decay rate=0.0500 (schedule=1)):
train: 937 loss: 3.1999, time: 125.24s
dev acc: 96.54%, precision: 85.75%, recall: 83.12%, F1: 84.41%
best dev acc: 96.54%, precision: 85.75%, recall: 83.12%, F1: 84.41% (epoch: 2)
best test acc: 95.70%, precision: 81.84%, recall: 79.64%, F1: 80.73% (epoch: 2)
Epoch 3 (LSTM(std), learning rate=0.0091, decay rate=0.0500 (schedule=1)):
train: 937 loss: 2.6765, time: 114.69s
dev acc: 96.89%, precision: 90.07%, recall: 84.40%, F1: 87.14%
best dev acc: 96.89%, precision: 90.07%, recall: 84.40%, F1: 87.14% (epoch: 3)
best test acc: 95.90%, precision: 85.93%, recall: 80.35%, F1: 83.05% (epoch: 3)
Epoch 4 (LSTM(std), learning rate=0.0087, decay rate=0.0500 (schedule=1)):
train: 937 loss: 2.3663, time: 107.77s
dev acc: 97.26%, precision: 89.77%, recall: 85.85%, F1: 87.77%
best dev acc: 97.26%, precision: 89.77%, recall: 85.85%, F1: 87.77% (epoch: 4)
best test acc: 96.40%, precision: 85.72%, recall: 81.82%, F1: 83.72% (epoch: 4)
Epoch 5 (LSTM(std), learning rate=0.0083, decay rate=0.0500 (schedule=1)):
train: 937 loss: 2.2414, time: 112.05s
dev acc: 97.48%, precision: 88.71%, recall: 88.37%, F1: 88.54%
best dev acc: 97.48%, precision: 88.71%, recall: 88.37%, F1: 88.54% (epoch: 5)
best test acc: 96.54%, precision: 84.67%, recall: 84.95%, F1: 84.81% (epoch: 5)
Epoch 6 (LSTM(std), learning rate=0.0080, decay rate=0.0500 (schedule=1)):
train: 937 loss: 2.1981, time: 112.35s
dev acc: 97.58%, precision: 90.12%, recall: 89.04%, F1: 89.58%
best dev acc: 97.58%, precision: 90.12%, recall: 89.04%, F1: 89.58% (epoch: 6)
best test acc: 96.85%, precision: 87.28%, recall: 85.98%, F1: 86.62% (epoch: 6)
Epoch 7 (LSTM(std), learning rate=0.0077, decay rate=0.0500 (schedule=1)):
train: 937 loss: 2.0362, time: 114.91s
dev acc: 97.70%, precision: 92.14%, recall: 88.61%, F1: 90.34%
best dev acc: 97.70%, precision: 92.14%, recall: 88.61%, F1: 90.34% (epoch: 7)
best test acc: 96.89%, precision: 88.24%, recall: 84.24%, F1: 86.20% (epoch: 7)
Epoch 8 (LSTM(std), learning rate=0.0074, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.8955, time: 111.44s
dev acc: 97.35%, precision: 89.69%, recall: 87.53%, F1: 88.60%
best dev acc: 97.70%, precision: 92.14%, recall: 88.61%, F1: 90.34% (epoch: 7)
best test acc: 96.89%, precision: 88.24%, recall: 84.24%, F1: 86.20% (epoch: 7)
Epoch 9 (LSTM(std), learning rate=0.0071, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.9163, time: 106.08s
dev acc: 97.94%, precision: 91.67%, recall: 90.17%, F1: 90.91%
best dev acc: 97.94%, precision: 91.67%, recall: 90.17%, F1: 90.91% (epoch: 9)
best test acc: 97.14%, precision: 88.07%, recall: 86.88%, F1: 87.47% (epoch: 9)
Epoch 10 (LSTM(std), learning rate=0.0069, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.8767, time: 110.97s
dev acc: 97.96%, precision: 92.15%, recall: 90.07%, F1: 91.10%
best dev acc: 97.96%, precision: 92.15%, recall: 90.07%, F1: 91.10% (epoch: 10)
best test acc: 97.07%, precision: 87.82%, recall: 86.07%, F1: 86.94% (epoch: 10)
Epoch 11 (LSTM(std), learning rate=0.0067, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.8514, time: 113.16s
dev acc: 97.82%, precision: 91.57%, recall: 90.27%, F1: 90.92%
best dev acc: 97.96%, precision: 92.15%, recall: 90.07%, F1: 91.10% (epoch: 10)
best test acc: 97.07%, precision: 87.82%, recall: 86.07%, F1: 86.94% (epoch: 10)
Epoch 12 (LSTM(std), learning rate=0.0065, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.7597, time: 108.15s
dev acc: 98.15%, precision: 92.33%, recall: 91.22%, F1: 91.77%
best dev acc: 98.15%, precision: 92.33%, recall: 91.22%, F1: 91.77% (epoch: 12)
best test acc: 97.16%, precision: 87.74%, recall: 87.32%, F1: 87.53% (epoch: 12)
Epoch 13 (LSTM(std), learning rate=0.0062, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.7508, time: 111.77s
dev acc: 98.06%, precision: 92.18%, recall: 90.71%, F1: 91.44%
best dev acc: 98.15%, precision: 92.33%, recall: 91.22%, F1: 91.77% (epoch: 12)
best test acc: 97.16%, precision: 87.74%, recall: 87.32%, F1: 87.53% (epoch: 12)
Epoch 14 (LSTM(std), learning rate=0.0061, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.7144, time: 107.66s
dev acc: 98.05%, precision: 92.76%, recall: 90.61%, F1: 91.67%
best dev acc: 98.15%, precision: 92.33%, recall: 91.22%, F1: 91.77% (epoch: 12)
best test acc: 97.16%, precision: 87.74%, recall: 87.32%, F1: 87.53% (epoch: 12)
Epoch 15 (LSTM(std), learning rate=0.0059, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.6631, time: 113.54s
dev acc: 98.13%, precision: 92.51%, recall: 91.01%, F1: 91.75%
best dev acc: 98.15%, precision: 92.33%, recall: 91.22%, F1: 91.77% (epoch: 12)
best test acc: 97.16%, precision: 87.74%, recall: 87.32%, F1: 87.53% (epoch: 12)
Epoch 16 (LSTM(std), learning rate=0.0057, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.6694, time: 115.85s
dev acc: 98.08%, precision: 92.43%, recall: 90.83%, F1: 91.62%
best dev acc: 98.15%, precision: 92.33%, recall: 91.22%, F1: 91.77% (epoch: 12)
best test acc: 97.16%, precision: 87.74%, recall: 87.32%, F1: 87.53% (epoch: 12)
Epoch 17 (LSTM(std), learning rate=0.0056, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.6892, time: 115.00s
dev acc: 98.20%, precision: 92.69%, recall: 91.27%, F1: 91.97%
best dev acc: 98.20%, precision: 92.69%, recall: 91.27%, F1: 91.97% (epoch: 17)
best test acc: 97.30%, precision: 89.00%, recall: 87.64%, F1: 88.31% (epoch: 17)
Epoch 18 (LSTM(std), learning rate=0.0054, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5907, time: 108.94s
dev acc: 98.17%, precision: 93.07%, recall: 91.59%, F1: 92.32%
best dev acc: 98.17%, precision: 93.07%, recall: 91.59%, F1: 92.32% (epoch: 18)
best test acc: 97.39%, precision: 89.51%, recall: 88.21%, F1: 88.85% (epoch: 18)
Epoch 19 (LSTM(std), learning rate=0.0053, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5726, time: 110.24s
dev acc: 98.24%, precision: 93.42%, recall: 91.47%, F1: 92.43%
best dev acc: 98.24%, precision: 93.42%, recall: 91.47%, F1: 92.43% (epoch: 19)
best test acc: 97.42%, precision: 89.85%, recall: 87.91%, F1: 88.87% (epoch: 19)
Epoch 20 (LSTM(std), learning rate=0.0051, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5618, time: 110.93s
dev acc: 98.08%, precision: 92.10%, recall: 90.98%, F1: 91.53%
best dev acc: 98.24%, precision: 93.42%, recall: 91.47%, F1: 92.43% (epoch: 19)
best test acc: 97.42%, precision: 89.85%, recall: 87.91%, F1: 88.87% (epoch: 19)
Epoch 21 (LSTM(std), learning rate=0.0050, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5315, time: 114.51s
dev acc: 98.24%, precision: 93.34%, recall: 91.55%, F1: 92.44%
best dev acc: 98.24%, precision: 93.34%, recall: 91.55%, F1: 92.44% (epoch: 21)
best test acc: 97.39%, precision: 89.59%, recall: 87.73%, F1: 88.65% (epoch: 21)
Epoch 22 (LSTM(std), learning rate=0.0049, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5707, time: 111.92s
dev acc: 98.34%, precision: 93.33%, recall: 92.34%, F1: 92.83%
best dev acc: 98.34%, precision: 93.33%, recall: 92.34%, F1: 92.83% (epoch: 22)
best test acc: 97.40%, precision: 89.47%, recall: 88.49%, F1: 88.98% (epoch: 22)
Epoch 23 (LSTM(std), learning rate=0.0048, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5023, time: 109.71s
dev acc: 98.34%, precision: 93.17%, recall: 92.58%, F1: 92.88%
best dev acc: 98.34%, precision: 93.17%, recall: 92.58%, F1: 92.88% (epoch: 23)
best test acc: 97.45%, precision: 89.12%, recall: 88.79%, F1: 88.96% (epoch: 23)
Epoch 24 (LSTM(std), learning rate=0.0047, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5445, time: 118.68s
dev acc: 98.29%, precision: 93.96%, recall: 91.62%, F1: 92.77%
best dev acc: 98.34%, precision: 93.17%, recall: 92.58%, F1: 92.88% (epoch: 23)
best test acc: 97.45%, precision: 89.12%, recall: 88.79%, F1: 88.96% (epoch: 23)
Epoch 25 (LSTM(std), learning rate=0.0045, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5255, time: 114.08s
dev acc: 98.30%, precision: 93.43%, recall: 92.17%, F1: 92.80%
best dev acc: 98.34%, precision: 93.17%, recall: 92.58%, F1: 92.88% (epoch: 23)
best test acc: 97.45%, precision: 89.12%, recall: 88.79%, F1: 88.96% (epoch: 23)
Epoch 26 (LSTM(std), learning rate=0.0044, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.5290, time: 113.38s
dev acc: 98.37%, precision: 93.29%, recall: 92.49%, F1: 92.89%
best dev acc: 98.37%, precision: 93.29%, recall: 92.49%, F1: 92.89% (epoch: 26)
best test acc: 97.52%, precision: 89.55%, recall: 88.95%, F1: 89.25% (epoch: 26)
Epoch 27 (LSTM(std), learning rate=0.0043, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.4693, time: 111.95s
dev acc: 98.31%, precision: 93.06%, recall: 92.28%, F1: 92.67%
best dev acc: 98.37%, precision: 93.29%, recall: 92.49%, F1: 92.89% (epoch: 26)
best test acc: 97.52%, precision: 89.55%, recall: 88.95%, F1: 89.25% (epoch: 26)
Epoch 28 (LSTM(std), learning rate=0.0043, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.3779, time: 105.43s
dev acc: 98.39%, precision: 93.44%, recall: 92.34%, F1: 92.89%
best dev acc: 98.37%, precision: 93.29%, recall: 92.49%, F1: 92.89% (epoch: 26)
best test acc: 97.52%, precision: 89.55%, recall: 88.95%, F1: 89.25% (epoch: 26)
Epoch 29 (LSTM(std), learning rate=0.0042, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.4463, time: 117.16s
dev acc: 98.38%, precision: 93.51%, recall: 92.33%, F1: 92.91%
best dev acc: 98.38%, precision: 93.51%, recall: 92.33%, F1: 92.91% (epoch: 29)
best test acc: 97.61%, precision: 89.99%, recall: 88.81%, F1: 89.40% (epoch: 29)
Epoch 30 (LSTM(std), learning rate=0.0041, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.4345, time: 108.91s
dev acc: 98.33%, precision: 93.25%, recall: 92.28%, F1: 92.76%
best dev acc: 98.38%, precision: 93.51%, recall: 92.33%, F1: 92.91% (epoch: 29)
best test acc: 97.61%, precision: 89.99%, recall: 88.81%, F1: 89.40% (epoch: 29)
Epoch 31 (LSTM(std), learning rate=0.0040, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.4096, time: 111.84s
dev acc: 98.40%, precision: 93.50%, recall: 92.53%, F1: 93.01%
best dev acc: 98.40%, precision: 93.50%, recall: 92.53%, F1: 93.01% (epoch: 31)
best test acc: 97.61%, precision: 90.14%, recall: 89.47%, F1: 89.80% (epoch: 31)
Epoch 32 (LSTM(std), learning rate=0.0039, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.4046, time: 113.07s
dev acc: 98.39%, precision: 93.94%, recall: 92.31%, F1: 93.12%
best dev acc: 98.39%, precision: 93.94%, recall: 92.31%, F1: 93.12% (epoch: 32)
best test acc: 97.58%, precision: 90.38%, recall: 88.79%, F1: 89.58% (epoch: 32)
Epoch 33 (LSTM(std), learning rate=0.0038, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.4126, time: 111.48s
dev acc: 98.47%, precision: 93.98%, recall: 92.68%, F1: 93.32%
best dev acc: 98.47%, precision: 93.98%, recall: 92.68%, F1: 93.32% (epoch: 33)
best test acc: 97.56%, precision: 89.93%, recall: 88.56%, F1: 89.24% (epoch: 33)
Epoch 34 (LSTM(std), learning rate=0.0038, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.3716, time: 107.51s
dev acc: 98.40%, precision: 93.87%, recall: 92.46%, F1: 93.16%
best dev acc: 98.47%, precision: 93.98%, recall: 92.68%, F1: 93.32% (epoch: 33)
best test acc: 97.56%, precision: 89.93%, recall: 88.56%, F1: 89.24% (epoch: 33)
Epoch 35 (LSTM(std), learning rate=0.0037, decay rate=0.0500 (schedule=1)):
train: 937 loss: 1.3615, time: 116.80s
dev acc: 98.39%, precision: 93.65%, recall: 92.38%, F1: 93.01%
best dev acc: 98.47%, precision: 93.98%, recall: 92.68%, F1: 93.32% (epoch: 33)
best test acc: 97.56%, precision: 89.93%, recall: 88.56%, F1: 89.24% (epoch: 33)
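
For anyone comparing their own log against this one: the learning rates printed per epoch appear consistent with an inverse-time decay, lr = base_lr / (1 + (epoch - 1) * decay_rate), with base_lr = 0.01 and decay_rate = 0.05. The sketch below is only a reading of the printed values, not code taken from NERCRF.py; the function name `decayed_lr` is made up for illustration.

```python
def decayed_lr(base_lr, decay_rate, epoch):
    """Inverse-time decay for a 1-indexed epoch (assumed form, inferred from the log)."""
    return base_lr / (1.0 + (epoch - 1) * decay_rate)


# Print the learning rate for a few epochs that also appear in the logs in this thread.
for epoch in (1, 2, 3, 21, 35):
    print("Epoch %d: learning rate=%.4f" % (epoch, decayed_lr(0.01, 0.05, epoch)))
```

If your own log shows a different sequence of learning rates for the same epochs, the hyper-parameters or the schedule probably differ from this run.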

pvcastro · 2018-06-07T12:13:36Z

Hi @XuezheMax!

Besides running the Python 2 setup (with PyTorch 0.3.1), I also ran the script mentioned in #9 to add indexes to the start of each line in my corpus, to rule out the possibility that I did something wrong when adapting the code to run without the indexes. The results I got were consistent with yours: I reached nearly 90% F1 on the test dataset in only 10 epochs.

Then I went back to the pytorch4.0 branch with Python 3, reverted the changes I had made to ignore the starting indexes, and ran the training on the corpus with starting indexes again, to see whether I had succeeded because of the corpus or because of the Python and PyTorch versions. I ended up getting those same low results again. So it looks like there's something wrong with running PyTorch 0.4 on Python 3 🤔

I didn't test PyTorch 0.4 with Python 2.7; I'm guessing you already did that. What you probably didn't do was test with Python 3.6, right?

ducalpha · 2018-06-07T20:17:33Z

Python 2.7 + PyTorch 0.4 seems to work well. My result with this config matches the paper. Running run_ner_crf.sh on CoNLL-2003, I got a test F1 of 91.36% (better than the paper's 91.21%) at epoch 167, but after that F1 dropped to 91.12%.

Epoch 167 (LSTM(std), learning rate=0.0011, decay rate=0.0500 (schedule=1)):
train: 937 loss: 0.7290, time: 31.23s
dev acc: 98.94%, precision: 94.79%, recall: 94.65%, F1: 94.72%
best dev acc: 98.94%, precision: 94.79%, recall: 94.65%, F1: 94.72% (epoch: 167)
best test acc: 98.14%, precision: 91.46%, recall: 91.25%, F1: 91.36% (epoch: 167)
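
As a quick sanity check on log lines like these, F1 can be recomputed from the printed precision and recall as their harmonic mean. The sketch below is not the repository's evaluation code (the helper name `f1_score` is made up here); because the logged precision and recall are already rounded to two decimals, the recomputed value can differ from the logged F1 in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision/recall pairs copied from the epoch 167 lines above.
dev_f1 = f1_score(94.79, 94.65)
test_f1 = f1_score(91.46, 91.25)
print("dev F1: %.2f%%, test F1: %.2f%%" % (dev_f1, test_f1))
```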

pvcastro · 2018-06-07T20:21:24Z

These reported results are usually averaged over several runs, so it doesn't necessarily mean that their best individual run reached 91.21%.

So if you ran with Python 2.7 and PyTorch 0.4, I'm inclined to think that the problem must be related to Python 3 somehow 🤔
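
Since the results in this thread seem to depend on the Python/PyTorch combination, it may help to paste the exact versions alongside each log. A minimal sketch, assuming only that `torch` exposes `__version__` (the helper name `describe_environment` is hypothetical); it runs on both Python 2.7 and Python 3 and reports `None` if PyTorch is not installed.

```python
import platform
import sys


def describe_environment():
    """Collect interpreter and PyTorch versions for a bug report."""
    info = {"python": platform.python_version(), "platform": sys.platform}
    try:
        import torch  # imported lazily so the script still runs without PyTorch
        info["torch"] = torch.__version__
    except ImportError:
        info["torch"] = None
    return info


print(describe_environment())
```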

pvcastro · 2018-06-07T20:29:36Z

@ducalpha did you use the pytorch4.0 branch, or did you use the master?

ducalpha · 2018-06-07T20:43:30Z

I used the pytorch4.0 branch. The master branch yields a recursive stack exceeded error.

XuezheMax closed this as completed Jun 6, 2018
