Starting to train
Iter 0002000 : train nnl 1.387, valid error 075.040%, best valid error 100.000%, average gradient norm 0.269, rho_Whh 1.01, Omega 0.00, alpha 2.000, steps in the past 1.000
Iter 0004000 : train nnl 1.386, valid error 074.710%, best valid error 075.040%, average gradient norm 0.333, rho_Whh 1.01, Omega 0.00, alpha 2.000, steps in the past 1.000
Iter 0006000 : train nnl 1.386, valid error 074.900%, best valid error 074.710%, average gradient norm 0.331, rho_Whh 1.02, Omega 0.00, alpha 2.000, steps in the past 1.000
Iter 0008000 : train nnl 1.385, valid error 067.830%, best valid error 074.710%, average gradient norm 1.385, rho_Whh 1.10, Omega 0.00, alpha 2.000, steps in the past 1.000
Iter 0010000 : train nnl 1.325, valid error 054.170%, best valid error 067.830%, average gradient norm 15.831, rho_Whh 1.08, Omega 0.01, alpha 2.000, steps in the past 1.000
Iter 0012000 : train nnl 1.296, valid error 071.810%, best valid error 054.170%, average gradient norm 20.991, rho_Whh 1.13, Omega 0.01, alpha 2.000, steps in the past 1.000
Iter 0014000 : train nnl 1.323, valid error 075.220%, best valid error 054.170%, average gradient norm 22.791, rho_Whh 1.16, Omega 0.01, alpha 2.000, steps in the past 1.000
Iter 0016000 : train nnl 1.279, valid error 067.610%, best valid error 054.170%, average gradient norm 16.508, rho_Whh 1.13, Omega 0.01, alpha 2.000, steps in the past 1.000
Iter 0018000 : train nnl 1.316, valid error 053.800%, best valid error 054.170%, average gradient norm 20.286, rho_Whh 1.15, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0020000 : train nnl 1.281, valid error 071.660%, best valid error 053.800%, average gradient norm 34.118, rho_Whh 1.11, Omega 0.01, alpha 2.000, steps in the past 1.000
Iter 0022000 : train nnl 1.273, valid error 066.620%, best valid error 053.800%, average gradient norm 35.369, rho_Whh 1.21, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0024000 : train nnl 1.121, valid error 056.310%, best valid error 053.800%, average gradient norm 46.847, rho_Whh 1.17, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0026000 : train nnl 1.001, valid error 050.510%, best valid error 053.800%, average gradient norm 112.692, rho_Whh 1.12, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0028000 : train nnl 1.321, valid error 075.680%, best valid error 050.510%, average gradient norm 37.333, rho_Whh 1.16, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0030000 : train nnl 1.392, valid error 074.860%, best valid error 050.510%, average gradient norm 2.807, rho_Whh 1.35, Omega 0.01, alpha 2.000, steps in the past 1.000
Iter 0032000 : train nnl 1.397, valid error 073.710%, best valid error 050.510%, average gradient norm 3.406, rho_Whh 1.23, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0034000 : train nnl 1.394, valid error 074.300%, best valid error 050.510%, average gradient norm 9.439, rho_Whh 1.69, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0036000 : train nnl 1.391, valid error 075.040%, best valid error 050.510%, average gradient norm 1.788, rho_Whh 2.31, Omega 0.05, alpha 2.000, steps in the past 0.997
Iter 0038000 : train nnl 1.391, valid error 075.160%, best valid error 050.510%, average gradient norm 2.445, rho_Whh 2.12, Omega 0.06, alpha 2.000, steps in the past 0.999
Iter 0040000 : train nnl 1.389, valid error 075.330%, best valid error 050.510%, average gradient norm 1.979, rho_Whh 2.11, Omega 0.05, alpha 2.000, steps in the past 0.998
Iter 0042000 : train nnl 1.390, valid error 075.100%, best valid error 050.510%, average gradient norm 2.779, rho_Whh 2.26, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0044000 : train nnl 1.394, valid error 075.090%, best valid error 050.510%, average gradient norm 4.129, rho_Whh 2.31, Omega 0.05, alpha 2.000, steps in the past 0.992
Iter 0046000 : train nnl 1.389, valid error 074.640%, best valid error 050.510%, average gradient norm 1.521, rho_Whh 2.30, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0048000 : train nnl 1.389, valid error 075.870%, best valid error 050.510%, average gradient norm 1.264, rho_Whh 2.26, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0050000 : train nnl 1.389, valid error 074.940%, best valid error 050.510%, average gradient norm 1.292, rho_Whh 2.26, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0052000 : train nnl 1.389, valid error 074.960%, best valid error 050.510%, average gradient norm 1.068, rho_Whh 2.16, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0054000 : train nnl 1.389, valid error 075.180%, best valid error 050.510%, average gradient norm 1.048, rho_Whh 2.10, Omega 0.06, alpha 2.000, steps in the past 0.999
Iter 0056000 : train nnl 1.389, valid error 074.750%, best valid error 050.510%, average gradient norm 1.508, rho_Whh 1.60, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0058000 : train nnl 1.388, valid error 075.680%, best valid error 050.510%, average gradient norm 1.170, rho_Whh 1.52, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0060000 : train nnl 1.389, valid error 074.640%, best valid error 050.510%, average gradient norm 1.145, rho_Whh 1.46, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0062000 : train nnl 1.388, valid error 075.670%, best valid error 050.510%, average gradient norm 1.128, rho_Whh 1.33, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0064000 : train nnl 1.390, valid error 075.010%, best valid error 050.510%, average gradient norm 3.856, rho_Whh 1.34, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0066000 : train nnl 1.388, valid error 075.520%, best valid error 050.510%, average gradient norm 1.655, rho_Whh 1.42, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0068000 : train nnl 1.389, valid error 073.660%, best valid error 050.510%, average gradient norm 2.965, rho_Whh 1.32, Omega 0.06, alpha 2.000, steps in the past 1.000
Iter 0070000 : train nnl 1.388, valid error 075.350%, best valid error 050.510%, average gradient norm 1.388, rho_Whh 1.27, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0072000 : train nnl 1.388, valid error 075.080%, best valid error 050.510%, average gradient norm 1.227, rho_Whh 1.29, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0074000 : train nnl 1.388, valid error 075.140%, best valid error 050.510%, average gradient norm 1.108, rho_Whh 1.19, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0076000 : train nnl 1.387, valid error 074.100%, best valid error 050.510%, average gradient norm 1.149, rho_Whh 1.11, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0078000 : train nnl 1.387, valid error 074.710%, best valid error 050.510%, average gradient norm 1.039, rho_Whh 1.13, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0080000 : train nnl 1.387, valid error 073.270%, best valid error 050.510%, average gradient norm 1.241, rho_Whh 1.12, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0082000 : train nnl 1.293, valid error 068.330%, best valid error 050.510%, average gradient norm 25.847, rho_Whh 1.11, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0084000 : train nnl 1.327, valid error 060.680%, best valid error 050.510%, average gradient norm 11.976, rho_Whh 1.10, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0086000 : train nnl 1.387, valid error 075.360%, best valid error 050.510%, average gradient norm 1.243, rho_Whh 1.12, Omega 0.01, alpha 2.000, steps in the past 1.000
Iter 0088000 : train nnl 1.316, valid error 053.630%, best valid error 050.510%, average gradient norm 13.591, rho_Whh 1.19, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0090000 : train nnl 1.354, valid error 074.580%, best valid error 050.510%, average gradient norm 11.207, rho_Whh 1.27, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0092000 : train nnl 1.389, valid error 074.650%, best valid error 050.510%, average gradient norm 1.563, rho_Whh 1.30, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0094000 : train nnl 1.389, valid error 074.380%, best valid error 050.510%, average gradient norm 1.431, rho_Whh 1.50, Omega 0.03, alpha 2.000, steps in the past 0.999
Iter 0096000 : train nnl 1.389, valid error 075.320%, best valid error 050.510%, average gradient norm 3.632, rho_Whh 1.21, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0098000 : train nnl 1.387, valid error 072.080%, best valid error 050.510%, average gradient norm 1.734, rho_Whh 1.11, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0100000 : train nnl 1.388, valid error 074.930%, best valid error 050.510%, average gradient norm 1.589, rho_Whh 1.10, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0102000 : train nnl 1.365, valid error 051.330%, best valid error 050.510%, average gradient norm 6.096, rho_Whh 1.17, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0104000 : train nnl 1.246, valid error 051.420%, best valid error 050.510%, average gradient norm 62.306, rho_Whh 1.25, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0106000 : train nnl 1.111, valid error 038.540%, best valid error 050.510%, average gradient norm 97.047, rho_Whh 1.27, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0108000 : train nnl 1.331, valid error 074.640%, best valid error 038.540%, average gradient norm 21.407, rho_Whh 1.21, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0110000 : train nnl 1.319, valid error 049.410%, best valid error 038.540%, average gradient norm 27.153, rho_Whh 1.16, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0112000 : train nnl 1.321, valid error 077.940%, best valid error 038.540%, average gradient norm 46.728, rho_Whh 1.28, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0114000 : train nnl 1.410, valid error 075.070%, best valid error 038.540%, average gradient norm 7.069, rho_Whh 1.42, Omega 0.02, alpha 2.000, steps in the past 0.999
Iter 0116000 : train nnl 1.389, valid error 075.590%, best valid error 038.540%, average gradient norm 1.325, rho_Whh 1.47, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0118000 : train nnl 1.389, valid error 074.670%, best valid error 038.540%, average gradient norm 1.278, rho_Whh 1.54, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0120000 : train nnl 1.390, valid error 075.230%, best valid error 038.540%, average gradient norm 1.174, rho_Whh 1.62, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0122000 : train nnl 1.390, valid error 075.090%, best valid error 038.540%, average gradient norm 1.335, rho_Whh 1.72, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0124000 : train nnl 1.390, valid error 075.040%, best valid error 038.540%, average gradient norm 1.431, rho_Whh 1.78, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0126000 : train nnl 1.390, valid error 075.060%, best valid error 038.540%, average gradient norm 1.322, rho_Whh 1.82, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0128000 : train nnl 1.389, valid error 074.600%, best valid error 038.540%, average gradient norm 1.289, rho_Whh 1.85, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0130000 : train nnl 1.389, valid error 075.380%, best valid error 038.540%, average gradient norm 1.996, rho_Whh 1.87, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0132000 : train nnl 1.391, valid error 075.040%, best valid error 038.540%, average gradient norm 1.747, rho_Whh 1.92, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0134000 : train nnl 1.389, valid error 075.300%, best valid error 038.540%, average gradient norm 1.155, rho_Whh 1.92, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0136000 : train nnl 1.389, valid error 074.400%, best valid error 038.540%, average gradient norm 1.471, rho_Whh 1.92, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0138000 : train nnl 1.389, valid error 074.260%, best valid error 038.540%, average gradient norm 1.437, rho_Whh 1.87, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0140000 : train nnl 1.389, valid error 074.730%, best valid error 038.540%, average gradient norm 1.225, rho_Whh 1.90, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0142000 : train nnl 1.389, valid error 075.230%, best valid error 038.540%, average gradient norm 1.171, rho_Whh 1.89, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0144000 : train nnl 1.393, valid error 075.130%, best valid error 038.540%, average gradient norm 1.785, rho_Whh 1.82, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0146000 : train nnl 1.389, valid error 075.030%, best valid error 038.540%, average gradient norm 1.461, rho_Whh 1.85, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0148000 : train nnl 1.389, valid error 075.160%, best valid error 038.540%, average gradient norm 1.266, rho_Whh 1.84, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0150000 : train nnl 1.389, valid error 076.050%, best valid error 038.540%, average gradient norm 1.736, rho_Whh 1.65, Omega 0.10, alpha 2.000, steps in the past 0.990
Iter 0152000 : train nnl 1.400, valid error 075.000%, best valid error 038.540%, average gradient norm 33366.897, rho_Whh 1.63, Omega 0.22, alpha 2.000, steps in the past 0.998
Iter 0154000 : train nnl 1.389, valid error 075.240%, best valid error 038.540%, average gradient norm 1.427, rho_Whh 1.56, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0156000 : train nnl 1.389, valid error 075.070%, best valid error 038.540%, average gradient norm 1.458, rho_Whh 1.36, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0158000 : train nnl 1.389, valid error 075.320%, best valid error 038.540%, average gradient norm 3.602, rho_Whh 1.57, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0160000 : train nnl 1.389, valid error 074.440%, best valid error 038.540%, average gradient norm 1.238, rho_Whh 1.55, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0162000 : train nnl 1.390, valid error 074.440%, best valid error 038.540%, average gradient norm 2.392, rho_Whh 1.60, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0164000 : train nnl 1.389, valid error 075.240%, best valid error 038.540%, average gradient norm 1.246, rho_Whh 1.61, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0166000 : train nnl 1.392, valid error 075.680%, best valid error 038.540%, average gradient norm 55.340, rho_Whh 1.69, Omega 0.06, alpha 2.000, steps in the past 1.000
Iter 0168000 : train nnl 1.389, valid error 074.780%, best valid error 038.540%, average gradient norm 1.117, rho_Whh 1.71, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0170000 : train nnl 1.389, valid error 074.870%, best valid error 038.540%, average gradient norm 15.982, rho_Whh 1.65, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0172000 : train nnl 1.389, valid error 075.170%, best valid error 038.540%, average gradient norm 1.236, rho_Whh 1.65, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0174000 : train nnl 1.389, valid error 076.200%, best valid error 038.540%, average gradient norm 1.137, rho_Whh 1.61, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0176000 : train nnl 1.389, valid error 074.750%, best valid error 038.540%, average gradient norm 1.327, rho_Whh 1.63, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0178000 : train nnl 1.389, valid error 073.480%, best valid error 038.540%, average gradient norm 1.317, rho_Whh 1.57, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0180000 : train nnl 1.389, valid error 074.930%, best valid error 038.540%, average gradient norm 1.121, rho_Whh 1.61, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0182000 : train nnl 1.390, valid error 074.660%, best valid error 038.540%, average gradient norm 3.057, rho_Whh 1.53, Omega 0.08, alpha 2.000, steps in the past 1.000
Iter 0184000 : train nnl 1.389, valid error 075.290%, best valid error 038.540%, average gradient norm 1.109, rho_Whh 1.51, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0186000 : train nnl 1.389, valid error 074.970%, best valid error 038.540%, average gradient norm 1.107, rho_Whh 1.49, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0188000 : train nnl 1.389, valid error 075.560%, best valid error 038.540%, average gradient norm 1.112, rho_Whh 1.46, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0190000 : train nnl 1.389, valid error 075.110%, best valid error 038.540%, average gradient norm 1.175, rho_Whh 1.52, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0192000 : train nnl 1.389, valid error 074.770%, best valid error 038.540%, average gradient norm 1.088, rho_Whh 1.51, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0194000 : train nnl 1.389, valid error 075.090%, best valid error 038.540%, average gradient norm 1.885, rho_Whh 1.48, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0196000 : train nnl 1.389, valid error 074.850%, best valid error 038.540%, average gradient norm 1.124, rho_Whh 1.55, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0198000 : train nnl 1.389, valid error 074.470%, best valid error 038.540%, average gradient norm 1.171, rho_Whh 1.56, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0200000 : train nnl 1.389, valid error 074.830%, best valid error 038.540%, average gradient norm 11.755, rho_Whh 1.52, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0202000 : train nnl 1.389, valid error 074.950%, best valid error 038.540%, average gradient norm 1.076, rho_Whh 1.44, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0204000 : train nnl 1.393, valid error 074.360%, best valid error 038.540%, average gradient norm 17432.660, rho_Whh 1.32, Omega 0.10, alpha 2.000, steps in the past 1.000
Iter 0206000 : train nnl 1.397, valid error 074.900%, best valid error 038.540%, average gradient norm 177.243, rho_Whh 1.33, Omega 0.14, alpha 2.000, steps in the past 1.000
Iter 0208000 : train nnl 1.389, valid error 075.070%, best valid error 038.540%, average gradient norm 4.297, rho_Whh 1.40, Omega 0.14, alpha 2.000, steps in the past 0.999
Iter 0210000 : train nnl 1.389, valid error 074.420%, best valid error 038.540%, average gradient norm 5.435, rho_Whh 1.30, Omega 0.12, alpha 2.000, steps in the past 1.000
Iter 0212000 : train nnl 1.389, valid error 074.940%, best valid error 038.540%, average gradient norm 2.226, rho_Whh 1.24, Omega 0.09, alpha 2.000, steps in the past 1.000
Iter 0214000 : train nnl 1.388, valid error 075.300%, best valid error 038.540%, average gradient norm 1.643, rho_Whh 1.32, Omega 0.06, alpha 2.000, steps in the past 1.000
Iter 0216000 : train nnl 1.388, valid error 075.100%, best valid error 038.540%, average gradient norm 2.605, rho_Whh 1.41, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0218000 : train nnl 1.388, valid error 074.770%, best valid error 038.540%, average gradient norm 1.559, rho_Whh 1.40, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0220000 : train nnl 1.389, valid error 074.150%, best valid error 038.540%, average gradient norm 1.132, rho_Whh 1.44, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0222000 : train nnl 1.388, valid error 074.960%, best valid error 038.540%, average gradient norm 1.305, rho_Whh 1.44, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0224000 : train nnl 1.388, valid error 075.300%, best valid error 038.540%, average gradient norm 1.200, rho_Whh 1.42, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0226000 : train nnl 1.389, valid error 075.230%, best valid error 038.540%, average gradient norm 1.420, rho_Whh 1.42, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0228000 : train nnl 1.389, valid error 074.380%, best valid error 038.540%, average gradient norm 2.004, rho_Whh 1.49, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0230000 : train nnl 1.388, valid error 075.210%, best valid error 038.540%, average gradient norm 1.379, rho_Whh 1.41, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0232000 : train nnl 1.388, valid error 075.400%, best valid error 038.540%, average gradient norm 1.215, rho_Whh 1.42, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0234000 : train nnl 1.388, valid error 074.940%, best valid error 038.540%, average gradient norm 1.486, rho_Whh 1.36, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0236000 : train nnl 1.388, valid error 074.480%, best valid error 038.540%, average gradient norm 1.564, rho_Whh 1.45, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0238000 : train nnl 1.388, valid error 075.420%, best valid error 038.540%, average gradient norm 1.262, rho_Whh 1.54, Omega 0.02, alpha 2.000, steps in the past 1.000
Iter 0240000 : train nnl 1.388, valid error 074.940%, best valid error 038.540%, average gradient norm 12.842, rho_Whh 1.49, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0242000 : train nnl 1.347, valid error 050.290%, best valid error 038.540%, average gradient norm 15.670, rho_Whh 1.44, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0244000 : train nnl 1.371, valid error 075.770%, best valid error 038.540%, average gradient norm 1832.496, rho_Whh 1.42, Omega 0.11, alpha 2.000, steps in the past 0.999
Iter 0246000 : train nnl 1.392, valid error 075.430%, best valid error 038.540%, average gradient norm 295.393, rho_Whh 1.44, Omega 0.09, alpha 2.000, steps in the past 0.995
Iter 0248000 : train nnl 1.380, valid error 075.750%, best valid error 038.540%, average gradient norm 43.067, rho_Whh 1.54, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0250000 : train nnl 1.368, valid error 051.010%, best valid error 038.540%, average gradient norm 34.774, rho_Whh 1.38, Omega 0.06, alpha 2.000, steps in the past 1.000
Iter 0252000 : train nnl 1.171, valid error 048.320%, best valid error 038.540%, average gradient norm 139.939, rho_Whh 1.40, Omega 0.06, alpha 2.000, steps in the past 1.000
Iter 0254000 : train nnl 1.399, valid error 075.360%, best valid error 038.540%, average gradient norm 115.323, rho_Whh 1.43, Omega 0.10, alpha 2.000, steps in the past 1.000
Iter 0256000 : train nnl 1.394, valid error 075.530%, best valid error 038.540%, average gradient norm 12.264, rho_Whh 1.27, Omega 0.09, alpha 2.000, steps in the past 1.000
Iter 0258000 : train nnl 1.390, valid error 075.270%, best valid error 038.540%, average gradient norm 8.261, rho_Whh 1.20, Omega 0.07, alpha 2.000, steps in the past 1.000
Iter 0260000 : train nnl 1.356, valid error 049.500%, best valid error 038.540%, average gradient norm 7.125, rho_Whh 1.27, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0262000 : train nnl 0.941, valid error 025.100%, best valid error 038.540%, average gradient norm 101.063, rho_Whh 1.43, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0264000 : train nnl 0.760, valid error 047.860%, best valid error 025.100%, average gradient norm 167.093, rho_Whh 1.43, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0266000 : train nnl 1.061, valid error 024.570%, best valid error 025.100%, average gradient norm 173.773, rho_Whh 1.32, Omega 0.06, alpha 2.000, steps in the past 1.000
Iter 0268000 : train nnl 0.912, valid error 075.720%, best valid error 024.570%, average gradient norm 132.310, rho_Whh 1.29, Omega 0.06, alpha 2.000, steps in the past 1.000
Iter 0270000 : train nnl 1.396, valid error 073.410%, best valid error 024.570%, average gradient norm 11.085, rho_Whh 1.21, Omega 0.07, alpha 2.000, steps in the past 1.000
Iter 0272000 : train nnl 0.814, valid error 025.060%, best valid error 024.570%, average gradient norm 94.562, rho_Whh 1.24, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0274000 : train nnl 0.890, valid error 011.740%, best valid error 024.570%, average gradient norm 109.446, rho_Whh 1.23, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0276000 : train nnl 0.995, valid error 028.280%, best valid error 011.740%, average gradient norm 137.196, rho_Whh 1.26, Omega 0.05, alpha 2.000, steps in the past 1.000
Iter 0278000 : train nnl 0.750, valid error 025.480%, best valid error 011.740%, average gradient norm 92.293, rho_Whh 1.21, Omega 0.06, alpha 2.000, steps in the past 1.000
Iter 0280000 : train nnl 0.565, valid error 000.500%, best valid error 011.740%, average gradient norm 92.235, rho_Whh 1.25, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0282000 : train nnl 0.684, valid error 022.660%, best valid error 000.500%, average gradient norm 241.501, rho_Whh 1.26, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0284000 : train nnl 0.446, valid error 002.460%, best valid error 000.500%, average gradient norm 60.607, rho_Whh 1.24, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0286000 : train nnl 1.399, valid error 078.460%, best valid error 000.500%, average gradient norm 17.738, rho_Whh 1.15, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0288000 : train nnl 0.722, valid error 007.090%, best valid error 000.500%, average gradient norm 135.009, rho_Whh 1.24, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0290000 : train nnl 0.745, valid error 074.130%, best valid error 000.500%, average gradient norm 184.128, rho_Whh 1.25, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0292000 : train nnl 0.662, valid error 001.500%, best valid error 000.500%, average gradient norm 121.419, rho_Whh 1.22, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0294000 : train nnl 0.678, valid error 001.390%, best valid error 000.500%, average gradient norm 152.395, rho_Whh 1.23, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0296000 : train nnl 0.756, valid error 000.440%, best valid error 000.500%, average gradient norm 85.208, rho_Whh 1.23, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0298000 : train nnl 0.615, valid error 052.310%, best valid error 000.440%, average gradient norm 128.243, rho_Whh 1.18, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0300000 : train nnl 0.932, valid error 002.580%, best valid error 000.440%, average gradient norm 134.267, rho_Whh 1.17, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0302000 : train nnl 1.119, valid error 029.010%, best valid error 000.440%, average gradient norm 78.948, rho_Whh 1.24, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0304000 : train nnl 0.621, valid error 008.110%, best valid error 000.440%, average gradient norm 66.588, rho_Whh 1.23, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0306000 : train nnl 0.628, valid error 011.950%, best valid error 000.440%, average gradient norm 129.213, rho_Whh 1.27, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0308000 : train nnl 0.733, valid error 039.140%, best valid error 000.440%, average gradient norm 203.652, rho_Whh 1.25, Omega 0.04, alpha 2.000, steps in the past 1.000
Iter 0310000 : train nnl 0.592, valid error 005.560%, best valid error 000.440%, average gradient norm 187.311, rho_Whh 1.25, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0312000 : train nnl 0.563, valid error 007.380%, best valid error 000.440%, average gradient norm 276.705, rho_Whh 1.27, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0314000 : train nnl 0.357, valid error 009.390%, best valid error 000.440%, average gradient norm 143.366, rho_Whh 1.27, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0316000 : train nnl 0.550, valid error 011.060%, best valid error 000.440%, average gradient norm 197.828, rho_Whh 1.29, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0318000 : train nnl 0.389, valid error 004.890%, best valid error 000.440%, average gradient norm 186.710, rho_Whh 1.27, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0320000 : train nnl 0.741, valid error 018.470%, best valid error 000.440%, average gradient norm 201.806, rho_Whh 1.26, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0322000 : train nnl 0.785, valid error 016.100%, best valid error 000.440%, average gradient norm 263.493, rho_Whh 1.26, Omega 0.03, alpha 2.000, steps in the past 0.999
Iter 0324000 : train nnl 0.749, valid error 005.420%, best valid error 000.440%, average gradient norm 233.453, rho_Whh 1.26, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0326000 : train nnl 0.609, valid error 000.690%, best valid error 000.440%, average gradient norm 203.653, rho_Whh 1.24, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0328000 : train nnl 0.634, valid error 001.310%, best valid error 000.440%, average gradient norm 272.219, rho_Whh 1.26, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0330000 : train nnl 0.508, valid error 002.960%, best valid error 000.440%, average gradient norm 182.485, rho_Whh 1.27, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0332000 : train nnl 0.501, valid error 007.780%, best valid error 000.440%, average gradient norm 125.004, rho_Whh 1.26, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0334000 : train nnl 1.226, valid error 072.480%, best valid error 000.440%, average gradient norm 109.744, rho_Whh 1.22, Omega 0.03, alpha 2.000, steps in the past 1.000
Iter 0336000 : train nnl 1.033, valid error 000.000%, best valid error 000.440%, average gradient norm 776.123, rho_Whh 1.24, Omega 0.03, alpha 2.000, steps in the past 1.000
**> Iter 0336000 : train nnl 1.033 valid error 000.000% best valid error 000.000% average gradient norm 776.123 rho_Whh 1.24 Omega 0.03 alpha 2.000 steps in the past 1.000
!!!!! STOPING - Problem solved