Weekly update #1

Merged
merged 25 commits into from
Oct 1, 2018
25 commits
73fd3ea  Added a semi-colon to the example code. (mayhewsw, Aug 22, 2018)
2f71141  Added check in View.addConstituent that doesn't allow duplicate const… (mayhewsw, Aug 22, 2018)
56fef85  Configuration parameters, gazetteers and brown clusters are no long s… (Sep 2, 2018)
1e896a6  Next version. (Sep 5, 2018)
653e4ff  Added a modicum of error handling to bits of code around what I touched. (Sep 5, 2018)
7d9dad3  Merge pull request #692 from cowchipkid/master (mssammon, Sep 5, 2018)
d667377  Fixed several bugs in OntonotesReader. Allows ignoring begin document… (mayhewsw, Sep 17, 2018)
2e2c96a  Don't allow empty view. (mayhewsw, Sep 17, 2018)
398753b  Fixed some bugs in reading, including a bug that does not clear sente… (mayhewsw, Sep 20, 2018)
f6b9147  There was a duplicate nchunk in this file as in <nchunk><nchunk>...</… (mayhewsw, Sep 20, 2018)
76733a0  Modified the test file to match the updated data file. (mayhewsw, Sep 20, 2018)
9b6618e  Major changes related to adding duplicate constituents. Added also a … (mayhewsw, Sep 20, 2018)
0df7f46  Merge remote-tracking branch 'upstream/master' (mayhewsw, Sep 20, 2018)
db14613  fix train/test scripts for NER (mayhewsw, Sep 20, 2018)
b8fcd80  Fix code that is repeated. (mayhewsw, Sep 20, 2018)
56cb0c1  readme update for memory usage of Verb SRL in pipeline (close #656) (qiangning, Sep 24, 2018)
b934e9e  fix #685: when retraining chunker, should forget first; otherwise, it… (qiangning, Sep 24, 2018)
a9449cd  Chunker model retrained (v4.0.12) with improved performance (qiangning, Sep 24, 2018)
bfdafde  Merge branch 'master' of github.com:qiangning/illinois-cogcomp-nlp (qiangning, Sep 24, 2018)
787e100  testRefOut.txt updated after Chunker model retrained (v4.0.12) (qiangning, Sep 24, 2018)
46a2150  removed redundant and misleading test files (qiangning, Sep 25, 2018)
c2b032c  removed redundant and misleading test files (qiangning, Sep 25, 2018)
daf593d  Merge pull request #684 from mayhewsw/master (Sep 25, 2018)
ee8332f  move output and diff files generated during testing out of the test r… (qiangning, Sep 27, 2018)
738c990  Merge pull request #694 from qiangning/master (Sep 27, 2018)
4 changes: 2 additions & 2 deletions big-data-utils/pom.xml
@@ -3,7 +3,7 @@
 <parent>
     <artifactId>illinois-cogcomp-nlp</artifactId>
     <groupId>edu.illinois.cs.cogcomp</groupId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
 </parent>

 <modelVersion>4.0.0</modelVersion>
@@ -23,7 +23,7 @@
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
     <artifactId>illinois-core-utilities</artifactId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
 </dependency>
 <dependency>
     <groupId>org.xeustechnologies.google-api</groupId>
54 changes: 26 additions & 28 deletions chunker/doc/performance.txt
@@ -1,9 +1,9 @@
-Date: 10/20/2016
+Date: 09/23/2018
 Tested: Qiang (John) Ning
 Contact: qning2@illinois.edu

-Chunker model version: illinois-chunker-model-3.0.77
-Trainset: /shared/corpora/corporaWeb/written/eng/chunking/conll2000distributions/train.txt (trained with 50 iterations.)
+Chunker model version: illinois-chunker-model-4.0.12
+Trainset: /shared/corpora/corporaWeb/written/eng/chunking/conll2000distributions/train.txt (trained with 11 iterations.)
 Testset:
 Gold POS: /shared/corpora/corporaWeb/written/eng/chunking/conll2000distributions/test.txt
 No POS: /shared/corpora/corporaWeb/written/eng/chunking/conll2000distributions/test.noPOS.txt
@@ -12,39 +12,37 @@ Performance:
 With Gold POS
 Label    Precision   Recall       F1  LCount  PCount
 ----------------------------------------------
-ADJP        76.633   69.635   72.967     438     398
-ADVP        81.862   79.215   80.516     866     838
-CONJP       45.455   55.556   50.000       9      11
-INTJ        50.000   50.000   50.000       2       2
+ADJP        78.000   71.233   74.463     438     400
+ADVP        82.262   79.792   81.008     866     840
+CONJP       50.000   55.556   52.632       9      10
+INTJ       100.000   50.000   66.667       2       1
 LST          0.000    0.000    0.000       5       1
-NP          94.106   93.962   94.034   12422   12403
-PP          96.770   97.776   97.270    4811    4861
-PRT         72.072   75.472   73.733     106     111
-SBAR        88.280   87.290   87.782     535     529
-UCP          0.000    0.000    0.000       0       5
-VP          93.416   93.517   93.466    4658    4663
+NP          94.051   94.051   94.051   12422   12422
+PP          96.694   97.880   97.283    4811    4870
+PRT         73.394   75.472   74.419     106     109
+SBAR        87.902   86.916   87.406     535     529
+VP          93.845   93.946   93.896    4658    4663
 ----------------------------------------------
-O            0.000    0.000    0.000    1244    1274
+O            0.000    0.000    0.000    1214    1221
 ----------------------------------------------
-Overall     93.510   93.393   93.451   23852   23822
-Accuracy    88.763        -        -       -   25096
+Overall     93.613   93.585   93.599   23852   23845
+Accuracy    89.053        -        -       -   25066

 With NO POS
 Label    Precision   Recall       F1  LCount  PCount
 ----------------------------------------------
-ADJP        78.608   69.635   73.850     438     388
-ADVP        80.427   78.291   79.345     866     843
-CONJP       45.455   55.556   50.000       9      11
+ADJP        80.051   72.374   76.019     438     396
+ADVP        80.806   78.753   79.766     866     844
+CONJP       50.000   55.556   52.632       9      10
 INTJ       100.000   50.000   66.667       2       1
 LST          0.000    0.000    0.000       5       0
-NP          94.193   94.019   94.106   12422   12399
-PP          96.656   97.942   97.295    4811    4875
-PRT         60.417   82.075   69.600     106     144
-SBAR        86.813   88.598   87.697     535     546
-UCP          0.000    0.000    0.000       0       4
-VP          94.105   94.246   94.176    4658    4665
+NP          94.224   94.156   94.190   12422   12413
+PP          96.540   98.005   97.267    4811    4884
+PRT         64.444   82.075   72.199     106     135
+SBAR        86.900   88.037   87.465     535     542
+VP          94.427   94.568   94.497    4658    4665
 ----------------------------------------------
-O            0.000    0.000    0.000    1231    1207
+O            0.000    0.000    0.000    1199    1161
 ----------------------------------------------
-Overall     93.529   93.623   93.576   23852   23876
-Accuracy    89.028        -        -       -   25083
+Overall     93.675   93.824   93.750   23852   23890
+Accuracy    89.334        -        -       -   25051
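As a sanity check on the table above, the F1 column can be recomputed from Precision and Recall, and Recall can be recovered from Precision together with the predicted (PCount) and labeled (LCount) counts. The sketch below is only an illustration: the class and method names are hypothetical and not part of the chunker code base, and it assumes F1 is the usual harmonic mean of the two percentages.

```java
public class ChunkF1 {
    /** Harmonic mean of precision and recall, both given in percent. */
    public static double f1(double precision, double recall) {
        if (precision + recall == 0.0) {
            return 0.0; // avoids 0/0 for rows such as LST, reported as 0.000
        }
        return 2.0 * precision * recall / (precision + recall);
    }

    /** Recall in percent implied by a precision, a predicted count and a labeled count. */
    public static double recallFromCounts(double precision, int predictedCount, int labeledCount) {
        double correct = precision / 100.0 * predictedCount; // number of correct predictions
        return 100.0 * correct / labeledCount;
    }

    public static void main(String[] args) {
        // ADJP row of the retrained model, "With Gold POS": P=78.000, PCount=400, LCount=438
        System.out.printf("Recall = %.3f%n", recallFromCounts(78.000, 400, 438));
        System.out.printf("F1     = %.3f%n", f1(78.000, 71.233));
    }
}
```

Running it on the ADJP row should agree with the Recall and F1 values reported for that row to three decimals.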
10 changes: 5 additions & 5 deletions chunker/pom.xml
@@ -2,7 +2,7 @@
 <parent>
     <artifactId>illinois-cogcomp-nlp</artifactId>
     <groupId>edu.illinois.cs.cogcomp</groupId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
 </parent>

 <modelVersion>4.0.0</modelVersion>
@@ -13,7 +13,7 @@
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
     <artifactId>illinois-core-utilities</artifactId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
 </dependency>

 <dependency>
@@ -24,12 +24,12 @@
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
     <artifactId>LBJava-NLP-tools</artifactId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
 </dependency>
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
     <artifactId>illinois-pos</artifactId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
 </dependency>
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
@@ -39,7 +39,7 @@
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
     <artifactId>illinois-chunker-model</artifactId>
-    <version>3.0.77</version>
+    <version>4.0.12</version>
 </dependency>
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
4 changes: 2 additions & 2 deletions chunker/scripts/mvn_demo.sh
@@ -1,6 +1,6 @@
 #!/usr/bin/env bash
-TESTFILE=test/testIn.txt
-OUTFILE=test/testOut.txt
+TESTFILE=src/test/resources/testIn.txt
+OUTFILE=testOut.txt

 mvn exec:java -Dexec.mainClass=edu.illinois.cs.cogcomp.chunker.main.ChunkerDemo -Dexec.args="$TESTFILE $OUTFILE"
2 changes: 1 addition & 1 deletion chunker/scripts/mvn_test_conll.sh
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-TESTFILE=test/testCoNLL.txt
+TESTFILE=src/test/resources/testCoNLL.txt

 # Use the default chunker model
 if [ $# -eq 0 ]; then
12 changes: 7 additions & 5 deletions chunker/scripts/mvn_validate.sh
@@ -1,19 +1,21 @@
-TESTFILE=test/testIn.txt
-OUTFILE=test/testOut.txt
-REFFILE=test/testRefOut.txt
+#!/usr/bin/env bash
+TESTFILE=src/test/resources/testIn.txt
+OUTFILE=testOut.txt
+REFFILE=src/test/resources/testRefOut-demo.txt

 mvn exec:java -Dexec.mainClass=edu.illinois.cs.cogcomp.chunker.main.ChunkerDemo -Dexec.args="$TESTFILE $OUTFILE"

-DIFFFILE=test/testDiff.txt
+DIFFFILE=testDiff.txt
 rm -f ${DIFFFILE}
 diff $REFFILE $OUTFILE > $DIFFFILE

 if [ -e ${DIFFFILE} ]; then
     if [ -s ${DIFFFILE} ]; then
-        echo "$0: *** TEST FAILED ***: Differences found between new output and reference output. See $DIFFFILE for details."
+        echo "$0: *** TEST FAILED ***: Differences found between new output and reference output. See $OUTFILE and $DIFFFILE for details."
     else
         echo "$0: Test passed: no difference between new output and reference output."
         rm -f $DIFFFILE
+        rm -f $OUTFILE
     fi
 else
     echo "$0: Error: couldn't find the diff file '$DIFFFILE'."
@@ -79,7 +79,7 @@ public void trainModels(String trainingData, String modeldir, String modelname,
      */
     public void trainModelsWithParser(Parser parser) {
         Chunker.isTraining = true;
-
+        chunker.forget();
         // Run the learner
         for (int i = 1; i <= iter; i++) {
             LinkedVector ex;
@@ -97,6 +97,7 @@ public void trainModelsWithParser(Parser parser) {

     public void trainModelsWithParser(Parser parser, String modeldir, String modelname, double dev_ratio) {
         Chunker.isTraining = true;
+        chunker.forget();
         double tmpF1 = 0;
         double bestF1 = 0;
         int bestIter = 0;
@@ -107,16 +108,11 @@ public void trainModelsWithParser(Parser parser, String modeldir, String modelna
         // Get the total number of training set
         int cnt = 0;
         LinkedVector ex;
-        while ((ex = (LinkedVector) parser.next()) != null) {
-            cnt++;
-        }
+        while (parser.next() != null) cnt++;
         parser.reset();
         // Get the boundary between train and dev
+        dev_ratio = Math.min(1,Math.max(dev_ratio,0));
         long idx = Math.round(cnt*(1-dev_ratio));
-        if( idx < 0 )
-            idx = 0;
-        if( idx > cnt )
-            idx = cnt;

         // Run the learner and save F1 for each iteration
         for (int i = 1; i <= iter; i++) {
@@ -125,10 +121,8 @@ public void trainModelsWithParser(Parser parser, String modeldir, String modelna
                 for (int j = 0; j < ex.size(); j++) {
                     chunker.learn(ex.get(j));
                 }
-                if(cnt>=idx)
-                    break;
-                else
-                    cnt++;
+                if(cnt>=idx) break;
+                cnt++;
             }
             chunker.doneWithRound();
             writeModelsToDisk(modeldir,modelname);
@@ -153,6 +147,7 @@ public void trainModelsWithParser(Parser parser, String modeldir, String modelna
         System.out.println("Best #Iter = "+bestIter+" (F1="+bestF1+")");
         System.out.println("Rerun the learner using best #Iter...");
         // Rerun the learner
+        chunker.forget();
         for (int i = 1; i <= bestIter; i++) {
             while ((ex = (LinkedVector) parser.next()) != null) {
                 for (int j = 0; j < ex.size(); j++) {
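The train/dev boundary computed in the trainer diff above can be looked at in isolation. The sketch below reuses the two expressions from the diff (`Math.min(1, Math.max(dev_ratio, 0))` and `Math.round(cnt*(1-dev_ratio))`); the class and method names are hypothetical. It suggests why the explicit `idx < 0` and `idx > cnt` guards could be dropped: once the ratio is clamped to [0, 1], the rounded boundary already lies in [0, cnt] for any non-negative count.

```java
public class DevSplit {
    /**
     * Index of the first held-out example when the trailing devRatio fraction
     * of cnt examples is reserved for development. Hypothetical helper that
     * mirrors the clamping and rounding shown in the diff.
     */
    public static long trainDevBoundary(int cnt, double devRatio) {
        // Clamp the ratio into [0, 1] first ...
        double clamped = Math.min(1, Math.max(devRatio, 0));
        // ... so cnt * (1 - clamped) is within [0, cnt] before rounding.
        return Math.round(cnt * (1 - clamped));
    }

    public static void main(String[] args) {
        System.out.println(trainDevBoundary(1000, 0.1));  // ratio inside [0, 1]
        System.out.println(trainDevBoundary(1000, -0.5)); // negative ratio: everything is training data
        System.out.println(trainDevBoundary(1000, 2.0));  // ratio above 1: everything is held out
    }
}
```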
@@ -37,7 +37,7 @@
 public class TestDiff {
     private static final String testFileName = "testIn.txt";
     private static String testFile;
-    private static final String refFileName = "testRefOutput.txt";
+    private static final String refFileName = "testRefOut.txt";
     private static List<String> refSentences;

     @Before
File renamed without changes.
3 changes: 0 additions & 3 deletions chunker/src/test/resources/testOut.txt

This file was deleted.

@@ -1,2 +1,2 @@
-[ADVP Arguably ] [NP both ] [VP were ] [PP on ] [NP notice ] [SBAR that ] [NP their behavior ] [VP was ] [ADVP at ] [ADJP least risky ] [NP Mr. Bush ] [VP had threatened ] [NP a veto ] [ADVP previously ] [NP The volatility ] [VP was ] [ADJP dizzying ] [PP for ] [NP traders ]
+[ADVP Arguably ] [NP both ] [VP were ] [PP on ] [NP notice ] [SBAR that ] [NP their behavior ] [VP was ] [ADVP at least ] [ADJP risky ] [NP Mr. Bush ] [VP had threatened ] [NP a veto ] [ADVP previously ] [NP The volatility ] [VP was ] [ADJP dizzying ] [PP for ] [NP traders ]
 (RB Arguably) (, ,) (DT both) (VBD were) (IN on) (NN notice) (IN that) (PRP$ their) (NN behavior) (VBD was) (IN at) (JJS least) (JJ risky) (. .) (NNP Mr.) (NNP Bush) (VBD had) (VBN threatened) (DT a) (NN veto) (RB previously) (. .) (DT The) (NN volatility) (VBD was) (JJ dizzying) (IN for) (NNS traders) (. .)
@@ -1,3 +1,3 @@
 [ADVP (RB Arguably) ] (, ,) [NP (DT both) ] [VP (VBD were) ] [PP (IN on) ] [NP (NN notice) ] [SBAR (IN that) ] [NP (PRP$ their) (NN behavior) ] [VP (VBD was) ] [ADVP (IN at) (JJS least) ] [ADJP (JJ risky) ] (. .)
-[NP (NNP Mr.) (NNP Bush) ] [VP (VBD had) (VBN threatened) ] [NP (DT a) (NN veto) ] [ADVP (RB previously) ] (. .)
-[NP (DT The) (NN volatility) ] [VP (VBD was) ] [ADJP (JJ dizzying) ] [PP (IN for) ] [NP (NNS traders) ] (. .)
+[NP (NNP Mr.) (NNP Bush) ] [VP (VBD had) (VBN threatened) ] [NP (DT a) (NN veto) ] [ADVP (RB previously) ] (. .)
+[NP (DT The) (NN volatility) ] [VP (VBD was) ] [ADJP (JJ dizzying) ] [PP (IN for) ] [NP (NNS traders) ] (. .)
3 changes: 0 additions & 3 deletions chunker/src/test/resources/testRefOutput.txt

This file was deleted.

3 changes: 0 additions & 3 deletions chunker/test/testIn.txt

This file was deleted.

2 changes: 0 additions & 2 deletions chunker/test/testRefOut.txt

This file was deleted.

20 changes: 10 additions & 10 deletions commasrl/pom.xml
@@ -4,7 +4,7 @@
 <parent>
     <artifactId>illinois-cogcomp-nlp</artifactId>
     <groupId>edu.illinois.cs.cogcomp</groupId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
 </parent>
 <modelVersion>4.0.0</modelVersion>

@@ -35,48 +35,48 @@
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
     <artifactId>illinois-core-utilities</artifactId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
     <optional>true</optional>
 </dependency>
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
     <artifactId>illinois-curator</artifactId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
 </dependency>
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
     <artifactId>illinois-tokenizer</artifactId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
 </dependency>
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
     <artifactId>illinois-corpusreaders</artifactId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
 </dependency>
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
     <artifactId>illinois-inference</artifactId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
 </dependency>
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
     <artifactId>stanford_3.3.1</artifactId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
 </dependency>
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
     <artifactId>illinois-pos</artifactId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
 </dependency>
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
     <artifactId>illinois-ner</artifactId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
 </dependency>
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
     <artifactId>illinois-chunker</artifactId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
 </dependency>
 <dependency>
     <groupId>edu.illinois.cs.cogcomp</groupId>
2 changes: 1 addition & 1 deletion core-utilities/README.md
@@ -112,7 +112,7 @@ of annotations over some text.
 are white-space tokenized.

 ```java
-String corpus = "2001_ODYSSEY"
+String corpus = "2001_ODYSSEY";
 String textId2 = "002";

 String[] sentence1 = {"Good", "afternoon", ",", "gentlemen", "."};
2 changes: 1 addition & 1 deletion core-utilities/pom.xml
@@ -6,7 +6,7 @@
 <parent>
     <artifactId>illinois-cogcomp-nlp</artifactId>
     <groupId>edu.illinois.cs.cogcomp</groupId>
-    <version>4.0.12</version>
+    <version>4.0.13</version>
 </parent>

 <artifactId>illinois-core-utilities</artifactId>