Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
__init__.py		__init__.py
brat2ner.py		brat2ner.py
ner2brat.py		ner2brat.py

README.md

Training a custom Named Entity Recogniser using Stanford CoreNLP

This document contains instructions to train Core NLP's CRFClassifier for NER.

References:

Create custom NER model for CoreNLP http://nlp.stanford.edu/software/crf-faq.shtml#a
Brat : http://brat.nlplab.org/
Py CoreNLP https://github.com/smilli/py-corenlp

Step 1. Download CoreNLP

Visit http://stanfordnlp.github.io/CoreNLP/download.html, download the zip. Extract the zip. Note: this runs on Java 8

Step 2: Install Python dependencies

Follow instructions in https://github.com/smilli/py-corenlp

pip install pycorenlp

Step 3: Start Core NLP server

Goto CoreNLP extracted directory and runs

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer

Step 4: Prepare input text and annotations

List all the .txt files and .ann files to a path list file

$ ls $PWD/*.ann | sed -e 's/\(.*\)\.ann/\1.txt,\1.ann/g' > input.list

Step 5: Separate the training and testing records

Split the input.list for training and testing data. Lets call them as train.list and test.list

Step 6: Convert brat to corenlp NER format

python3 brat2ner.py --in train.list --out train.tsv
python3 brat2ner.py --in test.list --out test.tsv

Step 6: Train and create a NER Model

Create a properties file lpsc.prop

#location of the training file
trainFile = train.tsv
#location where you would like to save (serialize to) your
#classifier; adding .gz at the end automatically gzips the file,
#making it faster and smaller
serializeTo = ner-model.ser.gz

#structure of your training file; this tells the classifier
#that the word is in column 0 and the correct answer is in
#column 1
map = word=0,answer=1

#these are the features we'd like to train with
#some are discussed below, the rest can be
#understood by looking at NERFeatureFactory
useClassFeature=true
useWord=true
useNGrams=true
#no ngrams will be included that do not contain either the
#beginning or end of the word
noMidNGrams=true
useDisjunctive=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
#the next 4 deal with word shape features
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC

java -cp .:../stanford-corenlp-full-2015-12-09/* \
 edu.stanford.nlp.ie.crf.CRFClassifier  -prop lpsc.prop

Step 7: Test

java -cp .:../stanford-corenlp-full-2015-12-09/* edu.stanford.nlp.ie.crf.CRFClassifier  -loadClassifier ner-model.ser.gz -testFile test.tsv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

corenlp

corenlp

README.md

Training a custom Named Entity Recogniser using Stanford CoreNLP

References:

Step 1. Download CoreNLP

Step 2: Install Python dependencies

Step 3: Start Core NLP server

Step 4: Prepare input text and annotations

Step 5: Separate the training and testing records

Step 6: Convert brat to corenlp NER format

Step 6: Train and create a NER Model

Step 7: Test

Files

corenlp

Directory actions

More options

Directory actions

More options

Latest commit

History

corenlp

Folders and files

parent directory

README.md

Training a custom Named Entity Recogniser using Stanford CoreNLP

References:

Step 1. Download CoreNLP

Step 2: Install Python dependencies

Step 3: Start Core NLP server

Step 4: Prepare input text and annotations

Step 5: Separate the training and testing records

Step 6: Convert brat to corenlp NER format

Step 6: Train and create a NER Model

Step 7: Test