NAMED ENTITY RECOGNITION

This software implements several experiments using the following toolkits:

- Stanford Linear Classifier
- Stanford NER
- SVMLight, with tree kernels
- JUNTO label propagation algorithm

In addition, it accesses external resources, namely a database of Wikipedia pages up to Feb 2014.

We also implemented a weakly supervised algorithm that is first initialized with the weights of the Stanford Linear Classifier trained on a small amount of data.

Licence

This software is distributed under the CeCILL-C license.

Configuration

etc/ner.properties: this file holds the variables needed to configure the different classifiers.
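As an illustration of how such a properties file can be read, here is a minimal sketch using `java.util.Properties`. The key names `listTrainFiles` and `listTestFiles` and the class `NerPropertiesSketch` are assumptions made for this example; the actual keys are whatever etc/ner.properties defines.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of the project.
class NerPropertiesSketch {
    // Assumed key names; check etc/ner.properties for the real ones.
    static final String TRAIN_KEY = "listTrainFiles";
    static final String TEST_KEY = "listTestFiles";

    // Parses key=value pairs from any character source.
    static Properties load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the trimmed value, or fails loudly if the key is missing or blank.
    static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required property: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "etc/ner.properties";
        try (Reader reader = Files.newBufferedReader(Paths.get(path))) {
            Properties props = load(reader);
            System.out.println("train list: " + require(props, TRAIN_KEY));
            System.out.println("test list:  " + require(props, TEST_KEY));
        }
    }
}
```

Failing early on a missing key is a deliberate choice here, since the train and test lists are mandatory for every classifier.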

The following packages and classes interface with these utilities

Packages

Interface with Stanford Linear Classifier

src/linearclassifier

AnalyzeClassifier.java:
- interfaces with the Stanford Linear Classifier
- implements a weakly supervised algorithm based on risk minimisation. It can use either the closed form of the risk estimate or a numerical approximation of the risk.
- configuration: you can give the classifier the name of a classifier type. For the moment the following types are supported:
  - "pers": binary classifier that detects whether or not a word is a person
  - "org": binary classifier that detects whether or not a word is an organization
  - "prod": binary classifier that detects whether or not a word is a product
  - "loc": binary classifier that detects whether or not a word is a location
  - "pn": a single general classifier that detects proper names
  - "all": multiclass classifier that detects the categories person, organization, product and location
  All these constants are set in the class tools.CNConstans.
- input: set the static variables LISTTRAINFILES and LISTTESTFILES, which name files containing the list of files to process; see esterTrain.xmll and esterTest.xmll for examples. YOU MUST SET THE LIST OF FILES TO TRAIN AND TEST IN THE PROPERTIES FILE: ner.properties
- you can set a flag to use Wikipedia as an extra feature, i.e. whether the entity is found in Wikipedia as a person, place, organization or product (this can take up to 2h).

Margin.java: stores the Stanford Linear Classifier weights, features and instances.
NumericalIntegration.java: implements the numerical approximation of the risk.

src/gmm: all classes for the GMM training
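The classifier names listed above ("pers", "org", "prod", "loc", "pn", "all") can be validated before a run. The sketch below shows one possible way to do that with an enum; `ClassifierTypeSketch` is a hypothetical name introduced for this example, and the project itself keeps these names as constants in tools.CNConstans. Treating "pn" as binary is an assumption.

```java
import java.util.Locale;

// Hypothetical enum for illustration only.
enum ClassifierTypeSketch {
    PERS("pers", true),
    ORG("org", true),
    PROD("prod", true),
    LOC("loc", true),
    PN("pn", true),   // assumed binary: proper name or not
    ALL("all", false); // multiclass over person, organization, product, location

    private final String configName;
    private final boolean binary;

    ClassifierTypeSketch(String configName, boolean binary) {
        this.configName = configName;
        this.binary = binary;
    }

    String configName() {
        return configName;
    }

    boolean isBinary() {
        return binary;
    }

    // Maps a configuration string to a type, ignoring case and surrounding spaces.
    static ClassifierTypeSketch fromName(String name) {
        String wanted = name.trim().toLowerCase(Locale.ROOT);
        for (ClassifierTypeSketch type : values()) {
            if (type.configName.equals(wanted)) {
                return type;
            }
        }
        throw new IllegalArgumentException("Unsupported classifier type: " + name);
    }
}
```

For example, `ClassifierTypeSketch.fromName("loc").isBinary()` is true, while an unknown name such as "misc" is rejected with an exception instead of silently training the wrong model.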

Interface with Stanford CRF

src/CRFClassifier

AnalyzeCRFClassifier.java: interfaces with Stanford NER. You can use gazetteers through the file gazettes/gazette.txt or gazettelcase.txt (all in lowercase).
Margin.java: stores the weights, features and instances of the CRF classifier.
AnalyzeSemiCRF.java: interfaces with a semi-CRF implementation.

YOU MUST SET THE LIST OF FILES TO TRAIN AND TEST IN THE PROPERTIES FILE: ner.properties
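To make the role of the lowercase gazette concrete, here is a small sketch of a case-insensitive gazetteer lookup. It assumes each line has the form "CLASS phrase" (a class name, whitespace, then the entry), which is the layout Stanford NER documents for gazette files; whether gazettes/gazette.txt follows exactly this layout is an assumption, and `GazetteSketch` is a hypothetical class, not part of the project.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only.
class GazetteSketch {
    // Reads "CLASS phrase" lines into a map from lowercased phrase to class.
    static Map<String, String> load(Reader source) {
        Map<String, String> entries = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] parts = line.split("\\s+", 2);
                if (parts.length < 2) {
                    continue; // skip lines without a phrase
                }
                entries.put(parts[1].toLowerCase(Locale.ROOT), parts[0]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }

    // Returns the class of the phrase, or null if it is not in the gazette.
    static String lookup(Map<String, String> gazette, String phrase) {
        return gazette.get(phrase.trim().toLowerCase(Locale.ROOT));
    }
}
```

Lowercasing both the stored entries and the query is what lets the same lookup serve text with unreliable capitalization, such as ASR output.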

Interface with SVMLight

src/svm

AnalyzeSVMClassifier.java: interfaces with SVMLight, prepares its input and evaluates its output. There are several input files: it generates dependency trees for use with a tree kernel, a polynomial kernel or a linear kernel, and it can even use the same features as the Stanford Linear Classifier.

YOU MUST SET THE LIST OF FILES TO TRAIN AND TEST IN THE PROPERTIES FILE: ner.properties

src/lex: classes needed to store utterances, words, lexical units, POS tags and dependency trees
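For readers unfamiliar with the SVMLight input that AnalyzeSVMClassifier has to prepare, the sketch below formats one binary training instance in the plain SVMLight feature-vector layout: a target followed by `feature:value` pairs with feature ids in increasing order, starting at 1. It does not cover the tree-kernel input, and `SvmLightLineSketch` is a hypothetical class; how the project actually serialises its instances is not shown here.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only.
class SvmLightLineSketch {
    // Formats one instance as "<target> <id>:<value> ..." for binary classification.
    static String formatLine(int label, Map<Integer, Double> features) {
        if (label != 1 && label != -1) {
            throw new IllegalArgumentException("label must be +1 or -1");
        }
        StringBuilder sb = new StringBuilder(label > 0 ? "+1" : "-1");
        // TreeMap sorts the feature ids in increasing order.
        for (Map.Entry<Integer, Double> e : new TreeMap<>(features).entrySet()) {
            if (e.getKey() < 1) {
                throw new IllegalArgumentException("feature ids start at 1");
            }
            if (e.getValue() == 0.0) {
                continue; // sparse format: zero-valued features are omitted
            }
            sb.append(' ').append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }
}
```

A word-level instance with features 1 and 3 active would thus become a line such as `+1 1:0.5 3:1.0`.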

External Resources

src/resources

WikipediaAPI.java: accesses Wikipedia pages in French, all stored in a MySQL database, up to Feb 2014.

For the database configuration, create a user "contonmina/contnomina" on localhost in a MySQL database, then create the wikipedia database by executing the script wikipedia/db/dbWikibackupMar32014.sql (11G).

YOU MUST SET THE DATABASE SETTINGS IN THE PROPERTIES FILE: ner.properties and in the Hibernate configuration file: src/hibernate.cfg.xml

src/labelpropagation

LabelPropagation.java: prepares the input for the JUNTO label propagation toolkit and evaluates its output file.

Using the output of the ASR

src/reco

ASROut.java: aligns the ASR output and calls the Linear Classifier and the CRF.
Capitalization.java: CRF for automatically capitalizing the output of the ASR.

YOU MUST SET THE DATABASE SETTINGS IN THE PROPERTIES FILE: ner.properties and in the Hibernate configuration file: src/hibernate.cfg.xml

