PGxCorpus is a manually annotated corpus designed for the extraction of pharmacogenomic relations from text. It comprises 945 manually annotated sentences drawn from 911 distinct PubMed abstracts. Annotation was carried out by 11 annotators, including 5 senior annotators. Each sentence was handled independently by 2 annotators in a first phase, and then by a third, senior annotator in a second phase.
- PGxCorpus is in the file PGxCorpus.tar, in the Brat file format.
- It can be browsed on our Brat server at https://pgxcorpus.loria.fr/.
- It is also available on FigShare.
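
Since the corpus is distributed in the Brat format, a short reading example may help. The following is a minimal sketch, not part of the PGxCorpus distribution. It assumes the usual Brat standoff layout, in which each `.ann` line is tab-separated: text-bound annotations have ids starting with `T` and relations have ids starting with `R`. The names `parse_ann`, `Entity`, and `Relation`, the sample sentence, and the labels `Gene`, `Phenotype`, and `influences` are illustrative placeholders, not necessarily those used in the corpus. Discontinuous entities, written as several `start end` fragments separated by `;`, are kept as a list of spans.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Entity:
    id: str
    type: str
    spans: List[Tuple[int, ...]]  # one or more (start, end) character offsets
    text: str


@dataclass
class Relation:
    id: str
    type: str
    args: Dict[str, str]  # role name -> id of the annotation it points to


def parse_ann(content: str) -> Tuple[List[Entity], List[Relation]]:
    """Parse the text of a Brat .ann file into entities and relations.

    Other annotation kinds (events, attributes, notes) are ignored here.
    """
    entities: List[Entity] = []
    relations: List[Relation] = []
    for line in content.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # Layout: T1<TAB>Type start end[;start end...]<TAB>covered text
            type_, offsets = fields[1].split(" ", 1)
            spans = [tuple(int(n) for n in frag.split())
                     for frag in offsets.split(";")]
            text = fields[2] if len(fields) > 2 else ""
            entities.append(Entity(ann_id, type_, spans, text))
        elif ann_id.startswith("R"):
            # Layout: R1<TAB>Type Arg1:T1 Arg2:T2
            type_, *args = fields[1].split()
            roles = dict(arg.split(":", 1) for arg in args)
            relations.append(Relation(ann_id, type_, roles))
    return entities, relations


# Hypothetical annotations for the made-up sentence
# "CYP2D6 variants alter codeine response."
SAMPLE = (
    "T1\tGene 0 6\tCYP2D6\n"
    "T2\tPhenotype 22 38\tcodeine response\n"
    "R1\tinfluences Arg1:T1 Arg2:T2\n"
)

entities, relations = parse_ann(SAMPLE)
for rel in relations:
    by_id = {e.id: e for e in entities}
    print(by_id[rel.args["Arg1"]].text, rel.type, by_id[rel.args["Arg2"]].text)
```

To apply this to the corpus, one would extract PGxCorpus.tar and pass the content of each `.ann` file to `parse_ann`; the matching `.txt` file holds the sentence that the character offsets refer to.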
The annotation guidelines were provided to the annotators to reduce heterogeneity in the annotation task.
- annotation_guidelines.pdf: Annotation guidelines
The source code of the baseline experiment reported in [1] is available in ./baseline_experiment/
In preparation.
PGxCorpus is licensed under Creative Commons BY-NC 4.0.
PGxCorpus is supported by the PractiKPharma project (http://practikpharma.loria.fr/), funded by the French National Research Agency (ANR) under grant ANR-15-CE23-0028.