gold-crowd

A platform to create crowd-sourced gene function gold standards with Amazon Mechanical Turk

Installation

Make sure you have all requirements installed: Python 2, pipenv, and Java (tested with OpenJDK 1.8; Java is needed to run NobleCoder).
Download the repository.
Change into it and install the Python dependencies with pipenv install.
Launch NobleCoder from tools/NobleCoder-1.0.jar and import the Gene Ontology (download from here) under the name "go". The process.py script runs NobleCoder on your abstracts using the ontology named "go", so if you choose a different name you will have to adapt the script.
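Before running anything, you can check that the tools from the first step are on your PATH. This is a minimal standalone sketch (Python 3, run outside the project's Python 2 environment). The executable names python2, pipenv, and java are assumptions and may differ on your system, and the script does not check versions such as OpenJDK 1.8.

```python
import shutil

# Assumed executable names; adjust them if your system uses e.g. "python2.7".
REQUIRED_TOOLS = ("python2", "pipenv", "java")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools found (versions not checked).")
```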

Put the PubMed IDs of the abstracts you're interested in into data/pmid_list.txt.
Run pipenv run python process.py
The output is written to data/abstracts and data/brat-input. Copy all files from both folders into the same folder of your brat installation. That folder also needs a file annotation.conf, which could look like this (more information here):
```
[entities]

Gene
Function

[relations]

Does	Arg1:Gene, Arg2:Function
Does	Arg1:Function, Arg2:Gene
DoesNot	Arg1:Function, Arg2:Gene
DoesNot	Arg1:Gene, Arg2:Function

[attributes]

[events]
```
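If you prefer to script the copy step, here is a minimal standalone sketch (Python 3) that copies the files from data/abstracts and data/brat-input into one target folder and writes the annotation.conf shown above next to them. It is not part of the repository; the function name assemble_brat_folder is hypothetical, and the target should be a folder inside your own brat installation.

```python
import os
import shutil

# Same content as the example annotation.conf above (relation lines are tab-separated).
ANNOTATION_CONF = (
    "[entities]\n\nGene\nFunction\n\n"
    "[relations]\n\n"
    "Does\tArg1:Gene, Arg2:Function\n"
    "Does\tArg1:Function, Arg2:Gene\n"
    "DoesNot\tArg1:Function, Arg2:Gene\n"
    "DoesNot\tArg1:Gene, Arg2:Function\n\n"
    "[attributes]\n\n[events]\n"
)


def assemble_brat_folder(target, sources=("data/abstracts", "data/brat-input")):
    """Copy all files from `sources` into `target` and write annotation.conf there.

    Returns the names of the copied files. A file that exists in several source
    folders is overwritten by the copy from the later folder.
    """
    os.makedirs(target, exist_ok=True)
    copied = []
    for source in sources:
        for name in sorted(os.listdir(source)):
            path = os.path.join(source, name)
            if os.path.isfile(path):  # skip subdirectories
                shutil.copy2(path, os.path.join(target, name))
                copied.append(name)
    with open(os.path.join(target, "annotation.conf"), "w") as handle:
        handle.write(ANNOTATION_CONF)
    return copied
```

Run it from the repository root so the default source folders resolve, e.g. assemble_brat_folder("/path/to/brat/data/gold-crowd") with a hypothetical path. Check the returned names for duplicates if you suspect both folders contain files with the same name.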
The run also produces a file data/statistics.cvs containing the number of words, genes, and functions for each abstract.
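The layout of data/statistics.cvs is not documented here, so the following reader is only a sketch under assumptions: a comma-separated file with a header row, one row per abstract, the abstract identifier in the first column, and integer counts in the remaining columns. Check these against your actual file before relying on it. The function name summarize_statistics is hypothetical, and the sketch targets Python 3.

```python
import csv


def summarize_statistics(path="data/statistics.cvs"):
    """Count abstracts and sum each count column of the statistics file.

    Assumed layout (not confirmed by the project): comma-separated, a header
    row, one row per abstract, identifier in the first column, integer counts
    in the remaining columns.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        totals = {name: 0 for name in header[1:]}
        abstracts = 0
        for row in reader:
            if not row:
                continue  # ignore blank lines
            abstracts += 1
            for name, value in zip(header[1:], row[1:]):
                totals[name] += int(value)
    return abstracts, totals
```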