Prodigy
Prodigy is a new tool for radically efficient machine teaching. It addresses the big remaining problem: annotation and training.
Prodigy is not free, but you can submit a request for a research license here.
Unfortunately, for Greek there were datasets available only for the POS and dependency taggers, but not for NER, so we had to create the data ourselves.
Prodigy helped a lot in this direction. The final NER data can be found here.
Useful commands:
-
Get info about your dataset(s)
python3 -m prodigy stats ner_train -l
-
Drop a dataset
python3 -m prodigy drop ner_dev
-
Create a new dataset
python3 -m prodigy dataset ner
-
Import existing annotations from a JSONL file into a dataset
python3 -m prodigy db-in ner ner.jsonl
-
Batch-train an NER model from a dataset
python3 -m prodigy ner.batch-train ner el_core_web_sm --output models/ner/ --label "ORG, PRODUCT, LOC, GPE, EVENT, PERSON" --no-missing --dropout 0.2 --n-iter 15
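The `db-in` command reads Prodigy's JSONL task format: one JSON object per line, with the raw `text` and a list of entity `spans` given as character offsets plus a label. A minimal sketch of producing such a file with the standard library (the file name and the example sentence are illustrative, not from this dataset):

```python
import json

# Illustrative annotation tasks in Prodigy's JSONL format:
# each line is one example with raw text and labelled character spans.
examples = [
    {
        "text": "Η Αθήνα είναι η πρωτεύουσα της Ελλάδας.",
        "spans": [
            {"start": 2, "end": 7, "label": "GPE"},    # Αθήνα
            {"start": 31, "end": 38, "label": "GPE"},  # Ελλάδας
        ],
    },
]

# Write one JSON object per line, keeping Greek characters readable.
with open("ner.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

# Read it back the way db-in would: one JSON object per line.
with open("ner.jsonl", encoding="utf-8") as f:
    tasks = [json.loads(line) for line in f]

print(tasks[0]["text"][2:7])  # Αθήνα
```

The offsets are plain character indices into `text`, so they must match the surface form of each entity exactly.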
First, you will need to annotate the dataset manually.
python3 -m prodigy ner.manual ner_train el_core_web_sm path_to_data --label "ORG, PRODUCT, PERSON, LOC, GPE, EVENT"
After a significant number of annotations, you can start using the model's predictions to speed up the annotation procedure.
python3 -m prodigy ner.make-gold ner_train el_core_web_sm path_to_data
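The point of `ner.make-gold` is to end up with gold-standard examples: every entity in the text is labelled and every span's offsets line up with the raw text. A small sanity check of that property (the helper and the example are illustrative, not part of Prodigy):

```python
# Hypothetical helper for sanity-checking exported gold annotations:
# each span's character offsets must point at a real, non-padded
# piece of the raw text.
def check_spans(example):
    text = example["text"]
    for span in example.get("spans", []):
        surface = text[span["start"]:span["end"]]
        if not surface or surface != surface.strip():
            return False
    return True

example = {
    "text": "Ο ΟΤΕ εδρεύει στην Αθήνα.",
    "spans": [
        {"start": 2, "end": 5, "label": "ORG"},   # ΟΤΕ
        {"start": 19, "end": 24, "label": "GPE"}, # Αθήνα
    ],
}
print(check_spans(example))  # True
```

Running a check like this before `db-in` catches off-by-one offsets early, before they silently degrade training.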
When the performance of your model is good enough, you can use another recipe, ner.teach, to accelerate the annotation procedure even further:
python3 -m prodigy ner.teach ner_train el_core_web_sm path_to_data --label "ORG, PRODUCT, PERSON, LOC, GPE, EVENT"
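`ner.teach` works by active learning: it shows you the model's own suggestions, preferring the ones the model is least sure about, so every accept/reject answer carries as much information as possible. A minimal sketch of that selection idea (the scored candidates are made up for illustration):

```python
# Sketch of uncertainty sampling: prefer candidates whose score
# is closest to 0.5, i.e. the ones the model is least sure about.
def most_uncertain(candidates, k=2):
    """candidates: list of (text, score) pairs with scores in [0, 1]."""
    return sorted(candidates, key=lambda c: abs(c[1] - 0.5))[:k]

scored = [
    ("Μύκονος", 0.97),   # confident entity: low information
    ("Πειραιάς", 0.52),  # borderline: worth asking the annotator
    ("τράπεζα", 0.48),   # borderline
    ("και", 0.03),       # confident non-entity
]
for text, score in most_uncertain(scored):
    print(text, score)
```

This is why `ner.teach` only pays off once the model is already decent: with a poor model, the "uncertain" suggestions are mostly noise.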
Produce model:
python3 -m prodigy ner.batch-train ner_train el_core_web_sm --output models/small_with_entities --n-iter 20 --eval-split 0.2 --dropout 0.2
Note: if you haven't used ner.make-gold, you can pass the optional --no-missing argument for better performance.
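The `--eval-split 0.2` flag above holds out 20% of the annotations for evaluation instead of training. A minimal sketch of such a split (the seeded shuffle is illustrative; Prodigy handles this internally):

```python
import random

# Hold out a fraction of the examples for evaluation, the way
# --eval-split 0.2 does: shuffle once, then slice.
def train_eval_split(examples, eval_split=0.2, seed=0):
    examples = list(examples)
    random.Random(seed).shuffle(examples)  # fixed seed for reproducibility
    n_eval = int(len(examples) * eval_split)
    return examples[n_eval:], examples[:n_eval]

data = [f"example {i}" for i in range(10)]
train, evaluation = train_eval_split(data)
print(len(train), len(evaluation))  # 8 2
```

Holding out a fixed evaluation set is what lets you compare accuracy across the batch-train runs above.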