Prodigy
Prodigy is a new tool for radically efficient machine teaching. It addresses the big remaining problem: annotation and training.
Prodigy is not free, but you can submit a request for a research license here.
Unfortunately, for Greek there were datasets available only for the POS and dependency taggers, but not for NER, so we had to create the data ourselves.
Prodigy helped a lot in this direction. The final NER data can be found here.
Useful commands:
-
Get info about your dataset(s)
python3 -m prodigy stats ner_train -l
-
Drop a dataset
python3 -m prodigy drop ner_dev
-
Create a new dataset
python3 -m prodigy dataset ner
-
Import existing annotations from a JSONL file into a dataset
python3 -m prodigy db-in ner ner.jsonl
-
Batch-train an NER model from a dataset
python3 -m prodigy ner.batch-train ner el_core_web_sm --output models/ner/ --label "ORG, PRODUCT, LOC, GPE, EVENT, PERSON" --no-missing --dropout 0.2 --n-iter 15
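The `db-in` command reads Prodigy's JSONL task format: one JSON object per line, with the raw `text` and a list of entity `spans` given as character offsets plus a label. A minimal sketch of producing such a file with the standard library (the file name and the example sentence are illustrative, not from this dataset):

```python
import json

# Illustrative annotation tasks in Prodigy's JSONL format:
# each line is one example with raw text and labelled character spans.
examples = [
    {
        "text": "Η Αθήνα είναι η πρωτεύουσα της Ελλάδας.",
        "spans": [
            {"start": 2, "end": 7, "label": "GPE"},    # Αθήνα
            {"start": 31, "end": 38, "label": "GPE"},  # Ελλάδας
        ],
    },
]

# Write one JSON object per line, keeping Greek characters readable.
with open("ner.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

# Read it back the way db-in would: one JSON object per line.
with open("ner.jsonl", encoding="utf-8") as f:
    tasks = [json.loads(line) for line in f]

print(tasks[0]["text"][2:7])  # Αθήνα
```

The offsets are plain character indices into `text`, so they must match the surface form of each entity exactly.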
First, you will need to annotate the dataset manually.
python3 -m prodigy ner.manual ner_train el_core_web_sm path_to_data --label "ORG, PRODUCT, PERSON, LOC, GPE, EVENT"
After a significant number of annotations, you can start using the model's predictions to speed up the annotation procedure.
python3 -m prodigy ner.make-gold ner_train el_core_web_sm path_to_data
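The point of `ner.make-gold` is to end up with gold-standard examples: every entity in the text is labelled and every span's offsets line up with the raw text. A small sanity check of that property (the helper and the example are illustrative, not part of Prodigy):

```python
# Hypothetical helper for sanity-checking exported gold annotations:
# each span's character offsets must point at a real, non-padded
# piece of the raw text.
def check_spans(example):
    text = example["text"]
    for span in example.get("spans", []):
        surface = text[span["start"]:span["end"]]
        if not surface or surface != surface.strip():
            return False
    return True

example = {
    "text": "Ο ΟΤΕ εδρεύει στην Αθήνα.",
    "spans": [
        {"start": 2, "end": 5, "label": "ORG"},   # ΟΤΕ
        {"start": 19, "end": 24, "label": "GPE"}, # Αθήνα
    ],
}
print(check_spans(example))  # True
```

Running a check like this before `db-in` catches off-by-one offsets early, before they silently degrade training.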
When the performance of your model is good enough, you can use another recipe, ner.teach, to accelerate the annotation procedure even further:
python3 -m prodigy ner.teach ner_train el_core_web_sm path_to_data --label "ORG, PRODUCT, PERSON, LOC, GPE, EVENT"
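`ner.teach` works by active learning: it shows you the model's own suggestions, preferring the ones the model is least sure about, so every accept/reject answer carries as much information as possible. A minimal sketch of that selection idea (the scored candidates are made up for illustration):

```python
# Sketch of uncertainty sampling: prefer candidates whose score
# is closest to 0.5, i.e. the ones the model is least sure about.
def most_uncertain(candidates, k=2):
    """candidates: list of (text, score) pairs with scores in [0, 1]."""
    return sorted(candidates, key=lambda c: abs(c[1] - 0.5))[:k]

scored = [
    ("Μύκονος", 0.97),   # confident entity: low information
    ("Πειραιάς", 0.52),  # borderline: worth asking the annotator
    ("τράπεζα", 0.48),   # borderline
    ("και", 0.03),       # confident non-entity
]
for text, score in most_uncertain(scored):
    print(text, score)
```

This is why `ner.teach` only pays off once the model is already decent: with a poor model, the "uncertain" suggestions are mostly noise.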
Produce model:
python3 -m prodigy ner.batch-train ner_train el_core_web_sm --output models/small_with_entities --n-iter 20 --eval-split 0.2 --dropout 0.2
Note: if you haven't used ner.make-gold, you can pass the optional --no-missing argument for better performance.
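The `--eval-split 0.2` flag above holds out 20% of the annotations for evaluation instead of training. A minimal sketch of such a split (the seeded shuffle is illustrative; Prodigy handles this internally):

```python
import random

# Hold out a fraction of the examples for evaluation, the way
# --eval-split 0.2 does: shuffle once, then slice.
def train_eval_split(examples, eval_split=0.2, seed=0):
    examples = list(examples)
    random.Random(seed).shuffle(examples)  # fixed seed for reproducibility
    n_eval = int(len(examples) * eval_split)
    return examples[n_eval:], examples[:n_eval]

data = [f"example {i}" for i in range(10)]
train, evaluation = train_eval_split(data)
print(len(train), len(evaluation))  # 8 2
```

Holding out a fixed evaluation set is what lets you compare accuracy across the batch-train runs above.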