evaluate.py crashes after a detection with incorrect correction is found #11

Closed
ebritoc opened this issue Dec 19, 2017 · 3 comments

ebritoc commented Dec 19, 2017

The evaluation on the validation set crashes at line 62 of evaluate.py when a misspelled word is correctly detected but incorrectly corrected.
Example from page1037.json:

[DETECTION MATCH] page1037.text.list.1.item.6.s.1.w.3: hobbies -> Hobies
	[INCORRECT] Should be: hobby's
Traceback (most recent call last):
  File "./CLIN28_ST_spelling_correction/clin28tools/evaluate.py", line 93, in <module>
    main()
  File "./CLIN28_ST_spelling_correction/clin28tools/evaluate.py", line 62, in main
    print("[DETECTION WRONG] " + ";".join(outcorrection['span']) + ": " + " ".join([ outdata[wordid]['text'] for wordid in outcorrection['span'] ]) + " -> " + outcorrection['text'],file=sys.stderr)
  File "./CLIN28_ST_spelling_correction/clin28tools/evaluate.py", line 62, in <listcomp>
    print("[DETECTION WRONG] " + ";".join(outcorrection['span']) + ": " + " ".join([ outdata[wordid]['text'] for wordid in outcorrection['span'] ]) + " -> " + outcorrection['text'],file=sys.stderr)
  File "/home/ebritochac/anaconda3/lib/python3.6/site-packages/clin28tools/format.py", line 89, in __getitem__
    return self.data[key]
KeyError: 'page1037.text.list.1.item.1.s.1.w.1'

I replaced that line with the following and it seems to work, but maybe I am misunderstanding what is expected in the JSON file with the corrections:

print("[DETECTION WRONG] " + ";".join(outcorrection['span']) + ": " + " ".join([ refdata[wordid]['text'] for wordid in outcorrection['span'] ]) + " -> " + outcorrection['text'],file=sys.stderr)
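
To illustrate the failure mode, here is a minimal sketch that uses plain dictionaries in place of the clin28tools data objects. The names outdata, refdata and outcorrection mirror the script, but the word IDs, the contents and the describe_wrong_detection helper are invented for illustration, and it is only an assumption that the output data lacks a word ID that the reference data has.

```python
import sys

# Hypothetical stand-ins for the evaluation data: word ID -> word record.
refdata = {
    "doc.s.1.w.1": {"text": "mijn"},
    "doc.s.1.w.2": {"text": "hobies"},
}
# Assumption: the system output is missing one of the word IDs.
outdata = {
    "doc.s.1.w.2": {"text": "hobies"},
}
# A correction whose span points at the word ID missing from outdata.
outcorrection = {"span": ["doc.s.1.w.1"], "text": "Mijn"}


def describe_wrong_detection(words, correction):
    """Build the same kind of message as line 62, looking words up in `words`."""
    original = " ".join(words[wordid]["text"] for wordid in correction["span"])
    return ("[DETECTION WRONG] " + ";".join(correction["span"])
            + ": " + original + " -> " + correction["text"])


# Looking the span up in the output data raises KeyError, as in the traceback.
try:
    describe_wrong_detection(outdata, outcorrection)
    lookup_failed = False
    missing = None
except KeyError as err:
    lookup_failed = True
    missing = err.args[0]

# Looking the span up in the reference data succeeds.
message = describe_wrong_detection(refdata, outcorrection)
print(message, file=sys.stderr)
```

If this is what happens in the real run, the change only hides the mismatch between the two files; it does not explain why the word ID is absent from the output.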

proycon commented Dec 19, 2017

I think this may already have been solved in the latest fix. Can you try pip install -U clin28tools and also run clin28-validator on your output?

ebritoc commented Dec 21, 2017

I still get exactly the same KeyError with the original evaluate.py script after updating with pip.
My output JSON file seems to be valid:

$ ~/anaconda3/bin/python  validator.py ../data/results/page1037.json
valid <- ../data/results/page1037.json

proycon added a commit that referenced this issue Dec 21, 2017

proycon commented Dec 21, 2017

Hmm, I'm still a bit puzzled as to why it goes wrong (it's hard to reproduce without your output), but your change to the line is indeed fine: the reference text and output text should be the same, so this error shouldn't really happen. Anyway, I have now included your fix in case anybody else stumbles on this problem.
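
One way to narrow down such a mismatch is to list the word IDs that corrections refer to but that are absent from the word list. The sketch below is only a rough diagnostic: it assumes a document is a dict with a "words" list (each entry having an "id") and a "corrections" list (each entry optionally having a "span"), and the missing_span_ids helper and the example data are hypothetical rather than part of clin28tools.

```python
def missing_span_ids(document):
    """Return word IDs used in correction spans that are not in the word list."""
    known = {word["id"] for word in document.get("words", [])}
    missing = []
    for correction in document.get("corrections", []):
        # Corrections without a span are skipped by the empty default.
        for wordid in correction.get("span", []):
            if wordid not in known:
                missing.append(wordid)
    return missing


# Invented example: the second correction refers to a word ID that is not listed.
example = {
    "words": [{"id": "doc.s.1.w.2", "text": "hobies"}],
    "corrections": [
        {"span": ["doc.s.1.w.2"], "text": "hobby's"},
        {"span": ["doc.s.1.w.1"], "text": "Mijn"},
    ],
}
print(missing_span_ids(example))
```

Running a check like this on both the output file and the reference file would show which side the unknown word ID comes from.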

proycon closed this as completed Dec 21, 2017