Add handling of discontinuous annotations (brat >= 1.3). #60

Open
wants to merge 1 commit into
base: master

jamesdunham

This PR addresses the issue described in #36 when brat_to_conll.py encounters discontinuous annotations created by brat >= 1.3. These can be created unintentionally by including a newline in the span of an annotation, or manually ("Add Frag").

I implemented two possible behaviors. A discontinuous annotation can either be split into multiple annotations (one for each fragment) or joined into an expanded annotation that starts with the first fragment and ends with the last. For examples see test_brat_to_conll.py.

The choice is controlled by a new parameter split_discontinuous. Its default is False, i.e., joining, because of the case where discontinuous annotations are unintentional.
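The two behaviors can be illustrated with a minimal sketch. This is not the code from the PR: the helper names `parse_fragments` and `resolve_discontinuous` are hypothetical, and only the `split_discontinuous` parameter and its default of `False` come from the description above. It assumes the brat standoff convention in which a discontinuous annotation lists its fragments as semicolon-separated `start end` pairs.

```python
def parse_fragments(ann_line):
    """Parse one brat text-bound annotation line into (id, label, fragments).

    Hypothetical helper for illustration; not part of brat_to_conll.py.
    """
    ann_id, type_and_offsets, _text = ann_line.rstrip("\n").split("\t", 2)
    label, offsets = type_and_offsets.split(" ", 1)
    fragments = []
    # Discontinuous annotations carry several "start end" pairs joined by ";".
    for pair in offsets.split(";"):
        start, end = pair.split()
        fragments.append((int(start), int(end)))
    return ann_id, label, fragments


def resolve_discontinuous(label, fragments, split_discontinuous=False):
    """Return a list of (label, start, end) spans for one annotation."""
    if split_discontinuous:
        # Splitting: one annotation per fragment.
        return [(label, start, end) for start, end in fragments]
    # Joining: one span from the first fragment's start to the last one's end.
    starts = [start for start, _ in fragments]
    ends = [end for _, end in fragments]
    return [(label, min(starts), max(ends))]


# An annotation that became discontinuous because its span crossed a newline.
line = "T1\tDISABILITY 0 9;10 19\tdependent on others"
ann_id, label, fragments = parse_fragments(line)

joined = resolve_discontinuous(label, fragments)
split = resolve_discontinuous(label, fragments, split_discontinuous=True)
```

Here `joined` holds a single span covering offsets 0 to 19, while `split` holds two spans, one per fragment. Joining recovers the annotator's likely intent when the fragmentation was accidental.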

@CLAassistant

CLAassistant commented Sep 3, 2017

CLA assistant check
All committers have signed the CLA.

Discontinuous annotations can be split into multiple annotations, one for each
fragment, or joined into a continuous annotation that starts with the first
fragment and ends with the last. This behavior is controlled by a new parameter
`split_discontinuous` whose default is `False` (i.e., joining discontinuous
annotations).
@rriveraz

rriveraz commented May 3, 2018

Does this change also apply to converting from CoNLL to brat, or is it unnecessary to change the conll_to_brat file?

@jamesdunham
Author

It isn't necessary. The issue is only with brat to conll.

@rriveraz

rriveraz commented May 3, 2018

OK, thank you for your help. Do you have an example of the output? Do you also have the code with the changes? If so, I would really appreciate it if you could share it with me. Thank you.

@jamesdunham
Author

Sure, the new tests demonstrate the changes. If you're running into problems with discontinuous annotations and need this fix now, you could clone my fork. It's up to date at the moment.

@rriveraz

rriveraz commented May 4, 2018

Hi James. I wonder if you can help with this problem. I have annotations nested inside other annotations, for example:

T2 SCOPE 53 69 with no dementia
T3 NEGATION 58 60 no
T4 DISABILITY 61 69 dementia

or

T3 SCOPE 1420 1455 not dependent on others for walking
T4 NEGATION 1420 1423 not
T5 DISABILITY 1424 1455 dependent on others for walking

I think I could handle these like discontinuous annotations, but I don't know if that is the best option. When I use the original brat_to_conll file, it always keeps only the first annotation, in this case the SCOPE annotation. Do you know how to handle this kind of nested annotation? I'd really appreciate your help. Thank you.
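For reference, one possible way to flatten nested annotations like these before conversion is to drop any span that encloses another one, so that only the innermost annotations remain. This is a sketch of that idea under stated assumptions, not something brat_to_conll.py or this PR does; the function name `drop_enclosing_spans` is hypothetical, and whether the inner or the outer annotations should be kept depends on the task.

```python
def drop_enclosing_spans(annotations):
    """Keep only annotations that do not contain another annotation.

    `annotations` is a list of (label, start, end) tuples. Hypothetical
    helper for illustration; it is not part of brat_to_conll.py.
    """
    kept = []
    for i, (label, start, end) in enumerate(annotations):
        contains_other = any(
            j != i
            and start <= other_start
            and other_end <= end
            and (other_start, other_end) != (start, end)
            for j, (_, other_start, other_end) in enumerate(annotations)
        )
        if not contains_other:
            kept.append((label, start, end))
    return kept


# The first example above: SCOPE encloses NEGATION and DISABILITY.
annotations = [
    ("SCOPE", 53, 69),
    ("NEGATION", 58, 60),
    ("DISABILITY", 61, 69),
]
flat = drop_enclosing_spans(annotations)
```

With this input, `flat` keeps the NEGATION and DISABILITY spans and drops the enclosing SCOPE span. Partially overlapping spans that do not fully contain each other are left untouched and would need separate handling.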

@jamesdunham
Author

Sorry, I haven't looked into options for handling overlapping annotations.

@rriveraz

rriveraz commented May 4, 2018

Thanks for your help, James. By the way, is it possible to identify interactions between entities with NeuroNER given a brat annotation such as:

T39 disease 72 82 carcinomas
T56 body-part 61 71 colorectal
R1 relatedTo Arg1:T39 Arg2:T56

Thank you

@rriveraz

Hi James.

Just a quick question: do you know if NeuroNER uses some kind of padding for character embeddings?

Hope you can help me with this.

@Jongmassey

Whatever happened to this PR? I'm trying to load a bunch of brat annotation files with discontinuous annotations. Is @jamesdunham's fork still the only option, and is the master branch ahead of it in other ways?

4 participants