Changes to output format #12

cmungall · 2021-08-12T22:00:11Z

Related to #11.

Refer to SSSOM for good practice.

use lowercase column names
split entity into two columns
- ID
- Property (synonym, label, etc)
split origin
- source_version (e.g. "version" : "http://purl.obolibrary.org/obo/go/releases/2021-07-02/go.owl")
- source_id (e.g. go)
- source_file (e.g. go.json)
what is zone?
sentence id:
- this is a bit opaque
- should there be an intermediate file created that is one line per sentence with document ids and sentence ids as the first two columns?
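A minimal sketch of what such an intermediate file could look like, assuming the sentences have already been split per document. The function name `write_sentence_table`, the column names `document_id`, `sentence_id` and `text`, and the sample document ID are all hypothetical, not part of the current output:

```python
import csv
import io


def write_sentence_table(docs, handle):
    """Write one TSV line per sentence: document ID, sentence ID, sentence text.

    `docs` maps a document ID to its list of sentences (hypothetical input shape).
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Document ID and sentence ID come first, as proposed above.
    writer.writerow(["document_id", "sentence_id", "text"])
    for doc_id, sentences in docs.items():
        for i, sentence in enumerate(sentences, start=1):
            # Sentence IDs are numbered per document: S1, S2, ...
            writer.writerow([doc_id, f"S{i}", sentence])


buffer = io.StringIO()
write_sentence_table({"doc1": ["First sentence.", "Second sentence."]}, buffer)
print(buffer.getvalue())
```

With a file like this, an opaque sentence ID in the main output could be resolved by looking up the (document ID, sentence ID) pair.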

hrshdhgd · 2021-08-12T22:30:18Z

what is zone?

This is something OGER spits out in the output. I have no clue what it represents. The documentation does not specify its significance either. I'll keep looking.

sentence id:

When the text has multiple sentences, S1 is sentence 1, S2 is sentence 2, and so on. It basically splits the text into sentences by a separator ("." for example) and assigns these IDs to the sentences.
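As a rough illustration of the numbering only (this is not OGER's actual code, and `assign_sentence_ids` is a hypothetical helper), a naive split on the separator would look something like this:

```python
def assign_sentence_ids(text, separator="."):
    """Split text on a separator and label the pieces S1, S2, ...

    Naive on purpose: it assumes the separator only ever marks a sentence
    boundary, so abbreviations or decimal numbers would be split incorrectly.
    """
    pieces = [piece.strip() for piece in text.split(separator)]
    # Drop empty pieces, e.g. the one after the final separator.
    sentences = [piece for piece in pieces if piece]
    return [(f"S{i}", sentence) for i, sentence in enumerate(sentences, start=1)]


for sentence_id, sentence in assign_sentence_ids("Cells divide. Nuclei fuse."):
    print(sentence_id, sentence)
```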

hrshdhgd · 2021-08-12T22:36:59Z

I looked in the OGER code and it seems that 'zone' is represented by section_type there. This is relevant to clinical notes. I have seen in the past that when clinical text is recorded in EHRs, there are sections in the text marked by all-caps headings (e.g. DIAGNOSIS, TREATMENT PLAN, etc.). The zone basically captures that. In our case it will always be blank.

cmungall · 2021-08-12T23:11:32Z

There may be analogs: a typical journal article is structured, so maybe zone also covers the structure/section heading, e.g. methods, abstract, ...?

hrshdhgd · 2021-08-12T23:50:21Z

That makes sense.

cmungall · 2021-10-22T19:00:15Z

currently the match_field is sometimes empty, sometimes filled

let's change to reuse sssom data dictionary where possible

object_id: (currently entity_id). The ontology term id that was matched
object_label: (currently sometimes this is in match field). This is the primary label of the object_id, regardless of whether the match was on the label or synonym
object_category: (currently "type")
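
A small sketch of the renaming, assuming each output row is available as a dict keyed by the current column names. `RENAMES` and `rename_columns` are hypothetical names, and the sample values are illustrative only. object_label is left out because it is not a plain rename: it needs the primary label looked up for the matched term.

```python
# Current column name -> SSSOM-style column name, as listed above.
RENAMES = {
    "entity_id": "object_id",
    "type": "object_category",
}


def rename_columns(row):
    """Return a copy of the row with renamed keys; other columns pass through."""
    return {RENAMES.get(key, key): value for key, value in row.items()}


old_row = {"entity_id": "GO:0005634", "type": "CellularComponent", "match_field": ""}
print(rename_columns(old_row))
```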

hrshdhgd added a commit that referenced this issue Aug 13, 2021

column names lowercased as per #12

a4d0314

hrshdhgd added a commit that referenced this issue Aug 13, 2021

Merge pull request #15 from monarch-initiative/column-name-change

a25cbad

hrshdhgd mentioned this issue Oct 22, 2021

New output format #28

Merged

hrshdhgd added a commit that referenced this issue Oct 22, 2021

basic refactor as per #12

45916a8

hrshdhgd added a commit that referenced this issue Oct 22, 2021

all tests pass addresses #12

b6fe4f1
