Add --input-format option #1056

Draft: wants to merge 5 commits into master
Conversation

@beckyjackson (Contributor) commented Sep 30, 2022:

Resolves #1038

  • docs/ have been added/updated
  • tests have been added/updated
  • mvn verify says all tests pass
  • mvn site says all JavaDocs correct
  • CHANGELOG.md has been updated

Creates a new global --input-format option which accepts one of the valid formats (owl, obo, owx, ofn, omn, ttl, or json) to specify which parser(s) to use when loading the ontology.

I would prefer to store the manager in IOHelper itself instead of passing it as a parameter to loadOntology, but that would require a large refactoring of the IOHelper class and could introduce breaking changes. On the other hand, I don't want to build up tech debt, and this approach introduces several more loadOntology overloads. @jamesaoverton, what do you think?

One last note: there were two test commands in the docs that were overwriting the example files. I fixed these so that they write to the results folder instead (I had to remove an axiom to get the reason test to work).
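
For reviewers, the core mechanism boils down to something like the sketch below. This is not the actual IOHelper code from this PR; it assumes OWLAPI 4's OWLOntologyManager.setOntologyParsers, and InputFormatSketch, load, and parsersForFormat are placeholder names for illustration only.

import java.io.File;
import java.util.Set;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.io.OWLParserFactory;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyCreationException;
import org.semanticweb.owlapi.model.OWLOntologyManager;

public class InputFormatSketch {

  // Load a single ontology document using only the parser factories mapped to
  // the requested --input-format value. "parsersForFormat" stands in for the
  // format-to-parser mapping added in this PR (e.g. ttl maps to the Turtle factories).
  static OWLOntology load(File file, Set<OWLParserFactory> parsersForFormat)
      throws OWLOntologyCreationException {
    OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
    // Replace the manager's default parser list so only these parsers are tried.
    manager.setOntologyParsers(parsersForFormat);
    return manager.loadOntologyFromOntologyDocument(file);
  }
}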

@jamesaoverton (Member) commented:

I think this addresses the use case in the issue. Thanks!

I looked into the suggestion about storing the manager in the IOHelper instance, but our JavaDocs say @return a new ontology object, with a new OWLManager, so I think we can't do that.

@cthoyt commented Oct 1, 2022:

@beckyjackson I am very happy you have been able to work on this - from a user perspective this will solve many of my debugging problems. Thank you :)

@matentzn (Contributor) left a review comment:

Quite some changes to the core loading process. I went through it line by line, but remind me to do some extensive testing before the next release.

case "owx":
return Sets.newHashSet(new OWLXMLParserFactory());
case "owl":
return Sets.newHashSet(new RDFXMLParserFactory(), new RioRDFXMLParserFactory());
Contributor:

Why do we need to add the Rio parser here? Shouldn't we be more strict?

Contributor Author:

I don't have strong feelings one way or the other. I figured the Rio parser would be used anyway if RDF/XML parsing failed without --input-format. @jamesaoverton, do you have a preference for excluding the Rio parsers?

case "ttl":
return Sets.newHashSet(new RioTurtleParserFactory(), new TurtleOntologyParserFactory());
case "json":
return Sets.newHashSet(new RioJsonParserFactory(), new RioJsonLDParserFactory());
Contributor:

This should be the OBOGraphs parser (stand-alone) or none at all, I think.

TBH, I think the --input-format options should mirror the -f option exactly. Why would we care about RioJson and not about other, more outlandish parsers like KRSS? I would want parsing to fail if the exact parser I specify can't handle the file, not to succeed because the second JSON parser in the list happened to accept it thanks to its crappy implementation.

Member:

I was thinking that we should just divide up the parsers that OWLAPI currently uses, but you're right that if we're adding a new option then we don't have to be constrained in that way.

Contributor:

The way you do it for remove is IMO the best. We use the same extension-to-parser mappings here as in -f, but, if we ever want to (I don't), we could add more switches later for more complex combinations: any-owl for ofn, owx, and rdfxml; KRSSParser for krss; etc. (analogous to the axiom-type options in --axioms). I would not bother here; the -f mappings should be by far sufficient.

Contributor Author:

I spent quite a bit of time digging around OWLAPI and could not find anything for OBOGraphs. Here's a full list of the parsers in a default manager (OWLManager.createOWLOntologyManager()):

  • RDFXMLParserFactory
  • OWLXMLParserFactory
  • OWLFunctionalSyntaxOWLParserFactory
  • RioTurtleParserFactory
  • ManchesterOWLSyntaxOntologyParserFactory
  • RioNQuadsParserFactory
  • RioJsonParserFactory
  • RioNTriplesParserFactory
  • RioTrigParserFactory
  • RioBinaryRdfParserFactory
  • RioJsonLDParserFactory
  • RioN3ParserFactory
  • RioRDFXMLParserFactory
  • RioTrixParserFactory
  • TurtleOntologyParserFactory
  • OBOFormatOWLAPIParserFactory
  • KRSS2OWLParserFactory
  • RioRDFaParserFactory

Unless I'm missing something, I only see the Rio JSON parsers.
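
(For reference, that list can be reproduced with a few lines against a default manager. This is just a throwaway sketch, and PrintParsers is not part of the PR; the printed order may differ.)

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.io.OWLParserFactory;

public class PrintParsers {
  public static void main(String[] args) {
    // Iterate over the parser factories registered on a default OWLAPI manager
    // and print their class names; this produces the list above.
    for (OWLParserFactory factory : OWLManager.createOWLOntologyManager().getOntologyParsers()) {
      System.out.println(factory.getClass().getSimpleName());
    }
  }
}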

Contributor:

The OBOGraphs parser is not in OWLAPI. If it is anywhere, it is in the obographs package.

@beckyjackson (Contributor Author) commented Oct 2, 2022:

I had checked there as well (https://github.com/geneontology/obographs), but didn't see anything about a parser. In any case, the manager currently doesn't use OBOGraphs when loading JSON files, so it seems we're using one of the Rio parsers whenever we load JSON. We only use the OgJsonGenerator to write OBO graphs JSON.

The OG reader (OgJsonReader) doesn't produce an OWLOntology.

I was curious to see what's going on... Interestingly, if you load an OG JSON file into ROBOT and print out the format:

// Load the ontology
OWLOntology loadedOntology = manager.loadOntologyFromOntologyDocument(source, config);

// Check for unparsed triples - get the document format and then the loader metadata
OWLDocumentFormat f = manager.getOntologyFormat(loadedOntology);
System.out.println(f);

...it tells you it's RDF/XML. Looks like at some point while loading, it's converted:

DEBUG Converting from JSON to RDF
...
RDF/XML Syntax

@jamesaoverton (Member) commented:

I created an issue to discuss auto-detection of formats. @beckyjackson Don't feel obligated to dive even deeper into this -- this is already a great start!

@balhoff (Contributor) left a review comment:

I looked at this code because I wanted to use the same approach in a different project. But I ran into a problem there which I confirmed is also a problem here: if you limit the parsers to a particular set, all the imported ontologies must also use that format. :-(

Try making an ro.ttl and setting input format to ttl. When it tries to import http://purl.obolibrary.org/obo/ro/annotations.owl (RDF/XML), it will fail.
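
Roughly, the failure can be reproduced with a sketch like the one below. The file name is hypothetical, the package paths assume OWLAPI 4.x, and setOntologyParsers stands in for however the PR restricts the parser set; the point is only that imported ontologies are parsed with the same restricted manager.

import java.io.File;
import java.util.Set;

import com.google.common.collect.Sets;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.io.OWLParserFactory;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyCreationException;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.rdf.turtle.parser.TurtleOntologyParserFactory;
import org.semanticweb.owlapi.rio.RioTurtleParserFactory;

public class ImportFormatProblem {
  public static void main(String[] args) throws OWLOntologyCreationException {
    OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
    // Limit the manager to the Turtle parsers, as --input-format ttl would.
    Set<OWLParserFactory> ttlOnly =
        Sets.newHashSet(new RioTurtleParserFactory(), new TurtleOntologyParserFactory());
    manager.setOntologyParsers(ttlOnly);
    // ro.ttl itself parses, but its owl:imports (e.g. annotations.owl, served as
    // RDF/XML) are resolved with the same manager and the same restricted parser
    // set, so loading fails with an unparsable-ontology error.
    OWLOntology ro = manager.loadOntologyFromOntologyDocument(new File("ro.ttl"));
    System.out.println(ro.getOntologyID());
  }
}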

@matentzn (Contributor) commented Dec 1, 2022:

@balhoff There is nothing we can do about it though, right? The only thing we can do is document that behaviour on the docs pages, and that's that...

@balhoff (Contributor) commented Dec 1, 2022:

@matentzn I think you're right, but I'm not sure we can add this feature if it is inherently broken.

@matentzn (Contributor) commented Dec 2, 2022:

I am not going to fight for this feature. I just want to point out that it is only inherently broken in cases where owl:imports are involved. But yeah, I am not sure whether this downside completely invalidates this PR.

@balhoff (Contributor) commented Dec 9, 2022:

I think we might want to go ahead with this if we document the limitation well. I'm running into a problem in another project where I really need to specify the format. It's a huge ontology and running through all the parsers is hitting memory problems.

@jamesaoverton (Member) commented:

I don't have any plans for this, so I'm going to convert to Draft. If someone really wants this, feel free to push it forward.

@jamesaoverton marked this pull request as draft on February 7, 2024.
Successfully merging this pull request may close: Allow specifying input type in robot convert (#1038).