Add --input-format option #1056

Draft: wants to merge 5 commits into master
Conversation

@beckyjackson (Contributor) commented Sep 30, 2022:

Resolves #1038

  • docs/ have been added/updated
  • tests have been added/updated
  • mvn verify says all tests pass
  • mvn site says all JavaDocs correct
  • CHANGELOG.md has been updated

Creates a new global --input-format option which accepts one of the valid formats (owl, obo, owx, ofn, omn, ttl, or json) to specify which parser(s) to use when loading the ontology.

I would prefer to store the manager in IOHelper itself instead of passing it as a parameter to loadOntology, but that would require a large refactoring of the IOHelper class and could introduce breaking changes. On the other hand, I don't want to build up tech debt, and this approach introduces several more loadOntology overloads. @jamesaoverton, what do you think?

One last note: there were two test commands in the docs that were overwriting the example files. I fixed these so that they write to the results folder instead (I had to remove an axiom to get the reason test to work).
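
For reviewers, the core mechanism boils down to something like the sketch below. This is not the actual IOHelper code from this PR; it assumes OWLAPI 4's OWLOntologyManager.setOntologyParsers, and InputFormatSketch, load, and parsersForFormat are placeholder names for illustration only.

import java.io.File;
import java.util.Set;

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.io.OWLParserFactory;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyCreationException;
import org.semanticweb.owlapi.model.OWLOntologyManager;

public class InputFormatSketch {

  // Load a single ontology document using only the parser factories mapped to
  // the requested --input-format value. "parsersForFormat" stands in for the
  // format-to-parser mapping added in this PR (e.g. ttl maps to the Turtle factories).
  static OWLOntology load(File file, Set<OWLParserFactory> parsersForFormat)
      throws OWLOntologyCreationException {
    OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
    // Replace the manager's default parser list so only these parsers are tried.
    manager.setOntologyParsers(parsersForFormat);
    return manager.loadOntologyFromOntologyDocument(file);
  }
}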

@jamesaoverton (Member) commented:

I think this addresses the use case in the issue. Thanks!

I looked into the suggestion about storing the manager in the IOHelper instance, but our JavaDocs say @return a new ontology object, with a new OWLManager, so I think we can't do that.

@cthoyt commented Oct 1, 2022:

@beckyjackson I am very happy you have been able to work on this - from a user perspective this will solve many of my debugging problems. Thank you :)

@matentzn (Contributor) left a review comment:

Quite some changes to the core loading process. I went through it line by line, but remind me to do some extensive testing before the next release.

case "owx":
return Sets.newHashSet(new OWLXMLParserFactory());
case "owl":
return Sets.newHashSet(new RDFXMLParserFactory(), new RioRDFXMLParserFactory());
Contributor:

Why do we need to add the Rio parser here? Shouldn't we be more strict?

Contributor Author:

I don't have strong feelings one way or the other. I figured the Rio parser would be used anyway if RDF/XML parsing failed without --input-format. @jamesaoverton, do you have a preference for excluding the Rio parsers?

case "ttl":
return Sets.newHashSet(new RioTurtleParserFactory(), new TurtleOntologyParserFactory());
case "json":
return Sets.newHashSet(new RioJsonParserFactory(), new RioJsonLDParserFactory());
Contributor:

This should be the OBOGraphs parser (stand-alone) or none at all, I think.

TBH, I think the --input-format options should mirror the -f option exactly. Why would we care about RioJson and not about other, more outlandish parsers like KRSS? I would want parsing to fail if the exact parser I specify can't handle the file, not to succeed because the second JSON parser in the list happened to accept it thanks to its crappy implementation.

Member:

I was thinking that we should just divide up the parsers that OWLAPI currently uses, but you're right that if we're adding a new option then we don't have to be constrained in that way.

Contributor:

The way you do it for remove is IMO the best. We use the same extension-to-parser mappings here as in -f, but, if we ever want to (I don't), we could add more switches later for more complex combinations: any-owl for ofn, owx, and rdfxml; KRSSParser for krss; etc. (analogous to the axiom-type options in --axioms). I would not bother here; the -f mappings should be by far sufficient.

Contributor Author:

I spent quite a bit of time digging around OWLAPI and could not find anything for OBOGraphs. Here's a full list of the parsers in a default manager (OWLManager.createOWLOntologyManager()):

  • RDFXMLParserFactory
  • OWLXMLParserFactory
  • OWLFunctionalSyntaxOWLParserFactory
  • RioTurtleParserFactory
  • ManchesterOWLSyntaxOntologyParserFactory
  • RioNQuadsParserFactory
  • RioJsonParserFactory
  • RioNTriplesParserFactory
  • RioTrigParserFactory
  • RioBinaryRdfParserFactory
  • RioJsonLDParserFactory
  • RioN3ParserFactory
  • RioRDFXMLParserFactory
  • RioTrixParserFactory
  • TurtleOntologyParserFactory
  • OBOFormatOWLAPIParserFactory
  • KRSS2OWLParserFactory
  • RioRDFaParserFactory

Unless I'm missing something, I only see the Rio JSON parsers.
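
(For reference, that list can be reproduced with a few lines against a default manager. This is just a throwaway sketch, and PrintParsers is not part of the PR; the printed order may differ.)

import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.io.OWLParserFactory;

public class PrintParsers {
  public static void main(String[] args) {
    // Iterate over the parser factories registered on a default OWLAPI manager
    // and print their class names; this produces the list above.
    for (OWLParserFactory factory : OWLManager.createOWLOntologyManager().getOntologyParsers()) {
      System.out.println(factory.getClass().getSimpleName());
    }
  }
}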

Contributor:

The OBOGraphs parser is not in OWLAPI. If it is anywhere, it is in the obographs package.

@beckyjackson (Contributor Author) commented Oct 2, 2022:

I had checked there as well (https://github.com/geneontology/obographs), but didn't see anything about a parser. In any case, the manager currently doesn't use OBOGraphs when loading JSON files, so it seems we're using one of the Rio parsers whenever we load JSON. We only use the OgJsonGenerator to write OBO graphs JSON.

The OG reader (OgJsonReader) doesn't produce an OWLOntology.

I was curious to see what's going on... Interestingly, if you load an OG JSON file into ROBOT and print out the format:

// Load the ontology
OWLOntology loadedOntology = manager.loadOntologyFromOntologyDocument(source, config);

// Check for unparsed triples - get the document format and then the loader metadata
OWLDocumentFormat f = manager.getOntologyFormat(loadedOntology);
System.out.println(f);

...it tells you it's RDF/XML. Looks like at some point while loading, it's converted:

DEBUG Converting from JSON to RDF
...
RDF/XML Syntax

@jamesaoverton (Member) commented:

I created an issue to discuss auto-detection of formats. @beckyjackson Don't feel obligated to dive even deeper into this -- this is already a great start!

@balhoff (Contributor) left a review comment:

I looked at this code because I wanted to use the same approach in a different project. But I ran into a problem there which I confirmed is also a problem here: if you limit the parsers to a particular set, all the imported ontologies must also use that format. :-(

Try making an ro.ttl and setting input format to ttl. When it tries to import http://purl.obolibrary.org/obo/ro/annotations.owl (RDF/XML), it will fail.
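
Roughly, the failure can be reproduced with a sketch like the one below. The file name is hypothetical, the package paths assume OWLAPI 4.x, and setOntologyParsers stands in for however the PR restricts the parser set; the point is only that imported ontologies are parsed with the same restricted manager.

import java.io.File;
import java.util.Set;

import com.google.common.collect.Sets;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.io.OWLParserFactory;
import org.semanticweb.owlapi.model.OWLOntology;
import org.semanticweb.owlapi.model.OWLOntologyCreationException;
import org.semanticweb.owlapi.model.OWLOntologyManager;
import org.semanticweb.owlapi.rdf.turtle.parser.TurtleOntologyParserFactory;
import org.semanticweb.owlapi.rio.RioTurtleParserFactory;

public class ImportFormatProblem {
  public static void main(String[] args) throws OWLOntologyCreationException {
    OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
    // Limit the manager to the Turtle parsers, as --input-format ttl would.
    Set<OWLParserFactory> ttlOnly =
        Sets.newHashSet(new RioTurtleParserFactory(), new TurtleOntologyParserFactory());
    manager.setOntologyParsers(ttlOnly);
    // ro.ttl itself parses, but its owl:imports (e.g. annotations.owl, served as
    // RDF/XML) are resolved with the same manager and the same restricted parser
    // set, so loading fails with an unparsable-ontology error.
    OWLOntology ro = manager.loadOntologyFromOntologyDocument(new File("ro.ttl"));
    System.out.println(ro.getOntologyID());
  }
}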

@matentzn (Contributor) commented Dec 1, 2022:

@balhoff There is nothing we can do about it though, right? The only thing we can do is document that behaviour on the docs pages, and that's that...

@balhoff (Contributor) commented Dec 1, 2022:

@matentzn I think you're right, but I'm not sure we can add this feature if it is inherently broken.

@matentzn (Contributor) commented Dec 2, 2022:

I am not going to fight for this feature. I just want to point out that it is only inherently broken in cases where owl:imports are involved. But yeah, I am not sure whether this downside completely invalidates this PR.

@balhoff (Contributor) commented Dec 9, 2022:

I think we might want to go ahead with this if we document the limitation well. I'm running into a problem in another project where I really need to specify the format. It's a huge ontology and running through all the parsers is hitting memory problems.

@jamesaoverton (Member) commented:

I don't have any plans for this, so I'm going to convert to Draft. If someone really wants this, feel free to push it forward.

@jamesaoverton marked this pull request as draft on February 7, 2024.
Successfully merging this pull request may close: Allow specifying input type in robot convert (#1038).