2024 2025 roadmap

Bob Dionne edited this page Sep 23, 2024 · 4 revisions

2024-2025 Roadmap for NCI Protege

(Based on discussions between Mark and Bob)

Introduction

The roadmap this year is to some extent a continuation of the work from last year, some of which was descoped due to lack of resources.

After discussions with the modelers, it's been decided that configurability of the EditTab customizations at NCI is not really needed and not worth the added complexity to the codebase. So we will back those changes out this year as part of a cleanup and refactoring of EditTab.

The major focus this year is to solidify all the recent changes, clean up the dependencies, continue the scalability work to support the ever-growing thesaurus, and pay off some of the technical debt that has built up over the years. Some of the code is over 20 years old, and a good amount of it was built to implement features that are now part of the Java platform.

  • increase the scalability of the system to allow terminologies of any size by not requiring the entire content to be in memory at once
  • enhance the Protege server to support an RDF triple store and SPARQL queries over it, keeping the store in sync with the terminology as it's edited
  • back out last year's changes to EditTab. Refactor EditTab to simplify it and make it more robust.
  • clean up dependencies and technical debt in the code
  • develop a business rule capability that enables modelers to encode business rules using the ontology they are building. This will enable better, more flexible, and more auditable workflows amongst modelers.

Client/Server

EditTab

EditTab is the main plugin that provides support for the many complex operations used in terminology editing. Many of the operations, e.g. retirements, depend on existing properties in the ontology that are used for metaproject annotations. When a class is retired, it's re-treed under a distinguished branch. If it was the result of a merge, the original source is tracked with an annotation property. Additionally, certain meta-level properties are used to support the various NCI-specific business rules.

Originally it was thought that the use of properties this way should be configurable, and built by managers in order to define new projects. However, feedback from the modelers indicated this was not really desired, and the resulting code added too much complexity. So this year we will refactor this and remove the feature, while still keeping the enhancements that support missing projects and generally make EditTab more robust when the project config from the server is broken.

This will also require some small changes to the metaproject admin application, to remove the dialog option of adding a new config file.

Technical Debt

Sparql Query

This plugin has proven itself to be very useful for curating and repairing the terminology because it allows for ad hoc queries. Last year, we provided a new version that no longer sits in main memory. Rather, it communicates with a Virtuoso triple store that has been loaded with the ontology. When modelers make edits to annotation properties, the triple store on the Protege server is updated as well. However, this only works for annotation properties.

This year we need to extend this in order to:

  • incorporate all edits, both to annotation properties and to logical definitions
  • refresh the triple store with a new copy when the changesets have been squashed
  • begin to assess supporting the protege-owl layer directly from the triple store

For this last item, we prototyped a small navigator widget last year that ran directly against Virtuoso. The results were encouraging, and the performance was good.
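
As a rough illustration of what keeping the triple store in sync with an annotation edit could involve, the sketch below renders a single changed annotation value as a SPARQL 1.1 Update string. The class name, method names, and IRIs are hypothetical and are not taken from the NCI Protege codebase. Sending the request to Virtuoso is deliberately left out, and the sketch assumes values are stored as plain string literals.

```java
import java.util.Objects;

/** Hypothetical helper: renders one annotation edit as a SPARQL 1.1 Update string. */
final class AnnotationEditToSparql {

    private AnnotationEditToSparql() {}

    /** Escapes a Java string so it can be written as a SPARQL string literal. */
    static String literal(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Wraps an IRI in angle brackets, rejecting characters that cannot appear inside them. */
    static String iri(String value) {
        Objects.requireNonNull(value, "iri");
        for (char c : value.toCharArray()) {
            if (c == '<' || c == '>' || c == '"' || c == '\\' || Character.isWhitespace(c)) {
                throw new IllegalArgumentException("Illegal character in IRI: " + value);
            }
        }
        return "<" + value + ">";
    }

    /** Builds an update that removes the old value and adds the new one in a single request. */
    static String replaceValue(String subjectIri, String propertyIri,
                               String oldValue, String newValue) {
        String s = iri(subjectIri);
        String p = iri(propertyIri);
        return "DELETE DATA { " + s + " " + p + " " + literal(oldValue) + " } ;\n"
             + "INSERT DATA { " + s + " " + p + " " + literal(newValue) + " }";
    }
}
```

Calling `AnnotationEditToSparql.replaceValue(...)` with a class IRI, a property IRI, and the old and new values yields a two-operation update. Extending the same idea to logical definitions would be more involved, since a single axiom edit can map to several triples.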

Business Rules

The editing rules currently enforced by EditTab are considerable. Many of these have influenced the UI design and the overall workflow in EditTab. It's been a long-standing desire to come up with a more generic solution and refactor these business rules.

One approach would be to define a few interfaces that allow EditTab to call out to third-party code at key points in the process, e.g. before committing edits, passing the class or classes in question and some context. This plugin approach was used in earlier systems (TDE) with some success. Another approach would be to use a declarative framework based on the OWL language, so that the constraints would be encoded in the ontology. SHACL is one candidate to consider.
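
To make the first approach more concrete, here is a minimal sketch of what such a call-out interface might look like. All of the type and method names (`EditRule`, `EditContext`, `RuleRunner`, `RequiredAnnotationRule`) are hypothetical, as is the property name used in the usage note, and classes are represented by plain strings rather than OWL API objects to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical context handed to each rule: the operation and the classes being edited. */
final class EditContext {
    final String operation;                                    // e.g. "commit" or "retire"
    final Map<String, Map<String, String>> annotationsByClass; // class id -> property -> value

    EditContext(String operation, Map<String, Map<String, String>> annotationsByClass) {
        this.operation = operation;
        this.annotationsByClass = annotationsByClass;
    }
}

/** Hypothetical extension point that EditTab could call before committing edits. */
interface EditRule {
    /** Returns human-readable violations; an empty list means the edit is acceptable. */
    List<String> check(EditContext context);
}

/** Collects registered rules and runs them all against one edit. */
final class RuleRunner {
    private final List<EditRule> rules = new ArrayList<>();

    void register(EditRule rule) {
        rules.add(rule);
    }

    List<String> validate(EditContext context) {
        List<String> violations = new ArrayList<>();
        for (EditRule rule : rules) {
            violations.addAll(rule.check(context));
        }
        return violations;
    }
}

/** Example rule: every edited class must carry a non-blank value for one property. */
final class RequiredAnnotationRule implements EditRule {
    private final String property;

    RequiredAnnotationRule(String property) {
        this.property = property;
    }

    @Override
    public List<String> check(EditContext context) {
        List<String> violations = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : context.annotationsByClass.entrySet()) {
            String value = e.getValue().get(property);
            if (value == null || value.trim().isEmpty()) {
                violations.add(e.getKey() + " is missing required property " + property);
            }
        }
        return violations;
    }
}
```

In this sketch, a caller would register rules once, for instance `runner.register(new RequiredAnnotationRule("Preferred_Name"))`, and then block the commit whenever `runner.validate(context)` returns a non-empty list. Returning violations as data rather than throwing keeps the rules independent of the UI, which is one possible way to loosen the coupling described above.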

Scalability

Several different approaches have been prototyped and researched, which will be described here: