Update documentations related to PR 'CVOC : Indexed field accuracy (Ontoportal integration) IQSS#10505'
luddaniel committed May 7, 2024
1 parent 450997c commit 72c1f61
Showing 3 changed files with 22 additions and 10 deletions.
16 changes: 13 additions & 3 deletions doc/release-notes/9276-doc-cvoc-index-in.md
@@ -2,7 +2,17 @@

### Updates on Support for External Vocabulary Services

#### Indexed field accuracy
Multiple extensions of the External Vocabulary mechanism have been added. These extensions allow interaction with services based on the Ontoportal software and are expected to be generally useful for other service types.

For more relevant indexing, you can now map external vocabulary values to a `managed-fields` of a [:CVocConf setting](https://guides.dataverse.org/en/6.3/installation/config.html#cvocconf) by adding the key `indexIn` in `retrieval-filtering`.
For more information, please check [GDCC/dataverse-external-vocab-support documentation](https://github.com/gdcc/dataverse-external-vocab-support/tree/main/docs).
These changes include:

#### Improved Indexing with Compound Fields

When using an external vocabulary service with compound fields, you can now specify which field(s) will include additional indexed information, such as translations of an entry into other languages. This is done by adding the `indexIn` key in `retrieval-filtering`. (#10505)
For more information, please check [GDCC/dataverse-external-vocab-support documentation](https://github.com/gdcc/dataverse-external-vocab-support/tree/main/docs).

#### Broader Support for Indexing Service Responses

Indexing of the results from `retrieval-filtering` responses can now handle additional formats, including JSON arrays of strings and values from arbitrary keys within a JSON object. (#10505)
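
The response shapes listed above can be illustrated with a minimal sketch. It is not the Dataverse implementation: the class `IndexableStrings` and its `collect` method are hypothetical names, and the filtered JSON is modeled with plain `String`, `List` and `Map` values so that the example has no dependency on a JSON library. It assumes the cases described in the updated Javadoc in this commit: a string, an array of strings, an array of objects with a "value" or "content" key, and an object whose entries are strings or arrays of strings.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; names are not taken from the Dataverse code base.
public class IndexableStrings {

    // Collects the strings to index from one filtered value.
    // Anything with an unrecognized structure contributes nothing.
    static Set<String> collect(Object filtered) {
        Set<String> out = new LinkedHashSet<>();
        if (filtered instanceof String) {
            // Case 1: a single string
            out.add((String) filtered);
        } else if (filtered instanceof List) {
            for (Object item : (List<?>) filtered) {
                if (item instanceof String) {
                    // Case 2: an array of strings
                    out.add((String) item);
                } else if (item instanceof Map) {
                    // Case 3: an array of objects carrying "value" or "content"
                    Map<?, ?> entry = (Map<?, ?>) item;
                    Object v = entry.containsKey("value") ? entry.get("value") : entry.get("content");
                    if (v instanceof String) {
                        out.add((String) v);
                    }
                }
            }
        } else if (filtered instanceof Map) {
            // Case 4: an object whose entries are strings or arrays of strings
            for (Object v : ((Map<?, ?>) filtered).values()) {
                if (v instanceof String) {
                    out.add((String) v);
                } else if (v instanceof List) {
                    for (Object s : (List<?>) v) {
                        if (s instanceof String) {
                            out.add((String) s);
                        }
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(List.of("cell", "cellule")));
        System.out.println(collect(List.of(Map.of("lang", "fr", "value", "cellule"))));
    }
}
```

In an actual installation, the resulting strings would be indexed in the term URI field by default, or in the field named by `indexIn` when that key is present.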

**** This documentation must be merged with 9276-allow-flexible-params-in-retrievaluri-cvoc.md (#10404)
6 changes: 4 additions & 2 deletions doc/sphinx-guides/source/admin/metadatacustomization.rst
@@ -555,6 +555,8 @@ Great care must be taken when reloading a metadata block. Matching is done on fi

The ability to reload metadata blocks means that SQL update scripts don't need to be written for these changes. See also the :doc:`/developers/sql-upgrade-scripts` section of the Developer Guide.

.. _using-external-vocabulary-services:

Using External Vocabulary Services
----------------------------------

@@ -580,9 +582,9 @@ In general, the external vocabulary support mechanism may be a better choice for
The specifics of the user interface for entering/selecting a vocabulary term and how that term is then displayed are managed by third-party Javascripts. The initial Javascripts that have been created provide auto-completion, displaying a list of choices that match what the user has typed so far, but other interfaces, such as displaying a tree of options for a hierarchical vocabulary, are possible.
Similarly, existing scripts do relatively simple things for displaying a term - showing the term's name in the appropriate language and providing a link to an external URL with more information, but more sophisticated displays are possible.

Scripts supporting use of vocabularies from services supporting the SKOSMOS protocol (see https://skosmos.org) and retrieving ORCIDs (from https://orcid.org) are available at https://github.com/gdcc/dataverse-external-vocab-support. (Custom scripts can also be used and community members are encouraged to share new scripts through the dataverse-external-vocab-support repository.)
Scripts supporting use of vocabularies from services supporting the SKOSMOS protocol (see https://skosmos.org), retrieving ORCIDs (from https://orcid.org), services based on the Ontoportal product (see https://ontoportal.org/), and using ROR (https://ror.org/) are available at https://github.com/gdcc/dataverse-external-vocab-support. (Custom scripts can also be used and community members are encouraged to share new scripts through the dataverse-external-vocab-support repository.)

Configuration involves specifying which fields are to be mapped, whether free-text entries are allowed, which vocabulary(ies) should be used, what languages those vocabulary(ies) are available in, and several service protocol and service instance specific parameters.
Configuration involves specifying which fields are to be mapped, to which Solr field they should be indexed, whether free-text entries are allowed, which vocabulary(ies) should be used, what languages those vocabulary(ies) are available in, and several service protocol and service instance specific parameters, including the ability to send HTTP headers on calls to the service.
These are all defined in the :ref:`:CVocConf <:CVocConf>` setting as a JSON array. Details about the required elements as well as example JSON arrays are available at https://github.com/gdcc/dataverse-external-vocab-support, along with an example metadata block that can be used for testing.
The scripts required can be hosted locally or retrieved dynamically from https://gdcc.github.io/ (similar to how dataverse-previewers work).

@@ -328,7 +328,7 @@ public Map<Long, JsonObject> getCVocConf(boolean byTermUriField){
logger.warning("Ignoring External Vocabulary setting for non-existent child field: "
+ managedFields.getString(s));
} else {
logger.info("Found: " + dft.getName());
logger.fine("Found: " + dft.getName());
}
}
}
@@ -370,10 +370,10 @@ public void registerExternalVocabValues(DatasetField df) {
/**
* Retrieves indexable strings from a cached externalvocabularyvalue entry filtered through retrieval-filtering configuration.
* <p>
* This method externalvocabularyvalue entries have been filtered and contains a single JsonObject.
* Is handled : Strings, Array of Objects with "lang" and ("value" or "content") keys, Object with Strings as value or Object with Array of Strings as value.
* The string, or the "value/content"s for each language are added to the set.
* This method can retrieve string values to be indexed in term-uri-field (parameter defined in CVOC configuration) or in "indexIn" field (optional parameter of retrieval-filtering defined in CVOC configuration).
* This method assumes externalvocabularyvalue entries have been filtered and that they contain a single JsonObject.
* Cases Handled : A String, an Array of Strings, an Array of Objects with "value" or "content" keys, an Object with one or more entries that have String values or Array values with a set of String values.
* The string(s), or the "value/content"s for each language are added to the set.
* Retrieved string values are indexed in the term-uri-field (parameter defined in CVOC configuration) by default, or in the field specified by an optional "indexIn" parameter in the retrieval-filtering defined in the CVOC configuration.
* <p>
* Any parsing error results in no entries (there can be unfiltered entries with
* unknown structure - getting some strings from such an entry could give fairly