InstallFusekiJenaText

joelit edited this page Mar 12, 2020 · 19 revisions

Installing Fuseki with the jena-text extension

Introduction

Jena Fuseki is a SPARQL server and triple store, and it is the recommended backend for Skosmos. Its jena-text extension provides faster text search. Fuseki 1.0.1 or later is recommended because it supports graph-specific indexing; Skosmos 1.4 and above require Fuseki 1.3.0+ or 2.3.0+. NOTE: Fuseki 1.3.1 and 2.3.1 have a bug that affects Skosmos, so they are not recommended.

Download

You will need the Fuseki distribution archive (jena-fuseki-*-distribution.tar.gz).

Installation

  • Unpack the Fuseki distribution: tar xzf jena-fuseki-*-distribution.tar.gz
  • Change into the unpacked directory: cd jena-fuseki-*-SNAPSHOT (the directory name depends on the version you unpacked)

If all went well, you can test Fuseki by running ./fuseki-server --mem /ds, which starts the server with an empty in-memory dataset named /ds.
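
To confirm that the test server is answering, you can send it a trivial SPARQL query. The following is a minimal sketch in Python, not part of Fuseki or Skosmos. It assumes that Fuseki listens on its default port 3030 and that the dataset started with --mem /ds exposes a query endpoint at /ds/query; the function names are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumptions: Fuseki listens on its default port 3030 and was started
# with "--mem /ds", so the query endpoint is expected at /ds/query.
FUSEKI_BASE = "http://localhost:3030"


def build_ask_request(dataset="/ds", base=FUSEKI_BASE):
    """Build a GET request that sends a trivial ASK query to the dataset."""
    params = urlencode({"query": "ASK { }"})
    url = f"{base}{dataset}/query?{params}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def fuseki_is_up(dataset="/ds", base=FUSEKI_BASE, timeout=5):
    """Return True if the endpoint answers the ASK query with HTTP 200."""
    try:
        with urlopen(build_ask_request(dataset, base), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and connection errors are OSError subclasses
        return False


if __name__ == "__main__":
    print("Fuseki is up" if fuseki_is_up() else "Fuseki is not reachable")
```

If the script reports that Fuseki is not reachable, check that the server is still running in another terminal and that the port and dataset name match your setup.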

Configuration

To use the index, you will need to run Fuseki with a configuration file, such as the one below. The example is based on the jena-text example configuration but has the following edits:

  • add a graph index
  • index properties skos:prefLabel, skos:altLabel, skos:hiddenLabel instead of rdfs:label
  • set TDB location to /tmp/tdb (change this to where you want to keep the TDB store)
  • set Lucene index location to /tmp/lucene (change this to where you want to keep the Lucene index)

## Example of a TDB dataset and text index published using Fuseki

@prefix :        <#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .


[] rdf:type fuseki:Server ;
   fuseki:services (
     <#service_text_tdb>
   ) .

# TDB
[] ja:loadClass "org.apache.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
#text:TextIndexSolr    rdfs:subClassOf   text:TextIndex .
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

## ---------------------------------------------------------------

<#service_text_tdb> rdf:type fuseki:Service ;
    rdfs:label                      "TDB/text service" ;
    fuseki:name                     "ds" ;
    fuseki:serviceQuery             "query" ;
    fuseki:serviceQuery             "sparql" ;
    fuseki:serviceUpdate            "update" ;
    fuseki:serviceUpload            "upload" ;
    fuseki:serviceReadGraphStore    "get" ;
    fuseki:serviceReadWriteGraphStore    "data" ;
    fuseki:dataset                  <#text_dataset> ;
    .

<#text_dataset> rdf:type     text:TextDataset ;
    text:dataset   <#dataset> ;
    ##text:index   <#indexSolr> ;
    text:index     <#indexLucene> ;
    .

<#dataset> rdf:type      tdb:DatasetTDB ;
    tdb:location "/tmp/tdb" ;
    tdb:unionDefaultGraph true ;
    .

<#indexSolr> a text:TextIndexSolr ;
    #text:server <http://localhost:8983/solr/COLLECTION> ;
    text:server <embedded:SolrARQ> ;
    text:entityMap <#entMap> ;
    .

<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:/tmp/lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    text:storeValues true ; ## required for Skosmos 1.4
    .

# Text index configuration for Skosmos 1.4 and above (requires Fuseki 1.3.0+ or 2.3.0+)
<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:graphField       "graph" ; ## enable graph-specific indexing
    text:defaultField     "pref" ; ## Must be defined in the text:map
    text:uidField         "uid" ; ## recommended for Skosmos 1.4+
    text:langField        "lang" ; ## required for Skosmos 1.4
    text:map (
         # skos:prefLabel
         [ text:field "pref" ;
           text:predicate skos:prefLabel ;
           text:analyzer [ a text:LowerCaseKeywordAnalyzer ]
         ]
         # skos:altLabel
         [ text:field "alt" ;
           text:predicate skos:altLabel ;
           text:analyzer [ a text:LowerCaseKeywordAnalyzer ]
         ]
         # skos:hiddenLabel
         [ text:field "hidden" ;
           text:predicate skos:hiddenLabel ;
           text:analyzer [ a text:LowerCaseKeywordAnalyzer ]
         ]
         # skos:notation
         [ text:field "notation" ;
           text:predicate skos:notation ;
           text:analyzer [ a text:LowerCaseKeywordAnalyzer ]
         ]
     ) . 

Save this as jena-text-config.ttl. You can then run Fuseki with ./fuseki-server --config jena-text-config.ttl

To make Fuseki use this configuration file by default, add the following line to /etc/environment: FUSEKI_CONF="/actual/full/path/goes/here/jena-text-config.ttl"
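
Once Fuseki runs with a jena-text configuration, the index is queried through the text:query property function. The following Python sketch only builds such a query and the matching request URL; it does not show how Skosmos itself queries the index. It assumes Fuseki's default port 3030 and the service name "ds" with the "sparql" query endpoint from the example configuration. The function names are hypothetical, and the search term is restricted to a single alphanumeric word so that no Lucene query escaping is needed.

```python
from urllib.parse import urlencode

# Assumption: Fuseki runs on its default port 3030 with the example
# configuration, where fuseki:name is "ds" and "sparql" is a query service.
ENDPOINT = "http://localhost:3030/ds/sparql"

QUERY_TEMPLATE = """\
PREFIX text: <http://jena.apache.org/text#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {{
  ?concept text:query (skos:prefLabel "{lucene_query}" {limit}) .
  ?concept skos:prefLabel ?label .
}}
"""


def build_prefix_search(term, limit=10):
    """Return a SPARQL query that finds concepts whose prefLabel starts with term."""
    # Restrict input to one alphanumeric word: this sketch does no
    # Lucene or SPARQL escaping.
    if not term.isalnum():
        raise ValueError("term must be a single alphanumeric word")
    # Lowercase the term to match the lowercased index; "*" makes it a prefix query.
    return QUERY_TEMPLATE.format(lucene_query=term.lower() + "*", limit=int(limit))


def build_search_url(term, limit=10, endpoint=ENDPOINT):
    """Return a GET URL that submits the prefix search to the query endpoint."""
    return endpoint + "?" + urlencode({"query": build_prefix_search(term, limit)})


if __name__ == "__main__":
    print(build_prefix_search("Cat"))
```

The URL from build_search_url can be opened with any HTTP client. An empty result for a term that you know exists in your data may mean that the data was loaded before the index was configured.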

Alternative configurations

The above configuration is the suggested starting point but other configurations can be used as well, for example when other applications are also using the jena-text index. Skosmos (version 1.4+) has the following requirements for the jena-text configuration:

  1. The properties skos:prefLabel, skos:altLabel and skos:hiddenLabel MUST be indexed. They SHOULD be configured with different field names. Other properties SHOULD NOT be configured to share the same field name with these properties.
  2. Alternative analyzer configurations can be used instead of LowerCaseKeywordAnalyzer, but the analyzer MUST be case-insensitive.
  3. The analyzer configuration SHOULD be the same for all SKOS properties (prefLabel, altLabel and hiddenLabel).
  4. text:storeValues MUST be true.
  5. text:langField MUST be set to a unique field name.
  6. text:graphField SHOULD be set to a unique field name.
  7. text:uidField SHOULD be set to a unique field name.
  8. The text:defaultField setting is not used by Skosmos but jena-text itself requires that it MUST be set to one of the configured field names.

In the above requirements, "MUST", "SHOULD" and "SHOULD NOT" are to be interpreted according to RFC 2119. In practice, the performance of Skosmos may not be optimal if the "SHOULD" and "SHOULD NOT" requirements are not followed.
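
The requirements above can be sketched as a small checker. This is an illustration only: it works on a hand-written Python dictionary (a hypothetical representation, not something jena-text or Skosmos reads) and does not parse the Turtle file. Requirement 2 (case-insensitive analyzers) is not checked, because it cannot be decided from an analyzer name alone. Violated MUST requirements are reported as errors; violated SHOULD and SHOULD NOT requirements are reported as warnings.

```python
SKOS_LABELS = ("skos:prefLabel", "skos:altLabel", "skos:hiddenLabel")


def check_config(cfg):
    """Return (errors, warnings) for a dict describing a jena-text setup."""
    errors, warnings = [], []
    fields = cfg.get("map", {})  # predicate -> {"field": ..., "analyzer": ...}
    field_names = [m["field"] for m in fields.values()]

    # Requirement 1: label properties indexed, each with its own field name
    for pred in SKOS_LABELS:
        if pred not in fields:
            errors.append(f"{pred} is not indexed")
        elif field_names.count(fields[pred]["field"]) > 1:
            warnings.append(f"{pred} shares its field name with another property")

    # Requirement 3: same analyzer for all SKOS label properties
    analyzers = {fields[p]["analyzer"] for p in SKOS_LABELS if p in fields}
    if len(analyzers) > 1:
        warnings.append("SKOS label properties use different analyzers")

    # Requirement 4
    if cfg.get("storeValues") is not True:
        errors.append("text:storeValues is not true")

    # Requirements 5-7: special fields are set and unique
    special = {"langField": errors, "graphField": warnings, "uidField": warnings}
    for key, bucket in special.items():
        name = cfg.get(key)
        others = field_names + [cfg.get("entityField")]
        others += [cfg.get(k) for k in special if k != key]
        if not name:
            bucket.append(f"text:{key} is not set")
        elif name in others:
            bucket.append(f"text:{key} is not unique")

    # Requirement 8
    if cfg.get("defaultField") not in field_names:
        errors.append("text:defaultField is not a configured field name")

    return errors, warnings


# The example configuration from this page, written out by hand.
example = {
    "entityField": "uri",
    "graphField": "graph",
    "defaultField": "pref",
    "uidField": "uid",
    "langField": "lang",
    "storeValues": True,
    "map": {
        "skos:prefLabel": {"field": "pref", "analyzer": "LowerCaseKeywordAnalyzer"},
        "skos:altLabel": {"field": "alt", "analyzer": "LowerCaseKeywordAnalyzer"},
        "skos:hiddenLabel": {"field": "hidden", "analyzer": "LowerCaseKeywordAnalyzer"},
        "skos:notation": {"field": "notation", "analyzer": "LowerCaseKeywordAnalyzer"},
    },
}

if __name__ == "__main__":
    print(check_config(example))
```

For the example configuration both lists come back empty. Setting storeValues to False or dropping langField produces errors, while dropping graphField or uidField only produces warnings.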

Tuning

See FusekiTuning for tips on tuning Fuseki for production use.