SPARQL
Using the `--sparql-query` option, it is possible to execute custom SPARQL queries during the process and match various entities handled by the analyzers.
Each query is executed at various points during the analysis. The data available to the query depends on the presence of the `--buffered` option: if the option is present, the query operates on the whole graph; if it is absent, only a small section of the data, usually enough to describe a single entity, is used.
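The practical effect of `--buffered` is that a query joining data from several entities may only match when the whole graph is available. The Python sketch below illustrates this with triples as plain tuples. It is not how the tool evaluates SPARQL: the `format` predicate, the entity data, and `ask_shared_format` are all hypothetical, and it assumes that, without `--buffered`, each evaluation sees exactly one entity's description.

```python
from typing import Iterable

# Hypothetical triple representation: (subject, predicate, object).
Triple = tuple[str, str, str]


def ask_shared_format(triples: Iterable[Triple]) -> bool:
    """Mimic an ASK query joining two entities, roughly:
    ASK { ?a :format ?f . ?b :format ?f . FILTER(?a != ?b) }
    """
    first_subject: dict[str, str] = {}  # format value -> first subject seen with it
    for s, p, o in triples:
        if p != "format":
            continue
        if o in first_subject and first_subject[o] != s:
            return True
        first_subject.setdefault(o, s)
    return False


# Two hypothetical entity descriptions, as an unbuffered run might see them.
entity_a = [("file1", "name", "a.png"), ("file1", "format", "image/png")]
entity_b = [("file2", "name", "b.png"), ("file2", "format", "image/png")]

# With --buffered: one evaluation over the whole graph.
buffered = ask_shared_format(entity_a + entity_b)
# Without --buffered: separate evaluations, one per entity description.
unbuffered = any(ask_shared_format(chunk) for chunk in (entity_a, entity_b))

print(buffered, unbuffered)  # True False
```

Neither entity description alone contains both halves of the join, so only the buffered evaluation can answer `true`.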
In the `search` mode, the `--sparql-query` option should point to a `SELECT` or `ASK` query. Each time a query is evaluated, its results are added to an internal store, which is serialized to the output file when the process stops.
Evaluating a query in this mode may also stop the process prematurely if one of these conditions is met:
- The query uses `ASK`, and its result is determined to be `true`.
- The query uses `LIMIT`, and the number of results exceeds the limit. In this case, the process is stopped only if there are no other queries that may yet produce results, such as queries without `LIMIT`.
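For example, the following query checks whether any analyzed entity has the PNG media type; since it uses `ASK`, the search can stop as soon as such an entity is found:

```sparql
PREFIX schema: <http://schema.org/>
ASK WHERE {
  [] schema:encodingFormat <https://w3id.org/uri4uri/mime/image/png> .
}
```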
The following query collects the file names and pixel dimensions of all images:

```sparql
PREFIX nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>
PREFIX nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX schema: <http://schema.org/>
SELECT DISTINCT ?name ?w ?h
WHERE {
  [
    nfo:fileName ?name ;
    nie:interpretedAs/dcterms:hasFormat [
      a schema:ImageObject ;
      nfo:width ?w ;
      nfo:height ?h
    ]
  ] .
}
```
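The stop conditions of the `search` mode can be summarized in a small Python sketch. The `Query` record and `should_stop` function are hypothetical, not part of the tool; the sketch also assumes that an `ASK` query that is still `false`, like a query without `LIMIT`, may yet produce results and therefore keeps the process running.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Query:
    """Hypothetical per-query bookkeeping kept during a search."""
    kind: str                    # "ASK" or "SELECT"
    limit: Optional[int] = None  # value of LIMIT, or None when the query has none
    ask_result: bool = False     # whether an ASK evaluation has returned true
    result_count: int = 0        # SELECT results accumulated so far

    def limit_exceeded(self) -> bool:
        # "Exceeds" is modeled literally as a strict comparison.
        return self.limit is not None and self.result_count > self.limit


def should_stop(queries: list[Query]) -> bool:
    # Condition 1: any ASK query has been answered with true.
    if any(q.kind == "ASK" and q.ask_result for q in queries):
        return True
    # Condition 2: a LIMIT query has exceeded its limit and no other query may
    # still produce results. Queries without LIMIT, and ASK queries that are
    # still false, are assumed to keep the process running.
    return bool(queries) and all(q.limit_exceeded() for q in queries)


queries = [Query("SELECT", limit=2, result_count=3), Query("SELECT")]
print(should_stop(queries))      # False: the second query has no LIMIT
print(should_stop(queries[:1]))  # True: the only query exceeded its LIMIT
print(should_stop([Query("ASK", ask_result=True), Query("SELECT")]))  # True
```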
In the `describe` mode, the `--sparql-query` option should point to a `SELECT` query, which marks the entities that should be extracted if they are backed by binary data. The query should bind a variable `?node`; its value is compared against the node representing the currently analyzed entity, and if the two are equal, the entity is extracted as a file.
The name of the extracted file can be set by binding the `?path_format` variable in the query; its default value is `"${name}${extension}"`. Other properties related to the file may also be substituted in `?path_format`, such as `${media_type}` or `${size}`.
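The following query matches any node that is the subject of a triple, so every entity backed by binary data is extracted into the `extracted` directory under its default name:

```sparql
SELECT ?node ?path_format
WHERE {
  ?node ?p ?o .
  BIND("extracted/${name}${extension}" AS ?path_format)
}
```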
The following query extracts only entities whose image format is 256 by 256 pixels:

```sparql
PREFIX nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX schema: <http://schema.org/>
SELECT DISTINCT ?node
WHERE {
  ?node dcterms:hasFormat [
    a schema:ImageObject ;
    nfo:width 256 ;
    nfo:height 256
  ]
}
```
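The matching and naming rules of the `describe` mode can be sketched in Python as follows. This is an illustration, not the tool's code: query results are assumed to arrive as dictionaries keyed by variable name, the node IRIs and file properties are hypothetical, and Python's `string.Template` is used only because it shares the `${...}` placeholder syntax.

```python
from string import Template
from typing import Optional

# Default value of ?path_format, as stated above.
DEFAULT_PATH_FORMAT = "${name}${extension}"


def extraction_path(current_node: str,
                    rows: list[dict[str, str]],
                    file_info: dict[str, str]) -> Optional[str]:
    """Return the output path for the current entity, or None if no row matches."""
    for row in rows:
        # ?node must equal the node of the currently analyzed entity.
        if row.get("node") != current_node:
            continue
        # Fall back to the default format when ?path_format is unbound.
        path_format = row.get("path_format", DEFAULT_PATH_FORMAT)
        # safe_substitute leaves unknown placeholders untouched instead of raising.
        return Template(path_format).safe_substitute(file_info)
    return None


# Hypothetical query results and file properties.
rows = [
    {"node": "urn:example:file1",
     "path_format": "extracted/${media_type}/${name}${extension}"},
    {"node": "urn:example:file2"},
]
info = {"name": "photo", "extension": ".png",
        "media_type": "image/png", "size": "1024"}

print(extraction_path("urn:example:file1", rows, info))  # extracted/image/png/photo.png
print(extraction_path("urn:example:file2", rows, info))  # photo.png
print(extraction_path("urn:example:other", rows, info))  # None
```

Using `safe_substitute` rather than `substitute` is a choice made here so that an unknown placeholder stays in the path instead of raising an error; how the tool itself treats unknown properties is not specified above.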