This document describes the SoDA API. The SoDA API uses JSON over HTTP and is thus language agnostic. A SoDA request is built as a JSON document and is sent to the SoDA JSON endpoint over HTTP POST. SoDA responds with another JSON document indicating success or failure.
- Index
- Add Lexicon Entries
- Delete Lexicon (Entries)
- Annotate
- List Lexicons
- Coverage Info
- Lookup Dictionary Entry
- Reverse Lookup Phrase against Dictionary
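The request and response cycle described above can be sketched with Python's standard library. This is a minimal sketch and not the official sodaclient: the helper names `build_request` and `is_ok` are hypothetical, and it assumes that every endpoint accepts a JSON body over POST and signals success with a top-level `status` of `"ok"`, as the examples in this document do.

```python
import json
import urllib.request


def build_request(base_url, endpoint, payload=None):
    """Build (but do not send) a JSON POST request for a SoDA endpoint."""
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{endpoint}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_ok(response_text):
    """Return True if a SoDA JSON response reports status "ok"."""
    return json.loads(response_text).get("status") == "ok"


req = build_request(
    "http://host:port/soda",
    "annot.json",
    {"lexicon": "countries", "text": "China", "matching": "exact"},
)
# Sending is left commented out because host:port is a placeholder:
# with urllib.request.urlopen(req) as resp:
#     print(is_ok(resp.read().decode("utf-8")))
```

Keeping request construction separate from sending makes the payload easy to inspect before it reaches the server.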
The following sections provide details about each of the endpoints.
Returns a JSON message with status "ok". It is meant for testing whether the SoDA web component is alive.
URL http://host:port/soda/index.json
INPUT
None
OUTPUT
{
"status": "ok",
"message": "SoDA accepting requests (Solr version 7.3.1)"
}
EXAMPLE PYTHON CLIENT
import sodaclient
client = sodaclient.SodaClient("http://host:port/soda")
resp = client.index()
EXAMPLE SCALA CLIENT
import com.elsevier.soda.messages._
import com.elsevier.soda.SodaClient
val client = new SodaClient("http://host:port/soda")
val resp: IndexResponse = client.index()
Adds new entries to a named Lexicon.
URL http://host:port/soda/add.json
INPUT
{
"lexicon" : "countries",
"id" : "http://www.geonames.org/CHN",
"names" : ["China", "Chine", "CHN"],
"commit" : true
}
The id value we have chosen to use is the RDF URI of the entity as reported in the imported lexicon. The names are the strings to match for that entity; a single entry can have multiple names. The commit field is optional; if it is omitted, each addition operation results in a commit, which is inefficient. It is better to commit at regular intervals and once at the end. To send a commit-only request to the add.json endpoint, omit the id and names entries, like this:
{
"lexicon" : "countries",
"commit" : true
}
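The commit guidance above can be sketched as a payload generator. This is only an illustration under stated assumptions: `build_add_requests` and `commit_every` are hypothetical names, the `example.org` IDs are made up, and the sketch assumes that sending `"commit": false` defers the commit until a later commit-only request.

```python
def build_add_requests(lexicon, entries, commit_every=1000):
    """Yield add.json payloads with periodic commits and a final commit.

    `entries` is an iterable of (id, names) pairs.
    """
    pending = 0
    for entry_id, names in entries:
        # Assumption: "commit": False postpones the commit for this entry.
        yield {"lexicon": lexicon, "id": entry_id, "names": names, "commit": False}
        pending += 1
        if pending == commit_every:
            # Commit-only request: no id and no names.
            yield {"lexicon": lexicon, "commit": True}
            pending = 0
    if pending > 0:
        # Final commit for entries added since the last periodic commit.
        yield {"lexicon": lexicon, "commit": True}


entries = [
    ("http://www.geonames.org/CHN", ["China", "Chine", "CHN"]),
    ("http://example.org/A", ["Alpha"]),  # made-up entry for illustration
    ("http://example.org/B", ["Beta"]),   # made-up entry for illustration
]
payloads = list(build_add_requests("countries", entries, commit_every=2))
```

Each payload would then be posted to add.json in order.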
OUTPUT
{
"status": "ok",
"payload": {
"lexicon" : "countries",
"id" : "http://www.geonames.org/CHN",
"names" : ["China", "Chine", "CHN"],
"commit" : true
}
}
EXAMPLE PYTHON CLIENT
import sodaclient
client = sodaclient.SodaClient("http://host:port/soda")
resp = client.add(lexicon_name, id, names, commit)
EXAMPLE SCALA CLIENT
import com.elsevier.soda.messages._
import com.elsevier.soda.SodaClient
val client = new SodaClient("http://host:port/soda")
val resp: AddResponse = client.add(lexiconName, id, names, commit)
A single SoDA index can contain entries from multiple lexicons. This operation deletes all entries in a lexicon, or only the entry with the given ID if an id is specified in the request.
URL http://host:port/soda/delete.json
INPUT
{
"lexicon" : "countries",
"id": "http://www.geonames.org/CHN"
}
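The two forms of the delete request can be sketched as follows. The helper name `build_delete_payload` is hypothetical, and the sketch assumes that leaving out the `id` field is how a whole lexicon is selected for deletion, as the description above suggests.

```python
def build_delete_payload(lexicon, entry_id=None):
    """Build a delete.json payload for a whole lexicon or a single entry."""
    payload = {"lexicon": lexicon}
    if entry_id is not None:
        # Restrict the deletion to one entry.
        payload["id"] = entry_id
    return payload


delete_all = build_delete_payload("countries")
delete_one = build_delete_payload("countries", "http://www.geonames.org/CHN")
```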
OUTPUT
{
"status": "ok",
"payload": {
"lexicon" : "countries",
"id": "http://www.geonames.org/CHN"
}
}
EXAMPLE PYTHON CLIENT
import sodaclient
client = sodaclient.SodaClient("http://host:port/soda")
resp = client.delete(lexicon_name, optional_id)
EXAMPLE SCALA CLIENT
import com.elsevier.soda.messages._
import com.elsevier.soda.SodaClient
val client = new SodaClient("http://host:port/soda")
val resp: DeleteResponse = client.delete(lexiconName, optionalID)
Annotates text against a specified lexicon and match type. Match type can be one of the following.
- exact - matches text against the FST maintained in Solr by SolrTextTagger. This will match segments in text that are identical to a dictionary entry.
- lower - same as exact, but matches are now case insensitive.
- stop - same as lower, but with standard English stopwords removed.
- stem1 - same as stop, but with the Solr Minimal English stemmer applied.
- stem2 - same as stop, but with the KStem stemmer applied.
- stem3 - same as stop, but with the Porter stemmer applied.
URL http://host:port/soda/annot.json
INPUT
{
"lexicon" : "countries",
"text" : "Institute of Clean Coal Technology, East China University of Science and Technology, Shanghai 200237, China",
"matching" : "exact"
}
OUTPUT
{
"status": "ok",
"annotations": [
{
"id" : "http://www.geonames.org/CHN",
"lexicon" : "countries",
"begin" : 41,
"end" : 46,
"coveredText" : "China",
"confidence" : "1.0"
},
{
"id" : "http://www.geonames.org/CHN",
"lexicon" : "countries",
"begin" : 102,
"end" : 107,
"coveredText" : "China",
"confidence" : "1.0"
}
]
}
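The sample output is consistent with `begin` and `end` being zero-based character offsets into the input text, with `end` exclusive. The sketch below checks that reading against the sample and uses the offsets to mark up the text. The helper names `offsets_match` and `highlight` are hypothetical, and `highlight` assumes the annotations do not overlap.

```python
text = ("Institute of Clean Coal Technology, East China University of "
        "Science and Technology, Shanghai 200237, China")

annotations = [
    {"id": "http://www.geonames.org/CHN", "lexicon": "countries",
     "begin": 41, "end": 46, "coveredText": "China", "confidence": "1.0"},
    {"id": "http://www.geonames.org/CHN", "lexicon": "countries",
     "begin": 102, "end": 107, "coveredText": "China", "confidence": "1.0"},
]


def offsets_match(text, annotations):
    """Check that each [begin, end) slice equals the reported coveredText."""
    return all(text[a["begin"]:a["end"]] == a["coveredText"] for a in annotations)


def highlight(text, annotations, open_mark="[", close_mark="]"):
    """Wrap each annotated span in markers (assumes non-overlapping spans)."""
    out, cursor = [], 0
    for a in sorted(annotations, key=lambda a: a["begin"]):
        out.append(text[cursor:a["begin"]])
        out.append(open_mark + text[a["begin"]:a["end"]] + close_mark)
        cursor = a["end"]
    out.append(text[cursor:])
    return "".join(out)


marked = highlight(text, annotations)
```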
EXAMPLE PYTHON CLIENT
import sodaclient
client = sodaclient.SodaClient("http://host:port/soda")
resp = client.annot(lexicon_name, text, matching)
EXAMPLE SCALA CLIENT
import com.elsevier.soda.messages._
import com.elsevier.soda.SodaClient
val client = new SodaClient("http://host:port/soda")
val resp: AnnotResponse = client.annot(lexicon, text, matching)
Returns a list of lexicons available to annotate against. Currently a document can only be annotated against a single lexicon per request. When annotations against multiple lexicons are required, it is recommended to annotate the document separately against each lexicon, then merge the annotations.
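The merge step mentioned above could look like the following sketch. The function name `merge_annotations` is hypothetical, and so is the second response: the `cities` lexicon and its `example.org` ID are made up to show the merge and are not part of SoDA. The sketch assumes each response has the shape of the annot.json output.

```python
def merge_annotations(responses):
    """Combine annotations from several annot.json responses, ordered by position."""
    merged = []
    for resp in responses:
        # Skip responses that did not succeed.
        if resp.get("status") == "ok":
            merged.extend(resp.get("annotations", []))
    return sorted(merged, key=lambda a: (a["begin"], a["end"], a["lexicon"]))


countries_resp = {"status": "ok", "annotations": [
    {"id": "http://www.geonames.org/CHN", "lexicon": "countries",
     "begin": 41, "end": 46, "coveredText": "China"},
    {"id": "http://www.geonames.org/CHN", "lexicon": "countries",
     "begin": 102, "end": 107, "coveredText": "China"},
]}
# Hypothetical response from a made-up "cities" lexicon.
cities_resp = {"status": "ok", "annotations": [
    {"id": "http://example.org/shanghai", "lexicon": "cities",
     "begin": 85, "end": 93, "coveredText": "Shanghai"},
]}

merged = merge_annotations([countries_resp, cities_resp])
```

Sorting by offset keeps the merged list in reading order regardless of which lexicon produced each annotation.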
URL http://host:port/soda/dicts.json
INPUT
None
OUTPUT
{
"status": "ok",
"lexicons": [
{
"lexicon": "countries",
"count": 248
}
]
}
EXAMPLE PYTHON CLIENT
import sodaclient
client = sodaclient.SodaClient("http://host:port/soda")
resp = client.dicts()
EXAMPLE SCALA CLIENT
import com.elsevier.soda.messages._
import com.elsevier.soda.SodaClient
val client = new SodaClient("http://host:port/soda")
val resp: DictResponse = client.dicts()
This can be used to find which lexicons are appropriate for annotating your text. The service matches a piece of text against all hosted lexicons and returns the number of matches found in each.
URL http://host:port/soda/coverage.json
INPUT
{
"text" : "Institute of Clean Coal Technology, East China University of Science and Technology, Shanghai 200237, China",
"matching": "exact"
}
OUTPUT
{
"status": "ok",
"lexicons": [
{
"lexicon": "countries",
"count": 2
}
]
}
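One way to act on a coverage response is to rank lexicons by match count before annotating. The sketch below assumes the response shape shown above; the function name `rank_lexicons` and the `min_count` threshold are hypothetical.

```python
def rank_lexicons(coverage_response, min_count=1):
    """Return lexicon names with at least `min_count` matches, best first."""
    hits = [e for e in coverage_response.get("lexicons", [])
            if e["count"] >= min_count]
    return [e["lexicon"] for e in sorted(hits, key=lambda e: e["count"], reverse=True)]


coverage = {"status": "ok", "lexicons": [{"lexicon": "countries", "count": 2}]}
candidates = rank_lexicons(coverage)
```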
EXAMPLE PYTHON CLIENT
import sodaclient
client = sodaclient.SodaClient("http://host:port/soda")
resp = client.coverage(text, matching)
EXAMPLE SCALA CLIENT
import com.elsevier.soda.messages._
import com.elsevier.soda.SodaClient
val client = new SodaClient("http://host:port/soda")
val resp: CoverageResponse = client.coverage(text, matching)
This service allows a client to look up a dictionary entry from the index by lexicon and ID.
URL http://host:port/soda/lookup.json
INPUT
{
"lexicon": "countries",
"id": "http://www.geonames.org/CHN"
}
OUTPUT
{
"status": "ok",
"entries": [
{
"lexicon": "countries",
"id": "http://www.geonames.org/CHN",
"names": [ "China", "Chine", "CHN" ]
}
]
}
EXAMPLE PYTHON CLIENT
import sodaclient
client = sodaclient.SodaClient("http://host:port/soda")
resp = client.lookup(lexicon, id)
EXAMPLE SCALA CLIENT
import com.elsevier.soda.messages._
import com.elsevier.soda.SodaClient
val client = new SodaClient("http://host:port/soda")
val resp: LookupResponse = client.lookup(lexicon, id)
This service allows non-streaming matching of phrases against entries in the dictionary. It supports all the matching types supported by the annot service. In addition, it supports sorted matching against lowercased and Porter-stemmed versions of dictionary entries (lsort and s3sort respectively). The full list of matching modes is given below.
- exact - case-sensitive partial or full match of phrase against dictionary.
- lower - same as exact, but matches are now case insensitive.
- stop - same as lower, but with standard English stopwords removed.
- stem1 - same as stop, but with the Solr Minimal English stemmer applied.
- stem2 - same as stop, but with the KStem stemmer applied.
- stem3 - same as stop, but with the Porter stemmer applied.
- lsort - same as lower, but with tokens in phrase sorted alphabetically.
- s3sort - same as stem3, but with tokens in phrase sorted alphabetically.
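The idea behind the sorted modes can be illustrated with a small sketch of lsort-style normalization. This is not SoDA's implementation, which runs inside Solr; the function name `lsort_key` is hypothetical and simple whitespace tokenization is an assumption.

```python
def lsort_key(phrase):
    """Lowercase the phrase, split on whitespace, and sort the tokens."""
    return " ".join(sorted(phrase.lower().split()))


# Word order no longer matters once both sides are normalized the same way.
same = lsort_key("Emirates United Arab") == lsort_key("United Arab Emirates")
```

Under this reading, s3sort would apply the same token sorting after Porter stemming instead of after lowercasing alone.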
URL http://host:port/soda/rlookup.json
INPUT
{
"lexicon" : "countries",
"phrase": "emirates",
"matching": "lower"
}
OUTPUT
{
"status": "ok",
"entries": [
{
"id": "http://test-countries.com/ARE",
"lexicon": "countries",
"text": "United Arab Emirates",
"confidence": 0.4
}
]
}
EXAMPLE PYTHON CLIENT
import sodaclient
client = sodaclient.SodaClient("http://host:port/soda")
resp = client.rlookup(lexicon, "emirates", "lower")
EXAMPLE SCALA CLIENT
import com.elsevier.soda.messages._
import com.elsevier.soda.SodaClient
val client = new SodaClient("http://host:port/soda")
val resp: ReverseLookupResponse = client.rlookup(lexicon, "emirates", "lower")