SoDA API

This document describes the SoDA API. The SoDA API uses JSON over HTTP and is thus language agnostic. A SoDA request is built as a JSON document and is sent to the SoDA JSON endpoint over HTTP POST. SoDA responds with another JSON document indicating success or failure.
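Because the API is plain JSON over HTTP POST, it can also be called without the client libraries shown in the examples. The following is a minimal sketch using only the Python standard library. The helper names `soda_request` and `soda_call`, the `Content-Type: application/json` header, and sending an empty JSON object to endpoints that take no input are assumptions, not documented SoDA behavior.

```python
import json
import urllib.request


def soda_request(base_url, endpoint, payload=None):
    """Build (but do not send) a POST request for a SoDA endpoint.

    base_url is e.g. "http://host:port/soda"; endpoint is e.g. "annot.json".
    Hypothetical helper: the JSON Content-Type header is an assumption.
    """
    url = base_url.rstrip("/") + "/" + endpoint
    # Endpoints with no documented input get an empty JSON object (assumption).
    body = json.dumps(payload if payload is not None else {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def soda_call(base_url, endpoint, payload=None, timeout=30):
    """Send the request and decode the JSON response document."""
    req = soda_request(base_url, endpoint, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `soda_call("http://host:port/soda", "dicts.json")` would return the decoded List Lexicons response as a Python dict.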

The following sections describe each of the endpoints in detail.

Index

Returns a status OK JSON message. It is meant to test whether the SoDA web component is alive.

URL http://host:port/soda/index.json

INPUT

None

OUTPUT

    {
        "status": "ok",
        "message": "SoDA accepting requests (Solr version 7.3.1)"
    }

EXAMPLE PYTHON CLIENT

    import sodaclient

    client = sodaclient.SodaClient("http://host:port/soda")
    resp = client.index()

EXAMPLE SCALA CLIENT

    import com.elsevier.soda.messages._
    import com.elsevier.soda.SodaClient

    val client = new SodaClient("http://host:port/soda")
    val resp: IndexResponse = client.index()
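A health check only needs to decode the response and test the status field. The sketch below assumes that anything other than a well-formed JSON object with `"status": "ok"` means the service is not ready; the function name `is_alive` is hypothetical, and the exact shape of failure responses is not documented here.

```python
import json


def is_alive(response_body):
    """Return True if an index.json response body reports status "ok"."""
    try:
        doc = json.loads(response_body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(doc, dict) and doc.get("status") == "ok"
```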

Add Lexicon Entries

Adds new entries to a named Lexicon.

URL http://host:port/soda/add.json

INPUT

    {
        "lexicon" : "countries", 
        "id" : "http://www.geonames.org/CHN",
        "names" : ["China", "Chine", "CHN"],
        "commit" : true
    }

The id value we have chosen to use is the RDF URI of the entity as reported in the imported lexicon. The names are the strings to match for that entity; a single entry can have multiple names. The commit flag is optional. If it is omitted, each addition results in a commit, which is inefficient. It is better to set commit to false on individual additions and commit at regular intervals, or once at the end. To send a commit-only request to the add.json endpoint, omit the id and names entries, like this:

    {
        "lexicon" : "countries", 
        "commit" : true
    }

OUTPUT

    {
        "status": "ok",
        "payload": {
            "lexicon" : "countries", 
            "id" : "http://www.geonames.org/CHN",
            "names" : ["China", "Chine", "CHN"],
            "commit" : true
        }
    }

EXAMPLE PYTHON CLIENT

    import sodaclient

    client = sodaclient.SodaClient("http://host:port/soda")
    resp = client.add(lexicon_name, id, names, commit)

EXAMPLE SCALA CLIENT

    import com.elsevier.soda.messages._
    import com.elsevier.soda.SodaClient

    val client = new SodaClient("http://host:port/soda")
    val resp: AddResponse = client.add(lexiconName, id, names, commit)
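The commit advice above can be sketched as a generator that produces add.json request bodies: each entry is sent with commit set to false, and a commit-only request is emitted every `commit_every` entries and once more at the end if anything is still pending. The function name `add_payloads` and the default batch size are hypothetical; each payload would be POSTed to add.json in order.

```python
def add_payloads(lexicon, entries, commit_every=1000):
    """Yield add.json bodies for (id, names) pairs with periodic commits.

    commit is set to False explicitly, because omitting it commits on
    every addition. Hypothetical helper, not part of the SoDA clients.
    """
    pending = 0
    for entry_id, names in entries:
        yield {"lexicon": lexicon, "id": entry_id, "names": list(names), "commit": False}
        pending += 1
        if pending == commit_every:
            # Commit-only request: lexicon and commit, no id or names.
            yield {"lexicon": lexicon, "commit": True}
            pending = 0
    if pending:
        # Final commit for any additions since the last one.
        yield {"lexicon": lexicon, "commit": True}
```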

Delete Lexicon

A single SoDA index can contain entries from multiple lexicons. This operation deletes all entries in a lexicon, or only a single entry if an id is specified in the request.

URL http://host:port/soda/delete.json

INPUT

    { 
        "lexicon" : "countries",
        "id": "http://www.geonames.org/CHN"
    }

OUTPUT

    {
        "status": "ok",
        "payload": {
            "lexicon" : "countries",
            "id": "http://www.geonames.org/CHN"
        }
    }

EXAMPLE PYTHON CLIENT

    import sodaclient

    client = sodaclient.SodaClient("http://host:port/soda")
    resp = client.delete(lexicon_name, optional_id)

EXAMPLE SCALA CLIENT

    import com.elsevier.soda.messages._
    import com.elsevier.soda.SodaClient

    val client = new SodaClient("http://host:port/soda")
    val resp: DeleteResponse = client.delete(lexiconName, optionalID)
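Since the same endpoint either removes one entry or clears a whole lexicon, it helps to build the request body in one place so the id is never sent by accident. A minimal sketch, with the hypothetical helper name `delete_payload`:

```python
def delete_payload(lexicon, entry_id=None):
    """Build a delete.json request body.

    With entry_id=None the "id" key is omitted, which deletes every
    entry in the lexicon. Hypothetical helper, not part of the SoDA clients.
    """
    payload = {"lexicon": lexicon}
    if entry_id is not None:
        payload["id"] = entry_id
    return payload
```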

Annotate Document

Annotates text against a specified lexicon and match type. Match type can be one of the following.

  • exact - matches text against the FST maintained in Solr by SolrTextTagger. This will match segments in text that are identical to a dictionary entry.
  • lower - same as exact, but matches are now case insensitive.
  • stop - same as lower, but with standard English stopwords removed.
  • stem1 - same as stop, but with the Solr Minimal English stemmer applied.
  • stem2 - same as stop, but with the KStem stemmer applied.
  • stem3 - same as stop, but with the Porter stemmer applied.

URL http://host:port/soda/annot.json

INPUT

    {
        "lexicon" : "countries",
        "text" : "Institute of Clean Coal Technology, East China University of Science and Technology, Shanghai 200237, China",
        "matching" : "exact"
    }

OUTPUT

    {
        "status": "ok",
        "annotations": [
            {
                "id" : "http://www.geonames.org/CHN", 
                "lexicon" : "countries", 
                "begin" : 41, 
                "end" : 46,
                "coveredText" : "China", 
                "confidence" : "1.0"
            }, 
            {
                "id" : "http://www.geonames.org/CHN", 
                "lexicon" : "countries",
                "begin" : 102, 
                "end" : 107,
                "coveredText" : "China", 
                "confidence" : "1.0"
            }
        ]
    }

EXAMPLE PYTHON CLIENT

    import sodaclient

    client = sodaclient.SodaClient("http://host:port/soda")
    resp = client.annot(lexicon_name, text, matching)

EXAMPLE SCALA CLIENT

    import com.elsevier.soda.messages._
    import com.elsevier.soda.SodaClient

    val client = new SodaClient("http://host:port/soda")
    val resp: AnnotResponse = client.annot(lexicon, text, matching)
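In the sample output, begin and end behave as zero-based, end-exclusive character offsets into the input text (characters 41 to 46 of the text are "China"). The sketch below assumes this holds in general: it validates the matching type against the list above before building a request, and keeps only annotations whose offsets reproduce their coveredText. The names `annot_payload`, `check_offsets`, and `MATCH_TYPES` are hypothetical.

```python
# Matching types accepted by annot.json, as listed above.
MATCH_TYPES = ("exact", "lower", "stop", "stem1", "stem2", "stem3")


def annot_payload(lexicon, text, matching="exact"):
    """Build an annot.json request body, rejecting unknown matching types."""
    if matching not in MATCH_TYPES:
        raise ValueError("unsupported matching type: %r" % matching)
    return {"lexicon": lexicon, "text": text, "matching": matching}


def check_offsets(text, annotations):
    """Return the annotations whose text[begin:end] equals coveredText."""
    return [a for a in annotations if text[a["begin"]:a["end"]] == a["coveredText"]]
```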

List Lexicons

Returns a list of lexicons available to annotate against. Currently, a document can only be annotated against a single lexicon per request. When annotations against multiple lexicons are required, it is recommended to annotate the document separately against each lexicon, then merge the annotations.

URL http://host:port/soda/dicts.json

INPUT

None

OUTPUT

    {
        "status": "ok",
        "lexicons": [
            {
                "lexicon": "countries",
                "count": 248
            }
        ]
    }

EXAMPLE PYTHON CLIENT

    import sodaclient

    client = sodaclient.SodaClient("http://host:port/soda")
    resp = client.dicts()

EXAMPLE SCALA CLIENT

    import com.elsevier.soda.messages._
    import com.elsevier.soda.SodaClient

    val client = new SodaClient("http://host:port/soda")
    val resp: DictResponse = client.dicts()
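The recommended merge step can be sketched as concatenating the annotations from several annot.json responses (one per lexicon) and ordering them by position. The ordering rule (begin offset, then longer spans first, then lexicon name) is an assumption chosen for readability; overlapping annotations from different lexicons are kept, not resolved. The name `merge_annotations` is hypothetical.

```python
def merge_annotations(*responses):
    """Merge the "annotations" lists of several annot.json responses.

    Only responses with status "ok" are used. Results are sorted by begin
    offset, then longer spans first, then lexicon name.
    """
    merged = []
    for resp in responses:
        if resp.get("status") == "ok":
            merged.extend(resp.get("annotations", []))
    merged.sort(key=lambda a: (a["begin"], -(a["end"] - a["begin"]), a["lexicon"]))
    return merged
```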

Coverage Info

This can be used to find which lexicons are appropriate for annotating your text. The service matches a piece of text against all hosted lexicons and returns the number of matches found in each.

URL http://host:port/soda/coverage.json

INPUT

    {
        "text" : "Institute of Clean Coal Technology, East China University of Science and Technology, Shanghai 200237, China",
        "matching": "exact"
    }

OUTPUT

    {
        "status": "ok",
        "lexicons": [
            {
                "lexicon": "countries",
                "count": 2
            }
        ]
    }

EXAMPLE PYTHON CLIENT

    import sodaclient

    client = sodaclient.SodaClient("http://host:port/soda")
    resp = client.coverage(text, matching)

EXAMPLE SCALA CLIENT

    import com.elsevier.soda.messages._
    import com.elsevier.soda.SodaClient

    val client = new SodaClient("http://host:port/soda")
    val resp: CoverageResponse = client.coverage(text, matching)
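To pick lexicons from a coverage result, a client can drop lexicons with no matches and sort the rest by match count. A minimal sketch; the name `rank_lexicons` and the `min_count` threshold are hypothetical.

```python
def rank_lexicons(coverage_response, min_count=1):
    """Return lexicon names from a coverage.json response, most matches first."""
    rows = [r for r in coverage_response.get("lexicons", []) if r["count"] >= min_count]
    # Highest count first; ties broken alphabetically for a stable result.
    rows.sort(key=lambda r: (-r["count"], r["lexicon"]))
    return [r["lexicon"] for r in rows]
```

The resulting names can then be used as the lexicon argument of separate annot.json requests.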

Lookup Dictionary Entry

This service allows a client to look up a dictionary entry in the index by lexicon and ID.

URL http://host:port/soda/lookup.json

INPUT

    {
        "lexicon": "countries",
        "id": "http://www.geonames.org/CHN"
    }

OUTPUT

    {
        "status": "ok",
        "entries": [
            {
                "lexicon": "countries",
                "id": "http://www.geonames.org/CHN",
                "names": [ "China", "Chine", "CHN" ]
            }
        ]
    }

EXAMPLE PYTHON CLIENT

    import sodaclient

    client = sodaclient.SodaClient("http://host:port/soda")
    resp = client.lookup(lexicon, id)

EXAMPLE SCALA CLIENT

    import com.elsevier.soda.messages._
    import com.elsevier.soda.SodaClient

    val client = new SodaClient("http://host:port/soda")
    val resp: LookupResponse = client.lookup(lexicon, id)
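The lookup response wraps matches in an entries list, so a client typically needs a small accessor to get at the names. A minimal sketch, with the hypothetical helper name `names_for`, that returns an empty list when the ID is not found:

```python
def names_for(lookup_response, entry_id):
    """Collect the names of entries with the given id from a lookup.json response."""
    names = []
    for entry in lookup_response.get("entries", []):
        if entry.get("id") == entry_id:
            names.extend(entry.get("names", []))
    return names
```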

Reverse Lookup Phrase against Dictionary

This service allows non-streaming matching of phrases against entries in the dictionary. It supports all the matching types supported by the annot service. In addition, it supports sorted matching against lowercased (lsort) and Porter-stemmed (s3sort) versions of the dictionary entries. The full list of matching modes is below.

  • exact - case-sensitive partial or full match of phrase against dictionary.
  • lower - same as exact, but matches are now case insensitive.
  • stop - same as lower, but with standard English stopwords removed.
  • stem1 - same as stop, but with the Solr Minimal English stemmer applied.
  • stem2 - same as stop, but with the KStem stemmer applied.
  • stem3 - same as stop, but with the Porter stemmer applied.
  • lsort - same as lower, but with tokens in phrase sorted alphabetically.
  • s3sort - same as stem3, but with tokens in phrase sorted alphabetically.
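The idea behind lsort is that token order stops mattering once both the phrase and the dictionary entry are reduced to a sorted list of lowercased tokens. The sketch below approximates that normalization with a simple regex tokenizer in place of Solr's analysis chain; the name `lsort_key` is hypothetical, and s3sort would additionally need Porter stemming, which is not shown.

```python
import re


def lsort_key(phrase):
    """Approximate lsort normalization: lowercase, tokenize, sort tokens.

    The \\w+ regex tokenizer is a simplification of Solr's analyzers.
    """
    tokens = re.findall(r"\w+", phrase.lower())
    return " ".join(sorted(tokens))
```

With this key, "Emirates, United Arab" and "United Arab Emirates" both normalize to "arab emirates united".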

URL http://host:port/soda/rlookup.json

INPUT

    {
        "lexicon" : "countries",
        "phrase": "emirates",
        "matching": "lower"
    }

OUTPUT

    {
        "status": "ok",
        "entries": [
            {
                "id": "http://test-countries.com/ARE",
                "lexicon": "countries",
                "text": "United Arab Emirates",
                "confidence": 0.4
            }
        ]
    }

EXAMPLE PYTHON CLIENT

    import sodaclient

    client = sodaclient.SodaClient("http://host:port/soda")
    resp = client.rlookup(lexicon, "emirates", "lower")

EXAMPLE SCALA CLIENT

    import com.elsevier.soda.messages._
    import com.elsevier.soda.SodaClient

    val client = new SodaClient("http://host:port/soda")
    val resp: ReverseLookupResponse = client.rlookup(lexicon, "emirates", "lower")