# Score API (beta)

The Score API is a collection of endpoints for calculating standard AI performance metrics on user-provided inputs.

## Machine Translation

Available scores: `hlepor`, `ter`, `comet`, `comet-mtqe` (COMET-QE), and `labse`.

### Basic usage

To compare a machine translation with a reference translation, send a POST request to `https://api.inten.to/evaluate/score`:

```sh
curl -XPOST -H 'apikey: YOUR_API_KEY' 'https://api.inten.to/evaluate/score' -d '{
    "data": {
        "items": [
            "A sample text",
            "Some other text"
        ],
        "reference": [
            "Not a sample text",
            "Some other context"
        ],
        "lang": "en"
    },
    "scores": [
        {
            "name": "hlepor",
            "ignore_errors": true
        },
        {
            "name": "ter",
            "ignore_errors": true
        }
    ],
    "itemize": false,
    "async": true
}'
```

Request payload fields:

  - `items`: results of the machine translation
  - `reference`: ground-truth translations
  - `lang`: language of the translation
  - `source`: optional parameter used by the COMET score (see below for details)
  - `scores`: list of scoring metrics to calculate

Setting `ignore_errors` to `true` for an entry in the `scores` array makes the API return results even if errors occur while calculating that score.

Note: evaluation endpoints are supported only in async mode.

The response contains the `id` of the operation:

```json
{ "id": "ea1684f1-4ec7-431d-9b7e-bfbe98cf0bda" }
```

Wait for processing to complete. To retrieve the result of the operation, make a GET request to `https://api.inten.to/evaluate/score/YOUR_OPERATION_ID`. The TTL of the resource is 30 days.

```json
{
    "id": "ea1684f1-4ec7-431d-9b7e-bfbe98cf0bda",
    "response": {
        "results": {
            "scores": [
                {
                    "name": "hlepor",
                    "value": 0.755
                },
                {
                    "name": "ter",
                    "value": 28.571
                }
            ]
        }
    },
    "error": null,
    "done": true
}
```
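The submit-then-poll flow can be wrapped in a small helper. The sketch below is a minimal illustration, not part of the API: the `fetch` callable stands in for whatever HTTP client you use to GET `https://api.inten.to/evaluate/score/<operation_id>` and decode the JSON body.

```python
import time

def poll_score(fetch, operation_id, interval=2.0, timeout=60.0):
    """Poll an async Score API operation until it is done.

    `fetch` is any callable that takes an operation id and returns the
    decoded JSON response as a dict.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        op = fetch(operation_id)
        if op.get("done"):
            if op.get("error"):
                raise RuntimeError(f"operation failed: {op['error']}")
            return op["response"]["results"]["scores"]
        time.sleep(interval)
    raise TimeoutError(f"operation {operation_id} not done after {timeout}s")
```

The operation resource keeps `done: false` until processing finishes, so polling with a modest interval (and a timeout guard) is all a client needs.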

To get scores for each pair of items (machine translation and reference), set the `itemize` flag to `true`. In this case the results look like:

```json
{
    "id": "ea1684f1-4ec7-431d-9b7e-bfbe98cf0bda",
    "response": {
        "results": {
            "scores": [
                {
                    "value": 0.770527098111673,
                    "name": "hlepor"
                },
                {
                    "value": 0.740740740740741,
                    "name": "hlepor"
                },
                {
                    "value": 25.0,
                    "name": "ter"
                },
                {
                    "value": 33.33333333333333,
                    "name": "ter"
                }
            ]
        }
    },
    "error": null,
    "done": true
}
```
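As a sanity check, the TER values above can be reproduced locally. TER counts the word-level edits needed to turn each hypothesis into its reference, divided by the total number of reference words (×100). The sketch below is a simplification (real TER also supports block shifts; lower-cased whitespace tokenization is an assumption here), but it matches this example's numbers:

```python
def word_edit_distance(hyp, ref):
    # Classic Levenshtein distance over word tokens, single-row DP.
    d = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        prev, d[0] = d[0], i
        for j, r in enumerate(ref, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (h != r))
    return d[len(ref)]

def simple_ter(hyps, refs):
    # Corpus TER: total edits over total reference words, as a percentage.
    edits = sum(word_edit_distance(h.lower().split(), r.lower().split())
                for h, r in zip(hyps, refs))
    ref_words = sum(len(r.split()) for r in refs)
    return 100 * edits / ref_words
```

For the example above, the corpus score is `100 * 2 / 7 ≈ 28.571`, and the itemized scores are `100 * 1 / 4 = 25.0` and `100 * 1 / 3 ≈ 33.333`, matching the responses shown.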

### COMET

Crosslingual Optimized Metric for Evaluation of Translation (COMET) is a metric that achieves high levels of correlation with different types of human judgments. COMET takes four parameters as input:

  1. `source`: a list of source sentences
  2. `items`: results of the machine translation
  3. `reference`: ground-truth translations
  4. `lang`: language of the translation

Setting `ignore_errors` to `true` for an entry in the `scores` array makes the API return results even if errors occur while calculating that score.

Note: evaluation endpoints are supported only in async mode.

#### Usage

```sh
curl -XPOST -H 'apikey: YOUR_API_KEY' 'https://api.inten.to/evaluate/score' -d '{
    "data": {
        "items": [
            "A sample text",
            "Some other text"
        ],
        "reference": [
            "Not a sample text",
            "Some other context"
        ],
        "source": [
            "Un texto de muestra",
            "Algún otro texto"
        ],
        "lang": "en"
    },
    "scores": [
        {
            "name": "comet",
            "ignore_errors": true
        }
    ],
    "itemize": false,
    "async": true
}'
```

The response contains the `id` of the operation:

```json
{ "id": "c74934b3-89e9-463e-b358-335c7c717f02" }
```

Wait for processing to complete.

```json
{
    "id": "c74934b3-89e9-463e-b358-335c7c717f02",
    "done": true,
    "response": {
        "results": {
            "scores": [
                {
                    "value": {
                        "segment_scores": [
                            0.3271389603614807,
                            0.08198270201683044
                        ],
                        "corpus_scores": [
                            0.20456083118915558
                        ],
                        "return_hash": "wmt20-comet-da"
                    },
                    "name": "comet"
                }
            ]
        },
        "type": "scores"
    },
    "meta": {},
    "error": null
}
```
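COMET returns one score per segment plus a corpus-level score; in this example the corpus value is simply the arithmetic mean of the segment scores, which you can verify locally:

```python
# Segment scores from the example response above.
segment_scores = [0.3271389603614807, 0.08198270201683044]

# The corpus_scores entry in the response is their arithmetic mean.
corpus_score = sum(segment_scores) / len(segment_scores)
```

This makes it easy to recompute a corpus score over any subset of segments without re-submitting the request.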

### COMET-QE

COMET-QE is the quality-estimation variant of COMET: it scores translations using only the source sentences and the machine-translation output, without a reference translation. Intento uses the wmt20-comet-qe-da model.

COMET-QE takes three parameters as input:

  1. `items`: results of the machine translation
  2. `source`: a list of source sentences
  3. `lang`: language of the translation

Setting `ignore_errors` to `true` for an entry in the `scores` array makes the API return results even if errors occur while calculating that score.

Note: evaluation endpoints are supported only in async mode.

#### Usage

```sh
curl -XPOST -H 'apikey: YOUR_API_KEY' 'https://api.inten.to/evaluate/score' -d '{
    "data": {
        "items": [
            "A sample text",
            "Some text"
        ],
        "reference": [],
        "source": [
            "Un texto de muestra",
            "Algún otro texto"
        ],
        "lang": "en"
    },
    "scores": [
        {
            "name": "comet-mtqe",
            "ignore_errors": true
        }
    ],
    "itemize": false,
    "async": true
}'
```

Note: the `reference` field is required but can be an empty list.

The response contains the `id` of the operation:

```json
{ "id": "6f409125-a8c2-4ffb-b42d-99f2d9d14f7a" }
```

Wait for processing to complete.

```json
{
    "id": "6f409125-a8c2-4ffb-b42d-99f2d9d14f7a",
    "done": true,
    "response": {
        "results": {
            "scores": [
                {
                    "value": {
                        "segment_scores": [
                            0.5931146740913391,
                            4.1349710954818875e-05
                        ],
                        "corpus_scores": [
                            0.29657801190114697
                        ],
                        "return_hash": "wmt20-comet-qe-da"
                    },
                    "name": "comet-mtqe"
                }
            ]
        },
        "type": "scores"
    },
    "meta": {},
    "error": null
}
```

### LaBSE

The Language-Agnostic BERT Sentence Embedding (LaBSE) model encodes text into high-dimensional vectors. The model is trained and optimized to produce similar representations exclusively for bilingual sentence pairs that are translations of each other, so it can be used for mining translations of a sentence in a larger corpus. Intento uses a PyTorch port of the LaBSE model from Hugging Face.
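The per-item values in the response below measure how similar each source embedding is to the embedding of its translation. Interpreting them as cosine similarity is an assumption here (the API does not document the exact formula), but cosine similarity is the standard way to compare LaBSE embeddings:

```python
from math import sqrt

def cosine_similarity(u, v):
    # Cosine similarity: dot(u, v) / (|u| * |v|), in [-1, 1].
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

With real LaBSE embeddings, sentence pairs that are translations of each other score close to 1, while unrelated pairs score much lower.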

LaBSE takes three parameters as input:

  1. `items`: results of the machine translation
  2. `source`: a list of source sentences
  3. `lang`: language of the translation

Setting `ignore_errors` to `true` for an entry in the `scores` array makes the API return results even if errors occur while calculating that score.

Note: evaluation endpoints are supported only in async mode.

#### Usage

```sh
curl -XPOST -H 'apikey: YOUR_API_KEY' 'https://api.inten.to/evaluate/score' -d '{
    "data": {
        "items": [
            "A sample text",
            "Some text"
        ],
        "reference": [],
        "source": [
            "Un texto de muestra",
            "Algún otro texto"
        ],
        "lang": "en"
    },
    "scores": [
        {
            "name": "labse",
            "ignore_errors": true
        }
    ],
    "itemize": false,
    "async": true
}'
```

Note: the `reference` field is required but can be an empty list.

The response contains the `id` of the operation:

```json
{ "id": "8ab6da9a-c218-405d-88d6-a7d48b507c54" }
```

Wait for processing to complete.

```json
{
    "id": "8ab6da9a-c218-405d-88d6-a7d48b507c54",
    "done": true,
    "response": {
        "results": {
            "scores": [
                {
                    "value": 0.922886312007904,
                    "name": "LaBSE"
                },
                {
                    "value": 0.7924383878707886,
                    "name": "LaBSE"
                }
            ]
        },
        "type": "scores"
    },
    "error": null,
    "meta": {}
}
```