
Consider adding support for custom parsers of utterances #256

Open
rozele opened this issue Feb 14, 2020 · 2 comments
rozele commented Feb 14, 2020

Today, we expect the utterances JSON file to always be an array of utterances, with entities in one of two formats: the NLU.DevOps generic format:

[
   {
      "text": "order pizza",
      "intent": "OrderFood",
      "entities": [
        {
           "matchText": "pizza",
           "entityType": "FoodItem"
        }
      ]
   }
]

Or LUIS batch format:

[
   {
      "text": "order pizza",
      "intent": "OrderFood",
      "entities": [
        {
           "entity": "FoodItem",
           "startPos": 6,
           "endPos": 10
        }
      ]
   }
]
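Converting between these two formats is mechanical. A minimal Python sketch (illustrative only, not part of NLU.DevOps; it assumes `endPos` in the LUIS batch format is the inclusive index of the entity's last character):

```python
def luis_batch_to_generic(utterance):
    """Convert a LUIS batch-format utterance to the generic format.

    Assumes `startPos`/`endPos` are inclusive character indices into
    the utterance text, as in the LUIS batch example above.
    """
    return {
        "text": utterance["text"],
        "intent": utterance["intent"],
        "entities": [
            {
                "matchText": utterance["text"][e["startPos"]:e["endPos"] + 1],
                "entityType": e["entity"],
            }
            for e in utterance.get("entities", [])
        ],
    }
```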

I suspect we can make this a bit simpler, and create an opportunity to leverage other tooling (that is less likely to get out of sync), if we allow dependency injection of the parser for utterances. One scenario I'd like to unblock: writing a simple script that takes a test utterance JSON file, sends the utterances off for prediction against LUIS / Lex / etc., and stores the unmodified results from LUIS / Lex directly back into a JSON array.

I.e., could we easily enable something like this:

[
  {
    "query": "order pizza",
    "topScoringIntent": {
      "intent": "OrderFood",
      "score": 0.99999994
    },
    "entities": [
      {
        "entity": "pizza",
        "type": "FoodItem",
        "startIndex": 6,
        "endIndex": 10,
        "score": 0.973820746
      }
    ]
  }
]

We could achieve this in a couple of different ways.

Option 1: add flags to the compare command that select the parser:

dotnet nlu compare \
  --expected tests.json \
  --actual results.json \
  --expectedFormat luis-batch \
  --actualFormat luis-response
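Under the hood, Option 1 amounts to a registry keyed by the format moniker. A rough Python sketch of the idea (the monikers and parser names here are hypothetical; in practice these would be injected implementations of an utterance-parsing interface):

```python
import json

# Hypothetical parsers keyed by a format moniker such as the value
# passed to --expectedFormat / --actualFormat.
def parse_generic(u):
    return {"text": u["text"], "intent": u["intent"]}

def parse_luis_response(u):
    return {"text": u["query"], "intent": u["topScoringIntent"]["intent"]}

PARSERS = {
    "generic": parse_generic,
    "luis-response": parse_luis_response,
}

def load_utterances(raw_json, fmt):
    """Parse an utterances JSON array using the parser selected by `fmt`."""
    return [PARSERS[fmt](u) for u in json.loads(raw_json)]
```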

Option 2: add an optional envelope to the utterances JSON file:

{
  "format": "luis-response",
  "utterances": [
    {
      "query": "order pizza",
      "topScoringIntent": {
        "intent": "OrderFood",
        "score": 0.99999994
      },
      "entities": [
        {
          "entity": "pizza",
          "type": "FoodItem",
          "startIndex": 6,
          "endIndex": 10,
          "score": 0.973820746
        }
      ]
    }
  ]
}
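With Option 2, the reader only needs to inspect the top-level JSON token: a bare array keeps today's default parsing, while an object with a "format" key selects a parser. A sketch, assuming the envelope keys shown above:

```python
import json

DEFAULT_FORMAT = "generic"

def read_utterances_file(raw_json):
    """Return (format, utterances) for an utterances file.

    A bare array means the default format; an envelope object selects
    the parser named by its "format" key.
    """
    data = json.loads(raw_json)
    if isinstance(data, list):
        return DEFAULT_FORMAT, data
    return data.get("format", DEFAULT_FORMAT), data["utterances"]
```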
rozele commented Feb 14, 2020

There are a few benefits to this approach:

  1. You do not need to depend on NLU.DevOps to run your tests. If an NLU provider exposes their own batch API, you could consider using that batch API directly and only use NLU.DevOps for comparing.
  2. Whatever the test results are, we do not lose result data by "lifting" the results to a generic format.

rozele commented Mar 6, 2020

Acceptance Criteria

  • dotnet nlu test command returns verbatim results from NLU provider
  • Default parsing (when not otherwise specified in the compare command CLI options or in a data envelope) should support generic utterances and LUIS batch formats. If a parser is specified, we should always support falling back on the default parsing behavior.
  • Should be an optional feature of NLU providers (e.g., luis, luisV3, lex, etc.); the option to return the generic utterance format from the test command should still exist.
  • For now, let's only support one format per NLU provider, so we can continue to use the NLU provider moniker (e.g., luis, luisV3, lex, etc.) to represent the format in the compare command.
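The fallback requirement in the second bullet could look roughly like this (helper names are hypothetical; the real implementation would live behind the compare command):

```python
def parse_default(u):
    # Default parsing handles the generic and LUIS batch formats,
    # both of which carry "text" and "intent".
    return {"text": u["text"], "intent": u["intent"]}

def parse_luis_response(u):
    return {"text": u["query"], "intent": u["topScoringIntent"]["intent"]}

def parse_utterance(u, parser=None):
    """Use the specified parser, always falling back on default parsing."""
    if parser is not None:
        try:
            return parser(u)
        except (KeyError, TypeError):
            pass  # utterance did not match the specified format
    return parse_default(u)
```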

Examples
If we use a CLI option:

dotnet nlu test -s luis -u tests.json -o results.json
dotnet nlu compare -s luis -e tests.json -a results.json

The tests.json file may contain the generic utterances format, whereas results.json will contain raw LUIS responses.

If we use the envelope method:

dotnet nlu test -s luis -u tests.json -o results.json
dotnet nlu compare -e tests.json -a results.json

The tests.json file may contain the generic utterances format, whereas results.json will contain raw LUIS responses embedded in an envelope, e.g.:

{
  "format": "luis",
  "utterances": [
    {
       /* raw LUIS response */
    }
  ]
}
