Different results when data is trained with the MITIE and spaCy + scikit-learn backends? #164

Closed
Shravan40 opened this issue Feb 19, 2017 · 1 comment

@Shravan40

I am getting two different results for the same text depending on whether the data is trained with MITIE or with spaCy + scikit-learn.

Response from the MITIE-trained model

{
  "text": "Dear Customer, You have made a Debit Card purchase of INR 962.00 on 26 Oct. Info.VPS Brown House. Your Net Available Balance is INR 5,584.58.",
  "confidence": 0.21797336872859088,
  "intent": "parseSms",
  "entities": [
    {
      "start": 31,
      "end": 41,
      "value": "Debit Card",
      "entity": "cardType"
    },
    {
      "start": 42,
      "end": 53,
      "value": "purchase of",
      "entity": "transactionType"
    },
    {
      "start": 54,
      "end": 57,
      "value": "INR",
      "entity": "spendCurrency"
    },
    {
      "start": 58,
      "end": 64,
      "value": "962.00",
      "entity": "spendAmount"
    },
    {
      "start": 85,
      "end": 96,
      "value": "Brown House",
      "entity": "merchantsName"
    },
    {
      "start": 54,
      "end": 57,
      "value": "INR",
      "entity": "balanceCurrency"
    },
    {
      "start": 132,
      "end": 141,
      "value": "5,584.58.",
      "entity": "balanceAmount"
    }
  ]
}

Response from the spaCy + scikit-learn-trained model

{
  "text": "Dear Customer, You have made a Debit Card purchase of INR 962.00 on 26 Oct. Info.VPS Brown House. Your Net Available Balance is INR 5,584.58.",
  "confidence": 0.9703864081830181,
  "intent": "parseSms",
  "entities": [
    {
      "start": 0,
      "end": 13,
      "value": "Dear Customer",
      "entity": "cardType"
    },
    {
      "start": 13,
      "end": 18,
      "value": ", You",
      "entity": "cardType"
    },
    {
      "start": 19,
      "end": 28,
      "value": "have made",
      "entity": "cardType"
    },
    {
      "start": 29,
      "end": 41,
      "value": "a Debit Card",
      "entity": "cardType"
    },
    {
      "start": 42,
      "end": 53,
      "value": "purchase of",
      "entity": "transactionType"
    },
    {
      "start": 54,
      "end": 57,
      "value": "INR",
      "entity": "balanceAmount"
    },
    {
      "start": 58,
      "end": 64,
      "value": "962.00",
      "entity": "cardNumber"
    },
    {
      "start": 65,
      "end": 67,
      "value": "on",
      "entity": "balanceCurrency"
    },
    {
      "start": 68,
      "end": 70,
      "value": "26",
      "entity": "balanceCurrency"
    },
    {
      "start": 71,
      "end": 75,
      "value": "Oct.",
      "entity": "spendCurrency"
    },
    {
      "start": 76,
      "end": 80,
      "value": "Info",
      "entity": "accountNumber"
    },
    {
      "start": 80,
      "end": 81,
      "value": ".",
      "entity": "balanceAmount"
    },
    {
      "start": 81,
      "end": 84,
      "value": "VPS",
      "entity": "dueCurrency"
    },
    {
      "start": 85,
      "end": 90,
      "value": "Brown",
      "entity": "spendCurrency"
    },
    {
      "start": 91,
      "end": 96,
      "value": "House",
      "entity": "limitAmount"
    },
    {
      "start": 96,
      "end": 97,
      "value": ".",
      "entity": "cardNumber"
    },
    {
      "start": 98,
      "end": 106,
      "value": "Your Net",
      "entity": "cardNumber"
    },
    {
      "start": 107,
      "end": 116,
      "value": "Available",
      "entity": "dueCurrency"
    },
    {
      "start": 117,
      "end": 124,
      "value": "Balance",
      "entity": "cardNumber"
    },
    {
      "start": 125,
      "end": 127,
      "value": "is",
      "entity": "cardNumber"
    },
    {
      "start": 128,
      "end": 131,
      "value": "INR",
      "entity": "limitCurrency"
    },
    {
      "start": 132,
      "end": 140,
      "value": "5,584.58",
      "entity": "rdAccountNumber"
    },
    {
      "start": 140,
      "end": 141,
      "value": ".",
      "entity": "balanceCurrency"
    }
  ]
}

In my case the MITIE results are more accurate, but training takes far too long (on the order of 24 hours), whereas spaCy + scikit-learn trains on the same data in less than a minute.
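One rough way to quantify the difference between the two responses is to check that each reported span matches the text and to measure how much of the message each backend labels. The sketch below is only an illustration: the helper names (`span_mismatches`, `label_counts`, `coverage`) are made up for this example and are not part of rasa_nlu, and the sample dict is an abbreviated copy of the MITIE response with two of its seven entities.

```python
from collections import Counter


def span_mismatches(response):
    """Return entities whose text[start:end] differs from the reported value."""
    text = response["text"]
    return [e for e in response["entities"]
            if text[e["start"]:e["end"]] != e["value"]]


def label_counts(response):
    """Count how many spans were assigned to each entity label."""
    return Counter(e["entity"] for e in response["entities"])


def coverage(response):
    """Fraction of characters in the text covered by at least one entity span."""
    covered = set()
    for e in response["entities"]:
        covered.update(range(e["start"], e["end"]))
    return len(covered) / len(response["text"])


# Abbreviated stand-in for the MITIE response (hypothetical subset, same offsets).
text = ("Dear Customer, You have made a Debit Card purchase of INR 962.00 on 26 Oct. "
        "Info.VPS Brown House. Your Net Available Balance is INR 5,584.58.")
mitie = {"text": text, "entities": [
    {"start": 31, "end": 41, "value": "Debit Card", "entity": "cardType"},
    {"start": 54, "end": 57, "value": "INR", "entity": "balanceCurrency"},
]}

print(span_mismatches(mitie))
print(label_counts(mitie))
print(round(coverage(mitie), 3))
```

Run on the full responses, a coverage close to 1.0 (as the spaCy + scikit-learn output would give, since nearly every token is tagged) suggests the entity model is over-labeling rather than extracting.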

@tmbo
Member

tmbo commented Feb 23, 2017

This is expected. The underlying libraries use different models for entity recognition.

For the spaCy backend, the suggested amount of training data is around 5000 samples per entity (see explosion/spaCy#773). So it may well be that spaCy trains faster but performs poorly with less data than that, while MITIE trains slowly but performs better on your data.
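To see how far a dataset is from that figure, one could count the annotated samples per entity label. This is a minimal sketch that assumes the training file uses a `rasa_nlu_data` / `common_examples` JSON layout with an `entities` list per example; the function names and the tiny in-line dataset are hypothetical, and the 5000 threshold is simply the number quoted above.

```python
from collections import Counter


def entity_sample_counts(training_data):
    """Count annotated samples per entity label (assumed rasa_nlu_data layout)."""
    examples = training_data.get("rasa_nlu_data", {}).get("common_examples", [])
    counts = Counter()
    for example in examples:
        for entity in example.get("entities", []):
            counts[entity["entity"]] += 1
    return counts


def underrepresented(counts, minimum=5000):
    """Return the labels that have fewer than `minimum` annotated samples."""
    return {label: n for label, n in counts.items() if n < minimum}


# Tiny hypothetical dataset; a real one would be loaded with json.load().
data = {"rasa_nlu_data": {"common_examples": [
    {"text": "Debit Card purchase of INR 962.00", "intent": "parseSms",
     "entities": [
         {"start": 0, "end": 10, "value": "Debit Card", "entity": "cardType"},
         {"start": 23, "end": 26, "value": "INR", "entity": "spendCurrency"},
     ]},
    {"text": "Credit Card payment received", "intent": "parseSms",
     "entities": [
         {"start": 0, "end": 11, "value": "Credit Card", "entity": "cardType"},
     ]},
]}}

counts = entity_sample_counts(data)
print(counts)
print(underrepresented(counts))
```

Labels that show up in the second printout are candidates for more annotated examples before judging the spaCy backend's entity quality.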

@tmbo tmbo closed this as completed Feb 23, 2017