Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MarkdownWriter shouldn't dump pre-trained entities #4877

Closed
akelad opened this issue Nov 29, 2019 · 5 comments · Fixed by #5001
Closed

MarkdownWriter shouldn't dump pre-trained entities #4877

akelad opened this issue Nov 29, 2019 · 5 comments · Fixed by #5001
Assignees
Labels
area:rasa-x/backend 🎩 All issues focused on the Rasa X backend type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@akelad
Copy link
Contributor

akelad commented Nov 29, 2019

I noticed this on Rasa X, that if duckling or ner_spacy entities are recognised and you save it, these entities will get saved to the markdown file. These entities should be filtered out since we don't want to train the crf on it.

I'm not sure how we persist this training data to the database on Rasa X, but there might be something we need to change there too

@akelad akelad added the type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. label Nov 29, 2019
@wochinge wochinge added the area:rasa-x/backend 🎩 All issues focused on the Rasa X backend label Nov 29, 2019
@ricwo ricwo self-assigned this Dec 9, 2019
@ricwo
Copy link
Contributor

ricwo commented Dec 14, 2019

@akelad these entities aren't used for training of the CRF: #898

I think dumping them should be okay then, no? we explicitly removed this from rasa X at some point ages ago: https://github.com/RasaHQ/nlu-trainer-backend/pull/42

@erohmensing
Copy link
Contributor

@ricwo how does that work? We just exported the training data from the instance to use it in the repo, and it was exported with duckling entities annotated as CRF ones:

- [$0.00](amount-of-money:0)
- [$1000](amount-of-money:1000)
- [0](number)

rasa will think those are CRF entities won't it?

@erohmensing
Copy link
Contributor

synonyms from duckling get dumped as well:

## synonym:1
- SINGLE
- one
- single

@erohmensing
Copy link
Contributor

does not look to be duckling-specific:

- [BCBSM](company:bcbsm)[BCBSM](ORG)
- [Helvetia](company:helvetia)[Helvetia](GPE)

@erohmensing
Copy link
Contributor

Not sure if this is related. It seems that by default the case insensitivity is leading to synonym dumps for entities that have capitals in them:

## synonym:jenny
- Jenny

## synonym:jim
- Jim

## synonym:jim halpert
- Jim Halpert

## synonym:john smith
- John Smith

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:rasa-x/backend 🎩 All issues focused on the Rasa X backend type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants