Filter out pre-trained entities before CRF entity training #898

Merged
merged 8 commits into master from filter-entities
Mar 13, 2018

@ricwo ricwo commented Mar 12, 2018

Proposed changes:

  • Extractor objects can optionally filter out pre-trained entity examples
  • Implemented in the CRFEntityExtractor

Status (please check what you already did):

  • made PR ready for code review
  • added some tests for the functionality
  • updated the documentation
  • updated the changelog

@ricwo changed the title from "entity filtering" to "Filter out pre-trained entities before CRF entity training" on Mar 12, 2018
    extractor = ent.get("extractor")
    if not extractor or extractor == self.name:
        entities.append(ent)
message.set("entities", entities)
Member


I don't think this will work. It removes all entities from the message that do not belong to the CRF extractor, which is fine in itself. The issue is that it modifies the message object directly, so the entities are also removed for every component that runs after the CRF component. If any later component needs those entity annotations, they will be missing.
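
The side effect described in this comment can be reproduced with a minimal sketch. The `Message` class, the two helper functions, and the component names `ner_crf` and `ner_duckling` below are hypothetical stand-ins chosen for illustration, not the project's actual code; they only mimic the `get`/`set` calls visible in the reviewed snippet.

```python
class Message:
    """Hypothetical stand-in for the pipeline's shared message object."""

    def __init__(self, entities):
        self.data = {"entities": entities}

    def get(self, key, default=None):
        return self.data.get(key, default)

    def set(self, key, value):
        self.data[key] = value


def filter_in_place(message, name):
    """Mirrors the reviewed snippet: overwrites entities on the shared message."""
    entities = []
    for ent in message.get("entities", []):
        extractor = ent.get("extractor")
        if not extractor or extractor == name:
            entities.append(ent)
    message.set("entities", entities)


def filter_locally(message, name):
    """Returns a filtered list and leaves the shared message untouched."""
    return [
        ent for ent in message.get("entities", [])
        if not ent.get("extractor") or ent.get("extractor") == name
    ]


entities = [
    {"entity": "city", "value": "Berlin"},  # no extractor set: trainable
    {"entity": "time", "value": "tomorrow", "extractor": "ner_duckling"},
]

shared = Message(list(entities))

# Filtering into a local list keeps both entities on the message.
local_view = filter_locally(shared, "ner_crf")
remaining_after_local = len(shared.get("entities"))

# Filtering in place drops the other extractor's entity for all later components.
filter_in_place(shared, "ner_crf")
remaining_after_in_place = len(shared.get("entities"))
```

Under these assumptions, the local variant gives the CRF component the filtered view it needs for training while later components still see every annotation.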

Contributor Author


good point 👍

@@ -25,3 +26,20 @@ def add_processor_name(self, entity):
        else:
            entity["processors"] = [self.name]
        return entity

    def filter_trainable_entities(self, entity_examples):
Member


please add a comment
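
As a rough illustration of what a documented version of such a method might look like, here is a sketch. The method body is not shown in this conversation, so everything below the signature is an assumption: the examples are modeled as plain dictionaries with an `"entities"` list, and the class and its `name` value are hypothetical.

```python
import copy


class EntityExtractor:
    """Hypothetical, trimmed-down extractor used only for this sketch."""

    name = "ner_crf"  # assumed component name

    def filter_trainable_entities(self, entity_examples):
        """Return copies of the examples that keep only trainable entities.

        An entity counts as trainable when it has no "extractor" key or when
        its extractor equals this component's name. The input examples are
        not modified, so other components still see every annotation.
        """
        filtered = []
        for example in entity_examples:
            kept = [
                ent for ent in example.get("entities", [])
                if not ent.get("extractor") or ent.get("extractor") == self.name
            ]
            new_example = copy.copy(example)  # shallow copy of the example dict
            new_example["entities"] = kept    # rebinding the key leaves the original intact
            filtered.append(new_example)
        return filtered


examples = [
    {
        "text": "fly to Berlin tomorrow",
        "entities": [
            {"entity": "city", "value": "Berlin"},
            {"entity": "time", "value": "tomorrow", "extractor": "ner_duckling"},
        ],
    }
]
trainable = EntityExtractor().filter_trainable_entities(examples)
```

The docstring states both the filtering rule and the no-mutation guarantee, which is the kind of comment the review asks for.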

@tmbo merged commit 6b596f4 into master on Mar 13, 2018
@tmbo deleted the filter-entities branch on March 13, 2018 at 20:58
Member

tmbo commented Mar 13, 2018

Nice work in implementing that filtering, thanks Rick 👍

taytzehao pushed a commit to taytzehao/rasa that referenced this pull request Jul 14, 2023
We should reuse an existing context logger if in test context.
This will allow test to setup act with a null logger to assert
log messages.

Co-authored-by: Markus Wolf <markus.wolf@new-work.se>
