Support for writing back NLU intent/example metadata to YAML #7731
Comments
Open Questions
Rendering intent metadata
We don't have a Python-object representation for the intent itself when parsing an NLU file, so we put the intent-level metadata on each example of that intent:
from rasa.shared.nlu.training_data.formats.rasa_yaml import RasaYAMLReader
yaml_string = """version: "2.0"
nlu:
- intent: greet
  metadata:
    sentiment: neutral
  examples: |
    - hi
    - hello
"""
training_data = RasaYAMLReader().reads(yaml_string)
training_data.training_examples[0].as_dict()
# {'text': 'hi',
# 'intent': 'greet',
# 'metadata': {'intent': {'sentiment': 'neutral'}}}
training_data.training_examples[1].as_dict()
# {'text': 'hello',
# 'intent': 'greet',
# 'metadata': {'intent': {'sentiment': 'neutral'}}}
This opens up a question on how the RasaYAMLWriter should write the intent metadata back. We can:
b) be a bit more defensive and collect all intent metadata from the examples of a given intent and try to merge them together (shallow / deep merge?).
Update: The RasaYAMLWriter can assume that the intent metadata on all examples belonging to the same intent is identical, so it's fine to just take the first one.
Data type of metadata
The docs currently say that:
There is, however, one test case in the code which has a list of strings as the value of the metadata. Which one of the two is the truth? Only allowing key-value objects (i.e. Python dict)?
Update: The metadata can be any data type that is supported by YAML, including maps, lists, strings, numbers, etc.
Preserving the YAML structure for examples without metadata
The YAML structure looks different depending on whether we have example metadata or not. If at least one of the examples has metadata, the YAML structure looks like this (example 1):
version: "2.0"
nlu:
- intent: greet
  examples:
  - text: |
      hi
    metadata:
      sentiment: neutral
  - text: |
      hello
# ...
If we don't have any metadata on the examples, then we can use a less verbose YAML structure (example 2):
version: "2.0"
nlu:
- intent: greet
  examples: |
    - hi
    - hello
So far the
version: "2.0"
nlu:
- intent: greet
  examples:
  - text: |
      hi
  - text: |
      hello
Update: The YAML output should be identical to the input.
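As a rough illustration of the two updates above (take the intent metadata from the first example, and use the verbose structure only when some example carries its own metadata), here is a minimal sketch on plain dicts shaped like the as_dict() output. The helper names group_by_intent and render_intent are hypothetical and not part of RasaYAMLWriter, and the "example" key for per-example metadata is an assumption (only the "intent" key appears in the output shown earlier). It builds the structure that would then be dumped to YAML; it does not write YAML itself.

```python
def group_by_intent(examples):
    """Group example dicts by intent name, keeping first-seen order."""
    grouped = {}
    for example in examples:
        grouped.setdefault(example["intent"], []).append(example)
    return grouped


def render_intent(intent, examples):
    """Build a YAML-ready dict for one intent (hypothetical helper)."""
    item = {"intent": intent}

    # Intent metadata is duplicated on every example of the intent,
    # so reading it from the first example is enough.
    first_metadata = examples[0].get("metadata") or {}
    if "intent" in first_metadata:
        item["metadata"] = first_metadata["intent"]

    # Assumption: per-example metadata lives under the "example" key.
    def example_metadata(example):
        return (example.get("metadata") or {}).get("example")

    if any(example_metadata(ex) is not None for ex in examples):
        # Verbose structure (example 1): one mapping per example.
        rendered = []
        for ex in examples:
            entry = {"text": ex["text"]}
            metadata = example_metadata(ex)
            if metadata is not None:
                entry["metadata"] = metadata
            rendered.append(entry)
        item["examples"] = rendered
    else:
        # Compact structure (example 2): a single block of "- text" lines.
        item["examples"] = "".join(f"- {ex['text']}\n" for ex in examples)
    return item
```

Note that this sketch always picks the compact form when no example has metadata, so input written in the verbose form without metadata would not round-trip unchanged; preserving the input structure exactly would need extra bookkeeping on the reader side.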
As I followed the initial implementation by @degiz, let me share a few thoughts:
(also cc'ing you @tmbo in case you miss reasoning about training data format 😅)
Summary from a call w/ @degiz today:
Description of Problem:
Rasa 2.0 introduced support for metadata on NLU intents and examples (reference), but so far only the RasaYAMLReader supports parsing this; the RasaYAMLWriter is not able to write it back to YAML files.
This came out of https://github.com/RasaHQ/rasa-x/issues/4180.
Overview of the Solution:
Support for intent and example metadata needs to be added to RasaYAMLWriter.process_training_examples_by_key (src).
Considering this YAML structure:
The parser returns:
Rendering the example metadata (the dict with "capitalization") should probably be fairly straightforward, without too many questions to figure out upfront.
The intent metadata (the dict with "sentiment"), however, is duplicated on each example, which does raise a few questions.
Examples (if relevant):
(hat tip to @dakshvar22 for the code)
Blockers (if relevant):
Definition of Done: