Story structure validation tool (#4968)
Implement the functionality under `rasa data validate stories`
Johannes E. M. Mosig authored Feb 25, 2020
1 parent cacdbf6 commit 730fd06
Showing 15 changed files with 847 additions and 42 deletions.
1 change: 1 addition & 0 deletions changelog/4088.feature.rst
@@ -0,0 +1 @@
Add story structure validation functionality (e.g. `rasa data validate stories --max-history 5`).
15 changes: 15 additions & 0 deletions data/test_stories/stories_conflicting_1.md
@@ -0,0 +1,15 @@
## story 1
* greet
- utter_greet
* greet
- utter_greet
* greet
- utter_greet

## story 2
* default
- utter_greet
* greet
- utter_greet
* greet
- utter_default
14 changes: 14 additions & 0 deletions data/test_stories/stories_conflicting_2.md
@@ -0,0 +1,14 @@
## greetings
* greet
- utter_greet
> check_greet
## happy path
> check_greet
* default
- utter_default

## problem
> check_greet
* default
- utter_goodbye
14 changes: 14 additions & 0 deletions data/test_stories/stories_conflicting_3.md
@@ -0,0 +1,14 @@
## greetings
* greet
- utter_greet
> check_greet
## happy path
> check_greet
* default OR greet
- utter_default

## problem
> check_greet
* greet
- utter_goodbye
17 changes: 17 additions & 0 deletions data/test_stories/stories_conflicting_4.md
@@ -0,0 +1,17 @@
## story 1
* greet
- utter_greet
* greet
- slot{"cuisine": "German"}
- utter_greet
* greet
- utter_greet

## story 2
* greet
- utter_greet
* greet
- slot{"cuisine": "German"}
- utter_greet
* greet
- utter_default
16 changes: 16 additions & 0 deletions data/test_stories/stories_conflicting_5.md
@@ -0,0 +1,16 @@
## story 1
* greet
- utter_greet
* greet
- utter_greet
- slot{"cuisine": "German"}
* greet
- utter_greet

## story 2
* greet
- utter_greet
* greet
- utter_greet
* greet
- utter_default
22 changes: 22 additions & 0 deletions data/test_stories/stories_conflicting_6.md
@@ -0,0 +1,22 @@
## story 1
* greet
- utter_greet

## story 2
* greet
- utter_default

## story 3
* greet
- utter_default
* greet

## story 4
* greet
- utter_default
* default

## story 5
* greet
- utter_default
* goodbye
53 changes: 52 additions & 1 deletion docs/user-guide/validate-files.rst
@@ -18,7 +18,8 @@ You can run it with the following command:
rasa data validate
The script above runs all the validations on your files. Here is the list of options to
The script above runs all the validations on your files, except for story structure validation,
which is omitted unless you provide the ``--max-history`` argument. Here is the list of options to
the script:

.. program-output:: rasa data validate --help
@@ -65,3 +66,53 @@ To use these functions it is necessary to create a `Validator` object and initia
stories='data/stories.md')
validator.verify_all()
Test Story Files for Conflicts
------------------------------

In addition to the default tests described above, you can also run a more in-depth structural test of your stories.
In particular, you can test whether your stories are inconsistent, i.e. whether different bot actions follow from the same dialogue history.
If this is the case, Rasa cannot learn the correct behaviour.

Take, for example, the following two stories:

.. code-block:: md
## Story 1
* greet
- utter_greet
* inform_happy
- utter_happy
- utter_goodbye
## Story 2
* greet
- utter_greet
* inform_happy
- utter_goodbye
These two stories are inconsistent, because Rasa doesn't know if it should predict ``utter_happy`` or ``utter_goodbye``
after ``inform_happy``, as there is nothing that would distinguish the dialogue states at ``inform_happy`` in the two
stories and the subsequent actions are different in Story 1 and Story 2.

This conflict can be automatically identified with our story structure validation tool.
To do this, use ``rasa data validate`` in the command line, as follows:

.. code-block:: bash
rasa data validate stories --max-history 3
> 2019-12-09 09:32:13 INFO rasa.core.validator - Story structure validation...
> 2019-12-09 09:32:13 INFO rasa.core.validator - Assuming max_history = 3
> Processed Story Blocks: 100% 2/2 [00:00<00:00, 3237.59it/s, # trackers=1]
> 2019-12-09 09:32:13 WARNING rasa.core.validator - CONFLICT after intent 'inform_happy':
> utter_goodbye predicted in 'Story 2'
> utter_happy predicted in 'Story 1'
Here we specify a ``max-history`` value of 3.
This means that 3 events (user messages / bot actions) are taken into account for action predictions. The particular setting does not matter for this example, however, because the conflict exists regardless of how long a history you take into account.
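The idea behind the check can be illustrated with a small, self-contained sketch. This is not Rasa's implementation: the flat event lists, the ``EXAMPLE_STORIES`` dictionary and the ``find_conflicts`` helper are hypothetical names made up for this illustration, every story line is treated as exactly one event, and slots are ignored. Under those assumptions, a conflict is any truncated history that is followed by different bot actions in different stories:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical flat representation of the two stories above:
# "* x" is a user intent, "- y" is a bot action.
EXAMPLE_STORIES: Dict[str, List[str]] = {
    "Story 1": [
        "* greet",
        "- utter_greet",
        "* inform_happy",
        "- utter_happy",
        "- utter_goodbye",
    ],
    "Story 2": ["* greet", "- utter_greet", "* inform_happy", "- utter_goodbye"],
}

History = Tuple[str, ...]


def find_conflicts(
    stories: Dict[str, List[str]], max_history: Optional[int] = None
) -> Dict[History, Dict[str, Set[str]]]:
    """Map each conflicting history to the actions (and stories) that follow it."""
    if max_history is not None and max_history < 1:
        raise ValueError("max_history must be a positive integer or None")

    seen: Dict[History, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for name, events in stories.items():
        for index, event in enumerate(events):
            if not event.startswith("-"):
                continue  # only bot actions are predicted
            history = events[:index]
            if max_history is not None:
                # Keep only the most recent `max_history` events.
                history = history[-max_history:]
            seen[tuple(history)][event].add(name)

    # A history is conflicting if more than one distinct action follows it.
    return {
        history: dict(actions)
        for history, actions in seen.items()
        if len(actions) > 1
    }


for history, actions in find_conflicts(EXAMPLE_STORIES, max_history=3).items():
    print("CONFLICT after", history[-1] if history else "<start>")
    for action, names in sorted(actions.items()):
        print(" ", action, "predicted in", sorted(names))
```

In this toy model the two example stories conflict for any positive ``max_history``, which matches the remark above that the particular setting does not matter here.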
.. warning::
The ``rasa data validate stories`` script assumes that all your **story names are unique**.
If your stories are in the Markdown format, you may find duplicate names with a command like
``grep -h "##" data/*.md | uniq -c | grep "^[^1]"``.
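If a shell pipeline is inconvenient, the same check can be sketched in Python. Note that ``uniq`` only collapses adjacent identical lines, so counting names explicitly may be more robust when duplicates are spread across files. The helper names ``duplicate_story_names`` and ``duplicate_story_names_in_dir`` are hypothetical and not part of Rasa; the sketch assumes that every line starting with exactly two ``#`` characters is a story header:

```python
import re
from collections import Counter
from pathlib import Path
from typing import Dict, Iterable, List

# Matches "## name" but not deeper headings such as "### name".
STORY_HEADER = re.compile(r"^##(?!#)\s*(.+?)\s*$")


def duplicate_story_names(lines: Iterable[str]) -> Dict[str, int]:
    """Return every story name that occurs more than once, with its count."""
    counts: Counter = Counter()
    for line in lines:
        match = STORY_HEADER.match(line)
        if match:
            counts[match.group(1)] += 1
    return {name: count for name, count in counts.items() if count > 1}


def duplicate_story_names_in_dir(directory: str) -> Dict[str, int]:
    """Collect the lines of all `*.md` files in `directory` and check them together."""
    lines: List[str] = []
    for path in sorted(Path(directory).glob("*.md")):
        lines.extend(path.read_text(encoding="utf-8").splitlines())
    return duplicate_story_names(lines)
```

Calling ``duplicate_story_names_in_dir("data")`` would then return an empty dictionary when all story names under ``data/`` are unique.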
108 changes: 90 additions & 18 deletions rasa/cli/data.py
@@ -1,21 +1,22 @@
import logging
import argparse
import asyncio
import sys
from typing import List

from rasa import data
from rasa.cli.arguments import data as arguments
from rasa.cli.utils import get_validated_path
import rasa.cli.utils
from rasa.constants import DEFAULT_DATA_PATH
from typing import NoReturn
from rasa.validator import Validator
from rasa.importers.rasa import RasaFileImporter

logger = logging.getLogger(__name__)


# noinspection PyProtectedMember
def add_subparser(
subparsers: argparse._SubParsersAction, parents: List[argparse.ArgumentParser]
):
import rasa.nlu.convert as convert

data_parser = subparsers.add_parser(
"data",
conflict_handler="resolve",
@@ -26,6 +27,17 @@ def add_subparser(
data_parser.set_defaults(func=lambda _: data_parser.print_help(None))

data_subparsers = data_parser.add_subparsers()

_add_data_convert_parsers(data_subparsers, parents)
_add_data_split_parsers(data_subparsers, parents)
_add_data_validate_parsers(data_subparsers, parents)


def _add_data_convert_parsers(
data_subparsers, parents: List[argparse.ArgumentParser]
) -> None:
from rasa.nlu import convert

convert_parser = data_subparsers.add_parser(
"convert",
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
@@ -45,6 +57,10 @@ def add_subparser(

arguments.set_convert_arguments(convert_nlu_parser)


def _add_data_split_parsers(
data_subparsers, parents: List[argparse.ArgumentParser]
) -> None:
split_parser = data_subparsers.add_parser(
"split",
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
@@ -65,21 +81,46 @@ def add_subparser(

arguments.set_split_arguments(nlu_split_parser)


def _add_data_validate_parsers(
data_subparsers, parents: List[argparse.ArgumentParser]
) -> None:
validate_parser = data_subparsers.add_parser(
"validate",
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
parents=parents,
help="Validates domain and data files to check for possible mistakes.",
)
_append_story_structure_arguments(validate_parser)
validate_parser.set_defaults(func=validate_files)
arguments.set_validator_arguments(validate_parser)

validate_subparsers = validate_parser.add_subparsers()
story_structure_parser = validate_subparsers.add_parser(
"stories",
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
parents=parents,
help="Checks for inconsistencies in the story files.",
)
_append_story_structure_arguments(story_structure_parser)
story_structure_parser.set_defaults(func=validate_stories)
arguments.set_validator_arguments(story_structure_parser)


def _append_story_structure_arguments(parser: argparse.ArgumentParser) -> None:
parser.add_argument(
"--max-history",
type=int,
default=None,
help="Number of turns taken into account for story structure validation.",
)

def split_nlu_data(args) -> None:

def split_nlu_data(args: argparse.Namespace) -> None:
from rasa.nlu.training_data.loading import load_data
from rasa.nlu.training_data.util import get_file_format

data_path = get_validated_path(args.nlu, "nlu", DEFAULT_DATA_PATH)
data_path = rasa.cli.utils.get_validated_path(args.nlu, "nlu", DEFAULT_DATA_PATH)
data_path = data.get_nlu_directory(data_path)

nlu_data = load_data(data_path)
@@ -91,22 +132,53 @@ def split_nlu_data(args) -> None:
test.persist(args.out, filename=f"test_data.{fformat}")


def validate_files(args) -> NoReturn:
"""Validate all files needed for training a model.
Fails with a non-zero exit code if there are any errors in the data."""
from rasa.core.validator import Validator
from rasa.importers.rasa import RasaFileImporter
def validate_files(args: argparse.Namespace, stories_only: bool = False) -> None:
"""
Validates either the story structure or the entire project.
Args:
args: Commandline arguments
stories_only: If `True`, only the story structure is validated.
"""
loop = asyncio.get_event_loop()
file_importer = RasaFileImporter(
domain_path=args.domain, training_data_paths=args.data
)

validator = loop.run_until_complete(Validator.from_importer(file_importer))
domain_is_valid = validator.verify_domain_validity()
if not domain_is_valid:
sys.exit(1)

everything_is_alright = validator.verify_all(not args.fail_on_warnings)
sys.exit(0) if everything_is_alright else sys.exit(1)
if stories_only:
all_good = _validate_story_structure(validator, args)
else:
all_good = (
_validate_domain(validator)
and _validate_nlu(validator, args)
and _validate_story_structure(validator, args)
)

if not all_good:
rasa.cli.utils.print_error_and_exit("Project validation completed with errors.")


def validate_stories(args: argparse.Namespace) -> None:
validate_files(args, stories_only=True)


def _validate_domain(validator: Validator) -> bool:
return validator.verify_domain_validity()


def _validate_nlu(validator: Validator, args: argparse.Namespace) -> bool:
return validator.verify_nlu(not args.fail_on_warnings)


def _validate_story_structure(validator: Validator, args: argparse.Namespace) -> bool:
# Check if a valid setting for `max_history` was given
if isinstance(args.max_history, int) and args.max_history < 1:
raise argparse.ArgumentTypeError(
f"The value of `--max-history {args.max_history}` is not a positive integer.",
)

return validator.verify_story_structure(
not args.fail_on_warnings, max_history=args.max_history
)
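The commit checks ``--max-history`` after parsing, inside ``_validate_story_structure``. One possible alternative, shown here only as a sketch and not as what the commit does, is to let ``argparse`` perform the check through the ``type=`` callable. ``argparse`` turns an ``ArgumentTypeError`` into a regular usage error only when it is raised inside that callable. The names ``positive_int`` and ``build_parser`` and the program name ``validate-sketch`` are hypothetical:

```python
import argparse


def positive_int(value: str) -> int:
    """Convert `value` to an int and reject anything smaller than 1."""
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"'{value}' is not an integer.")
    if number < 1:
        raise argparse.ArgumentTypeError(f"'{value}' is not a positive integer.")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-alone parser that mirrors only the `--max-history` option."""
    parser = argparse.ArgumentParser(prog="validate-sketch")
    parser.add_argument(
        "--max-history",
        type=positive_int,
        default=None,
        help="Number of turns taken into account for story structure validation.",
    )
    return parser
```

With this variant, an invalid value such as ``--max-history 0`` makes the parser print a usage message and exit, and omitting the option still leaves ``max_history`` as ``None``, so story structure validation stays opt-in.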