Story structure validation tool #4968

Merged
merged 236 commits into from
Feb 25, 2020
Changes from 196 commits
Commits
cb3bc37
Copy story_tree.py script
Oct 9, 2019
f0fe7b2
Hack in simple story-tree validation (not featurized)
Oct 9, 2019
00de88b
Begin tracker use (draft)
Oct 10, 2019
a548667
Begin tracker use (draft)
Oct 10, 2019
bc4465e
Use dialogue state for tree
Nov 1, 2019
e8743c0
Distinguish name and state in story_tree
Nov 1, 2019
33225bd
Add separate 'kind' property to tree
Nov 1, 2019
59c4672
Add sensible output
Nov 1, 2019
e771f21
Discard the tree
Nov 1, 2019
ba49d54
Enable single-story-per-tracker output
Nov 1, 2019
408775a
Fix invalid += for dict
Nov 1, 2019
9d0485f
Start using trackers only for conflict finding
Nov 1, 2019
2a3e140
Fix print problem
Nov 1, 2019
fb2a552
Fix bad position of idx
Nov 1, 2019
343d98a
Fix rule collection
Nov 4, 2019
3025287
Recreate output
Nov 4, 2019
f32806d
Dont sort states, since not sortable
Nov 4, 2019
7605845
Fix forms and slots handling
Nov 4, 2019
5c55573
Minimal output
Nov 5, 2019
bfea1ce
Introduce rasa data validate stories
Nov 5, 2019
178c673
Inform user about max_history
Nov 5, 2019
4488439
Add max_history parameter
Nov 5, 2019
406353e
Merge branch 'master' into stroy-tree-1
Nov 5, 2019
9e8036f
Enable `rasa data validate --stories`
Nov 5, 2019
3f1d689
Implement verify_story_names
Nov 6, 2019
b62c8e9
Implement deduplicate_story_names for rasa data clean
Nov 6, 2019
0a193e7
Remove story_tree.py script since its not used
Nov 7, 2019
a0c8adc
Use logger instead of print
Nov 7, 2019
9f124e9
Reduce cognitive complexity
Nov 7, 2019
aaccc17
Merge branch 'master' into stroy-tree-1
Nov 7, 2019
5454a3b
Split add_subparser to reduce lines of code
Nov 7, 2019
762b14f
Improve logging text
Nov 11, 2019
4cc621a
Use MESSAGE_INTENT_ATTRIBUTE
Nov 11, 2019
1ff31fa
Output all conflicting stories
Nov 11, 2019
40865eb
Merge branch 'master' into story-tree-1
Nov 13, 2019
6426cc1
Check story structure on `data validate` if error-free otherwise
Nov 13, 2019
379165b
Setup --prompt flag for `rasa data validate stories`
Nov 13, 2019
0aa8c77
List duplicate story names
Nov 13, 2019
b5591fe
Make output of story names more user friendly
Nov 13, 2019
fb6c162
Introduce StoryConflict
Nov 13, 2019
51570c7
Enable --prompt flag (dummy)
Nov 13, 2019
0224ab3
Let deduplicate_story_names replace the files
Nov 13, 2019
0c380d0
Remove unused code
Nov 13, 2019
26e2927
Improve prompt
Nov 13, 2019
3e69cda
Add stories_to_correct
Nov 13, 2019
e808928
Fix return value
Nov 14, 2019
ecf76ff
Make --max-history a necessary parameter
Nov 15, 2019
a1d1f5a
Refer to trackers as trackers, not stories
Nov 15, 2019
8e329d2
Fix check for story name duplicates
Nov 21, 2019
b51bdb7
Remove conflicts that arise from unpredictable actions
Nov 21, 2019
c5e0b66
Remove debug print statements
Nov 21, 2019
bec21a8
Fix missing [0]
Nov 21, 2019
32247b3
Merge branch 'master' into story-tree-1
Nov 21, 2019
3fdc08a
Merge branch 'master' into story-tree-1
Nov 27, 2019
fa4bcc1
Move finding of conflicts to StoryConflict class
Dec 11, 2019
ce86ede
Respect ignore_warnings
Dec 11, 2019
fe46fb9
Apply BLACK formatting
Dec 11, 2019
93acaa0
Add doc strings
Dec 11, 2019
44649d4
Declare types
Dec 11, 2019
5aa3b53
Remove verify_story_names, as it does not work
Dec 11, 2019
48814db
Drop writing bak files when deduplicating
Dec 11, 2019
4a36d1e
Clean up deduplicate_story_names
Dec 11, 2019
4d984e6
Add some comments
Dec 11, 2019
add4a2b
Write first test for StoryConflict class
Dec 11, 2019
f9b6d16
Add warning about non-markdown file cleaning
Dec 11, 2019
bbb10b7
Fix test to yield actual conflicts
Dec 11, 2019
3398dcb
Remove unnecessary arguments
Dec 11, 2019
ebab3a3
Add more tests
Dec 11, 2019
f631ac4
Add more tests
Dec 11, 2019
d44b0f2
Reformat with BLACK
Dec 11, 2019
68055ce
Add more tests
Dec 11, 2019
9062f44
Optimize imports
Dec 11, 2019
b3f5a13
Add more tests
Dec 11, 2019
cc8d850
Add more tests
Dec 11, 2019
0c66536
Apply BLACK formatting
Dec 11, 2019
c2ec80d
Explain rasa data validate stories in docs
Dec 11, 2019
38fdb5c
Merge branch 'master' into johannes-storystructure
Dec 13, 2019
9c26332
Revert check for duplicates and data clean
Dec 13, 2019
ebd48f7
Fix test_data_validate_help
Dec 13, 2019
1e9705b
Add more tests
Dec 13, 2019
69df7b4
Let data validate check stories even if other tests unsuccessful
Dec 13, 2019
63d5699
Fix test_verify_bad_story_structure
Dec 13, 2019
c1310a9
Apply BLACK
Dec 13, 2019
3ffc945
Simplify code
Dec 13, 2019
8ab741a
Fix Pygments lexer type
Dec 13, 2019
4f755f5
Add warning about uniqueness of story names
Dec 13, 2019
d645cd5
Clarify code of _get_prev_event
Dec 13, 2019
b7c0dd2
Clarify code of _build_conflicts_from_states
Dec 13, 2019
6c53b64
Clarify code of incorrect_stories
Dec 13, 2019
cb471c6
Clarify code of story_prior_to_conflict
Dec 13, 2019
9d3a425
Clarify code of __str__
Dec 13, 2019
81d366c
Clarify code of _find_conflicting_states
Dec 13, 2019
37f0ba5
Remove ToDo
Dec 13, 2019
9a36c15
Fix _get_prev_event
Dec 15, 2019
0cd92d0
Fix _find_conflicting_states
Dec 15, 2019
87e3d75
Fix _build_conflicts_from_states
Dec 15, 2019
a9414f7
Update docs/user-guide/validate-files.rst
Jan 3, 2020
9f3c42a
Rasa Core -> Rasa
Jan 3, 2020
d7ff1f1
Replace Rasa Core -> Rasa
Jan 3, 2020
fd9f1a4
Add correct quotes
Jan 3, 2020
dc99136
Avoid sys.exit(0) if everything ok
Jan 3, 2020
a031235
Define StoryConflict.__hash__
Jan 3, 2020
d375c13
Move static functions outside of StoryConflict object
Jan 3, 2020
67365c2
Use raise argparse.ArgumentError in validate_stories
Jan 3, 2020
0005186
Merge branch 'johannes-4088b' of github.com:RasaHQ/rasa into johannes…
Jan 3, 2020
8a40e00
Import missing print_error
Jan 3, 2020
35cca6f
Move story_conflict.py to core.training
Jan 3, 2020
3ee76e2
Fix tests to use find_story_conflicts
Jan 3, 2020
bc181fb
Fix arguments of argparse.ArgumentError
Jan 3, 2020
7e77e0f
Remove empty line
Jan 10, 2020
6b70c51
Specify return type
Jan 10, 2020
3d0400d
Specify return type
Jan 10, 2020
8a80184
Specify return type
Jan 10, 2020
7965ee6
Clarify error message
Jan 10, 2020
cc230aa
Clarify text
Jan 13, 2020
d2e3772
Fix typo
Jan 13, 2020
25238db
Clarify code in conflicting_actions_with_counts
Jan 13, 2020
62dccd7
Remove unused code
Jan 13, 2020
29f572d
Declare output types
Jan 13, 2020
6e26d08
Reformat doc strings
Jan 13, 2020
e917ea6
Rename variables for clarity
Jan 13, 2020
8141bcc
Use `defaultdict(list)` to simplify code
Jan 13, 2020
53c2b2a
Declare types in `validate_*` functions
Jan 13, 2020
c55a276
Define and use TrackerEventStateTuple
Jan 13, 2020
700e028
Rename function to _get_previous_event
Jan 13, 2020
c4c3f8c
Add fullstops to doc strings
Jan 13, 2020
8f107f7
Simplify StoryConflict.__str__
Jan 13, 2020
a9f5621
Declare return type
Jan 13, 2020
6a196fa
Rename test_story_conflict.py
Jan 13, 2020
cea9208
Remove empty lines
Jan 13, 2020
8487207
Apply BLACK formatting
Jan 13, 2020
f27812f
Add tests for story conflicts
Jan 13, 2020
86aefe5
Define _setup_trackers_for_testing
Jan 13, 2020
3e2c20e
Merge branch 'johannes-4088b' of github.com:RasaHQ/rasa into johannes…
Jan 13, 2020
1123db6
Clarify output message
Jan 13, 2020
01c60f7
Add missing tick-marks
Jan 13, 2020
863a462
Clarify help strings
Jan 13, 2020
0ea7315
Use else
Jan 13, 2020
adc45c5
Avoid exit on successful validate
Jan 13, 2020
846537d
Dirty bugfix for sanic-plugins-framework dependency
Jan 14, 2020
ceb0abc
Clarify documentation of `rasa data validate`
Jan 16, 2020
7c23e80
Make logger messages consistent
Jan 16, 2020
a45a9b0
Remove outdated ToDo string
Jan 16, 2020
0e4bfee
Remove irrelevant comment
Jan 16, 2020
e76e90b
Add test_verify_bad_story_structure_ignore_warnings
Jan 16, 2020
498e377
Merge branch 'master' into johannes-4088b
Jan 16, 2020
485537a
Fix consequences of renaming INTENT_ATTRIBUTE
Jan 20, 2020
6b617c0
Fix return type annotation
Jan 20, 2020
07b3d8a
Add test for _get_previous_event
Jan 20, 2020
c8e2e7e
Simplify StoryConflict.__str__
Jan 20, 2020
070b597
Simplify _build_conflicts_from_states
Jan 20, 2020
6cfc0c8
Merge branch 'master' into johannes-4088b
Jan 20, 2020
1b95d6c
Simplify _get_previous_event
Jan 20, 2020
f3d2937
Simplify _get_previous_event
Jan 20, 2020
a4b74cc
Fix return type of StoryConflict._sliced_states_iterator
Jan 20, 2020
03952f5
Avoid `from`
Jan 20, 2020
54936b0
Update rasa/cli/data.py
Jan 20, 2020
0c83877
Merge branch 'johannes-4088b' of github.com:RasaHQ/rasa into johannes…
Jan 20, 2020
c33eef5
Optimize imports
Jan 20, 2020
f787eb3
Remove quotes from type
Jan 20, 2020
90b3452
Clarify doc string
Jan 20, 2020
8ab7869
Rename variable for clarity
Jan 20, 2020
19cd71f
Apply BLACK formatting
Jan 20, 2020
8c17f8c
Delete unused property
Jan 20, 2020
a14c9b7
Rename `state_action_mapping`
Jan 20, 2020
2dbc7c2
Remove quotes from type declaration
Jan 20, 2020
a88b6f9
Add comment for clarification
Jan 20, 2020
71074d8
Spell out variable names
Jan 20, 2020
797df7d
Rename `turn_label`
Jan 20, 2020
3a7ccdf
Use subclassing to define `TrackerEventStateTuple`
Jan 20, 2020
d902a72
Define `TrackerEventStateTuple.sliced_states_hash`
Jan 20, 2020
9a45691
Merge branch 'master' into johannes-4088b
wochinge Jan 22, 2020
9225a19
Use double quotemarks in rst
Jan 27, 2020
303b2a7
Update rasa/core/training/story_conflict.py
Jan 27, 2020
8f7074a
Simplify code with `_append_story_structure_arguments`
Jan 27, 2020
bd2f56d
Absorb `validate_stories` into `validate_files`
Jan 27, 2020
7510b7b
Make `StoryConflict._sliced_states` private
Jan 27, 2020
06a6622
Rename `conflict_has_prior_events`
Jan 27, 2020
fcd61b2
Add quote ticks
Jan 27, 2020
bd2a5ed
Declare types for `StoryConflict._summarize_conflict`
Jan 27, 2020
047c73d
Let conflict summary always show at least two names
Jan 27, 2020
5f5df0d
Declare return type of `TrackerEventStateTuple.sliced_states_hash`
Jan 27, 2020
7f72d7c
Update rasa/core/training/story_conflict.py
Jan 27, 2020
c4c936f
Rename local variable `conflicting_state_action_mapping`
Jan 27, 2020
aac90e0
Clarify comment
Jan 27, 2020
4e2b767
Use `return` instead of `break`
Jan 27, 2020
af9d877
Fix _get_previous_event
Jan 27, 2020
339299b
Expand doc string
Jan 27, 2020
6c9764a
Use `not` instead of `len(...) == 0`
Jan 27, 2020
1b44bef
Add tests for `data validate ...` warnings
Jan 27, 2020
87e99b5
Rephrase
Jan 27, 2020
c3651cb
Merge branch 'johannes-4088b' of github.com:RasaHQ/rasa into johannes…
Jan 27, 2020
f0c8eb5
Add `run_in_default_project_with_info`
Jan 28, 2020
eea7a95
Apply BLACK formatting
Jan 28, 2020
6f85f07
Merge branch 'master' into johannes-4088b
Jan 28, 2020
68d43be
Fix `args.max_history`
Jan 28, 2020
bb25354
Simplify `StoryConflict._summarize_conflict`
Jan 28, 2020
d73f89b
Apply BLACK formatting
Jan 28, 2020
bde9ad4
Add changelog
Jan 28, 2020
81e87d6
Merge branch 'master' into johannes-4088b
Jan 28, 2020
90c4a30
Fix `args.max_history` and `stories_only`
Jan 29, 2020
c2c9a37
Update docstring
Jan 29, 2020
0a0e703
Rename `test_find_conflicts_slots_that_break` and `_dont_break`
Jan 29, 2020
4dd5149
Merge branch 'johannes-4088b' of github.com:RasaHQ/rasa into johannes…
Jan 29, 2020
7ccdef0
Add doc-strings
Jan 29, 2020
300a8ae
Fix `run_in_default_project` vis. `os.environ["LOG_LEVEL"]`
Jan 29, 2020
34ba1de
Apply BLACK formatting
Jan 30, 2020
2f54c1f
Enable story structure validation without `max_history`
Feb 10, 2020
aea42db
Use `print_error_and_exit`
Feb 10, 2020
e4d484a
Rename `_summarize_conflicting_actions`
Feb 10, 2020
b1e4d71
Change error message text
Feb 10, 2020
c7fae3b
Fix docstring formatting
Feb 10, 2020
cebf957
Avoid importing `find_story_conflicts` directly
Feb 10, 2020
85a6ec4
Merge branch 'master' into johannes-4088b
Feb 10, 2020
4ab2537
Apply BLACK formatting
Feb 11, 2020
a20cf03
Merge branch 'master' into johannes-4088b
Feb 12, 2020
eb2a5f8
Avoid all-caps output
Feb 17, 2020
d5344d9
Add _get_length_of_longest_story
Feb 18, 2020
5ec7e54
Add types for _setup_trackers_for_testing
Feb 18, 2020
31e02b8
Add type hint for `split_nlu_data`
Feb 18, 2020
eb29b52
Merge branch 'master' into johannes-4088b
Feb 18, 2020
d3fc1da
Merge branch 'master' into johannes-4088b
tmbo Feb 20, 2020
0b59736
Merge branch 'master' into johannes-4088b
tmbo Feb 21, 2020
bf16cc4
Use Monkeypatch for `test_data_validate_stories_with_max_history_zero`
Feb 21, 2020
d005e3d
Merge branch 'master' into johannes-4088b
Feb 21, 2020
89e853c
Merge branch 'johannes-4088b' of github.com:RasaHQ/rasa into johannes…
Feb 21, 2020
c1e70c9
Apply BLACK formatting
Feb 21, 2020
5b18b28
Clean up test_data_validate_stories_with_max_history_zero
Feb 21, 2020
267ce89
Merge branch 'master' into johannes-4088b
Feb 24, 2020
040980a
Move `Validator` up to `rasa.validator`
Feb 24, 2020
c044a30
Apply BLACK formatting
Feb 24, 2020
99937db
Fix `test_data_validate_stories_with_max_history_zero`
Feb 24, 2020
f4ddd78
Apply BLACK formatting again
Feb 24, 2020
7347307
Merge branch 'master' into johannes-4088b
Feb 24, 2020
1790a4e
Merge branch 'master' into johannes-4088b
Feb 24, 2020
da2ad2c
Merge branch 'master' into johannes-4088b
Feb 25, 2020
15 changes: 15 additions & 0 deletions data/test_stories/stories_conflicting_1.md
@@ -0,0 +1,15 @@
## story 1
* greet
- utter_greet
* greet
- utter_greet
* greet
- utter_greet

## story 2
* default
- utter_greet
* greet
- utter_greet
* greet
- utter_default
14 changes: 14 additions & 0 deletions data/test_stories/stories_conflicting_2.md
@@ -0,0 +1,14 @@
## greetings
* greet
- utter_greet
> check_greet

## happy path
> check_greet
* default
- utter_default

## problem
> check_greet
* default
- utter_goodbye
14 changes: 14 additions & 0 deletions data/test_stories/stories_conflicting_3.md
@@ -0,0 +1,14 @@
## greetings
* greet
- utter_greet
> check_greet

## happy path
> check_greet
* default OR greet
- utter_default

## problem
> check_greet
* greet
- utter_goodbye
17 changes: 17 additions & 0 deletions data/test_stories/stories_conflicting_4.md
@@ -0,0 +1,17 @@
## story 1
* greet
- utter_greet
* greet
- slot{"cuisine": "German"}
- utter_greet
* greet
- utter_greet

## story 2
* greet
- utter_greet
* greet
- slot{"cuisine": "German"}
- utter_greet
* greet
- utter_default
16 changes: 16 additions & 0 deletions data/test_stories/stories_conflicting_5.md
@@ -0,0 +1,16 @@
## story 1
* greet
- utter_greet
* greet
- utter_greet
- slot{"cuisine": "German"}
* greet
- utter_greet

## story 2
* greet
- utter_greet
* greet
- utter_greet
* greet
- utter_default
22 changes: 22 additions & 0 deletions data/test_stories/stories_conflicting_6.md
@@ -0,0 +1,22 @@
## story 1
* greet
- utter_greet

## story 2
* greet
- utter_default

## story 3
* greet
- utter_default
* greet

## story 4
* greet
- utter_default
* default

## story 5
* greet
- utter_default
* goodbye
53 changes: 52 additions & 1 deletion docs/user-guide/validate-files.rst
@@ -18,7 +18,8 @@ You can run it with the following command:

rasa data validate

The script above runs all the validations on your files. Here is the list of options to
The script above runs all the validations on your files, except for story structure validation,
which is omitted unless you provide the ``--max-history`` argument. Here is the list of options to
the script:

.. program-output:: rasa data validate --help
@@ -65,3 +66,53 @@ To use these functions it is necessary to create a `Validator` object and initia
stories='data/stories.md')

validator.verify_all()

Test Story Files for Conflicts
------------------------------

In addition to the default tests described above, you can also do a more in-depth structural test of your stories.
In particular, you can test if your stories are inconsistent, i.e. if different bot actions follow from the same dialogue history.
If this is the case, then Rasa cannot learn the correct behaviour.

Take, for example, the following two stories:

.. code-block:: md

## Story 1
* greet
- utter_greet
* inform_happy
- utter_happy
- utter_goodbye

## Story 2
* greet
- utter_greet
* inform_happy
- utter_goodbye

These two stories are inconsistent, because Rasa doesn't know if it should predict ``utter_happy`` or ``utter_goodbye``
after ``inform_happy``, as there is nothing that would distinguish the dialogue states at ``inform_happy`` in the two
stories and the subsequent actions are different in Story 1 and Story 2.

This conflict can be automatically identified with our story structure validation tool.
To do this, use ``rasa data validate`` in the command line, as follows:

.. code-block:: bash

rasa data validate stories --max-history 3
> 2019-12-09 09:32:13 INFO rasa.core.validator - Story structure validation...
> 2019-12-09 09:32:13 INFO rasa.core.validator - Assuming max_history = 3
> Processed Story Blocks: 100% 2/2 [00:00<00:00, 3237.59it/s, # trackers=1]
> 2019-12-09 09:32:13 WARNING rasa.core.validator - CONFLICT after intent 'inform_happy':
> utter_goodbye predicted in 'Story 2'
> utter_happy predicted in 'Story 1'

Here we specify a ``max-history`` value of 3.
This means that 3 events (user messages / bot actions) are taken into account for action predictions. The particular setting does not matter for this example, because the conflict exists regardless of how much history you take into account.
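
Editor's note: the idea behind the conflict check and the role of ``max-history`` can be sketched in a few lines of Python. This is a simplified illustration, not the code added in this pull request: the list-of-strings story representation, the ``STORIES`` data and the ``find_conflicts`` helper are all hypothetical, and the sketch compares raw event strings rather than Rasa's featurized dialogue states.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical, simplified representation: each story is a list of event
# strings, "* intent" for user messages and "- action" for bot actions.
STORIES: Dict[str, List[str]] = {
    "Story 1": ["* greet", "- utter_greet", "* inform_happy", "- utter_happy", "- utter_goodbye"],
    "Story 2": ["* greet", "- utter_greet", "* inform_happy", "- utter_goodbye"],
}

History = Tuple[str, ...]


def find_conflicts(
    stories: Dict[str, List[str]], max_history: Optional[int] = None
) -> Dict[History, Dict[str, Set[str]]]:
    """Return every history that is followed by more than one distinct action.

    `max_history` is assumed to be a positive integer or None (no truncation).
    """
    # history -> action name -> names of the stories that contain this step
    actions_after: Dict[History, Dict[str, Set[str]]] = defaultdict(
        lambda: defaultdict(set)
    )
    for name, events in stories.items():
        for index, event in enumerate(events):
            if not event.startswith("- "):
                continue  # only bot actions are predicted
            history = events[:index]
            if max_history is not None:
                history = history[-max_history:]  # keep the most recent events
            actions_after[tuple(history)][event[2:]].add(name)
    return {
        history: dict(actions)
        for history, actions in actions_after.items()
        if len(actions) > 1
    }


if __name__ == "__main__":
    for history, actions in find_conflicts(STORIES, max_history=3).items():
        print("Conflict after history", history)
        for action, names in sorted(actions.items()):
            print(f"  {action} predicted in {sorted(names)}")
```

In this sketch the two example stories collide on the history ending in ``inform_happy`` for any positive ``max_history``, which matches the statement above that the setting does not matter for this example.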

.. warning::

The ``rasa data validate stories`` script assumes that all your **story names are unique**.
If your stories are in the Markdown format, you may find duplicate names with a command like
``grep -h "##" data/*.md | sort | uniq -d``.
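
Editor's note: the same duplicate-name check can be written in Python, which avoids depending on the input being sorted. This is a minimal sketch, assuming Markdown story files in which every story header starts with ``## ``; the ``duplicate_story_names`` helper is hypothetical and not part of Rasa.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicate_story_names(lines: Iterable[str]) -> Dict[str, int]:
    """Count story headers ("## name") and return the names used more than once."""
    names = [line[3:].strip() for line in lines if line.startswith("## ")]
    return {name: count for name, count in Counter(names).items() if count > 1}


if __name__ == "__main__":
    import glob

    all_lines = []
    for path in glob.glob("data/*.md"):  # same file pattern as the grep command
        with open(path, encoding="utf-8") as story_file:
            all_lines.extend(story_file.read().splitlines())
    for name, count in duplicate_story_names(all_lines).items():
        print(f"{count}x {name}")
```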
102 changes: 86 additions & 16 deletions rasa/cli/data.py
@@ -1,21 +1,23 @@
import logging
import argparse
import asyncio
import sys
from typing import List

from rasa import data
from rasa.cli.arguments import data as arguments
from rasa.cli.utils import get_validated_path
import rasa.cli.utils
from rasa.constants import DEFAULT_DATA_PATH
from typing import NoReturn
from rasa.core.validator import Validator
from rasa.importers.rasa import RasaFileImporter

logger = logging.getLogger(__name__)


# noinspection PyProtectedMember
def add_subparser(
subparsers: argparse._SubParsersAction, parents: List[argparse.ArgumentParser]
):
import rasa.nlu.convert as convert

data_parser = subparsers.add_parser(
"data",
conflict_handler="resolve",
@@ -26,6 +28,17 @@ def add_subparser(
data_parser.set_defaults(func=lambda _: data_parser.print_help(None))

data_subparsers = data_parser.add_subparsers()

_add_data_convert_parsers(data_subparsers, parents)
_add_data_split_parsers(data_subparsers, parents)
_add_data_validate_parsers(data_subparsers, parents)


def _add_data_convert_parsers(
data_subparsers, parents: List[argparse.ArgumentParser]
) -> None:
from rasa.nlu import convert

convert_parser = data_subparsers.add_parser(
"convert",
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
@@ -45,6 +58,10 @@ def add_subparser(

arguments.set_convert_arguments(convert_nlu_parser)


def _add_data_split_parsers(
data_subparsers, parents: List[argparse.ArgumentParser]
) -> None:
split_parser = data_subparsers.add_parser(
"split",
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
@@ -65,21 +82,46 @@ def add_subparser(

arguments.set_split_arguments(nlu_split_parser)


def _add_data_validate_parsers(
data_subparsers, parents: List[argparse.ArgumentParser]
) -> None:
validate_parser = data_subparsers.add_parser(
"validate",
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
parents=parents,
help="Validates domain and data files to check for possible mistakes.",
)
_append_story_structure_arguments(validate_parser)
validate_parser.set_defaults(func=validate_files)
arguments.set_validator_arguments(validate_parser)

validate_subparsers = validate_parser.add_subparsers()
story_structure_parser = validate_subparsers.add_parser(
"stories",
formatter_class=argparse.ArgumentDefaultsHelpFormatter,
parents=parents,
help="Checks for inconsistencies in the story files.",
)
_append_story_structure_arguments(story_structure_parser)
story_structure_parser.set_defaults(func=validate_files, stories_only=True)
arguments.set_validator_arguments(story_structure_parser)


def _append_story_structure_arguments(parser: argparse.ArgumentParser) -> None:
parser.add_argument(
"--max-history",
type=int,
default=None,
help="Number of turns taken into account for story structure validation.",
)


def split_nlu_data(args) -> None:
from rasa.nlu.training_data.loading import load_data
from rasa.nlu.training_data.util import get_file_format

data_path = get_validated_path(args.nlu, "nlu", DEFAULT_DATA_PATH)
data_path = rasa.cli.utils.get_validated_path(args.nlu, "nlu", DEFAULT_DATA_PATH)
data_path = data.get_nlu_directory(data_path)

nlu_data = load_data(data_path)
@@ -91,22 +133,50 @@ def split_nlu_data(args) -> None:
test.persist(args.out, filename=f"test_data.{fformat}")


def validate_files(args) -> NoReturn:
"""Validate all files needed for training a model.

Fails with a non-zero exit code if there are any errors in the data."""
from rasa.core.validator import Validator
from rasa.importers.rasa import RasaFileImporter

def validate_files(args: argparse.Namespace) -> None:
loop = asyncio.get_event_loop()
file_importer = RasaFileImporter(
domain_path=args.domain, training_data_paths=args.data
)

validator = loop.run_until_complete(Validator.from_importer(file_importer))
domain_is_valid = validator.verify_domain_validity()
if not domain_is_valid:

if "stories_only" in args:
all_good = _validate_story_structure(validator, args)
elif "max_history" not in args or args.max_history is None:
logger.info(
"Will not test for inconsistencies in stories since "
"you did not provide a value for `--max-history`."
)
all_good = _validate_domain(validator) and _validate_nlu(validator, args)
else:
all_good = (
_validate_domain(validator)
and _validate_nlu(validator, args)
and _validate_story_structure(validator, args)
)

if not all_good:
rasa.cli.utils.print_error("Project validation completed with errors.")
sys.exit(1)

everything_is_alright = validator.verify_all(not args.fail_on_warnings)
sys.exit(0) if everything_is_alright else sys.exit(1)

def _validate_domain(validator: Validator) -> bool:
return validator.verify_domain_validity()


def _validate_nlu(validator: Validator, args: argparse.Namespace) -> bool:
return validator.verify_nlu(not args.fail_on_warnings)


def _validate_story_structure(validator: Validator, args: argparse.Namespace) -> bool:
# Check if a valid setting for `max_history` was given
if not isinstance(args.max_history, int) or args.max_history < 1:
raise argparse.ArgumentError(
args.max_history,
"You have to provide a positive integer for `--max-history`.",
)

return validator.verify_story_structure(
not args.fail_on_warnings, max_history=args.max_history
)
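
Editor's note: `argparse.ArgumentError` expects the argparse action object (or `None`) as its first argument, not the parsed value, so one alternative to the manual check above is to validate `--max-history` while parsing, through the `type=` callable. The following is a standalone sketch under that assumption, not the code of this pull request; `positive_int` and `build_parser` are hypothetical names, and the parser only mimics the `validate` / `validate stories` layout shown in the diff.

```python
import argparse


def positive_int(text: str) -> int:
    """Convert a command-line string to an int and reject values below 1."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{text!r} is not an integer")
    if value < 1:
        raise argparse.ArgumentTypeError("expected a positive integer")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Build a `validate` parser with a `stories` sub-command.

    Both levels accept `--max-history`, mirroring the structure in the diff.
    """
    parser = argparse.ArgumentParser(prog="validate")
    parser.add_argument("--max-history", type=positive_int, default=None)

    subparsers = parser.add_subparsers()
    stories_parser = subparsers.add_parser("stories")
    stories_parser.add_argument("--max-history", type=positive_int, default=None)
    # Marks the namespace so the handler can tell the sub-command was used.
    stories_parser.set_defaults(stories_only=True)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["stories", "--max-history", "3"])
    print(args.max_history, "stories_only" in args)
```

With this approach an invalid value such as `--max-history 0` is reported by argparse itself, which prints a usage message and exits with a non-zero status before any validation code runs.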