Ignore rules when validating stories (#7143)

* Ignore rules when validating stories
* Fix docs
* Update docs/docs/command-line-interface.mdx
  Co-authored-by: Ella Rohm-Ensing <erohmensing@gmail.com>
* Update docs/docs/command-line-interface.mdx
  Co-authored-by: Ella Rohm-Ensing <erohmensing@gmail.com>
* Update docs/docs/command-line-interface.mdx
  Co-authored-by: Ella Rohm-Ensing <erohmensing@gmail.com>
* Add tests
* Fix docs
* Fix note
* Use generate_story_trackers

Co-authored-by: Ella Rohm-Ensing <erohmensing@gmail.com>
RasaHQ · Nov 9, 2020 · d37631e
1 parent a3316ab

Showing 9 changed files with 90 additions and 21 deletions.
diff --git a/changelog/6973.bugfix.md b/changelog/6973.bugfix.md
@@ -0,0 +1 @@
+Ignore rules when validating stories
diff --git a/changelog/6973.doc.md b/changelog/6973.doc.md
@@ -0,0 +1 @@
+Correct data validation docs
diff --git a/changelog/README.md b/changelog/README.md
@@ -18,7 +18,7 @@ Each file should be named like `<ISSUE>.<TYPE>.md`, where
 * `feature`: new user facing features, like new command-line options and new behavior.
 * `improvement`: improvement of existing functionality, usually without requiring user intervention.
 * `bugfix`: fixes a reported bug.
-* `doc`: documentation improvement, like rewording an entire session or adding missing docs.
+* `doc`: documentation improvement, like rewording an entire section or adding missing docs.
 * `removal`: feature deprecation or feature removal.
 * `misc`: fixing a small typo or internal change, will not be included in the changelog.
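
The README hunk above describes the changelog naming scheme `<ISSUE>.<TYPE>.md`. A minimal sketch of a filename check follows, assuming the six types listed in the hunk are the complete set; the function `is_valid_changelog_name` is hypothetical and not part of the Rasa repository.

```python
import re

# Assumption: the six types shown in the README hunk are the full set of allowed types.
CHANGELOG_TYPES = ("feature", "improvement", "bugfix", "doc", "removal", "misc")

# <ISSUE> is a run of digits, <TYPE> is one of the allowed types, and the extension is .md.
_PATTERN = re.compile(r"(\d+)\.(" + "|".join(CHANGELOG_TYPES) + r")\.md")


def is_valid_changelog_name(filename: str) -> bool:
    """Return True if `filename` follows the `<ISSUE>.<TYPE>.md` convention."""
    return _PATTERN.fullmatch(filename) is not None
```

Both changelog entries added in this commit, `6973.bugfix.md` and `6973.doc.md`, pass this check, while a name such as `6973.docs.md` does not.
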
 

diff --git a/data/test_stories/rules_without_stories_and_wrong_names.md b/data/test_stories/rules_without_stories_and_wrong_names.md
@@ -0,0 +1,23 @@
+>> rule 1
+    - form{"name": "loop_q_form"}  <!-- condition that form is active-->
+    - slot{"requested_slot": "some_slot"}  <!-- some condition -->
+    - ...
+* some_intent_that_doesnt_exist{"some_slot":"bla"} <!-- can be ANY -->
+    - loop_q_form <!-- can be internal core action, can be anything -->
+
+>> rule 2
+    - form{"name": "loop_q_form"} <!-- condition that form is active-->
+    - slot{"requested_slot": "some_slot"}  <!-- some condition -->
+    - ...
+* explain                          <!-- can be anything -->
+    - utter_some_action_that_doesnt_exist
+    - loop_q_form
+    - form{"name": "loop_q_form"} <!-- condition that form is active-->
+
+>> rule 3
+    - form{"name": "loop_q_form"} <!-- condition that form is active-->
+    - ...
+    - loop_q_form <!-- condition that form is active -->
+    - form{"name": null}
+    - slot{"requested_slot": null}
+    - action_stop_q_form
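
In the Markdown training format used by these fixtures, a `>> name` header opens a rule (as in `rules_without_stories_and_wrong_names.md`) and a `## name` header opens a story (as in `stories_with_rules_conflicting.md` below). The sketch below separates the two kinds of block so that rules could be left out of a story check. It is a simplified, hypothetical parser written for illustration, not Rasa's Markdown reader; it only strips inline `<!-- ... -->` comments and indentation.

```python
import re
from typing import Dict, List, Optional, Tuple

# Inline HTML comments such as "<!-- some condition -->" carry no training data.
_COMMENT = re.compile(r"<!--.*?-->")


def split_stories_and_rules(
    markdown: str,
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split Markdown training data into (stories, rules), each mapping a block name to its steps."""
    stories: Dict[str, List[str]] = {}
    rules: Dict[str, List[str]] = {}
    current: Optional[List[str]] = None  # step list of the block currently being read
    for raw in markdown.splitlines():
        line = _COMMENT.sub("", raw).strip()  # drop comments and leading indentation
        if line.startswith("## "):
            current = stories.setdefault(line[3:].strip(), [])
        elif line.startswith(">> "):
            current = rules.setdefault(line[3:].strip(), [])
        elif line and current is not None:
            current.append(line)
    return stories, rules
```

Because blocks are keyed by name, two blocks with the same name would be merged, which is one reason the docs ask for unique story names.
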
diff --git a/data/test_stories/stories_with_rules_conflicting.md b/data/test_stories/stories_with_rules_conflicting.md
@@ -0,0 +1,9 @@
+>> rule 1
+* greet
+    - utter_noworries
+
+## ML story 1
+* greet
+    - utter_greet
+* thankyou
+    - utter_noworries
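
This fixture pairs a rule and a story that disagree on which action follows `greet`. The sketch below is a deliberately simplified story-conflict check, not Rasa's `find_story_conflicts` (which works on featurized tracker states and also models `action_listen`): each step counts as one history element, and a history followed by more than one distinct action is a conflict. Run on the story alone it finds nothing; once the rule is mixed in, the `greet` history becomes ambiguous, which is the false positive that `test_verify_story_structure_ignores_rules` guards against.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

History = Tuple[str, ...]


def find_conflicts(
    dialogues: Dict[str, List[str]], max_history: Optional[int] = None
) -> Dict[History, Set[str]]:
    """Return every history that is followed by more than one distinct action.

    Steps are plain strings: user turns start with "* " and bot actions with "- ".
    """
    next_actions: Dict[History, Set[str]] = defaultdict(set)
    for steps in dialogues.values():
        for i, step in enumerate(steps):
            if not step.startswith("- "):
                continue  # only bot actions are predicted
            history = steps[:i]
            if max_history is not None:
                # Keep only the most recent `max_history` steps.
                history = history[max(0, len(history) - max_history):]
            next_actions[tuple(history)].add(step)
    return {h: acts for h, acts in next_actions.items() if len(acts) > 1}
```

A shorter `max_history` makes histories coarser, so more of them can collide; this is why the validator should use the smallest value configured on any policy.
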
diff --git a/docs/docs/command-line-interface.mdx b/docs/docs/command-line-interface.mdx
@@ -313,31 +313,34 @@ rasa data convert nlg --help
 
 ## rasa data validate
 
-You can check your domain, NLU data, or conversation data for mistakes and inconsistencies. 
+You can check your domain, NLU data, or story data for mistakes and inconsistencies. 
 To validate your data, run this command:
 
 ```bash
 rasa data validate
 ```
 
-By default, the validator searches only for errors in the data, e.g. the same training
-example being listed as an example for two intents.
-To catch minor issues that don't prevent training a model but might indicate messy data
-(e.g. unused intents), use the `--fail-on-warnings` flag.
+The validator searches for errors in the data, e.g. two intents that have some
+identical training examples.
+The validator also checks if you have any stories where different assistant actions follow from the same 
+dialogue history. Conflicts between stories will prevent a model from learning the correct
+pattern for a dialogue. 
 
-You can also validate the story structure by running this command:
+If you pass a `max_history` value to one or more policies in your `config.yml` file, provide the 
+smallest of those values in the validator command using the `--max-history <max_history>` flag. 
+
+You can also validate only the story structure by running this command:
 
 ```bash
 rasa data validate stories
 ```
 
-This validator checks if you have any stories where different assistant actions follow from the same 
-dialogue history. Conflicts between stories will prevent a model from learning the correct
-pattern for a dialogue. 
+:::note
+Running `rasa data validate` does **not** test if your [rules](./rules.mdx) are consistent with your stories. 
+However, during training, the `RulePolicy` checks for conflicts between rules and stories. Any such conflict will abort training.
+:::
 
-If you have a [Memoization Policy](./policies.mdx#memoization-policy) in your 
-`config.yml` file, run the validator with the `--max-history` argument and provide the `max_history` 
-value set in `config.yml`. If you didn't set `max_history` in the config file, provide the default value of `5`.
+To interrupt validation even for minor issues such as unused intents or responses, use the `--fail-on-warnings` flag.
 
 :::caution check your story names
 The `rasa data validate stories` command assumes that all your story names are unique!
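
The updated docs ask for the smallest `max_history` configured on any policy. A minimal sketch of that choice follows. It assumes the `policies` list has already been loaded from `config.yml` (for example with PyYAML's `yaml.safe_load`), where each entry is a mapping with a `name` and optional settings such as `max_history`; both function names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def smallest_max_history(policies: List[Dict[str, Any]]) -> Optional[int]:
    """Return the smallest `max_history` set on any policy, or None if no policy sets one."""
    values = [p["max_history"] for p in policies if p.get("max_history") is not None]
    return min(values) if values else None


def validate_args(policies: List[Dict[str, Any]]) -> List[str]:
    """Build the `rasa data validate` argument list, adding --max-history only when needed."""
    args = ["rasa", "data", "validate"]
    limit = smallest_max_history(policies)
    if limit is not None:
        args += ["--max-history", str(limit)]
    return args
```

With a `MemoizationPolicy` at `max_history: 3` and a `TEDPolicy` at `max_history: 5`, the sketch passes `--max-history 3`; if no policy sets the value, the flag is left out.
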

diff --git a/docs/docs/setting-up-ci-cd.mdx b/docs/docs/setting-up-ci-cd.mdx
@@ -38,20 +38,29 @@ you can make a test run only if the pull request has a certain label (e.g. “NL
 
 ### Validating Data and Stories
 
-Data validation verifies that there are no mistakes or major inconsistencies in your domain, NLU 
-data, or conversation data. To validate your data, have your CI run this command:
+Data validation verifies that no mistakes or major inconsistencies appear in your domain, NLU 
+data, or story data. To validate your data, have your CI run this command:
 
 ```bash
-rasa data validate --fail-on-warnings --max-history <max_history>
+rasa data validate
 ```
 
-If you pass a `max_history` value to a Memoization policy in your `config.yml` file, provide the 
-same value in the above validator command. Otherwise, provide the default value of `5`.
+If you pass a `max_history` value to one or more policies in your `config.yml` file, provide the 
+smallest of those values as
 
-If data validation results in errors, training a model will also fail, so it's
+```bash
+rasa data validate --max-history <max_history>
+```
+
+If data validation results in errors, training a model can also fail or yield bad performance, so it's
 always good to run this check before training a model. By including the
 `--fail-on-warnings` flag, this step will fail on warnings indicating more minor issues.
 
+:::note
+Running `rasa data validate` does **not** test if your [rules](./rules.mdx) are consistent with your stories. 
+However, during training, the `RulePolicy` checks for conflicts between rules and stories. Any such conflict will abort training.
+:::
+
 To read more about the validator and all of the available options, see [the documentation for 
 `rasa data validate`](./command-line-interface.mdx#rasa-data-validate).
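
The CI section recommends validating before training. The sketch below chains the two steps and stops before `rasa train` if validation fails. It assumes `rasa data validate` exits with a nonzero status when it reports errors (or warnings, with `--fail-on-warnings`), which is also what makes a CI step fail; the helper name `validate_then_train` is hypothetical, and the `run` parameter exists only so the command runner can be swapped out.

```python
import subprocess
from typing import Callable


def validate_then_train(
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Validate the training data and train only if validation passed.

    Returns the exit code of the last command that was run.
    """
    validation = run(["rasa", "data", "validate", "--fail-on-warnings"], check=False)
    if validation.returncode != 0:
        return validation.returncode  # stop here: do not train on data that failed validation
    return run(["rasa", "train"], check=False).returncode
```

In a CI job, a script could call `sys.exit(validate_then_train())` so that the job status mirrors the first failing command.
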
 
@@ -95,8 +104,10 @@ as you make improvements to your assistant. A good rule of thumb to follow is th
 to be representative of the true distribution of real conversations.
 Rasa X makes it easy to [add test conversations based on real conversations](https://rasa.com/docs/rasa-x/user-guide/test-assistant/#how-to-create-tests).
 
-Note: Running test stories does **not** execute your action code. You will need to
+:::note
+Running test stories does **not** execute your action code. You will need to
 [test your action code](./setting-up-ci-cd.mdx#testing-action-code) in a separate step.
+:::
 
 ### Comparing NLU Performance
 

diff --git a/rasa/validator.py b/rasa/validator.py
@@ -232,7 +232,7 @@ def verify_story_structure(
             domain=self.domain,
             remove_duplicates=False,
             augmentation_factor=0,
-        ).generate()
+        ).generate_story_trackers()
 
         # Create a list of `StoryConflict` objects
         conflicts = rasa.core.training.story_conflict.find_story_conflicts(
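
The code change is a one-line swap: per the commit title, `generate_story_trackers()` is meant to yield trackers built from stories only, so rule trackers no longer reach the conflict check. The sketch below mimics that filtering with hypothetical stand-ins; the `Tracker` class, its `is_rule_tracker` field, and both functions are illustrative and are not Rasa's `TrainingDataGenerator` API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tracker:
    """Hypothetical stand-in for a training tracker."""

    name: str
    events: List[str] = field(default_factory=list)
    is_rule_tracker: bool = False


def generate(trackers: List[Tracker]) -> List[Tracker]:
    """Mimics the old call: every tracker, rules included."""
    return list(trackers)


def generate_story_trackers(trackers: List[Tracker]) -> List[Tracker]:
    """Mimics the new call: only trackers that come from stories."""
    return [t for t in generate(trackers) if not t.is_rule_tracker]
```

Filtering at the source keeps the conflict finder itself unchanged; rule/story conflicts are instead caught by the `RulePolicy` during training, as the docs note.
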

diff --git a/tests/test_validator.py b/tests/test_validator.py
@@ -47,6 +47,18 @@ async def test_verify_valid_responses():
     assert validator.verify_utterances_in_stories()
 
 
+async def test_verify_valid_responses_in_rules():
+    importer = RasaFileImporter(
+        domain_path="data/test_domains/default.yml",
+        training_data_paths=[
+            DEFAULT_NLU_DATA,
+            "data/test_stories/rules_without_stories_and_wrong_names.md",
+        ],
+    )
+    validator = await Validator.from_importer(importer)
+    assert not validator.verify_utterances_in_stories()
+
+
 async def test_verify_story_structure():
     importer = RasaFileImporter(
         domain_path="data/test_domains/default.yml",
@@ -65,6 +77,15 @@ async def test_verify_bad_story_structure():
     assert not validator.verify_story_structure(ignore_warnings=False)
 
 
+async def test_verify_story_structure_ignores_rules():
+    importer = RasaFileImporter(
+        domain_path="data/test_domains/default.yml",
+        training_data_paths=["data/test_stories/stories_with_rules_conflicting.md"],
+    )
+    validator = await Validator.from_importer(importer)
+    assert validator.verify_story_structure(ignore_warnings=False)
+
+
 async def test_verify_bad_story_structure_ignore_warnings():
     importer = RasaFileImporter(
         domain_path="data/test_domains/default.yml",
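
`test_verify_valid_responses_in_rules` expects `verify_utterances_in_stories` to fail on the rules fixture, so response checks still look at rules even though the story structure check now skips them; presumably the failure comes from `utter_some_action_that_doesnt_exist`, which the fixture's name suggests is absent from the domain. The sketch below illustrates that kind of check under that assumption. The function `unknown_utterances` is hypothetical and not the validator's implementation; it takes blocks in the same "name to steps" shape as the parser sketch earlier in this page.

```python
from typing import Dict, List, Set


def unknown_utterances(blocks: Dict[str, List[str]], domain_responses: Set[str]) -> Set[str]:
    """Return `utter_` actions used in training blocks but missing from the domain responses."""
    used: Set[str] = set()
    for steps in blocks.values():
        for step in steps:
            if step.startswith("- utter_"):
                # Keep only the action name: drop the "- " prefix and any trailing text.
                used.add(step[2:].split()[0])
    return used - domain_responses
```
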