Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Optional feature: More verbose failed expression reporting #114

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

stanislaw
Copy link
Contributor

@stanislaw stanislaw commented Apr 15, 2023

The verbose=False/True option is added to the Parser class. This option is disabled by default, and the existing behavior is fully preserved.

When the option is enabled, the final expected message is extended with extra information about the previously "weakly failed" rules. This way, not only the last failed NoMatch exception and its failing rules are displayed but also all the rules that were not matched during the whole parsing process.

Open points:

  • The markers could be removed, and all nodes could be printed when verbose=True. Implemented by verbose2.
  • The reporting code is extremely messy and needs a cleanup.
  • verbose may be not the best name for this option. I am happy to change it to any other name.

Will be done after/if the approach is confirmed.

Code review checklist

  • Pull request represents a single change (i.e. not fixing disparate/unrelated things in a single PR)
  • Title summarizes what is changing
  • Commit messages are meaningful (see this for details)
  • Tests have been included and/or updated
  • Docstrings have been included and/or updated, as appropriate
  • Standalone docs have been updated accordingly
  • Changelog(s) has/have been updated, as needed (see CHANGELOG.md, no need
    to update for typo fixes and such).

@stanislaw
Copy link
Contributor Author

I have hit another problem with this approach, though. If everything is written to the buffer, then during the report time, also the exceptions that were eventually matched on a higher level are printed.

I have solved this problem by introducing the so-called markers, which help me to exclude the failed nodes that should not be printed. I hope that my code, in spite of it being messy, is easy to understand.

What I also found is that it could even make sense to print everything of Match* that is found in the deque because it gives a very nice presentation of what the grammar is capable of within the context of a currently edited file.

@stanislaw stanislaw force-pushed the stanislaw/circular4 branch 4 times, most recently from cf58a6f to c05f908 Compare April 17, 2023 19:50
Copy link
Member

@igordejanovic igordejanovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @stanislaw.

I did a first iteration of review where I focused more on the behavior than the implementation. I have several questions. Please see the comments.

I'll run some tests with real-world inputs to see if this new approach yields better/more intuitive reports. So far it seems to show a good potential.

arpeggio/tests/test_parsing_expressions.py Outdated Show resolved Hide resolved
arpeggio/tests/test_error_reporting_verbose2.py Outdated Show resolved Hide resolved
"Expected:\n"
"1:5: EOF\n"
"1:7: 'b'\n"
" => 'a, c*, '."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is nice as it shows the possibility of ending earlier.

"'six' at position (1, 1) or "
"'4' at position (1, 15) or "
"'five' at position (1, 20) => "
"'*one two th'."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is technically correct I'm not sure how intuitive it will be for the user. I'll have to test with larger inputs to see what are the benefits. Maybe I'm wrong but I imagine if the input is large there will be many week failures and the report might get obscured as it will start reporting very early week failures. In addition, maybe the * marker should be at the last failure position. Just wondering.

Copy link
Contributor Author

@stanislaw stanislaw Apr 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, maybe the * marker should be at the last failure position. Just wondering.

I have changed this. Indeed, the * should be at the actual last failure's position.

@@ -1413,7 +1563,7 @@ class Parser(DebugPrinter):
FIRST_NOT = Not()

def __init__(self, skipws=True, ws=None, reduce_tree=False, autokwd=False,
ignore_case=False, memoization=False, **kwargs):
ignore_case=False, memoization=False, verbose=False, verbose2=False, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why two different verbose flags? What is the purpose of each? Could we just use verbosity as a level (int).

@igordejanovic
Copy link
Member

@stanislaw Just a head up. I've been experimenting with this branch today. The experiment consisted of running examples (bibtex and simple) with verbose2 flag set and introducing syntax errors at various places.

The reports I got were quite unintuitive, at least for me. The master branch still provides a simple but more to the point report. What are your thoughts? Does this give any better messages in the context of StrictDoc?

I still find much easier to understand what is going on by running the parser with debug=True.

@stanislaw stanislaw force-pushed the stanislaw/circular4 branch 2 times, most recently from 66f4b3a to 01166f5 Compare April 22, 2023 15:29
An option is added to the Parser class. This option is disabled by default,
and the existing behavior is fully preserved.

When the option is enabled, the final expected message is extended with
extra information about the previously "weakly failed" rules. This way,
not only the last failed NoMatch exception and its failing rules are displayed
but also all the rules that were not matched during the whole parsing process.
@stanislaw stanislaw force-pushed the stanislaw/circular4 branch from 01166f5 to 57a9003 Compare April 23, 2023 11:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants