Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

addresses issue #310 #311

Merged
merged 4 commits into from
Jun 6, 2024
Merged

addresses issue #310 #311

merged 4 commits into from
Jun 6, 2024

Conversation

timbray
Copy link
Owner

@timbray timbray commented May 31, 2024

add a prettyprinter for NFAs

add a prettyprinter for NFAs

Signed-off-by: Tim Bray <tbray@textuality.com>
@codecov-commenter
Copy link

codecov-commenter commented May 31, 2024

Codecov Report

Attention: Patch coverage is 97.95918% with 2 lines in your changes missing coverage. Please review.

Project coverage is 96.39%. Comparing base (b74612e) to head (7c5c99d).

Files Patch % Lines
prettyprinter.go 97.40% 1 Missing and 1 partial ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #311      +/-   ##
==========================================
+ Coverage   96.32%   96.39%   +0.06%     
==========================================
  Files          17       18       +1     
  Lines        1634     1718      +84     
==========================================
+ Hits         1574     1656      +82     
- Misses         34       35       +1     
- Partials       26       27       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@timbray
Copy link
Owner Author

timbray commented Jun 1, 2024

Once again, deep NFA twiddling, will push it myself if nobody wants to review. However, have a glance at TestPP() in prettyerinter_test.go to see what the prettyprinter output looks like for the wildcard pattern "x*9". I found this output absolutely essential in debugging the transition from DFA to NFA. In that output, the abbreviation gS stands for "glob state", i.e. when you arrive at * and loop back on anything but a particular byte value. The state you reach when you see that particular value and exit the glob has a name including gX for "glob exit".

If you have an idea for an improvement in the prettyerinter output, feel free to comment.

Copy link
Collaborator

@embano1 embano1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment regarding potential performance (allocation) issues. You should also update the README if you think this could be useful. For example, would it be good to enable printing behavior (for users) using an environment variable like Q_DEBUG_NFA to enable the printer?

Regarding printing output, might also be worth explaining that in README unless you want users/developers to only send a printer output file to you so you can debug?

// printer is an interface used to generate representations of Quamina data structures to facilitate
// debugging and optimization. It's an interface rather than a type so that a null implementation can
// be provided for production that should incur very little performance cost.
type printer interface {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest following what Google did when they developed the slog logging package. Instead of going with an interface type, they used a concrete struct with a specific interface handler type to avoid allocations: https://youtu.be/8rnI2xLrdeM?feature=shared&t=1336 (starts at that conversation)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl;dr

// inspired by https://pkg.go.dev/golang.org/x/exp/slog#Logger
type Handler interface {
  Enabled() bool
  Handle(...)
}

type Printer struct {
  handler handler
}

func New() *Printer {...}

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, looking at this. Unless you're actively testing/debugging and using newPrettyPrinter() the only allocation currently in production code is in value_matcher:

		case shellStyleType:
			newFA, nextField = makeShellStyleAutomaton(valBytes, &nullPrinter{})

It dawns on me that if I put a global variable in pretty_printer.go like this:

var sharedNullPrinter = &nullPrinter{}

Then that line in value_matcher becomes:

		case shellStyleType:
			newFA, nextField = makeShellStyleAutomaton(valBytes, sharedNullPrinter)

Then there would be no allocations ever outside of when you're testing/debugging. Would that address your concern?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I agree about documenting this stuff. Probably best to create a DEVELOPING.md or some such; is there a standard name for this kind of thing?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this implementation is purely for developing, CONTRIBUTING.md might be common

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment in the place where I think alloc might happen? Ideally, we'd have a benchmark which shows the before/after here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, CONTRIBUTING is the place.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a description to CONTRIBUTING in the latest commit. If that's OK I think we can resolve this conversation?

field_matcher.go Show resolved Hide resolved
@@ -99,3 +105,52 @@ instructions for installing it.

When opening a new issue, try to roughly follow the commit message format
conventions above.

## Developing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great section! Since we talked about OS env vars, how would a dev enable the pretty printer easily? Would be nice to explain this here too.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. CONTRIBUTING now contains a mini developer's user guide.

Signed-off-by: Tim Bray <tbray@textuality.com>
Copy link
Collaborator

@embano1 embano1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, perhaps squash before merging

@timbray
Copy link
Owner Author

timbray commented Jun 6, 2024

LGTM, perhaps squash before merging

Thanks! Your input really improved this one. I'll let GitHub do the squashing, that seems to work OK with the CI.

@timbray timbray merged commit 5eee82d into main Jun 6, 2024
7 checks passed
@timbray timbray deleted the issue-310 branch June 6, 2024 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants