Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code coverage for GoAWK #144

Closed
xonixx opened this issue Aug 19, 2022 · 10 comments
Closed

Code coverage for GoAWK #144

xonixx opened this issue Aug 19, 2022 · 10 comments

Comments

@xonixx
Copy link
Contributor

xonixx commented Aug 19, 2022

In this branch I'm working on code coverage functionality for GoAWK, similar to the one built in the Golang itself (https://go.dev/blog/cover).

In fact, I was able to make the go tool cover to produce the meaningful HTML coverage report for AWK.

I wonder if you @benhoyt would be interested in adding this to the project. My solution is a bit hacky here and there, but the approach seems to work well.

I'm still working on adding tests and documentation, and maybe some minor code adjustments, but I would be glad to start getting some feedback from you @benhoyt.

image

@benhoyt
Copy link
Owner

benhoyt commented Aug 21, 2022

Hi @xonixx, wow, this looks very interesting -- nice work. I'll take a closer look later this week and get back to you.

@benhoyt
Copy link
Owner

benhoyt commented Aug 22, 2022

I'll leave more comments later, but in the meantime I'm having trouble using go tool cover to view the report.

$ cat t.awk
BEGIN {
	if (1) {
		print "hello"
	}
	print "end"
}
$ go run . -d -covermode set -coverprofile cover.out -f t.awk
BEGIN {
    if (1) {
        __COVER[1] = 1
        print "hello"
    }
    __COVER[2] = 1
    print "end"
}
hello
end
$ cat cover.out
mode: set
t.awk:2.18,2.32 1 1
t.awk:2.18,2.32 1 1
t.awk:2.18,2.32 1 1
t.awk:2.18,2.32 1 1
t.awk:2.34,2.46 1 1
t.awk:2.18,2.32 1 1
t.awk:2.34,2.46 1 1
t.awk:2.18,2.32 1 1
t.awk:2.34,2.46 1 1
t.awk:2.18,2.32 1 1
t.awk:2.34,2.46 1 1
t.awk:3.3,3.16 1 1
t.awk:5.2,5.13 1 1
t.awk:3.3,3.16 1 1
t.awk:5.2,5.13 1 1
$ go tool cover -html=cover.out
cover: did not find package for t.awk in go list output

Do you know why? How did you run go tool cover?

@xonixx
Copy link
Contributor Author

xonixx commented Aug 22, 2022

Just put a commit with a fix. Basically go tool cover handles it if we put absolute paths to awk sources in a cover report.

When you re-run the test, please rm cover.out first. The way I implemented it - if the coverprofile file is present it's not truncated but rather appended with new coverage data. This helps to collect coverage info across multiple invocations and this is handled properly by go tool cover.

@benhoyt
Copy link
Owner

benhoyt commented Aug 22, 2022

Thanks for that (it works). I've gone through your code, and have a few high-level thoughts. There are two approaches we could go here:

  1. Leave coverage support in a branch. This seems fairly reasonable to me, as it will be a pretty rarely-used tool, I think!
  2. Clean it up and merge it to master.

The advantage of leaving it in a branch is you'll be able to iterate faster, and code review won't be nearly as strict (I'm fairly strict about what I include in master as I'll need to understand it fully and maintain it :-). On the other hand, it'd be nice to have it in master as a "selling point" of GoAWK.

I think if we want it in master, there are a few structural things I'd like to see addressed.

  1. I think we need a better way to track the filenames being parsed. I don't like the lexer.FILENAME_QUOTE thing you've got there now -- it's a fun hack, but I think this needs restructuring. We could either add proper support in the parser for multiple files with names, for example, adding something like this API to package parser:
type Parser {
    p *parser
}

func NewParser(config *ParserConfig) *Parser { ... }

func (p *Parser) ParseFile(path string, source io.Reader) error { ... }

func (p *Parser) Program() *Program { ... }

Or similar, so that we can incrementally parse files and the parser can store their filenames.

We'd need to adjust Program so it tracked the file names. I'm thinking a top-level data structure that is a slice of path names with their starting line number. We could also use this instead of errorFileLine, where I'm re-reading the source files if there's a parser error to determine which file it belongs to (or we could do it that way for coverage too, though it's a bit of a hack).

  1. Instead of GetBoundary, I'd suggest we add a Start() and End() method to every node type, similar to how every Node in Go's own AST has a Pos() and End(). This wouldn't store the filenames, as those would be in the separate data structure mentioned above. If you do those for all nodes, including expressions, then the Start() and End() of each statement can be defined in terms of the Start() / End() of its sub-nodes (like this).

  2. Currently the parser resolves arrays and variables as it goes (the "resolver"). This is problematic for when the caller wants to modify the AST as in the case of cover.go. (Other callers can't modify the AST as it's in an internal package.) So I think we want to update the parser API to allow this. Maybe add a Recompile() or Update() method to *Program which re-resolves and re-compiles? However, that would require separating out the resolver to a separate pass ... which is probably good to do anyway, but a bit of work.

  3. I'm not sure about appending coverage data. Is this how Go does it? (I thought it rewrote the file every time.) It seems error-prone to append it.

  4. For GetCoverData, instead of adding that specific method, I wonder if we should add a method func (p *Interpreter) GetArray(name string) map[string]string which fetches the contents of any named array? Later we could add other functions to get global variables, too.

I'm interested to hear your thoughts on which way you'd rather go: keeping this work in a branch, or getting it cleaned up and merged into master. And any thoughts you have on the specific points too.

@xonixx
Copy link
Contributor Author

xonixx commented Aug 22, 2022

I would prefer to merge in master, therefore I'm ready to address all required issues for this. (Of course, provided that I have enough time and I am alive, since there is a war in my country...)

Now I work on a test suite. I believe the complete e2e test suite for this functionality would be very useful (if not critical) for the rewrite.

I'll take a look at all items deeper later. However couple comments right away.

Indeed the most problems I've got is because of item 3 of your list. In fact this was my first iteration before the "hack". But it failed miserably when I realized that I can't simply re-resolve the "merged" program. So yeah, decoupling of parse / resolve steps needed.

Regarding 4. I'm not quite sure if Go does this - need to check. But the interesting thing is that its coverage format is designed in a way to support this! This is specifically useful for my scenario, where for testing purposes I run awk program many times with different inputs using this technique. So here is how this looks in real life.

But I'm totally fine with changing the default behavior here and hiding it behind a flag.

@xonixx
Copy link
Contributor Author

xonixx commented Aug 23, 2022

I think the most reasonable would be to start from item 3, namely decoupling of parse / resolve steps.
I think I can do this separately in its own PR to keep the scope limited and to move incrementally.
Then I can base my other changes for Code Coverage on top of this.

What do you think @benhoyt?

@benhoyt
Copy link
Owner

benhoyt commented Aug 23, 2022

All the best for you and your family -- war is like nothing we've experienced in far-away New Zealand.

Yes, decoupling that in a separate PR sounds like the right way to go, thanks! Ideally it'd be in internal/resolver (similar to internal/compiler). Note that the exported API needs to remain the same (though we can add new API if necessary).

@xonixx
Copy link
Contributor Author

xonixx commented Sep 16, 2022

Hey @benhoyt.

I would like to proceed with the coverage effort.

If you don't mind I would tackle the first item in a separate PR too. So this would be sort of refactoring and I hope I will improve with errorFileLine as well.

I already spent some time working on this, and seems like I have sort of a problem, would like to hear your thoughts.

So I started from adding the new API like you proposed:

type Parser {
    p *parser
}

func NewParser(config *ParserConfig) *Parser { ... }

func (p *Parser) ParseFile(path string, source io.Reader) error { ... }

func (p *Parser) Program() *Program { ... }

My understanding was that each call to ParseFile parses each individual source file to AST and then the Program() call merges all the ASTs and applies resolution and compilation. Is this how you imagined this to work?

But this doesn't seem to work because resulting positions in each AST would start from zero, so when merged - will be no way to tell the actual file/line as we do now by translating the global position in merged source to local position in a file.

So the alternative understanding of how this might work that I came to - is it'll be something similar to what we have now, only in terms of new API. Namely, the ParseFile() just appends to source (string or byte[]) and actual parsing + resolution/compilation happens in Program(). The difference is that we also additionally record in ParseFile() the number of lines in each file (basically just calculating number of "\n" in a file).

Or maybe you think of some other logic?

I've also analyzed the possibilty to feed the parser/lexer with more and more text so eventually feeding same parser with all the source files. But looks like the parser/lexer API is designed in a way to only allow to parse single string, since it comes in the constructor.

@xonixx
Copy link
Contributor Author

xonixx commented Sep 20, 2022

Actually I've found a reasonable approach to this, and prepared a PR #150.
The actual approach to solution is different from the couple options I've mentioned above, it's described in PR in more details.

@xonixx xonixx mentioned this issue Oct 22, 2022
benhoyt pushed a commit that referenced this issue Oct 23, 2022
One more PR for #144 that addresses couple of minor things, missed in #154 

In this PR:

1. Remove unintuitive behavior (outputting annotated code when only `-covermode` flag set) - just use `-d` flag
2. Update cover.png screenshot: now we show if/for/while conditions in green (covered)
3. Fix for `./goawk -covermode=set 'BEGIN { print 123 }'` to not hang waiting for stdin. This started to happen because processing stops if `prog.Actions == nil` and we by mistake were changing it to empty slice during coverage annotation.
@benhoyt
Copy link
Owner

benhoyt commented Oct 24, 2022

This is done now, primarily in #154 -- thanks @xonixx!

@benhoyt benhoyt closed this as completed Oct 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants