
[Help needed] Adjusting build for code coverage #3406

Open
echoix opened this issue Feb 6, 2024 · 6 comments
Labels
C Related code is in C CI Continuous integration help wanted Extra attention is needed

Comments

@echoix
Member

echoix commented Feb 6, 2024

I'm writing here to try to get a bit of help.

I've been working for the last two or three weeks to bring up some code coverage for the repo. The integration works quite well, and as I was nearing the end I was cross-checking that the results were trustworthy. What I observed was that for a no-code-change commit (changing a space in a comment of a workflow file, unrelated to the builds), code coverage changes (ups and downs, but not on code that is tested directly). I think I finally understood the issue, but it's the solution I can't get right yet.

So basically, here's how getting code coverage works, and the integration with our repo as I have it now:

  • Using the codecov service to display the various results, history through time, interaction with PRs, showing differences, etc. I even reworked the CI part last weekend since their beta v4 uploader action was finally released. To have some coverage to upload, we need to generate the coverage files: that's the following points.
  • For pytest, use pytest-cov, a plugin that integrates and configures coverage.py. The plugin helps to correctly measure Python code that is called through multiprocessing or subprocesses, and provides summary info at the end. It works quite well, but not a lot is covered by pytest.
  • For gunittest, use coverage.py directly. At a basic level, it is simply equivalent to replacing a call to python with coverage run. But in our usage that is not enough. Since we have some code installed in $HOME/install (in the Ubuntu workflow) that isn't really inside the source tree, while the test code remains in the source tree (and is the current working directory), there are some missing hits. I've got most of this worked out; it's not that much of an issue.
  • Still for gunittest, there are a lot of calls through subprocesses, for both C-based and even Python-based code. I had to configure some workarounds, as explained in the coverage.py docs, so that the Python instances launched as subprocesses hook up to coverage.py. This includes (through some env vars referenced in the config file) handling the fact that the cwd inside the subprocesses can be different, and getting the link back to the source code is tricky. Until I solve my other blockers and prove myself otherwise, I think it is mostly solved.
  • Still for gunittest, usage of multiprocessing has some similar problems, but they are solved by the same configuration made for subprocesses, so nothing extra to do.
  • That's it for Python for now. But GRASS isn't a Python-only repo: C and C++ make up a large part of the code base and the backing library. There's also C/C++ coverage to get, and that code is exercised by tests written in Python (or even shell) that also test Python code simultaneously.
  • That's where gcov (part of gcc) comes into play. We compile the code with some extra flags to add profiling support (counting arcs), reduce or disable optimizations (so coverage still matches the code), plus a linker flag. On compilation, a *.gcno note file is placed next to every object file. Roughly, it contains the flowcharts and line numbers (in reality, only the information needed to recreate them from the object files). These object files are linked together with libgcov, and we have our instrumented program or library. While running, counters are incremented, and when the process ends a *.gcda data file is created (by the extra code added to our program) that contains the counters/statistics. If the file already exists, like when that program is called a second time, the statistics are merged into the same file; it is not replaced.
  • Once finished running what we want, we use the gcov tool, which takes a *.gcda and *.gcno file, along with the object file, and creates a *.gcov file per source (or header) file. These .gcov files are the real coverage report files. The codecov action takes care of finding all of them, running gcov with the correct options, and fixing paths so the CI environment matches the repo in a cross-machine way.
  • Now we are reaching the tricky parts...
  • If two instances of the program are running, then terminate and try to write to the same .gcda file, what happens? It seems like only one of them wins (from my observations and some Stack Overflow answers). The two solutions for this are either:
    a) use two env vars: one with a prefix to add to the path of the generated .gcda (GCOV_PREFIX), and one to strip levels of directories from an object's path before applying the prefix (GCOV_PREFIX_STRIP). That makes it possible for a /user/build/foo.o to create a /target/run/build/foo.gcda. https://gcc.gnu.org/onlinedocs/gcc/Cross-profiling.html A difficulty of this approach is that our builds create OBJ folders at various depths, and I can't imagine a way that using it would be practical.
    b) at compile time, use -fprofile-dir= to specify the folder in which to place the *.gcda files. It evaluates %p as the process id, and %v{VAR} as an env var. https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fprofile-dir
    I already use -fprofile-abs-path so that the files generate in the original folder whatever subdirectory they are in, and keep a full path through the mangled file name. It was needed for another reason too.
    gcovr, a Python tool that is an easier wrapper around gcov (and can create reports), has some other useful info: https://gcovr.com/en/7.0/guide/compiling.html#compiler-options
    I tried using coverage.%p, but ended up filling up all the runner's disk space for the non-temporal half of the Ubuntu workflow. All of the jobs also took way more time, probably 25-75% more (3 min -> 9 min / 7.5 min -> 16 min for pytest; 30.5 min -> 32 min / 34 min -> 36 min, or 34 min -> DNF). For the jobs that succeeded, collecting, creating gcov files from the data files, and uploading coverage took about 1 min 9 s instead of about ten seconds. Locally, only building and starting the first couple of tests created around 3000 folders at the root of the repo, each with about 10-30 gcda files. There were also folders scattered through the subdirectories, wherever other OBJ folders were. It's not realistic: I suspect that either collecting tests through pytest, or just importing a Python file, ends up loading our C-backed library somehow, so without even starting coverage there are already many copies of the data files. Yes, they can be merged back two-by-two, but I still end up needing to copy the gcno files next to the gcda files (or the opposite, copying gcda files back next to the gcno files) so that gcov can work. That is problem number 1.
  • I tried to avoid problem 1 temporarily by making pytest serial again, but coverage coming from pytest is still unstable. I suspect that even though one test runs at a time, maybe more than one instance of a program/library tries to write and only one of them wins. So until I solve the serial case, there's no point in trying my idea of using the pytest worker ID (gw1, gw2, gw3...) env var as the name of the profile directory (limiting the number of subfolders needed).
  • Problem number 2 is related to how we build GRASS. Remember that a gcno file is created for each object file? When gcda files are written by the instrumented program, it checks that the existing gcda file matches the compilation unit it was created for (via a checksum), such that after a recompilation an outdated gcda file is reset on the first start. See some info, then I continue:
    https://stackoverflow.com/questions/68136230/gcda-file-coverage-info-not-getting-updated-when-two-executables-run-for-same-s
    https://stackoverflow.com/questions/72985855/libgcov-profiling-error-overwriting-an-existing-profile-data-with-a-different
    https://stackoverflow.com/questions/71582876/gcov-coverage-limited-to-test-files-in-minimal-g-project/71586011#71586011
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85759
    The libgcov `overwriting an existing profile data with a different timestamp` message appears when building, for example, r.mapcalc/r3.mapcalc. I think that, apart from the possibility that these two targets are made multiple times, what is happening is that the common files are compiled twice, once for r.mapcalc and once for r3.mapcalc (when make runs with -j > 1). So we end up with a gcno file overwritten by a second, different compilation. What should be happening, ideally, is that the common files, i.e. `$(AUTO_OBJS)` with the exclusions from both of these lines
    r_mapcalc_OBJS := $(filter-out map3.o xcoor3.o xres3.o, $(AUTO_OBJS))
    r3_mapcalc_OBJS := $(filter-out map.o xcoor.o xres.o, $(AUTO_OBJS))

    should be compiled (once), and then the remaining specific files should be compiled, before linking them together in
    $(BIN)/$(PGM2)$(EXE): LIBES = $(LIBES2)
    $(BIN)/$(PGM3)$(EXE): LIBES = $(LIBES3)
    I tried for 5-8 hours and didn't end up with something working. There's too much obscure syntax in the makefiles for me to understand what's going on.

I hope that the way the recursive makes work here, with dependencies whose workings I don't completely understand, doesn't end up recompiling the same library files a dozen times. Since recursive makes don't track dependencies well, I could imagine that at the beginning some early-used targets' dependencies would be launched more than once, until one of the targets from a parallel job completes successfully.

So that's it, I'm stuck with make and don't know how to unblock myself.

@echoix echoix added C Related code is in C CI Continuous integration help wanted Extra attention is needed labels Feb 6, 2024
@nilason
Contributor

nilason commented Feb 6, 2024

I have no practical experience setting up code coverage. Just some notes:

> For gunittest, use coverage.py directly. At a basic level, it is simply equivalent to replacing a call to python with coverage run. But in our usage that is not enough. Since we have some code installed in $HOME/install (in the Ubuntu workflow) that isn't really inside the source tree, while the test code remains in the source tree (and is the current working directory), there are some missing hits. I've got most of this worked out; it's not that much of an issue.

On the CIs, GRASS is installed before the tests, but it doesn't have to be. In fact, building (make) produces the main executable file in the bin.* directory and everything goes into the dist.* directory. You can do the testing there without installing.

Re r.mapcalc and r3.mapcalc: they are two binary files, created in part from common source files (and hence object files).

@echoix
Member Author

echoix commented Feb 7, 2024

> I have no practical experience setting up code coverage. Just some notes:

>> For gunittest, use coverage.py directly. At a basic level, it is simply equivalent to replacing a call to python with coverage run. But in our usage that is not enough. Since we have some code installed in $HOME/install (in the Ubuntu workflow) that isn't really inside the source tree, while the test code remains in the source tree (and is the current working directory), there are some missing hits. I've got most of this worked out; it's not that much of an issue.

> On the CIs, GRASS is installed before the tests, but it doesn't have to be. In fact, building (make) produces the main executable file in the bin.* directory and everything goes into the dist.* directory. You can do the testing there without installing.

I'm pretty sure there would still be a need to remap the folders, since the test files aren't in that dist folder. And installing also keeps making sure that that step works.
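For reference, coverage.py's [paths] configuration section is the usual mechanism for this kind of remapping; a hypothetical sketch (the paths below are illustrative assumptions, not our actual layout):

```ini
# .coveragerc -- hypothetical sketch; the install path below is illustrative
[paths]
# All entries in one list are treated as the same tree: hits recorded
# against the installed copy are remapped onto the first, in-repo path.
source =
    python/grass
    */install/grass*/etc/python/grass
```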

> Re r.mapcalc and r3.mapcalc: they are two binary files, created in part from common source files (and hence object files).

That I understand. Is there a Linux way to make sure a file isn't overwritten during building? It would confirm my hypothesis that the files are not built only once, which would be one of the causes of that specific message and a reason why incomplete gcda files are there.

@echoix
Member Author

echoix commented Feb 7, 2024

I continued thinking about it. One thing that bothers me about the variation in coverage is that there were a lot of changes even in the option-parsing section of many main functions. I consider it normal that not everything is covered there, but if the same tests are run, the same option-parsing code should be touched. That's where I'm pretty sure something isn't right.

I tried deleting the data files before starting the tests, but uploading the coverage collected while building just before, to compare the two. But the coverage in the option-parsing part of the main functions still varied. So some data must be lost somewhere. Even for the pre-test coverage, the programs built and run during the build to generate the descriptions that call the modules should still call all of the modules and pass through the same options, whatever the concurrency situation is.

bmwiedemann added a commit to bmwiedemann/grass that referenced this issue Feb 12, 2024
and use a .META file to tell make that
both .c and .h are created by one call
to allow for reproducible builds.

Without this change, r.mapcalc and r3.mapcalc
would be rebuilt a 2nd time in a make -j1 run
with slightly varied order of mapcalc.yy.o and mapcalc.tab.o

This might help improve OSGeo#3406

This patch was done while working on reproducible builds for openSUSE.
bmwiedemann added a commit to bmwiedemann/grass that referenced this issue Feb 14, 2024
to tell make that both .c and .h are needed
This allows for reproducible builds.

Without this change, r.mapcalc and r3.mapcalc
would be rebuilt a 2nd time in a make -j1 run
with slightly varied order of mapcalc.yy.o and mapcalc.tab.o

This might help improve OSGeo#3406

This patch was done while working on reproducible builds for openSUSE.
neteler pushed a commit that referenced this issue Feb 15, 2024
Add explicit dependency on mapcalc.tab.c to tell make that both .c and .h are needed
This allows for reproducible builds.

Without this change, r.mapcalc and r3.mapcalc
would be rebuilt a 2nd time in a make -j1 run
with slightly varied order of mapcalc.yy.o and mapcalc.tab.o

This might help improve #3406

This patch was done while working on reproducible builds for openSUSE.
@ChromiteExabyte

@echoix Is this still an open issue?

This is an interesting problem!
Figuring this out would be helpful for code coverage and a useful exercise.

Hopefully this reply adds to this thread; this issue is beyond my ability but within my interests. :)

  • Your hypothesis that files are built more than once is certainly a testable hypothesis. Testable is good!
    ~ In Linux, files can have the immutable attribute. In principle that is useful here, though I don't know how to apply it in practice.
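A hedged, portable way to test that hypothesis without the immutable attribute (chattr +i typically needs root and filesystem support) is to snapshot checksums of the build artifacts and re-verify after a rebuild; a toy sketch with a dummy file standing in for a *.gcno:

```shell
# Detect rewritten build artifacts by checksum instead of file attributes.
# A dummy file stands in for a real *.gcno produced by the build.
set -e
dir=$(mktemp -d)
cd "$dir"
mkdir OBJ
printf 'first build\n' > OBJ/foo.gcno

sha256sum OBJ/*.gcno > before.sum        # snapshot after the first "build"
printf 'second build\n' > OBJ/foo.gcno   # a rebuild overwrites the artifact

# "sha256sum -c" fails for any file whose content changed since the snapshot
if sha256sum -c before.sum > /dev/null 2>&1; then
    echo "no rewrites detected"
else
    echo "OBJ/foo.gcno was rewritten between the two snapshots"
fi
```

Run against the real OBJ directories, this would list exactly which gcno files a second compilation clobbered.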

A very verbose build log with timestamps could also be a helpful approach. Have the build logs been helpful with the testing?

Clang is a different compiler and it might churn out different results. I think that because clang compiles to an intermediate representation, there are different tools for testing coverage.
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html

The Linux Test Project hosts lcov, a graphical front end to gcov, and it could also be relevant:
https://github.com/linux-test-project/lcov

My understanding of the make/build issues here could be quite bad in the worst case. If I understand correctly, GRASS has more than one makefile, and that adds to the complexity of things. There is a paper that discusses make and this kind of problem:

Recursive Make Considered Harmful, 2004, ACCU
https://accu.org/journals/overload/14/71/miller_2004/

Although computer science has many papers that label certain things as harmful
https://en.wikipedia.org/wiki/Considered_harmful

There's a lot to this for sure.
Would splitting this into separate Python and C issues help?

@echoix
Member Author

echoix commented Aug 12, 2024

I want to take some better time to answer, but yes, this is still on my radar, and I worked towards it during the community meeting. I went down a huge rabbit hole for the C/C++ part, even asking the authors of a paper for the code of their tool, which they published for me, and I fixed some parts of it to avoid so much copying that slowed everything down. I never finished analyzing that. The effort on CMake came back alive, and that meant I could try to use that build definition instead. Part of the detour was making the Python bytecode compilation actually do something useful, but I never finished it, and it is debatable whether we need it or not.

After the community meeting I retried, but scoping myself only to Python. I got the gunittest part working correctly (enough):
echoix#168 and for macOS echoix#176

The problem is the slowdown. Instead of taking 35 min, it takes 48+. macOS tests almost double in time. Not fun.

I tried slipcover, which seemed to work; I found how to integrate its JSON output back into a coverage.py data file (see some PRs to my fork) and used coverage.py rules to fix paths and create reports. The thing is that scripts on Linux do not end in .py, and slipcover limits itself to .py files (at least without patching the tool).

@echoix
Member Author

echoix commented Aug 12, 2024

The tool for analyzing makefile builds that the authors released after I asked them about it was veribuild. I wasn't able to find what I was looking for with the other (research-level) existing tools, like mkcheck and the ones before it. I did get far with mkcheck, but it required lots and lots of builds (and I had to adapt the supported syscalls for it to be usable as of this spring).
