Add support for running the test cases in subprocesses #74
I see you're using the spawn start method. I wonder if you could improve performance here by using the forkserver start method and preloading slow-to-import modules. I know that in SciPy it can take quite a while to import some modules; for example, on my computer a single import alone can take 0.4 seconds. More info: https://bnikolic.co.uk/blog/python/parallelism/2019/11/13/python-forkserver-preload.html
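For what it's worth, a minimal sketch of how this could look, assuming a Unix system with SciPy installed (forkserver is not available on Windows, and the preloaded module name is only illustrative):

```python
import multiprocessing as mp
import time


def worker():
    start = time.perf_counter()
    import scipy.stats  # already imported in the fork server, so nearly free
    print(f"importing scipy.stats in the worker took "
          f"{time.perf_counter() - start:.3f}s")


if __name__ == "__main__":
    # The fork server imports the listed modules once; every worker process
    # forked from it then starts with those modules already loaded.
    ctx = mp.get_context("forkserver")
    ctx.set_forkserver_preload(["scipy.stats"])
    proc = ctx.Process(target=worker)
    proc.start()
    proc.join()
```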
Wow, first of all, this is impressive! I agree with all of your points; the way the current execution is built is probably not the best it could be. I am very much willing to integrate this into Pynguin: first, because I believe it could overcome the limitations and make Pynguin more flexible, and second, because the track record (five found bugs) is already quite nice. I am pretty sure there is more to follow, and I need to maintain a list of found bugs at some point. Regarding the slow-down: do you have some average numbers? A 40x slowdown is massive, I agree, but if this is only a rare worst case, the picture would probably look different. Also, could you perhaps try @nickodell's suggestion (thanks @nickodell for suggesting the forkserver) to see whether it brings an improvement? A random additional thought: even if execution in a subprocess is slower, do you see any potential to parallelise these executions? Currently, Pynguin uses threads to isolate executions but only executes the test cases sequentially. If subprocesses allowed test-case executions to be parallelised easily, the overhead might not be that critical any more.
I do have some numbers regarding the slow-down, but an average is a bit hard to interpret because the 40x speed decrease was calculated using only the few cases where Pynguin didn't crash, so I felt that averaging over all cases wasn't very representative of the true speed decrease. If you're interested, here's my master's thesis; there are averages on the number of iterations achieved per module in chapters 5 and 7. The thesis also tried to implement a plugin system to allow testers' knowledge to be easily incorporated into the test-generation algorithm. Initially, the architectural change was just a necessary improvement to be able to run the plugin system on machine-learning libraries, but I thought it was the most interesting change to add to Pynguin at the moment.

Regarding the forkserver, it might be interesting to check whether it has an impact. However, I noticed that even with just a spawn start method, what took the most time was not sending data to the subprocess but transferring data from the subprocess back to the main process. I haven't checked this in detail yet, but I think it's due to the fact that there are a lot of references in the ExecutionResult class and that, in the end, the whole test cluster is transferred back to the main process.

Regarding the parallelisation, I did try to implement it at one point and noticed that, most of the time, it was faster to start a single subprocess, run every test case in it, and fall back to running each test case in a separate subprocess only when a crash was detected. That's what I've done to improve the speed of the "TestSuiteChromosomeComputation" class (see the sketch below). However, it's true that it could be interesting to parallelise the "TestCaseChromosomeComputation" class, but I think that would require a lot of changes, and I didn't do it because I didn't have much time.
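For illustration, a minimal sketch of that fallback strategy, assuming `run_one` is a picklable, module-level callable that executes a single test case (all names here are hypothetical, not Pynguin's actual API):

```python
import multiprocessing as mp


def _run_all(test_cases, run_one):
    for case in test_cases:
        run_one(case)


def run_with_fallback(test_cases, run_one):
    """Run the whole suite in one subprocess; isolate cases only on a crash."""
    # Fast path: a single subprocess for the entire suite.
    proc = mp.Process(target=_run_all, args=(test_cases, run_one))
    proc.start()
    proc.join()
    if proc.exitcode == 0:
        return []

    # Slow path: the suite crashed somewhere, so run each test case in its
    # own subprocess to find the one(s) that bring the interpreter down.
    crashed = []
    for case in test_cases:
        proc = mp.Process(target=run_one, args=(case,))
        proc.start()
        proc.join()
        if proc.exitcode != 0:
            crashed.append((case, proc.exitcode))
    return crashed
```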
Hi @BergLucas, what I can do is run Pynguin from your branch with both the current executor and your subprocess-based executor on a benchmark that I have. I'll set this up and run it, and I'll report back numbers as soon as they have arrived. Also, thank you for your thesis; I'll have a look. Your comments regarding the data that has to be transferred between processes are quite helpful. Avoiding the transfer of large amounts of data might be achievable, but first I want to see how the executors behave on my benchmark. I'll add results to this issue as soon as I have any.
The subprocess-based executor is only used when the corresponding configuration option is set.
Finally found the time to run Pynguin on the latest version of your branch, @BergLucas. The configurations use DynaMOSA in its default settings, with assertion generation deactivated and a generation timeout of 600s.
Because I know that there are always some failures, I filtered the raw data and removed all modules that did not produce a result in all repetitions (I did 15 repetitions per configuration).
Finally, I've also plotted coverage over the 600s generation time. The plot and the results show what you've already noted, namely that there is a large slow-down, which also influences coverage significantly.
Hi @stephanlukasczyk, that's very interesting. I would have expected the number of modules that did not yield 15 iterations to be higher for the default execution, because of the types of crashes I mentioned before, but that doesn't seem to be the case. In theory, the execution using subprocesses can't crash unless there's a bug in the implementation, so I guess the branch isn't very stable yet, as I suspected.
If you are interested and have time for debugging, I can provide you with the full raw results, including log files etc.
Hello, I've fixed two of the crash bugs you identified in SciPy. I think this is very valuable for identifying surprising corner cases in the library. By the way, I would be interested in looking into the performance issue with subprocess-based execution. Would you mind showing me how to set up your branch to test SciPy? I took a look at the docs, but I'm not sure how to apply that to a package that needs to be built before I can test it.
Thank you, @nickodell, for offering to investigate the performance. What I usually do is set up a new virtual environment (based on Python 3.10, because Pynguin will only work with this version), install Pynguin into this virtual environment, and also install the respective library into it (scipy in this case). I can then run Pynguin from this environment with all the required dependencies available. This should also work for binaries that are part of a package. If @BergLucas uses a different approach, he could probably elaborate, too.
Thanks! When you do this, what do you set the project-path parameter to?
I usually also have a source-code checkout of the subject under test (here scipy) lying around, to which I point the parameter.
Hi @stephanlukasczyk,
Hi @stephanlukasczyk, and thanks for the raw data. I analysed the modules that didn't work at all with my branch and found two small bugs that appeared when the data was serialised and sent back to the main process. I've fixed them in this branch: https://github.com/BergLucas/pynguin/tree/improvement/subprocess-execution. I also noticed a few things about the benchmark, such as modules that required git or dependencies that weren't installed, and modules that had circular imports; I don't know if that's expected. Now I'm also going to check the modules that ran at least once but also crashed at least once, and I'll post a comment afterwards.
Here are the data concerning the modules that sometimes fail on my branch. Most of these failures come from the bugs mentioned in my previous comment and are therefore fixed, but some are related to "Bad file descriptor" or SystemError errors that would be interesting to study in more detail.
Thanks for reporting back. I assume the dill errors are specific to your implementation? Anyway, I guess I need to have a look at the logs, too.
Yes, the dill errors are specific to my implementation and were the cause of the bugs.
Is your feature request related to a problem? Please describe.
Over the last four months, I've had the opportunity to write a master's thesis about improving automated test-case generation for machine-learning libraries. During this project, I discovered several limitations and bugs in Pynguin that I've already reported. However, some bugs could not be easily fixed: segmentation faults, memory leaks, floating-point exceptions, and deadlocks involving Python's GIL, which do not come from Pynguin itself but from the module under test. Unfortunately, with Pynguin's current architecture, which executes test cases in threads rather than subprocesses, these types of bugs crash the main process and, therefore, Pynguin as a whole. I have observed these kinds of crashes on very popular libraries such as numpy, pandas, polars, scipy, and sklearn, and they could also happen on other modules, as I've only focused on these few.

Describe the solution you'd like
To solve the problem, I propose changing some aspects of Pynguin's architecture so that test cases can be executed in subprocesses, depending on a Pynguin parameter. I've already built a working prototype here, but because of the large amount of data transferred between the main process and the subprocesses, execution in a subprocess is up to 40x slower than execution in a thread. So I think it would first be necessary to rethink the changes I've made in my prototype to increase speed by limiting the data transfer between the main process and the subprocesses (one possible direction is sketched below).
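For example, a minimal sketch of that idea with made-up names (the hard-coded coverage set stands in for the real executor's tracing): the worker sends back only a small, picklable summary through a pipe instead of the whole ExecutionResult/test-cluster object graph.

```python
import multiprocessing as mp


def _execute_in_worker(conn):
    # Stand-in for the real test-case execution and tracing.
    covered_lines = {1, 2, 5, 8}
    # Send back only a compact summary, not the full object graph.
    conn.send({"covered_lines": sorted(covered_lines), "crashed": False})
    conn.close()


if __name__ == "__main__":
    recv_conn, send_conn = mp.Pipe(duplex=False)
    proc = mp.Process(target=_execute_in_worker, args=(send_conn,))
    proc.start()
    summary = recv_conn.recv()  # a small dict instead of the whole test cluster
    proc.join()
    print(summary)
```

The point of this design is that the pickling cost now scales with the size of the summary rather than with the size of the object graph the executor happened to touch.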
Describe alternatives you've considered
To the best of my knowledge, the only way to detect segmentation faults, memory leaks, etc. is to use subprocesses, so I don't see any other alternative for dealing with these crashes.
Additional context
With this new architecture, it would also be possible to create error-revealing test cases, as Randoop does. Indeed, by checking the exit code of the subprocesses, a crash can be detected and a test case created to reproduce it. This has already been implemented in my prototype and has already helped me find a few bugs in some libraries (a sketch of the exit-code check follows the list below):
- Passing None to the function scipy.ndimage.binary_propagation (scipy/scipy#21009)
- A crash in the scipy.linalg.interpolative module (scipy/scipy#21010)
- A crash in the scipy.cluster.hierarchy module (scipy/scipy#21011)
- Creating an np.longdouble with a None value (numpy/numpy#26767)
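For illustration, a minimal sketch of such an exit-code check (the names are hypothetical, not the prototype's actual code; on Unix, a negative exit code means the child process was killed by a signal, e.g. SIGSEGV for a segmentation fault):

```python
import multiprocessing as mp
import signal


def check_for_crash(run_case, case):
    """Execute one generated test case in a subprocess and report hard crashes.

    Sketch only: `run_case` stands in for whatever executes a single test case
    and must be a picklable, module-level callable.
    """
    proc = mp.Process(target=run_case, args=(case,))
    proc.start()
    proc.join()
    if proc.exitcode is not None and proc.exitcode < 0:
        # Killed by a signal, e.g. -11 == -signal.SIGSEGV for a segfault;
        # the test case is worth keeping as an error-revealing test.
        sig = signal.Signals(-proc.exitcode)
        return f"crash reproduced by {case!r}: killed by {sig.name}"
    return None  # clean exit (or a normal Python-level failure)
```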