Exec spawning is a commonly misunderstood mitigation, mostly because of its impact on app launch times on the Pixel 3a. In this blog post, I will attempt to explain the rationale for the mitigation (as given by the project), the impact it has on exploitation, and several sources that agree, directly or indirectly, that the mitigation is valid.
First of all, what is a probabilistic mitigation? It is a mitigation that has some chance, rather than a guarantee, of defeating an attacker who is exploiting a vulnerability. Examples include ASLR and memory tagging (a newer ARM feature that relies on small, randomly chosen tags). These rely on randomization in some form (tags in memory tagging, layouts in ASLR, etc.), so a weakness in that randomization is a weakness in the mitigation. AOSP, unfortunately, has such a weakness.
So what is that weakness? During app spawning, AOSP forks a preinitialized template process, called the zygote, and "specializes" the child into the app. The details of this specialization are irrelevant here; what matters is that fork() copies the parent's address space, so all apps share the same randomization (whether that be tags, layouts, or canaries). If that randomization is brute forced or otherwise leaked in one app, every other app becomes more vulnerable to the exploits these mitigations were defending against. An example of this is shown in the Morula paper (note that Morula also proposes an alternative to exec spawning that fixes this problem, sacrificing memory usage for performance). In this paper, the team shows an exploit where a Chrome infoleak (revealing the randomized layout of Chrome) is used to exploit a bug in VLC. The same can be done with any pair of apps (e.g. another browser and a messaging app). To fix this issue, GrapheneOS fork()s the template process and then execve()s the app. (execve() fixes this because it replaces the process image with a freshly loaded one instead of keeping the copy, so the randomization is performed again and the old process's randomization is discarded.)
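The difference between fork() and execve() can be observed on an ordinary Linux machine. The following is a minimal sketch, not Android's or GrapheneOS's actual spawning code: it assumes a Linux system with ASLR enabled and a standard CPython, and the helper names (`libc_address`, `address_in_forked_child`, `address_in_fresh_exec`) are made up for this illustration. It prints where libc's `getpid` is mapped in the parent, in a fork()ed child, and in a child that was fork()ed and then execve()d.

```python
import ctypes
import os
import subprocess
import sys


def libc_address() -> int:
    """Return the address of libc's getpid in the current process."""
    libc = ctypes.CDLL(None)  # handle to the already loaded C library
    return ctypes.cast(libc.getpid, ctypes.c_void_p).value


def address_in_forked_child() -> int:
    """fork() only: the child is a copy of the parent's address space."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the address through the pipe and exit immediately.
        os.close(read_fd)
        os.write(write_fd, str(libc_address()).encode())
        os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        data = pipe.read()
    os.waitpid(pid, 0)
    return int(data)


def address_in_fresh_exec() -> int:
    """fork() + execve(): the kernel builds a new, re-randomized layout."""
    snippet = (
        "import ctypes;"
        "print(ctypes.cast(ctypes.CDLL(None).getpid, ctypes.c_void_p).value)"
    )
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    print(f"parent:         {libc_address():#x}")
    print(f"fork() child:   {address_in_forked_child():#x}")
    print(f"execve() child: {address_in_fresh_exec():#x}")
```

The fork() child always reports the same address as the parent, which is the situation every app spawned from the zygote is in. With ASLR enabled, the execve() child will normally report a different address on each run; if ASLR is disabled on the machine, all three values may match.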
This mitigation has led to cold app spawns being slower on phones like the Pixel 3a (~100 ms), due to their slower storage (eMMC, to be specific). This appears to be less of an issue on newer phones. GrapheneOS has decided that this mitigation is valuable and worth the tradeoff, but is also looking for a way to lessen the performance impact (using something like Morula).
Many prominent security researchers have discussed the weakness described above, calling it "per boot ASLR" (as randomization happens once per boot). One example is Samuel Groß on the Project Zero blog (see note 1 for the specific quote and where to find it), who describes this type of ASLR as a weakness, and one that the exploit in that post abuses. He also mentions it in a related 36C3 talk. The PaX documentation also notes that execve(), unlike fork(), replaces the randomized layout with a new one (see note 2 for specifics). In another 36C3 talk, the creator of https://isopenbsdsecu.re discusses the same "per boot ASLR" issue in Android, iOS, and Windows. Exploit developer and security researcher Connor McGarr mentions it as a weakness in Windows (see note 3 for specifics). Although ASLR is mentioned the most, this weakness undermines probabilistic mitigations in general.
Exec spawning is an important part of GrapheneOS's security defenses, giving it stronger probabilistic defenses than AOSP, other conventional operating systems, and AOSP forks that do not harden it. The project considers it worth the performance impact, but is planning an alternative.
Note 1: In the Project Zero blog linked above, the relevant paragraph is under "The Dyld Shared Cache". The specific paragraph is: "On iOS (as well as on macOS), most system libraries are prelinked into one giant binary blob, called the dyld_shared_cache. Amongst other benefits, this improves program load times as it reduces the runtime overhead of symbol resolution. One security relevant aspect of the shared cache is that it is mapped at the same address in every process, with its exact location only being randomized once during device boot. This is likely due to the shared cache being mapped into all userspace processes (thus reducing overall system memory usage) but also containing absolute pointers to itself, making it not position independent. As such, once the base address of the shared cache is known, the addresses of pretty much all libraries in any userspace process on that device, including thousands of ROP gadgets, all ObjC Classes, various strings, and much more, are also known. This is sufficient for an RCE exploit."
Note 2: In the PaX documentation linked above, here is the relevant paragraph: "In practice brute forcing can be applied to bugs that are in network daemons that fork() on each connection since fork() preserves the randomized layout, as opposed to execve() which replaces it with a new one. This distinction between the attack methods becomes meaningless if the system has monitoring and reaction mechanisms for program crashes because the reaction can then be triggered at such low levels that the two attack methods will have practically the same (im)probability to succeed."
Note 3: In the blog on Windows exploitation linked above, the relevant paragraph is under "Legacy Mitigation #2: ASLR/kASLR", around the 5th/6th paragraph: "Because Windows only performs ASLR on a per-boot basis, all processes share the same address space layout once the system has started. Therefore, ASLR is not effective against a local attacker that already has achieved code execution."