Switch PVF preparation to CPU time #4217
Is another thread necessary? I think it's possible to wrap this syscall into a future that checks the usage every second when polled. Then simply …
According to the manual, @slumber, the wasmtime compilation is a non-async blob, so I assume you mean a future that wraps the wasm compiling thread and internally uses a timer to poll that thread (there needs to be some source of wake-ups to poll every ~1 s)? The bottom line is that one extra thread is still required; which of the two runs on that thread depends on the usage semantics (async or not).
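For illustration only, here is a minimal Python sketch of the async variant. It is not the Rust worker code, and run_with_cpu_deadline, compile_fn and poll_s are hypothetical names. The blocking compilation runs on an executor thread while a coroutine wakes up every poll_s seconds and compares the process's CPU time against a limit, so it still needs the one extra thread discussed above.

```python
import asyncio
import time


async def run_with_cpu_deadline(compile_fn, cpu_limit_s, poll_s=1.0):
    """Run a blocking function on a worker thread and poll the CPU usage.

    compile_fn stands in for the blocking (non-async) wasmtime compilation.
    time.process_time() is the CPU time of the whole process, so it
    includes the CPU time of the compiling thread.
    """
    loop = asyncio.get_running_loop()
    start = time.process_time()
    # The executor thread is the "extra thread" the compilation runs on.
    task = loop.run_in_executor(None, compile_fn)
    while True:
        # Wake up at least every poll_s seconds (the ~1 s timer).
        done, _ = await asyncio.wait({task}, timeout=poll_s)
        if done:
            return task.result()
        if time.process_time() - start >= cpu_limit_s:
            # The thread keeps running; a real worker would exit here.
            raise TimeoutError("CPU time limit exceeded")
```

Because Python cannot cancel a running thread, the sketch only raises TimeoutError and leaves the thread running; a worker following this design would terminate the process instead.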
wasmtime's …
This …
Ok, I see where the misunderstanding is. See the quote:
This assumes the context of the child worker (that's where the PVF compilation is happening at the moment, and that's why I mentioned …). OTOH, the code that you linked is the second case, that is:
In that case, we do not change anything in the child process but instead change the Polkadot validation host process. This way the child worker won't get a new thread. We cannot solely rely on … So the idea I expressed there is that we still leave the timer as is (i.e. as a wall-clock timer) but just increase the timeout. After either the process finishes or the deadline timer fires, we check how much CPU time was spent in the child process. I hope this is clearer.
@m-cat, this is also an interesting ticket that I would really love to see implemented soon.
I've been investigating this and @eskimor asked me to post my findings. Code here.
Goal
We wanted to determine how CPU time was affected by load, and to what degree it … I ran an experiment by launching simultaneous processes and counting how many …
Parameters
I had another thread in each process that would run and terminate the process … Another parameter was a 1-second delay between sleeps in the polling thread, … Ran on an 8-core machine.
Data
Results
Next steps
@eskimor suggests a hybrid approach:
I will next be investigating this question:
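The code linked above is not reproduced here. As a rough illustration of the kind of measurement described, the following Python sketch counts SHA-256 hashes within a fixed budget measured either in CPU time or in wall-clock time, and runs several copies in parallel to create load. Unlike the described setup, it checks the clock inline instead of using a separate thread that terminates the process; count_hashes and run_under_load are hypothetical names.

```python
import hashlib
import time
from concurrent.futures import ProcessPoolExecutor


def count_hashes(budget_s, use_cpu_time):
    """Hash repeatedly until budget_s seconds have elapsed on the chosen clock."""
    # process_time() counts only this process's CPU time; monotonic() is wall clock.
    clock = time.process_time if use_cpu_time else time.monotonic
    start = clock()
    digest = b"\x00" * 32
    count = 0
    while clock() - start < budget_s:
        digest = hashlib.sha256(digest).digest()
        count += 1
    return count


def run_under_load(n_procs, budget_s, use_cpu_time):
    """Run n_procs copies simultaneously and return each process's hash count."""
    with ProcessPoolExecutor(max_workers=n_procs) as pool:
        futures = [pool.submit(count_hashes, budget_s, use_cpu_time)
                   for _ in range(n_procs)]
        return [f.result() for f in futures]
```

Comparing run_under_load(n, 5.0, True) with run_under_load(n, 5.0, False) for increasing n (e.g. from 1 up to twice the core count) shows how hashes per CPU second and hashes per wall-clock second change as load grows. On platforms that start worker processes with spawn, call it under an `if __name__ == "__main__":` guard.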
Awesome, thanks @m-cat! What we can see is that wall-clock time is almost as good (or bad) a metric up to 16 processes; only beyond that does the number of hashes per wall-clock time decline further, while the number per CPU time stays stable. This means CPU time is likely a significant improvement only under very high load. We reduce the factor from 73/21 ≈ 3.48 to 73/44 ≈ 1.66, so the variation under heavy load can be roughly halved. That is still quite an improvement: not as good as we hoped (perfectly stable), but still worth exploring! Is there any other timing information available that is even less susceptible to system load?
Probably not. Given that the number of hashes per CPU time stays stable once all threads (16) are saturated, I would assume that the slowdown comes from CPU throttling to prevent overheating. CPUs can usually reach higher clock speeds on a single thread and have to reduce the clock speed once more cores get busy.
TL;DR: CPU time indeed helps reduce variance due to load. Some variance remains, likely due to thermal management of CPUs.
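As a quick check of the variance factors quoted earlier (73/21 and 73/44): assuming 73 is the best observed rate and 21 and 44 are the rates under the heaviest load when measured against wall-clock and CPU time respectively, the improvement is their ratio, 44/21 ≈ 2.1, which is what "pretty much halved" amounts to. slowdown_factor is a hypothetical helper.

```python
def slowdown_factor(best_rate, loaded_rate):
    """How many times lower the rate under load is than the best rate."""
    return best_rate / loaded_rate


wall = slowdown_factor(73, 21)  # hashes per wall-clock time
cpu = slowdown_factor(73, 44)   # hashes per CPU time
print(f"wall: {wall:.2f}, cpu: {cpu:.2f}, improvement: {wall / cpu:.2f}x")
# prints: wall: 3.48, cpu: 1.66, improvement: 2.10x
```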
@eskimor, to get the child's CPU time in the parent before execution has ended, it …
Well, the portable way to get this information should be …
For the platform-dependent option: we could go that way if there are alternatives on other platforms. In any case, if a platform-independent solution is available (e.g. the open socket), we should go with that one.
It appears some BSDs have procfs too, btw; not sure if it's the same, and I guess not macOS.
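Assuming the platform-dependent option discussed here is Linux procfs, a parent can read a still-running child's CPU time from /proc/<pid>/stat, where fields 14 and 15 (utime and stime) are given in clock ticks (see proc(5)). A minimal Python sketch follows; proc_cpu_seconds is a hypothetical helper, and BSD procfs layouts are not covered.

```python
import os


def proc_cpu_seconds(pid):
    """Return user + system CPU seconds of a live process (Linux procfs only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (comm) is in parentheses and may contain spaces, so split
    # after the last ')'; the remaining fields start at field 3 (state).
    fields = stat.rpartition(")")[2].split()
    utime_ticks = int(fields[11])  # field 14: utime
    stime_ticks = int(fields[12])  # field 15: stime
    return (utime_ticks + stime_ticks) / os.sysconf("SC_CLK_TCK")
```

The parent could call proc_cpu_seconds(child_pid) from its existing timer loop and kill the child once a CPU-time limit is exceeded, keeping timeout control on the host.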
I had a quick call with @eskimor today, where I proposed we simply have the … So having the child poll the CPU time and exit on its own was deemed acceptable, … Please let me know if I missed any important details, @eskimor, as most of the … Now, I thought about this problem a bit and believe we can simplify more: I'm …
We do want the running time in the future, but the child could supply this when done.
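A minimal sketch of the child supplying its running time when done, assuming a single text line on stdout as the channel. The real worker communicates with the host over its own channel; the CHILD program, the output format and run_child_and_collect_cpu_time are all hypothetical. The child measures itself with getrusage(RUSAGE_SELF) right before exiting.

```python
import subprocess
import sys

# Hypothetical child program: does some work, then reports its own CPU time.
CHILD = """
import resource
sum(i * i for i in range(200_000))  # stand-in for the PVF compilation
ru = resource.getrusage(resource.RUSAGE_SELF)
print(ru.ru_utime + ru.ru_stime)
"""


def run_child_and_collect_cpu_time():
    """Run the child to completion and parse the CPU seconds it reported."""
    out = subprocess.run([sys.executable, "-c", CHILD],
                         capture_output=True, text=True, check=True)
    return float(out.stdout.strip().splitlines()[-1])
```

This keeps the child single-purpose: it does not enforce anything itself, it only reports, and the host decides what to do with the number.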
For controlling the execution time of PVF preparation (and potentially execution), it seems better to use CPU time rather than wall-clock time. The hope is that CPU time is less influenced by the load on the machine.
The CPU time can be obtained with either clock_gettime or getrusage for the calling process. #4123 makes the PVF compilation single-threaded, so there should be only one thread compiling or executing. There should, however, be a watchdog thread that wakes up periodically (roughly every 1 s) and terminates the process once it has reached the deadline. That means the control of timeouts shifts from the Polkadot host into the worker, which is not ideal but acceptable.
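A minimal Python sketch of such a watchdog, assuming getrusage(RUSAGE_SELF) as the CPU-time source (time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID) would work similarly on Unix). start_cpu_watchdog and its parameters are hypothetical; by default it calls os._exit to terminate the worker, and on_timeout can be overridden for testing. Process-wide CPU time also includes the watchdog's own, negligible, usage.

```python
import os
import resource
import threading


def process_cpu_seconds():
    """User + system CPU time of the calling process, via getrusage(2)."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_utime + ru.ru_stime


def start_cpu_watchdog(cpu_limit_s, poll_interval_s=1.0, on_timeout=None):
    """Start a daemon thread that calls on_timeout once the process has used
    cpu_limit_s seconds of CPU time. Returns an Event that stops the watchdog."""
    if on_timeout is None:
        def on_timeout():
            # Default: terminate the worker immediately (exit code is hypothetical).
            os._exit(1)

    start = process_cpu_seconds()
    stop = threading.Event()

    def watch():
        # Event.wait() doubles as the ~1 s sleep and returns True if stopped early.
        while not stop.wait(poll_interval_s):
            if process_cpu_seconds() - start >= cpu_limit_s:
                on_timeout()
                return

    threading.Thread(target=watch, daemon=True).start()
    return stop
```

The compiling thread never has to check the clock itself; it just runs, and the watchdog ends the process from the side, which matches the single-threaded compilation plus one watchdog thread described above.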
Alternatively, we could leave it as is, perhaps just increasing the current timeout. Then we would wait4 on the child worker and get its rusage (not unlike the well-known time program). This would make the preparation process completely single-threaded and leave the control of the timeouts on the host. However, that seems more complicated in terms of implementation.
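A minimal Python sketch of this host-side alternative, using os.wait4, which wraps the same wait4(2) call and returns the child's rusage. The host keeps a (generous) wall-clock deadline and only reads the child's CPU time once the child has exited or been killed. run_worker is a hypothetical helper, and os.posix_spawnp (Python 3.8+) stands in for however the real host spawns its workers.

```python
import os
import signal
import sys
import time


def run_worker(argv, wall_timeout_s, poll_s=0.01):
    """Spawn a worker, enforce a wall-clock timeout from the host, and return
    (timed_out, wait_status, cpu_seconds) using the rusage from wait4."""
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    deadline = time.monotonic() + wall_timeout_s
    timed_out = False
    while True:
        # With WNOHANG, wait4 returns pid 0 while the child is still running.
        wpid, status, ru = os.wait4(pid, os.WNOHANG)
        if wpid == pid:
            break
        if time.monotonic() >= deadline:
            # If the child exited just now, SIGKILL hits a zombie, which is harmless.
            os.kill(pid, signal.SIGKILL)
            timed_out = True
            wpid, status, ru = os.wait4(pid, 0)  # reap and collect rusage
            break
        time.sleep(poll_s)
    # Same quantities `time` reports as user + sys for the child.
    return timed_out, status, ru.ru_utime + ru.ru_stime


if __name__ == "__main__":
    print(run_worker([sys.executable, "-c", "sum(range(10**6))"], wall_timeout_s=60))
```

The returned wait status can be decoded with os.WIFEXITED / os.WEXITSTATUS, and the CPU seconds compared against a CPU-time limit after the fact, so the worker itself stays single-threaded. A production host would wait for the child-exit notification instead of polling with sleep.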