-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Use all available CPUs to collect dump #14
base: main
Are you sure you want to change the base?
Conversation
Hi @sourabhjains, As reminded by @daveyoung, we haven't verified if increasing the number of CPUs could bring unexpected consequences like breaking kdump. And I also think increasing the number of CPUs will demand more memory i.e. crashkernel needs to be larger. So it's better to not make this option enabled by default before we have well widely tested it. |
Hello @coiby
This patch is not about increasing the number of CPUs in the kdump environment. It is more about using the already available CPUs to collect the dump. For example, if the kdump kernel boots with 8 CPUs, then makedumpfile will use This patch doesn't change the default number of CPUs for the kdump kernel. It only changes the number of makedumpfile threads based on the CPUs available in the kdump kernel. |
Oh, I misunderstood the patch. Thanks for dismissing my concern! |
Yes, I also misunderstood the purpose, thanks for the explanation. But multi thread is also not well tested as we use one cpu by default. Maybe we can have some test first. @baoquan-he @pfliu what do you think? Otherwise, it would be better to test and compare the single thread captured vmcore and the multi-thread captured vmcore, see if the binary files are any different. |
Hello @daveyoung
Please let me know if you would like me to conduct any specific tests on PowerPC. |
Hello @daveyoung @coiby I am not sure if adding the --num-thread=1 command line argument to makedumpfile is the same as not adding it. Shall we not add --num-thread if the number of available CPUs is 1 or the user is configured to use only 1 CPU, to avoid changing the default behavior? Thanks, |
Hi @sourabhjains,
|
Thanks @prudo1 for the review.
Sure I will update the patch.
I will share the statistics.
On PowerPC, we have another dump collection mechanism called fadump, which doesn't use nr_cpus=1 for the capture kernel. Regardless, I agree with you that introducing a new configuration is not necessary. It is better to use all available CPUs directly. Apologies for the delayed response. |
By default, use all available CPUs to collect the dump instead of just one CPU. This reduces the dump collection time significantly, specially for the systems with very large memory configuration. The graph below show the time reduction in dump collection on large memory system from 4 Hours to 10 Minutes. | 14000 | * | | | | | | 1600 | * | | 1400 | execution Time (Sec) | | 1200 | | | 1000 | | * | 800 | | | * 600 | | * * | ------------------------------------ 1 9 17 25 33 41 Number of Threads System details: Architecture: PowerPC /proc/vmcore size: 3.7T Filter level: 1 Similar tests with different filter levels, 31 and 16, also show significant reduction in time consumption for dump collection. Although the above tests were performed only on the PowerPC architecture, but I think using all CPUs for dump collection will have performance benefits on other architectures too. Note: The `--num-threads` command-line argument is added to the `makedumpfile` core collector only when more than one processing unit is available in the system. This helps to keep the default case (nr_cpus=1) the same. Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Patch has been update as suggested in #14 (comment) Change log:
|
The graph shows relationship between Makedumpfile memory consumption and the --num-thread option. My observation is that, on average, there is a 1MB per CPU memory increase. Given that Power systems can have hundreds of CPUs, using --num-thread= My experiment shows that the performance gain maxes out at --num-threads=32 CPUs on PowerPC. Further increasing the thread count doesn't result in much additional performance gain. Since each additional thread requires 1MB of memory, and there isn't much performance improvement beyond --num-thread=32, I think we should set a cap on the number of threads for Makedumpfile. Please let me know your opinion on this. P.S. The above graph is plotted using below points:
Note: CPUs 0 mean, --num-threads option was not used. Below command was used to find the Maximum resident set size:
|
By default, use all available CPUs to collect the dump instead of just one CPU. This reduces the dump collection time significantly, specially for the systems with very large memory configuration.
The graph below show the time reduction in dump collection on large memory system from 4 Hours to 10 Minutes.
execution Time (Sec) |
|
1200 |
|
|
1000 |
| *
|
800 |
|
| *
600 |
| * *
|
------------------------------------
1 9 17 25 33 41
System details:
Architecture: PowerPC
/proc/vmcore size: 3.7T
Filter level: 1
Similar tests with different filter levels, 31 and 16, also show significant reduction in time consumption for dump collection.
Although the above tests were performed only on the PowerPC architecture, but I think using all CPUs for dump collection will have performance benefits on other architectures too.
A new configuration parameter,
kdump_cpus
, is introduced to allow users to control the number of CPUs used for dump collection. Ifkdump_cpus
is set to a negative value or if the configured CPU count is larger than the available CPUs in the kdump kernel, it will fall back to the default value, which is all available CPUs.Note: The newly introduced configuration is only applicable to the makedumpfile core collector for now.