
[RFC] Use all available CPUs to collect dump #14

Open
wants to merge 1 commit into main
Conversation

sourabhjains
Contributor

By default, use all available CPUs to collect the dump instead of just one CPU. This reduces the dump collection time significantly, especially for systems with very large memory configurations.

The graph below shows the reduction in dump collection time on a large-memory system, from about 4 hours to 10 minutes.

                           |
                   14000   | *
                           |
                           |
                           |
                           |
                           |
                           |
                    1600   |      *
                           |
                           |
                    1400   |
 execution Time (Sec)      |
                           |
                    1200   |
                           |
                           |
                    1000   |
                           |            *
                           |
                     800   |
                           |
                           |                   *
                     600   |
                           |                          *     *
                           |
                            ------------------------------------
                               1    9    17    25     33    41

                                      Number of Threads

System details:
Architecture: PowerPC
/proc/vmcore size: 3.7T
Filter level: 1

Similar tests with other filter levels (31 and 16) also show a significant reduction in dump collection time.

Although the above tests were performed only on the PowerPC architecture, I think using all CPUs for dump collection will have performance benefits on other architectures too.

A new configuration parameter, kdump_cpus, is introduced to allow users to control the number of CPUs used for dump collection. If kdump_cpus is set to a negative value, or if the configured CPU count is larger than the number of CPUs available in the kdump kernel, it falls back to the default value, which is all available CPUs.

Note: The newly introduced configuration is only applicable to the makedumpfile core collector for now.
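
As a rough sketch of the fallback described above (not the actual patch; the variable names and the use of nproc inside the kdump environment are assumptions), the thread-count selection could look like this:

    # Sketch: pick the makedumpfile thread count from the proposed kdump_cpus
    # setting, falling back to all CPUs available in the kdump kernel when the
    # value is unset, negative, or larger than what is actually online.
    available_cpus=$(nproc)             # CPUs online in the kdump kernel
    requested_cpus=${kdump_cpus:--1}    # -1 when kdump_cpus is not configured

    if [ "$requested_cpus" -le 0 ] || [ "$requested_cpus" -gt "$available_cpus" ]; then
        num_threads=$available_cpus
    else
        num_threads=$requested_cpus
    fi

    # core_collector stands in for the makedumpfile command line built elsewhere
    core_collector="$core_collector --num-threads=$num_threads"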

@coiby
Member

coiby commented Jul 2, 2024

Hi @sourabhjains,

As @daveyoung reminded us, we haven't verified whether increasing the number of CPUs could have unexpected consequences, such as breaking kdump. I also think increasing the number of CPUs will demand more memory, i.e. the crashkernel reservation needs to be larger. So it's better not to enable this option by default before it has been widely tested.

@sourabhjains
Contributor Author

sourabhjains commented Jul 2, 2024

Hello @coiby

> As @daveyoung reminded us, we haven't verified whether increasing the number of CPUs could have unexpected consequences, such as breaking kdump. I also think increasing the number of CPUs will demand more memory, i.e. the crashkernel reservation needs to be larger. So it's better not to enable this option by default before it has been widely tested.

This patch is not about increasing the number of CPUs in the kdump environment. It is more about using the already available CPUs to collect the dump.

For example, if the kdump kernel boots with 8 CPUs, then makedumpfile will use --num-threads=8 to collect the dump faster.

This patch doesn't change the default number of CPUs for the kdump kernel. It only changes the number of makedumpfile threads based on the CPUs available in the kdump kernel.

@coiby
Member

coiby commented Jul 3, 2024

Oh, I misunderstood the patch. Thanks for dismissing my concern!

@daveyoung
Contributor

Yes, I also misunderstood the purpose, thanks for the explanation. But multi-threading is also not well tested, since we use one CPU by default. Maybe we can do some testing first. @baoquan-he @pfliu what do you think? In any case, it would be better to compare a vmcore captured with a single thread against one captured with multiple threads and see whether the binary files differ.

@sourabhjains
Contributor Author

Hello @daveyoung

> Yes, I also misunderstood the purpose, thanks for the explanation. But multi-threading is also not well tested, since we use one CPU by default. Maybe we can do some testing first. @baoquan-he @pfliu what do you think? In any case, it would be better to compare a vmcore captured with a single thread against one captured with multiple threads and see whether the binary files differ.

Please let me know if you would like me to conduct any specific tests on PowerPC.

@sourabhjains
Contributor Author

Hello @daveyoung @coiby

I am not sure whether adding the --num-threads=1 command-line argument to makedumpfile is the same as not adding it at all.

Shall we skip adding --num-threads when the number of available CPUs is 1, or when the user has configured only 1 CPU, to avoid changing the default behavior?

Thanks,
Sourabh Jain

@prudo1
Collaborator

prudo1 commented Jul 16, 2024

Hi @sourabhjains,
I think in principle we can go this way. Nevertheless, some comments from my side.

  1. I would prefer to only add --num-threads=$cpus when $cpus > 1, to keep the default behavior the same.
  2. Even when the CPUs are already online, it would be good to know how much additional memory we need when running multi-threaded.
  3. I don't see the point in adding the new kdump_cpus config option to kdump.conf. On most architectures, except Power, we set nr_cpus=1 by default on the kernel command line. So setting kdump_cpus in kdump.conf won't have any impact unless the user also changes nr_cpus in /etc/sysconfig/kdump, which will confuse most users. I think it's better that the user simply increases nr_cpus (see the sketch after this list), which needs to be done anyway. This also has the benefit that we don't need to rebuild the kdump initrd for the change to take effect.
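
For context, the nr_cpus change described in item 3 would be made on the kdump kernel command line, e.g. via /etc/sysconfig/kdump on Fedora/RHEL-style setups. A minimal sketch (the values are illustrative, not the shipped defaults):

    # Sketch: let the capture kernel boot with more CPUs by dropping the
    # default nr_cpus=1 restriction and appending a higher value instead.
    KDUMP_COMMANDLINE_REMOVE="nr_cpus=1"
    KDUMP_COMMANDLINE_APPEND="nr_cpus=8"

As noted earlier in the thread, booting the capture kernel with more CPUs may also require a larger crashkernel reservation.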

@sourabhjains
Contributor Author

> Hi @sourabhjains, I think in principle we can go this way. Nevertheless, some comments from my side.

Thanks @prudo1 for the review.

> 1. I would prefer to only add --num-threads=$cpus when $cpus > 1, to keep the default behavior the same.

Sure, I will update the patch.

> 2. Even when the CPUs are already online, it would be good to know how much additional memory we need when running multi-threaded.

I will share the statistics.

> 3. I don't see the point in adding the new kdump_cpus config option to kdump.conf. On most architectures, except Power, we set nr_cpus=1 by default on the kernel command line. So setting kdump_cpus in kdump.conf won't have any impact unless the user also changes nr_cpus in /etc/sysconfig/kdump, which will confuse most users. I think it's better that the user simply increases nr_cpus, which needs to be done anyway. This also has the benefit that we don't need to rebuild the kdump initrd for the change to take effect.

On PowerPC, we have another dump collection mechanism called fadump, which doesn't use nr_cpus=1 for the capture kernel. Regardless, I agree with you that introducing a new configuration is not necessary. It is better to use all available CPUs directly.

Apologies for the delayed response.

By default, use all available CPUs to collect the dump instead of just
one CPU. This reduces the dump collection time significantly, especially
for systems with very large memory configurations.

The graph below shows the reduction in dump collection time on a
large-memory system, from about 4 hours to 10 minutes.

                               |
                       14000   | *
                               |
                               |
                               |
                               |
                               |
                               |
                        1600   |      *
                               |
                               |
                        1400   |
 execution Time (Sec)          |
                               |
                        1200   |
                               |
                               |
                        1000   |
                               |            *
                               |
                        800    |
                               |
                               |                   *
                        600    |
                               |                          *     *
                               |
                                ------------------------------------
                                   1    9    17    25     33    41

                                          Number of Threads

System details:
Architecture: PowerPC
/proc/vmcore size: 3.7T
Filter level: 1

Similar tests with other filter levels (31 and 16) also show a
significant reduction in dump collection time.

Although the above tests were performed only on the PowerPC architecture,
I think using all CPUs for dump collection will have performance benefits
on other architectures too.

Note: The `--num-threads` command-line argument is added to the `makedumpfile`
core collector only when more than one processing unit is available in the
system. This helps to keep the default case (nr_cpus=1) the same.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
@sourabhjains
Contributor Author

The patch has been updated as suggested in #14 (comment).

Change log:

  • Remove the kdump_cpus kdump.conf option
  • Do not add the --num-threads command-line argument to makedumpfile when only 1 CPU is present in the system, to keep the default case the same (see the sketch after this list)
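
A minimal sketch of the behavior described in the second bullet (illustrative shell, not the literal patch):

    # Only pass --num-threads when more than one CPU is online, so the
    # default nr_cpus=1 case invokes makedumpfile exactly as before.
    cpus=$(nproc)
    if [ "$cpus" -gt 1 ]; then
        core_collector="$core_collector --num-threads=$cpus"
    fi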

@sourabhjains
Contributor Author

sourabhjains commented Aug 14, 2024

Makedumpfile memory consumption with --num-threads

The graph (see the data points in the P.S. below) shows the relationship between makedumpfile memory consumption and the --num-threads option.

My observation is that, on average, memory usage increases by about 1 MB per CPU.

Given that Power systems can have hundreds of CPUs, using --num-threads=nproc may not be a good idea.

My experiment shows that the performance gain maxes out at --num-threads=32 on PowerPC; increasing the thread count further doesn't yield much additional improvement.

Since each additional thread requires about 1 MB of memory, and there isn't much performance improvement beyond --num-threads=32, I think we should cap the number of threads passed to makedumpfile.

Please let me know your opinion on this.
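
If such a cap is adopted, the selection could be as simple as the sketch below (the limit of 32 comes from the PowerPC measurements above and is an assumption, not a makedumpfile requirement):

    # Cap the makedumpfile thread count to bound the extra memory usage.
    threads=$(nproc)
    max_threads=32
    if [ "$threads" -gt "$max_threads" ]; then
        threads=$max_threads
    fi
    core_collector="$core_collector --num-threads=$threads"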

P.S.

The graph is plotted using the points below:

CPUs(--num-threads), Maximum resident set size (kbytes)
0,4096
1,4096
2,6144
3,6144
4,8192
5,8192
6,10240
7,10240
8,10240
9,12288
10,12288
11,14336
12,14336
13,14336
14,16384
15,16384
16,18432

Note: a CPU count of 0 means the --num-threads option was not used.

The command below was used to find the maximum resident set size:

time -v makedumpfile -c -d 31 /proc/vmcore  /tmp/vmcore --num-threads=X
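
For reference, a loop along these lines could collect the same numbers across a range of thread counts (a sketch; it assumes GNU time is installed as /usr/bin/time and that /tmp has room for the filtered dumps):

    # Sketch: record makedumpfile's peak RSS for each thread count.
    for n in $(seq 1 16); do
        /usr/bin/time -v makedumpfile -c -d 31 --num-threads=$n \
            /proc/vmcore /tmp/vmcore.$n 2>&1 | grep 'Maximum resident set size'
    done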
