
[RFC] Use all available CPUs to collect dump #14

Open
wants to merge 1 commit into main
Conversation

sourabhjains
Contributor

By default, use all available CPUs to collect the dump instead of just one CPU. This reduces the dump collection time significantly, especially for systems with very large memory configurations.

The graph below shows the reduction in dump collection time on a large-memory system, from about 4 hours to 10 minutes.

                           |
                   14000   | *
                           |
                           |
                           |
                           |
                           |
                           |
                    1600   |      *
                           |
                           |
                    1400   |
 execution Time (Sec)      |
                           |
                    1200   |
                           |
                           |
                    1000   |
                           |            *
                           |
                     800   |
                           |
                           |                   *
                     600   |
                           |                          *     *
                           |
                            ------------------------------------
                               1    9    17    25     33    41

                                      Number of Threads

System details:
Architecture: PowerPC
/proc/vmcore size: 3.7T
Filter level: 1

Similar tests with other filter levels (31 and 16) also show a significant reduction in dump collection time.

Although the above tests were performed only on the PowerPC architecture, I think using all CPUs for dump collection will have performance benefits on other architectures too.

A new configuration parameter, kdump_cpus, is introduced to allow users to control the number of CPUs used for dump collection. If kdump_cpus is set to a negative value, or if the configured CPU count is larger than the number of CPUs available in the kdump kernel, it falls back to the default value, which is all available CPUs.

Note: The newly introduced configuration is only applicable to the makedumpfile core collector for now.
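
As a rough sketch of the fallback described above (not the actual patch; the variable names and the use of nproc inside the kdump environment are assumptions), the thread-count selection could look like this:

    # Sketch: pick the makedumpfile thread count from the proposed kdump_cpus
    # setting, falling back to all CPUs available in the kdump kernel when the
    # value is unset, negative, or larger than what is actually online.
    available_cpus=$(nproc)             # CPUs online in the kdump kernel
    requested_cpus=${kdump_cpus:--1}    # -1 when kdump_cpus is not configured

    if [ "$requested_cpus" -le 0 ] || [ "$requested_cpus" -gt "$available_cpus" ]; then
        num_threads=$available_cpus
    else
        num_threads=$requested_cpus
    fi

    # core_collector stands in for the makedumpfile command line built elsewhere
    core_collector="$core_collector --num-threads=$num_threads"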

@coiby
Member

coiby commented Jul 2, 2024

Hi @sourabhjains,

As @daveyoung reminded us, we haven't verified whether increasing the number of CPUs could have unexpected consequences, such as breaking kdump. I also think increasing the number of CPUs will demand more memory, i.e. the crashkernel reservation needs to be larger. So it's better not to enable this option by default before it has been widely tested.

@sourabhjains
Contributor Author

sourabhjains commented Jul 2, 2024

Hello @coiby

> As @daveyoung reminded us, we haven't verified whether increasing the number of CPUs could have unexpected consequences, such as breaking kdump. I also think increasing the number of CPUs will demand more memory, i.e. the crashkernel reservation needs to be larger. So it's better not to enable this option by default before it has been widely tested.

This patch is not about increasing the number of CPUs in the kdump environment. It is more about using the already available CPUs to collect the dump.

For example, if the kdump kernel boots with 8 CPUs, then makedumpfile will use --num-threads=8 to collect the dump faster.

This patch doesn't change the default number of CPUs for the kdump kernel. It only changes the number of makedumpfile threads based on the CPUs available in the kdump kernel.

@coiby
Member

coiby commented Jul 3, 2024

Oh, I misunderstood the patch. Thanks for dismissing my concern!

@daveyoung
Contributor

Yes, I also misunderstood the purpose, thanks for the explanation. But multi-threading is also not well tested, since we use one CPU by default. Maybe we can do some testing first. @baoquan-he @pfliu what do you think? In any case, it would be better to compare a vmcore captured with a single thread against one captured with multiple threads and see whether the binary files differ.

@sourabhjains
Contributor Author

Hello @daveyoung

> Yes, I also misunderstood the purpose, thanks for the explanation. But multi-threading is also not well tested, since we use one CPU by default. Maybe we can do some testing first. @baoquan-he @pfliu what do you think? In any case, it would be better to compare a vmcore captured with a single thread against one captured with multiple threads and see whether the binary files differ.

Please let me know if you would like me to conduct any specific tests on PowerPC.

@sourabhjains
Contributor Author

Hello @daveyoung @coiby

I am not sure whether adding the --num-threads=1 command-line argument to makedumpfile is the same as not adding it at all.

Shall we skip adding --num-threads when the number of available CPUs is 1, or when the user has configured only 1 CPU, to avoid changing the default behavior?

Thanks,
Sourabh Jain

@prudo1
Collaborator

prudo1 commented Jul 16, 2024

Hi @sourabhjains,
I think in principle we can go this way. Nevertheless, some comments from my side.

  1. I would prefer to only add --num-threads=$cpus when $cpus > 1, to keep the default behavior the same.
  2. Even when the CPUs are already online, it would be good to know how much additional memory we need when running multi-threaded.
  3. I don't see the point in adding the new kdump_cpus config option to kdump.conf. On most architectures, except Power, we set nr_cpus=1 by default on the kernel command line. So setting kdump_cpus in kdump.conf won't have any impact unless the user also changes nr_cpus in /etc/sysconfig/kdump, which will confuse most users. I think it's better that the user simply increases nr_cpus (see the sketch after this list), which needs to be done anyway. This also has the benefit that we don't need to rebuild the kdump initrd for the change to take effect.
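
For context, the nr_cpus change described in item 3 would be made on the kdump kernel command line, e.g. via /etc/sysconfig/kdump on Fedora/RHEL-style setups. A minimal sketch (the values are illustrative, not the shipped defaults):

    # Sketch: let the capture kernel boot with more CPUs by dropping the
    # default nr_cpus=1 restriction and appending a higher value instead.
    KDUMP_COMMANDLINE_REMOVE="nr_cpus=1"
    KDUMP_COMMANDLINE_APPEND="nr_cpus=8"

As noted earlier in the thread, booting the capture kernel with more CPUs may also require a larger crashkernel reservation.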

@sourabhjains
Contributor Author

> Hi @sourabhjains, I think in principle we can go this way. Nevertheless, some comments from my side.

Thanks @prudo1 for the review.

> 1. I would prefer to only add --num-threads=$cpus when $cpus > 1, to keep the default behavior the same.

Sure, I will update the patch.

> 2. Even when the CPUs are already online, it would be good to know how much additional memory we need when running multi-threaded.

I will share the statistics.

> 3. I don't see the point in adding the new kdump_cpus config option to kdump.conf. On most architectures, except Power, we set nr_cpus=1 by default on the kernel command line. So setting kdump_cpus in kdump.conf won't have any impact unless the user also changes nr_cpus in /etc/sysconfig/kdump, which will confuse most users. I think it's better that the user simply increases nr_cpus, which needs to be done anyway. This also has the benefit that we don't need to rebuild the kdump initrd for the change to take effect.

On PowerPC, we have another dump collection mechanism called fadump, which doesn't use nr_cpus=1 for the capture kernel. Regardless, I agree with you that introducing a new configuration is not necessary. It is better to use all available CPUs directly.

Apologies for the delayed response.

By default, use all available CPUs to collect the dump instead of just
one CPU. This reduces the dump collection time significantly, especially
for systems with very large memory configurations.

The graph below shows the reduction in dump collection time on a
large-memory system, from about 4 hours to 10 minutes.

                               |
                       14000   | *
                               |
                               |
                               |
                               |
                               |
                               |
                        1600   |      *
                               |
                               |
                        1400   |
 execution Time (Sec)          |
                               |
                        1200   |
                               |
                               |
                        1000   |
                               |            *
                               |
                        800    |
                               |
                               |                   *
                        600    |
                               |                          *     *
                               |
                                ------------------------------------
                                   1    9    17    25     33    41

                                          Number of Threads

System details:
Architecture: PowerPC
/proc/vmcore size: 3.7T
Filter level: 1

Similar tests with other filter levels (31 and 16) also show a
significant reduction in dump collection time.

Although the above tests were performed only on the PowerPC architecture,
I think using all CPUs for dump collection will have performance benefits
on other architectures too.

Note: The `--num-threads` command-line argument is added to the `makedumpfile`
core collector only when more than one processing unit is available in the
system. This helps to keep the default case (nr_cpus=1) the same.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
@sourabhjains
Contributor Author

The patch has been updated as suggested in #14 (comment).

Change log:

  • Remove the kdump_cpus kdump.conf option
  • Do not add the --num-threads command-line argument to makedumpfile when only 1 CPU is present in the system, to keep the default case the same (see the sketch after this list)
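
A minimal sketch of the behavior described in the second bullet (illustrative shell, not the literal patch):

    # Only pass --num-threads when more than one CPU is online, so the
    # default nr_cpus=1 case invokes makedumpfile exactly as before.
    cpus=$(nproc)
    if [ "$cpus" -gt 1 ]; then
        core_collector="$core_collector --num-threads=$cpus"
    fi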

@sourabhjains
Contributor Author

sourabhjains commented Aug 14, 2024

Makedumpfile memory consumption with --num-threads

The graph (see the data points in the P.S. below) shows the relationship between makedumpfile memory consumption and the --num-threads option.

My observation is that, on average, memory usage increases by about 1 MB per CPU.

Given that Power systems can have hundreds of CPUs, using --num-threads=nproc may not be a good idea.

My experiment shows that the performance gain maxes out at --num-threads=32 on PowerPC; increasing the thread count further doesn't yield much additional improvement.

Since each additional thread requires about 1 MB of memory, and there isn't much performance improvement beyond --num-threads=32, I think we should cap the number of threads passed to makedumpfile.

Please let me know your opinion on this.
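
If such a cap is adopted, the selection could be as simple as the sketch below (the limit of 32 comes from the PowerPC measurements above and is an assumption, not a makedumpfile requirement):

    # Cap the makedumpfile thread count to bound the extra memory usage.
    threads=$(nproc)
    max_threads=32
    if [ "$threads" -gt "$max_threads" ]; then
        threads=$max_threads
    fi
    core_collector="$core_collector --num-threads=$threads"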

P.S.

The graph is plotted using the points below:

CPUs(--num-threads), Maximum resident set size (kbytes)
0,4096
1,4096
2,6144
3,6144
4,8192
5,8192
6,10240
7,10240
8,10240
9,12288
10,12288
11,14336
12,14336
13,14336
14,16384
15,16384
16,18432

Note: a CPU count of 0 means the --num-threads option was not used.

The command below was used to find the maximum resident set size:

time -v makedumpfile -c -d 31 /proc/vmcore  /tmp/vmcore --num-threads=X
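
For reference, a loop along these lines could collect the same numbers across a range of thread counts (a sketch; it assumes GNU time is installed as /usr/bin/time and that /tmp has room for the filtered dumps):

    # Sketch: record makedumpfile's peak RSS for each thread count.
    for n in $(seq 1 16); do
        /usr/bin/time -v makedumpfile -c -d 31 --num-threads=$n \
            /proc/vmcore /tmp/vmcore.$n 2>&1 | grep 'Maximum resident set size'
    done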
