gcc8+ memory usage regression for compiling indexing_op.o #18501

wkcn · 2020-06-06T05:35:13Z

Description

Hi there, I try to build MXNet2.0 (only cpu) in my laptop with 16GB memory. I found that it takes over 16GB memory to compile a single file src/operator/tensor/index_op.o. I need to create extra 8GB virtual memory for building this file.

Is it possible to divide indexing_op into multiple small files to reduce the memory cost?

Environment

The latest code of MXNet 2.0
Arch Linux

Conclusion

The issue has been solved.

The cost of memory depends on the compiler and the building method (ninja or make)
I build indexing_op.o by ninja with different version of gcc.

Compiler	The cost of memory(Child high-water RSS)
g++ 6.4.1	1.95 GB
g++ 7.4.1	1.78 GB
g++ 10.1.0	11 GB

Besides, since the compiler flags is different in different building ways (for example Makefile enable -funroll-loops, it will takes more memory), the cost of memory is different.

The solution is to build MXNet with g++-6 or g++-7.

The text was updated successfully, but these errors were encountered:

leezu · 2020-06-07T22:03:57Z

Fixing this would be a welcome improvement. Did you investigate if the high memory consumption is consistent among gcc and clang, as well as still present on gcc 9 (or 10) / clang 10?

wkcn · 2020-06-07T23:12:59Z

Hi @leezu , the compiler I used is the latest version of gcc, namely gcc 10.1.0.
indexing_op.o is the only file which takes a long time and over 16GB memory to build.
I have not yet tried clang.

woreom · 2020-06-08T14:52:48Z

I believe maybe I could help since clearly we are having the same problem, I don't have a notion about cross-compiling but I have access to 16GB+ mem computer

wkcn · 2020-06-08T15:43:12Z

I remember that it takes fewer than 8GB memory to build the eldder version of MXNet.
In the latest version, most files still takes fewer than 4GB memory, but only a few of files (e.g. indexing_op) takes more than 16GB memory (building by g++ 10.1.0).

If we can reduce the cost of memory, it is helpful for building MXNet on laptop computer and edge machine, which own less than 8GB/16GB memory.

woreom · 2020-06-08T15:55:22Z

I remember that it takes fewer than 8GB memory to build the eldder version of MXNet.
In the latest version, most files still takes fewer than 4GB memory, but only a few of files (e.g. indexing_op) takes more than 16GB memory (building by g++ 10.1.0).

If we can reduce the cost of memory, it is helpful for building MXNet on laptop computer and edge machine, which own less than 8GB/16GB memory.

Only 4 files take more than 8GB

leezu · 2020-06-08T16:19:03Z

I think we should consider this a release-critical bug. @woreom @wkcn did you try if this affects the 1.7 / 1.x branches as well?

cc: @ciyongch

wkcn · 2020-06-09T00:37:59Z

@leezu sorry that I did not check 1.7 anx 1.x branches.

leezu · 2020-06-10T01:41:07Z

There seem to be some more issues. In certain build configuration with llvm 7, many of the numpy object files blow up

875M    build/CMakeFiles/mxnet.dir/src/operator/tensor/broadcast_reduce_norm_value.cc.o
918M    build/CMakeFiles/mxnet.dir/src/operator/numpy/np_elemwise_broadcast_logic_op.cc.o
1.2G    build/CMakeFiles/mxnet.dir/src/operator/numpy/np_where_op.cc.o
1.9G    build/CMakeFiles/mxnet.dir/src/operator/numpy/np_broadcast_reduce_op_value.cc.o
2.1G    build/CMakeFiles/mxnet.dir/src/operator/numpy/linalg/np_norm_forward.cc.o

ciyongch · 2020-06-11T06:46:01Z

Hi @leezu, @wkcn , as this is only a build issue when building MXNet from source on some certain machines (installed with small memory < 16GB), I suggest not to tag it a block issue for 1.7.0 and consider to include the fix if it's available before the release happened.
User still can install MXNet via binary release/nightly image or increase the virtual memory of their build machine as a workaround.
What do you think?

woreom · 2020-06-11T11:15:04Z

it would be great if you could make a prebuild that works on a raspberry pi with armv7 because I tried to build all versions from 1.2.1 to 1.6.0 and failed.

wkcn · 2020-06-11T11:38:21Z

Hi @ciyongch , I agree that we don't need to tag it a block issue, and the issue can be fixed after MXNet 1.7 releases.

After the problem addressed, we can backport the PR to 1.7.x branch.

wkcn · 2020-06-11T11:48:08Z

Hi @woreom , could you please create a issue about requesting pre-build MXNet on ARM?

MXNet consists of ARM build and test (#18264 , #18058 ). I don't know whether the pre-build package will be released.

woreom · 2020-06-11T11:53:15Z

@wkcn #18471 I did but @leezu closed it. I will open another one

ciyongch · 2020-06-11T14:00:18Z

I agree that we don't need to tag it a block issue, and the issue can be fixed after MXNet 1.7 releases.

After the problem addressed, we can backport the PR to 1.7.x branch.

Thanks for your confirm @wkcn :)

wkcn · 2020-06-11T14:19:30Z

@woreom It seems that the pre-built MXNet 1.5 package will not be uploaded because of ASF licensing policy, but pre-built MXNet 1.7 and 2.0+ on ARM may be uploaded.

Before that, you can try the naive build or cross-compiling, following the instruction: https://mxnet.apache.org/get_started?platform=devices&iot=raspberry-pi&

leezu · 2020-06-11T16:20:16Z

I disagree. Official MXNet releases are source releases. At this point in time, there exist 0 compliant binary releases.
It's very important that we don't introduce regressions that prevent users from building MXNet.

I didn't check if this is present in 1.7, but if it is, it certainly is a release blocker in my opinion. Note that this is probably a regression due the work on mxnet 2. It's not acceptable to introduce such regressions in the 1.x series.

leezu · 2020-06-11T23:30:46Z

I measure the overall memory consumption during compilation using linux control group feature. https://github.com/gsauthof/cgmemtime

Results are

v1.7.x
Child user: 7658.352 s
Child sys : 263.657 s
Child wall: 199.661 s
Child high-water RSS : 1952024 KiB
Recursive and acc. high-water RSS+CACHE : 54680084 KiB

v1.6.x
Child user: 5758.186 s
Child sys : 222.487 s
Child wall: 131.241 s
Child high-water RSS : 2040712 KiB
Recursive and acc. high-water RSS+CACHE : 45344596 KiB

v1.5.x
Child user: 3800.705 s
Child sys : 143.353 s
Child wall: 112.121 s
Child high-water RSS : 1604820 KiB
Recursive and acc. high-water RSS+CACHE : 37374300 KiB

ccache is always cleaned between compilations. Results obtained with:

CC=gcc-7 CXX=g++-7 cmake -GNinja -DUSE_CUDA=0 ..
cgmemtime ninja

This is preliminary in that it measures parallel compilation, thus memory usage is very high. Overall there's a 44% increase from 1.5

leezu · 2020-06-12T01:29:55Z

Doing a single-process build of 1.7.x branch (ninja -j1) just costs around 2 GB memory at maximum.

Child user: 4167.479 s
Child sys : 159.497 s
Child wall: 4327.964 s
Child high-water RSS : 1952008 KiB
Recursive and acc. high-water RSS+CACHE : 2155568 KiB

wkcn · 2020-06-12T01:34:04Z

I'm trying to use ninja to build MXNet 2.0 (the master branch) on my laptop computer (16 GB mem + 8GB virtual mem). I will update the log later.

cmake  -GNinja -DUSE_CUDA=0 ..
cgmemtime ninja

I run ninja two times since the building was interrupted, and the second time continues the building.
(gcc 10.1.0, i7-7500u (2 cores 4 threads), MXNet(master, 1bf881f))

Child user: 3692.505 s
Child sys :  177.550 s
Child wall: 1017.096 s
Child high-water RSS                    :    1852208 KiB
Recursive and acc. high-water RSS+CACHE :    3877980 KiB

Child user: 13315.378 s
Child sys :  353.862 s
Child wall: 3847.364 s
Child high-water RSS                    :   11402844 KiB
Recursive and acc. high-water RSS+CACHE :   12226040 KiB

leezu · 2020-06-12T01:36:24Z

Thanks @wkcn. I'll report the same with gcc7. You are using gcc10 right?

wkcn · 2020-06-12T01:38:47Z

@leezu
Yes, gcc 10.1.0, i7-7500u (2 cores 4 threads), MXNet(master, 1bf881f)

leezu · 2020-06-12T17:02:01Z

Single process build of MXNet master with gcc7 gives the following results:

Child user: 5288.372 s
Child sys :  188.645 s
Child wall: 5481.062 s
Child high-water RSS                    :    2504976 KiB
Recursive and acc. high-water RSS+CACHE :    2674692 KiB

That's a 24% increase to 1.7, but less than 3GB high-water. So I don't think we have any blocking issue here. @wkcn I suggest you reduce the number of parallel builds to stay under 16GB. Also recommend to use ccache to avoid rebuilding.

wkcn · 2020-06-13T10:02:56Z

Hi @leezu , I found the cause.
The cost of memory depends on the compiler and the building method (ninja or make)
I build indexing_op.o by ninja with different version of gcc.

Compiler	The cost of memory(Child high-water RSS)
g++ 6.4.1	1.95 GB
g++ 7.4.1	1.78 GB
g++ 10.1.0	11 GB

Besides, since the compiler flags is different in different building ways (for example Makefile enable -funroll-loops, it will takes more memory), the cost of memory is different.

wkcn · 2020-06-13T10:16:06Z

Hi @leezu @woreom @ciyongch , I have found the cause.

The cause is related to the compiler. g++ 10 takes over 11 GB memory to build indexing_op.o, but g++ 6 and 7 take less than 2 GB.

The solution is to build MXNet with g++-6 or g++-7.

Thanks for your help!

leezu · 2020-06-15T18:15:13Z

@wkcn thank you for investigating this. The regression in gcc is quite serious. Would you check if there is a report at https://gcc.gnu.org/bugs/ and potentially open a new bug report? Eventually gcc10 will be shipped by default on many platforms and this issue may affect more users later.

wkcn · 2020-06-15T21:10:59Z

@leezu Sorry that I do not know how to find the bug report in https://gcc.gnu.org/bugs/

leezu · 2020-06-15T21:43:37Z

@wkcn the bugtracker is linked on the page. It's https://gcc.gnu.org/bugzilla/

wkcn · 2020-06-16T01:42:30Z

@leezu Thank you! I guess that the bug is memory leak of the compiler gcc 10.1.0.

leezu · 2020-06-24T23:55:28Z

According to #15393 (comment) the leak already occurs with gcc8

wkcn added Question Build labels Jun 6, 2020

leezu mentioned this issue Jun 8, 2020

Can't install Mxnet on raspberry pi with opencv 4.0.0 #18471

Closed

leezu added Bug v2.0 labels Jun 8, 2020

leezu changed the title ~~It costs over 16GB memory to compile indexing_op.o~~ It costs ~3GB memory to compile indexing_op.o Jun 12, 2020

This comment has been minimized.

Sign in to view

wkcn closed this as completed Jun 15, 2020

woreom mentioned this issue Jun 15, 2020

MXNET build not working on armv7 (raspberry pi 4) #18536

Closed

leezu changed the title ~~It costs ~3GB memory to compile indexing_op.o~~ gcc10 memory usage regression for compiling indexing_op.o Jun 15, 2020

leezu added Upstream and removed Question v2.0 labels Jun 15, 2020

leezu reopened this Jun 15, 2020

leezu changed the title ~~gcc10 memory usage regression for compiling indexing_op.o~~ gcc8+ memory usage regression for compiling indexing_op.o Jun 24, 2020

leezu mentioned this issue Jun 24, 2020

Unable to build mxnet with OpenCV4 on Raspberry Pi 3B #15393

Closed

This comment has been minimized.

Sign in to view

ciyongch mentioned this issue Jul 3, 2020

[v1.7.x] backport mixed type binary ops to v1.7.x #18649

Merged

This was referenced Dec 3, 2020

out of memory during compilation on CI #19623

Open

[FEATURE] Restore Quantization API to MXNet #19587

Merged

wkcn mentioned this issue Dec 17, 2020

[RFC] Large MXNet source files causing CI build failures #19688

Open

mseth10 mentioned this issue Dec 21, 2020

use CC=gcc-7 CXX=g++-7 for all unix CI builds #19701

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gcc8+ memory usage regression for compiling indexing_op.o #18501

gcc8+ memory usage regression for compiling indexing_op.o #18501

wkcn commented Jun 6, 2020 •

edited

Loading

leezu commented Jun 7, 2020

wkcn commented Jun 7, 2020

woreom commented Jun 8, 2020

wkcn commented Jun 8, 2020 •

edited

Loading

woreom commented Jun 8, 2020

leezu commented Jun 8, 2020

wkcn commented Jun 9, 2020

leezu commented Jun 10, 2020

ciyongch commented Jun 11, 2020

woreom commented Jun 11, 2020

wkcn commented Jun 11, 2020

wkcn commented Jun 11, 2020

woreom commented Jun 11, 2020 •

edited

Loading

ciyongch commented Jun 11, 2020

wkcn commented Jun 11, 2020

leezu commented Jun 11, 2020 •

edited

Loading

leezu commented Jun 11, 2020 •

edited

Loading

leezu commented Jun 12, 2020 •

edited

Loading

wkcn commented Jun 12, 2020 •

edited

Loading

leezu commented Jun 12, 2020

wkcn commented Jun 12, 2020 •

edited

Loading

leezu commented Jun 12, 2020

This comment has been minimized.

wkcn commented Jun 13, 2020 •

edited

Loading

wkcn commented Jun 13, 2020

leezu commented Jun 15, 2020

wkcn commented Jun 15, 2020

leezu commented Jun 15, 2020

wkcn commented Jun 16, 2020

leezu commented Jun 24, 2020

This comment has been minimized.

gcc8+ memory usage regression for compiling indexing_op.o #18501

gcc8+ memory usage regression for compiling indexing_op.o #18501

Comments

wkcn commented Jun 6, 2020 • edited Loading

Description

Environment

Conclusion

leezu commented Jun 7, 2020

wkcn commented Jun 7, 2020

woreom commented Jun 8, 2020

wkcn commented Jun 8, 2020 • edited Loading

woreom commented Jun 8, 2020

leezu commented Jun 8, 2020

wkcn commented Jun 9, 2020

leezu commented Jun 10, 2020

ciyongch commented Jun 11, 2020

woreom commented Jun 11, 2020

wkcn commented Jun 11, 2020

wkcn commented Jun 11, 2020

woreom commented Jun 11, 2020 • edited Loading

ciyongch commented Jun 11, 2020

wkcn commented Jun 11, 2020

leezu commented Jun 11, 2020 • edited Loading

leezu commented Jun 11, 2020 • edited Loading

leezu commented Jun 12, 2020 • edited Loading

wkcn commented Jun 12, 2020 • edited Loading

leezu commented Jun 12, 2020

wkcn commented Jun 12, 2020 • edited Loading

leezu commented Jun 12, 2020

This comment has been minimized.

wkcn commented Jun 13, 2020 • edited Loading

wkcn commented Jun 13, 2020

leezu commented Jun 15, 2020

wkcn commented Jun 15, 2020

leezu commented Jun 15, 2020

wkcn commented Jun 16, 2020

leezu commented Jun 24, 2020

This comment has been minimized.

wkcn commented Jun 6, 2020 •

edited

Loading

wkcn commented Jun 8, 2020 •

edited

Loading

woreom commented Jun 11, 2020 •

edited

Loading

leezu commented Jun 11, 2020 •

edited

Loading

leezu commented Jun 11, 2020 •

edited

Loading

leezu commented Jun 12, 2020 •

edited

Loading

wkcn commented Jun 12, 2020 •

edited

Loading

wkcn commented Jun 12, 2020 •

edited

Loading

wkcn commented Jun 13, 2020 •

edited

Loading