This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-1404] Implement storage tagging, the first half of the memory profiler #17656

Merged
merged 1 commit into apache:master from bojian/Storage_Tagging_ii on Mar 10, 2020

Conversation

@ArmageddonKnight (Contributor) commented Feb 21, 2020

Description

Implement storage tagging, the first half of the GPU memory profiler.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with it

Changes

  • This PR is the first half of the GPU memory profiler. It implements storage tagging which adds profiler scope, name, and data structure information to each allocated storage handle.
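The naming convention the tags follow can be seen in the CSV output: each entry is the enclosing profiler scope joined to the storage entity's name with colons. A minimal illustrative sketch of that convention (the function name is hypothetical, not the PR's actual C++ API):

```python
# Hypothetical sketch of the "scope:name" tagging convention used in the
# gpu_memory_profile.csv output; the real implementation lives in the
# C++ storage allocator, not in Python.
def storage_tag(profiler_scope, entity_name):
    # e.g. scope "net:dense0" plus name "net_dense0_fwd"
    # yields the tag "net:dense0:net_dense0_fwd"
    return f"{profiler_scope}:{entity_name}"
```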

Example

    import mxnet as mx
    from mxnet import profiler
    from mxnet.gluon import nn

    # Enable the profiler before building and running the model.
    profiler.set_state('run')

    model = nn.HybridSequential(prefix='net_')
    with model.name_scope():
        model.add(nn.Dense(128, activation='tanh'))
        model.add(nn.Dropout(0.5))
        model.add(nn.Dense(64, activation='tanh'),
                  nn.Dense(32, in_units=64))
        model.add(nn.Activation('relu'))
    model.initialize(ctx=mx.gpu())
    model.hybridize()

    inputs = mx.sym.var('data')

    # Run one forward/backward pass so allocations are recorded.
    with mx.autograd.record():
        out = model(mx.nd.zeros((16, 10), ctx=mx.gpu()))
    out.backward()
    mx.nd.waitall()

    # Stop profiling and dump the results to disk.
    profiler.set_state('stop')
    profiler.dump(True)

The code snippet above will generate the following gpu_memory_profile.csv:

| Attribute Name | Requested Size | Device | Actual Size | Reuse? |
| --- | --- | --- | --- | --- |
| `<unk>:in_arg:data` | 640 | 0 | 4096 | 0 |
| `net:arg_grad:net_dense0_bias` | 512 | 0 | 4096 | 0 |
| `net:arg_grad:net_dense0_weight` | 5120 | 0 | 8192 | 0 |
| `net:arg_grad:net_dense1_bias` | 256 | 0 | 4096 | 0 |
| `net:arg_grad:net_dense1_weight` | 32768 | 0 | 32768 | 0 |
| `net:arg_grad:net_dense2_bias` | 128 | 0 | 4096 | 0 |
| `net:arg_grad:net_dense2_weight` | 8192 | 0 | 8192 | 0 |
| `net:dense0:net_dense0_fwd` | 8192 | 0 | 8192 | 0 |
| `net:dense0:tanh:net_dense0_tanh_fwd` | 8192 | 0 | 8192 | 0 |
| `net:dense1:net_dense1_fwd` | 4096 | 0 | 4096 | 0 |
| `net:dense1:tanh:net_dense1_tanh_fwd` | 4096 | 0 | 4096 | 0 |
| `net:dense2:net_dense2_fwd` | 2048 | 0 | 4096 | 0 |
| `net:dense2:net_dense2_fwd_backward` | 4096 | 0 | 4096 | 0 |
| `net:dropout0:net_dropout0_fwd` | 8192 | 0 | 8192 | 0 |
| `net:dropout0:net_dropout0_fwd` | 8192 | 0 | 8192 | 0 |
| `net:in_arg:net_dense0_bias` | 512 | 0 | 4096 | 0 |
| `net:in_arg:net_dense0_weight` | 5120 | 0 | 8192 | 0 |
| `net:in_arg:net_dense1_bias` | 256 | 0 | 4096 | 0 |
| `net:in_arg:net_dense1_weight` | 32768 | 0 | 32768 | 0 |
| `net:in_arg:net_dense2_bias` | 128 | 0 | 4096 | 0 |
| `net:in_arg:net_dense2_weight` | 8192 | 0 | 8192 | 0 |
| `net:relu0:net_relu0_fwd` | 2048 | 0 | 4096 | 0 |
| `net:relu0:net_relu0_fwd_backward` | 8192 | 0 | 8192 | 0 |
| `net:relu0:net_relu0_fwd_head_grad` | 2048 | 0 | 4096 | 0 |
| `resource:cudnn_dropout_state (dropout-inl.h +258)` | 1671168 | 0 | 1671168 | 0 |
| `resource:temp_space (fully_connected-inl.h +316)` | 34816 | 0 | 36864 | 0 |
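Because the output is plain CSV, it is straightforward to post-process. A minimal sketch (not part of the PR) that groups entries by their top-level profiler scope and sums the actual allocation sizes, using a few rows from the table above as inline sample data:

```python
import csv
import io
from collections import defaultdict

# A few sample rows in the gpu_memory_profile.csv format shown above.
sample = """\
Attribute Name,Requested Size,Device,Actual Size,Reuse?
net:arg_grad:net_dense0_bias,512,0,4096,0
net:dense0:net_dense0_fwd,8192,0,8192,0
resource:temp_space (fully_connected-inl.h +316),34816,0,36864,0
"""

# Sum the actual allocated bytes per top-level scope ("net", "resource", ...).
totals = defaultdict(int)
for row in csv.DictReader(io.StringIO(sample)):
    scope = row["Attribute Name"].split(":", 1)[0]
    totals[scope] += int(row["Actual Size"])

print(dict(totals))  # {'net': 12288, 'resource': 36864}
```

For a real dump, replace the inline string with `open('gpu_memory_profile.csv')`.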

@szha szha merged commit 4dddb08 into apache:master Mar 10, 2020
MoisesHer pushed a commit to MoisesHer/incubator-mxnet that referenced this pull request Apr 10, 2020
@ArmageddonKnight ArmageddonKnight deleted the bojian/Storage_Tagging_ii branch May 6, 2020 08:05