This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Add quantized model export description #3192

Merged
merged 5 commits on Dec 25, 2020

45 changes: 42 additions & 3 deletions docs/en_US/Compression/QuickStart.rst
@@ -194,10 +194,10 @@ Some compression algorithms use epochs to control the progress of compression (e

``update_epoch`` should be invoked in every epoch, while ``step`` should be invoked after each minibatch. Note that most algorithms do not require calling the two APIs. Please refer to each algorithm's document for details. For the algorithms that do not need them, calling them is allowed but has no effect.
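
For example, a training loop that drives these two APIs could look like the following sketch. Here ``pruner`` stands for the compressor instance created earlier, and ``model``, ``criterion``, ``optimizer`` and ``train_loader`` are assumed to be defined elsewhere; the exact arguments may differ between algorithms, so treat this only as an illustration.

.. code-block:: python

    for epoch in range(10):
        # tell the compressor that a new epoch starts
        pruner.update_epoch(epoch)
        for data, target in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(data), target)
            loss.backward()
            optimizer.step()
            # let the compressor advance its per-minibatch state
            pruner.step()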

Export Compressed Model
^^^^^^^^^^^^^^^^^^^^^^^
Export Pruned Model
^^^^^^^^^^^^^^^^^^^^

You can easily export the compressed model using the following API if you are pruning your model. The ``state_dict`` of the sparse model weights will be stored in ``model.pth``\ , which can be loaded by ``torch.load('model.pth')``. In this exported ``model.pth``\ , the masked weights are zero.
If you are pruning your model, you can easily export the pruned model using the following API. The ``state_dict`` of the sparse model weights will be stored in ``model.pth``\ , which can be loaded by ``torch.load('model.pth')``. In this exported ``model.pth``\ , the masked weights are zero.

.. code-block:: bash

@@ -209,4 +209,43 @@ You can easily export the compressed model using the following API if you are pr

pruner.export_model(model_path='model.pth', mask_path='mask.pth', onnx_path='model.onnx', input_shape=[1, 1, 28, 28])
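
For example, the exported ``model.pth`` can then be loaded back and inspected to verify the sparsity. The snippet below is only a sketch: the layer name ``conv1`` is illustrative and depends on your model definition.

.. code-block:: python

    import torch

    # Load the exported sparse weights; masked entries are stored as zeros
    state_dict = torch.load('model.pth')
    weight = state_dict['conv1.weight']  # 'conv1' is an illustrative layer name
    sparsity = (weight == 0).float().mean().item()
    print('sparsity of conv1.weight: {:.2%}'.format(sparsity))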

Export Quantized Model
^^^^^^^^^^^^^^^^^^^^^^

You can export the quantized model directly with the ``torch.save`` API, and the quantized model can be loaded by ``torch.load`` without any extra modification. The following example shows the typical procedure of saving and loading a quantized model, and of getting the related parameters, in QAT.

.. code-block:: python

    # Initialize the model and quantize it with NNI QAT
    model = Mnist()
    configure_list = [...]
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    quantizer = QAT_Quantizer(model, configure_list, optimizer)
    quantizer.compress()

    model.to(device)

    # Quantization-aware training
    for epoch in range(40):
        print('# Epoch {} #'.format(epoch))
        train(model, quantizer, device, train_loader, optimizer)

    # Save the quantized model produced by the NNI QAT algorithm
    torch.save(model.state_dict(), "quantized_model.pkt")

    # Simulate the model loading procedure:
    # a new model has to be initialized and compressed before loading the state dict
    qmodel_load = Mnist()
    optimizer = torch.optim.SGD(qmodel_load.parameters(), lr=0.01, momentum=0.5)
    quantizer = QAT_Quantizer(qmodel_load, configure_list, optimizer)
    quantizer.compress()

    # Load the quantized model
    qmodel_load.load_state_dict(torch.load("quantized_model.pkt"))

    # Get the scale, zero_point and weight of conv1 in the loaded model
    conv1 = qmodel_load.conv1
    scale = conv1.module.scale
    zero_point = conv1.module.zero_point
    weight = conv1.module.weight
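
With the retrieved parameters you can, for example, reproduce the affine quantization of the weight by hand. This is only a sketch of the standard formula, assuming unsigned 8-bit quantization; the quantizer's internal rounding and clamping details may differ.

.. code-block:: python

    # Manually apply affine quantization with the retrieved scale and zero_point
    # (illustration only; assumes an 8-bit, i.e. [0, 255], quantization range)
    q_weight = torch.clamp(torch.round(weight / scale + zero_point), 0, 255)
    # De-quantize back to floating point to inspect the quantization error
    dq_weight = (q_weight - zero_point) * scale
    print('max quantization error:', (weight - dq_weight).abs().max().item())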

To really speed up the compressed model, please refer to `NNI model speedup <./ModelSpeedup.rst>`__ for details.