
Important: For all of the following examples, start a Python console in an environment with all required packages installed (see the Installation Guide). The standard only specifies compression of the model weight parameters.

Overview

This software provides a Python package called nnc, which works as a stand-alone model compression solution but can also be seamlessly integrated into existing Python-based machine learning frameworks. Easy-to-use compression and decompression interfaces allow users to achieve high compression without prior knowledge of the underlying compression technologies. Interested users can achieve even higher compression using the advanced features.

Quickstart

A first example

An example model is provided at 'example/squeezenet1_1_pytorch_zoo.pt': SqueezeNet1_1, originally downloaded from the PyTorch (torchvision) model zoo (https://github.com/pytorch/vision). To compress and decompress the model with default settings, do:

import nnc

nnc.compress_model('./example/squeezenet1_1_pytorch_zoo.pt', bitstream_path='./example/bitstream_squeezenet1_1.nnc')
nnc.decompress_model('./example/bitstream_squeezenet1_1.nnc', model_path='./example/reconstructed_squeezenet1_1.pt' )

This will create two files:

  • The compressed bitstream file at './example/bitstream_squeezenet1_1.nnc'
  • The reconstructed model file at './example/reconstructed_squeezenet1_1.pt'

PyTorch and TensorFlow Models

The software has built-in support for PyTorch and TensorFlow models, which means that it can process arbitrary PyTorch/TensorFlow models (ending with .pt or .pth for PyTorch, or .h5, .hdf5, .tf for TensorFlow) in the same fashion as in the example above. Provided that a model is stored at '/path/arbitrary_model.[pt, pth, h5, hdf5, tf]', it can be compressed and decompressed (with default settings) as follows:

import nnc

nnc.compress_model('/path/arbitrary_model.[pt, pth, h5, hdf5, tf]', bitstream_path='/path/bitstream.nnc')
nnc.decompress_model('/path/bitstream.nnc', model_path='/path/reconstructed_arbitrary_model.[pt, pth, h5, hdf5, tf]' )

Key-Parameter: Quantization parameter qp

Due to the quantization, the reconstructed model slightly differs from the original one, but the default settings of the NNC software usually do not degrade the model performance. However, there is a key parameter in the encoder function that controls the rate-performance trade-off, called the quantization parameter qp (default: qp=-38). Decreasing the qp value results in a higher bitrate but a lower model performance degradation. Accordingly, increasing the qp value results in a lower bitrate but a higher model performance degradation. For details refer to the Functions and Parameters section.

For a higher bitrate but lower model degradation, e.g. do:

nnc.compress_model('/path/example_model.[pt, pth, h5, hdf5, tf]', bitstream_path='/path/bitstream.nnc', qp=-42)

For a lower bitrate but a higher model degradation, e.g. do:

nnc.compress_model('/path/example_model.[pt, pth, h5, hdf5, tf]', bitstream_path='/path/bitstream.nnc', qp=-34)

Functions and Parameters

There are two main functions for the encoder, called compress_model and compress (see Encoder). Please note that compress_model internally calls compress. Accordingly, there are two main functions for the decoder, called decompress_model and decompress (see Decoder), where decompress_model internally calls decompress.

Encoder

def compress_model( model_path_or_object, 
                    bitstream_path="./bitstream.nnc", 
                    qp=-38, 
                    qp_density=2, 
                    nonweight_qp=-75,
                    qp_per_tensor=None,
                    use_dq=True, 
                    codebook_mode=0, 
                    scan_order=0, 
                    lambda_scale=0, 
                    param_opt=True,
                    cabac_unary_length_minus1=10, 
                    opt_qp=False, 
                    ioq=False,
                    bnf=False,
                    lsa=False,
                    fine_tune=False,
                    block_id_and_param_type=None, 
                    model_name=None, 
                    model_executer=None,
                    model_struct=None, 
                    dataset_path=None, 
                    learning_rate=1e-5, 
                    batch_size=64,
                    epochs=30,
                    max_batches=600, 
                    num_workers=8,
                    return_model_data=False,
                    verbose=True,
                    return_bitstream=False,
                   ):

Required input to this function is either a path specifying the location of a stored model or a PyTorch or TensorFlow model of type torch.nn.Module or tensorflow.Module, respectively.

def compress( parameter_dict, 
              bitstream_path="./bitstream.nnc", 
              qp=-38, 
              qp_density=2, 
              nonweight_qp=-75,
              qp_per_tensor=None,
              use_dq=True, 
              codebook_mode=0, 
              scan_order=0, 
              lambda_scale=0, 
              param_opt=True, 
              cabac_unary_length_minus1=10, 
              opt_qp=False, 
              ioq=False, 
              bnf=False,
              lsa=False,
              fine_tune=False,
              block_id_and_param_type=None, 
              model=None, 
              model_executer=None,
              verbose=True,
              return_bitstream=False,
            ):

Required input to this function is only a dict, with the keys specifying the tensor names as strings and the values representing the tensors as numpy arrays of type numpy.float32 or numpy.int32, regardless of their shape. This dict represents the state dictionary of the NN model.

Information: The function will compress any dict that fulfills these requirements (keys of type string and values as numpy arrays of type numpy.int32/float32), regardless of whether it contains values related to neural network tensors or not.
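
For illustration, here is a minimal sketch of compressing a plain parameter dict (the tensor names and shapes are made up for this example):

import numpy as np
import nnc

# A plain state dict: string keys, values as numpy arrays of type float32 or int32.
parameter_dict = {
    'layer1.weight': np.random.randn(64, 3, 3, 3).astype(np.float32),
    'layer1.bias': np.random.randn(64).astype(np.float32),
}

nnc.compress(parameter_dict, bitstream_path='./dict_bitstream.nnc')
rec_parameters = nnc.decompress('./dict_bitstream.nnc')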

Parameters

Details on the parameters are given in the following table. For a better understanding also check out the Examples-Section.

Parameter Function Description
model_path_or_object compress_model Required, Type: [String, torch.nn.Module, tensorflow.Module], Default: -. Can be either a string specifying the path to the source model file or a model object of type torch.nn.Module or tensorflow.Module to be compressed. If it is a path, it can be any PyTorch model (ending with .pt or .pth), any TensorFlow model (ending with .h5, .hdf5, .tf) or any file that can be loaded with Python's pickle module and that contains a parameter dict which fulfills the requirements of parameter_dict.
parameter_dict compress Required, Type: Dict, Default: -. Specifies a python dict which represents the state dictionary to be compressed. The keys are strings which denote the names of the parameter tensors and the values represent the tensors as numpy arrays (ndarrays). The numpy arrays must be of type numpy.float32 or numpy.int32.
bitstream_path compress_model
compress
Optional, Type: String, Default: "./bitstream.nnc". Specifies the path where the bitstream file is stored after the compression process. In principle, an arbitrary file ending can be used, since it is not strictly specified, but it is recommended to use ".nnc".
qp compress_model
compress
Optional, Type: int32, Default: -38. Quantization parameter (qp) that controls the quantization stepsize and thus the rate-performance trade-off for all weight parameters. A lower qp is related to a lower quantization stepsize, which yields a higher bitrate but also a lower model performance degradation. Accordingly, increasing the qp value results in a lower bitrate but also in a higher model performance degradation. The quantization stepsize delta is derived from qp and qp_density as follows:

mul = (1 << qp_density) + (qp & ((1 << qp_density) - 1))
shift = qp >> qp_density
delta = mul * 2.0 ** (shift - qp_density)

A qp value of 0 corresponds to a quantization stepsize equal to 1. Assuming qp_density is equal to 2 (3, 4 ...), decreasing the qp value by 4 (8, 16, ...) means halving the quantization stepsize. Accordingly, increasing the qp value by 4 (8, 16, ...) means doubling the quantization stepsize. (See also: qp_density)

Important: This qp value only applies to weight parameters (e.g. convolutional layers or fully connected layers). Usually, all tensors which have more than one dimension are interpreted as weight parameters.
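
As a plausibility check, the derivation above can be reproduced in a few lines of plain Python (this sketch only mirrors the formula and does not call any nnc functionality):

def qp_to_stepsize(qp, qp_density=2):
    # Mirrors the derivation above; >> is an arithmetic shift, also for negative qp.
    mul = (1 << qp_density) + (qp & ((1 << qp_density) - 1))
    shift = qp >> qp_density
    return mul * 2.0 ** (shift - qp_density)

assert qp_to_stepsize(0) == 1.0                        # qp of 0 yields stepsize 1
assert qp_to_stepsize(-34) == 2 * qp_to_stepsize(-38)  # decreasing qp by 4 halves the stepsize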
qp_density compress_model
compress
Optional, Type: int32, Default: 2. Controls the mapping between quantization parameters qp and the quantization stepsizes. The higher the value of qp_density, the closer are adjacent quantization stepsizes achievable by the qp. E.g. for a qp_density equal to 2, the qp values -4, -6 and -8 correspond to quantization stepsizes equal to 0.5, 0.375 and 0.25. For a qp_density equal to 3, qp values of -4, -6 and -8 correspond to quantization stepsizes equal to 0.75, 0.625 and 0.5. (See also: qp)
nonweight_qp compress_model
compress
Optional, Type: int32, Default: -75. Non-weight quantization parameter (nonweight_qp) that controls the quantization stepsize and thus the rate-performance trade-off for all non-weight parameters. Works exactly like the (regular) qp parameter. Generally, tensors related to non-weight parameters are more sensitive to quantization, so a much finer quantization needs to be applied.

Important: This nonweight_qp value only applies to non-weight parameters (e.g. batch-norm layers or biases). Usually, all tensors which have only one dimension are interpreted as non-weight parameters.
qp_per_tensor compress_model
compress
Optional, Type: dict, Default: None. A dict that can be used to specify a qp (or nonweight_qp) per tensor. The keys are strings that must match the tensor names in the parameter_dict exactly. The values are integers specifying the qp value to be applied, and they work exactly like qp and nonweight_qp, respectively.

Important: For each tensor that is not specified in the dict, the value of qp or nonweight_qp (depending on the tensor type) is applied, respectively.
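
A minimal sketch (the tensor names are hypothetical and must match the tensor names of the model at hand):

# Finer qp for the first layer, coarser qp for the classifier (names are made up).
qp_per_tensor = {'features.0.weight': -40, 'classifier.1.weight': -36}
nnc.compress_model(model, bitstream_path='./bitstream.nnc', qp=-38, qp_per_tensor=qp_per_tensor)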
use_dq compress_model
compress
Optional, Type: boolean, Default: True. Enables dependent scalar quantization (DQ), also known as Trellis-coded quantization (TCQ). DQ is a vector quantization method which usually achieves a lower model performance degradation at lower or equal bitrates. It employs a procedure to switch between two scalar quantizers, each having distinct reconstruction values, depending on the quantization stepsize controlled by the qp. If use_dq==False, a single uniform scalar quantizer is applied.

Important: In order to achieve a quantization performance (in terms of a similar quantization error) more or less comparable to uniform quantization with qp_uni and qp_density_0, the qp for dependent quantization qp_dq shall be set to qp_dq = qp_uni - (1 << qp_density_0).
E.g. for the default settings with qp_density=2, that means decreasing the qp for DQ by 4; in other words, uniform quantization with qp_uni=-34 (use_dq==False) is equivalent to dependent scalar quantization with qp_dq=-38 (use_dq==True). Please note that this is only a general guideline. In order to achieve the same quantization error, further adjustments of qp and qp_density may be required!
codebook_mode compress_model
compress
Optional, Type: int32, Default: 0. The codebook mode denotes whether an integer codebook is derived for transmission of the quantized values of a tensor or not. Using a codebook does not change the quantization result but the way the quantized values are transmitted. Whenever a codebook is used, the quantized values are substituted by indices, each denoting an entry in the codebook, which holds all unique values in the quantized tensor. There are three modes specified, denoted by the respective value of codebook_mode:

0: No codebook is used. The values are encoded as output by the uniform/DQ quantization stage.
1: Force codebook. All tensors to be transmitted use a codebook for the encoding process. DQ is disabled in this case.
2: Choose best. The encoder selects the mode which produces the lowest bitrate for each tensor. Note: This method produces the lowest bitrate, but may be time consuming, since it tests both variants for all tensors.

Information: Tensors containing many unique values may result in big codebooks and slow encoding.
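
For example, letting the encoder choose the best mode per tensor could look as follows (sketch, with all other settings left at their defaults):

# codebook_mode=2: test codebook and direct coding per tensor, keep the cheaper one.
nnc.compress_model(model, bitstream_path='./bitstream.nnc', codebook_mode=2)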
scan_order compress_model
compress
Optional, Type: int32, Default: 0. Specifies the scan order of the tensors for the quantization and encoding process. Internally, all tensors to be encoded are interpreted either as 1D vectors or 2D matrices, which means that e.g. a 4D tensor is transformed to a 2D matrix (dim0 x (dim1 * dim2 * dim3)). Five different scan orders are specified for 2D matrices:

0: Row-first scan (scanning matrix row-by-row)
1: 8x8 block scan (scanning the 8x8 blocks block row by block row)
2: 16x16 block scan (scanning the 16x16 blocks block row by block row)
3: 32x32 block scan (scanning the 32x32 blocks block row by block row)
4: 64x64 block scan (scanning the 64x64 blocks block row by block row)

Note: For all block scan orders (scan_order > 0), a suitable decoder can decode each block row independently, which also enables parallel decoding of block rows. Currently, the provided decoder does not offer this feature; however, it may be added at a later stage.
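
For example, enabling the 8x8 block scan (sketch, with all other settings left at their defaults):

# scan_order=1: 8x8 block scan; 0 (the default) is the row-first scan.
nnc.compress_model(model, bitstream_path='./bitstream.nnc', scan_order=1)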
lambda_scale compress_model
compress
Optional, Type: float32, Default: 0.0. A scaling factor which is applied to the Lagrangian multiplier lambda in the rate-distortion (RD) cost function D + lambda * R, which is used for the quantization and encoding decisions. More specifically, lambda_scale denotes whether and to which degree the bitrate is considered when computing the costs during the quantization of weight parameters. Hence, setting lambda_scale to zero means that the bitrate R, measured in bits, is not taken into account and only the distortion D, measured as mean-squared error (MSE), is considered in the cost function.

Note: It is recommended to set lambda_scale to 0.0. The results for values larger than 0.0 are not stable and might cause significant drops in the model performance!
param_opt compress_model
compress
Optional, Type: boolean, Default: True. Enables parameter optimization for DeepCABAC entropy coding. If enabled, the encoder optimizes parameters for the DeepCABAC probability estimation scheme, which control the adaptation rate of the probability estimators to the source statistics. These parameters are then written to the bitstream to make them available at the decoder. This procedure usually yields lower bitrates with a small overhead in encoding time.
cabac_unary_length_minus1 compress_model
compress
Optional, Type: int32, Default: 10. A parameter that controls the length of the unary part in the binarization scheme of quantized neural network parameter values for the (DeepCABAC) entropy encoding process. Changing the value only affects the bitrate; however, the effect is expected to be minor.
opt_qp compress_model
compress
Optional, Type: boolean, Default: False. Enables a QP optimization scheme that is based on the tensor statistics.

Note: May require adjustments to the parameter qp, because it could otherwise cause significant model performance degradation. Furthermore, the results of this scheme may not be stable for all models.
ioq compress_model
compress
Optional, Type: boolean, Default: False. Enables inference-optimized quantization (IOQ), an optimization scheme that tests different qp values for each tensor, also considering the model performance change. This method runs a whole quantization, encoding and evaluation step for each tensor and qp that is tested, using a validation dataset. In most cases this procedure yields a significant improvement of the rate-performance trade-off.

Important: Depending on the use case, this may significantly increase the encoding runtime (in some cases by several orders of magnitude). Furthermore, it requires a model_executer (of type ModelExecuter) which can run inference on a dataset. More specifically, it requires the function eval_model to be implemented. For details refer to the sections Class Definitions and Advanced Features.
bnf compress_model
compress
Optional, Type: boolean, Default: False. Enables Batch-norm Folding (BNF), which reduces the number of batch-norm parameter vectors from 4 to 2, if batch-norm parameters are present. For this, the software needs to identify the batch-norm parameters and which layer they belong to. This can be specified using block_id_and_param_type (see parameter description below). For further details refer to the Advanced Features-Section.

Important: Batch-norm folding requires the tensors to be shaped such that the first dimension corresponds to the number of output channels, which is usually the case for PyTorch models but not for TensorFlow models. For changing the order of the dimensions, e.g. use tensorflow.transpose.
lsa compress_model
compress
Optional, Type: boolean, Default: False. Enables local scaling adaptation (LSA), which adds a scaling vector to each weight tensor. The length of the vector is equal to the number of output channels of the weight tensor. When enabled, the encoder tunes the values of the scaling vector such that it partly compensates the quantization error introduced by quantizing the weight tensor. Requires a model_executer (of type ModelExecuter) which implements the function tune_model with the functionality to tune LSA parameters.
fine_tune compress_model
compress
Optional, Type: boolean, Default: False. Enables fine tuning (FT) of all non-weight tensors. When enabled, the encoder fine-tunes the values of the non-weight tensors such that they partly compensate the quantization error introduced by quantizing the weight tensors. Requires a model_executer (of type ModelExecuter) which implements the function tune_model with the functionality to fine-tune non-weight parameters.
block_id_and_param_type compress_model
compress
Optional, Type: dict, Default: None. A dict specifying the block id and parameter type for each tensor. The dict shall contain two keys of type string, 'block_identifier' and 'parameter_type'. The values shall also be dicts, with the keys of type string specifying the tensor names (exactly as in the parameter_dict) and the values of type string specifying the related 'block_identifier' and 'parameter_type', respectively. The parameter type strings can be any of:

'weight'
'weight.ls'
'bias'
'bn.beta'
'bn.gamma'
'bn.mean'
'bn.var'
'unspecified' (special type, see notes below)

Important: A single block can contain parameters of each type only once (except for 'unspecified')!

All tensors that belong to the same block shall have the same 'block_identifier'. These identifiers can be arbitrary strings which shall be unique for different blocks. Tensors with the same 'block_identifier' are encoded as a block structure in a single unit. Whenever a parameter is denoted as 'unspecified', it is ignored for the block structure and transmitted separately in a single unit. This specifier can be used whenever the parameter type of a tensor is unknown or does not fit any of the other parameter types.

For further details on block_id_and_param_type and the meaning of the parameter types refer to the Advanced Features-Section.

Note: If compress_model is called with batch-norm folding enabled (bnf=True) and block_id_and_param_type=None, the function tries to guess the block identifiers and parameter types, which works at least for some models from the PyTorch and TensorFlow model zoos and probably for most PyTorch/TensorFlow models which fulfill certain conditions (see Advanced Features-Section). Currently, this feature is only available for PyTorch and TensorFlow!
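
For illustration, such a dict could look as follows for a single (hypothetical) block consisting of a convolution and its batch-norm parameters (all tensor names are made up):

block_id_and_param_type = {
    'block_identifier': {
        'conv1.weight': 'block0',
        'bn1.weight': 'block0',
        'bn1.bias': 'block0',
        'bn1.running_mean': 'block0',
        'bn1.running_var': 'block0',
    },
    'parameter_type': {
        'conv1.weight': 'weight',
        'bn1.weight': 'bn.gamma',
        'bn1.bias': 'bn.beta',
        'bn1.running_mean': 'bn.mean',
        'bn1.running_var': 'bn.var',
    },
}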
model compress Optional, Type: NNRModel, Default: None. Instance of Class NNRModel, which provides model related information (e.g. parameter types, block identifiers, tensor dimensions, etc) required for the compression process, and functions for handling of the model. There are three types specified, a (generic) base class and two classes inherited from the base class for PyTorch and TensorFlow.

NNRModel: Base Model-Class
PytorchModel( NNRModel ): Model class for PyTorch Models
TensorflowModel( NNRModel ): Model class for TensorFlow Models

If not specified, an instance of the base model class NNRModel will be created, internally. Whenever PyTorchModel or TensorflowModel is used, an identifier is written to the bitstream such that the decoder can derive the related model framework.

Note: Whenever the function compress_model detects a PyTorch or TensorFlow model, it internally creates the respective model type and provides it to the (internal) compress function call!

Also see Class Definitions.
model_name compress_model Note: Obsolete from version 0.3.0. NNCodec tries to determine the name internally.

Optional, Type: string, Default: None. Name of the model to be processed. Only required for TensorFlow models if a model_executer shall be created internally. For using data-based methods on ImageNet, TensorFlow models need some preprocessing. Right now the following model names are supported:

[ 'DenseNet121', 'DenseNet169', 'DenseNet201', 'EfficientNetB0', 'EfficientNetB1', 'EfficientNetB2', 'EfficientNetB3', 'EfficientNetB4', 'EfficientNetB5', 'EfficientNetB6', 'EfficientNetB7', 'InceptionResNetV2', 'InceptionV3', 'MobileNet', 'MobileNetV2', 'NASNetLarge', 'NASNetMobile', 'ResNet50', 'ResNet101', 'ResNet152', 'ResNet50V2', 'ResNet101V2', 'ResNet152V2', 'VGG16', 'VGG19', 'Xception']
model_executer compress_model
compress
Optional, Type: ModelExecuter, Default: None. A model_executer that can run the model, e.g. inference or training on a dataset. Must be an instance of ModelExecuter. If dataset_path and model_struct are provided, an instance of ModelExecuter for ImageNet-based models will be created within compress_model. However, the NNCodec software also allows the use of user-customized model_executers (e.g. for different datasets), as long as they are inherited from the ModelExecuter-Class and implement its interface. For details refer to Class Definitions and Advanced Features.
model_struct compress_model Optional, Type: [torch.nn.Module, tensorflow.Module], Default: None. The model object that contains the computational graph, which is required to run the model. For PyTorch this shall be an instance of torch.nn.Module; for TensorFlow this shall be an instance of tensorflow.Module. The model_struct must fit the structure of the model parameters stored at model_path or in parameter_dict, respectively. In other words, model_struct must be able to load the parameters as specified in the file at model_path or in parameter_dict. For further details on the usage, check out the Examples-Section.
dataset_path compress_model Optional, Type: String, Default: None. Specifies the path to the ImageNet dataset for training and evaluation. In order to perform full training or fine tuning of the model, the folder shall contain a subfolder "train" which contains the training set. If fine tuning is applied, it uses a random subset of the training set. For testing the model by inference, there shall be a subfolder "test" which contains the test set. A third set shall be in the subfolder "val", which contains a validation set. Usually the validation set is different from the test set. The validation set is, e.g., used for validating the performance of the model during fine tuning or inference-optimized quantization by inference on a (reduced) dataset.
learning_rate compress_model Optional, Type: float32, Default: 1e-5. Learning rate that is applied for fine tuning and local scaling adaptation (LSA) on ImageNet.
batch_size compress_model Optional, Type: int32, Default: 64. Batch size that is applied for fine tuning and local scaling adaptation (LSA) on ImageNet.
epochs compress_model Optional, Type: int32, Default: 30. Number of epochs that the model is trained during fine tuning and local scaling adaptation (LSA) on ImageNet.
max_batches compress_model Optional, Type: int32, Default: 600. Maximum number of batches the model is trained on during fine tuning and local scaling adaptation (LSA) on ImageNet.
num_workers compress_model Optional, Type: int32, Default: 8. Number of (parallel) workers that are used for the dataloaders for training and inference in order to speed up the process.
return_model_data compress_model Optional, Type: boolean, Default: False. The flag determines whether the return value block_id_and_param_type is present (return_model_data==True) or not (return_model_data==False).
verbose compress_model
compress
Optional, Type: boolean, Default: True. Activates verbose output.
return_bitstream compress_model
compress
Optional, Type: boolean, Default: False. The flag determines whether the return value bitstream is present (return_bitstream==True) or not (return_bitstream==False).

Return Values

Important: The return values depend on the configuration of the flags return_bitstream and return_model_data as follows:

for compress_model:

  • return_bitstream==False, return_model_data == False: No return values
  • return_bitstream==True , return_model_data == False: a single value bitstream
  • return_bitstream==False, return_model_data == True : a single value block_id_and_param_type
  • return_bitstream==True , return_model_data == True : a 2-tuple (bitstream, block_id_and_param_type)

for compress:

  • return_bitstream==False: No return values
  • return_bitstream==True : a single value bitstream
Return Value Function Description
bitstream compress_model
compress
Condition: return_bitstream==True, Type: bytearray. The compressed bitstream as a bytearray. Only present if return_bitstream is equal to True.
block_id_and_param_type compress_model Condition: return_model_data==True , Type: dict. A dict specifying the block id and parameter type for each tensor (also see description of compress parameter block_id_and_param_type). Only present if return_model_data is equal to True.

Note: This dict can be provided to the decoder, e.g. in order to reconstruct folded batch-norm parameters. The returned value is either the block_id_and_param_type guessed within compress_model or the one specified as input parameter of compress_model.

Decoder

def decompress_model( bitstream_or_path, 
                      model_path=None, 
                      block_id_and_param_type=None, 
                      model_struct=None,
                      model_executer=None,
                      model_name=None, 
                      dataset_path=None,  
                      batch_size=64,  
                      num_workers=8,
                      reconstruct_bnf=True,
                      reconstruct_lsa=True,
                      test_model=False,
                      return_model_information=False,
                      return_decompressed_model=False,
                      verbose=True
                    ):
def decompress( bitstream_or_path, 
                block_id_and_param_type=None, 
                return_model_information=False,
                verbose=True,
                reconstruct_lsa=True, 
                reconstruct_bnf=True
              ):

Required input to both functions is only a path specifying the location of the bitstream to be decompressed (or the bitstream itself as a bytearray). decompress returns the model parameter state dict and, if return_model_information is True, a dict of model information including the topology storage format. The topology storage format denotes the related model framework, if specified. If the framework can be detected, the parameter state dict is in the respective format, such that it is compatible with the framework (e.g. PyTorch, TensorFlow).
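
A minimal sketch of decoding a bitstream back into a PyTorch state dict (the commented lines assume the bitstream was created from a PyTorch model and that model_struct is a matching torch.nn.Module instance):

import nnc
# import torch

rec_parameters = nnc.decompress('./bitstream.nnc')

# Hypothetical: load the reconstructed numpy arrays back into a matching module.
# model_struct.load_state_dict({k: torch.tensor(v) for k, v in rec_parameters.items()})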

Parameters

Parameter Function Description
bitstream_or_path decompress_model
decompress
Required, Type: [string, bytearray], Default: None. Specifies either the path to the bitstream file to be decompressed (usually ends with ".nnc") as a string or the bitstream as bytearray.
model_path decompress_model Optional, Type: string, Default: None. Specifies the path where the reconstructed model file is stored after the decompression process. If a model related to a known framework (e.g. PyTorch, TensorFlow) is detected, the state dict will be stored in the respective format, such that it is compatible with the given framework. If no model path is specified, it will be set to "./rec.[pt, tf, mdl]" by default depending on the detected model format.
block_id_and_param_type decompress_model
decompress
Optional, Type: dict, Default: None. A dict specifying the block id and parameter type for each tensor. The dict shall contain two keys of type string, 'block_identifier' and 'parameter_type'. The values shall also be dicts, with the keys of type string specifying the tensor names (exactly as in the parameter_dict) and the values of type string specifying the related 'block_identifier' and 'parameter_type', respectively. The parameter type strings can be any of:

'weight'
'weight.ls'
'bias'
'bn.beta'
'bn.gamma'
'bn.mean'
'bn.var'
'unspecified' (special type, see notes below)

Important: A single block can contain parameters of each type only once (except for 'unspecified')!

All tensors that belong to the same block shall have the same 'block_identifier'. These identifiers can be arbitrary strings which shall be unique for different blocks. The parameter type 'unspecified' can be used, whenever the parameter type of a tensor is unknown or does not fit any of the other parameter types.

Note: This structure is not required for the decoding process, but provides information on the original structure of the model, e.g. in order to reconstruct folded batch-norm parameters.

For further details on block_id_and_param_type and the meaning of the parameter types refer to the Advanced Features-Section.
model_struct decompress_model Optional, Type: [torch.nn.Module, tensorflow.Module], Default: None. The model object that contains the computational graph, which is required to run the model. For PyTorch this must be an instance of torch.nn.Module; for TensorFlow this must be an instance of tensorflow.Module. The model_struct must fit the structure of the parameters contained in the bitstream. In other words, model_struct must be able to load the decompressed parameters.

Information: If model_struct is provided, a copy of model_struct equipped with the decompressed parameters can be returned by decompress_model for further use. Currently, this feature is only available for PyTorch and TensorFlow.

For further details on the usage, check out the Examples-Section.
model_executer decompress_model Optional, Type: ModelExecuter, Default: None. A model_executer that can run the model, e.g. inference or training on a dataset. Must be an instance of ModelExecuter. If dataset_path and model_struct are provided, an instance of ModelExecuter for ImageNet-based models will be created within decompress_model. However, the NNCodec software also allows the use of user-customized model_executers (e.g. for different datasets), as long as they are inherited from the ModelExecuter-Class and implement its interface. For details refer to Class Definitions and Advanced Features.
model_name decompress_model Note: Obsolete from version 0.3.0. NNCodec tries to determine the name internally.

Optional, Type: string, Default: None. Name of the model to be processed. Only required for TensorFlow models if a model_executer shall be created internally. For using data-based methods on ImageNet, TensorFlow models need some preprocessing. Right now the following model names are supported:

[ 'DenseNet121', 'DenseNet169', 'DenseNet201', 'EfficientNetB0', 'EfficientNetB1', 'EfficientNetB2', 'EfficientNetB3', 'EfficientNetB4', 'EfficientNetB5', 'EfficientNetB6', 'EfficientNetB7', 'InceptionResNetV2', 'InceptionV3', 'MobileNet', 'MobileNetV2', 'NASNetLarge', 'NASNetMobile', 'ResNet50', 'ResNet101', 'ResNet152', 'ResNet50V2', 'ResNet101V2', 'ResNet152V2', 'VGG16', 'VGG19', 'Xception']
dataset_path decompress_model Optional, Type: string, Default: None. Specifies the path to the ImageNet dataset for training and evaluation. In order to perform full training or fine tuning of the model, the folder shall contain a subfolder "train" which contains the training set. If fine tuning is applied, it uses a random subset of the training set. For testing the model by inference, there shall be a subfolder "test" which contains the test set. A third set shall be in the subfolder "val", which contains a validation set. Usually the validation set is different from the test set. The validation set is, e.g., used for validating the performance of the model during fine tuning or inference-optimized quantization by inference on a (reduced) dataset.
batch_size decompress_model Optional, Type: int32, Default: 64. Batch size that is applied during inference on the testset for 'test_model'.
num_workers decompress_model Optional, Type: int32, Default: 8. Number of (parallel) workers that are used for the dataloaders for inference in order to speed up the process.
reconstruct_bnf decompress_model
decompress
Optional, Type: boolean, Default: True. Reconstruct (unfold) batch-norm parameters if possible. Requires block_id_and_param_type to be present.
reconstruct_lsa decompress_model
decompress
Optional, Type: boolean, Default: True. Apply (multiply) the LSA parameters if possible.
test_model decompress_model Optional, Type: boolean, Default: False. Run inference on a dataset. Requires a model_executer, which implements the function 'test_model'.
return_model_information decompress_model
decompress
Optional, Type: boolean, Default: False. The flag determines whether the return value model_information is present (return_model_information==True) or not (return_model_information==False).
return_decompressed_model decompress_model Optional, Type: boolean, Default: False. The flag determines whether the return value decompressed_model is present (return_decompressed_model==True) or not (return_decompressed_model==False).
verbose decompress_model
decompress
Optional, Type: boolean, Default: True. Activates verbose output.

Return Values

Important: The return values depend on the configuration of the flags return_model_information and return_decompressed_model as follows:

for decompress_model:

  • return_decompressed_model==False, return_model_information == False: No return values
  • return_decompressed_model==True , return_model_information == False: a single value decompressed_model
  • return_decompressed_model==False, return_model_information == True : a single value model_information
  • return_decompressed_model==True , return_model_information == True : a 2-tuple (decompressed_model, model_information)

for decompress:

  • return_model_information==False: a single value rec_parameters
  • return_model_information==True : a 2-tuple (rec_parameters, model_information)
Return Value Function Description
rec_parameters decompress Condition: None, Type: dict. The reconstructed parameter state dict, containing the tensor names as keys and the parameter values as numpy array of type numpy.float32 or numpy.int32
model_information decompress_model
decompress
Condition: return_model_information==True, Type: dict. A dict that contains model related information, e.g. the topology storage format or pruning, sparsification, decomposition and unification performance maps (see [1]). Only present if return_model_information is equal to True.


model_information["topology_storage_format"]:
Denotes the storage format of the parameter dict (topology storage format). The following values are specified:

0: NNR_TPL_UNREC - unrecognized format
3: NNR_TPL_PT - PyTorch format
4: NNR_TPL_TEF - TensorFlow format

Note: There are more topology storage formats specified by the standard, which are not yet implemented.


model_information["performance_map_flags"][performance_flag]:
A dict specifying the value of the performance flag denoted by performance_flag per tensor. The keys are the tensor names as strings and the values are integers which denote the value of the flag (either 0 or 1). The following performance_flags are available:

'mps_sparsification_flag'
'mps_pruning_flag'
'mps_unification_flag'
'mps_decomposition_performance_map_flag'
'lps_sparsification_flag'
'lps_pruning_flag'
'lps_unification_flag'
'lps_decomposition_performance_map_flag'


model_information["performance_maps"]["mps"][performance_map]:
A dict specifying the values of the model parameter set (MPS) related performance map denoted by performance_map. The keys are the names of the performance map syntax elements as strings and the values are the values of the respective syntax elements as decoded from the bitstream. The specifier performance_map can be any of:

'sparsification_performance_map' (only present if mps_sparsification_flag==1)
'pruning_performance_map' (only present if mps_pruning_flag==1)
'unification_performance_map' (only present if mps_unification_flag==1)
'decomposition_performance_map' (only present if mps_decomposition_performance_map_flag==1)


model_information["performance_maps"]["lps"][performance_map]:
A dict specifying the values of the layer parameter set (LPS) related performance map denoted by performance_map. The keys are the names of the performance map syntax elements as strings and the values are the values of the respective syntax elements as decoded from the bitstream. The specifier performance_map can be any of:

'sparsification_performance_map' (only present if lps_sparsification_flag==1)
'pruning_performance_map'(only present if lps_pruning_flag==1)
'unification_performance_map' (only present if lps_unification_flag==1)
'decomposition_performance_map' (only present if lps_decomposition_performance_map_flag==1)
decompressed_model decompress_model Condition: model_struct, return_decompressed_model==True, Type: [torch.nn.Module, tensorflow.Module]. A model_struct equipped with the decompressed parameters. Requires a model_struct which can load the parameters. For PyTorch it shall be of type torch.nn.Module and for TensorFlow it shall be of type tensorflow.Module. If no suitable model_struct is provided, the returned value is None. Currently, this feature is only available for PyTorch and TensorFlow. Only present if return_decompressed_model is equal to True. For further details on the usage check out the Examples-Section.
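
For illustration, a sketch of retrieving the decompressed model object directly (the bitstream path is hypothetical and must point to a bitstream created from a matching model):

import nnc
import torchvision

model_struct = torchvision.models.resnet50()  # structure matching the compressed model

decompressed_model = nnc.decompress_model(
    './example/bitstream_resnet50.nnc',  # hypothetical bitstream path
    model_struct=model_struct,
    return_decompressed_model=True,
)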

PyTorch and TensorFlow Support

The NNCodec software has built-in support for PyTorch and TensorFlow models. Usually, related models can be detected and handled automatically. An identifier is written to the bitstream that enables the decoder to detect the related framework and output the model in the respective format.

Furthermore, the software provides data-based methods for models based on ImageNet. These methods include inference-optimized quantization, local scaling adaptation, fine tuning and testing the model by inference on the validation set at the decoder. For this, a structure of type torch.nn.Module or tensorflow.Module and the path to the ImageNet dataset must be provided. Check out the Examples-Section for further details on how to use this software with PyTorch and TensorFlow models.

Advanced Features

Coming soon! We will provide further details on how to use the advanced features of NNCodec. Meanwhile, you will find details on the usage of advanced features in the Examples-Section.

Class Definitions

Model Executer

The class ModelExecute provides an interface which needs to be implemented in order to enable data-based functionalities like fine tuning, local scaling adaptation or inference-optimized quantization. This class is part of the module nnc_core (see below):

nnc_core.nnr_model.ModelExecute(ABC)

All inherited classes shall implement the following interface:

class ModelExecute(ABC):
    def eval_model(self,
                   parameters,
                   verbose=False,
                   ):

    def test_model(self,
                   parameters,
                   verbose=False,
                   ):
    
    def tune_model(self,
                   parameters,
                   param_types,
                   lsa_flag,
                   ft_flag,
                   verbose=False,
                   ):

    @abstractmethod
    def has_eval(self):
        return False
    
    @abstractmethod
    def has_test(self):
        return False
    
    @abstractmethod
    def has_tune_ft(self):
        return False
    
    @abstractmethod
    def has_tune_lsa(self):
        return False

Functions and Parameters

Important: All function parameters defined by the interface are mandatory, other function parameters shall be optional.

def eval_model(self,
               parameters,
               verbose=False,
              ):

The function eval_model evaluates the model performance by inference on an evaluation dataset. Usually, the evaluation dataset is a reduced dataset (e.g. a subset of the training set) and different from the validation test set. The function shall return a tuple of values, where the first value of the tuple is a scalar that denotes or is related to the model performance (e.g. Top-1 accuracy). A larger value means better performance and a smaller value means lower performance. This value is evaluated by tools like inference-optimized quantization. The returned tuple shall at least contain this value. Other values are optional.

Parameter Description
parameters Required, Type: Dict, Default: -. Specifies a python dict which represents the parameter state dictionary. The keys are strings which denote the names of the parameter tensors and the values representing the tensors as numpy arrays (ndarrays). The numpy arrays must be of type numpy.float32 or numpy.int32
verbose Optional, Type: Boolean, Default: False. Activates verbose output. Shows a progress bar when activated.
def test_model(self,
               parameters,
               verbose=False,
              ):

The function test_model evaluates the model performance by inference on a validation test set, which is usually different from the evaluation dataset used by eval_model. The function shall return a tuple of values, which denote or are related to the model performance (e.g. Top-1 accuracy, Top-5 accuracy, etc.). The returned tuple shall at least contain one value. Other values are optional. The output of this function is not required for any functionality and is thus informative. It can be used, e.g., to evaluate the effect of the quantization on the performance after decompression.

Parameter Description
parameters Required, Type: Dict, Default: -. Specifies a python dict which represents the parameter state dictionary. The keys are strings which denote the names of the parameter tensors and the values representing the tensors as numpy arrays (ndarrays). The numpy arrays must be of type numpy.float32 or numpy.int32
verbose Optional, Type: Boolean, Default: False. Activates verbose output. Shows a progress bar when activated.
def tune_model(self,
               parameters,
               param_types,
               lsa_flag,
               ft_flag,
               verbose=False,
              ):

The function tune_model tunes (trains) the non-weight parameters (fine tuning of e.g. biases, batch-norm parameters, etc.) and/or the local scaling parameters (local scaling adaptation) on a parameter tuning set, which usually is a subset of the training set. The function shall return a tuple with at least two values. The first value is a dict which contains only the local scaling parameters. The keys are strings with the name of the tensor (usually the name of the related weight tensor with "_scaling" attached to it) and the values are the tensors as numpy arrays of type numpy.int32 or numpy.float32. Whenever LSA is disabled (lsa_flag==False) or no LSA parameters are present, the returned parameter dictionary shall be empty. The second value of the returned tuple is a dict which contains only the non-weight parameters. The keys are strings with the name of the tensor and the values are the tensors as numpy arrays of type numpy.int32 or numpy.float32. Whenever fine tuning is disabled (ft_flag==False) or no non-weight parameters are present, the returned parameter dictionary shall be empty. Additional values in the returned tuple are optional.

Parameter Description
parameters Required, Type: Dict, Default: -. Specifies a python dict which represents the parameter state dictionary (including the lsa parameters, if lsa is enabled). The keys are strings which denote the names of the parameter tensors and the values representing the tensors as numpy arrays (ndarrays). The numpy arrays must be of type numpy.float32 or numpy.int32
param_types Required, Type: Dict, Default: -. A python dict which specifies the parameter types for each tensor in the parameter dict. The keys are strings which denote the name of the tensor and shall match exactly the names in the parameter dictionary. The values are strings which specify the parameter type and can be any of the following:

'weight' - weights
'weight.ls' - local scaling parameters
'bias' - biases (non-weights)
'bn.beta' - batch-norm parameter (non-weights)
'bn.gamma' - batch-norm parameter (non-weights)
'bn.mean' - batch-norm parameter (non-weights)
'bn.var' - batch-norm parameter (non-weights)
'unspecified' - others or not specified (treated as non-weights)

lsa_flag Required, Type: Boolean, Default: -. Enable tuning of local scaling parameters, if present.
ft_flag Required, Type: Boolean, Default: -. Enable fine tuning of non-weight parameters, if present.
verbose Optional, Type: Boolean, Default: False. Enables verbose output. Shows a progress bar and additional information about the training process when activated.
    @abstractmethod
    def has_eval(self):
        return False

The function has_eval denotes whether eval_model is implemented or not, and thus whether the functionality for evaluation on the evaluation dataset is available. If eval_model is implemented, the return value shall be 'True', otherwise it shall be 'False'.

    @abstractmethod
    def has_test(self):
        return False

The function has_test denotes whether test_model is implemented or not, and thus whether the functionality for inference on the validation test set is available. If test_model is implemented, the return value shall be 'True', otherwise it shall be 'False'.

    @abstractmethod
    def has_tune_ft(self):
        return False

The function has_tune_ft denotes whether tune_model is implemented for fine tuning of non-weight parameters or not, and thus whether the functionality for fine tuning of non-weight parameters is available. If tune_model implements fine tuning, the return value shall be 'True', otherwise it shall be 'False'.

    @abstractmethod
    def has_tune_lsa(self):
        return False

The function has_tune_lsa denotes whether tune_model is implemented for tuning of local scaling parameters or not, and thus whether the functionality for local scaling adaptation is available. If tune_model implements local scaling adaptation, the return value shall be 'True', otherwise it shall be 'False'.
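
Putting the interface together, a minimal sketch of a custom executer that only supports testing by inference could look as follows (model_struct and test_loader are hypothetical placeholders; the evaluation loop itself is omitted):

from nnc_core.nnr_model import ModelExecute

class MyTestExecuter(ModelExecute):
    def __init__(self, model_struct, test_loader):
        self.model_struct = model_struct  # e.g. a torch.nn.Module matching the parameters
        self.test_loader = test_loader    # e.g. a data loader over the test set

    def test_model(self, parameters, verbose=False):
        # Load the reconstructed numpy parameters into model_struct, run inference
        # on the test set and compute e.g. the Top-1 accuracy (loop omitted here).
        top1_accuracy = 0.0  # placeholder result
        return (top1_accuracy,)

    def has_eval(self):
        return False

    def has_test(self):
        return True

    def has_tune_ft(self):
        return False

    def has_tune_lsa(self):
        return False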

Examples

This section provides several examples on how to use the software and specific features.

Basic Features

Compressing a model loaded from a file

The model file is stored at 'example/squeezenet1_1_pytorch_zoo.pt' (SqueezeNet1_1, originally downloaded from the torchvision model zoo). The compressed bitstream is written to 'example/bitstream_squeezenet1_1.nnc'. After decompression, the reconstructed model is stored at 'example/reconstructed_squeezenet1_1.pt'.

import nnc

nnc.compress_model('./example/squeezenet1_1_pytorch_zoo.pt', bitstream_path='./example/bitstream_squeezenet1_1.nnc')
nnc.decompress_model('./example/bitstream_squeezenet1_1.nnc', model_path='./example/reconstructed_squeezenet1_1.pt' )

Here is an analogous example for a TensorFlow model, stored at 'example/densenet_121_tensorflow_zoo.h5' (DenseNet121, downloaded from the Keras model zoo).

import nnc

nnc.compress_model('example/densenet_121_tensorflow_zoo.h5', bitstream_path='./example/bitstream_densenet_121.nnc')
nnc.decompress_model('./example/bitstream_densenet_121.nnc', model_path='./example/reconstructed_densenet_121.h5')

Compressing a model from a model object

Pytorch:

import nnc
import torchvision

model = torchvision.models.squeezenet1_1(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_squeezenet1_1.nnc')
nnc.decompress_model('./example/bitstream_squeezenet1_1.nnc', model_path='./example/reconstructed_squeezenet1_1.pt' )

TensorFlow:

import nnc
from tensorflow import keras

model = keras.applications.DenseNet121()

nnc.compress_model( model, bitstream_path='./example/bitstream_densenet_121.nnc')
nnc.decompress_model('./example/bitstream_densenet_121.nnc', model_path='./example/reconstructed_densenet_121.h5')

Changing the quantization parameter (QP)

By default, the quantization parameter (QP) is set to -38. Increasing the QP yields a lower bitrate, but usually a higher performance degradation. Decreasing the QP yields a higher bitrate, but usually a lower performance degradation.

import nnc
import torchvision

model = torchvision.models.mobilenet_v2(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_mobilenet_v2_qp-38.nnc', qp=-38)
nnc.decompress_model('./example/bitstream_mobilenet_v2_qp-38.nnc', model_path='./example/reconstructed_mobilenet_v2_qp-38.pt')

With the default QP value -38, MobileNetV2 achieves a compression ratio of 20.1% (compressed bitstream size 2.845741 MB) and a Top-1 accuracy of 71.622% on ImageNet.

import nnc
import torchvision

model = torchvision.models.mobilenet_v2(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_mobilenet_v2_qp-34.nnc', qp=-34)
nnc.decompress_model('./example/bitstream_mobilenet_v2_qp-34.nnc', model_path='./example/reconstructed_mobilenet_v2_qp-34.pt')

Increasing the qp value to -34 achieves a compression ratio of 16.93% (compressed bitstream size 2.395967 MB) and a Top-1 accuracy of 71.306% on ImageNet.

import nnc
import torchvision

model = torchvision.models.mobilenet_v2(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_mobilenet_v2_qp-30.nnc', qp=-30)
nnc.decompress_model('./example/bitstream_mobilenet_v2_qp-30.nnc', model_path='./example/reconstructed_mobilenet_v2_qp-30.pt')

Increasing the qp value to -30 achieves a compression ratio of 13.86% (compressed bitstream size 1.962087 MB) and a Top-1 accuracy of 69.432% on ImageNet.

Dependent scalar quantization (DQ) and uniform quantization

By default, a vector quantization scheme called dependent scalar quantization (DQ) is applied, which usually achieves lower bitrates at a given performance. However, there might be cases where DQ is not suitable and does not achieve a good performance. Hence, DQ can be deactivated and a (simple) uniform quantizer is used instead. In order to achieve a similar performance, the QP must be adjusted. For details refer to the parameter 'use_dq' at Functions and Parameters.

The following examples show how to use model compression with (enabled by default) and without DQ, respectively:

import nnc
import torchvision

model = torchvision.models.resnet50(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_resnet50_qp-38.nnc', qp=-38)
nnc.decompress_model('./example/bitstream_resnet50_qp-38.nnc', model_path='./example/reconstructed_resnet50_qp-38.pt')

With the default QP value -38 and DQ enabled, ResNet50 achieves a compression ratio of 13.84% (compressed bitstream size 14.173388 MB) and a Top-1 accuracy of 75.96% on ImageNet.

import nnc
import torchvision

model = torchvision.models.resnet50(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_resnet50_noDQ_qp-38.nnc', qp=-35, use_dq=False)
nnc.decompress_model('./example/bitstream_resnet50_noDQ_qp-38.nnc', model_path='./example/reconstructed_resnet50_noDQ_qp-38.pt')

With DQ disabled, ResNet50 achieves a compression ratio of 14.48% (compressed bitstream size 14.832298 MB) and a Top-1 accuracy of 75.952% on ImageNet. Note: The qp value has been increased to -35 in order to achieve a comparable bitstream size and quantization error.

Similar results can be obtained using ResNet50 from Keras:

import nnc
import tensorflow
from tensorflow import keras

model = keras.applications.ResNet50()

nnc.compress_model( model, bitstream_path='./example/bitstream_resnet50_keras_qp-38.nnc', qp=-38)
nnc.decompress_model('./example/bitstream_resnet50_keras_qp-38.nnc', model_path='./example/reconstructed_resnet50_keras_qp-38.h5')

Here, with the default QP value -38 and DQ enabled, ResNet50 achieves a compression ratio of 14.13% (compressed bitstream size 14.488576 MB) and a Top-1 accuracy of 74.814% on ImageNet.

import nnc
import tensorflow
from tensorflow import keras

model = keras.applications.ResNet50()

nnc.compress_model( model, bitstream_path='./example/bitstream_resnet50_keras_noDQ_qp-38.nnc', qp=-35, use_dq=False)
nnc.decompress_model('./example/bitstream_resnet50_keras_noDQ_qp-38.nnc', model_path='./example/reconstructed_resnet50_keras_noDQ_qp-38.h5')

With DQ disabled, ResNet50 achieves a compression ratio of 14.79% (compressed bitstream size 15.169963 MB) and a Top-1 accuracy of 74.806% on ImageNet.

Batch-norm folding (BNF)

Batch-norm folding requires the tensors to be shaped such that the first dimension specifies the number of output channels. This is usually the case for PyTorch models but not for TensorFlow. Consequently, the presented examples refer to PyTorch. However, BNF can be applied to correctly shaped TensorFlow models in the same fashion.

Example for MobileNetV2:

import nnc
import torchvision

model = torchvision.models.mobilenet_v2(pretrained=True)

block_id_and_param_type = nnc.compress_model( model, bitstream_path="./example/mobilenet_v2_pytorch_bnf.nnc", qp=-38, bnf=True, return_model_data=True )
nnc.decompress_model( "./example/mobilenet_v2_pytorch_bnf.nnc", model_path="./example/rec_mobilenet_v2_pytorch_bnf.pt", block_id_and_param_type=block_id_and_param_type)

MobileNetV2 with batch-norm folding enabled and the default QP value of -38 achieves a compression ratio of 19.84% (compressed bitstream size 2.809033 MB) and a Top-1 accuracy of 71.604% on ImageNet.

Example for ResNet50:

import nnc
import torchvision

model = torchvision.models.resnet50(pretrained=True)

block_id_and_param_type = nnc.compress_model( model, bitstream_path="./example/resnet50_pytorch_bnf.nnc", qp=-38, bnf=True, return_model_data=True )
nnc.decompress_model( "./example/resnet50_pytorch_bnf.nnc", model_path="./example/rec_resnet50_pytorch_bnf.pt", block_id_and_param_type=block_id_and_param_type ) 

ResNet50 with batch-norm folding enabled and the default QP value of -38 achieves a compression ratio of 13.76% (compressed bitstream size 14.098468 MB) and a Top-1 accuracy of 75.962% on ImageNet.

In both cases the software internally derives block_id_and_param_type, which can then be provided to the decoder.

Testing the model by inference on ImageNet

import nnc
import torchvision

dataset_path = "/path/to/ImageNet"

model = torchvision.models.resnet50(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_resnet50_test_qp-38.nnc', qp=-38)
nnc.decompress_model('./example/bitstream_resnet50_test_qp-38.nnc', model_path='./example/reconstructed_resnet50_test_qp-38.pt', dataset_path=dataset_path, model_struct=model, test_model=True)

The analogous TensorFlow (Keras) example:

import nnc
import tensorflow
from tensorflow import keras

dataset_path = "/path/to/ImageNet"

model = keras.applications.ResNet50()

nnc.compress_model( model, bitstream_path='./example/bitstream_resnet50_keras_test_qp-38.nnc', qp=-38)
nnc.decompress_model('./example/bitstream_resnet50_keras_test_qp-38.nnc', model_path='./example/reconstructed_resnet50_keras_test_qp-38.h5', dataset_path=dataset_path, model_struct=model, test_model=True, model_name="ResNet50")

Local scaling adaptation (LSA)

Currently, local scaling adaptation is only implemented for PyTorch models on ImageNet. Hence, the following results refer to PyTorch.

import nnc
import torchvision

dataset_path = "/path/to/ImageNet"

model = torchvision.models.resnet50(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_resnet50_lsa_qp-38.nnc', qp=-38, lsa=True, model_struct=model, dataset_path=dataset_path)
nnc.decompress_model('./example/bitstream_resnet50_lsa_qp-38.nnc', model_path='./example/reconstructed_resnet50_lsa_qp-38.pt')

An analogous example for MobileNetV2:

import nnc
import torchvision

dataset_path = "/path/to/ImageNet"

model = torchvision.models.mobilenet_v2(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_mobilenet_v2_lsa_qp-38.nnc', qp=-38, lsa=True, model_struct=model, dataset_path=dataset_path)
nnc.decompress_model('./example/bitstream_mobilenet_v2_lsa_qp-38.nnc', model_path='./example/reconstructed_mobilenet_v2_lsa_qp-38.pt')

Fine-tuning (FT)

Currently, fine tuning is only implemented for PyTorch models on ImageNet. Hence, the following results refer to PyTorch.

import nnc
import torchvision

dataset_path = "/path/to/ImageNet"

model = torchvision.models.resnet50(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_resnet50_ft_qp-38.nnc', qp=-38, fine_tune=True, model_struct=model, dataset_path=dataset_path)
nnc.decompress_model('./example/bitstream_resnet50_ft_qp-38.nnc', model_path='./example/reconstructed_resnet50_ft_qp-38.pt')

An analogous example for MobileNetV2:

import nnc
import torchvision

dataset_path = "/path/to/ImageNet"

model = torchvision.models.mobilenet_v2(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_mobilenet_v2_ft_qp-38.nnc', qp=-38, fine_tune=True, model_struct=model, dataset_path=dataset_path)
nnc.decompress_model('./example/bitstream_mobilenet_v2_ft_qp-38.nnc', model_path='./example/reconstructed_mobilenet_v2_ft_qp-38.pt')

Inference-optimized quantization (IOQ)

import nnc
import torchvision

dataset_path = "/path/to/ImageNet"

model = torchvision.models.mobilenet_v2(pretrained=True)

nnc.compress_model( model, bitstream_path='./example/bitstream_mobilenet_v2_ioq_qp-38.nnc', qp=-38, ioq=True, model_struct=model, dataset_path=dataset_path)
nnc.decompress_model('./example/bitstream_mobilenet_v2_ioq_qp-38.nnc', model_path='./example/reconstructed_mobilenet_v2_ioq_qp-38.pt')

The analogous TensorFlow (Keras) example:

import nnc
import tensorflow
from tensorflow import keras

dataset_path = "/path/to/ImageNet"

model = keras.applications.ResNet50()

nnc.compress_model( model, bitstream_path='./example/bitstream_resnet50_keras_ioq_qp-38.nnc', qp=-38, ioq=True, dataset_path=dataset_path, model_struct=model, model_name="ResNet50")
nnc.decompress_model('./example/bitstream_resnet50_keras_ioq_qp-38.nnc', model_path='./example/reconstructed_resnet50_keras_ioq_qp-38.h5')