
AWD-LSTM from "Regularizing and Optimizing LSTM Language Models" with training-aware quantization support for TensorFlow.


AWD-LSTM (Weight Drop LSTM) with training-aware quantization in TensorFlow

An implementation of AWD-LSTM (from "Regularizing and Optimizing LSTM Language Models") for TensorFlow.

Training-aware quantization for integer-arithmetic-only inference (from "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") is also provided.

AWD-LSTM (Weight Drop LSTM)


This code is implemented and tested with TensorFlow 1.11.0 and 1.13.0.


  1. Initialize the AWD-LSTM cell. It is a standard LayerRNNCell.
from weight_drop_lstm import WeightDropLSTMCell

lstm_cell = WeightDropLSTMCell(
    num_units=CELL_NUM, weight_drop_kr=WEIGHT_DP_KR, 
    use_vd=True, input_size=INPUT_SIZE)

Arguments are defined as follows:

num_units: the number of cells in the LSTM layer. [int]
weight_drop_kr: the keep rate of the drop-connect applied to the weights. A value of 1.0 drops nothing. [float]
use_vd: If true, variational dropout is used for the weight drop-connect; otherwise standard dropout is used. [bool]
input_size: If use_vd=True, input_size (the dimension of the last channel) must be provided. [int]

The remaining keyword arguments are exactly the same as those of tf.nn.rnn_cell.LSTMCell.

Note that if weight_drop_kr is not provided, or is set to 1.0, WeightDropLSTMCell reduces to a plain LSTMCell.
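To make the role of weight_drop_kr concrete, here is a minimal pure-Python sketch of drop-connect on a weight matrix. It assumes that weight_drop_kr is a keep rate and that kept weights are rescaled by 1 / keep_rate, as tf.nn.dropout does. The helper drop_connect is hypothetical and is not part of this repository.

```python
import random


def drop_connect(weights, keep_rate, rng=random):
    """Zero each weight with probability 1 - keep_rate and rescale the rest.

    Hypothetical helper for illustration only; not the repository's code.
    """
    if not 0.0 < keep_rate <= 1.0:
        raise ValueError("keep_rate must be in (0, 1]")
    if keep_rate == 1.0:
        # Nothing is dropped, so the weights are returned unchanged.
        return [row[:] for row in weights]
    return [
        [w / keep_rate if rng.random() < keep_rate else 0.0 for w in row]
        for row in weights
    ]


kernel = [[0.2, -0.4], [0.6, 0.8]]
unchanged = drop_connect(kernel, keep_rate=1.0)
dropped = drop_connect(kernel, keep_rate=0.5, rng=random.Random(0))
```

With keep_rate=1.0 the matrix comes back untouched, which mirrors the statement above that the cell then behaves like a plain LSTMCell.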

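  2. Insert the update operation for the dropout kernel wherever you need it.
# Either run the update op directly in each training step
sess.run(lstm_cell.get_vd_update_op())

# Or use control_dependencies
vd_update_ops = lstm_cell.get_vd_update_op()
with tf.control_dependencies(vd_update_ops):
    # Build the op that should wait for the mask update here. The optimizer
    # step below is an assumed example; optimizer and loss are your own.
    train_op = optimizer.minimize(loss)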

You can also add get_vd_update_op() to GraphKeys.UPDATE_OPS when constructing WeightDropLSTMCell.

Note that if you use control_dependencies, you need to be careful about the order of execution.
The variational dropout kernel should not be updated before the optimizer step.
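The ordering concern can be illustrated without TensorFlow. The sketch below is a toy model in plain Python, not this repository's code: sample_mask and train_step are hypothetical names, and the model is a single masked dot product. It shows why the stored mask should be resampled only after the optimizer step: the forward pass and the gradient must see the same mask, so dropped weights receive a zero gradient.

```python
import random


def sample_mask(size, keep_rate, rng):
    """Draw a fresh 0/1 dropout mask (stands in for the mask update op)."""
    return [1.0 if rng.random() < keep_rate else 0.0 for _ in range(size)]


def train_step(weights, inputs, target, mask, lr=0.1):
    """One SGD step on the loss 0.5 * (sum(w * m * x) - target) ** 2."""
    # Forward pass with the current mask.
    pred = sum(w * m * x for w, m, x in zip(weights, mask, inputs))
    error = pred - target
    # The gradient uses the same mask as the forward pass.
    grads = [error * m * x for m, x in zip(mask, inputs)]
    return [w - lr * g for w, g in zip(weights, grads)]


rng = random.Random(0)
weights = [0.5, -0.5, 0.25, 1.0]
data = [([1.0, 2.0, 3.0, 4.0], 1.0), ([0.5, 0.5, 0.5, 0.5], 0.0)]
mask = sample_mask(len(weights), 0.5, rng)
history = []
for inputs, target in data:
    new_weights = train_step(weights, inputs, target, mask)
    history.append((mask, weights, new_weights))
    weights = new_weights
    # Resample only after the optimizer step has been applied.
    mask = sample_mask(len(weights), 0.5, rng)
```

If the mask were resampled between the forward pass and the gradient computation, weights that were dropped in the forward pass could still be updated, which is the situation the note above warns about.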

Implementation Details

The main ideas of AWD-LSTM are the drop-connect weights and the concatenated inputs.

[Figure: the drop-connect of weights and concatenated inputs]
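As a rough illustration, TensorFlow's LSTMCell multiplies the concatenation of the input and the previous hidden state by a single kernel of shape [input_size + num_units, 4 * num_units]. The sketch below applies a drop-connect mask to that kernel for one time step. It assumes the mask has the same shape as the whole kernel, which is a reading of the text above rather than something the source confirms. The function name is hypothetical and the bias term is omitted.

```python
def masked_gate_preactivations(x, h_prev, kernel, mask):
    """Compute concat([x, h_prev]) @ (kernel * mask) for one time step.

    kernel and mask have shape [len(x) + len(h_prev), 4 * num_units].
    Hypothetical helper; the bias term is omitted for brevity.
    """
    concat = list(x) + list(h_prev)
    if len(kernel) != len(concat):
        raise ValueError("kernel rows must equal len(x) + len(h_prev)")
    n_cols = len(kernel[0])
    return [
        sum(concat[r] * kernel[r][c] * mask[r][c] for r in range(len(concat)))
        for c in range(n_cols)
    ]


x, h_prev = [1.0, 2.0], [0.5]  # input_size=2, num_units=1
kernel = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [1.0, 1.0, 1.0, 1.0],
]  # shape [3, 4]
keep_all = [[1.0] * 4 for _ in kernel]
drop_recurrent = keep_all[:2] + [[0.0] * 4]  # drop the hidden-to-hidden row

full = masked_gate_preactivations(x, h_prev, kernel, keep_all)
no_recurrent = masked_gate_preactivations(x, h_prev, kernel, drop_recurrent)
```

The four output values correspond to the four gate pre-activations of a single unit. Dropping the last kernel row removes the contribution of h_prev from every gate.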

If use_vd=True, variables are used to save the dropout kernel.

[Figure: the update operation for variational dropout]

Experimental results

I have run experiments with this implementation on a many-to-many recursive task and obtained better results than with a plain LSTMCell.

Training-Aware Quantization

In a nutshell

lstm_cell = WeightDropLSTMCell(
    num_units=CELL_NUM, weight_drop_kr=WEIGHT_DP_KR, 
    is_quant=True, is_train=True)
tf.contrib.quantize.create_training_graph(sess.graph, quant_delay=0)

A detailed explanation will be added soon.

Note: some quantization issues occur in tf.while_loop with TensorFlow versions higher than 1.12.0.
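Until that explanation is available, the following sketch may help with intuition. Training-aware quantization simulates integer rounding during the forward pass: values are mapped to a uniform integer grid and then back to floats, so the model learns to tolerate the rounding error. The code is a plain-Python illustration of 8-bit "fake quantization" under simple assumptions (range taken from the data, unsigned integers, zero exactly representable). fake_quantize is a hypothetical helper and is not how tf.contrib.quantize is implemented internally.

```python
def fake_quantize(values, num_bits=8):
    """Quantize to num_bits unsigned integers, then map back to floats."""
    # The range is widened to include 0 so that 0.0 stays exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero_point = round(-lo / scale)
    dequantized = []
    for v in values:
        q = round(v / scale) + zero_point
        q = max(0, min(levels, q))  # clamp to the integer range
        dequantized.append((q - zero_point) * scale)
    return dequantized, scale


values = [-1.0, -0.25, 0.0, 0.3, 1.0]
dequantized, scale = fake_quantize(values)
```

Each returned value differs from its input by at most about half a quantization step (scale / 2), which is the error the network is exposed to during training.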

Addition: Variational Dropout

I also provide a TensorFlow implementation of variational dropout, which is more flexible than the DropoutWrapper in TensorFlow.

The usage is similar to that of WeightDropLSTMCell:

from variational_dropout import VariationalDropout

vd = VariationalDropout(input_shape=[5], keep_prob=0.5)

# Either run the update op directly
sess.run(vd.get_update_mask_op())

# Or use control_dependencies
with tf.control_dependencies(vd.get_update_mask_op()):
    step, results_array = tf.while_loop(
        cond=lambda step, _: step < 5,
        body=loop_body,  # assumed name: your own loop body function
        loop_vars=(step, results_array))
    # This is just a simple example.
    # Usually, control_dependencies is placed around the optimizer step.

You can also add get_update_mask_op() to GraphKeys.UPDATE_OPS when constructing VariationalDropout.

Once again, if you use control_dependencies, you need to be careful about the order of execution.
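For readers unfamiliar with variational dropout, the key property is that one mask is sampled per sequence and reused at every time step, instead of drawing a new mask at each step. The sketch below shows that behavior in plain Python. SharedMaskDropout is a hypothetical stand-in written for illustration; it is not the VariationalDropout class from this repository, and the 1 / keep_prob rescaling is an assumption borrowed from tf.nn.dropout.

```python
import random


class SharedMaskDropout:
    """Keeps one dropout mask and applies it at every time step."""

    def __init__(self, size, keep_prob, seed=None):
        self._rng = random.Random(seed)
        self.keep_prob = keep_prob
        self.mask = [1.0] * size
        self.update_mask()

    def update_mask(self):
        # Kept units are scaled by 1 / keep_prob; dropped units become 0.
        self.mask = [
            1.0 / self.keep_prob if self._rng.random() < self.keep_prob else 0.0
            for _ in self.mask
        ]

    def __call__(self, step_input):
        return [x * m for x, m in zip(step_input, self.mask)]


vd = SharedMaskDropout(size=5, keep_prob=0.5, seed=0)
sequence = [[1.0] * 5 for _ in range(3)]  # 3 time steps, 5 features
outputs = [vd(step) for step in sequence]  # same mask at every step
vd.update_mask()  # resample before the next sequence
```

Because the mask is stored rather than resampled on every call, the same features are dropped at all three time steps. Calling update_mask() plays the role that the update op plays in the usage shown above.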


TODO

  1. Provide the regularization utilities mentioned in the paper.
  2. Find a more elegant way to implement variational dropout, if there is one.
  3. Pull out the quantization delay.
  4. Provide an interface for both non-quantized and quantized models.
  5. Write documentation for quantization training.

If you have any suggestions, please let me know. I would be very grateful!

Contact & Copyright

Code by Jia-Yau Shiau.
The quantization code was advised by and forked from Peter Huang.

