This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-651] MXNet Model Backwards Compatibility Checker #11626

Merged Jul 31, 2018 (61 commits)
Commits
4ee8b21
Added MNIST-MLP-Module-API models to check model save and load_checkp…
piyushghai Jul 6, 2018
118850f
Added LENET with Conv2D operator training file
piyushghai Jul 6, 2018
27863fd
Added LENET with Conv2d operator inference file
piyushghai Jul 6, 2018
b3e9774
Added LanguageModelling with RNN training file
piyushghai Jul 7, 2018
c141701
Added LanguageModelling with RNN inference file
piyushghai Jul 7, 2018
35cbefb
Added hybridized LENET Gluon Model training file
piyushghai Jul 9, 2018
418f805
Added hybridized LENET gluon model inference file
piyushghai Jul 9, 2018
600efaf
Added license headers
piyushghai Jul 9, 2018
d73b9e2
Refactored the model and inference files and extracted out duplicate …
piyushghai Jul 9, 2018
3eeba08
Added runtime function for executing the MBCC files
piyushghai Jul 10, 2018
9c0157c
Added JenkinsFile for MBCC to be run as a nightly job
piyushghai Jul 10, 2018
3d43bcd
Added boto3 install for s3 uploads
piyushghai Jul 10, 2018
4b70e4a
Added README for MBCC
piyushghai Jul 10, 2018
08ad342
Added license header
piyushghai Jul 10, 2018
5d1c3fc
Added more common functions from lm_rnn_gluon_train and inference fil…
piyushghai Jul 10, 2018
cfe8dfc
Added scripts for training models on older versions of MXNet
piyushghai Jul 11, 2018
7c41488
Added check for preventing inference script from crashing in case no …
piyushghai Jul 11, 2018
50be5d8
Fixed indentation issue
piyushghai Jul 11, 2018
c3c9129
Replaced Penn Tree Bank Dataset with Sherlock Holmes Dataset
piyushghai Jul 11, 2018
3485352
Fixed indentation issue
piyushghai Jul 11, 2018
af9b86d
Removed training in models and added smaller models. Now we are simpl…
piyushghai Jul 12, 2018
79cfa46
Updated README
piyushghai Jul 12, 2018
4df779b
Fixed indentation error
piyushghai Jul 12, 2018
04465b0
Fixed indentation error
piyushghai Jul 12, 2018
2d5cf09
Removed code duplication in the training file
piyushghai Jul 13, 2018
7bfdf87
Added comments for runtime_functions script for training files
piyushghai Jul 16, 2018
c80ee31
Merged S3 Buckets for storing data and models into one
piyushghai Jul 16, 2018
e764d5a
Automated the process to fetch MXNet versions from git tags
piyushghai Jul 16, 2018
05ded05
Added defensive checks for the case where the data might not be found
piyushghai Jul 16, 2018
60c7be0
Fixed issue where we were performing inference on state model files
piyushghai Jul 16, 2018
9d4d099
Replaced print statements with logging ones
piyushghai Jul 18, 2018
d08ba5a
Merge branch 'master' into mbcc
piyushghai Jul 25, 2018
cebfb26
Removed boto install statements and move them into ubuntu_python docker
piyushghai Jul 25, 2018
f7a36eb
Separated training and uploading of models into separate files so tha…
piyushghai Jul 25, 2018
1f63941
Updated comments and README
piyushghai Jul 26, 2018
fbaf3e0
Fixed pylint warnings
piyushghai Jul 26, 2018
edd6816
Removed the venv for training process
piyushghai Jul 26, 2018
87103d4
Fixed indentation in the MBCC Jenkins file and also separated out tra…
piyushghai Jul 26, 2018
eb24e8e
Fixed indentation
piyushghai Jul 26, 2018
3525656
Fixed erroneous single quote
piyushghai Jul 26, 2018
25e7ec7
Added --user flag to check for Jenkins error
piyushghai Jul 26, 2018
00ee6e7
Removed unused methods
piyushghai Jul 26, 2018
a3a72b8
Added force flag in the pip command to install mxnet
piyushghai Jul 26, 2018
86e8882
Removed the force-re-install flag
piyushghai Jul 26, 2018
ddb672a
Changed exit 1 to exit 0
piyushghai Jul 26, 2018
9e77064
Added quotes around the shell command
piyushghai Jul 26, 2018
69843fb
added packlibs and unpack libs for MXNet builds
piyushghai Jul 26, 2018
fae44fe
Changed PythonPath from relative to absolute
piyushghai Jul 27, 2018
c099979
Created dedicated bucket with correct permission
marcoabreu Jul 30, 2018
ffcc637
Fix for python path in training
piyushghai Jul 30, 2018
7f7f6e3
Merge branch 'mbcc' of https://github.com/piyushghai/incubator-mxnet …
piyushghai Jul 30, 2018
33096c0
Changed bucket name to CI bucket
piyushghai Jul 30, 2018
8a085b5
Added set -ex to the upload shell script
piyushghai Jul 30, 2018
5207ab1
Now raising an exception if no models are found in the S3 bucket
piyushghai Jul 30, 2018
5e30f7a
Added regex to train models script
piyushghai Jul 30, 2018
e079d3c
Added check for performing inference only on models trained on same m…
piyushghai Jul 30, 2018
ceac705
Added set -ex flags to shell scripts
piyushghai Jul 30, 2018
16d320a
Added multi-version regex checks in training
piyushghai Jul 30, 2018
19495d6
Fixed typo in regex
piyushghai Jul 30, 2018
d8fa75d
Now we will train models for all the minor versions for a given major…
piyushghai Jul 30, 2018
ca01aa2
Added check for validating current_version
piyushghai Jul 30, 2018
13 changes: 13 additions & 0 deletions ci/docker/runtime_functions.sh
@@ -862,6 +862,19 @@ nightly_model_backwards_compat_test() {
./tests/nightly/model_backwards_compatibility_check/model_backward_compat_checker.sh
}

# Backfills the S3 bucket with models trained on earlier versions of MXNet
nightly_model_backwards_compat_train() {
set -ex
export PYTHONPATH=./python/
VENV=mbcc_py2_venv
virtualenv -p `which python2` $VENV
Contributor:

If you run inside a virtualenv, you won't have access to any globally installed packages. I'd recommend not using virtualenv here, because you're running inside Docker anyway. This is also the reason for the CI error stating that it could not find boto3.

Contributor Author:

Ah I see. Fixed in edd6816

source $VENV/bin/activate
pip install boto3
./tests/nightly/model_backwards_compatibility_check/train_mxnet_legacy_models.sh
# Deactivate the virtual env once we are done with it
deactivate
}

# Deploy

deploy_docs() {
3 changes: 2 additions & 1 deletion tests/nightly/model_backwards_compatibility_check/README.md
@@ -14,6 +14,7 @@ This is configuration file for jenkins job.
- These APIs are covered over models with architectures such as : MLP, RNNs, LeNet covering the four scenarios described above.
- More operators/models will be added in the future to extend the operator coverage.
- The model train files suffixed by `_train.py` and the trained models are hosted in AWS S3.
-- The trained models for now are backfilled into S3 starting from every MXNet release version v1.0.0
+- The trained models for now are backfilled into S3 starting from every MXNet release version v1.1.0.
- The script for training the models on older versions of MXNet is : `train_mxnet_legacy_models.sh`.
- The inference files are suffixed by `_inference.py`.

174 changes: 173 additions & 1 deletion tests/nightly/model_backwards_compatibility_check/common.py
@@ -29,7 +29,10 @@
from mxnet.gluon.data.vision import transforms, datasets
from mxnet import autograd as ag
import mxnet.ndarray as F
from mxnet.gluon import nn
from mxnet.gluon import nn, rnn
import re
import time
import sys

# Set fixed random seeds.
mx.random.seed(7)
@@ -111,6 +114,15 @@ def save_inference_results(inference_results_file, inference_results):
with open(inference_results_file, 'w') as file:
json.dump(inference_results, file)


def compare_versions(version1, version2):
'''
Compare two dotted version strings; note this relies on the builtin cmp(), which exists only on Python 2.
https://stackoverflow.com/questions/1714027/version-number-comparison-in-python
'''
def normalize(v):
return [int(x) for x in re.sub(r'(\.0+)*$','', v).split(".")]
return cmp(normalize(version1), normalize(version2))
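The helper above depends on the builtin `cmp()`, which was removed in Python 3 (the PR runs it under a Python 2 venv). A minimal Python 3 port of the same idea — the name `compare_versions_py3` is ours, not from the PR — could look like:

```python
import re

def normalize(v):
    # Strip trailing ".0" components, then split into integer fields,
    # so that "1.2" and "1.2.0" normalize to the same list.
    return [int(x) for x in re.sub(r'(\.0+)*$', '', v).split('.')]

def compare_versions_py3(version1, version2):
    # (a > b) - (a < b) reproduces cmp()'s -1 / 0 / 1 contract.
    a, b = normalize(version1), normalize(version2)
    return (a > b) - (a < b)
```

Because the fields are compared as integers, `1.10.0` correctly sorts after `1.2.0`, which a plain string comparison would get wrong.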

def get_val_test_iter():
data = prepare_mnist_data(mx.test_utils.get_mnist())
val = data['val']
@@ -165,3 +177,163 @@ def forward(self, x):
x = F.tanh(self.fc1(x))
x = F.tanh(self.fc2(x))
return x

class Dictionary(object):
def __init__(self):
self.word2idx = {}
self.idx2word = []

def add_word(self, word):
if word not in self.word2idx:
self.idx2word.append(word)
self.word2idx[word] = len(self.idx2word) - 1
return self.word2idx[word]

def __len__(self):
return len(self.idx2word)

class Corpus(object):
def __init__(self, path):
self.dictionary = Dictionary()
self.download_data_from_s3()
self.train = self.tokenize(path + 'train.txt')
self.valid = self.tokenize(path + 'valid.txt')
self.test = self.tokenize(path + 'test.txt')

def download_data_from_s3(self):
print('Downloading files from bucket : ptb-small-dataset')
bucket = s3.Bucket('ptb-small-dataset')
files = ['test.txt', 'train.txt', 'valid.txt']
for file in files:
if os.path.exists(args_data + file):
print('File %s' % (args_data + file), 'already exists. Skipping download')
continue
file_path = args_data + file
bucket.download_file(file_path, args_data + file)

def tokenize(self, path):
"""Tokenizes a text file."""
assert os.path.exists(path)
# Add words to the dictionary
with open(path, 'r') as f:
tokens = 0
for line in f:
words = line.split() + ['<eos>']
tokens += len(words)
for word in words:
self.dictionary.add_word(word)

# Tokenize file content
with open(path, 'r') as f:
ids = np.zeros((tokens,), dtype='int32')
token = 0
for line in f:
words = line.split() + ['<eos>']
for word in words:
ids[token] = self.dictionary.word2idx[word]
token += 1

return mx.nd.array(ids, dtype='int32')
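`Corpus.tokenize` makes two passes over the file: one to grow the vocabulary, one to map every word to its index. A self-contained sketch of that two-pass scheme, using illustrative names (`MiniDictionary`, `tokenize_lines` — not from the PR) and plain Python lists in place of MXNet arrays:

```python
class MiniDictionary:
    """Minimal word <-> index mapping, mirroring the Dictionary class above."""
    def __init__(self):
        self.word2idx = {}
        self.idx2word = []

    def add_word(self, word):
        if word not in self.word2idx:
            self.idx2word.append(word)
            self.word2idx[word] = len(self.idx2word) - 1
        return self.word2idx[word]

def tokenize_lines(lines, dictionary):
    # Pass 1: build the vocabulary, appending '<eos>' to every line.
    for line in lines:
        for word in line.split() + ['<eos>']:
            dictionary.add_word(word)
    # Pass 2: map each word to its integer id.
    ids = []
    for line in lines:
        for word in line.split() + ['<eos>']:
            ids.append(dictionary.word2idx[word])
    return ids

d = MiniDictionary()
ids = tokenize_lines(['the cat sat', 'the cat ran'], d)
```

Indices are assigned in first-seen order, so repeated words ("the", "cat") reuse their ids and the `<eos>` marker terminates each line's id sequence.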



#### Common utilities for lm_rnn_gluon_train & inference files
args_data = 'ptb.'
args_model = 'rnn_relu'
args_emsize = 100
args_nhid = 100
args_nlayers = 2
args_lr = 1.0
args_clip = 0.2
args_epochs = 2
args_batch_size = 32
args_bptt = 5
args_dropout = 0.2
args_tied = True
args_cuda = 'store_true'
args_log_interval = 500

class RNNModel(gluon.Block):
"""A model with an encoder, recurrent layer, and a decoder."""

def __init__(self, mode, vocab_size, num_embed, num_hidden,
num_layers, dropout=0.5, tie_weights=False, **kwargs):
super(RNNModel, self).__init__(**kwargs)
with self.name_scope():
self.drop = nn.Dropout(dropout)
self.encoder = nn.Embedding(vocab_size, num_embed,
weight_initializer = mx.init.Uniform(0.1))
if mode == 'rnn_relu':
self.rnn = rnn.RNN(num_hidden, num_layers, activation='relu', dropout=dropout,
input_size=num_embed)
elif mode == 'rnn_tanh':
self.rnn = rnn.RNN(num_hidden, num_layers, dropout=dropout,
input_size=num_embed)
elif mode == 'lstm':
self.rnn = rnn.LSTM(num_hidden, num_layers, dropout=dropout,
input_size=num_embed)
elif mode == 'gru':
self.rnn = rnn.GRU(num_hidden, num_layers, dropout=dropout,
input_size=num_embed)
else:
raise ValueError("Invalid mode %s. Options are rnn_relu, "
"rnn_tanh, lstm, and gru"%mode)
if tie_weights:
self.decoder = nn.Dense(vocab_size, in_units = num_hidden,
params = self.encoder.params)
else:
self.decoder = nn.Dense(vocab_size, in_units = num_hidden)
self.num_hidden = num_hidden

def forward(self, inputs, hidden):
emb = self.drop(self.encoder(inputs))
output, hidden = self.rnn(emb, hidden)
output = self.drop(output)
decoded = self.decoder(output.reshape((-1, self.num_hidden)))
return decoded, hidden

def begin_state(self, *args, **kwargs):
return self.rnn.begin_state(*args, **kwargs)

def batchify(data, batch_size):
"""Reshape data into (num_example, batch_size)"""
nbatch = data.shape[0] // batch_size
data = data[:nbatch * batch_size]
data = data.reshape((batch_size, nbatch)).T
return data

def get_batch(source, i):
seq_len = min(args_bptt, source.shape[0] - 1 - i)
data = source[i : i + seq_len]
target = source[i + 1 : i + 1 + seq_len]
return data, target.reshape((-1,))
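`batchify` and `get_batch` implement standard truncated-BPTT batching: the token stream is cut into `batch_size` parallel columns, and each step reads a window of up to `args_bptt` rows as input plus the same window shifted by one token as the target. A NumPy sketch of the same reshaping (the `_np` names are ours, added so this stands alone without MXNet):

```python
import numpy as np

def batchify_np(data, batch_size):
    # Trim the tail so the stream divides evenly, then reshape to
    # (num_steps, batch_size): each column is one independent stream.
    nbatch = data.shape[0] // batch_size
    data = data[:nbatch * batch_size]
    return data.reshape((batch_size, nbatch)).T

def get_batch_np(source, i, bptt):
    # Input is source[i : i+seq_len]; the target is the same window
    # shifted forward by one token and flattened.
    seq_len = min(bptt, source.shape[0] - 1 - i)
    return source[i:i + seq_len], source[i + 1:i + 1 + seq_len].reshape(-1)

stream = np.arange(10)
batched = batchify_np(stream, 2)            # columns: 0..4 and 5..9
data, target = get_batch_np(batched, 0, 3)
```

Note that `target` interleaves the two columns after flattening, which is why the loss in `eval` is computed against a flattened decoder output of shape `(-1, vocab_size)`.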

def detach(hidden):
if isinstance(hidden, (tuple, list)):
hidden = [i.detach() for i in hidden]
else:
hidden = hidden.detach()
return hidden

def eval(data_source, model):
total_L = 0.0
ntotal = 0
loss = gluon.loss.SoftmaxCrossEntropyLoss()
hidden = model.begin_state(func = mx.nd.zeros, batch_size = args_batch_size, ctx=mx.cpu(0))
for i in range(0, data_source.shape[0] - 1, args_bptt):
data, target = get_batch(data_source, i)
output, hidden = model(data, hidden)
L = loss(output, target)
total_L += mx.nd.sum(L).asscalar()
ntotal += L.size
return total_L / ntotal

def clean_ptb_data():
files = ['test.txt', 'train.txt', 'valid.txt']
for file in files:
if os.path.isfile(args_data + file):
os.remove(args_data + file)

# These cleanup calls remove any files left corrupted by an interrupted download
clean_mnist_data()
clean_ptb_data()