Skip to content
This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[TVM] Add TVM Support #1390

Merged
merged 72 commits into from
Oct 24, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
72 commits
Select commit Hold shift + click to select a range
c35bf54
Update test_models.py
sxjscience Oct 19, 2020
0f16b3c
Update lazy_imports.py
sxjscience Oct 19, 2020
c6cfa1c
Update test_models.py
sxjscience Oct 19, 2020
958971b
Update test_models.py
sxjscience Oct 19, 2020
3b1e095
Update unittests.yml
sxjscience Oct 19, 2020
1e3b6c3
Update unittests.yml
sxjscience Oct 19, 2020
0928ee3
Update unittests.yml
sxjscience Oct 19, 2020
74b0cee
Update test_models.py
sxjscience Oct 19, 2020
d76ddaa
Update test_models.py
sxjscience Oct 19, 2020
cafbc45
Update test_models.py
sxjscience Oct 19, 2020
3ebaf9f
Update test_models.py
sxjscience Oct 19, 2020
f4d4bd4
use the new API
sxjscience Oct 19, 2020
b2ee011
update dockerfile
sxjscience Oct 19, 2020
c10957b
update
sxjscience Oct 19, 2020
4f7285e
move recommended flags to misc
sxjscience Oct 19, 2020
b418a83
Update ubuntu18.04-gpu.Dockerfile
sxjscience Oct 19, 2020
096b6b8
Update ubuntu18.04-gpu.Dockerfile
sxjscience Oct 19, 2020
a64d234
Update ubuntu18.04-gpu.Dockerfile
sxjscience Oct 19, 2020
e5305e8
Update ubuntu18.04-gpu.Dockerfile
sxjscience Oct 19, 2020
23fdc42
Update ubuntu18.04-gpu.Dockerfile
sxjscience Oct 19, 2020
28c7032
Update ubuntu18.04-gpu.Dockerfile
sxjscience Oct 19, 2020
44833e8
Update install_ubuntu18.04_core.sh
sxjscience Oct 19, 2020
b495794
Update install_ubuntu18.04_core.sh
sxjscience Oct 19, 2020
70e6090
Update install_ubuntu18.04_core.sh
sxjscience Oct 19, 2020
98fd846
Update install_ubuntu18.04_core.sh
sxjscience Oct 19, 2020
43b26c6
Update install_ubuntu18.04_core.sh
sxjscience Oct 19, 2020
03b7ca7
update
sxjscience Oct 19, 2020
9c6ee2d
Update ubuntu18.04-gpu.Dockerfile
sxjscience Oct 19, 2020
288b208
Update install_tvm_cpu.sh
sxjscience Oct 19, 2020
2f51234
use C4
sxjscience Oct 19, 2020
c44f6e7
try to fix the doc
sxjscience Oct 19, 2020
202d87c
fix
sxjscience Oct 20, 2020
eb7a362
try to fix doc
sxjscience Oct 20, 2020
8a0d561
Update test_models.py
sxjscience Oct 20, 2020
4109cd9
Update test_models.py
sxjscience Oct 20, 2020
ee1fabf
Merge remote-tracking branch 'upstream/master' into add_tvm
sxjscience Oct 20, 2020
1a45b59
Update test_models.py
sxjscience Oct 20, 2020
76d4851
Update test_models.py
sxjscience Oct 20, 2020
683a032
Update test_models.py
sxjscience Oct 20, 2020
ade7089
update
sxjscience Oct 20, 2020
d0cfc64
try to add tvm to profiler
sxjscience Oct 20, 2020
668fee3
add tvm for inference benchmarking
sxjscience Oct 20, 2020
8ace01f
Update README.md
sxjscience Oct 20, 2020
0da3696
Update README.md
sxjscience Oct 20, 2020
4f6d7f2
Update benchmark_gluonnlp.py
sxjscience Oct 20, 2020
0cdb33d
Update benchmark_utils.py
sxjscience Oct 20, 2020
0ddb1a8
Update misc.py
sxjscience Oct 20, 2020
a870066
Update README.md
sxjscience Oct 20, 2020
30b9a7b
Update install_tvm_gpu.sh
sxjscience Oct 21, 2020
aa4674b
Merge remote-tracking branch 'upstream/master' into add_tvm
sxjscience Oct 22, 2020
899f1b1
update
sxjscience Oct 22, 2020
6c61030
Update install_tvm_gpu.sh
sxjscience Oct 22, 2020
60452b2
Update install_tvm_gpu.sh
sxjscience Oct 22, 2020
bb55741
Update ubuntu18.04-gpu.Dockerfile
sxjscience Oct 22, 2020
f43347a
update
sxjscience Oct 22, 2020
4312864
Update gluon_nlp_job.sh
sxjscience Oct 23, 2020
8908137
Update gluon_nlp_job.sh
sxjscience Oct 23, 2020
d294a6f
update
sxjscience Oct 23, 2020
38502dd
Create run_batch_backbone_profiling.sh
sxjscience Oct 23, 2020
cc0122c
fix to 20201016
sxjscience Oct 23, 2020
2573b20
Delete run_batch_backbone_profiling.sh
sxjscience Oct 23, 2020
71f6b8e
remove
sxjscience Oct 23, 2020
73742fb
Update test_models.py
sxjscience Oct 24, 2020
dd1e499
Merge remote-tracking branch 'upstream/master' into add_tvm
sxjscience Oct 24, 2020
0c01e46
remove all hybridize
sxjscience Oct 24, 2020
4947342
Update test_models_roberta.py
sxjscience Oct 24, 2020
1bedbfa
try to disable the check of roberta base
sxjscience Oct 24, 2020
b8d2518
test hybridize
sxjscience Oct 24, 2020
30b561c
Update test_models.py
sxjscience Oct 24, 2020
baa83d0
Update test_models.py
sxjscience Oct 24, 2020
a99cda6
Update test_models.py
sxjscience Oct 24, 2020
1e1b05d
Update test_models.py
sxjscience Oct 24, 2020
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 30 additions & 2 deletions .github/workflows/unittests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -26,11 +26,24 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v2

# Install OS specific dependencies
- name: Compilation cache
uses: actions/cache@v2
with:
path: ~/.ccache
# We include the commit sha in the cache key, as new cache entries are
# only created if there is no existing entry for the key yet.
key: ${{ runner.os }}-ccache-${{ github.sha }}
# Restore any ccache cache entry, if none for
# ${{ runner.os }}-ccache-${{ github.sha }} exists
restore-keys: |
${{ runner.os }}-ccache
szha marked this conversation as resolved.
Show resolved Hide resolved

# Install Linux specific dependencies
- name: Install Linux dependencies
if: matrix.os == 'ubuntu-latest'
# TODO https://github.com/apache/incubator-mxnet/issues/18293
run: sudo apt-get install libopenblas-dev
run: |
sudo apt-get install -y libopenblas-dev ninja-build libedit-dev libxml2-dev

- name: Setup python
uses: actions/setup-python@v2
Expand All @@ -44,6 +57,21 @@ jobs:
python -m pip install --upgrade cython
python -m pip install --pre "mxnet>=2.0.0b20200802" -f https://dist.mxnet.io/python
python -m pip install -U -e .[extras]
- name: Build and Install TVM
if: matrix.os == 'ubuntu-latest'
run: |
git clone https://github.com/apache/incubator-tvm tvm --recursive
cd tvm
mkdir -p build
cp cmake/config.cmake build
echo set\(USE_LLVM ON\) >> build/config.cmake
echo set\(USE_GRAPH_RUNTIME ON\) >> build/config.cmake
echo set\(USE_BLAS openblas\) >> build/config.cmake
szha marked this conversation as resolved.
Show resolved Hide resolved
cd build
cmake .. -G Ninja
ninja
cd ../python
python -m pip install -U -e .
- name: Run Unittests
run: |
python -m pytest --cov=./ --cov-report=xml --device="cpu" --durations=50 tests/
Expand Down
12 changes: 3 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,16 +16,10 @@ process the text data, and train models.

# Features

For NLP Practitioners
- Easy-to-use Text Processing Tools
- Automatically Train Models via AutoNLP (TODO)

For Researchers
- Easy-to-use Text Processing Tools and APIs
- Pretrained Model Zoo
- Write Models with Numpy-like API

For Engineers
- Fast Inference via [TVM](https://tvm.apache.org/) (TODO)
- Fast Inference via [TVM](https://tvm.apache.org/)
- AWS Integration via [SageMaker](https://aws.amazon.com/sagemaker/)


Expand Down Expand Up @@ -98,7 +92,7 @@ docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size

# CPU Instance
docker pull gluonai/gluon-nlp:cpu-latest
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=2g gluonai/gluon-nlp:cpu-latest
docker run --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=2g gluonai/gluon-nlp:cpu-latest
```

For more details, you can refer to the guidance in [tools/docker](tools/docker).
20 changes: 17 additions & 3 deletions scripts/benchmarks/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,21 @@ It will generate csv files with `gluonnlp_` as the prefix
├── gluonnlp_train_fp32_NT_NT.csv
├── gluonnlp_train_fp32_NT_TN.csv
├── gluonnlp_train_fp32_TN_TN.csv
├── gluonnlp_infer_fp32_NT_NT.csv
├── gluonnlp_infer_fp32_NT_TN.csv
├── gluonnlp_infer_fp32_TN_TN.csv
├── gluonnlp_infer_fp32_NT_NT_tvm0.csv
├── gluonnlp_infer_fp32_NT_TN_tvm0.csv
├── gluonnlp_infer_fp32_TN_TN_tvm0.csv
```

## GluonNLP + TVM for Inference

Install TVM as described in https://tvm.apache.org/docs/install/index.html

```bash
bash benchmark_gluonnlp_tvm.sh
```

```
├── gluonnlp_infer_fp32_NT_NT_tvm1.csv
├── gluonnlp_infer_fp32_NT_TN_tvm1.csv
├── gluonnlp_infer_fp32_TN_TN_tvm1.csv
```
18 changes: 14 additions & 4 deletions scripts/benchmarks/benchmark_gluonnlp.py
Original file line number Diff line number Diff line change
Expand Up @@ -54,12 +54,17 @@ def get_parser():
help='The layout of the computation')
parser.add_argument('--compute_layout', type=str, default=None,
help='The compute layout of the computation')
parser.add_argument('--use_tvm', action='store_true',
help='Whether to use TVM for inference/training')
parser.add_argument('--instance_type', choices=['c4', 'c5', 'g4', 'p3'], default='g4',
help='The instance type that the profiling script will be run on.')
parser.add_argument('--mode', type=str, default='train',
choices=['train', 'inference'])
return parser


def run_benchmark(workload, model_name, out_file_name, is_train):
def run_benchmark(workload, model_name, out_file_name, is_train,
use_tvm, instance_type):
if is_train:
benchmark = GluonNLPBackboneBenchmark(
workloads=workload,
Expand All @@ -75,6 +80,8 @@ def run_benchmark(workload, model_name, out_file_name, is_train):
model_names=model_name,
profile_inference=True,
profile_train=False,
use_tvm=use_tvm,
instance_type=instance_type,
to_csv=True,
inference_out_csv_file=out_file_name)
benchmark.run()
Expand All @@ -93,7 +100,7 @@ def run_benchmark(workload, model_name, out_file_name, is_train):
else:
profile_models = [ele for ele in MODELS]
if args.mode == 'inference':
out_dir = 'infer_fp32_{}_{}'.format(layout, compute_layout)
out_dir = 'infer_fp32_{}_{}_tvm{}'.format(layout, compute_layout, int(args.use_tvm))
df = pd.DataFrame(columns=['model', 'batch_size', 'sequence_length',
'latency', 'memory'])
os.makedirs(out_dir, exist_ok=True)
Expand All @@ -103,12 +110,15 @@ def run_benchmark(workload, model_name, out_file_name, is_train):
workload[1]))
process = Process(
target=run_benchmark,
args=(workload, model_name, out_path, False))
args=(workload, model_name, out_path, False,
args.use_tvm, args.instance_type))
process.start()
process.join()
new_df = pd.read_csv(out_path)
df = df.append(new_df, ignore_index=True)
df.to_csv('gluonnlp_infer_fp32_{}_{}.csv'.format(layout, compute_layout))
df.to_csv('gluonnlp_infer_fp32_{}_{}_tvm{}.csv'.format(layout,
compute_layout,
int(args.use_tvm)))
elif args.mode == 'train':
out_dir = 'train_fp32_{}_{}'.format(layout, compute_layout)
df = pd.DataFrame(columns=['model', 'batch_size', 'sequence_length',
Expand Down
3 changes: 3 additions & 0 deletions scripts/benchmarks/benchmark_gluonnlp_tvm.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
python3 benchmark_gluonnlp.py --layout NT --compute_layout NT --mode inference --use_tvm --instance_type g4
python3 benchmark_gluonnlp.py --layout NT --compute_layout TN --mode inference --use_tvm --instance_type g4
python3 benchmark_gluonnlp.py --layout TN --compute_layout TN --mode inference --use_tvm --instance_type g4
153 changes: 146 additions & 7 deletions scripts/benchmarks/benchmark_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,8 @@
import numpy as np
import gluonnlp
from gluonnlp.models import get_backbone
from gluonnlp.utils.misc import logging_config
from gluonnlp.utils.misc import logging_config, get_ec2_tvm_flags
from gluonnlp.utils.lazy_imports import try_import_tvm
from collections import defaultdict, namedtuple
from datetime import datetime
import multiprocessing as mp
Expand Down Expand Up @@ -603,22 +604,127 @@ def bytes_to_mega_bytes(memory_amount: int) -> int:
return memory_amount >> 20


_TVM_RT_CACHE = dict()


def compile_tvm_graph_runtime(model, model_name, layout, compute_layout,
batch_size, seq_length, dtype, instance_type):
key = (model_name, layout, compute_layout, batch_size, seq_length, dtype, instance_type)
if key in _TVM_RT_CACHE:
return _TVM_RT_CACHE[key]
flags = get_ec2_tvm_flags()[instance_type]
tvm = try_import_tvm()
from tvm import relay
from tvm.contrib import graph_runtime
token_ids_shape = (batch_size, seq_length) if layout == 'NT' else (seq_length, batch_size)
valid_length_shape = (batch_size,)
if 'bart' in model_name:
shape_dict = {
'data0': token_ids_shape,
'data1': valid_length_shape,
'data2': token_ids_shape,
'data3': valid_length_shape,
}
dtype_dict = {
'data0': 'int32',
'data1': 'int32',
'data2': 'int32',
'data3': 'int32',
}
elif 'roberta' in model_name or 'xlmr' in model_name:
shape_dict = {
'data0': token_ids_shape,
'data1': valid_length_shape,
}
dtype_dict = {
'data0': 'int32',
'data1': 'int32',
}
else:
shape_dict = {
'data0': token_ids_shape,
'data1': token_ids_shape,
'data2': valid_length_shape,
}
dtype_dict = {
'data0': 'int32',
'data1': 'int32',
'data2': 'int32'
}
sym = model._cached_graph[1]
params = {}
for k, v in model.collect_params().items():
params[v._var_name] = tvm.nd.array(v.data().asnumpy())
mod, params = relay.frontend.from_mxnet(sym, shape=shape_dict, dtype=dtype_dict, arg_params=params)
target = flags['target']
use_gpu = flags['use_gpu']
opt_level = flags['opt_level']
required_pass = flags['required_pass']
with tvm.transform.PassContext(opt_level=opt_level, required_pass=required_pass):
lib = relay.build(mod, target, params=params)
if use_gpu:
ctx = tvm.gpu()
else:
ctx = tvm.cpu()
rt = graph_runtime.GraphModule(lib["default"](ctx))
_TVM_RT_CACHE[key] = rt
return rt


class GluonNLPBackboneBenchmark:
"""
Benchmarks is a simple but feature-complete benchmarking script
"""Benchmarks is a simple but feature-complete benchmarking script
to compare memory and time performance of models in Transformers.
"""
def __init__(self, workloads, model_names, use_fp16=False,
repeat=3, use_gpu=True, device_idx=0,
repeat=3, use_gpu=True,
device_idx=0,
profile_inference=True,
profile_train=True,
env_print=True,
to_csv=False,
use_tvm=False,
instance_type=None,
layout='NT',
compute_layout='auto',
inference_out_csv_file='inference_time_memory.csv',
train_out_csv_file='train_time_memory.csv',
env_info_file='env_info.csv'):
"""

Parameters
----------
workloads
List of workloads to profile
model_names
List of model names to profile
use_fp16
Whether to use fp16
repeat
The number of repeat
use_gpu
Whether to use GPU
device_idx
The GPU ID
profile_inference
Whether to profile inference
profile_train
Whether to profile training
env_print
Whether to print the environment
to_csv
Whether to dump to csv file
use_tvm
Whether to use TVM to accelerate the
instance_type
Type of the instance. This will only be used to set the
layout
The input + output layout
compute_layout
The computation layout
inference_out_csv_file
train_out_csv_file
env_info_file
"""
self._workloads = workloads
if not isinstance(workloads, list):
workloads = [workloads]
Expand All @@ -635,6 +741,8 @@ def __init__(self, workloads, model_names, use_fp16=False,
self._profile_train = profile_train
self._env_print = env_print
self._to_csv = to_csv
self._use_tvm = use_tvm
self._instance_type = instance_type
self._layout = layout
self._compute_layout = compute_layout
self._inference_out_csv_file = inference_out_csv_file
Expand Down Expand Up @@ -699,9 +807,40 @@ def run_forward():
else:
out.wait_to_read()

timeit.repeat(run_forward, repeat=1, number=3)
runtimes = timeit.repeat(run_forward, repeat=self._repeat, number=3)
mxnet.npx.waitall()
if self._use_tvm:
tvm = try_import_tvm()
run_forward()
if self._use_gpu:
ctx = tvm.gpu()
else:
ctx = tvm.cpu()
rt = compile_tvm_graph_runtime(model=model, model_name=model_name,
layout=self._layout, compute_layout=self._compute_layout,
batch_size=batch_size, seq_length=sequence_length,
instance_type=self._instance_type,
dtype='float32' if not self._use_fp16 else 'float16')
tvm_input_ids = tvm.nd.array(input_ids.asnumpy(), ctx=ctx)
tvm_token_types = tvm.nd.array(token_types.asnumpy(), ctx=ctx)
tvm_valid_length = tvm.nd.array(valid_length.asnumpy(), ctx=ctx)

def run_tvm_forward():
if 'roberta' in model_name or 'xlmr' in model_name:
rt.set_input(data0=tvm_input_ids, data1=tvm_valid_length)
elif 'bart' in model_name:
rt.set_input(data0=tvm_input_ids, data1=tvm_valid_length)
else:
rt.set_input(data0=tvm_input_ids, data1=tvm_token_types,
data2=tvm_valid_length)
rt.run()
for i in range(rt.get_num_outputs()):
out = rt.get_output(i)
# Warmup
timeit.repeat(run_tvm_forward, repeat=1, number=2)
runtimes = timeit.repeat(run_tvm_forward, repeat=self._repeat, number=3)
else:
timeit.repeat(run_forward, repeat=1, number=3)
runtimes = timeit.repeat(run_forward, repeat=self._repeat, number=3)
mxnet.npx.waitall()
# Profile memory
if self._use_gpu:
nvml.nvmlInit()
Expand Down
3 changes: 0 additions & 3 deletions src/gluonnlp/models/albert.py
Original file line number Diff line number Diff line change
Expand Up @@ -328,7 +328,6 @@ def __init__(self,
dtype=dtype,
layout=self._compute_layout
)
self.encoder.hybridize()
# Construct word embedding
self.word_embed = nn.Embedding(input_dim=vocab_size,
output_dim=embed_size,
Expand Down Expand Up @@ -578,7 +577,6 @@ def __init__(self, backbone_cfg,
flatten=False,
bias_initializer=bias_initializer))
self.mlm_decoder[-1].weight = self.backbone_model.word_embed.weight
self.mlm_decoder.hybridize()

@property
def layout(self):
Expand Down Expand Up @@ -674,7 +672,6 @@ def __init__(self, backbone_cfg,
flatten=False,
bias_initializer=bias_initializer))
self.mlm_decoder[-1].weight = self.backbone_model.word_embed.weight
self.mlm_decoder.hybridize()

@property
def layout(self):
Expand Down
Loading