Commit b891ded

fixup! fixup! fixup! Add TensorFlow examples - ResNet50 and BERT models

Signed-off-by: Satya <satyanaraya.illa@intel.com>
Satya1493 committed Jul 23, 2021
1 parent b8c1bb8 commit b891ded

Showing 3 changed files with 56 additions and 59 deletions.
12 changes: 5 additions & 7 deletions Examples/tensorflow/BERT/python.manifest.template
@@ -1,19 +1,17 @@
# This manifest was tested on Ubuntu 18.04 with python3.6.

libos.entrypoint = "{{ entrypoint }}"
loader.preload = "file:{{ graphene.libos }}"

# Graphene log level
loader.log_level = "{{ log_level }}"

# Read application arguments directly from the command line. Don't use this in production!
loader.insecure__use_cmdline_argv = 1
loader.insecure__use_cmdline_argv = true

# Propagate environment variables from the host. Don't use this in production!
loader.insecure__use_host_env = 1
loader.insecure__use_host_env = true

# Disable address space layout randomization. Don't use this in production!
loader.insecure__disable_aslr = 1
loader.insecure__disable_aslr = true

# Update Library Path - overwrites environment variable
loader.env.LD_LIBRARY_PATH = "{{ python.stdlib }}/lib:/lib:{{ arch_libdir }}:/usr/lib:/usr/{{ arch_libdir }}"
@@ -58,8 +56,8 @@ fs.mount.etc.uri = "file:/etc"
# SGX general options
sgx.enclave_size = "32G"
sgx.thread_num = 256
sgx.preheat_enclave = 1
sgx.nonpie_binary = 1
sgx.preheat_enclave = true
sgx.nonpie_binary = true

# SGX trusted files
sgx.trusted_files.runtime = "file:{{ graphene.runtimedir() }}/"
91 changes: 46 additions & 45 deletions Examples/tensorflow/README.md
@@ -1,7 +1,7 @@
## Inference on TensorFlow BERT and ResNet50 models
This directory contains steps and artifacts to run inference with TensorFlow BERT and ResNet50
sample workloads on Graphene. Specifically, both these examples use pre-trained models to run
inference. We tested this on Ubuntu 18.04 and uses the package version with Python 3.6.
inference.

### Bidirectional Encoder Representations from Transformers (BERT):
BERT is a method of pre-training language representations and then using that trained model for
@@ -15,15 +15,7 @@ ResNet50 is a convolutional neural network that is 50 layers deep.
In this ResNet50 (v1.5) sample, we use a pre-trained model and perform int8 inference.
More details about ResNet50 can be found at https://github.com/IntelAI/models/tree/icx-launch-public/benchmarks/image_recognition/tensorflow/resnet50v1_5.

## Pre-System setting
Linux systems have CPU frequency scaling governor that helps the system to scale the CPU frequency
to achieve best performance or to save power based on the requirement.
To achieve the best performance, please set the CPU frequency scaling governor to performance mode.

``for ((i=0; i<$(nproc); i++)); do echo 'performance' > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor; done``

## Pre-requisites
- Install python3.6.
- Upgrade pip/pip3.
- Install tensorflow using ``pip install intel-tensorflow-avx512==2.4.0`` or by downloading the whl
package from https://pypi.org/project/intel-tensorflow-avx512/2.4.0/#files.
@@ -36,10 +28,13 @@ package from https://pypi.org/project/intel-tensorflow-avx512/2.4.0/#files.
- To build the SGX version, do ``make PYTHONDISTPATH=path_to_python_dist_packages/ SGX=1``
- Typically, path_to_python_dist_packages is '/usr/local/lib/python3.6/dist-packages', but this can
change based on Python's installation directory; see the example build command below.
>**WARNING:** Building BERT sample downloads about 5GB of data.
- Keras settings are configured in the file root/.keras/keras.json. It is set to use
TensorFlow as the backend.

**WARNING:** Building the BERT sample downloads about 5GB of data.
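For example, assuming the typical dist-packages path quoted above (an illustration; adjust it to your Python installation), the SGX build command would be:
```
make PYTHONDISTPATH=/usr/local/lib/python3.6/dist-packages/ SGX=1
```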

## Run inference on BERT model
- To run int8 inference on graphene-sgx(SGX version)
- To run int8 inference on graphene-sgx (SGX version)
```
OMP_NUM_THREADS=36 KMP_AFFINITY=granularity=fine,verbose,compact,1,0 taskset -c 0-35 graphene-sgx \
./python models/models/language_modeling/tensorflow/bert_large/inference/run_squad.py \
@@ -57,7 +52,7 @@ OMP_NUM_THREADS=36 KMP_AFFINITY=granularity=fine,verbose,compact,1,0 taskset -c
--inter_op_parallelism_threads=1 \
--intra_op_parallelism_threads=36
```
- To run int8 inference on graphene-direct(non-SGX version)
- To run int8 inference on graphene-direct (non-SGX version)
```
OMP_NUM_THREADS=36 KMP_AFFINITY=granularity=fine,verbose,compact,1,0 taskset -c 0-35 \
graphene-direct ./python models/models/language_modeling/tensorflow/bert_large/inference/run_squad.py \
@@ -76,7 +71,7 @@ graphene-direct ./python models/models/language_modeling/tensorflow/bert_large/i
--inter_op_parallelism_threads=1 \
--intra_op_parallelism_threads=36
```
- To run int8 inference on native baremetal(outside graphene)
- To run int8 inference on native baremetal (outside Graphene)
```
OMP_NUM_THREADS=36 KMP_AFFINITY=granularity=fine,verbose,compact,1,0 taskset -c 0-35 python3.6 \
models/models/language_modeling/tensorflow/bert_large/inference/run_squad.py \
@@ -95,18 +90,20 @@ models/models/language_modeling/tensorflow/bert_large/inference/run_squad.py \
--intra_op_parallelism_threads=36
```
- The above commands are for a 36-core system. Please set the following options accordingly for optimal
performance.
- OMP_NUM_THREADS='Core(s) per socket', OMP_NUM_THREADS sets the maximum number of threads to
use for OpenMP parallel regions.
- taskset to 'Core(s) per socket'
- intra_op_parallelism_threads='Core(s) per socket'
- If hyperthreading is enabled : use ``KMP_AFFINITY=granularity=fine,verbose,compact,1,0``
- If hyperthreading is disabled : use ``KMP_AFFINITY=granularity=fine,verbose,compact``
- KMP_AFFINITY binds OpenMP threads to physical processing units.
>**NOTE:** To get 'Core(s) per socket', do ``lscpu | grep 'Core(s) per socket'``
performance:
- Assuming that X is the number of cores per socket, set `OMP_NUM_THREADS=X`
and `intra_op_parallelism_threads=X`.
- Specify the whole range of cores available on one of the sockets in `taskset`.
- If hyperthreading is enabled: use ``KMP_AFFINITY=granularity=fine,verbose,compact,1,0``
- If hyperthreading is disabled: use ``KMP_AFFINITY=granularity=fine,verbose,compact``
- Note that `OMP_NUM_THREADS` sets the maximum number of threads to
use for OpenMP parallel regions, and `KMP_AFFINITY` binds OpenMP threads
to physical processing units.

**NOTE:** To get the number of cores per socket, run ``lscpu | grep 'Core(s) per socket'``; the sketch below automates this.
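As a minimal sketch (assuming bash, hyperthreading enabled, and that logical CPUs 0 through X-1 map onto the physical cores of one socket; verify the mapping with ``lscpu -e``), the tuning knobs can be derived automatically:
```
# Derive cores-per-socket and export the OpenMP tuning knobs.
X=$(lscpu | grep 'Core(s) per socket' | awk '{print $NF}')
export OMP_NUM_THREADS=$X
export KMP_AFFINITY=granularity=fine,verbose,compact,1,0  # hyperthreading enabled
echo "Pin with: taskset -c 0-$((X-1)) ... --intra_op_parallelism_threads=$X"
```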

## Run inference on ResNet50 model
- To run inference on graphene-sgx(SGX version)
- To run inference on graphene-sgx (SGX version)
```
OMP_NUM_THREADS=36 KMP_AFFINITY=granularity=fine,verbose,compact,1,0 taskset -c 0-35 graphene-sgx \
./python models/models/image_recognition/tensorflow/resnet50v1_5/inference/eval_image_classifier_inference.py \
@@ -117,7 +114,7 @@ OMP_NUM_THREADS=36 KMP_AFFINITY=granularity=fine,verbose,compact,1,0 taskset -c
--warmup-steps=50 \
--steps=500
```
- To run inference on graphene-direct(non-SGX version)
- To run inference on graphene-direct (non-SGX version)
```
OMP_NUM_THREADS=36 KMP_AFFINITY=granularity=fine,verbose,compact,1,0 taskset -c 0-35 graphene-direct \
./python models/models/image_recognition/tensorflow/resnet50v1_5/inference/eval_image_classifier_inference.py \
Expand All @@ -128,7 +125,7 @@ OMP_NUM_THREADS=36 KMP_AFFINITY=granularity=fine,verbose,compact,1,0 taskset -c
--warmup-steps=50 \
--steps=500
```
- To run inference on native baremetal(outside graphene)
- To run inference on native baremetal (outside Graphene)
```
OMP_NUM_THREADS=36 KMP_AFFINITY=granularity=fine,verbose,compact,1,0 taskset -c 0-35 python3.6 \
models/models/image_recognition/tensorflow/resnet50v1_5/inference/eval_image_classifier_inference.py \
@@ -140,36 +137,40 @@ models/models/image_recognition/tensorflow/resnet50v1_5/inference/eval_image_cla
--steps=500
```
- The above commands are for a 36-core system. Please set the following options accordingly for optimal
performance.
- OMP_NUM_THREADS='Core(s) per socket', OMP_NUM_THREADS sets the maximum number of threads to
use for OpenMP parallel regions.
- taskset to 'Core(s) per socket'
- num-intra-threads='Core(s) per socket'
- If hyperthreading is enabled : use ``KMP_AFFINITY=granularity=fine,verbose,compact,1,0``
- If hyperthreading is disabled : use ``KMP_AFFINITY=granularity=fine,verbose,compact``
- KMP_AFFINITY binds OpenMP threads to physical processing units.
performance:
- Assuming that X is the number of cores per socket, set `OMP_NUM_THREADS=X`
and `num-intra-threads=X`.
- Specify the whole range of cores available on one of the sockets in `taskset`.
- If hyperthreading is enabled: use ``KMP_AFFINITY=granularity=fine,verbose,compact,1,0``
- If hyperthreading is disabled: use ``KMP_AFFINITY=granularity=fine,verbose,compact``
- Note that `OMP_NUM_THREADS` sets the maximum number of threads to
use for OpenMP parallel regions, and `KMP_AFFINITY` binds OpenMP threads
to physical processing units.
- The options batch-size, warmup-steps, and steps can be varied.
>**NOTE:** To get 'Core(s) per socket', do ``lscpu | grep 'Core(s) per socket'``

**NOTE:** To get the number of cores per socket, run ``lscpu | grep 'Core(s) per socket'``; the sketch below shows how to list the CPUs on a given socket for `taskset`.
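As a minimal sketch (assuming bash; socket 0 is just an example), the following prints the logical CPUs on one socket as a comma-separated list suitable for ``taskset -c``:
```
# List the logical CPUs on socket 0, comma-separated, for use with taskset.
lscpu -e=CPU,SOCKET | awk 'NR > 1 && $2 == 0 {print $1}' | paste -sd, -
```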

## Performance considerations
- Linux systems have a CPU frequency scaling governor that scales the CPU frequency
to achieve the best performance or to save power, depending on the requirement.
To set the CPU frequency scaling governor to performance mode:

- ``for ((i=0; i<$(nproc); i++)); do echo 'performance' > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor; done``
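- To verify that the governor took effect (a quick check, assuming the standard cpufreq sysfs interface): ``cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c``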

- The preheat manifest option pre-faults the enclave memory and moves the performance penalty to
graphene-sgx invocation (before the workload starts execution).
To use preheat option, add ``sgx.preheat_enclave = 1`` to the manifest template.
To use the preheat option, add ``sgx.preheat_enclave = true`` to the manifest template.
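For reference, this is how the option sits among the SGX options in the bundled BERT manifest:
```
sgx.enclave_size = "32G"
sgx.thread_num = 256
sgx.preheat_enclave = true
```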
- TCMalloc and mimalloc are memory allocator libraries from Google and Microsoft that can
significantly improve performance, depending on the workload. Only one of these
allocators can be used at a time.
- TCMalloc (please update the binary location and name if they differ from the default)
- Install tcmalloc : ``sudo apt-get install google-perftools``
- Install tcmalloc: ``sudo apt-get install google-perftools``
- Add the following lines to the manifest template and rebuild the sample.
```
loader.env.LD_PRELOAD = "/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4"
sgx.trusted_files.libtcmalloc = "file:/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4"
sgx.trusted_files.libunwind = "file:/usr/lib/x86_64-linux-gnu/libunwind.so.8"
```
- ``loader.env.LD_PRELOAD = "/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4"``
- ``sgx.trusted_files.libtcmalloc = "file:/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4"``
- ``sgx.trusted_files.libunwind = "file:/usr/lib/x86_64-linux-gnu/libunwind.so.8"``
- mimalloc (please update the binary location and name if they differ from the default)
- Install mimalloc using the steps from https://github.com/microsoft/mimalloc
- Add the following lines to the manifest template and rebuild the sample.
```
loader.env.LD_PRELOAD = "/usr/local/lib/mimalloc-1.7/libmimalloc.so.1.7"
sgx.trusted_files.libmimalloc = "file:/usr/local/lib/mimalloc-1.7/libmimalloc.so.1.7"
```
- ``loader.env.LD_PRELOAD = "/usr/local/lib/mimalloc-1.7/libmimalloc.so.1.7"``
- ``sgx.trusted_files.libmimalloc = "file:/usr/local/lib/mimalloc-1.7/libmimalloc.so.1.7"``
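As a quick sanity check outside Graphene (a sketch, assuming bash and the default library paths above), confirm that the chosen allocator actually resolves before rebuilding; ld.so prints a "cannot be preloaded" warning if the path is wrong:
```
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 python3.6 -c 'print("ok")'
```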
12 changes: 5 additions & 7 deletions Examples/tensorflow/ResNet50/python.manifest.template
@@ -1,19 +1,17 @@
# This manifest was tested on Ubuntu 18.04 with python3.6.

libos.entrypoint = "{{ entrypoint }}"
loader.preload = "file:{{ graphene.libos }}"

# Graphene log level
loader.log_level = "{{ log_level }}"

# Read application arguments directly from the command line. Don't use this in production!
loader.insecure__use_cmdline_argv = 1
loader.insecure__use_cmdline_argv = true

# Propagate environment variables from the host. Don't use this in production!
loader.insecure__use_host_env = 1
loader.insecure__use_host_env = true

# Disable address space layout randomization. Don't use this in production!
loader.insecure__disable_aslr = 1
loader.insecure__disable_aslr = true

# Update Library Path - overwrites environment variable
loader.env.LD_LIBRARY_PATH = "{{ python.stdlib }}/lib:/lib:{{ arch_libdir }}:/usr/lib:/usr/{{ arch_libdir }}"
@@ -62,8 +60,8 @@ fs.mount.etc.uri = "file:/etc"
# SGX general options
sgx.enclave_size = "32G"
sgx.thread_num = 300
sgx.preheat_enclave = 1
sgx.nonpie_binary = 1
sgx.preheat_enclave = true
sgx.nonpie_binary = true

# SGX trusted files
sgx.trusted_files.runtime = "file:{{ graphene.runtimedir() }}/"
