
Batching parameters #344

Closed
sskgit opened this issue Mar 6, 2017 · 16 comments

Comments

@sskgit

sskgit commented Mar 6, 2017

What is the best way to use batching parameters like max_batch_size, batch_timeout_micros, num_batch_threads, and the other parameters? I tried passing them while running the query client.

In the example below I have 100 images and I want to batch them in groups of 10. The query runs over all images instead of 10 at a time.
bazel-bin/tensorflow_serving/example/demo_batch --server=localhost:9000 --max_batch_size = 10

Also, for batch scheduling, how can I make it run every 10 seconds after the first batch is done? I'd appreciate some ideas.

@chrisolston
Contributor

Check out batching/README.md.

@sskgit
Author

sskgit commented Mar 6, 2017

Thanks @chrisolston. I went through the README.md for BatchingSession and BasicBatchScheduler. Most of the parameters I need are in these two files.

However, based on the query client (inception_client.py), it doesn't look like it calls basic_batch_scheduler and/or BasicBatchScheduler. So the question is: how are these session and scheduling parameters passed to the query client? It looks like I am missing something...

@chrisolston
Contributor

With the ModelServer binary, the batching takes place in the server, not the client. I just checked, and unfortunately the batching parameters are currently hard-coded to the defaults. It would be easy to extend model_servers/main.cc to accept the parameters via a textual proto file (analogous to model_config_file) containing a BatchingParameters proto, if you're up for making a (simple) code contribution :)

As a work-around, you can hard-code some specific values into the BatchingParameters proto in main.cc.
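
For anyone trying this workaround, here is a rough, untested sketch of what the hard-coded values could look like, assuming the wrapped-value fields of BatchingParameters from session_bundle_config.proto (the same field names that appear later in this thread); SetHardCodedBatchingParameters is just a made-up helper name, and the exact integration point in main.cc is not shown:

#include "tensorflow_serving/servables/tensorflow/session_bundle_config.pb.h"

// Hypothetical helper (not from main.cc): fills a BatchingParameters proto
// with fixed values. Each field is a wrapped value, so it is set through
// mutable_*()->set_value().
void SetHardCodedBatchingParameters(
    tensorflow::serving::BatchingParameters* params) {
  params->mutable_max_batch_size()->set_value(16);
  params->mutable_batch_timeout_micros()->set_value(10000);
  params->mutable_max_enqueued_batches()->set_value(100);
  params->mutable_num_batch_threads()->set_value(4);
}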

@chrisolston
Contributor

On second thought, since it's only a few lines of code I will add the flag. I'm working on the change internally at Google, and if things go smoothly it will propagate to open-source in a week or so.

@sskgit
Author

sskgit commented Mar 7, 2017

@chrisolston Thanks for the clarification. Will try the workaround for now.

@abuvaneswari

As of now, tensorflow_model_server seems to support batching_parameters_file as an argument. Can someone point me to a template or specification for this file? I searched around and could not find anything.

@chrisolston
Contributor

As the flag documentation states, it's an ascii protobuf for the BatchingParameters proto (defined in session_bundle_config.proto). You can find information about the ascii protobuf format elsewhere. It basically looks like:
message {
  field1: value1
  field2: value2
}
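
As a purely illustrative sketch (arbitrary values, not recommendations), a batching parameters file using the BatchingParameters fields people use later in this thread could look like:

max_batch_size { value: 32 }
batch_timeout_micros { value: 1000 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 4 }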

@abuvaneswari

What are the default settings for the batching parameters when enable_batching=true? I looked at the following config file, played with the parameter settings, and ran the model server against each of the settings, but the results do not match what I get when I do not supply any file to batching_parameters_file (but still have enable_batching=true):
serving/tensorflow_serving/servables/tensorflow/testdata/batching_config.txt

@chrisolston
Contributor

@sreddybr3

sreddybr3 commented Nov 26, 2017

I am running TFS (a custom build for GPU) with the standard InceptionV3 model.

Amazon EC2 P2.xlarge instance:

export CUDA_HOME=/usr/local/cuda \
       TF_NEED_CUDA=1 \
       TF_CUDA_CLANG=0 \
       GCC_HOST_COMPILER_PATH=/usr/bin/gcc \
       TF_CUDA_VERSION=8.0 \
       CUDA_TOOLKIT_PATH=/usr/local/cuda \
       TF_CUDNN_VERSION=6 \
       CUDNN_INSTALL_PATH=/usr/local/cuda \
       CC_OPT_FLAGS="-c opt --copt=-mavx --copt=-msse4.2 --copt=-msse4.1 --copt=-msse3 --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --config=opt --config=cuda" \
       PYTHON_BIN_PATH="/usr/bin/python" \
       USE_DEFAULT_PYTHON_LIB_PATH=1 \
       TF_NEED_JEMALLOC=1 \
       TF_NEED_GCP=0 \
       TF_NEED_HDFS=0 \
       TF_ENABLE_XLA=0 \
       TF_NEED_OPENCL=0 \
       TF_NEED_MKL=0 \
       TF_NEED_MPI=0 \
       TF_NEED_VERBS=0
  • build tensorflow_model_server
bazel build -c opt --copt=-mavx --copt=-msse4.2 --copt=-msse4.1 --copt=-msse3 --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --config=opt --config=cuda \
    --crosstool_top=@local_config_cuda//crosstool:toolchain \
    tensorflow_serving/model_servers:tensorflow_model_server
  • run server using batching config file
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=inception --model_base_path=/inception-export --enable_batching --batching_parameters_file=batching.conf

batching.conf file contents

max_batch_size { value: 16 }
batch_timeout_micros { value: 100000 }
max_enqueued_batches { value: 1000000 }
num_batch_threads { value: 4 }

I have set num_batch_threads equal to the number of CPU cores, i.e. 4.
Varied batch_timeout_micros between 0, 1000, 10000, 100000, and 500000.
Varied max_batch_size between 8, 16, 32, 64, 128, and 256.

The TensorFlow Serving initialisation logs show that the GPU is visible and utilised.

nvidia-smi --query-gpu=timestamp,name,pci.bus_id,driver_version,pstate,pcie.link.gen.max,pcie.link.gen.current,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv -l 1

The command above shows how the GPU is being utilised while the performance script is running.

Test 1: Single process with 100 thread(s) per process and 1 request per thread
Results:
Without batching (average response time over multiple executions is 2.75 sec):

avg. resp. time (msec) | failure rate % | model
2898.545 | 0.00% | inception

With batching using the above-mentioned config (average response time over multiple executions is 1.1 sec):

avg. resp. time (msec) | failure rate % | model
1179.988 | 0.00% | inception

Test 2: Single process with 100 thread(s) per process and 10 requests per thread
Results:
Without batching (average response time over multiple executions is 4.5 sec):

avg. resp. time (msec) | failure rate % | model
4525.389 | 0.00% | inception

With batching using the above-mentioned config (average response time over multiple executions is 1.4 sec):

avg. resp. time (msec) | failure rate % | model
1377.638 | 0.00% | inception

The numbers look better with batching. We have a TF benchmark for training: https://www.tensorflow.org/performance/benchmarks
Do we have any benchmark for TensorFlow Serving using the InceptionV3 model?

@pharrellyhy

Hi @sreddybr3,
I'm trying to use batching to speed up inference. In my setup, TensorFlow is not built in optimized mode, but that should be fine for batching. In my test case the input shape is [32, 112, 112, 3], so in batching.conf I set max_batch_size to 32. This takes the same amount of time to finish the test (say, 500 requests), and if I increase max_batch_size the performance is even worse. I also tweaked num_batch_threads, which doesn't seem to help much. Do you have any thoughts? Thanks!

@misterpeddy
Member

This is a great discussion and points to the need for documentation of batching for the ModelServer binary. I have opened #1379 to add docs; if anyone has thoughts or suggestions, please comment on that issue :)

@TheR3d1

TheR3d1 commented Jul 5, 2019

[quotes @sreddybr3's benchmark comment above in full]

I am trying to use a batching config file via --batching_parameters_file. Where does this file have to be placed in order to be used?

@aaur0

aaur0 commented Aug 18, 2019

[quotes @sreddybr3's benchmark comment and @TheR3d1's question above about where to place the --batching_parameters_file file]

Just give the absolute path of the file. In the case of Docker, you will have to mount your local folder or have the file in the container itself.

@mikezhang95

Can we reload the batching config after the server starts? I see the model config has such a function.

@DachuanZhao

Same problem. I run TF Serving like this:

sudo docker run -p 8501:8501 -d --name="tf_serving" \
--mount type=bind,source=/mnt1/zhaodachuan/tf_model/push/lr,target=/models/push_lr \
-v /mnt1/zhaodachuan/tf-serving/config_file/batch_size.config:/models/config/batch_size.config \
-e MODEL_NAME=push_lr -t tensorflow/serving --enable_batching=true \
--batching_parameters_file=/models/config/batch_size.config

And my batch_size.config is:

max_batch_size { value: 1000000 }
batch_timeout_micros { value: 0 }
max_enqueued_batches { value: 1000000 }
num_batch_threads { value: 8 }

It runs just as slowly as when I don't use --enable_batching=true.
