Question about the inference speed for using the model in real-time #20
Hi, we reported results on 640x360 image resolution to compare with ENet, and computed the number of floating-point operations using TensorFlow's profiler tool. MobileNet-SkipNet and ShuffleNet-SkipNet came out at 6.2 and 2 GFLOPs respectively, so ShuffleNet should be faster. We did record the frame rate, but we haven't reported it yet as we are still working on improving it; on a TITAN X the frame rate for ShuffleNet was 143 fps and for MobileNet 141 fps at the same image resolution. However, for efficient inference there are a couple of tricks you need to apply, such as the NCHW optimization; there is also TensorFlow's optimize_for_inference tool, which merges some operations and performs batch-norm folding. But even without these, when I measured the frame rate, ShuffleNet was faster than MobileNet.

If you check out our optimize_inference branch and run ./run.sh you will get a graph.pb; then running ./optimize.sh produces optimized_graph.pb. To measure inference, we ran infer_optimize.py. We're still working on it, as this version doesn't have the NCHW optimization yet. Please let us know if you find any bugs in this version of the code.
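The GFLOP figures quoted above (6.2 vs 2) suggest ShuffleNet should be roughly 3x faster than MobileNet if inference were purely compute-bound, yet the measured frame rates (143 vs 141 fps) are nearly identical. A minimal sketch of that back-of-the-envelope reasoning, where `theoretical_fps` and the `efficiency` fraction are illustrative assumptions, not anything from the repo:

```python
def theoretical_fps(gflops_per_frame, device_gflops_per_s, efficiency=0.25):
    """Upper-bound FPS if inference were purely compute-bound.

    `efficiency` is an assumed fraction of peak throughput actually
    achieved; real values depend heavily on the kernel mix.
    """
    return device_gflops_per_s * efficiency / gflops_per_frame

# Per-frame cost at 640x360 as reported in the thread:
mobilenet_skipnet = 6.2   # GFLOPs
shufflenet_skipnet = 2.0  # GFLOPs

ratio = mobilenet_skipnet / shufflenet_skipnet
print(f"ShuffleNet does {ratio:.1f}x fewer operations per frame")
```

The measured gap being far smaller than this ratio is consistent with the thread's point: at this model size, memory layout (NHWC vs NCHW) and per-op overheads, not raw FLOPs, dominate runtime.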
Hey @MSiam,
@MSiam Hey,
@msson, in TF 1.8 we have
Has anyone been able to get ShuffleSeg's inference speed faster than 1.21 s/it?
@msson You have to run in inference mode in run.sh: this will measure the average running time. We are using TF 1.4. @hellochick Yes, we are using time.time; you can check the train.py code.
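The time.time-based measurement described above can be sketched as a small benchmarking helper. This is an illustrative example, not the repo's actual code; `fn` stands in for a single `sess.run` call, and the warm-up count is an assumption:

```python
import time

def benchmark(fn, n_iters=100, warmup=10):
    """Average per-call time of fn, mirroring a time.time-based
    measurement; warm-up iterations are discarded."""
    for _ in range(warmup):
        fn()
    start = time.time()
    for _ in range(n_iters):
        fn()
    elapsed = time.time() - start
    avg = elapsed / n_iters
    return avg, 1.0 / avg  # seconds/frame, frames/second

# Example with a dummy CPU workload in place of sess.run:
avg_s, fps = benchmark(lambda: sum(range(10000)), n_iters=50)
print(f"{avg_s * 1000:.3f} ms/frame, {fps:.1f} fps")
```

Averaging over many iterations matters because a single timed call is dominated by launch and allocation overheads, which is also why single-run FPS numbers in this thread vary so much.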
@MSiam Thanks for your reply. I measured the ShuffleSeg model's running time at about 9 fps (not always) at 512x384 resolution and about 3.3 fps at 1024x768 resolution in inference mode.
The reason I am asking is that in the paper you mentioned ShuffleSeg's speed is about 15 fps.
Thanks.
What platform are you running on? Are you running on the Jetson TX1? Otherwise this is much slower than what we reported: 143 fps is the result we got on a TITAN X, and around 16 fps on a TX2 (without NCHW optimization) at 640x360 resolution. If you need more speed, use the NCHW optimization, and you can also check TensorRT.
@MSiam Thanks, I successfully checked the running time using graph_optimized.pb.
Hi @msson, if possible, would you mind sharing how you obtained graph_optimized.pb? Also, are you training ShuffleSeg from scratch? Thanks!
@zeroAska Hi, yes, I trained ShuffleSeg from scratch. I got graph_optimized.pb by simply following the author's instructions earlier in this thread. (You should check out and download optimize.sh and infer_optimize.py from their branch.)
Hi @MSiam, I get an inference speed of about 60 fps at 640x360 resolution and around 70 fps at 512x256 on a GTX 1080 GPU, using the pre-trained fcn8s_shufflenet model weights with the master branch (without optimization). Is this speed normal or slow?
The number we reported above is the one after optimization; that is what we measured to compare with ENet, since its author mentions in his paper that he was also fusing operations and folding batch norm. It was around 143 fps on a TITAN X; I am not sure how fast it was before optimization. Let us know if you face issues with the optimization. Another difference is that ENet uses Torch, which by default has CHW order, while our initial TF implementation was HWC, which is slower; you would have to use the CHW implementation if you want it even faster. However, the number of operations should be a more stable way to compare, as it has no dependencies on the environment. That is what we reported in the paper.
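The CHW-vs-HWC layout difference discussed above can be illustrated with plain NumPy. This is a sketch of the data-layout concept only; the shapes are taken from the 640x360 resolution mentioned in the thread:

```python
import numpy as np

# TensorFlow's default layout is NHWC (channels last), while Torch
# (used by ENet) stores tensors as NCHW (channels first), the layout
# cuDNN convolution kernels generally prefer on GPU.
batch_nhwc = np.zeros((1, 360, 640, 3), dtype=np.float32)   # NHWC
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))         # -> NCHW
print(batch_nchw.shape)  # (1, 3, 360, 640)
```

In a TF 1.x graph the equivalent is building conv layers with data_format='NCHW' rather than transposing tensors at runtime; runtime transposes add memory copies and can negate the layout gain.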
After using the optimize_inference branch I get a speed of around 67.5 fps, which is still quite low on a GTX 1080 considering you got 143 fps. Is there anything I can do to improve the speed? Thanks!
I just realized an issue in the inference time we reported earlier: it was 143 fps, but on a TITAN X with the Pascal architecture, while the ENet author (whom I contacted) was working with the Maxwell architecture. We're still trying to get access to a GPU with the Maxwell architecture to measure the inference time and finalize the optimized + NCHW implementation. However, the GFLOPs reported in the paper are correct; no issues with that. Meanwhile, I compared against the Caffe implementation of ENet that is referred to by the original repo:
Update:
Hello @MSiam,
1- Run ./run.sh, which will create a graph.pb
The problem is that when I ran infer_optimize.py several times, the FPS was different on every run, ranging from 8 fps to 16 fps. Do you have any suggestions as to why that might occur? Thank you.
Hello,
I am trying to test the skipnet-mobilenet and ShuffleSeg models with my own dataset (1024x768 resolution).
I've seen in the paper that those models can be used in real-time situations, since the inference speed is faster than 10 fps on a PC.
However, when I run your code for inference on my own data, the speed is about 1.5 fps, and both models' speeds are similar, even though you mentioned here that ShuffleSeg is faster than skipnet-mobilenet.
Please give me any advice for using the models in real time.
Thank you.