
Question about the inference speed for using the model in real-time #20

Closed
msson opened this issue May 17, 2018 · 16 comments
@msson

msson commented May 17, 2018

Hello,

I am trying to test the skipnet-mobilenet and shuffleseg models with my own dataset (1024x768 resolution).

I've seen in the paper that those models can be used in real-time situations, since the reported inference speed is faster than 10 fps on a PC.

However, when I run your inference code on my own data, the speed is about 1.5 fps, and both models run at similar speeds, even though you mention here that shuffleseg is faster than skipnet-mobilenet.

Could you give me any advice on using the model in real time?

Thank you.

@MSiam
Owner

MSiam commented May 17, 2018

Hi,

We reported on 640x360 image resolution to compare with ENet, and computed the number of floating-point operations using the TensorFlow profiler tool. MobileNet-SkipNet and ShuffleNet-SkipNet have 6.2 and 2 GFLOPs respectively, so ShuffleNet should be faster.

We did record the framerate, but we haven't reported it yet as we are still working on improving it: on a TITAN X, at the same resolution, the framerate was 143 fps for ShuffleNet and 141 fps for MobileNet. However, for efficient inference there are a couple of tricks you need to apply, such as the NCHW optimization; there is also TensorFlow's optimize_inference tool, which merges some operations and performs batch-norm folding. But even without these, when I measured the framerate, ShuffleNet was faster than MobileNet.

If you check out our optimize_inference branch and run ./run.sh, you will get a graph.pb; then run ./optimize.sh to get optimized_graph.pb. To measure inference time, we ran infer_optimize.py. We're still working on this branch, as it doesn't include the NCHW optimization yet. Please let us know if you find any bugs in this version of the code.
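(Editorial note: for context, batch-norm folding, which the optimize tool performs, bakes the normalization into the preceding layer's weights so the batch-norm op disappears at inference time. A scalar sketch of the idea; the names and values here are illustrative, not the repo's code:)

```python
import math

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-3):
    """Fold a batch-norm layer into the preceding layer's weight w and
    bias b (scalars here for clarity; per-channel vectors in practice)."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta

# After folding, the single affine op matches conv followed by batch norm:
w2, b2 = fold_batchnorm(w=2.0, b=0.5, gamma=1.5, beta=0.1, mean=0.4, var=0.25)
x = 3.0
conv_then_bn = (2.0 * x + 0.5 - 0.4) * (1.5 / math.sqrt(0.25 + 1e-3)) + 0.1
folded = w2 * x + b2  # equal to conv_then_bn, with one op fewer at runtime
```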

@hellochick

Hey @MSiam,
Could you tell me how you calculate FPS? Does the measured time include I/O time? If I use time.time(), the measurement might include memory and I/O time as well. So please tell me the method you use to calculate inference time; it would be really helpful for my research. Thank you.

@msson
Author

msson commented May 23, 2018

@MSiam Hey,
I don't know exactly how to check the model's inference time. When I test the model, the progress bar shows "1.21s/it". Does that mean the model's inference speed is 1.21 s per frame? If so, it's slower than I expected.
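(Editorial note: the progress bar's "s/it" readout is seconds per iteration, so it converts to frames per second by taking the reciprocal:)

```python
# 1.21 seconds per iteration corresponds to under one frame per second.
seconds_per_it = 1.21
fps = 1.0 / seconds_per_it
print(round(fps, 2))  # 0.83
```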

@hellochick

@msson, in TF 1.8 we have tf.profiler, which can show a timeline, but I'm not sure which tool he uses.

@msson
Author

msson commented May 27, 2018

Has anyone measured shuffleseg's inference speed at better than 1.21 s/it?

@MSiam
Owner

MSiam commented May 28, 2018

@msson You have to run in inference mode in the run.sh:
python3 main.py --load_config=fcn8s_shufflenet_test.yaml inference Train FCN8sShuffleNet

This will measure the average running time. We are using TF 1.4.
It calls the test_inference method in train/train.py, if you want to check how it works.

@hellochick Yes, we are using time.time(); you can check the train.py code.
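(Editorial note: a minimal sketch of a time.time()-based harness in the spirit of what train.py does. The measure_fps name and warm-up counts are my own; in the repo the timed call would be a sess.run on the loaded graph, and to address @hellochick's I/O concern you would pre-load the input batch before the timed loop:)

```python
import time

def measure_fps(infer_fn, n_warmup=10, n_iters=100):
    # Warm-up: the first runs pay one-off setup costs (graph compilation,
    # memory allocation) and would skew the average if included.
    for _ in range(n_warmup):
        infer_fn()
    start = time.time()
    for _ in range(n_iters):
        infer_fn()
    elapsed = time.time() - start
    return n_iters / elapsed  # frames per second

# Dummy workload standing in for sess.run(output, feed_dict=...);
# the feed data should already be in memory so I/O is excluded.
fps = measure_fps(lambda: sum(range(10000)))
```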

@msson
Author

msson commented May 28, 2018

@MSiam Thanks for your reply. I measured the shuffleseg model's running time at about 9 fps (not consistently) with 512x384 resolution and about 3.3 fps with 1024x768 resolution in inference mode.

  1. Are those results reasonable?
  2. What resolution did you use in inference mode (when you got the 15 fps speed with shuffleseg)?

The reason I am asking is that the paper says shuffleseg's speed is about 15 fps.

  3. If my current results are reasonable, do I have to apply the NCHW optimization you mentioned above to bring the model up to 15 fps?

  4. Is there any other way to reduce the running time of your shuffleseg code?

Thanks.
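(Editorial note: a rough back-of-the-envelope check, assuming runtime scales roughly linearly with pixel count; the assumption is mine, not the authors':)

```python
# 1024x768 has four times the pixels of 512x384, so ~9 fps at the
# smaller size would predict roughly 9 / 4 = 2.25 fps at the larger
# one; the measured 3.3 fps is at least in that ballpark.
ratio = (1024 * 768) / (512 * 384)
predicted_fps = 9 / ratio
print(ratio, predicted_fps)  # 4.0 2.25
```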

@MSiam
Owner

MSiam commented May 30, 2018

What platform are you running on? Are you on the Jetson TX1? Otherwise, this is much slower than what we reported.
As I mentioned earlier, the optimized version is in the optimize_inference branch.
1- Run ./run.sh which will create a graph.pb
2- Run ./optimize.sh that will generate graph_optimized.pb
3- Run python infer_optimize.py --graph graph_optimized.pb; this should give you the running time.

We got 143 fps on a TITAN X and around 16 fps on a TX2 (without the NCHW optimization) at 640x360 resolution. If you need more speed, apply the NCHW optimization, and you can also check out TensorRT.

@msson
Author

msson commented Jun 7, 2018

@MSiam Thanks, I successfully measured the running time using graph_optimized.pb.

@msson msson closed this as completed Jun 7, 2018
@zeroAska

zeroAska commented Jun 7, 2018

Hi @msson, if possible, would you mind sharing how you obtained graph_optimized.pb? Also, did you train shuffleseg from scratch? Thanks!

@msson
Author

msson commented Jun 7, 2018

@zeroAska Hi, yes, I trained shuffleseg from scratch. As for graph_optimized.pb, I simply followed the author's instructions below (you should check out and download optimize.sh and infer_optimize.py from their branch):
1- Run ./run.sh which will create a graph.pb
2- Run ./optimize.sh that will generate graph_optimized.pb
3- Run python infer_optimize.py --graph graph_optimized.pb; this should give you the running time.

@harsh-agar

harsh-agar commented Jul 26, 2018

Hi @MSiam, I get an inference speed of about 60 fps at 640x360 resolution and around 70 fps at 512x256 on a GTX 1080 GPU when using the pre-trained fcn8s_shufflenet model weights on the master branch (without optimization).

Is this speed normal or slow?

@MSiam
Owner

MSiam commented Jul 26, 2018

The number we reported above is the one after optimization; that is what we measured to compare with ENet, since the ENet author mentions in his paper that he was also fusing operations and folding batch norm. It was around 143 fps on a TITAN X. I am not sure how fast it was before optimization. Let us know if you face issues with the optimization.

Another difference is that ENet uses Torch, which defaults to CHW ordering, while our initial TF implementation uses HWC, which is slower; you would have to use the CHW implementation if you want it even faster. However, the number of operations is a more stable way to compare, as it has no dependencies on the environment. That is what we reported in the paper.

@MSiam MSiam mentioned this issue Jul 26, 2018
@harsh-agar

After using the optimize_inference branch I get a speed of around 67.5 FPS, which is still quite low on a GTX 1080 considering you got 143 FPS.

Anything I can do to improve the speed?

Thanks!

@MSiam
Owner

MSiam commented Nov 9, 2018

I just realized an issue with the inference time we reported earlier: it was 143 fps, but on a TITAN X with the Pascal architecture, and I contacted the ENet author, who works with the Maxwell architecture. We're still trying to get access to a GPU with the Maxwell architecture to measure the inference time and finalize the optimized + NCHW implementation. However, the GFLOPs reported in the paper are correct; no issues there.

Meanwhile, I compared against this Caffe implementation of ENet, which is referred to by the original repo:
https://github.com/TimoSaemann/ENet
They report 20 ms (50 fps) at 1024x512 resolution on a TITAN X Pascal without applying any optimization (such as batch-norm folding). SkipNet-ShuffleNet with NCHW, likewise without optimization, needs 16 ms (62.5 fps) at the same resolution on the same GPU.

Update:
After running on the Maxwell architecture, the best I got with the optimized version is 88 fps. The remaining gap depends on the kind of optimizations ENet was performing. But in terms of GFLOPs and inference time before optimization, ShuffleNet is better.

@bloonz

bloonz commented Jan 2, 2019

Hello @MSiam,
I've tried to measure the inference speed of ShuffleSeg on an Nvidia Jetson TX2 (input image size = 512x512) following these instructions:

1- Run ./run.sh which will create a graph.pb
2- Run ./optimize.sh that will generate graph_optimized.pb
3- python3 infer_optimize.py --graph graph_optimized.pb

The problem is that when I run infer_optimize.py several times, the FPS differs on every run, ranging from 8 to 16 FPS.

Do you have any suggestions as to why that might occur?

Thank you.
