Query regarding Performance of ssd_mobilenet in Xavier #73
Comments
Hi @Niran89,
MAX-N mode is the power setting that enables the maximum number of CPU cores and the maximum CPU and GPU clock frequencies (Hz). jetson_clocks.sh sets the CPU and GPU to the maximum clock.
split model of nms_v1 targets
These models share the same ssd mobilenet v1 part, but their non-maximum suppression parts differ.
About split model:
Hi Naisy,
Thank you for the reply, and sorry for the delayed response from my side.
FYI, I have already set the clocks on the Xavier using "sudo ./jetson_clocks.sh". But even after setting the clocks and running the Xavier in Max-N mode, I still see a 10 fps drop when running the "ssd_mobilenet_v1_coco_2018_01_28" model with the car image (544x288 resolution) you used. From your GitHub page, I understand that you achieved almost 52 fps on the Xavier with the car image, but I could attain only 40 fps. Can you let me know what else I am missing?
Below is the configuration I used before running the model.
In config.yml:
Other configurations are still the same as yours.
Once these configurations are set, I ran
My current setup in Xavier,
Jetpack - 4.1
Appreciate your help.
Thanks & Regards,
Hi @Niran89, run_image.py and run_video.py are slower than run_stream.py. Can you try with a USB webcam?
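To compare the three scripts on equal terms, the frame loop in each could be timed the same way. The following is only a minimal sketch and is not taken from the repository: `FpsMeter` is a hypothetical helper, and calling `tick()` once per processed frame is an assumption about where the measurement would go.

```python
import time


class FpsMeter:
    """Hypothetical helper: average FPS between the first and latest tick."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock   # injectable so the meter can be tested
        self._start = None    # time of the first tick
        self._last = None     # time of the most recent tick
        self._frames = 0      # frames completed after the first tick

    def tick(self):
        """Call once per processed frame."""
        now = self._clock()
        if self._start is None:
            self._start = now  # the first tick only starts the timer
        else:
            self._frames += 1
        self._last = now

    def fps(self):
        """Return the average FPS, or 0.0 if nothing can be measured yet."""
        if self._frames == 0:
            return 0.0
        elapsed = self._last - self._start
        return self._frames / elapsed if elapsed > 0 else 0.0
```

Using the same meter in the image, video, and stream loops would show whether the gap comes from input handling or from inference itself.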
Hi @naisy,
Thank you once again. As suggested, I ran the algorithm with a stream from a USB camera, but I still see a 10 fps drop.
The configuration I used is,
Hardware : Xavier
From your performance table, the FPS for the above configuration is supposed to be 48 on the Xavier.
Note : I tried with multiple objects in the scene. For any number of objects, and even with no objects in the scene, the FPS is still 39.
Appreciate your help.
Thanks & Regards,
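It can help to restate the gap as time per frame, since that is what profiling would need to account for. The sketch below uses only the two figures quoted in this comment (48 fps expected, 39 fps measured); the function names are illustrative.

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def shortfall(expected_fps, measured_fps):
    """Return (extra milliseconds per frame, relative FPS drop in percent)."""
    extra_ms = frame_time_ms(measured_fps) - frame_time_ms(expected_fps)
    drop_pct = 100.0 * (expected_fps - measured_fps) / expected_fps
    return extra_ms, drop_pct


extra_ms, drop_pct = shortfall(48, 39)
print(f"extra time per frame: {extra_ms:.2f} ms, drop: {drop_pct:.2f}%")
```

For 48 versus 39 fps this works out to roughly 4.8 ms of extra time per frame, or an 18.75% drop. Because the figure does not change with the number of objects in the scene, the extra time is unlikely to come from per-object work such as drawing boxes.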
Hi @Niran89, Can you show tegrastats?
Hi @naisy, Please find attached the tegrastats output recorded while running the object detection code on the Xavier. The configuration is the same as above. The recorded FPS is also visible in the same image.
Hi @Niran89, Looking at the tegrastats results, it seems that both the GPU and the CPU are doing enough work. If you are running over the network, it will be slower. For example,
Hi @naisy, Thank you for the reply. I am sure that the code is not being run over the network as you described above. I am running it directly, as described in your README file. Anyway, thanks for your support so far. Kindly let me know if you find any pointers regarding the performance drop in the future. Looking forward. Thanks & Regards,
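A tegrastats log can also be summarized in text form rather than shared as a screenshot. The sketch below assumes the line contains a `GR3D_FREQ <n>%` field for GPU load and a `CPU [<n>%@<MHz>,...]` field for per-core load, with `off` for disabled cores; the exact layout may differ between JetPack versions. The sample line is made up for illustration and is not the output attached to this thread.

```python
import re


def parse_tegrastats(line):
    """Extract GPU load and per-core CPU load from one tegrastats line.

    Assumed fields: 'GR3D_FREQ <n>%' and 'CPU [<n>%@<MHz>,...]'.
    """
    gpu = re.search(r"GR3D_FREQ (\d+)%", line)
    cpu = re.search(r"CPU \[([^\]]*)\]", line)
    cores = []
    if cpu:
        for entry in cpu.group(1).split(","):
            m = re.match(r"(\d+)%@(\d+)", entry.strip())
            if m:  # entries such as 'off' are skipped
                cores.append((int(m.group(1)), int(m.group(2))))
    return {
        "gpu_load_pct": int(gpu.group(1)) if gpu else None,
        "cpu_cores": cores,  # list of (load percent, clock in MHz)
    }


# Illustrative line only, not real measurement data.
sample = "RAM 4000/15692MB CPU [50%@2265,40%@2265,off,off] EMC_FREQ 10% GR3D_FREQ 80%"
print(parse_tegrastats(sample))
```

If the per-core clocks in such a summary sit below the maximum, that would suggest the power mode or jetson_clocks.sh setting did not take effect.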
Dear naisy,
Thank you for your work. I am currently working on porting the ssd mobilenet object detection algorithm to the Xavier platform.
The current configuration used on the Xavier is,
Jetpack - 4.1
Tensorflow - 1.12.0 with gpu support
TensorRT - 5.0.3 with cuda 10
I have set up your project on the Xavier and I am able to run it successfully. I made the observations below about performance by running a video stream (1280x720) as input. PFA.
Note : During all these experiments, the Xavier is set to Max-N mode.
Can you help me get clarity on the following queries?
Is my observation on the performance correct? Is this the expected performance on the Xavier? I have referred to the "Current Max Performance of ssd_mobilenet_v1_coco_2018_01_28" table given in the README.md.
From the table I understand that "on Xavier in Max-N mode with visualization, for an input of 1280x720, the FPS should be 48 fps". But I got only 35 fps (nms_v2, ssd_mobilenet_v1_coco_2018_01_28).
I am not sure why there is a drop of more than 10 fps compared to the expected value. I haven't made any changes to the code.
As per my understanding, the TRT model is supposed to perform better than the normal TF model. But from my observations above, I see a drop in fps when using TRT (comparison: trt_v1 vs nms_v2).
Overall, nms_v2 is performing better than nms_v1 and trt_v1. What is the major difference between nms_v1, nms_v2, and trt_v1?
Appreciate your help.
Regards,
Niran