I'm experiencing extremely long load times for TensorFlow graphs optimized with TensorRT. Non-optimized ones load quickly, but loading optimized ones takes over 10 minutes with the very same code:
trt_graph_def = tf.GraphDef()
with tf.gfile.GFile(pb_path, 'rb') as pf:
    trt_graph_def.ParseFromString(pf.read())
I'm on an NVIDIA Drive PX 2 device (if that matters), with TensorFlow 1.12.0 built from sources, CUDA 9.2 and TensorRT 4.1.1. Since it gets stuck on ParseFromString(), I'm suspecting protobuf, so here's its config:
$ dpkg -l | grep protobuf
ii  libmirprotobuf3:arm64      0.26.3+16.04.20170605-0ubuntu1.1  arm64  Display server for Ubuntu - RPC definitions
ii  libprotobuf-dev:arm64      2.6.1-1.3  arm64  protocol buffers C++ library (development files)
ii  libprotobuf-lite9v5:arm64  2.6.1-1.3  arm64  protocol buffers C++ library (lite version)
ii  libprotobuf9v5:arm64       2.6.1-1.3  arm64  protocol buffers C++ library
ii  protobuf-compiler          2.6.1-1.3  arm64  compiler for protocol buffer definition files
$ pip3 freeze | grep protobuf
protobuf==3.6.1
And here's the way I convert non-optimized models to TRT ones:
def get_frozen_graph(graph_file):
    """Read Frozen Graph file from disk."""
    with tf.gfile.FastGFile(graph_file, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    return graph_def

print("Load frozen graph from disk")
frozen_graph = get_frozen_graph(DATA_DIR + MODEL + '.pb')

print("Optimize the model with TensorRT")
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 26,
    precision_mode='FP16',
    minimum_segment_size=2
)

print("Write optimized model to the file")
with open(DATA_DIR + MODEL + '_fp16_trt.pb', 'wb') as f:
    f.write(trt_graph.SerializeToString())
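A quick way to sanity-check the conversion (a sketch, not part of the original code): count the TRTEngineOp nodes in the returned graph; zero would mean TensorRT didn't replace any subgraphs.

# Sketch (not from the thread): count TensorRT engine nodes in the
# converted graph; zero means no subgraph was offloaded to TensorRT.
trt_engine_nodes = [n for n in trt_graph.node if n.op == 'TRTEngineOp']
print('TRTEngineOp nodes: %d' % len(trt_engine_nodes))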
Tested on ssd_mobilenet_v1_coco, ssd_mobilenet_v2_coco and ssd_inception_v2_coco from the model zoo; all behave the same way - the downloaded pb file loads in seconds, the TRT-optimized one takes well over 10 minutes. What's wrong? Has anyone experienced the same and has any hints on how to fix it?

OK, I think I got it sorted out. I left protobuf 2.6.1 almost untouched, just installed 3.6.1 from sources with the cpp implementation next to it, and set the symlinks so that 3.6.1 is the default one. Now after:

export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp

I found that this update tends to break pip, so I simply updated it with:
wget http://se.archive.ubuntu.com/ubuntu/pool/universe/p/python-pip/python3-pip_9.0.1-2_all.deb
wget http://se.archive.ubuntu.com/ubuntu/pool/universe/p/python-pip/python-pip-whl_9.0.1-2_all.deb
sudo dpkg -i *.deb
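Whether the cpp implementation is actually picked up can be verified from Python (a small sketch; api_implementation is protobuf's internal module):

# Sketch: check which protobuf backend Python actually uses.
# 'python' would explain multi-minute ParseFromString() calls;
# after the fix above this should print 'cpp'.
from google.protobuf.internal import api_implementation
print(api_implementation.Type())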
Steps to reproduce:
mkdir trt_test
cd trt_test
wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz
tar xzf ssd_mobilenet_v2_coco_2018_03_29.tar.gz --strip-components=1 -C ./ ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb
mv frozen_inference_graph.pb ssd_mobilenet_v2_coco.pb
python3 build.py
Script content:
# build.py
# The script to build a TRT-optimized graph from a given non-optimized one
import os
import tensorflow.contrib.tensorrt as trt
import tensorflow as tf

DATA_DIR = './'
MODEL = 'ssd_mobilenet_v2_coco'
TRT_SUFFIX = '_fp16_trt'

BOXES_NAME = 'detection_boxes'
CLASSES_NAME = 'detection_classes'
SCORES_NAME = 'detection_scores'
NUM_DETECTIONS_NAME = 'num_detections'
output_names = [BOXES_NAME, CLASSES_NAME, SCORES_NAME, NUM_DETECTIONS_NAME]

print("------------- Load frozen graph from disk -------------")
with tf.gfile.GFile(DATA_DIR + MODEL + '.pb', "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

print("------------- Optimize the model with TensorRT -------------")
trt_graph = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 26,
    precision_mode='FP16',
    minimum_segment_size=2
)

print("------------- Write optimized model to the file -------------")
with open(DATA_DIR + MODEL + TRT_SUFFIX + '.pb', 'wb') as f:
    f.write(trt_graph.SerializeToString())

print("------------- DONE! -------------")

Side question: is the way I build TensorRT models described above in build.py (so with the TF-TRT API) the correct one, or shall I rather go through UFF? So far I see no improvement in inference time compared to the original models.
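For comparison, the UFF route from the side question would look roughly like this (a sketch only; it assumes TensorRT's uff Python package, and SSD graphs typically need extra preprocessing and plugins before this works end to end):

# Sketch of the alternative UFF path (not what build.py does):
# convert the frozen TF graph to a .uff file for the TensorRT parser.
import uff

output_names = ['detection_boxes', 'detection_classes',
                'detection_scores', 'num_detections']
uff_model = uff.from_tensorflow_frozen_model(
    './ssd_mobilenet_v2_coco.pb',
    output_nodes=output_names,
    output_filename='./ssd_mobilenet_v2_coco.uff')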
And here’s the output from both scripts on my side. I was lucky this time and the TRT-optimized model loaded in just 7 minutes :)
# FROM load.py
------------- Load the TF graph from the pre-build pb file: ./ssd_mobilenet_v2_coco.pb -------------
------------- Load time: 8.19 sec
------------- Load the TF graph from the pre-build pb file: ./ssd_mobilenet_v2_coco_fp16_trt.pb -------------
------------- Load time: 421.29 sec
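load.py itself is not included in the thread; here is a minimal sketch that would produce output of this shape (the paths and the timing method are assumptions):

# load.py - reconstruction sketch, not the thread's original file.
import time
import tensorflow as tf

for pb_path in ['./ssd_mobilenet_v2_coco.pb',
                './ssd_mobilenet_v2_coco_fp16_trt.pb']:
    print("------------- Load the TF graph from the pre-build pb file: "
          "%s -------------" % pb_path)
    start = time.time()
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(pb_path, 'rb') as pf:
        graph_def.ParseFromString(pf.read())
    print("------------- Load time: %.2f sec" % (time.time() - start))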
I’m not seeing the performance issue described above:
root@67ad5eeeaa9d:/home/scratch.zhenyi_sw/repro2490943/trt_test# python load.py
------------- Load the TF graph from the pre-build pb file: ./ssd_mobilenet_v2_coco.pb -------------
------------- Load time: 0.17 sec
------------- Load the TF graph from the pre-build pb file: ./ssd_mobilenet_v2_coco_fp16_trt.pb -------------
------------- Load time: 0.26 sec
I’m using TensorFlow 1.12.0.
root@0760a6daacdc:/home/scratch.zhenyi_sw/repro2490943# pip show protobuf
Name: protobuf
Version: 3.6.1
Summary: Protocol Buffers
Home-page: https://developers.google.com/protocol-buffers/
Author: None
Author-email: None
License: 3-Clause BSD License
Location: /usr/local/lib/python3.5/dist-packages
Requires: six, setuptools
Required-by: tensorflow-gpu, tensorboard
However, this breaks my OpenCV installation (and I have an app which uses both TensorFlow and OpenCV). Apparently OpenCV's GTK support is linked against libmirclient, which is built with protobuf 2.6.1. The OpenCV build made before switching protobuf simply segfaults after 'import cv2'. Rebuilding OpenCV with only protobuf 3.6.1 available failed due to missing dependencies for libmirclient. When I put back libprotobuf-lite.so.9.0.1 (so from v2.6.1), OpenCV builds fine but fails at runtime with:
>>> import cv2
[libprotobuf FATAL google/protobuf/stubs/common.cc:79] This program was compiled against version 2.6.1 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.6.1). Contact the program author for an update. If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library. (Version verification failed in "/build/mir-k3D1Zt/mir-0.26.3+16.04.20170605/obj-aarch64-linux-gnu/src/protobuf/mir_protobuf.pb.cc".)
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what(): This program was compiled against version 2.6.1 of the Protocol Buffer runtime library, which is not compatible with the installed version (3.6.1). Contact the program author for an update. If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library. (Version verification failed in "/build/mir-k3D1Zt/mir-0.26.3+16.04.20170605/obj-aarch64-linux-gnu/src/protobuf/mir_protobuf.pb.cc".)
Aborted (core dumped)
OK, I think I got it sorted out. I left protobuf 2.6.1 almost untouched, just installed 3.6.1 next to it and set the symlinks so that 3.6.1 is the default one. I rebuilt OpenCV with the following options:
-D WITH_PROTOBUF=OFF \
-D BUILD_PROTOBUF=OFF \
-D PROTOBUF_UPDATE_FILES=OFF \
and, after

export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp

everything seems fine.

This is confusing because I used Jetpack as the installer and the NVidia-provided installers for TensorFlow on those platforms. Why is NVidia using a setup that results in a suboptimal protobuf version being installed? In my case, the graph loading time went down from ~5 min to ~10 s.

It's most likely a protobuf issue, please read carefully what I wrote in https://devtalk.nvidia.com/default/topic/1046492/tensorrt/extremely-long-time-to-load-trt-optimized-frozen-tf-graphs/post/5313240/#5313240
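A final sanity check for the combined setup (a sketch, assuming the rebuilt cv2 is on the Python path): both imports should now succeed without the version-mismatch abort shown above.

# Sketch: with OpenCV rebuilt without protobuf and 3.6.1 as the
# system default, importing cv2 should no longer abort, and protobuf
# should still report the fast C++ backend.
import cv2
from google.protobuf.internal import api_implementation

print(cv2.__version__)
print(api_implementation.Type())  # expect: 'cpp'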