Batch size not changing #1904
Comments
Hello @aurelien-m, for the performance issue: according to the PyTorch documentation, we need to insert events to measure performance, otherwise we only measure the API launch latency and not the GPU workload. See https://pytorch.org/docs/stable/notes/cuda.html#asynchronous-execution Thanks! |
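(For reference, event-based timing from that note looks roughly like this; a minimal sketch, where model and batch are placeholder names, not code from this thread:)

import torch

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
output = model(batch)  # placeholder for the actual inference call
end.record()
torch.cuda.synchronize()  # wait for the queued GPU work before reading the timer
print(f"GPU time: {start.elapsed_time(end):.2f} ms")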
Hey, thanks for trying to help me out! I think I was timing it incorrectly, and it's indeed much better now. I was also using the wrong batch size when initially converting to an ONNX model, my bad about that too. However, I tried using a dynamic batch size, but I'm having trouble getting it to work. I'm looking at this documentation. I built my ONNX model using:

torch.onnx.export(
    retinanet,
    example,  # example input tensor of shape (2, 3, 1024, 1024)
    onnx_model_path,
    verbose=False,
    opset_version=11,
    input_names=["input"],
    output_names=["output"],  # added: dynamic_axes keys must match declared names (note: torchvision RetinaNet actually produces several outputs)
    dynamic_axes={
        "input": {0: "batch_size"},
        "output": {0: "batch_size"}
    }
)
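(A sanity check I'd add at this point, assuming the onnx package is installed: confirm the exported graph really carries a symbolic batch dimension:)

import onnx

m = onnx.load(onnx_model_path)
# axis 0 of the graph input should report dim_param "batch_size" instead of a fixed size
print(m.graph.input[0].type.tensor_type.shape)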
And I'm building my TensorRT engine like this:

onnx_model = onnx.load(onnx_model_path)
onnx.checker.check_model(onnx_model)

builder = trt.Builder(TRT_LOGGER)
builder.max_batch_size = batch_size  # note: ignored for explicit-batch networks
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
network.add_input("input", trt.float32, (-1, 3, 1024, 1024))  # crashes here
success = parser.parse_from_file(onnx_model_path)
for idx in range(parser.num_errors):
    print(parser.get_error(idx))
if not success:
    raise RuntimeError('Failed to parse the ONNX file')  # not able to read the model

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30
config.flags = 1 << int(trt.BuilderFlag.FP16)
# config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)

profile = builder.create_optimization_profile()
# min, opt, max shapes for the dynamic "input" binding
profile.set_shape("input", (1, 3, 1024, 1024), (2, 3, 1024, 1024), (2, 3, 1024, 1024))
config.add_optimization_profile(profile)

serialized_engine = builder.build_serialized_network(network, config)
with open(f'{PROJECT_PATH}/models/{model_filename}.engine', 'wb') as f:
    f.write(serialized_engine)

But I'm getting:
I'm not sure what I missed here. |
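(A guess at what's going on, with a minimal sketch of a build that skips the manual add_input: the parser defines the network inputs from the ONNX graph itself, so declaring "input" by hand conflicts with it. This is my reconstruction under that assumption, not confirmed code from the thread:)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

# parse first: the graph already declares "input" with a -1 batch dim,
# so no network.add_input() call is needed for it
if not parser.parse_from_file(onnx_model_path):
    for idx in range(parser.num_errors):
        print(parser.get_error(idx))
    raise RuntimeError('Failed to parse the ONNX file')

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30
profile = builder.create_optimization_profile()
profile.set_shape("input", (1, 3, 1024, 1024), (2, 3, 1024, 1024), (2, 3, 1024, 1024))
config.add_optimization_profile(profile)
serialized_engine = builder.build_serialized_network(network, config)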
Hello @aurelien-m, sorry for the late response. |
Hey, thanks for the help. I'm still looking at this documentation because I want to implement a dynamic batch size. If I'm not wrong, the example you have given doesn't implement a dynamic batch size. Now, I removed this line:

network.add_input("input", trt.float32, (-1, 3, 1024, 1024))

And I'm now able to build the engine successfully. I also tried building with:

I'm using the following code to load my engine:

torch_stream = torch.cuda.Stream()
runtime = trt.Runtime(TRT_LOGGER)
with open(tensorrt_model_path, "rb") as f:
    serialized_engine = f.read()
engine = runtime.deserialize_cuda_engine(serialized_engine)
context = engine.create_execution_context()

batch_size = 1  # changing it here; 2 works fine (which is what I used for the optimal and max shapes)
if -1 in engine.get_binding_shape(input_index):  # input_index: index of the "input" binding
    context.set_optimization_profile_async(0, torch_stream.cuda_stream)
    context.set_binding_shape(0, (batch_size, 3, 1024, 1024))
buffers = ...

But when running inference with:

torch.cuda.synchronize()
...
buffers[input_index] = torch.stack(batch).data_ptr()
context.execute_async_v2(buffers, torch_stream.cuda_stream)

When using

Any ideas? 😄 |
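(Not from the thread, just a sketch of how I'd size and allocate the buffers once the binding shape is set, using torch tensors as device memory; the dtype mapping is deliberately crude, and input_index/batch are carried over from the snippet above:)

input_index = engine.get_binding_index("input")
context.set_optimization_profile_async(0, torch_stream.cuda_stream)
context.set_binding_shape(input_index, (batch_size, 3, 1024, 1024))

# once the input shape is set, every binding shape is concrete and can be allocated
buffers = [None] * engine.num_bindings
device_tensors = {}
for i in range(engine.num_bindings):
    shape = tuple(context.get_binding_shape(i))
    dtype = torch.float32 if engine.get_binding_dtype(i) == trt.float32 else torch.int32  # crude mapping
    device_tensors[i] = torch.empty(shape, dtype=dtype, device="cuda")
    buffers[i] = device_tensors[i].data_ptr()

device_tensors[input_index].copy_(torch.stack(batch).cuda())
context.execute_async_v2(buffers, torch_stream.cuda_stream)
torch_stream.synchronize()  # outputs in device_tensors are valid after this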
After looking at my network, I think it's just because my network is not compatible with dynamic batch sizes... I'm using RetinaNet, and bypassing some layers breaks the dynamic batch size feature. I'm closing this because it's not a problem with TensorRT but with the network and the ONNX graph. |
Description
I'm trying to convert a RetinaNet model taken from torchvision, but I'm unable to use it with a batch size higher than 1. For a set batch size of 2 (batch_size is 2), here is what my output looks like:

Output:

With TensorRT engine:

Output:

I'm also getting worse results with TensorRT than with a typical PyTorch inference:

Any ideas? 😄
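(For context, the torchvision setup being converted is presumably along these lines; a minimal reconstruction of mine, not the author's exact code:)

import torch
import torchvision

# assumed setup: torchvision's pretrained RetinaNet in eval mode
retinanet = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True).eval()
example = torch.randn(2, 3, 1024, 1024)  # batch_size = 2
with torch.no_grad():
    outputs = retinanet(example)  # list of dicts: "boxes", "scores", "labels" per image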
Environment
TensorRT Version: 8.2.1.8
NVIDIA GPU: Jetson Xavier NX
NVIDIA Driver Version:
CUDA Version: 10.2.300
CUDNN Version: 8.2.1.32
Operating System: Jetpack L4T 32.7.1
Python Version (if applicable): 3.6.9
PyTorch Version (if applicable): 1.10.0