
Fix TensorRT runtime input buffer lifetimes#4247

Open
SandSnip3r wants to merge 1 commit into pytorch:main from SandSnip3r:fix-runtime-buffer-lifetime-v2

Conversation

@SandSnip3r
Collaborator

@SandSnip3r SandSnip3r commented May 11, 2026

Description

This fixes a C++ runtime lifetime issue where temporary formatted input buffers, especially .contiguous() copies of non-contiguous inputs, could be destroyed before TensorRT finished using the bound input addresses. In the failing case, the CUDA caching allocator could then reuse that freed input storage for an output buffer with the same shape and dtype, causing input/output aliasing and large numerical corruption in monolithic TRT engines such as FLUX.2-klein-9B.

The fix moves per-execution input and shape-tensor storage onto the TRTEngine so the buffers remain alive through enqueueV3, records CUDA stream usage for active input tensors, and clears the retained references only after execution has been launched. This also makes the runtime behavior more robust for non-contiguous inputs across the standard C++ runtime path, CUDA graphs, and output allocator mode.

The added regression coverage exercises a non-contiguous bf16 input whose output has the same shape, matching the allocator/lifetime pattern that exposed the original bug. Verification included the new runtime test plus the FLUX transformer numerical repro and the full image-generation path.

Type of change

Bug fix

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

@meta-cla meta-cla Bot added the cla signed label May 11, 2026
@github-actions github-actions Bot added the component: tests, component: core, and component: runtime labels May 11, 2026
@github-actions github-actions Bot requested a review from cehongwang May 11, 2026 20:55
@SandSnip3r SandSnip3r changed the title from "Fix TensorRT runtime input buffer lifetimes and add regression covera…" to "Fix TensorRT runtime input buffer lifetimes" May 11, 2026
@SandSnip3r SandSnip3r force-pushed the fix-runtime-buffer-lifetime-v2 branch 4 times, most recently from e887772 to 5c38ea1 May 11, 2026 23:08
Collaborator

@narendasan narendasan left a comment


I think it makes sense to use the engine object to manage lifetimes. We might want to make the active buffer management functions methods of the engine object.

Also it would be good to make sure the rename works correctly with IOutputAllocator / pre_allocated_outputs, which I think reuse some of the cudagraphs system.

uint64_t outputs = _out_binding_names.size();
out_binding_names.resize(outputs);
output_buffers.resize(outputs);
cudagraph_output_staging_buffers.resize(outputs);
Collaborator


Does this system also get used for the pre_allocated outputs / output allocator?

Collaborator Author


No

Comment thread core/runtime/execute_engine.cpp Outdated
compiled_engine->active_shape_tensor_values.clear();
}

void reset_active_input_tensors(c10::intrusive_ptr<TRTEngine> compiled_engine) {
Collaborator


Do you want to just make this a method of the engine object? I don't think it needs to get lifted into Python or anything.

Collaborator Author


Done

Comment thread core/runtime/execute_engine.cpp Outdated
return false;
}

void clear_active_input_tensors(c10::intrusive_ptr<TRTEngine> compiled_engine) {
Collaborator


same here

@SandSnip3r SandSnip3r force-pushed the fix-runtime-buffer-lifetime-v2 branch from 5c38ea1 to d8fa084 May 11, 2026 23:33
@SandSnip3r
Collaborator Author

Also it would be good to make sure the rename works correctly with IOutputAllocator / pre_allocated_outputs which I think reuse some of the cudagraphs system

I think that path uses compiled_engine->output_allocator->getBuffers() which is not going to use these cuda graph staging buffers.

Fix TensorRT runtime input buffer lifetimes and add regression coverage for non-contiguous inputs aliasing matching-shape outputs.
@SandSnip3r SandSnip3r force-pushed the fix-runtime-buffer-lifetime-v2 branch from d8fa084 to f062cec May 12, 2026 00:05
@github-actions github-actions Bot requested a review from lanluo-nvidia May 12, 2026 17:06
@SandSnip3r SandSnip3r mentioned this pull request May 12, 2026
7 tasks