Add ViT Attention Plugin Support for Qwen, Mllama, and SigLIP Visual Models#4241
micwill755 wants to merge 5 commits into TensorRT-Edge-LLM under vitAttentionKernels
Conversation
narendasan left a comment:
Are we able to use https://huggingface.co/docs/transformers/v5.5.0/en/serialization#exporting-to-production to avoid most of the patching on the model side?
narendasan left a comment:
How does the new plugin operator get inserted into the graph?
```python
position_ids = torch.arange(input_embeds.shape[1]).unsqueeze(0).to(device)

use_fp32_acc = False
use_explicit_typing = False
```
Enabled precision is deprecated in TRT 10.16 and will be removed in the next version, so we don't need this code path.
Will do. I'll clean this up by removing that code path.
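For reference, the strongly typed path would look roughly like this (the input shape and flag values below are illustrative assumptions, not the PR's actual settings):

```python
import torch
import torch_tensorrt

# Sketch: with use_explicit_typing=True the network is strongly typed and
# honors the module's own dtypes, so enabled_precisions is no longer needed.
trt_visual = torch_tensorrt.compile(
    visual_model,  # hypothetical ViT module
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.half)],
    use_explicit_typing=True,  # replaces the deprecated enabled_precisions path
    use_fp32_acc=True,         # accumulate matmuls in FP32 for accuracy
)
```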
It follows the same pattern as the existing AttentionPlugin integration. At a high level, we insert a Torch custom op into the Dynamo graph by wrapping/replacing the model attention module. That custom op is only a graph marker on the PyTorch side. During Torch-TensorRT conversion, the registered converter lowers that marker to the real TensorRT plugin layer by looking up the plugin creator and calling add_plugin_v2.
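Roughly, the pattern looks like this; the op name, plugin name, and signatures below are illustrative placeholders, not the exact ones in the PR:

```python
import torch
import tensorrt as trt
from torch_tensorrt.dynamo.conversion import dynamo_tensorrt_converter

# Hypothetical graph-marker op: a stand-in on the PyTorch side that the
# converter below recognizes and lowers to the TensorRT plugin.
@torch.library.custom_op("vit::attention", mutates_args=())
def vit_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Eager fallback so the module still runs outside TensorRT.
    return torch.nn.functional.scaled_dot_product_attention(q, k, v)

@vit_attention.register_fake
def _(q, k, v):
    # Shape/dtype propagation for tracing.
    return torch.empty_like(q)

@dynamo_tensorrt_converter(torch.ops.vit.attention.default)
def convert_vit_attention(ctx, target, args, kwargs, name):
    # Look up the plugin creator registered by the plugin library and
    # replace the marker op with the real plugin layer.
    registry = trt.get_plugin_registry()
    creator = registry.get_plugin_creator("ViTAttentionPlugin", "1", "")
    plugin = creator.create_plugin(name, trt.PluginFieldCollection([]))
    layer = ctx.net.add_plugin_v2(list(args), plugin)
    return layer.get_output(0)
```

The model-side wrapping then amounts to swapping the attention module's forward to call `torch.ops.vit.attention` so Dynamo captures the marker in the graph.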
… compile the vision tower through the ViT plugin path, compile the language model separately, insert both back into the VLM structure, and generation succeeds with sensible output. We also verified the vision path at several levels: the reconstructed PyTorch visual model matches the direct HF visual model, the individual attention/plugin checks pass, and semantic generation is mostly aligned, with small FP16 drift.
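For illustration, the visual parity check is along these lines (the call signature and tolerance are illustrative assumptions, not the exact test code):

```python
import torch

@torch.no_grad()
def check_visual_parity(hf_visual, reconstructed_visual, pixel_values, grid_thw):
    # Run both visual towers on the same pixel inputs and compare outputs;
    # the 0.999 cosine threshold is an assumed bound for FP16 drift.
    ref = hf_visual(pixel_values, grid_thw).float()
    out = reconstructed_visual(pixel_values, grid_thw).float()
    max_err = (ref - out).abs().max().item()
    cos = torch.nn.functional.cosine_similarity(
        ref.flatten(), out.flatten(), dim=0
    ).item()
    print(f"max abs err: {max_err:.4e}, cosine: {cos:.6f}")
    assert cos > 0.999, "visual towers diverge beyond expected FP16 drift"
```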
Summary
This PR adds ViT attention plugin integration and validation support to the TensorRT Dynamo examples/tooling path. It wires ViTAttentionPlugin conversion through the Torch-TensorRT/Dynamo flow, supports Qwen-style packed/windowed attention metadata via cu_seqlens and max_seq_len, and adds end-to-end visual model validation for Qwen2.5-VL, Llama 3.2 Vision/Mllama, and GR00T/Eagle/SigLIP-style models.
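For context, `cu_seqlens` follows the cumulative-sequence-length (varlen) convention used by Qwen-style packed attention; a minimal sketch of building that metadata (the segment lengths are made-up values):

```python
import torch

# Per-window patch counts; made-up values for illustration.
seq_lens = torch.tensor([64, 128, 256])

# Prefix-sum offsets: cu_seqlens[i] is where segment i starts in the packed
# sequence, so cu_seqlens == [0, 64, 192, 448] here.
cu_seqlens = torch.zeros(len(seq_lens) + 1, dtype=torch.int32)
cu_seqlens[1:] = torch.cumsum(seq_lens, dim=0)
max_seq_len = int(seq_lens.max())

# The plugin uses these offsets to restrict attention so tokens only attend
# within their own window/segment.
```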
Changes
Testing