Track Purity DNN for Phase-2 HLT by jchismar · Pull Request #51084 · cms-sw/cmssw

jchismar · 2026-05-28T16:44:07Z

Implementation of a track purity DNN used for high purity selection for HLT tracks. Initial results were presented at the tracking POG meeting on 15 Dec 2025. Since then, the model has been retrained with the latest version of LST, and a separate threshold has been implemented for displaced tracks (|dxy| > 0.5) to improve displaced track efficiency. This threshold is set at a target recall of 99.5% calculated on tracks with |dxy| > 0.5. For tracks with |dxy| ≤ 0.5, the threshold is set at a target recall of 99.5% calculated on all tracks. Additionally, the number of input features has been reduced from 29 to 15 with no loss of performance. The DNN is configured to run in the HLTInitialStepSequence after the hltInitialStepTracks step when the trackTorchClassifier procModifier is used.
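The two-threshold scheme described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the PR's implementation: the `Track` record and all helper names are hypothetical, simulation-matched ("true") tracks are assumed to be the positive class, and `dxy` is taken in whatever units the 0.5 cut in the description refers to.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    # Hypothetical per-track record used only for this illustration.
    score: float   # DNN output, higher means more likely a genuine track
    dxy: float     # transverse impact parameter (same units as the 0.5 cut)
    is_true: bool  # assumed label: matched to a simulated particle


def threshold_for_recall(true_scores, target_recall):
    """Return the highest score threshold t such that at least
    target_recall of the true tracks satisfy score >= t."""
    if not true_scores:
        raise ValueError("need at least one true track")
    ranked = sorted(true_scores, reverse=True)
    # Number of true tracks that must pass; the small epsilon guards
    # against floating-point round-off in target_recall * len(ranked).
    n_pass = max(math.ceil(target_recall * len(ranked) - 1e-9), 1)
    return ranked[n_pass - 1]


def derive_thresholds(tracks, target_recall=0.995, dxy_cut=0.5):
    """Prompt threshold from all true tracks, displaced threshold from
    true tracks with |dxy| > dxy_cut, mirroring the PR description."""
    all_true = [t.score for t in tracks if t.is_true]
    displaced_true = [t.score for t in tracks if t.is_true and abs(t.dxy) > dxy_cut]
    return (threshold_for_recall(all_true, target_recall),
            threshold_for_recall(displaced_true, target_recall))


def is_high_purity(track, prompt_thr, displaced_thr, dxy_cut=0.5):
    """Apply the displaced threshold only to tracks with |dxy| > dxy_cut."""
    thr = displaced_thr if abs(track.dxy) > dxy_cut else prompt_thr
    return track.score >= thr
```

For example, with `target_recall=0.5` and four true tracks scored 0.9 and 0.8 (prompt) and 0.3 and 0.2 (displaced), the prompt threshold is 0.8 and the displaced one 0.3, so a displaced track with score 0.5 is kept while a prompt track with the same score is not.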

MTV performance on TT+PU=200 is shown below.

Co-authored-by: Jade Chismar <jchismar@ucsd.edu>

cmsbuild · 2026-05-28T16:44:31Z

cms-bot internal usage

cmsbuild · 2026-05-28T16:46:55Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-51084/49552

There are other open Pull requests which might conflict with changes you have proposed:
- File RecoTracker/FinalTrackSelectors/BuildFile.xml modified in PR(s): [NGT] DNN for HP Pixel track selection #51042
- File RecoTracker/FinalTrackSelectors/plugins/BuildFile.xml modified in PR(s): [NGT] DNN for HP Pixel track selection #51042, [RECONSTRUCTION] Drop Geometry/CommonDetUnit package #51076

cmsbuild · 2026-05-28T16:47:23Z

A new Pull Request was created by @jchismar for master.

It involves the following packages:

Configuration/ProcessModifiers (operations)
HLTrigger/Configuration (hlt)
RecoTracker/FinalTrackSelectors (reconstruction)

@Martin-Grunewald, @Moanwar, @cmsbuild, @davidlange6, @fabiocos, @ftenchini, @jfernan2, @mandrenguyen, @mmusich, @srimanob can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @Martin-Grunewald, @SohamBhattacharya, @VinInn, @VourMa, @dgulhan, @elusian, @fabiocos, @felicepantaleo, @gpetruc, @makortel, @missirol, @mmasciov, @mmusich, @mtosi, @rovere this is something you requested to watch as well.
@ftenchini, @mandrenguyen, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

Moanwar · 2026-05-28T17:04:20Z

Hi @jchismar, thanks. Which workflows are needed to test this PR?

mmusich · 2026-05-28T17:05:36Z

For the record, the needed model is at cms-data/RecoTracker-FinalTrackSelectors#15 (it would be nice to link the two).

mmusich · 2026-05-28T17:06:24Z

+import FWCore.ParameterSet.Config as cms
+
+# This modifier sets the use of a deep neural network for high purity track selection  
+trackTorchClassifier = cms.Modifier()


For my understanding, why is this proposed via a modifier and not directly in the "production" workflow?

mmusich · 2026-05-28T17:07:02Z

+HLTInitialStepHPSelectionSequence = cms.Sequence(
+    hltInitialStepTrackCutClassifier
+    +hltInitialStepTrackSelectionHighPurity
+)


nit: missing newline.

mmusich · 2026-05-28T17:07:14Z

+    +hltInitialStepTrackTorchClassifierOutput 
+    +hltInitialStepTrackCutClassifier
+    +hltInitialStepTrackSelectionHighPurity
+)


nit: missing newline.

mmusich · 2026-05-28T17:10:34Z

@jchismar what is the cost in terms of timing and GPU memory consumption of these developments?
Given they are not run in the "production" workflow we cannot test it via the bot.

slava77 · 2026-05-28T17:39:08Z

The modifier solution was mainly motivated by a much earlier understanding that PyTorchAlpaka carries a significant memory cost (a sizable fraction of 1 GB). Judging from the pixel track DNN, the cost appears to be much smaller.

The timing costs were rather small ("Adds ~7 ms (on GPU, ~20 ms on CPU) to the HLT timing", from https://indico.cern.ch/event/1688301/#5-round-robin-talk-on-dpgpog-s).

@mmusich, to minimize the edits, I think it would be practical to add a (temporary) commit in

cmssw/Configuration/Eras/python/Era_Phase2_cff.py (lines 24 to 25 in 2cc100f):

    Phase2 = cms.ModifierChain(Run3_noMkFit.copyAndExclude([phase1Pixel,trackingPhase1,seedingDeepCore,displacedRegionalTracking,ctpps_2022,dd4hep]),
                               phase2_common, phase2_tracker, trackingPhase2PU140, phase2_ecal, phase2_hcal, phase2_hgcal, phase2_muon, phase2_GEM, hcalHardcodeConditions, phase2_timing, phase2_timing_layer, phase2_trigger, trackingMkFitProdPhase2)

and add the modifier here.
Once the tests run, we can decide whether it's OK to move on to production or keep it as a modifier.

mmusich · 2026-05-28T18:23:44Z

> The timing costs were rather small (Adds ~7ms (on GPU, ~20ms on CPU) to the HLT timing. from https://indico.cern.ch/event/1688301/#5-round-robin-talk-on-dpgpog-s)

Thanks @slava77

> to minimize the edits I think it would be practical to add a (temporary) commit in ...
> and add the modifier here.
> Once the tests run we can decide if it's OK to move on for production or keep as a modifier

FWIW, that is fine with me.

cmsbuild · 2026-06-04T23:22:12Z

Pull request #51084 was updated. @Martin-Grunewald, @Moanwar, @cmsbuild, @davidlange6, @fabiocos, @ftenchini, @jfernan2, @mandrenguyen, @mmusich, @srimanob can you please check and sign again.

fwyzard · 2026-06-05T02:52:54Z

+
+namespace ALPAKA_ACCELERATOR_NAMESPACE {
+
+  class TrackFeatureExtractor : public stream::FixedQueueEDProducer<> {


Why FixedQueueEDProducer?

fwyzard · 2026-06-05T02:54:14Z

+      // Create device collection and copy from host
+      TrackFeaturesDeviceCollection features_device(iEvent.queue(), nTracks);
+      alpaka::memcpy(iEvent.queue(), features_device.buffer(), features_host.const_buffer());


Rather than making the device copy explicitly, it's usually better to let the framework take care of that.

Simply producing the host collection (and making sure the definition for the device collection is available) should be enough.

This avoids making an extra copy when running on the CPU.

> Rather than making the device copy explicitly, it's usually better to let the framework take care of that.

Doesn't this proposal increase the CPU memory use when running on a GPU backend?
Here the CPU/host side disappears after the module is done with ::produce; in the other case it stays until the event is reset.
Is it practical to ifdef here and bypass the copy on the CPU backend?

fwyzard · 2026-06-05T02:57:19Z

@@ -170,6 +171,7 @@
 fragment.load("HLTrigger/Configuration/HLT_75e33/psets/seedFromProtoTracks_cfi")
 fragment.load("HLTrigger/Configuration/HLT_75e33/psets/SiStripClusterChargeCutLoose_cfi")
 fragment.load("HLTrigger/Configuration/HLT_75e33/psets/SiStripClusterChargeCutNone_cfi")
+fragment.load("HLTrigger/Configuration/HLT_75e33/services/PyTorchService_cfi")


Wouldn't it be better to move this together with the other services ?

fwyzard · 2026-06-05T03:02:41Z

> This threshold is set at a target recall of 99.5% calculated on tracks with |dxy| > 0.5.

What does "recall" mean in this context?

fwyzard · 2026-06-05T03:06:40Z

Are there plans to make the input features available directly on GPU?

Otherwise, if it is expected that the input features will always be available only on CPU, would it make more sense to merge the three modules into one?

slava77 · 2026-06-05T12:48:23Z

>> This threshold is set at a target recall of 99.5% calculated on tracks with |dxy| > 0.5.
>
> What does "recall" mean in this context?

Recall, TP/(TP+FN), is what we in HEP call "efficiency".
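For readers less familiar with the terminology, here is a minimal Python sketch of the quantities involved. It assumes each track carries a boolean truth label (genuine vs. fake) and a boolean selection decision; the function names are illustrative. Recall corresponds to the HEP efficiency, and precision to what is usually called purity.

```python
def confusion_counts(is_true, is_selected):
    """Count true positives, false negatives and false positives."""
    tp = fn = fp = 0
    for truth, selected in zip(is_true, is_selected):
        if truth and selected:
            tp += 1
        elif truth:
            fn += 1   # genuine track rejected
        elif selected:
            fp += 1   # fake track kept
    return tp, fn, fp


def recall(tp, fn):
    """TP / (TP + FN): fraction of genuine tracks kept (efficiency)."""
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp, fp):
    """TP / (TP + FP): fraction of kept tracks that are genuine (purity)."""
    return tp / (tp + fp) if tp + fp else 0.0
```

A 99.5% target recall therefore means that, on the reference sample, at most 0.5% of genuine tracks fall below the threshold.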

slava77 · 2026-06-05T13:23:02Z

> Are there plans to make the input features available directly on GPU?

Unlikely in the context of the full/final tracks (or at least not for a long while): this would require running the full track fit on GPU.
Also, the goal is to try this offline with minimal changes.
There may still be a benefit in trying the same scoring on output tracks that are already on GPU.

> Otherwise, if it is expected that the input features will always be available only on CPU, would it make more sense to merge the three modules into one?

Modularity seems to be the main argument for staying with three modules here.

The TrackTorchClassifierFromSoA implemented here is now better suited to HLT (only HP tracks are selected/passed). For offline use, multiple score flags and no full track copy are more appropriate (although that's probably resolvable by adding produceFilteredTracks and dealing with the score-to-purity-flag conversion later; but that seems like a premature optimization).
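The difference between the two output styles discussed above (HLT: copy only the high-purity tracks; offline: keep all tracks and attach flags) can be sketched in Python. Everything here is hypothetical: the threshold values are made up, the quality levels only loosely mirror the loose/tight/highPurity ordering of CMSSW track quality flags, and this is not how TrackTorchClassifierFromSoA is implemented.

```python
from enum import IntEnum


class Quality(IntEnum):
    # Hypothetical levels, ordered from loosest to tightest.
    UNDEF = -1
    LOOSE = 0
    TIGHT = 1
    HIGH_PURITY = 2


# Made-up minimum DNN scores per level; assumed to increase with the level.
THRESHOLDS = {Quality.LOOSE: 0.2, Quality.TIGHT: 0.5, Quality.HIGH_PURITY: 0.8}


def quality_from_score(score, thresholds=THRESHOLDS):
    """Offline style: map a score to the tightest level it passes."""
    quality = Quality.UNDEF
    for level in sorted(thresholds):  # ascending: LOOSE, TIGHT, HIGH_PURITY
        if score >= thresholds[level]:
            quality = level
    return quality


def hlt_filter(tracks, scores, hp_threshold=THRESHOLDS[Quality.HIGH_PURITY]):
    """HLT style: keep (copy) only the tracks above the high-purity cut."""
    return [trk for trk, s in zip(tracks, scores) if s >= hp_threshold]


def offline_flags(scores, thresholds=THRESHOLDS):
    """Offline style: keep every track and attach a quality flag instead."""
    return [quality_from_score(s, thresholds) for s in scores]
```

The HLT variant produces a smaller, filtered collection, while the offline variant avoids copying the tracks and defers the score-to-flag decision to downstream consumers.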

slava77 · 2026-06-05T13:25:23Z

@cmsbuild please test

mmusich · 2026-06-05T15:24:22Z

Tests have not fully completed yet, but I see that:

* the new module hltInitialStepTrackTorchClassifier takes 217 ms on GPU and 13 ms on CPU

[Timing plots: This PR jchismar@`93e8c78` (on GPU backend) | This PR jchismar@`93e8c78` (on CPU backend)]

* the memory profile is peculiar, both on CPU and GPU (summary)

[Memory plots: This PR jchismar@`93e8c78` (GPU memory) | This PR jchismar@`93e8c78` (CPU memory)]

I wonder if:

* the timing of the module on GPU is expected and, if not, whether it would make sense to enforce the CPU backend using the `alpaka_serial_sync::` version of the module (until this is resolved);
* the "spiky" behaviour at the beginning of the job could be mitigated in the same way that @EmanueleCoradin did at 7893376.

cmsbuild · 2026-06-05T16:17:05Z

+1

Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-983f0c/53692/summary.html
COMMIT: 93e8c78
CMSSW: CMSSW_17_0_X_2026-06-05-1100/el8_amd64_gcc13
Additional Tests: HLT_P2_INTEGRATION,HLT_P2_TIMING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/51084/53692/install.sh to create a dev area with all the needed externals and cmssw changes.

HLT P2 Timing: chart

Comparison Summary

Summary:

You potentially added 19 lines to the logs
ROOTFileChecks: Some differences in event products or their sizes found
Reco comparison results: 2 differences found in the comparisons
DQMHistoTests: Total files compared: 69
DQMHistoTests: Total histograms compared: 4949314
DQMHistoTests: Total failures: 14916
DQMHistoTests: Total nulls: 5
DQMHistoTests: Total successes: 4934373
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 68 files compared)
Checked 291 log files, 251 edm output root files, 69 DQM output files
TriggerResults: found differences in 16 / 67 workflows

Max Memory Comparisons exceeding threshold

@cms-sw/core-l2, I found 20 workflow step(s) with memory usage exceeding the error threshold:


Error: Workflow 34434.0_TTbar_14TeV+Run4D121 step2 max memory diff 61.6 exceeds +/- 30.0 MiB
Error: Workflow 34434.75_TTbar_14TeV+Run4D121_HLT75e33Timing step2 max memory diff 61.7 exceeds +/- 30.0 MiB
Error: Workflow 34434.7501_TTbar_14TeV+Run4D121_HLT75e33TrackingOnly step2 max memory diff 61.7 exceeds +/- 30.0 MiB
Error: Workflow 34434.7502_TTbar_14TeV+Run4D121_HLT75e33TrackingNtuple step2 max memory diff 61.7 exceeds +/- 30.0 MiB
Error: Workflow 34434.7503_TTbar_14TeV+Run4D121_HLTHeterogeneousValid step2 max memory diff 61.6 exceeds +/- 30.0 MiB
Error: Workflow 34434.751_TTbar_14TeV+Run4D121_HLT75e33TimingAlpaka step2 max memory diff 61.7 exceeds +/- 30.0 MiB
Error: Workflow 34434.7521_TTbar_14TeV+Run4D121_HLT75e33TimingTiclV5TrackLinkGNN step2 max memory diff 61.8 exceeds +/- 30.0 MiB
Error: Workflow 34434.753_TTbar_14TeV+Run4D121_HLT75e33TimingLegacyTracking step2 max memory diff 61.7 exceeds +/- 30.0 MiB
Error: Workflow 34434.754_TTbar_14TeV+Run4D121_HLT75e33TimingLegacyTrackingPatatrackQuads step2 max memory diff 61.6 exceeds +/- 30.0 MiB
Error: Workflow 34434.755_TTbar_14TeV+Run4D121_HLT75e33TimingLST step2 max memory diff 61.7 exceeds +/- 30.0 MiB
Error: Workflow 34434.756_TTbar_14TeV+Run4D121_HLT75e33TimingTrimmedTracking step2 max memory diff 61.7 exceeds +/- 30.0 MiB
Error: Workflow 34434.757_TTbar_14TeV+Run4D121_HLT75e33TimingMkFitFit step2 max memory diff 63.7 exceeds +/- 30.0 MiB
Error: Workflow 34434.758_TTbar_14TeV+Run4D121_HLT75e33TimingTiclBarrel step2 max memory diff 61.7 exceeds +/- 30.0 MiB
Error: Workflow 34434.759_TTbar_14TeV+Run4D121_HLTPhase2WithNano step2 max memory diff 61.7 exceeds +/- 30.0 MiB
Error: Workflow 34434.7591_TTbar_14TeV+Run4D121_HLTPhase2WithNanoValid step2 max memory diff 61.7 exceeds +/- 30.0 MiB
Error: Workflow 34434.775_TTbar_14TeV+Run4D121_NGTScoutingCAExtensionMergeT5 step2 max memory diff 51.5 exceeds +/- 30.0 MiB
Error: Workflow 34434.911_TTbar_14TeV+Run4D121_DD4hep step2 max memory diff 61.7 exceeds +/- 30.0 MiB
Error: Workflow 34496.0_CloseByPGun_CE_E_Front_120um+Run4D121 step2 max memory diff 61.4 exceeds +/- 30.0 MiB
Error: Workflow 34500.0_CloseByPGun_CE_H_Coarse_Scint+Run4D121 step2 max memory diff 61.4 exceeds +/- 30.0 MiB
Error: Workflow 34634.999_TTbar_14TeV+Run4D121PU_PMXS1S2PR step3 max memory diff 57.9 exceeds +/- 30.0 MiB
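To tabulate the flagged steps from a bot report like the one above, a small Python sketch can parse the error lines. The regular expression is written against the line format shown here and is an assumption; it is not an official cms-bot tool.

```python
import re

# Matches lines such as:
# "Error: Workflow 34434.0_TTbar_14TeV+Run4D121 step2 max memory diff 61.6 exceeds +/- 30.0 MiB"
LINE_RE = re.compile(
    r"Workflow (?P<wf>\S+) (?P<step>step\d+) max memory diff "
    r"(?P<diff>[-+]?\d+(?:\.\d+)?) exceeds \+/- (?P<tol>\d+(?:\.\d+)?) MiB"
)


def parse_memory_errors(report):
    """Return (workflow, step, diff_MiB, tolerance_MiB) for each flagged line."""
    rows = []
    for line in report.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append((m["wf"], m["step"], float(m["diff"]), float(m["tol"])))
    return rows


def summarize(rows):
    """Count flagged steps and report the largest absolute memory difference."""
    diffs = [abs(diff) for _, _, diff, _ in rows]
    return len(rows), max(diffs, default=0.0)
```

Applied to the list above, it would report 20 steps, with the largest difference (63.7 MiB) in 34434.757 (HLT75e33TimingMkFitFit).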

cmsbuild · 2026-06-05T16:38:03Z

Milestone for this pull request has been moved to CMSSW_20_0_X. Please open a backport if it should also go in to CMSSW_17_0_X.

slava77 · 2026-06-05T17:21:37Z

> I wonder if:
>
> * the timing of the module on GPU is expected and if not, would it make sense to enforce the CPU backend using the `alpaka_serial_sync::` version of the module (until this is resolved)

Not expected. The same setup, running in a single job with up to 32 threads/streams on an L4 GPU, runs much faster: 10-20 ms (@jchismar has the numbers).

mmusich · 2026-06-05T17:46:19Z

> not expected. This same setup running in a single job up to 32 threads/streams on L4 GPU runs much faster, 10-20 ms

Do we have a timing-server-based measurement that can be verified by experts?

slava77 · 2026-06-05T17:54:52Z

> do we have a timing server based measurement that can be verified by experts?

Does the timing server accept cms-sw and cms-data modifications for a test job submission?

mmusich · 2026-06-05T18:01:00Z

> does the timing server accept cms-sw and cms-data modifications for a test job submission?

Yes, feel free to follow up in the timing @ HLT Mattermost channel for details.

makortel · 2026-06-05T18:29:45Z

I assume this PR does not need a backport to 17_0_X (Run 3 legacy). Although if you want to continue running the benchmarks as part of the PR tests, one option (until 20_0_0_pre1 RelVal samples arrive) would be to continue testing in 17_0_X.

mmusich · 2026-06-06T07:35:59Z

> not expected. This same setup running in a single job up to 32 threads/streams on L4 GPU runs much faster, 10-20 ms

For the record, I manually repeated the benchmark on one node of the NGT farm equipped with 4 L40s cards, using 16 jobs, 16 streams and 16 threads; it pretty much confirms the findings from the bot:

* This PR jchismar@93e8c78 (on GPU backend): TrackTorchClassifierAlpaka@alpaka takes 149.4 ms
* This PR jchismar@93e8c78 (on CPU backend): TrackTorchClassifierAlpaka@alpaka takes 10.2 ms

I wonder how the benchmark mentioned above was carried out.

Add pytorch model for phase-2 hlt track classification (4986d26)

Co-authored-by: Jade Chismar <jchismar@ucsd.edu>

cmsbuild added this to the CMSSW_17_0_X milestone May 28, 2026

cmsbuild added reconstruction-pending hlt-pending operations-pending pending-signatures tests-pending orp-pending code-checks-pending tracking changes-dataformats labels May 28, 2026

cmsbuild added code-checks-approved and removed code-checks-pending labels May 28, 2026

mmusich reviewed May 28, 2026


This was referenced May 28, 2026:

- [NGT] DNN for HP Pixel track selection #51042 (Draft)
- Rpc digi dev v9 #47447 (Open)

jchismar added 2 commits June 1, 2026 11:17

- Add missing newlines (4026bba)
- Add modifier to Phase 2 modifier chain (21877eb)

cmsbuild added code-checks-pending and removed code-checks-approved labels Jun 1, 2026

cmsbuild added code-checks-approved and removed code-checks-pending labels Jun 4, 2026

fwyzard reviewed Jun 5, 2026


cmsbuild added tests-started and removed tests-pending labels Jun 5, 2026

cmsbuild added operations-approved tests-approved and removed operations-pending tests-started labels Jun 5, 2026

cmsbuild modified the milestones: CMSSW_17_0_X, CMSSW_20_0_X Jun 5, 2026

