Add coreml_compute_plan.py: report which CoreML ops dispatch to ANE / GPU / CPU by john-rocky · Pull Request #19252 · pytorch/executorch

john-rocky · 2026-05-01T05:53:25Z

Summary

CoreML decides at compile/load time which device each MIL operation will
execute on, and coremltools 9.0+ exposes that through MLComputePlan.
The recurring question on the issue tracker is "why isn't my model
running fully on the ANE?" — for example:

llama model is not fully lowered to ANE (coreml backend) #4091
CoreML model is crashing on iPhone GPU, but not on iPhone CPU or macOS GPU #11541
ANE compile OOMs on certain input shapes #8439
CPU Overhead After ANE Execution #8445

Today the only way for an ExecuTorch user to answer it is to break out
Swift / Xcode. This PR adds a Python wrapper around MLComputePlan so
the answer is one shell command:

$ python coreml_compute_plan.py --model_path my_model.mlpackage \
      --compute_units cpu_and_ne --show_non_ane

=== my_model.mlpackage ===
  ANE:   412 / 480 ( 85.8%)
  CPU:    68 / 480 ( 14.2%)

  Non-ANE op types:
       32  ios17.cast
       18  ios17.gather
       12  ios17.reshape
        6  ios17.constexpr_blockwise_shift_scale

Inputs supported:

| Input | Behavior |
| --- | --- |
| `.pte` | Extract every Core ML partition into a tempdir, then analyze each. |
| `.mlpackage` | Compile to `.mlmodelc` in a tempdir, then analyze. |
| `.mlmodelc` | Analyze directly. |
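
The input handling in the table above amounts to a dispatch on the file extension. A rough sketch of that dispatch follows; `steps_for_input` and the step names it returns are hypothetical labels for illustration, not functions from the script.

```python
from pathlib import Path


def steps_for_input(model_path):
    """Return the processing steps implied by the input's extension.

    The step names are illustrative labels, not real function names.
    """
    suffix = Path(model_path).suffix.lower()
    if suffix == ".pte":
        # Every Core ML partition is extracted first, then analyzed one by one.
        return ("extract_partitions", "analyze_each")
    if suffix == ".mlpackage":
        # Needs compilation to .mlmodelc before a plan can be loaded.
        return ("compile_to_mlmodelc", "analyze")
    if suffix == ".mlmodelc":
        return ("analyze",)
    raise ValueError(f"unsupported input type: {model_path!r}")
```

Raising on an unknown extension is a design choice of this sketch; the script itself may handle that case differently.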

The PTE path reuses the same JSON/named-data extraction logic that
extract_coreml_models.py uses, and is inlined into the script so it can
be run against a plain CoreML model without depending on the executorch
package.

Test plan

Added test_coreml_compute_plan.py covering:

- _device_name(...) for None and a stub MLNeuralEngineComputeDevice.
- _COMPUTE_UNIT_CHOICES mapping (cpu_and_ne / all).
- analyze_one(...) end-to-end on a tiny relu(x @ x.T) + x.sum()
  mlpackage built with coremltools.convert(...): returns rows for
  every dispatched op, with a main function and the expected MIL op
  types (matmul, relu, add, reduce_sum).

$ python -m pytest examples/apple/coreml/scripts/test_coreml_compute_plan.py -v
============================== 7 passed in 3.68s ===============================

I also ran the script against a few hand-built .mlpackage and
.mlmodelc files on macOS 26 with coremltools 9.0 and verified the
output matches what MLComputePlan returns directly.

Authored with Claude.

cc @kimishpatel @YifanShenSZ @cymbalrush @metascroy

CoreML decides at compile/load time which device each MIL operation will execute on; that decision is exposed through MLComputePlan in coremltools 9.0+. This script wraps it so users can answer 'why isn't my model running on the ANE?' without writing Swift, which is the recurring question behind issues like pytorch#4091, pytorch#11541, and pytorch#8439.

Inputs supported:
* .pte: extracts every Core ML partition first.
* .mlpackage: compiles to .mlmodelc in a tempdir.
* .mlmodelc: analyzed directly.

Reports per-op dispatch (ANE / GPU / CPU), an aggregate breakdown, and optionally the op types that did not get assigned to the ANE (--show_non_ane).

Authored with Claude.

pytorch-bot · 2026-05-01T05:53:29Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19252

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

Run pull request jobs on OSDC runners in shadow mode

This comment was automatically generated by Dr. CI and updates every 15 minutes.

john-rocky · 2026-05-02T15:38:09Z

@pytorchbot label "release notes: apple"

kimishpatel · 2026-05-04T20:13:21Z

I thought what op runs where is decided at compile time. Is this being extracted from AOT compile or just from lowering?

john-rocky · 2026-05-04T20:33:23Z

@kimishpatel Compile time, not runtime — MLComputePlan is a static analysis hook that coremltools 9.0+ exposes around the same dispatch decision the CoreML runtime would make. It loads from a compiled .mlmodelc (the script compiles a .mlpackage to a tempdir if needed) and reports preferred_compute_device per MIL operation without actually running predictions.

Quoting coremltools' own docs:

Represents the plan for executing a model. The application can use the plan to estimate the necessary cost and resources of the model before running the predictions.

So no AOT extraction or runtime profiling involved; just a wrapper that walks MLModelStructureProgram and calls get_compute_device_usage_for_mlprogram_operation per op. That's why this works against any of .pte, .mlpackage, or .mlmodelc — they all reduce to a compiled program the framework can plan against.
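
The walk described above can be sketched roughly as follows. This is not the PR's code: `Row`, `device_label`, `walk_operations`, `collect_rows`, and `load_plan` are hypothetical names, the "unknown" label for ops without a reported device is an assumption, and so is the recursion into nested blocks. The plan object is treated as duck-typed, so only `load_plan` touches coremltools.

```python
from collections import namedtuple

Row = namedtuple("Row", "function op_type device")


def device_label(device):
    """Map a compute-device object to a short label via its class name."""
    if device is None:
        return "unknown"  # assumption: label for ops with no reported device
    name = type(device).__name__
    if "NeuralEngine" in name:
        return "ANE"
    if "GPU" in name:
        return "GPU"
    if "CPU" in name:
        return "CPU"
    return name


def walk_operations(block):
    """Yield every operation in a block, descending into nested blocks."""
    for op in block.operations:
        yield op
        for nested in getattr(op, "blocks", None) or ():
            yield from walk_operations(nested)


def collect_rows(plan):
    """Collect one Row per MIL operation from an MLComputePlan-like object."""
    rows = []
    program = plan.model_structure.program
    for fn_name, fn in program.functions.items():
        for op in walk_operations(fn.block):
            usage = plan.get_compute_device_usage_for_mlprogram_operation(op)
            device = usage.preferred_compute_device if usage is not None else None
            rows.append(Row(fn_name, op.operator_name, device_label(device)))
    return rows


def load_plan(compiled_path):
    """Load a plan from a compiled .mlmodelc (requires coremltools on macOS)."""
    import coremltools as ct  # imported lazily so the helpers above stay testable
    from coremltools.models.compute_plan import MLComputePlan

    return MLComputePlan.load_from_path(
        path=compiled_path, compute_units=ct.ComputeUnit.CPU_AND_NE
    )
```

On a machine where compilation works, `collect_rows(load_plan("my_model.mlmodelc"))` would return rows without running any predictions, which matches the static-analysis behavior described in the reply.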

metascroy · 2026-05-14T23:34:58Z

+    )
+
+
+def _extract_models_from_pte(pte_path: str, out_dir: str) -> List[str]:


Can we reuse utilities in the extract_coreml_model script in the same folder?

Done — _extract_models_from_pte is gone. extract_coreml_models.extract_coreml_models now takes an optional out_dir and returns the list of extracted paths, and coreml_compute_plan.py imports it via the executorch.examples.apple.coreml.scripts namespace.

metascroy · 2026-05-14T23:40:37Z

I thought what op runs where is decided at compile time. Is this being extracted from AOT compile or just from lowering?

I think it's being extracted from modelc compilation (so won't work on linux), see the _ensure_compile call.

metascroy · 2026-05-14T23:42:23Z

+import coremltools as ct
+import torch
+
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))


Can we avoid this?

Done — the sys.path.insert is gone. The test imports through the same namespace path (from executorch.examples.apple.coreml.scripts.coreml_compute_plan import ...).

metascroy · 2026-05-14T23:44:20Z

Looks good @john-rocky ! Main comment is to reuse or consolidate the existing utilities in extract_coreml_models script in the same folder.

Can you also test this works with multi-function?

Move the .pte extraction logic into extract_coreml_models.extract_coreml_models, which now takes an optional out_dir and returns the list of extracted paths; coreml_compute_plan.py imports and uses it instead of carrying its own copy.

Multifunction .mlpackage inputs are now analyzed function-by-function: each function is projected as the `main` of a temp single-function copy so MLComputePlan.load_from_path covers it (coremltools 9.0 only exposes the plan for the default function otherwise).

test_coreml_compute_plan.py uses the executorch.examples.apple.coreml.scripts namespace import directly instead of mutating sys.path, and adds two tests confirming both functions of a multifunction package are surfaced.

john-rocky · 2026-05-15T03:44:52Z

Thanks for the review! Addressed both inline comments and added real multifunction support (rather than just a test that pins the limitation).

What changed (216256a)

- _extract_models_from_pte is removed. extract_coreml_models.extract_coreml_models now takes an optional out_dir and returns the list of extracted partition paths; coreml_compute_plan.py imports it via the executorch.examples.apple.coreml.scripts namespace.
- test_coreml_compute_plan.py drops the sys.path.insert and imports through the same namespace path.
- analyze_one now handles multifunction .mlpackage inputs. MLComputePlan.load_from_path in coremltools 9.0 only exposes the plan for the default function. For multifunction inputs, each function is re-projected as main of a temp single-function copy via MultiFunctionDescriptor / save_multifunction, and rows are gathered back under the original function names.
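
The re-projection step could look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: `project_functions_as_main` is a hypothetical helper, the function names are passed in by the caller rather than read from the package, and the output file naming is invented. It relies on `MultiFunctionDescriptor.add_function(...)`, `default_function_name`, and `save_multifunction(...)` from `coremltools.utils`.

```python
import os


def project_functions_as_main(src_package, function_names, out_dir):
    """Write one single-function .mlpackage per function, each exposed as `main`.

    Returns {original_function_name: path_to_single_function_copy} so results
    can be reported under the original names afterwards.
    """
    import coremltools as ct  # imported lazily; needs a coremltools with multifunction support

    projected = {}
    for name in function_names:
        desc = ct.utils.MultiFunctionDescriptor()
        # Copy only this function, renaming it to `main` in the new package.
        desc.add_function(
            src_package, src_function_name=name, target_function_name="main"
        )
        desc.default_function_name = "main"
        dst = os.path.join(out_dir, f"{name}.mlpackage")  # hypothetical naming
        ct.utils.save_multifunction(desc, dst)
        projected[name] = dst
    return projected
```

Each path in the returned mapping can then be compiled and analyzed like any single-function package, with the mapping key used to restore the original function name in the report.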

Tests

$ python -m unittest -v executorch.examples.apple.coreml.scripts.test_coreml_compute_plan
... 9 tests, all OK ...
TestAnalyzeOneMultifunction.test_reports_every_function ............. ok
TestAnalyzeOneMultifunction.test_each_function_lowers_the_same_ops .. ok

The two new tests build a multifunction package (prefill + decode sharing a tiny relu/matmul/add body), run analyze_one, and assert both function names — plus their MIL ops — appear in the result.

Net diff: +154 / −122 across 3 files.

john-rocky requested a review from metascroy as a code owner May 1, 2026 05:53

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 1, 2026

john-rocky mentioned this pull request May 1, 2026

Add Gemma 4 text-decoder export to CoreML #19253

Closed

pytorch-bot Bot added the release notes: apple Changes to the Apple backend delegate label May 2, 2026

nil-is-all added the module: coreml Issues related to Apple's Core ML delegation and code under backends/apple/coreml/ label May 4, 2026

msluszniak mentioned this pull request May 7, 2026

Adapt profiling script to handle multiple backends software-mansion/react-native-executorch#764

Open

metascroy reviewed May 14, 2026

john-rocky wants to merge 2 commits into pytorch:main from john-rocky:coreml/compute-plan-analyzer
