Skip to content

Use TwoQubitPeepholeOptimization in preset pass managers#16136

Open
mtreinish wants to merge 89 commits intoQiskit:mainfrom
mtreinish:use-peephole-in-the-default
Open

Use TwoQubitPeepholeOptimization in preset pass managers#16136
mtreinish wants to merge 89 commits intoQiskit:mainfrom
mtreinish:use-peephole-in-the-default

Conversation

@mtreinish
Copy link
Copy Markdown
Member

Following on from #13419 which added a new optimization pass
TwoQubitPeepholeOptimization which was designed to replace the
pair of ConsolidateBlocks and UnitarySynthesis for the optimization
stage after we have a physical circuit. That PR however did not update
the preset pass managers to concentrate the review on just adding the
new pass. This continues off from there by updating the preset pass
managers to use the new pass in optimization levels 2 and 3 replacing
those levels' optimization stage's previous usage of ConsolidateBlocks
and UnitarySynthesis to achieve the same goal. This should result in
both a runtime performance and transpilation quality improvement as the
new pass is both faster and should produce better fidelity circuits
than the previous peephole optimization.

The tests updates that are made in this PR are because the peephole
optimization is changing the transpilation output of various test
circuits. These were all verified to be valid outputs and in all cases
a "better" output than before. Specifically, for the tests updated
these were the changes in output and why they occurred:

  • The two tests in
    test.python.circuit.test_scheduled_circuit.TestScheduledCircuit were
    the single CX gate in the output circuit was flipped from (0, 1) to
    (1, 0) because in the target the error rate for the (0, 1) direction
    was higher than the extra error cost of 3 sx gates (the rz gates have
    0 error).
  • In test_unroll_only_if_not_gates_in_basis from
    test.python.transpiler.test_preset_passmanagers.TestPresetPassManager
    we no longer run ConsolidateBlocks in the optimization loop so we no
    longer need to add the 2 executions from the init and translation
    stages. The test is updated to count the new peephole pass which is
    the intent of the count check, to check the pass in the optimization
    loop.
  • In test_2q_circuit_5q_backend_v2 from
    test.python.transpiler.test_vf2_post_layout.TestVF2PostLayoutUndirected
    had the same cx gate flipping because the error rate in the original
    layout for the reverse direction was 0.000779905 vs 0.00163587 in the
    original direction. So the new pass was correctly flipping the cx gate
    resulting in a different circuit that vf2 couldn't place anywhere
    better. To fix this the test sets a fixed layout on worse qubits so
    that vf2 will have to place it somewhere better.
  • For test_layout_tokyo_fully_connected_cx_4_3 from
    test.python.transpiler.test_preset_passmanagers.TestFinalLayouts the
    output circuit has a better estimated fidelity (although more gates in
    general). The transpiler output goes from an estimated fidelity of
    0.9526614226294913 before the new pass was used to an estimated fidelity
    of 0.961996188569715 after the new pass is used. This new circuit with a
    better fidelity has a different initial layout set now, so the test
    is updated to use the new layout.

This PR is based on top of #13419 and will need to be rebased after that merges. In the meantime you can view the contents of just this PR by looking at the HEAD commit:

471d199

AI/LLM disclosure

  • I didn't use LLM tooling, or only used it privately.
  • I used the following tool to help write this PR description:
  • I used the following tool to generate or modify code:

This commit adds a new transpiler pass for physical optimization,
TwoQubitPeepholeOptimization. This replaces the use of Collect2qBlocks,
ConsolidateBlocks, and UnitarySynthesis in the optimization stage for
a default pass manager setup. The pass logically works the same way
where it analyzes the dag to get a list of 2q runs, calculates the matrix
of each run, and then synthesizes the matrix and substitutes it inplace.
The distinction this pass makes though is it does this all in a single
pass and also parallelizes the matrix calculation and synthesis steps
because there is no data dependency there.

This new pass is not meant to fully replace the Collect2qBlocks,
ConsolidateBlocks, or UnitarySynthesis passes as those also run in
contexts where we don't have a physical circuit. This is meant instead
to replace their usage in the optimization stage only. Accordingly this
new pass also changes the logic on how we select the synthesis to use
and when to make a substituion. Previously this logic was primarily done
via the ConsolidateBlocks pass by only consolidating to a UnitaryGate if
the number of basis gates needed based on the weyl chamber coordinates
was less than the number of 2q gates in the block (see Qiskit#11659 for
discussion on this). Since this new pass skips the explicit
consolidation stage we go ahead and try all the available synthesizers

Right now this commit has a number of limitations, the largest are:

- Only supports the target
- It doesn't support any synthesizers besides the TwoQubitBasisDecomposer,
  because it's the only one in rust currently.

For plugin handling I left the logic as running the three pass series,
but I'm not sure this is the behavior we want. We could say keep the
synthesis plugins for `UnitarySynthesis` only and then rely on our
built-in methods for physical optimiztion only. But this also seems less
than ideal because the plugin mechanism is how we support synthesizing
to custom basis gates, and also more advanced approximate synthesis
methods. Both of those are things we need to do as part of the synthesis
here.

Additionally, this is currently missing tests and documentation and while
running it manually "works" as in it returns a circuit that looks valid,
I've not done any validation yet. This also likely will need several
rounds of performance optimization and tuning. t this point this is
just a rough proof of concept and will need a lof refinement along with
larger changes to Qiskit's rust code before this is ready to merge.

Fixes Qiskit#12007
Fixes Qiskit#11659
Since Qiskit#13139 merged we have another two qubit decomposer available to
run in rust, the TwoQubitControlledUDecomposer. This commit updates the
new TwoQubitPeepholeOptimization to call this decomposer if the target
supports appropriate 2q gates.
Clippy is correctly warning that the size difference between the two
decomposer types in the TwoQubitDecomposer enumese two types is large.
TwoQubitBasisDecomposer is 1640 bytes and TwoQubitControlledUDecomposer
is only 24 bytes. This means each element of ControlledU is wasting
> 1600 bytes. However, in this case that is acceptable in order to
avoid a layer of pointer indirection as these are stored temporarily
in a vec inside a thread to decompose a unitary. A trait would be more
natural for this to define a common interface between all the two qubit
decomposers but since we keep them instantiated for each edge in a Vec
they need to be sized and doing something like
`Box<dyn TwoQubitDecomposer>` (assuming a trait `TwoQubitDecomposer`
instead of a enum) to get around this would have additional runtime
overhead. This is also considering that TwoQubitControlledUDecomposer
has far less likelihood in practice as it only works with some targets
that have RZZ, RXX, RYY, or RZX gates on an edge which is less common.
Also don't run scoring more than needed.
The priority for the two qubit peephole pass should be decreasing the 2q
gate count. The error rate heuristic should only matter if the 2q counts
are the same. This commit flips the heuristic to first check the 2q gate
count so the first priority is reducing the 2q gate count.
This commit removes the unitary synthesis plugin mechanism from the
pass. This was a layer violation to support this when the pass logic
doesn't actually support using the plugin interface. It is easier and
more clear that if the plugin interface usage is desired to handle that
in the pass manager construction rather than have this pass internally
build a pass manager and execute other passes to emulate behavior it
doesn't have.
There were two issues identified by the testing which required fixing
and adjusting the tests based on limitations in the pass. The first
issue was the parameters for the target gate was not handled correctly.
In the case of using the Controlled U decomposer we were not passing the
computed parameter value correctly to the output circuit and instead the
ParameterExpression from the target was being used. Then in the case of
controlled gates (not supercontrolled) that had a fixed angle that are
normally intended for the xx decomposer were incorrectly being passed to
the TwoQubitBasisDecomposer which can't work with them. This was
resulting in invalid circuit outputs. The use of the
TwoQubitBasisDecomposer is now correctly filtering to only be run with
supercontrolled gates. The tests were adjusted for this limitation
because they were mostly copied from the UnitarySynthesis tests which
supports xx decomposer.
mtreinish added 18 commits April 2, 2026 08:57
We don't want to spend time reconstructing an exact copy of the dag if
there are no substituions needed. Prior to using a vec for tracking the
run indices that nodes are part of we would check if that map was empty.
The vec is always populated and to determine if there are no entries
we'd have to do a worse case O(n) lookup to determine if any entries are
set. To avoid that this overhead but keeping the check this adds an atomic
bool that is used to track whether we've substituted any blocks. If this
is not set to true we can just exit early since there are no
substitutions to make.
Previously there was a mismatch between the scoring of synthesis
results and the peephole pass's comparison with the original block. The
pass is documented as using the tuple (num_2q_gates, error, num_gates)
and picking the min of all the choices. But, when we called the unitary
synthesis function that selects the best synthesis outcome it was
maximizing the estimated fidelity but not considering the gate counts
like the pass is documented as doing. This corrects this mismatch by
updating the function doing the synthesis to be generic on score type
and taking a scorer callback. This lets the peephole pass control the
heuristic used for selecting the best score.
In testing the pass in the full pass manager there is an underlying
issue with Python defined gates in the circuit and the GIL handling. In
the presence of those gates the thread doing the synthesis and analysis
will need the GIL to get the matrix of the Python gate. But in the
previous version of this pass the parent thread retained the GIL while
the parallel workers ran. This would cause a deadlock because the
worker threads would never be able to acquire the GIL when they tried to
do so during the execution of the synthesis. This commit attempts to fix
this by splitting out the parallel portion from the serial portion of
the function. The serial portion rebuilds the dag from the analysis
results and needs the GIL to copy any python operations in the circuit
as it's rebuilding. So we re-attach the GIL prior to running the serial
portion.

The unitary synthesis decomposer handling needed to be updated as well
because there was implicit usage of the GIL from the py-clone feature
around handling custom rxx equivalent gates for the controlled u
decomposer. This was not correct in a threaded context where the GIL
might be released and would cause a panic. This is updated to explicitly
attach to the python interpreter and handle the python copy with the py
token explicitly. There were places already doing this in the decomposer
handling code but because of the py-clone feature the clone() calls were
missed.
Following on from Qiskit#13419 which added a new optimization pass
TwoQubitPeepholeOptimization which was designed to replace the
pair of ConsolidateBlocks and UnitarySynthesis for the optimization
stage after we have a physical circuit. That PR however did not update
the preset pass managers to concentrate the review on just adding the
new pass. This continues off from there by updating the preset pass
managers to use the new pass in optimization levels 2 and 3 replacing
those levels' optimization stage's previous usage of ConsolidateBlocks
and UnitarySynthesis to achieve the same goal. This should result in
both a runtime performance and transpilation quality improvement as the
new pass is both faster and should produce better fidelity circuits
than the previous peephole optimization.

The tests updates that are made in this PR are because the peephole
optimization is changing the transpilation output of various test
circuits. These were all verified to be valid outputs and in all cases
a "better" output than before. Specifically, for the tests updated
these were the changes in output and why they occurred:

* The two tests in
  test.python.circuit.test_scheduled_circuit.TestScheduledCircuit were
  the single CX gate in the output circuit was flipped from (0, 1) to
  (1, 0) because in the target the error rate for the (0, 1) direction
  was higher than the extra error cost of 3 sx gates (the rz gates have
  0 error).
* In test_unroll_only_if_not_gates_in_basis from
  test.python.transpiler.test_preset_passmanagers.TestPresetPassManager
  we no longer run ConsolidateBlocks in the optimization loop so we no
  longer need to add the 2 executions from the init and translation
  stages. The test is updated to count the new peephole pass which is
  the intent of the count check, to check the pass in the optimization
  loop.
* In test_2q_circuit_5q_backend_v2 from
  test.python.transpiler.test_vf2_post_layout.TestVF2PostLayoutUndirected
  had the same cx gate flipping because the error rate in the original
  layout for the reverse direction was 0.000779905 vs 0.00163587 in the
  original direction. So the new pass was correctly flipping the cx gate
  resulting in a different circuit that vf2 couldn't place anywhere
  better. To fix this the test sets a fixed layout on worse qubits so
  that vf2 will have to place it somewhere better.
* For test_layout_tokyo_fully_connected_cx_4_3 from
  test.python.transpiler.test_preset_passmanagers.TestFinalLayouts the
  output circuit has a better estimated fidelity (although more gates in
  general). The transpiler output goes from an estimated fidelity of
  0.9526614226294913 before the new pass was used to an estimated fidelity
  of 0.961996188569715 after the new pass is used. This new circuit with a
  better fidelity has a different initial layout set now, so the test
  is updated to use the new layout.
@mtreinish mtreinish added this to the 2.5.0 milestone May 4, 2026
@mtreinish mtreinish requested a review from a team as a code owner May 4, 2026 22:10
@mtreinish mtreinish requested a review from raynelfss May 4, 2026 22:10
@mtreinish mtreinish added on hold Can not fix yet performance Changelog: Performance Performance improvements without API and semantic changes. labels May 4, 2026
@github-project-automation github-project-automation Bot moved this to Ready in Qiskit 2.5 May 4, 2026
@qiskit-bot
Copy link
Copy Markdown
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core
  • @levbishop

@coveralls
Copy link
Copy Markdown

Coverage Report for CI Build 25346270286

Coverage increased (+0.01%) to 87.572%

Details

  • Coverage increased (+0.01%) from the base build.
  • Patch coverage: 35 uncovered changes across 4 files (306 of 341 lines covered, 89.74%).
  • 8 coverage regressions across 5 files.

Uncovered Changes

File Changed Covered %
crates/transpiler/src/passes/two_qubit_peephole.rs 239 210 87.87%
crates/synthesis/src/two_qubit_decompose/basis_decomposer.rs 7 4 57.14%
crates/transpiler/src/passes/unitary_synthesis/mod.rs 53 51 96.23%
crates/pyext/src/lib.rs 4 3 75.0%

Coverage Regressions

8 previously-covered lines in 5 files lost coverage.

File Lines Losing Coverage Coverage
crates/qasm2/src/lex.rs 3 92.8%
crates/circuit/src/parameter/symbol_expr.rs 2 74.01%
crates/circuit/src/dag_circuit.rs 1 84.7%
crates/circuit/src/parameter/parameter_expression.rs 1 90.53%
crates/synthesis/src/euler_one_qubit_decomposer.rs 1 91.46%

Coverage Stats

Coverage Status
Relevant Lines: 122105
Covered Lines: 106930
Line Coverage: 87.57%
Coverage Strength: 974128.97 hits per line

💛 - Coveralls

@mtreinish
Copy link
Copy Markdown
Member Author

Hmm, I ran asv transpile runtime benchmarks on this PR and it is showing a regression on the hwb benchmarks which is unexpected. I'll have to do some profiling of why the pass is slower, it does potentially do more work to get the better quality results (it always synthesizes the 2q block's unitary), but in my earlier testing this was always offset by the use of parallelism which resulted in a ~2x speedup for transpilation. It's also showing a speedup in places I wouldn't have expected because they're smaller circuits and I'd expect the extra overhead of the parallelism to limit the speedup.

Benchmarks that have improved:

| Change   | Before [ba656ec3]    | After [471d1995]    |   Ratio | Benchmark (Parameter)                                                    |
|----------|----------------------|---------------------|---------|--------------------------------------------------------------------------|
| -        | 13.8±0.1ms           | 12.3±0.08ms         |    0.89 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(2) |
| -        | 28.9±0.5ms           | 25.1±0.4ms          |    0.87 | utility_scale.UtilityScaleBenchmarks.time_bv_100('cx')                   |

Benchmarks that have stayed the same:

| Change   | Before [ba656ec3]    | After [471d1995]    |   Ratio | Benchmark (Parameter)                                                                           |
|----------|----------------------|---------------------|---------|-------------------------------------------------------------------------------------------------|
|          | 11.2±0.06ms          | 12.1±0.09ms         |    1.08 | utility_scale.UtilityScaleBenchmarks.time_circSU2('cx')                                         |
|          | 103±0.6ms            | 109±0.6ms           |    1.06 | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                                            |
|          | 26.0±0.2ms           | 27.3±0.4ms          |    1.05 | utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr')                                         |
|          | 12.6±0.1ms           | 13.1±0.2ms          |    1.04 | utility_scale.UtilityScaleBenchmarks.time_circSU2('ecr')                                        |
|          | 123±0.8ms            | 128±0.9ms           |    1.04 | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')                                           |
|          | 13.2±0.1ms           | 13.5±0.1ms          |    1.03 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr')                   |
|          | 321±0.6ms            | 330±0.9ms           |    1.03 | utility_scale.UtilityScaleBenchmarks.time_qv('ecr')                                             |
|          | 5.24±0.1ms           | 5.35±0.2ms          |    1.02 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(0)                        |
|          | 12.5±0.04ms          | 12.8±0.1ms          |    1.02 | utility_scale.UtilityScaleBenchmarks.time_circSU2('cz')                                         |
|          | 2.87±0.04s           | 2.93±0.06s          |    1.02 | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('cz')                                      |
|          | 218±4ms              | 223±1ms             |    1.02 | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('cx')                                     |
|          | 3.76±0.05ms          | 3.82±0.06ms         |    1.02 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')                                 |
|          | 3.75±0.05ms          | 3.82±0.01ms         |    1.02 | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')                                 |
|          | 13.3±0.2ms           | 13.5±0.08ms         |    1.02 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')                    |
|          | 13.3±0.1ms           | 13.5±0.09ms         |    1.02 | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')                    |
|          | 3.01±0.02ms          | 3.03±0.03ms         |    1.01 | utility_scale.UtilityScaleBenchmarks.time_bvlike('cx')                                          |
|          | 12.4±0s              | 12.5±0s             |    1.01 | utility_scale.UtilityScaleBenchmarks.time_hwb12('cz')                                           |
|          | 219±3ms              | 222±1ms             |    1.01 | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('cz')                                     |
|          | 218±3ms              | 221±0.9ms           |    1.01 | utility_scale.UtilityScaleBenchmarks.time_parse_hwb12('ecr')                                    |
|          | 40.9±0.4ms           | 41.1±0.2ms          |    1.01 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                                  |
|          | 40.8±0.2ms           | 41.1±0.07ms         |    1.01 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                                  |
|          | 40.9±0.4ms           | 41.2±0.2ms          |    1.01 | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')                                 |
|          | 294±0.7ms            | 296±1ms             |    1.01 | utility_scale.UtilityScaleBenchmarks.time_qv('cx')                                              |
|          | 35.7±0.1ms           | 36.1±0.2ms          |    1.01 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')                               |
|          | 7.72±0.06ms          | 7.73±0.05ms         |    1    | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(1)                        |
|          | 3.01±0.02ms          | 3.02±0.02ms         |    1    | utility_scale.UtilityScaleBenchmarks.time_bvlike('cz')                                          |
|          | 3.01±0.02ms          | 3.02±0.02ms         |    1    | utility_scale.UtilityScaleBenchmarks.time_bvlike('ecr')                                         |
|          | 2.83±0.06s           | 2.82±0.08s          |    1    | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('cx')                                      |
|          | 3.77±0.09ms          | 3.78±0.02ms         |    1    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')                                |
|          | 128±0.4ms            | 129±0.8ms           |    1    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')                                            |
|          | 202±0.6ms            | 202±0.3ms           |    1    | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                                            |
|          | 3.55±0.03ms          | 3.51±0.01ms         |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(0)                   |
|          | 5.41±0.03ms          | 5.34±0.03ms         |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(1)                   |
|          | 4.85±0.02ms          | 4.79±0.02ms         |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(2)                   |
|          | 5.01±0.01ms          | 4.97±0.03ms         |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(3)                   |
|          | 13.4±0.05ms          | 13.2±0.1ms          |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(0) |
|          | 17.6±0.06ms          | 17.5±0.08ms         |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(1) |
|          | 15.5±0.2ms           | 15.3±0.2ms          |    0.99 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3)                        |
|          | 40.8±0.2ms           | 40.4±0.5ms          |    0.99 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')                              |
|          | 3.62±0.02ms          | 3.54±0.02ms         |    0.98 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(2) |
|          | 26.5±0.1ms           | 25.9±0.3ms          |    0.98 | utility_scale.UtilityScaleBenchmarks.time_bv_100('cz')                                          |
|          | 2.86±0.07s           | 2.81±0.07s          |    0.98 | utility_scale.UtilityScaleBenchmarks.time_circSU2_89('ecr')                                     |
|          | 3.75±0.02ms          | 3.65±0.01ms         |    0.97 | transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(3) |
|          | 315±1ms              | 304±2ms             |    0.96 | utility_scale.UtilityScaleBenchmarks.time_qv('cz')                                              |
|          | 192±0.7ms            | 180±1ms             |    0.94 | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                                             |
|          | 40.2±0.5ms           | 37.6±0.2ms          |    0.93 | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')                               |

Benchmarks that have got worse:

| Change   | Before [ba656ec3]    | After [471d1995]    |   Ratio | Benchmark (Parameter)                                  |
|----------|----------------------|---------------------|---------|--------------------------------------------------------|
| +        | 9.12±0.02s           | 13.5±0.01s          |    1.48 | utility_scale.UtilityScaleBenchmarks.time_hwb12('cx')  |
| +        | 160±0.5ms            | 184±0.4ms           |    1.15 | utility_scale.UtilityScaleBenchmarks.time_qft('cx')    |
| +        | 11.8±0s              | 13.1±0s             |    1.11 | utility_scale.UtilityScaleBenchmarks.time_hwb12('ecr') |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

@mtreinish
Copy link
Copy Markdown
Member Author

Interesting looking at that example in particular (with a different seed and backends, I modified the pgo script instead of using the asv benchmark directly) the pass is significantly faster. It takes about 600ms to run while the pair of ConsolidateBlocks and UnitarySynthesis takes about 1500ms. However, this speedup is offset by the remaining passes running more slowly. For example, commutative cancellation takes about 2 seconds and it was 1.4 sec before (there are 3 loop iterations). The other thing that is suspicious is on the first iteration of the optimization loop the basis translator is being run to correct something outside the basis while it wasn't being run before. Overall the new pass is faster but it's causing more work for the later passes. I'll do more digging to better understand why this is happening in this particular case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Changelog: Performance Performance improvements without API and semantic changes. on hold Can not fix yet performance

Projects

Status: Ready

Development

Successfully merging this pull request may close these issues.

4 participants