DensePauli class in Rust #16137

Open
alexanderivrii wants to merge 1 commit into Qiskit:main from alexanderivrii:rust-dense-pauli

Conversation

@alexanderivrii
Member

Summary

This PR adds a DensePauli class in Rust.

The DensePauli class is required for extending the Litinski transform to support Pauli rotations (see #15974); in fact, the current implementation is initially based on the implementation there. The plan is to rebase #15974 on top of this PR once this merges. The DensePauli class is also required for other ongoing work extending the Rust-based StabilizerTableau (Clifford) simulation to support projective Pauli measurements. Eventually, it could also become a viable alternative to the Python Pauli class, replacing it with a much faster Rust implementation.

Implementation-wise, DensePauli is as follows:

pub struct DensePauli {
    /// x-component
    pub pauli_x: FixedBitSet,
    /// z-component
    pub pauli_z: FixedBitSet,
    /// xz-phase
    pub xz_phase: u8,
}

The X- and Z-Pauli components are stored as FixedBitSets, allowing bit-packing and word-level operations and leading to very fast commutativity and multiplication methods.

Additionally, this PR implements a function evolve_pauli_by_clifford to evolve a dense Pauli by a Clifford.

Co-authored with Shelly Garion.

AI/LLM disclosure

  • I didn't use LLM tooling, or only used it privately.
  • I used the following tool to help write this PR description:
  • I used the following tool to generate or modify code:

The dense Pauli class is implemented using FixedBitSet, allowing bit-packing and
word-level operations.

Co-authored-by: Shelly Garion <shelly@il.ibm.com>
@alexanderivrii alexanderivrii added this to the 2.5.0 milestone May 5, 2026
@alexanderivrii alexanderivrii requested a review from a team as a code owner May 5, 2026 08:53
@alexanderivrii alexanderivrii requested a review from Cryoris May 5, 2026 08:53
@alexanderivrii alexanderivrii added Changelog: None Do not include in the GitHub Release changelog. fault tolerance related to fault tolerance compilation labels May 5, 2026
@github-project-automation github-project-automation Bot moved this to Ready in Qiskit 2.5 May 5, 2026
@qiskit-bot
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

///
/// This function will panic if the two Paulis are of different length.
pub fn commutes(&self, other: &DensePauli) -> bool {
debug_assert!(self.num_qubits() == other.num_qubits());
Member Author

Should I improve error-handling from debug asserts to returning a Result? Though ideally I want this particular function to be as fast as possible and would prefer to avoid doing additional checks.

Collaborator

I left a comment on this below, that should answer this question 🙂

Comment on lines +302 to +304
for i in 0..num_qubits {
parity ^= (self.pauli_z[i] & other.pauli_x[i]) ^ (self.pauli_x[i] & other.pauli_z[i]);
}
Member Author

I am wondering if there might be a better way to iterate over blocks of the FixedBitSet than over individual bits. I am planning to explore this next.

One thing that I already tried was something like this

parity = (&(&self.pauli_z & &other.pauli_x) ^ &(&self.pauli_x & &other.pauli_z)).count_ones(..)

However, that was significantly slower because it creates new FixedBitSets in the process.
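For what it's worth, a block-wise version of this parity loop can be written without allocating temporaries by operating on packed words directly. The sketch below uses plain u64 slices rather than FixedBitSet (whose block-access API I'm not assuming here), so it only illustrates the shape such a loop could take:

```rust
/// Symplectic-parity check over packed 64-bit words. Two Paulis commute
/// iff the number of qubits where they locally anticommute is even.
fn commutes_packed(x1: &[u64], z1: &[u64], x2: &[u64], z2: &[u64]) -> bool {
    let mut parity = 0u32;
    for i in 0..x1.len() {
        // One AND/XOR per 64 qubits; no temporary bitsets are allocated,
        // unlike the whole-bitset `&`/`^` operators.
        parity ^= ((z1[i] & x2[i]) ^ (x1[i] & z2[i])).count_ones() & 1;
    }
    parity == 0
}

fn main() {
    // X and Z on the same qubit anticommute...
    assert!(!commutes_packed(&[0b01], &[0b00], &[0b00], &[0b01]));
    // ...but X on qubit 0 and Z on qubit 1 commute.
    assert!(commutes_packed(&[0b01], &[0b00], &[0b00], &[0b10]));
    println!("ok");
}
```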

Comment on lines +210 to +214
/// .. note::
///
/// Unlike Python-space Qiskit convention, the label is represented left-to-right,
/// for example "-iXIZY" is interpreted as `'X'` on qubit `0`, followed by `'I'` on qubit `1`,
/// and so on.
Member Author

Personally, I always get confused by the Qiskit Python convention of reading Pauli terms right-to-left, so I have made this and the from_label functions treat labels as left-to-right. However, if you think it is better to keep the notation consistent throughout Qiskit, I can change this.

Comment on lines 387 to 391
fn compute_phase_product_pauli(
    clifford: &Clifford,
    pauli_indices: &[usize],
-    pauli_y_count: u32,
+    pauli_y_count: u8,
) -> bool {
Member Author

This question is independent of this PR, but I might as well ask it here. This function is the hot-spot for evolving Paulis by Cliffords, and in particular for the LitinskiTransformation pass. I have locally tried multiple ways to reimplement it.

The following version replaces the match statement by a static table lookup.

static PHASE_TABLE: [u8; 16] = [0, 0, 0, 0, 0, 0, 3, 1, 0, 1, 0, 3, 0, 3, 1, 0];
...
let idx = (x1 as u8) | ((z1 as u8) << 1) | ((x as u8) << 2) | ((z as u8) << 3);
ifact += PHASE_TABLE[idx as usize];

It works very well for large densely populated Cliffords (resulting in about 5x performance improvement) but is apparently slightly worse than the current implementation on the existing ASV benchmarks.

I have also tried replacing the match or the table lookup by an explicit arithmetic computation, but this was consistently worse than the table lookup on all of the benchmarks.

I am wondering if there is a clever way to vectorize this computation - or any additional ideas.
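As a self-contained illustration of the match-versus-table pattern (not the PR's actual phase rule, whose semantics aren't shown here), below is the analogous table for the phase exponent k, with P1·P2 = i^k·P3, of the product of two single-qubit Paulis given as (x, z) pairs, using the same bit-packed indexing as the snippet above:

```rust
/// Phase exponent k (in i^k) for the product of two single-qubit Paulis,
/// written as an explicit match. (x, z): (0,0)=I, (1,0)=X, (0,1)=Z, (1,1)=Y.
fn product_phase_match(x1: bool, z1: bool, x2: bool, z2: bool) -> u8 {
    match (x1, z1, x2, z2) {
        (false, true, true, false) => 1, // Z * X = iY
        (true, true, true, false) => 3,  // Y * X = -iZ
        (true, false, false, true) => 3, // X * Z = -iY
        (true, true, false, true) => 1,  // Y * Z = iX
        (true, false, true, true) => 1,  // X * Y = iZ
        (false, true, true, true) => 3,  // Z * Y = -iX
        _ => 0,                          // identity factors or equal Paulis
    }
}

/// The same function as a static table, indexed exactly like the PR snippet:
/// idx = x1 | z1 << 1 | x2 << 2 | z2 << 3.
static PRODUCT_PHASE: [u8; 16] = [0, 0, 0, 0, 0, 0, 1, 3, 0, 3, 0, 1, 0, 1, 3, 0];

fn main() {
    for idx in 0..16usize {
        let (x1, z1) = (idx & 1 != 0, idx & 2 != 0);
        let (x2, z2) = (idx & 4 != 0, idx & 8 != 0);
        assert_eq!(product_phase_match(x1, z1, x2, z2), PRODUCT_PHASE[idx]);
    }
    println!("ok");
}
```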

Collaborator

@Cryoris Cryoris left a comment

Having a dedicated Pauli class sounds like it can be very useful, and the implementation looks good to me overall. I've left some questions below and have some other, higher-level questions on top:

  • Why was the final choice FixedBitSet? Do we have benchmarking data backing this up?
  • It would be good if we actually use these new features somewhere. E.g. the existing Litinski transform could already use it, which would help identifying any performance regressions.
  • In several places we've documented that a function would panic, e.g. if the Pauli and Clifford don't have the same size, but there's only a debug_assert in the function. Afaik, this doesn't actually panic if compiled in release mode. We should either replace this by a proper error, which the user can handle, or, if you're worried about the overhead, provide unsafe variants that require the sizes to be correct.

z: &[bool],
indices: &[u32],
phase: u8,
is_group_phase: bool,
Collaborator

Not sure I understand the is_group_phase -- what exactly does it do and what do we use it for?

Member Author

There are two standard conventions for representing the phase of a single-qubit Pauli.

The "group phase" convention:

  • the (x, z) pair (false, false) corresponds to I
  • the (x, z) pair (true, false) corresponds to X
  • the (x, z) pair (false, true) corresponds to Z
  • the (x, z) pair (true, true) corresponds to Y

The "xz-phase" convention:

  • the (x, z) pair (false, false) corresponds to I
  • the (x, z) pair (true, false) corresponds to X
  • the (x, z) pair (false, true) corresponds to Z
  • the (x, z) pair (true, true) corresponds to XZ = -iY

It is faster to check whether two Paulis commute or to multiply two Paulis using XZ-phase convention. However, when we call this function we might want to pass the group phase instead of the xz-phase.
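Assuming the phase exponent counts powers of -i (one common choice; the PR's exact convention isn't stated in this thread), the two conventions differ only by the number of Y terms, since each Y = i·XZ contributes one factor when converting. A hypothetical conversion sketch:

```rust
/// Hypothetical conversion between the two conventions, assuming the phase
/// is stored as an exponent of (-i) and `y_count` is the number of qubits
/// with (x, z) = (true, true). Since XZ = -iY, the stored product of
/// X^x Z^z terms carries one extra factor of (-i) per Y.
fn xz_to_group(xz_phase: u8, y_count: u32) -> u8 {
    ((xz_phase as u32 + y_count) % 4) as u8
}

fn group_to_xz(group_phase: u8, y_count: u32) -> u8 {
    ((group_phase as u32 + 4 - y_count % 4) % 4) as u8
}

fn main() {
    // Plain Y (group phase 0) is stored as (-i)^3 * XZ, i.e. xz_phase = 3.
    assert_eq!(group_to_xz(0, 1), 3);
    assert_eq!(xz_to_group(3, 1), 0);
    // The two conversions are inverses for any phase and Y count.
    for g in 0u8..4 {
        for y in 0u32..8 {
            assert_eq!(xz_to_group(group_to_xz(g, y), y), g);
        }
    }
    println!("ok");
}
```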

Member

we also have this convention of zx_phase and group_phase in the python code

Comment on lines +60 to +61
x: &[bool],
z: &[bool],
Collaborator

Could we use a consistent z-x order of arguments throughout? E.g. evolve_by_... takes (z, x) but this is (x, z). Since both have the same type, it seems one could easily mix up the orders. I don't have a strong preference on the order as long as it's consistent, but the quantum_info module seems to consistently use (z, x).

Member Author

That's a good point. I will double check whether the standard name is "xz-phase" or "zx-phase" (I have probably seen both used in different packages).

/// Evolving a Pauli with a single non-identity Z-term on qubit `qbit` by the given Clifford.
///
/// Return the evolved Pauli in a sparse ZX format: (sign, z, x, indices).
pub fn get_inverse_z(&self, qbit: usize) -> (bool, Vec<bool>, Vec<bool>, Vec<u32>) {
Collaborator

Do we still need this implementation, given that we now have evolve_single_qubit_pauli_dense?

It would be good to update the existing code paths to use the new functionality, also from a testing POV (since the new function is not used anywhere yet, right?).

Member Author

This is the "sparse" variant of the same function. From my local experiments and an offline discussion with you, we might need both variants. Applying Litinski to a single-qubit RZ-rotation is faster with the sparse format. Applying on a multi-qubit PPR is faster with a dense format.

self.tableau[qbit + num_qubits][i + num_qubits]
^ self.tableau[qbit][i + num_qubits],
),
_ => unreachable!("This is only called for RX/RZ/RY gates."),
Collaborator

This is not unreachable in the current form. It's a pub function, and calling it with pauli_z=false, pauli_x=false is a valid input.

Comment on lines +239 to +241
0 => {
s = String::from("") + &s;
}
Collaborator

We can avoid the copy here by just return s; directly in this case

Member Author

Oh, silly me :).

Comment on lines +295 to +297
/// Panics
///
/// This function will panic if the two Paulis are of different length.
Collaborator

I don't think this is actually correct -- if we compile in release mode, then the debug_asserts are not included and this would not panic but pass silently if self.num_qubits < other.num_qubits. Could we replace the debug assert with an actual check and an error that we can handle?

Member Author

As you have seen from one of the comments, I am not happy with the debug asserts either. Possibly the additional check will not be too costly. Following the existing sparse Pauli class, we can provide both checked and unchecked methods if we think this is performance-critical.
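One possible shape for that checked/unchecked split, sketched on a hypothetical stand-in type (ToyPauli and the String error are placeholders, not the PR's actual types):

```rust
/// Hypothetical stand-in for DensePauli, just enough to show the split
/// between a checked API and an unchecked hot-path variant.
struct ToyPauli {
    x: Vec<u64>,
    z: Vec<u64>,
    num_qubits: usize,
}

impl ToyPauli {
    /// Checked variant: validates sizes and returns an error the caller can
    /// handle, in release builds too. (A real implementation would use
    /// Qiskit's error types rather than String.)
    fn commutes(&self, other: &ToyPauli) -> Result<bool, String> {
        if self.num_qubits != other.num_qubits {
            return Err(format!(
                "size mismatch: {} vs {} qubits",
                self.num_qubits, other.num_qubits
            ));
        }
        Ok(self.commutes_unchecked(other))
    }

    /// Unchecked variant for hot loops: the caller guarantees equal sizes,
    /// and the debug_assert only fires in debug builds.
    fn commutes_unchecked(&self, other: &ToyPauli) -> bool {
        debug_assert_eq!(self.num_qubits, other.num_qubits);
        let mut parity = 0u32;
        for i in 0..self.x.len() {
            parity ^=
                ((self.z[i] & other.x[i]) ^ (self.x[i] & other.z[i])).count_ones() & 1;
        }
        parity == 0
    }
}

fn main() {
    let a = ToyPauli { x: vec![0b01], z: vec![0b00], num_qubits: 2 }; // X on qubit 0
    let b = ToyPauli { x: vec![0b00], z: vec![0b01], num_qubits: 2 }; // Z on qubit 0
    assert_eq!(a.commutes(&b), Ok(false)); // X and Z on the same qubit anticommute
    let c = ToyPauli { x: vec![0, 0], z: vec![0, 0], num_qubits: 128 };
    assert!(a.commutes(&c).is_err()); // mismatch surfaces as a real error
    println!("ok");
}
```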

Comment on lines +66 to +73
debug_assert!(
x.len() == indices.len(),
"x and indices must have the same length"
);
debug_assert!(
z.len() == indices.len(),
"z and indices must have the same length"
);
Collaborator

Can we replace this with a proper error that is returned?

xz_phase = xz_phase.wrapping_add(1) & 3;
}
_ => {
panic!("Incorrect label");
Collaborator

Also here, could we use an error? 🙂

Member Author

You convinced me, I should provide proper error handling throughout the code.

s = String::from("-i") + &s;
}
_ => {
panic!("we should never get this")
Collaborator

Maybe a bit more descriptive would be nice

Suggested change
panic!("we should never get this")
panic!("DensePauli phases are constrained to {0, 1, 2, 3}.")

/// This function will panic if the Clifford and the Pauli do not have the
/// same number of qubits.
pub fn evolve_pauli_by_clifford(pauli: &DensePauli, cliff: &Clifford) -> DensePauli {
debug_assert!(pauli.num_qubits() == cliff.num_qubits);
Collaborator

Same here, this does not get compiled into the program in release mode. If we want an actual check then we should just return an error if this fails.

@coveralls

Coverage Report for CI Build 25366950547

Coverage increased (+0.03%) to 87.604%

Details

  • Coverage increased (+0.03%) from the base build.
  • Patch coverage: 40 uncovered changes across 2 files (405 of 445 lines covered, 91.01%).
  • 7 coverage regressions across 3 files.

Uncovered Changes

File Changed Covered %
crates/quantum_info/src/dense_pauli.rs 385 346 89.87%
crates/quantum_info/src/clifford.rs 60 59 98.33%

Coverage Regressions

7 previously-covered lines in 3 files lost coverage.

File Lines Losing Coverage Coverage
crates/qasm2/src/lex.rs 4 92.54%
crates/circuit/src/parameter/symbol_expr.rs 2 74.17%
crates/circuit/src/parameter/parameter_expression.rs 1 90.53%

Coverage Stats

Coverage Status
Relevant Lines: 122237
Covered Lines: 107084
Line Coverage: 87.6%
Coverage Strength: 956486.23 hits per line

💛 - Coveralls

@alexanderivrii
Member Author

Julien, your additional questions:

Why was the final choice FixedBitSet? Do we have benchmarking data backing this up?

There are multiple questions here. First, do we need a "dense" Pauli class in addition to the sparse Pauli class we already have? I think we do. Second, what is the best way to implement the "dense" class? Checking commutativity/multiplying two Paulis represented using FixedBitSet is much faster than when represented using Vec<bool>, I can pull some older benchmarking data.

It would be good if we actually use these new features somewhere. E.g. the existing Litinski transform could already use it, which would help identifying any performance regressions.

I am not sure I fully agree with this. Having a single PR that adds a DensePauli class, changes the LitinskiTransform to use this class, and further extends the LitinskiTransform to handle Clifford-like PPR gates (as per the discussion on #15974) is just too much of a change. I think a more appropriate division is to have a working DensePauli class as a separate PR. But I am adding more Rust tests to make sure the functionality works as expected.

@Cryoris
Collaborator

Cryoris commented May 5, 2026

It would be good if we actually use these new features somewhere. E.g. the existing Litinski transform could already use it, which would help identifying any performance regressions.

I am not sure I fully agree with this. Having a single PR that adds a DensePauli class, changes the LitinskiTransform to use this class, and further extends the LitinskiTransform to handle Clifford-like PPR gates (as per the discussion on #15974) is just too much of a change. I think a more appropriate division is to have a working DensePauli class as a separate PR. But I am adding more Rust tests to make sure the functionality works as expected.

The main reason I'm suggesting this is to track the performance impact. If you integrated it locally or, even better, could open a follow-up PR that uses this, and could run asv or some benchmarks, that would also be enough 🙂

pub fn count_y(&self) -> u8 {
let num_qubits = self.num_qubits();
let mut cnt_y = 0;
for i in 0..num_qubits {
Member

why did you change it to a for loop instead of an iterator?

Member Author

I have experimented with multiple versions of each function, so this is possibly just the last version I tried (but I don't think that this had any effect on performance). As I mentioned somewhere, I am planning to see if we can speed this up by iterating over blocks of the FixedBitSet rather than individual bits: this may give a performance benefit if the compiler does not already optimize this for us.


debug_assert!(self.num_qubits() == other.num_qubits());
let mut xz_phase = self.xz_phase + other.xz_phase;
let num_qubits = self.num_qubits();
for i in 0..num_qubits {
Member

why is there a for loop here and not an iterator?

debug_assert!(self.num_qubits() == other.num_qubits());
let num_qubits = self.num_qubits();
self.xz_phase += other.xz_phase;
for i in 0..num_qubits {
Member

same question on the for loop

fn test_evolve_2_qubits() {
use ndarray::Array2;

// Random Clifford created from Python (with seed=1234).
Member

generated using the Python code:
random_clifford(2, seed=1234)

5 participants