Improve 2q block collection via 1q quaternion-based collection #13649

jakelishman · 2025-01-10T13:47:38Z

Summary

This is a small series of patches, which could be split into 2-3 separate PRs if preferred. There are two main goals:

- introduce a quaternion-based mechanism for working with the $U(2)$ group (1q gates) matrix-free (uses 5 floats, rather than 8 + nd-matrix overhead).
- improve the collection speed of `ConsolidateBlocks`.

This doesn't do everything that could be done for ConsolidateBlocks, but I've stopped at the point where the most natural changes to me now are quite a bit larger.

Using this script:

from qiskit import transpile
from qiskit.converters import circuit_to_dag
from qiskit.circuit import library as lib
from qiskit.transpiler.passes import ConsolidateBlocks

pass_ = ConsolidateBlocks(basis_gates=["rz", "sx", "ecr"], force_consolidate=True)

# Some arbitrary circuit with lots of runs that use both 1q runs and 2q runs
# with gates in both directions (transpile will put them all in one direction).
qc_base = transpile(
    lib.quantum_volume(100, 100, seed=0),
    basis_gates=["rz", "sx", "ecr"],
    seed_transpiler=0,
)
qc = qc_base.copy_empty_like()
flip = False
for inst in qc_base.data:
    if len(inst.qubits) == 2:
        if flip:
            inst = inst.replace(qubits=inst.qubits[::-1])
        flip = not flip
    qc._append(inst)

dags = [circuit_to_dag(qc, copy_operations=False) for _ in [None]*100]
%time for dag in dags: pass_.run(dag)

I see a modest (~10%) improvement in the pass time, going from ~117ms to ~106ms.

Details and comments

Individual commit notes:

Add versor-based representation of 1q gates

1q gates are members of the group U(2), which we can represent as a scalar phase term and a member of SU(2). The members of SU(2) can be represented by versors (also called unit quaternions, but I got tired of typing that all the time...).

This adds a representation of versors and the group action to the Rust code, and ways to convert from matrix-based forms to them.
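As a rough illustration of the decomposition described above, here is a minimal Python sketch (not the Rust code in this PR) that splits a 2x2 unitary into a scalar phase and a versor, and rebuilds the matrix from those 5 floats. The function names and the sign convention (`i`, `j`, `k` mapped to `-iX`, `-iY`, `-iZ`) are assumptions chosen for the example; the actual implementation may use a different convention.

```python
import cmath
import math


def versor_from_matrix(m):
    """Split a 2x2 unitary into (phase, (w, x, y, z)).

    Assumed convention: U = exp(i*phase) * (w*I - i*(x*X + y*Y + z*Z)).
    """
    (a, b), (c, d) = m
    det = a * d - b * c
    # det(U) = exp(2i*phase), so halving the argument gives one valid phase.
    phase = cmath.phase(det) / 2
    scale = cmath.exp(-1j * phase)
    v00, v10 = scale * a, scale * c  # first column of the SU(2) part
    return phase, (v00.real, -v10.imag, v10.real, -v00.imag)


def matrix_from_versor(phase, q):
    """Rebuild the 2x2 unitary from the phase and versor."""
    w, x, y, z = q
    g = cmath.exp(1j * phase)
    return [
        [g * complex(w, -z), g * complex(-y, -x)],
        [g * complex(y, -x), g * complex(w, z)],
    ]


# Round-trip the sqrt(X) gate as a quick check.
SX = [[0.5 + 0.5j, 0.5 - 0.5j], [0.5 - 0.5j, 0.5 + 0.5j]]
sx_phase, sx_versor = versor_from_matrix(SX)
sx_back = matrix_from_versor(sx_phase, sx_versor)
```

The phase is only defined up to a shift of pi with a simultaneous sign flip of the versor, so two different 5-float tuples can describe the same gate.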

This commit introduces nalgebra as a dependency, to use its quaternion logic. This is a relatively heavy dependency, especially for something as simple as quaternions, but some of this is in anticipation of moving more matrix code to the static matrices of nalgebra, rather than the too-dynamic-for-our-needs ones of ndarray; faer also offers static matrices, but its APIs continue to heavily fluctuate between versions, and it requires ever higher MSRVs.

Use quaternions in 1q block collection

Switch the inner algorithm of ConsolidateBlocks to use the quaternion form for single-qubit matrix multiplications. This offered a few percentage-points speedup for the block collection for typical rz-sx-rz-sx-rz-type runs in quantum-volume-like collections.
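To make the idea concrete, the following Python sketch (again, not the PR's Rust code) composes an `rz-sx-rz-sx-rz` run using only phase addition and the Hamilton product, then checks the result against plain 2x2 matrix multiplication. All names here are hypothetical, and the versors for `rz` and `sx` follow an assumed convention `U = exp(i*phase) * (w*I - i*(x*X + y*Y + z*Z))`.

```python
import cmath
import math


def hamilton(p, q):
    """Hamilton product p*q of two quaternions given as (w, x, y, z)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def compose(first, second):
    """Gate `first` followed by `second`, i.e. the matrix M_second @ M_first."""
    (ph1, q1), (ph2, q2) = first, second
    return ph1 + ph2, hamilton(q2, q1)


def rz(theta):
    # rz(theta) = diag(exp(-i*theta/2), exp(i*theta/2)) is already in SU(2).
    return 0.0, (math.cos(theta / 2), 0.0, 0.0, math.sin(theta / 2))


# sx = exp(i*pi/4) * rx(pi/2)
SX = (math.pi / 4, (math.cos(math.pi / 4), math.sin(math.pi / 4), 0.0, 0.0))


def to_matrix(gate):
    phase, (w, x, y, z) = gate
    g = cmath.exp(1j * phase)
    return [
        [g * complex(w, -z), g * complex(-y, -x)],
        [g * complex(y, -x), g * complex(w, z)],
    ]


def matmul2(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


run = [rz(0.3), SX, rz(1.1), SX, rz(-0.7)]

# Versor route: 5 floats per gate, no matrices involved.
total = run[0]
for gate in run[1:]:
    total = compose(total, gate)

# Reference route: ordinary matrix products, for comparison only.
reference = to_matrix(run[0])
for gate in run[1:]:
    reference = matmul2(to_matrix(gate), reference)

max_error = max(
    abs(to_matrix(total)[i][j] - reference[i][j])
    for i in range(2)
    for j in range(2)
)
```

A quaternion product costs 16 real multiplications, against 32 for a naive complex 2x2 matrix product, which is consistent with the modest speedup reported for these runs.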

Avoid unnecessary allocations in qargs lookup

Switch the lookup logic for the qargs in the block collector to determine the qargs ordering without additional heap allocations. This offers another modest (~4%) improvement in collection performance for large runs.

Avoid allocations in simple matrix operations

Producing the Kronecker product of the two single-qubit matrices from the versor representation is trivially calculable, and can be written into an existing allocation. Similarly, switching the qubit order of a 2q matrix involves only six swaps, and does not need to allocate a new matrix if one is already available.
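The two operations in this note can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, it starts from two 2x2 matrices rather than from versors, and nested Python lists merely stand in for a preallocated Rust buffer. Swapping the qubit order exchanges basis states `|01>` and `|10>` (indices 1 and 2) in both rows and columns, which touches twelve entries, hence six swaps.

```python
def kron_into(out, a, b):
    """Write kron(a, b) for two 2x2 matrices into the existing 4x4 `out`."""
    for i in range(2):
        for j in range(2):
            for k in range(2):
                for l in range(2):
                    out[2 * i + k][2 * j + l] = a[i][j] * b[k][l]
    return out


def swap_qubits_in_place(m):
    """Relabel the two qubits of a 4x4 matrix using six in-place swaps."""
    # Rows 0 and 3 are fixed: only their columns 1 and 2 are exchanged.
    m[0][1], m[0][2] = m[0][2], m[0][1]
    m[3][1], m[3][2] = m[3][2], m[3][1]
    # Columns 0 and 3 are fixed: only their rows 1 and 2 are exchanged.
    m[1][0], m[2][0] = m[2][0], m[1][0]
    m[1][3], m[2][3] = m[2][3], m[1][3]
    # The central 2x2 block has both its rows and its columns exchanged.
    m[1][1], m[2][2] = m[2][2], m[1][1]
    m[1][2], m[2][1] = m[2][1], m[1][2]
    return m


a = [[1 + 0j, 2 + 0j], [3 + 0j, 4 + 0j]]
b = [[0 + 1j, 5 + 0j], [6 + 0j, 0 - 7j]]

buffer = [[0j] * 4 for _ in range(4)]  # allocated once, then reused
kron_into(buffer, a, b)
swap_qubits_in_place(buffer)  # buffer now holds kron(b, a)
```

Because relabelling the qubits turns `kron(a, b)` into `kron(b, a)`, the second call gives a cheap consistency check between the two functions.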

There are lots of places remaining in this code where more matrix allocations could be avoided. If nothing else, it should be possible to allocate only three 2q matrices in total, and keep shuffling the labelling of them when doing A.B -> C. ndarray does not make this easy, though; nalgebra and faer both have better interfaces for doing this, but currently all our matrix code in the Operation trait is in terms of ndarray.

qiskit-bot · 2025-01-10T13:47:44Z

One or more of the following people are relevant to this code:

@Qiskit/terra-core

jakelishman · 2025-01-10T13:48:53Z

There's also more places we could make use of the versor representation, like in 1q gate optimisation, but that pass currently involves passing matrices through many different parts of its interface with itself, so it'll be more complicated to modify.

mtreinish

Just some quick high-level comments. I missed the update and don't want these lost by GitHub weirdness. I'll review in more depth later.

mtreinish · 2025-01-10T17:37:55Z

crates/accelerate/src/qi/mod.rs

@@ -0,0 +1,18 @@
+// This code is part of Qiskit.


Should we call this directory quantum_info to match what we do in python?

I don't mind particularly - I read quantum_info in my head as qi most of the time anyway haha. I don't think we necessarily must match Python space, but if you prefer it for consistency I don't have any issues changing it.

mtreinish · 2025-01-10T17:39:37Z

crates/accelerate/src/qi/mod.rs

+// copyright notice, and modified files need to carry a notice indicating
+// that they have been altered from the originals.
+
+//! Quantum-information and linear-algebra related functionality, typically used as drivers for


We might want to move over some linear algebra functionality like: https://github.com/Qiskit/qiskit/blob/main/crates/accelerate/src/utils.rs and https://github.com/Qiskit/qiskit/blob/main/crates/accelerate/src/synthesis/linear/utils.rs (although that first one might not be needed anymore).

Yeah, I'm happy in a follow-up to move a few other bits over. I think there's other loose files and bits and bobs that could probably move into it too, just to keep things a bit more localised.

mtreinish · 2025-01-10T17:43:11Z

crates/accelerate/src/qi/versor_gate.rs

+const COS_PI_8: f64 = 0.9238795325112867;
+const SIN_PI_8: f64 = 0.3826834323650898;


Heh, I thought you didn't like PI_8 variable naming for PI / 8 :)

I was aiming for consistency, really, but looking again I should have called it COS_FRAC_PI_8, perhaps?

Yeah, that's probably the more consistent name with the built-in f64 consts and is harder to mess up by mistake.

coveralls · 2025-01-10T18:55:34Z

Pull Request Test Coverage Report for Build 12718409257

Details

- 290 of 332 (87.35%) changed or added relevant lines in 3 files are covered.
- 10 unchanged lines in 3 files lost coverage.
- Overall coverage decreased (-0.01%) to 88.906%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| crates/accelerate/src/convert_2q_block_matrix.rs | 78 | 81 | 96.3% |
| crates/accelerate/src/qi/versor_gate.rs | 211 | 250 | 84.4% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| crates/accelerate/src/unitary_synthesis.rs | 1 | 93.18% |
| crates/qasm2/src/lex.rs | 3 | 92.98% |
| crates/qasm2/src/parse.rs | 6 | 97.15% |

| Totals | |
|---|---|
| Change from base Build 12710188694: | -0.01% |
| Covered Lines: | 79660 |
| Relevant Lines: | 89600 |

💛 - Coveralls

jakelishman added 4 commits January 10, 2025 13:47

Avoid unnecessary allocations in qargs lookup

a65684e

jakelishman added the `performance`, `mod: quantum info`, `Changelog: None`, `Rust`, and `mod: transpiler` labels Jan 10, 2025

jakelishman added this to the 2.0.0 milestone Jan 10, 2025

jakelishman requested a review from a team as a code owner January 10, 2025 13:47

Add direct Rust VersorGate tests

df1a3c2

... which were necessary because I'd borked the matrix calculations and forgotten to write any tests of them.

mtreinish reviewed Jan 10, 2025

Tidy up qi exports

e2e9aaa
