Co-design of quantum hardware and compilers should be evaluated with metrics that connect physical device behavior to compiled circuit performance, enabling fair comparisons and practical engineering trade-offs. Effective metrics must measure not only raw qubit quality but also how compilation choices amplify or mitigate hardware limitations.
Technical metrics
At the device level, coherence times (T1 and T2), single- and two-qubit gate fidelities, and crosstalk remain the primary indicators because they determine how long and how accurately quantum states persist. These quantities are commonly assessed with randomized benchmarking and tomography; John Preskill, California Institute of Technology, has emphasized the practical constraints they impose in the NISQ era. Holistic measures such as quantum volume, introduced by IBM Research, capture the combined effects of qubit count, connectivity, and error rates and are useful for comparing platforms.

Complementing these hardware-oriented measures are compiler-focused metrics: compiled circuit depth, two-qubit gate count after routing, and compiler-induced fidelity loss, which together quantify how much overhead compilation adds to an ideal circuit. A metric such as logical error rate per algorithm bridges both domains by estimating end-to-end success probability for a target algorithm, including the error-correction or mitigation overheads that Peter Shor, Massachusetts Institute of Technology, made foundationally relevant by showing both the algorithmic payoff of large-scale quantum computation and the error correction needed to reach it.
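As a rough illustration of how compiled gate counts translate into fidelity loss, the following minimal sketch multiplies per-gate fidelities under the strong assumption of independent gate errors; all gate counts and error rates are hypothetical placeholders, not measured values.

```python
# Minimal sketch: estimate compiled-circuit fidelity and compiler-induced
# fidelity loss, assuming independent, multiplicative per-gate errors.
# All numbers below are illustrative placeholders.

def circuit_fidelity(n_1q: int, n_2q: int,
                     f_1q: float = 0.9995, f_2q: float = 0.995) -> float:
    """Estimated success probability as a product of per-gate fidelities."""
    return (f_1q ** n_1q) * (f_2q ** n_2q)

# Hypothetical gate counts before and after routing on limited connectivity.
logical = {"n_1q": 120, "n_2q": 40}    # algorithm as written
compiled = {"n_1q": 150, "n_2q": 85}   # after SWAP insertion and routing

f_logical = circuit_fidelity(**logical)
f_compiled = circuit_fidelity(**compiled)

# Compiler-induced fidelity loss: the overhead attributable to compilation.
print(f"ideal-mapping fidelity: {f_logical:.3f}")
print(f"compiled fidelity:      {f_compiled:.3f}")
print(f"compiler-induced loss:  {f_logical - f_compiled:.3f}")
```

The same skeleton extends naturally to a per-algorithm logical error rate by substituting post-correction error probabilities for the raw gate fidelities.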
Systemic and societal metrics
Beyond physics and software, resource efficiency matters: cryogenic power per operation and qubit yield shape both environmental footprint and deployment choices. Cryogenic energy consumption and manufacturing yield affect where facilities can be located and who can access them, creating geographic imbalances in capability. Cultural and economic factors appear in metrics such as toolchain portability and open-source reproducibility, which shape community knowledge flow and workforce development; Jay Gambetta, IBM Research, has discussed the role of standardized benchmarks in promoting interoperability.
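A minimal sketch of how two such resource metrics might be computed, assuming hypothetical power, throughput, and fabrication figures:

```python
# Sketch of two resource-efficiency metrics: cryogenic energy per two-qubit
# gate and fabrication yield. All figures are hypothetical assumptions.

cryostat_power_w = 25_000        # assumed total wall-plug power of the cryoplant, W
two_qubit_gates_per_s = 2.0e5    # assumed sustained gate throughput, gates/s

energy_per_gate_j = cryostat_power_w / two_qubit_gates_per_s
print(f"energy per two-qubit gate: {energy_per_gate_j:.3f} J")

qubits_fabricated = 150          # qubits on a hypothetical fabricated chip
qubits_within_spec = 127         # those meeting coherence/fidelity thresholds
print(f"qubit yield: {qubits_within_spec / qubits_fabricated:.1%}")
```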
These metrics matter because of their consequences: compiled depth that exceeds what modest hardware can sustain makes runs infeasible, while tight hardware–compiler co-design can reduce the qubit overhead of error correction, improving scalability and lowering environmental cost. Causes include limited connectivity, slow calibration cycles, and immature compiler heuristics; the consequences are reduced algorithmic reach, higher operational cost, and access concentrated in well-resourced regions. Evaluations should therefore combine device-level physics, compiler overhead, end-to-end algorithmic success probability, and systemic impact metrics to guide research priorities and policy decisions. Only by measuring across these axes can co-design efforts produce quantum systems that are performant, equitable, and sustainable.
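One way such a multi-axis evaluation could be operationalized is a simple scorecard that checks each metric against a target; the axes, values, and thresholds below are invented for illustration and do not reflect any established standard.

```python
# Sketch of a multi-axis co-design scorecard. Metric names, values, and
# targets are illustrative assumptions, not an established benchmark.

metrics = {
    "two_qubit_fidelity": 0.994,    # device-level physics
    "compiled_depth_ratio": 1.8,    # compiled depth / ideal depth (lower is better)
    "end_to_end_success": 0.62,     # algorithmic success probability
    "energy_per_gate_j": 0.12,      # systemic / environmental cost
}

targets = {  # hypothetical "good enough" thresholds and their direction
    "two_qubit_fidelity": (0.999, "higher"),
    "compiled_depth_ratio": (1.2, "lower"),
    "end_to_end_success": (0.90, "higher"),
    "energy_per_gate_j": (0.05, "lower"),
}

for name, value in metrics.items():
    target, direction = targets[name]
    ok = value >= target if direction == "higher" else value <= target
    print(f"{name:22s} {value:8.3f}  target {target:8.3f}  {'ok' if ok else 'short'}")
```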