
7 CONCLUSIONS
In this work, we have evaluated the benefits of parallelizing
pairwise tensor contractions on both multicore CPUs and GPUs.
Our experimental results, obtained using three Julia packages,
show that exploiting this level of parallelism can significantly
accelerate the tensor network contraction process. In particular,
we observed that GPUs, thanks to their massive data parallelism,
clearly outperform multicore CPUs.
For our experiments, we used the QXTools package, which relies
on OMEinsum for pairwise tensor contraction on the CPU. Our
results indicate that OMEinsum obtains only limited speedups from
parallel execution in most cases. In contrast, BliContractor
makes more effective use of first-level parallelism. However,
since BliContractor is considerably slower in sequential
contraction, its overall performance only slightly exceeds that
of OMEinsum when using parallel execution.
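As a rough illustration of what parallelism inside a single pairwise
contraction can look like, the following minimal sketch contracts two
rank-2 tensors over their shared index and splits the rows of the result
among several workers. It is written in plain Python rather than Julia,
and the names contract_rows and parallel_pairwise_contract are
hypothetical; it does not reproduce how OMEinsum or BliContractor
are implemented. Higher-rank tensors are assumed to have been reshaped
into matrices beforehand.

```python
from concurrent.futures import ThreadPoolExecutor


def contract_rows(a_rows, b):
    """Contract a block of rows of A with B over the shared index k."""
    n_k, n_j = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(n_k)) for j in range(n_j)]
            for row in a_rows]


def parallel_pairwise_contract(a, b, workers=2):
    """Compute C[i][j] = sum_k A[i][k] * B[k][j] using row blocks of A.

    Hypothetical helper: each worker handles one contiguous block of rows,
    so the blocks are independent and need no synchronization.
    """
    if not a:
        return []
    workers = max(1, min(workers, len(a)))
    step = -(-len(a) // workers)  # ceiling division
    blocks = [a[s:s + step] for s in range(0, len(a), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves block order, so the rows come back in order
        parts = list(pool.map(contract_rows, blocks, [b] * len(blocks)))
    return [row for part in parts for row in part]


a = [[1, 2], [3, 4], [5, 6]]        # tensor A with indices (i, k)
b = [[1, 0, 2], [0, 1, 3]]          # tensor B with indices (k, j)
c = parallel_pairwise_contract(a, b, workers=2)
print(c)
```

The sketch only shows how the work decomposes. Threads in standard
Python do not speed up this kind of arithmetic, whereas optimized
libraries apply the same idea with native threads or GPU kernels.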
Finally, we observed that parallel performance is
strongly influenced by the structure of the circuit and
by whether first-level parallelism is applied to the
contraction of all tensor pairs or is restricted to the
contraction of the reduced network formed by the
communities detected in the initial tensor network.
These results highlight the importance of choosing an
appropriate contraction strategy based on the circuit
properties to maximize computational efficiency.
ACKNOWLEDGEMENTS
This research was funded by the project PID2023-146569NB-C22,
supported by MICIU/AEI/10.13039/501100011033 and ERDF/UE.
Parallel Tensor Network Contraction for Efficient Quantum Circuit Simulation on Multicore CPUs and GPUs