Design of quantum algorithms and codes on hybrid CPU-QPU architectures
B. Chichereau, S. Vialle, and P. Carribault. Experimenting with hybrid quantum optimization in HPC software stack for CPU register allocation. In Third International Workshop on Integrating High-Performance and Quantum Computing (WIHPQC 2023) in IEEE Quantum Week 2023 (QCE), Bellevue, United States, Sept. 2023. HAL hal-04272048
Context
Quantum processing units (QPUs) are not yet "reliable products", but working prototypes are available. Quantum computing is becoming a reality, and QPUs should become readily available coprocessors within a few years. It is time to learn how to design hybrid CPU-QPU algorithms and codes, to implement quantum circuits, and to use QPUs efficiently with a performance-oriented approach (to compute faster, to handle larger problems, or to achieve greater precision).
Contribution
In 2023, we began integrating quantum computing into the HPC software stack by launching quantum optimization routines from classical C/C++ code inside the GCC compiler, in order to improve register allocation (article). We are currently improving our algorithm and our quantum code, and deepening the integration of C/C++ and quantum code.
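The underlying combinatorial problem can be illustrated classically: register allocation amounts to coloring an interference graph so that variables that are live at the same time receive different registers. The sketch below is purely illustrative (the toy graph and the brute-force search are invented for this example; the article replaces such a search with a hybrid quantum optimizer):

```python
from itertools import product

def color_registers(interference, num_regs):
    """Brute-force search for a register assignment (graph k-coloring):
    interfering variables must not share a register.  Exponential cost;
    for illustration only -- a quantum optimizer targets this search."""
    vars_ = sorted({v for e in interference for v in e})
    for assignment in product(range(num_regs), repeat=len(vars_)):
        regs = dict(zip(vars_, assignment))
        if all(regs[a] != regs[b] for a, b in interference):
            return regs
    return None  # not k-colorable: some variable must be spilled to memory

# Toy interference graph: a, b, c are pairwise live together; d only with a.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
print(color_registers(edges, 2))  # None: the triangle needs 3 registers
print(color_registers(edges, 3))  # a valid 3-register assignment
```

The search space grows exponentially with the number of variables, which is precisely what motivates formulating the problem for a quantum optimizer.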
Also in 2023, we began designing, implementing and experimenting with data clustering on QPUs, based on generative network models and hybrid CPU-QPU learning algorithms. Initial results were obtained and published in spring 2024.
Impact
With two PhD theses launched in Quantum Computing (one funded by CEA DAM and one by ANR Quantum Paris-Saclay) and three exploratory projects in industrial collaboration with CA-CIB (carried out at CentraleSupélec), we have acquired our first concrete experience in Quantum Computing. This has enabled us to build a teaching module in Quantum Computing (M1 level) with new industrial partners, in order to train more students in this emerging field.
On the scientific front, we are now able to use IBM's QPUs more efficiently for Quantum Machine Learning, and we also achieve better register allocation than in 2023. Our know-how in the design of quantum algorithms and codes is progressing.
Design and implementation of scalable tensor algorithms for distributed CPU/GPU architectures
M. Baboulin, O. Kaya, T. Mary, and M. Robeyns. Mixed precision iterative refinement for low-rank matrix and tensor approximations. Working paper or preprint, June 2023. HAL hal-04115337
Context
Tensor computations have gained tremendous popularity recently, as they provide a natural and effective way of processing multidimensional data, which arises in numerous application domains ranging from deep neural networks and quantum simulation to scientific computing and signal processing. However, operating on tensors is computationally challenging because a tensor's size grows exponentially with the number of dimensions. It is therefore indispensable to provide parallel algorithms and high-performance library implementations to render such computations usable in real-world applications.
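To make the size issue concrete: a dense order-d tensor with n entries per mode stores n^d values, whereas a structured low-rank representation such as a single rank-1 CP term needs only d·n values. A small sketch (the sizes and factor values below are arbitrary, chosen only for illustration):

```python
def dense_size(n, d):
    """A dense order-d tensor with n entries per mode stores n**d values."""
    return n ** d

def rank1_entry(factors, index):
    """Entry of a rank-1 CP tensor, T[i1,...,id] = prod_k u_k[i_k],
    represented by only d*n stored values instead of n**d."""
    p = 1.0
    for u, i in zip(factors, index):
        p *= u[i]
    return p

n, d = 10, 6
print(dense_size(n, d))  # 1000000 entries if stored densely
factors = [[float(i + 1) for i in range(n)] for _ in range(d)]
print(d * n)             # 60 stored values in rank-1 CP form
print(rank1_entry(factors, (0, 1, 2, 0, 0, 1)))  # 12.0
```

Any entry of the compressed tensor is recomputed on the fly from the factors, which is the basic trade-off that tensor decomposition algorithms exploit.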
Contribution
The main design objective is to leverage so-called "task-based parallelism" through the StarPU runtime system, which enables exploiting all the resources (CPUs, GPUs, accelerators) of a supercomputer's compute nodes. There is a very tight-knit community around task-parallel matrix computations in France; we are pioneering the transition from matrices to tensors in this research direction.
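The task-based idea can be sketched conceptually: the programmer submits tasks, dependencies are expressed through their results, and a runtime scheduler decides where and when each task runs. The snippet below is only an analogy using Python's standard thread pool; it is not the StarPU API, which is a C library that additionally schedules tasks across heterogeneous CPUs and GPUs:

```python
from concurrent.futures import ThreadPoolExecutor

# Conceptual sketch of task-based parallelism (NOT the StarPU API):
# independent tile products may run concurrently; the final sum is a
# dependent "task" that waits for both results.

def tile_product(a, b):
    return a * b

with ThreadPoolExecutor() as pool:
    # Submit independent tasks; the runtime decides where/when they run.
    t1 = pool.submit(tile_product, 2, 3)
    t2 = pool.submit(tile_product, 4, 5)
    # This computation depends on t1 and t2: .result() waits for them.
    total = t1.result() + t2.result()

print(total)  # 26
```

The appeal of this style is that the dependency graph, not the programmer, determines the available parallelism, which is what lets a runtime like StarPU map the same code onto very different node architectures.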
In parallel, we are also developing mixed-precision numerical methods, for both tensors and matrices, that can leverage the mixed-precision AI accelerators introduced in recent hardware (article).
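The core idea of mixed-precision iterative refinement, which the article studies for low-rank matrix and tensor approximations, can be sketched on a plain linear system: solve cheaply in low precision, then repeatedly correct the solution using residuals computed in high precision. In the sketch below, rounding to three significant digits stands in for a low-precision factorization, and the 2x2 system is invented for illustration:

```python
from math import floor, log10

def lowprec(v, sig=3):
    """Keep only `sig` significant digits: a crude stand-in for storing
    a value in a low-precision format such as fp16 or fp32."""
    return 0.0 if v == 0.0 else round(v, sig - 1 - floor(log10(abs(v))))

def solve2x2_low(A, b):
    """'Low-precision' direct solve of a 2x2 system (Cramer's rule,
    result truncated to 3 significant digits)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [lowprec((b[0] * A[1][1] - b[1] * A[0][1]) / det),
            lowprec((b[1] * A[0][0] - b[0] * A[1][0]) / det)]

def residual(A, b, x):
    """Residual b - A*x, computed in full double precision."""
    return [b[i] - sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

def refine(A, b, steps=3):
    """Mixed-precision iterative refinement: cheap low-precision solves,
    residuals and the running solution kept in high precision."""
    x = solve2x2_low(A, b)
    for _ in range(steps):
        d = solve2x2_low(A, residual(A, b, x))  # low-precision correction
        x = [xi + di for xi, di in zip(x, d)]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = refine(A, b)
print(max(abs(v) for v in residual(A, b, x)))  # residual shrinks each step
```

Each cheap solve recovers roughly three more digits of accuracy, so a few refinement steps reach near double-precision accuracy while the expensive factorization work stays in low precision.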
Both these activities have been funded by ANR JCJC SELESTE and continue to be funded by PEPR Numpex.
Impact
The parallel and mixed-precision tensor algorithms and libraries that we develop aim to accelerate all application domains that deal with high-dimensional data or problems and can use tensor computations, decompositions, or networks. The list of such application domains is growing very rapidly across all scientific fields.
Theoretical study of distributed dynamic networks
K. Winkler, A. Paz, H. Rincon Galeana, S. Schmid, and U. Schmid. The Time Complexity of Consensus Under Oblivious Message Adversaries. Algorithmica, 86:1830–1861, 2024. DOI 10.1007/s00453-024-01209-4
Originally presented in Innovations in Theoretical Computer Science Conference (ITCS23), Cambridge, United States, Jan. 2023. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. DOI 10.4230/LIPIcs.ITCS.2023.100 - HAL hal-04288670
Context
A significant body of work in the field of distributed graph algorithms concentrates on solving problems on static graphs, such as finding spanning trees or computing a proper node coloring, in a distributed manner. Conversely, the theory of dynamic graph algorithms explores data structures that can maintain similar structures in graphs that undergo changes, albeit in a centralized manner. Our objective is to formulate a theory of algorithms that possess both features concurrently: distributed and applicable in a dynamic setting.
Contribution
In recent years, we have studied consensus, a classical distributed computing problem with numerous applications, in the distributed dynamic context. We consider networks prone to message omissions, in which the omissions are limited to specific patterns (e.g., between data centers but not inside them).
At the ITCS'23 conference, we presented a study of consensus solvability and time complexity as a function of the network's possible failure patterns. This work was recently published in the journal Algorithmica (article).
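To give a flavor of the setting (a toy illustration only, not the paper's oblivious message adversary model in full): in a synchronous network where an adversary chooses which messages are delivered in each round, a simple "keep the minimum value heard so far" rule reaches agreement whenever the delivered communication graphs are favorable enough; characterizing exactly which sets of allowed graphs permit consensus, and how fast, is the question the work addresses. The node values and omission schedule below are invented:

```python
def min_consensus_round(values, delivered):
    """One synchronous round: each node keeps the minimum of its own value
    and the values on the messages the adversary chose to deliver."""
    new = list(values)
    for src, dst in delivered:
        new[dst] = min(new[dst], values[src])
    return new

# 3 nodes with initial values; the adversary picks which directed messages
# get through each round (all other messages are omitted).
values = [3, 1, 2]
schedule = [
    {(1, 0), (1, 2)},  # round 1: only node 1's messages are delivered
    {(0, 1), (2, 1)},  # round 2: node 1 hears the others
]
for delivered in schedule:
    values = min_consensus_round(values, delivered)
print(values)  # [1, 1, 1]: agreement despite the omissions
```

With a less favorable schedule (e.g., one that keeps some node isolated forever), the same rule never terminates with agreement, which is why solvability depends on the adversary's allowed graph patterns.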
Impact
The discussed work gives a more straightforward explanation of the prior work of Coulouma, Godard, and Peters (TCS'15). It presents further bounds, thus improving our understanding of the fundamental problem of consensus in the described setting. We are now working on two follow-up projects, one that better explains the current model using tools from combinatorial topology and another that considers permanent node failures instead of benign link failures.
This particular work is part of a research frontier on distributed networks that continuously undergo changes. This topic stands at the center of a recently submitted ANR project.
Theoretical study of resource-constrained, failure-prone distributed systems inspired by natural phenomena
E. Bampas, J. Beauquier, J. Burman, and W. Guy-Obé. Treasure Hunt with Volatile Pheromones. In 37th International Symposium on Distributed Computing (DISC 2023), pages 8:1–8:21, L’Aquila, Italy, Oct. 2023. DOI 10.4230/LIPIcs.DISC.2023.8 - HAL hal-04470589
Context
This is one of the main axes of the ParSys team, focusing on the development of theoretical foundations (models, algorithms, analyses, feasibility and lower-bound results) for current and future resource-constrained, failure-prone distributed systems. To address their challenges, we frequently take inspiration from robust and efficient distributed systems in nature (biological and chemical phenomena), and we apply advanced algorithmic and mathematical methods from communication complexity, graph theory, computability and complexity, fault tolerance, self-stabilization, etc.
Contribution
Mobile agent algorithms inspired by nature (article 1): Inspired by ant colonies, we propose a very weak communication model in which mobile agents communicate only by dropping and sensing pheromone. In contrast to previous studies, the pheromone evaporates over time. We study the impact of this restricted communication on the treasure hunt (grid exploration) problem, proposing efficient solutions (optimizing time, pheromone, and agent utilization) even under extremely fast pheromone evaporation.
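The volatility of the pheromone can be illustrated with a toy model (this is not the article's algorithm; the grid cell and lifetime values below are invented): a mark dropped on a cell can only be sensed for a bounded number of steps before it evaporates, so agents must coordinate through the environment within a shrinking time window.

```python
def step(pheromone, drops):
    """Advance one time step: freshly dropped pheromone appears, and all
    existing pheromone loses one unit of remaining lifetime (evaporation)."""
    nxt = {cell: ttl - 1 for cell, ttl in pheromone.items() if ttl - 1 > 0}
    for cell, ttl in drops.items():
        nxt[cell] = max(nxt.get(cell, 0), ttl)
    return nxt

# An agent marks cell (0, 0) with lifetime 2: the mark is sensible for two
# steps, then vanishes -- communication through the environment is volatile.
ph = step({}, {(0, 0): 2})
print((0, 0) in ph)  # True: just dropped
ph = step(ph, {})
print((0, 0) in ph)  # True: one step of lifetime left
ph = step(ph, {})
print((0, 0) in ph)  # False: evaporated
```
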
Population Protocols: In article 2, we design the first time-optimal leader election algorithm that tolerates transient faults (through self-stabilization) in the model of population protocols with random interactions. In article 3, we study population protocols with non-uniformly random interactions and derive lower bounds and efficient solutions for the problem of data collection.
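For intuition, here is the classic one-bit leader election population protocol under uniformly random pairwise interactions (the textbook baseline, not the time-optimal self-stabilizing algorithm of article 2): every agent starts as a leader, and whenever two leaders interact one of them is demoted, so a single leader eventually survives.

```python
import random

def leader_election(n, steps, seed=0):
    """Classic one-bit population protocol: all n agents start as leaders;
    when two leaders interact, one is demoted.  Returns the number of
    leaders after `steps` uniformly random pairwise interactions."""
    rng = random.Random(seed)
    leader = [True] * n
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)  # scheduler picks a random pair
        if leader[i] and leader[j]:
            leader[j] = False
    return sum(leader)

print(leader_election(n=50, steps=20000))  # 1 with overwhelming probability
```

Since a leader is only demoted when meeting another leader, at least one leader always remains; the hard part, addressed in article 2, is doing this time-optimally while also recovering from arbitrary corrupted states.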
Beeping communication model: In this model, nodes communicate by local broadcasts of interfering unary signals called beeps (as in some biological systems or low-power radio networks). In this restricted communication model, we propose the first communication primitives (for local send and receive of messages) in the difficult setting of non-synchronized node wake-ups (article 4).
Theses and internships: 1 thesis completed in 2020, 1 ongoing thesis, 1 M2 internship, 5 M1 internships
Impact
These theoretical studies, which derive feasibility results and efficient solutions, are intended to inform the design and development of future distributed systems by establishing the possibilities and limitations of such systems.
Design of language and library for parallel programming
S. Joube, H. Grasland, D. Chamont, and J. Falcou. Kiwaku, a C++20 library for multidimensional arrays. Application to ACTS tracking. In 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023), Norfolk, United States, May 2023. HAL hal-04401240
Context
This research activity lies at the crossroads of parallel programming, software design, and compilation. Without rehearsing the well-known "end of Moore's Law" line of reasoning, scientists today face relatively complex computer hardware that promises to deliver an unprecedented number of operations per second. However, accessing this computing power is out of reach for scientists who are not also experts in software development.
Sub-optimal solutions do exist, in the form of a plethora of hastily drafted Python wrappers of dubious quality around complex solvers or simulation kernels, but they usually leave a lot of performance on the table and are still not usable by the majority of scientists. Our goal is to provide well-designed, highly ergonomic APIs in a native language like C++ that get the most out of such hardware without sacrificing performance.
Contribution
Kiwaku, a C++20 library for multidimensional arrays that uses generative programming to auto-adapt to the needs of its users.
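The core idea behind a multidimensional-array library, a flat buffer plus index arithmetic generated from a shape, can be sketched as follows (illustrative only; Kiwaku itself is a C++20 library whose actual API is different and much richer):

```python
class NDView:
    """Minimal sketch of a multidimensional array view: one flat buffer
    plus a shape, with row-major strides derived from that shape."""

    def __init__(self, data, shape):
        self.data, self.shape = data, shape
        # Generate row-major strides from the shape.
        self.strides, acc = [], 1
        for extent in reversed(shape):
            self.strides.append(acc)
            acc *= extent
        self.strides.reverse()

    def __getitem__(self, index):
        # Multi-index -> flat offset via the generated strides.
        flat = sum(i * s for i, s in zip(index, self.strides))
        return self.data[flat]

# A 2x3 view over a flat buffer of six values.
v = NDView([1, 2, 3, 4, 5, 6], shape=(2, 3))
print(v.strides)  # [3, 1]
print(v[1, 2])    # 6
```

In Kiwaku the analogous index arithmetic is generated at compile time from C++20 templates, so the abstraction costs nothing at run time.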
ctbench, a set of tools for compile-time benchmarking and analysis in C++, which helps drive the development of other libraries.
2 theses currently supervised, 1 internship, 1 visiting professor.