Design of quantum algorithms and codes on hybrid CPU-QPU architectures
B. Chichereau, S. Vialle, and P. Carribault. Experimenting with hybrid quantum optimization in HPC software stack for CPU register allocation. In Third International Workshop on Integrating High-Performance and Quantum Computing (WIHPQC 2023) in IEEE Quantum Week 2023 (QCE), Bellevue, United States, Sept. 2023. HAL hal-04272048
Context
Quantum processing units (QPUs) are not yet "reliable products", but working prototypes are available. Quantum computing is becoming a reality, and QPUs should become readily available coprocessors within a few years. It is time to learn how to design hybrid CPU-QPU algorithms and codes, to implement quantum circuits, and to use QPUs efficiently with a performance-oriented approach (to compute faster, to handle larger problems, or to achieve greater precision).
Contribution
In 2023, we began integrating quantum computing into the HPC software stack by launching quantum optimization routines from classical C/C++ code inside the GCC compiler, in order to improve register allocation (article). We are currently improving our algorithm and our quantum code, and deepening the integration of C/C++ and quantum code.
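The underlying combinatorial problem can be illustrated classically: register allocation amounts to coloring an interference graph so that variables that are live at the same time receive different registers. The sketch below is purely illustrative (the toy graph and the brute-force search are invented for this example; the article replaces such a search with a hybrid quantum optimizer):

```python
from itertools import product

def color_registers(interference, num_regs):
    """Brute-force search for a register assignment (graph k-coloring):
    interfering variables must not share a register.  Exponential cost;
    for illustration only -- a quantum optimizer targets this search."""
    vars_ = sorted({v for e in interference for v in e})
    for assignment in product(range(num_regs), repeat=len(vars_)):
        regs = dict(zip(vars_, assignment))
        if all(regs[a] != regs[b] for a, b in interference):
            return regs
    return None  # not k-colorable: some variable must be spilled to memory

# Toy interference graph: a, b, c are pairwise live together; d only with a.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
print(color_registers(edges, 2))  # None: the triangle needs 3 registers
print(color_registers(edges, 3))  # a valid 3-register assignment
```

The search space grows exponentially with the number of variables, which is precisely what motivates formulating the problem for a quantum optimizer.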
Also in 2023, we began designing, implementing and experimenting with data clustering on QPUs, based on generative network models and hybrid CPU-QPU learning algorithms. Initial results were obtained and published in spring 2024.
Impact
With two PhD theses launched in Quantum Computing (one funded by CEA DAM and one by ANR Quantum Paris-Saclay) and three exploratory projects in industrial collaboration with CA-CIB (carried out at CentraleSupélec), we have acquired our first concrete experience in Quantum Computing. This has enabled us to build a teaching module in Quantum Computing (M1 level) with new industrial partners, in order to train more students in this emerging field.
On the scientific front, we are now able to use IBM's QPUs more efficiently for Quantum Machine Learning, and we also achieve better register allocation than in 2023. Our know-how in the design of quantum algorithms and codes is progressing.
Design and implementation of scalable tensor algorithms for distributed CPU/GPU architectures
M. Baboulin, O. Kaya, T. Mary, and M. Robeyns. Mixed precision iterative refinement for low-rank matrix and tensor approximations. Working paper or preprint, June 2023. HAL hal-04115337
Context
Tensor computations have gained tremendous popularity recently, as they provide a natural and effective way of processing multidimensional data, which arises in numerous application domains ranging from deep neural networks and quantum simulation to scientific computing and signal processing. However, operating on tensors is computationally challenging because a tensor's size grows exponentially with the number of dimensions. It is therefore indispensable to provide parallel algorithms and high-performance library implementations to render such computations usable in real-world applications.
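To make the size issue concrete: a dense order-d tensor with n entries per mode stores n^d values, whereas a structured low-rank representation such as a single rank-1 CP term needs only d·n values. A small sketch (the sizes and factor values below are arbitrary, chosen only for illustration):

```python
def dense_size(n, d):
    """A dense order-d tensor with n entries per mode stores n**d values."""
    return n ** d

def rank1_entry(factors, index):
    """Entry of a rank-1 CP tensor, T[i1,...,id] = prod_k u_k[i_k],
    represented by only d*n stored values instead of n**d."""
    p = 1.0
    for u, i in zip(factors, index):
        p *= u[i]
    return p

n, d = 10, 6
print(dense_size(n, d))  # 1000000 entries if stored densely
factors = [[float(i + 1) for i in range(n)] for _ in range(d)]
print(d * n)             # 60 stored values in rank-1 CP form
print(rank1_entry(factors, (0, 1, 2, 0, 0, 1)))  # 12.0
```

Any entry of the compressed tensor is recomputed on the fly from the factors, which is the basic trade-off that tensor decomposition algorithms exploit.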
Contribution
The main design objective is to leverage so-called "task-based parallelism" through the StarPU runtime system, which enables exploiting all the resources (CPUs, GPUs, accelerators) of a supercomputer's compute nodes. There is a very tight-knit community around task-parallel matrix computations in France; we are pioneering the transition from matrices to tensors in this research direction.
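The task-based idea can be sketched conceptually: the programmer submits tasks, dependencies are expressed through their results, and a runtime scheduler decides where and when each task runs. The snippet below is only an analogy using Python's standard thread pool; it is not the StarPU API, which is a C library that additionally schedules tasks across heterogeneous CPUs and GPUs:

```python
from concurrent.futures import ThreadPoolExecutor

# Conceptual sketch of task-based parallelism (NOT the StarPU API):
# independent tile products may run concurrently; the final sum is a
# dependent "task" that waits for both results.

def tile_product(a, b):
    return a * b

with ThreadPoolExecutor() as pool:
    # Submit independent tasks; the runtime decides where/when they run.
    t1 = pool.submit(tile_product, 2, 3)
    t2 = pool.submit(tile_product, 4, 5)
    # This computation depends on t1 and t2: .result() waits for them.
    total = t1.result() + t2.result()

print(total)  # 26
```

The appeal of this style is that the dependency graph, not the programmer, determines the available parallelism, which is what lets a runtime like StarPU map the same code onto very different node architectures.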
In parallel, we are also developing mixed-precision numerical methods, for both tensors and matrices, that can leverage the mixed-precision AI accelerators introduced in recent hardware (article).
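The core idea of mixed-precision iterative refinement, which the article studies for low-rank matrix and tensor approximations, can be sketched on a plain linear system: solve cheaply in low precision, then repeatedly correct the solution using residuals computed in high precision. In the sketch below, rounding to three significant digits stands in for a low-precision factorization, and the 2x2 system is invented for illustration:

```python
from math import floor, log10

def lowprec(v, sig=3):
    """Keep only `sig` significant digits: a crude stand-in for storing
    a value in a low-precision format such as fp16 or fp32."""
    return 0.0 if v == 0.0 else round(v, sig - 1 - floor(log10(abs(v))))

def solve2x2_low(A, b):
    """'Low-precision' direct solve of a 2x2 system (Cramer's rule,
    result truncated to 3 significant digits)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [lowprec((b[0] * A[1][1] - b[1] * A[0][1]) / det),
            lowprec((b[1] * A[0][0] - b[0] * A[1][0]) / det)]

def residual(A, b, x):
    """Residual b - A*x, computed in full double precision."""
    return [b[i] - sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

def refine(A, b, steps=3):
    """Mixed-precision iterative refinement: cheap low-precision solves,
    residuals and the running solution kept in high precision."""
    x = solve2x2_low(A, b)
    for _ in range(steps):
        d = solve2x2_low(A, residual(A, b, x))  # low-precision correction
        x = [xi + di for xi, di in zip(x, d)]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = refine(A, b)
print(max(abs(v) for v in residual(A, b, x)))  # residual shrinks each step
```

Each cheap solve recovers roughly three more digits of accuracy, so a few refinement steps reach near double-precision accuracy while the expensive factorization work stays in low precision.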
Both these activities have been funded by ANR JCJC SELESTE and continue to be funded by PEPR Numpex.
Impact
The parallel and mixed-precision tensor algorithms and libraries that we develop aim to accelerate all application domains that deal with high-dimensional data or problems and can use tensor computations, decompositions, or networks. The list of such application domains is growing very rapidly across all scientific fields.
Theoretical study of distributed dynamic networks
K. Winkler, A. Paz, H. Rincon Galeana, S. Schmid, and U. Schmid. The Time Complexity of Consensus Under Oblivious Message Adversaries. Algorithmica, 86:1830–1861, 2024. DOI 10.1007/s00453-024-01209-4
Originally presented in Innovations in Theoretical Computer Science Conference (ITCS23), Cambridge, United States, Jan. 2023. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. DOI 10.4230/LIPIcs.ITCS.2023.100 - HAL hal-04288670
Context
A significant body of work in the field of distributed graph algorithms concentrates on solving problems on static graphs, such as finding spanning trees or computing a proper node coloring, in a distributed manner. Conversely, the theory of dynamic graph algorithms explores data structures that can maintain similar structures in graphs that undergo changes, albeit in a centralized manner. Our objective is to formulate a theory of algorithms that possess both features concurrently: distributed and applicable in a dynamic setting.
Contribution
In recent years, we have studied consensus, a classical distributed computing problem with numerous applications, in the distributed dynamic context. We consider networks prone to message omissions, in which the omissions are limited to specific patterns (e.g., between data centers but not inside them).
At the ITCS'23 conference, we presented a study of consensus solvability and time complexity as a function of the network's possible failure patterns. This work was recently published in the journal Algorithmica (article).
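To give a flavor of the setting (a toy illustration only, not the paper's oblivious message adversary model in full): in a synchronous network where an adversary chooses which messages are delivered in each round, a simple "keep the minimum value heard so far" rule reaches agreement whenever the delivered communication graphs are favorable enough; characterizing exactly which sets of allowed graphs permit consensus, and how fast, is the question the work addresses. The node values and omission schedule below are invented:

```python
def min_consensus_round(values, delivered):
    """One synchronous round: each node keeps the minimum of its own value
    and the values on the messages the adversary chose to deliver."""
    new = list(values)
    for src, dst in delivered:
        new[dst] = min(new[dst], values[src])
    return new

# 3 nodes with initial values; the adversary picks which directed messages
# get through each round (all other messages are omitted).
values = [3, 1, 2]
schedule = [
    {(1, 0), (1, 2)},  # round 1: only node 1's messages are delivered
    {(0, 1), (2, 1)},  # round 2: node 1 hears the others
]
for delivered in schedule:
    values = min_consensus_round(values, delivered)
print(values)  # [1, 1, 1]: agreement despite the omissions
```

With a less favorable schedule (e.g., one that keeps some node isolated forever), the same rule never terminates with agreement, which is why solvability depends on the adversary's allowed graph patterns.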
Impact
The discussed work gives a more straightforward explanation of the prior work of Coulouma, Godard, and Peters (TCS'15). It presents further bounds, thus improving our understanding of the fundamental problem of consensus in the described setting. We are now working on two follow-up projects, one that better explains the current model using tools from combinatorial topology and another that considers permanent node failures instead of benign link failures.
This particular work is part of a research frontier on distributed networks that continuously undergo changes. This topic stands at the center of a recently submitted ANR project.
Theoretical study of resource-constrained, failure-prone distributed systems inspired by natural phenomena
E. Bampas, J. Beauquier, J. Burman, and W. Guy-Obé. Treasure Hunt with Volatile Pheromones. In 37th International Symposium on Distributed Computing (DISC 2023), pages 8:1–8:21, L’Aquila, Italy, Oct. 2023. DOI 10.4230/LIPIcs.DISC.2023.8 - HAL hal-04470589
Context
This is one of the main axes of the ParSys team, focusing on the development of theoretical foundations (models, algorithms, analyses, feasibility and lower-bound results) for current and future resource-constrained, failure-prone distributed systems. To address their challenges, we frequently take inspiration from robust and efficient distributed systems in nature (biological and chemical phenomena), and we apply advanced algorithmic and mathematical methods from communication complexity, graph theory, computability and complexity, fault tolerance, self-stabilization, etc.
Contribution
Mobile agent algorithms inspired by nature (article 1): Inspired by ant colonies, we propose a very weak communication model in which mobile agents communicate only by dropping and sensing pheromone. In contrast to previous studies, the pheromone evaporates over time. We study the impact of this restricted communication on the treasure hunt (grid exploration) problem, proposing efficient solutions (optimizing time, pheromone, and agent utilization) even under extremely fast pheromone evaporation.
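The volatility of the pheromone can be illustrated with a toy model (this is not the article's algorithm; the grid cell and lifetime values below are invented): a mark dropped on a cell can only be sensed for a bounded number of steps before it evaporates, so agents must coordinate through the environment within a shrinking time window.

```python
def step(pheromone, drops):
    """Advance one time step: freshly dropped pheromone appears, and all
    existing pheromone loses one unit of remaining lifetime (evaporation)."""
    nxt = {cell: ttl - 1 for cell, ttl in pheromone.items() if ttl - 1 > 0}
    for cell, ttl in drops.items():
        nxt[cell] = max(nxt.get(cell, 0), ttl)
    return nxt

# An agent marks cell (0, 0) with lifetime 2: the mark is sensible for two
# steps, then vanishes -- communication through the environment is volatile.
ph = step({}, {(0, 0): 2})
print((0, 0) in ph)  # True: just dropped
ph = step(ph, {})
print((0, 0) in ph)  # True: one step of lifetime left
ph = step(ph, {})
print((0, 0) in ph)  # False: evaporated
```
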
Population Protocols: In article 2, we design the first time-optimal leader election algorithm that tolerates transient faults (through self-stabilization) in the model of population protocols with random interactions. In article 3, we study population protocols with non-uniformly random interactions and derive lower bounds and efficient solutions for the problem of data collection.
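For intuition, here is the classic one-bit leader election population protocol under uniformly random pairwise interactions (the textbook baseline, not the time-optimal self-stabilizing algorithm of article 2): every agent starts as a leader, and whenever two leaders interact one of them is demoted, so a single leader eventually survives.

```python
import random

def leader_election(n, steps, seed=0):
    """Classic one-bit population protocol: all n agents start as leaders;
    when two leaders interact, one is demoted.  Returns the number of
    leaders after `steps` uniformly random pairwise interactions."""
    rng = random.Random(seed)
    leader = [True] * n
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)  # scheduler picks a random pair
        if leader[i] and leader[j]:
            leader[j] = False
    return sum(leader)

print(leader_election(n=50, steps=20000))  # 1 with overwhelming probability
```

Since a leader is only demoted when meeting another leader, at least one leader always remains; the hard part, addressed in article 2, is doing this time-optimally while also recovering from arbitrary corrupted states.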
Beeping communication model: In this model, nodes communicate by local broadcasts of interfering unary signals called beeps (as in some biological systems or low-power radio networks). In this restricted communication model, we propose the first communication primitives (for local send and receive of messages) in the difficult setting of non-synchronized node wake-ups (article 4).
Theses and internships: 1 thesis completed in 2020, 1 ongoing thesis, 1 M2 internship, 5 M1 internships
Impact
These theoretical studies, which derive feasibility results and efficient solutions, are intended to inform the design and development of future distributed systems by establishing the possibilities and limitations of such systems.
Design of language and library for parallel programming
S. Joube, H. Grasland, D. Chamont, and J. Falcou. Kiwaku, a C++20 library for multidimensional arrays. Application to ACTS tracking. In 26th International Conference on Computing in High Energy & Nuclear Physics (CHEP 2023), Norfolk, United States, May 2023. HAL hal-04401240
Context
This research activity lies at the crossroads of parallel programming, software design, and compilation. Without rehearsing the well-known "end of Moore's Law" line of reasoning, scientists today face relatively complex computer hardware that promises to deliver an unprecedented number of operations per second. However, accessing this computing power is out of reach for scientists who are not also experts in software development.
Sub-optimal solutions do exist, in the form of a plethora of hastily drafted Python wrappers of dubious quality around complex solvers or simulation kernels, but they usually leave a lot of performance on the table and are still not usable by the majority of scientists. Our goal is to provide well-designed, highly ergonomic APIs in a native language like C++ that get the most out of such hardware without sacrificing performance.
Contribution
Kiwaku, a C++20 library for multidimensional arrays that uses generative programming to auto-adapt to the needs of its users.
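The core idea behind a multidimensional-array library, a flat buffer plus index arithmetic generated from a shape, can be sketched as follows (illustrative only; Kiwaku itself is a C++20 library whose actual API is different and much richer):

```python
class NDView:
    """Minimal sketch of a multidimensional array view: one flat buffer
    plus a shape, with row-major strides derived from that shape."""

    def __init__(self, data, shape):
        self.data, self.shape = data, shape
        # Generate row-major strides from the shape.
        self.strides, acc = [], 1
        for extent in reversed(shape):
            self.strides.append(acc)
            acc *= extent
        self.strides.reverse()

    def __getitem__(self, index):
        # Multi-index -> flat offset via the generated strides.
        flat = sum(i * s for i, s in zip(index, self.strides))
        return self.data[flat]

# A 2x3 view over a flat buffer of six values.
v = NDView([1, 2, 3, 4, 5, 6], shape=(2, 3))
print(v.strides)  # [3, 1]
print(v[1, 2])    # 6
```

In Kiwaku the analogous index arithmetic is generated at compile time from C++20 templates, so the abstraction costs nothing at run time.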
ctbench, a set of tools for compile-time benchmarking and analysis in C++, which helps drive the development of other libraries.
2 theses currently supervised, 1 internship, 1 visiting professor.