**KHABOU Amal**
Ph.D

Group :

*Dense matrix computations: communication cost and numerical stability*
Starts on 01/10/2009

Advisor : GRIGORI, Laura

**Funding :** contrat doctoral du Ministère

**Affiliation :** Université Paris-Sud

**Laboratory :** LRI - GRAND LARGE

**Defended** on 11/02/2013, committee :

Nicholas Higham (Rapporteur), Professeur, School of Mathematics, the University of Manchester

Yves Robert (Rapporteur), Professeur, Ecole Normale Supérieure de Lyon

Iain Duff (Examinateur), Professeur, the University of Strathclde

Yannis Manoussakis (Examinateur), Professeur, Université Paris Sud

Jean-Louis Roch (Examinateur), Maître de Conférences, IMAG

Laura Grigori (Directeur de thèse), Directeur de Recherche, INRIA

**Research activities :**
**Abstract :**
This dissertation focuses on a widely used linear algebra kernel to solve linear systems, that is the LU

decomposition. Usually, to perform such a computation one uses the Gaussian elimination with partial

pivoting (GEPP). The backward stability of GEPP depends on a quantity which is referred to as the

growth factor, it is known that in general GEPP leads to modest element growth in practice. However

its parallel version does not attain the communication lower bounds. Indeed the panel factorization represents

a bottleneck in terms of communication. To overcome this communication bottleneck, Grigori

et al [60] have developed a communication avoiding LU factorization (CALU), which is asymptotically

optimal in terms of communication cost at the cost of some redundant computation. In theory, the upper

bound of the growth factor is larger than that of Gaussian elimination with partial pivoting, however

CALU is stable in practice. To improve the upper bound of the growth factor, we study a new pivoting

strategy based on strong rank revealing QR factorization. Thus we develop a new block algorithm for

the LU factorization. This algorithm has a smaller growth factor upper bound compared to Gaussian

elimination with partial pivoting. The strong rank revealing pivoting is then combined with tournament

pivoting strategy to produce a communication avoiding LU factorization that is more stable than CALU.

For hierarchical systems, multiple levels of parallelism are available. However, none of the previously

cited methods fully exploit these hierarchical systems. We propose and study two recursive algorithms

based on the communication avoiding LU algorithm, which are more suitable for architectures with

multiple levels of parallelism. For an accurate and realistic cost analysis of these hierarchical algorithms,

we introduce a hierarchical parallel performance model that takes into account processor and

network hierarchies. This analysis enables us to accurately predict the performance of the hierarchical

LU factorization on an exascale platform.