Home page





Selected Activities


Some Recent Publications:

Improving the computing efficiency of HPC systems using a combination of proactive and preventive checkpointing, M. S. Bouguerra, A. Gainaru, F. Cappello, L. Bautista Gomez, N. Maruyama and S. Matsuoka, Proceedings of IEEE IPDPS 2013.

Towards Efficient Live Migration of I/O Intensive Workloads: A Transparent Storage Transfer Proposal, B. Nicolae, F. Cappello, Proceedings of ACM HPDC 2012.

HydEE: Failure Containment without Event Logging for Large Scale Send-Deterministic MPI Applications, A. Guermouche, T. Ropars, M. Snir, F. Cappello, Proceedings of IEEE IPDPS 2012

Taming of the Shrew: Modeling the Normal and Faulty Behavior of Large-scale HPC Systems, A. Gainaru, F. Cappello, B. Kramer, Proceedings of IEEE IPDPS 2012

FTI: high performance Fault Tolerance Interface for hybrid systems, L. Bautista Gomez, D. Komatitsch, N. Maruyama, S. Tsuboi, F. Cappello, S. Matsuoka, T. Nakamura, Proceedings of IEEE/ACM SC11

Modeling and Tolerating Heterogeneous Failures in Large Parallel Systems, E. M. Heien, D. Kondo, A. Gainaru, D. Lapine, B. Kramer, F. Cappello, Proceedings of IEEE/ACM SC11

BlobCR: Efficient Checkpoint-Restart for HPC Applications on IaaS Clouds using Virtual Disk Image Snapshots, B. Nicolae, F. Cappello, Proceedings of IEEE/ACM SC11

Uncoordinated Checkpointing Without Domino Effect for Send-Deterministic Message Passing Applications, Amina Guermouche, Thomas Ropars, Elisabeth Brunet, Marc Snir, Franck Cappello, Proceedings of IPDPS 2011

Selected Publications:

Preventive Migration vs. Preventive Checkpointing for Extreme Scale Supercomputers
Franck Cappello, Henri Casanova, Yves Robert, Parallel Processing Letters 21(2): 111-132 (2011)

On Communication Determinism in Parallel HPC Applications, Franck Cappello, Amina Guermouche, Marc Snir, Proceedings of IEEE ICCCN 2010

Toward Exascale Resilience, Franck Cappello, Al Geist, Bill Gropp, Laxmikant Kale, Bill Kramer, Marc Snir, IJHPCA 23(4): 374-388 (2009)

Fault Tolerance in Petascale/Exascale Systems: Current Knowledge, Challenges and Research Opportunities, Franck Cappello, INRIA, IJHPCA 23(3): 212-226 (2009)

Grid'5000: a large scale, reconfigurable, controlable and monitorable Grid platform, In IEEE/ACM GRID 2005, 6th International Workshop on Grid Computing, Franck Cappello, et al.

MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks, ACM/IEEE SC’00 “International Conference for High Performance Computing, Networking, Storage and Analysis”, 2000, Franck Cappello, Daniel Etiemble.

MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes, ACM/IEEE SC’02 “International Conference for High Performance Computing, Networking, Storage and Analysis”, George Bosilca, Aurelien Bouteiller, Franck Cappello, Samir Djilali, Gilles Fedak, Cecile Germain, Thomas Herault, Pierre Lemarinier, Oleg Lodygensky, Frederic Magniette, Vincent Neri, Anton Selikhov