Français Anglais
Accueil Annuaire Plan du site
Home > Research results > Dissertations & habilitations
Research results
Ph.D de

Ph.D
Group : Parallel Systems

Lattice QCD Optimization and Polytopic Representations of Distributed Memory

Starts on 13/09/2010
Advisor : EISENBEIS, Christine

Funding : contrat doctoral INRIA
Affiliation : Université Paris-Sud
Laboratory : INRIA Alchemy

Defended on 26/09/2014, committee :
Directrice de thèse :
- Christine Eisenbeis, INRIA/LRI/U-PSud

Rapporteurs :
- Philippe Clauss, Université de Strasbourg ;
- J. Ramanujam, Louisiana State University.

Examinateurs :
- Sid Touati, Université Nice Sophia Antipolis ;
- Paul Feautrier, ENS Lyon ;
- Sylvain Conchon, Université Paris-Sud XI ;
- Olivier Pène, CNRS/LPT Orsay.

Research activities :

Abstract :
Motivated by modern day physics which in addition to experiments also
tries to verify and deduce laws of nature by simulating the
state-of-the-art physical models using oversized computers, this
thesis explores means of accelerating such simulations by improving
the simulation programs they run. The primary focus is Lattice
Quantum Chromodynamics (QCD), a branch of quantum field theory,
running on IBM newest supercomputer, the Blue Gene/Q.

In a first approach, the source code of tmLQCD, a Lattice QCD program,
is improved to run faster on the Blue Gene machine. Its most
performance-relevant operation is a 8-point stencil in 4 dimensional
space. Two different optimization strategies are perused: One with
the priority of improving spatial and temporal locality, and a second
making use of the hardware's data stream prefetcher. On Blue Gene/Q
the first strategy reaches up to 20% of the peak theoretical floating
point operation performance of that machine. The second strategy with
up to 54% of peak is much faster at the cost of using 4 times more
memory by storing the data in the order they will be used in the next
stencil operation, duplicating data where necessary.

Other techniques exploited are direct programming of the messaging
hardware (called MUSPI by IBM), a low-overhead work distribution
mechanism for threads, explicit data prefetching of data (using dcbt
instruction) and manual vectorization (using QPX; width-4 SIMD
instructions). Hardware-based list prefetching and transactional
memory - both distinct and novel features of the Blue Gene/Q system --
did not improve the program's performance.

The second approach is the newly-written LLVM compiler extension
called Molly which optimizes the program itself, specifically the
distribution of data and work between the nodes of a cluster machine
such as Blue Gene/Q. Molly represents arrays using integer polyhedra
and uses another already existing compiler extension Polly which
represents statements and loops using polyhedra. When Molly knows how
data is distributed among the nodes and where statements are executed,
it adds code that manages the data flow between the nodes. Molly can
also permute the order of data in memory.

Molly's main task is to cluster data into sets that are sent to the
same target into the same buffer because single transfers involve a
massive overhead. We present an algorithm that minimizes the number
of transfers for unparametrized loops using anti-chains of data flows.
In addition, we implement a heuristic that takes into account how the
programmer wrote the code. Asynchronous communication primitives are
inserted right after the data is available respectively just before it
is used. A runtime library implements these primitives using MPI.

Molly manages to distribute any code that is representable by the
polyhedral model, but does so best for stencils codes such as Lattice
QCD. Compiled using Molly, the Lattice QCD stencil reaches 2.5% of
the theoretical peak performance. The performance gap is mostly
because all the other optimizations are missing, such as
vectorization. Future versions of Molly may also effectively handle
non-stencil codes and use make use of all the optimizations that make
the manually optimized Lattice QCD stencil so fast.

Ph.D. dissertations & Faculty habilitations
DECODING THE PLATFORM SOCIETY: ORGANIZATIONS, MARKETS AND NETWORKS IN THE DIGITAL ECONOMY
The original manuscript conceptualizes the recent rise of digital platforms along three main dimensions: their nature of coordination devices fueled by data, the ensuing transformations of labor, and the accompanying promises of societal innovation. The overall ambition is to unpack the coordination role of the platform and where it stands in the horizon of the classical firm – market duality. It is also to precisely understand how it uses data to do so, where it drives labor, and how it accommodates socially innovative projects. I extend this analysis to show continuity between today’s society dominated by platforms and the “organizational society”, claiming that platforms are organized structures that distribute resources, produce asymmetries of wealth and power, and push social innovation to the periphery of the system. I discuss the policy implications of these tendencies and propose avenues for follow-up research.

DISTRIBUTED COMPUTING WITH LIMITED RESOURCES


VALORISATION DES DONNéES POUR LA RECHERCHE D'EMPLO