A general design for a scalable MPI-GPU Shallow Water Equations solver on a multi-resolution grid

Turchetto, Massimiliano

Please use this identifier to cite or link to this item: https://hdl.handle.net/1889/4014

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Vacondio, Renato	-
dc.contributor.advisor	Dal Palù, Alessandro	-
dc.contributor.author	Turchetto, Massimiliano	-
dc.date.accessioned	2020-04-18T07:25:55Z	-
dc.date.available	2020-04-18T07:25:55Z	-
dc.date.issued	2020-03	-
dc.identifier.uri	http://hdl.handle.net/1889/4014	-
dc.description.abstract	This thesis presents a multi-GPU implementation of a finite-volume solver that approximates the 2D Shallow Water Equations (SWE) in order to simulate flood events. It is generally acknowledged in the literature that such events are becoming more and more frequent because of global warming. The only way we have to respond to them is to increase the resilience of the territory through fast and accurate modelling. A previous version of the solver, the starting point of this thesis, solves the SWE on a multi-resolution grid and performs all computations on a Graphics Processing Unit (GPU). Despite the many optimizations made over the years, that version is not scalable. In particular, the size of the input grids cannot grow beyond a limit set by the memory of a single GPU. Another problem is the simulation time, which can exceed 20 hours for realistic cases. The goal of the thesis is to overcome these limitations by using a set of GPUs, rather than a single one, to perform the computations. This goal can be achieved with well-known High Performance Computing (HPC) techniques, in which the input grid is partitioned into several parts, each assigned to a different GPU. The borders of adjacent partitions are exchanged using the Message Passing Interface (MPI) standard, which is also used for the reduction of the time step (delta t). Two partitioning algorithms have been integrated into the solver: the first performs a 1D subdivision of the domain, while the second, more sophisticated, is based on Hilbert curves. 
Both algorithms showed a high level of efficiency in the Weak Scaling Test (~90%); in the Strong Scaling Test, however, the second proved much more efficient than the first, reaching an efficiency of ~85%. The border communications performed during each time step were hidden behind the GPU computations, gaining 10% efficiency in the Strong Scaling Test run on 32 GPUs. One optimization performed by the solver consists in excluding from the time-step computations the dry cells that cannot become wet in the next computational step. While this approach reduces the amount of computation, it also creates imbalances in the computational load among the partitions. In particular, GPUs containing fewer wet cells are computationally lighter than those containing more. For this reason, a heuristic was designed and implemented in the solver that detects these imbalances during the simulation and balances the load among the GPUs, converging to a balanced state.	it
dc.description.abstract	This thesis presents a multi-GPU implementation of a finite-volume solver approximating the 2D Shallow Water Equations (SWE) in order to simulate flooding events. There is a consensus in the scientific literature that such events are increasing year by year, and the only feasible way to deal with them is to increase the resilience of the territory by making fast and accurate predictions of their evolution. A previous version of the solver, which is the starting point of this thesis, takes a first step in that direction by solving conservation laws on a multi-resolution grid and offloading the whole computation to a Graphics Processing Unit (GPU). Notwithstanding the many optimizations adopted inside the kernels, namely the efficient memory representation and the exclusion of dry cells from the computational time steps, the single-GPU implementation showed evident scalability limitations: the size of the input grids is bounded, and simulations of realistic scenarios may take several hours. The goal of this thesis is to overcome such limitations by distributing the workload across multiple GPUs, each one managed by a single process. The overall design of the code follows the common High Performance Computing (HPC) approach applied to Computational Fluid Dynamics, in which the input grid is partitioned into parts that exchange their borders using the Message Passing Interface (MPI) standard and perform a global time-step reduction. Two different partitioning algorithms have been considered: the first makes one-dimensional domain subdivisions, and the second, more sophisticated, is based on Hilbert Space Filling Curves (HSFC). While both proved efficient in the Weak Scalability Test, showing a constant efficiency of ~90% between 8 and 64 GPUs, the HSFC partitioning outperformed the 1D one in the Strong Scalability Test, reaching an efficiency of 85% on 64 GPUs. 
The MPI communications have been overlapped with the kernel computations, gaining 10% efficiency in strong scaling on up to 32 GPUs. The ever-changing nature of the wet and dry fronts in realistic scenarios creates imbalances in the computational load across different MPI ranks. In particular, partitions having fewer wet cells sit idle while waiting for other processes to finish their computations before the global time-step reduction can be performed. A heuristic algorithm has been designed and implemented to minimize such idle times by migrating grid portions from computationally heavier processes to lighter ones (those with high idle times). The tests showed that the heuristic works well and is able to decrease the time-step duration by gradually lowering the idle times to the order of microseconds.	it
dc.language.iso	Italian	it
dc.publisher	Università di Parma. Dipartimento di Ingegneria e architettura	it
dc.relation.ispartofseries	Dottorato di ricerca in Ingegneria civile e architettura	it
dc.rights	© Massimiliano Turchetto, 2020	it
dc.subject	SWE, multi-GPU, multiresolution grid, CUDA, Dynamic Load Balancing, HPC	it
dc.title	A general design for a scalable MPI-GPU Shallow Water Equations solver on a multi-resolution grid	it
dc.type	Doctoral thesis	it
dc.subject.miur	ICAR/02	it
Appears in Collections:	Ingegneria civile, dell'Ambiente, del Territorio e Architettura. Tesi di dottorato
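
The Hilbert Space Filling Curve partitioning named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis code: it assumes a uniform n x n grid with n a power of two (the thesis uses a multi-resolution grid), equal cost per cell, and the hypothetical function names hilbert_index and hilbert_partition. Cells are ordered by their position along the curve and the ordered list is cut into contiguous chunks, one per GPU, so that each chunk tends to be spatially compact.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve of an n x n grid.

    Assumes n is a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_partition(n, num_parts):
    """Split the n x n grid into num_parts contiguous pieces of the curve.

    Assumption: every cell has the same cost, so pieces get equal cell counts.
    """
    cells = [(x, y) for x in range(n) for y in range(n)]
    cells.sort(key=lambda c: hilbert_index(n, c[0], c[1]))
    total = len(cells)
    return [
        cells[(total * p) // num_parts:(total * (p + 1)) // num_parts]
        for p in range(num_parts)
    ]


if __name__ == "__main__":
    # Example: an 8 x 8 grid shared among 4 hypothetical GPUs.
    for rank, part in enumerate(hilbert_partition(8, 4)):
        print(rank, len(part))
```

Because consecutive cells along the curve are grid neighbours, each contiguous chunk has a relatively short border, which keeps the amount of data exchanged between neighbouring partitions low compared with thin 1D strips.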

Files in This Item:

File	Description	Size	Format
Relazione-Triennale-Turchetto.pdf (restricted until 2100-01-01)	Report on the three-year activity, including certifications for the activities carried out.	1.23 MB	Adobe PDF
TurchettoPhdThesisPrintable.pdf	A PDF containing the PhD thesis of Turchetto Massimiliano	2.06 MB	Adobe PDF
