Multi-GPU and multi-CPU accelerated FDTD scheme for vibroacoustic applications

Francés Monllor, Jorge; Otero Calviño, Beatriz; Bleda Pérez, Sergio; Gallego Rico, Sergi; Neipp López, Cristian; Urbano-Márquez, A.; Beléndez Vazquez, Augusto

doi:10.1016/j.cpc.2015.01.017

dc.contributor.author	Francés Monllor, Jorge
dc.contributor.author	Otero Calviño, Beatriz
dc.contributor.author	Bleda Pérez, Sergio
dc.contributor.author	Gallego Rico, Sergi
dc.contributor.author	Neipp López, Cristian
dc.contributor.author	Urbano-Márquez, A.
dc.contributor.author	Beléndez Vazquez, Augusto
dc.contributor.other	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.date.accessioned	2016-01-21T19:48:21Z
dc.date.available	2017-06-01T00:30:23Z
dc.date.issued	2015-06-01
dc.identifier.citation	Francés, J., Otero, B., Bleda, S., Gallego, S., Neipp, C., Urbano-Márquez, A., Beléndez, A. Multi-GPU and multi-CPU accelerated FDTD scheme for vibroacoustic applications. "Computer physics communications", 1 June 2015, vol. 191, p. 43-51.
dc.identifier.issn	0010-4655
dc.identifier.uri	http://hdl.handle.net/2117/81842
dc.description.abstract	The Finite-Difference Time-Domain (FDTD) method is applied to the analysis of vibroacoustic problems and to the study of longitudinal and transverse wave propagation in stratified media. This work demonstrates the potential of the scheme and the relevance of each acceleration strategy for massive FDTD computations. We propose two new implementations of the two-dimensional FDTD scheme, using multi-CPU and multi-GPU platforms respectively. The first implementation uses an open-source message passing interface (OMPI) to fully exploit a dual-processor workstation with two Intel Xeon processors. The CPU code version also combines the Streaming SIMD Extensions (SSE) and the Advanced Vector Extensions (AVX) with shared-memory approaches that take advantage of multi-core platforms. The second implementation, the multi-GPU code version, is based on the peer-to-peer communications available in CUDA on two GPUs (NVIDIA GTX 670). The paper then analyses the influence of the different code versions, including shared-memory approaches, vector instructions and multiple processors (both CPU and GPU), and compares them in order to quantify the improvement obtained from distributed multi-CPU and multi-GPU solutions. The performance of both approaches was analysed, showing that adding shared-memory schemes to CPU computing substantially improves the performance of vector instructions and enlarges the range of simulation sizes that use the CPU cache efficiently. In this case, GPU computing is about twice as fast as the fine-tuned CPU version, on both one and two nodes. However, for massive computations, explicit vector instructions are not worthwhile, since memory bandwidth is the limiting factor and performance tends to match that of the sequential version with auto-vectorisation and the shared-memory approach. In this scenario GPU computing is the best option, since it provides homogeneous behaviour. More specifically, the speedup of GPU computing reaches an upper limit of 12 for both one and two GPUs, while performance peaks at 80 GFlops for one GPU and 146 GFlops for two GPUs. Finally, the method is applied to an earth crust profile to demonstrate the potential of the approach and the need for acceleration strategies in this type of application.
dc.format.extent	9 p.
dc.language.iso	eng
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/
dc.subject	UPC subject areas::Physics::Electromagnetism
dc.subject	UPC subject areas::Mathematics and statistics::Numerical analysis::Numerical methods
dc.subject.lcsh	Electrodynamics
dc.subject.lcsh	Electromagnetism
dc.subject.other	FDTD
dc.subject.other	GPU computing
dc.subject.other	OMPI
dc.subject.other	SIMD extensions
dc.subject.other	SV-Wave propagation
dc.subject.other	Performance analysis
dc.subject.other	Media
dc.subject.other	Implementation
dc.subject.other	Simulation
dc.subject.other	MPI
dc.title	Multi-GPU and multi-CPU accelerated FDTD scheme for vibroacoustic applications
dc.type	Article
dc.subject.lemac	Electrodinàmica
dc.subject.lemac	Electromagnetisme
dc.contributor.group	Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.identifier.doi	10.1016/j.cpc.2015.01.017
dc.description.peerreviewed	Peer Reviewed
dc.rights.access	Open Access
local.identifier.drac	15643500
dc.description.version	Postprint (author's final draft)
dc.relation.projectid	info:eu-repo/grantAgreement/MINECO/6PN/TIN2012-34557
local.citation.author	Francés, J.; Otero, B.; Bleda, S.; Gallego, S.; Neipp, C.; Urbano-Márquez, A.; Beléndez, A.
local.citation.publicationName	Computer physics communications
local.citation.volume	191
local.citation.startingPage	43
local.citation.endingPage	51
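
The abstract describes splitting an FDTD grid across two processors (CPU nodes or GPUs) that exchange boundary values at every time step. The sketch below illustrates that idea only; it is not the authors' code. It assumes a one-dimensional velocity-stress scheme instead of the paper's two-dimensional one, rigid ends, and arbitrary coefficients CV and CS. All names (run_single, run_split, v_left, s_right and so on) are hypothetical. The two "halo exchange" assignments stand in for what would be a message between processes or a peer-to-peer copy between devices in a distributed version.

```python
import math

CV = 0.5  # assumed value of dt / (rho * dx)
CS = 0.5  # assumed value of dt * modulus / dx; CV * CS is the squared Courant number

def initial_velocity(n):
    """Gaussian velocity pulse centred on the grid, with rigid (zero) ends."""
    v = [math.exp(-((i - n / 2) / 4.0) ** 2) for i in range(n)]
    v[0] = v[-1] = 0.0
    return v

def run_single(n, steps):
    """Reference run: the whole staggered grid is updated in one domain."""
    v = initial_velocity(n)      # velocity at integer nodes
    s = [0.0] * (n - 1)          # stress at half nodes
    for _ in range(steps):
        for i in range(1, n - 1):
            v[i] += CV * (s[i] - s[i - 1])
        for i in range(n - 1):
            s[i] += CS * (v[i + 1] - v[i])
    return v

def run_split(n, m, steps):
    """Same scheme on two subdomains split at node m, with one ghost cell each."""
    v = initial_velocity(n)
    v_left, v_right = v[: m + 1], v[m:]   # v_left[m] is a ghost copy of v[m]
    s_left = [0.0] * m                    # owns s[0 .. m-1]
    s_right = [0.0] * (n - m)             # s_right[0] is a ghost copy of s[m-1]
    for _ in range(steps):
        # Velocity update on each subdomain (ghost and rigid ends are skipped).
        for i in range(1, m):
            v_left[i] += CV * (s_left[i] - s_left[i - 1])
        for j in range(len(v_right) - 1):
            v_right[j] += CV * (s_right[j + 1] - s_right[j])
        v_left[m] = v_right[0]            # halo exchange: right -> left
        # Stress update on each subdomain.
        for i in range(m):
            s_left[i] += CS * (v_left[i + 1] - v_left[i])
        for j in range(len(v_right) - 1):
            s_right[j + 1] += CS * (v_right[j + 1] - v_right[j])
        s_right[0] = s_left[m - 1]        # halo exchange: left -> right
    return v_left[:m] + v_right

if __name__ == "__main__":
    single = run_single(64, 100)
    split = run_split(64, 32, 100)
    print(max(abs(a - b) for a, b in zip(single, split)))
```

Comparing run_single with run_split is a simple way to check that the decomposition does not change the numerical result. In this sketch only one value per field crosses the interface at each step, which suggests why communication cost can stay small relative to the interior update.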

Files in this item

Name: Multi-GPU and multi-CPU accelerated ...
Size: 1.544 MB
Format: PDF

This item appears in the following collections

Journal articles [1,050]
Journal articles [382]

UPCommons. Open knowledge portal of the UPC