Finite Element Numerical Integration on Xeon Phi coprocessor

Krużel, Filip; Banaś, Krzysztof

Article - details

Journal

Annals of Computer Science and Information Systems

2014 | vol. 2 | pp. 603--612

Article title

Finite Element Numerical Integration on Xeon Phi coprocessor

Authors

Filip Krużel, Krzysztof Banaś

Abstract

In this article we describe the implementation of a finite element numerical integration algorithm for the Xeon Phi coprocessor. The coprocessor extends the idea of a specialized many-core unit for calculations and, by design, its performance is meant to be competitive with current families of GPUs. Its main advantages are the built-in set of 512-bit vector registers and the ease of porting existing codes from standard x86 architectures. We verify the performance of previously developed OpenCL algorithms for finite element numerical integration, ported to the new Xeon Phi coprocessor architecture. The algorithm is tested for standard FEM approximations of selected problems. The obtained timing results make it possible to compare the performance of the OpenCL kernels executed on the Xeon Phi with their performance on contemporary GPUs. (original abstract)

Keywords

Numeric algorithms, Smart cards, Hardware

Year

2014

Pages

603--612

Contributors

author

Filip Krużel

Cracow University of Technology, Poland

author

Krzysztof Banaś

AGH University of Science and Technology Kraków, Poland

Bibliography

AMD, AMD Accelerated Parallel Processing. OpenCL Programming Guide, revision 2.7, 2013.
Banaś K., and Krużel F., "Large scale numerical integration on GPU", submitted for publication.
Banaś K., Płaszewski P., and Macioł P., "Numerical integration on GPUs for higher order finite elements", Computers & Mathematics with Applications, vol. 67 (6), pp. 1319-1344, 2014, http://dx.doi.org/10.1016/j.camwa.2014.01.021
Barker K. J., Davis K., Hoisie A., Kerbyson D. K., Lang M., Pakin S., and Sancho J. C., "Entering the petaflop era: The architecture and performance of Roadrunner," High Performance Computing, Networking, Storage and Analysis, pp. 1-11, Nov. 2008, http://dx.doi.org/10.1109/SC.2008.5217926
Gaster B., Kaeli D., Howes L., Mistry P., and Schaa D., Heterogeneous Computing With OpenCL, Elsevier Science & Technology, 2011.
Goodwins R., "Intel unveils many-core Knights platform for HPC", www.zdnet.co.uk, 2010.
Govindaraju N. K., Larsen S., Gray J., and Manocha D., "A memory model for scientific algorithms on graphics processors," SC 2006 Conference, Proceedings of the ACM/IEEE, Nov. 2006, http://dx.doi.org/10.1109/SC.2006.2
IBM, Cell Broadband Engine Programming Handbook Including the PowerXCell 8i Processor, version 1.11, May 2008.
Intel, Intel 64 and IA-32 Architectures Optimization Reference Manual, April 2012.
Intel, Intel SDK for OpenCL Applications XE 2013 R2 Optimization Guide, 2013.
Intel, Intel Xeon Phi Coprocessor Datasheet, June 2013.
Intel, Intel Xeon Phi Product Family Performance, revision 1.4, 12th December 2013.
Khronos OpenCL Working Group, The OpenCL Specification, Ed. A. Munshi, version 1.2, revision 19, 2012.
Krużel F., and Banaś K., "Vectorized OpenCL implementation of numerical integration for higher order finite elements," Computers & Mathematics with Applications, vol. 66 (10), pp. 2030-2044, 2013, http://dx.doi.org/10.1016/j.camwa.2013.08.026
Michalik K., Banaś K., Płaszewski P., and Cybułka P., "ModFem: a computational framework for parallel adaptive finite element simulations", Computer Methods in Materials Science, vol. 13 (1), pp. 3-8, 2013.
Morgan T. P., "Intel teaches Xeon Phi x86 coprocessor snappy new tricks", www.theregister.co.uk, 2012.
NVIDIA, "NVIDIA's Next Generation CUDA Compute Architecture: Kepler GK110. The Fastest, Most Efficient HPC Architecture Ever Built", Whitepaper, ver. 1.0, 2012.
NVIDIA, "Tesla K-Series Datasheet", Oct. 2013.
NVIDIA, CUDA C Programming Guide, version 6.0, 2014.
Rojek K., and Szustak L., "Adaptation of double-precision matrix multiplication to the Cell Broadband Engine architecture," in: PPAM'09: Proceedings of the 8th international conference on Parallel processing and applied mathematics, Springer-Verlag, Berlin, Heidelberg, pp. 535-546, 2010.
Roth F., System Administration for the Intel Xeon Phi Coprocessor, 2013.
Rul S., Vandierendonck H., D'Haene J., and De Bosschere K., "An experimental study on performance portability of OpenCL kernels", in: Application Accelerators in High Performance Computing, 2010 Symposium, Knoxville, TN, USA, p. 3, 2010.
Seiler L., Carmean D., Sprangle E., Forsyth T., Abrash M., Dubey P., et al., "Larrabee: a many-core x86 architecture for visual computing", in SIGGRAPH '08: ACM SIGGRAPH 2008 papers, pp. 1-15, 2008, http://dx.doi.org/10.1145/1399504.1360617
Wilt N., The CUDA Handbook: A Comprehensive Guide to GPU Programming, Addison-Wesley Professional, 2013.

Document type

Bibliography

Identifiers

YADDA identifier

bwmeta1.element.ekon-element-000171327099
