2014 | 2 | 603--612
Article title

Finite Element Numerical Integration on Xeon Phi coprocessor

In this article we describe an implementation of the finite element numerical integration algorithm for the Xeon Phi coprocessor. The coprocessor extends the idea of a specialized many-core unit for calculations and is intended to offer performance competitive with current GPU families. Its main advantages are the built-in set of 512-bit vector registers and the ease of porting existing codes from standard x86 architectures. We verify the performance of previously developed OpenCL algorithms for finite element numerical integration after porting them to the new Xeon Phi coprocessor architecture. The algorithm is tested on standard FEM approximations of selected problems. The timing results obtained allow a comparison of the performance of the OpenCL kernels executed on the Xeon Phi and on contemporary GPUs. (original abstract)
  • Cracow University of Technology, Poland
  • AGH University of Science and Technology Kraków, Poland