2015 | 5 | 721--728
Article title

Reproducible Floating-Point Atomic Addition in Data-Parallel Environment

Publication language
EN
Abstract (EN)
Floating-point additions in concurrent execution environments are known to be hazardous, because the result depends on the order in which the operations are performed. The problem arises in data-parallel execution environments such as GPUs, where obtaining reproducible results from floating-point atomic addition is challenging. It stems from the rounding error or cancellation that each operation may introduce, combined with the lack of control over execution order. In this article we propose two solutions to this problem: work reassignment and fixed-point accumulation. Work reassignment enforces an execution order, which yields weak reproducibility. Fixed-point accumulation avoids rounding errors altogether by using a long accumulator, and enables strong reproducibility. (original abstract)
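The abstract does not spell out how work reassignment is implemented. The Python sketch below only simulates the underlying idea, under assumptions of our own: a scatter-add in which each contribution carries a destination bin, a caller-supplied `order` list that stands in for the unpredictable order in which GPU atomics execute, and the hypothetical function names `atomic_scatter_add` and `reassigned_scatter_add`. It is not the authors' algorithm. It shows why fixing the summation order per destination makes the result repeatable.

```python
"""Toy simulation of a floating-point scatter-add (the GPU atomicAdd pattern).

Assumption: `order` is a permutation of contribution indices standing in for
the nondeterministic order in which a GPU would execute atomic additions.
All names here are hypothetical.
"""
from typing import List, Sequence, Tuple

Contribution = Tuple[int, float]  # (destination bin, value)


def atomic_scatter_add(contribs: Sequence[Contribution], n_bins: int,
                       order: Sequence[int]) -> List[float]:
    """Add each contribution into its bin in arrival order, like atomicAdd."""
    out = [0.0] * n_bins
    for i in order:
        dest, val = contribs[i]
        out[dest] += val  # rounding depends on what is already in the bin
    return out


def reassigned_scatter_add(contribs: Sequence[Contribution], n_bins: int,
                           order: Sequence[int]) -> List[float]:
    """Work-reassignment sketch: one owner per bin sums in source-index order."""
    buckets: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for i in order:  # arrival order is still arbitrary...
        dest, val = contribs[i]
        buckets[dest].append((i, val))
    out = [0.0] * n_bins
    for dest, items in enumerate(buckets):
        acc = 0.0
        for _, val in sorted(items):  # ...but summation order is fixed by index
            acc += val
        out[dest] = acc
    return out


if __name__ == "__main__":
    contribs = [(0, 1e16), (0, 1.0), (0, -1e16), (0, 1.0)]
    for order in ([0, 1, 2, 3], [1, 3, 0, 2]):
        print(atomic_scatter_add(contribs, 1, order),
              reassigned_scatter_add(contribs, 1, order))
```

Running it prints `[1.0] [1.0]` and then `[2.0] [1.0]`. The atomic-style sum changes with the arrival order while the reassigned sum does not. The reassigned result is still not the exact sum (2.0), and a different fixed order could give a different, though again repeatable, value. This fits the abstract's description of work reassignment as providing only weak reproducibility.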
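The following Python sketch illustrates why a long fixed-point accumulator gives order-independent results. It follows the spirit of Kulisch's exact accumulator (Kulisch is cited in the bibliography below). The accumulator width (34 64-bit words), the names `to_fixed` and `LongAccumulator`, and the use of one Python integer in place of the array of machine words a GPU kernel would update with integer atomics are all hypothetical choices. Every finite double is an integer multiple of 2^-1074, so each input converts exactly. Integer addition is associative, so the only rounding happens once, in `value()`.

```python
"""Toy long fixed-point accumulator for IEEE-754 doubles.

Assumptions (not from the paper): a width of 34 64-bit words, the names
`to_fixed` and `LongAccumulator`, and a single Python int standing in for
an array of words updated with integer atomics.
"""
import math

FRAC_BITS = 1074          # every finite double is a multiple of 2**-1074
ACC_BITS = 34 * 64        # 2176 bits: 2098 for the double range + headroom
MASK = (1 << ACC_BITS) - 1


def to_fixed(x: float) -> int:
    """Convert a finite double exactly into an integer count of 2**-1074 units."""
    if not math.isfinite(x):
        raise ValueError("only finite values can be accumulated")
    num, den = x.as_integer_ratio()   # den is a power of two <= 2**1074
    return num * ((1 << FRAC_BITS) // den)


class LongAccumulator:
    def __init__(self) -> None:
        self.bits = 0  # two's-complement contents of an ACC_BITS-wide register

    def add(self, x: float) -> None:
        # Wrap-around integer addition is exact, associative and commutative,
        # so the final contents do not depend on the order of add() calls.
        self.bits = (self.bits + to_fixed(x)) & MASK

    def value(self) -> float:
        # Reinterpret as a signed integer, then round once to the nearest
        # double (CPython's int / int true division is correctly rounded).
        signed = self.bits - (1 << ACC_BITS) if self.bits >> (ACC_BITS - 1) else self.bits
        return signed / (1 << FRAC_BITS)


if __name__ == "__main__":
    values = [1e16, 1.0, -1e16, 1.0]
    for order in ([0, 1, 2, 3], [1, 3, 0, 2]):
        acc = LongAccumulator()
        for i in order:
            acc.add(values[i])
        print(acc.value())  # 2.0 for both orders (the exact sum)
```

Both orders print `2.0`, the exact sum. A real GPU version would also have to propagate carries between words under concurrent updates. This sketch sidesteps that problem by using a single arbitrary-precision integer.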
Volume
5
Pages
721--728
Author affiliations
  • Laboratoire DALI-LIRMM, France
  • Bretagne Atlantique Campus de Beaulieu, France
Bibliography
  • K. Doertel, "Best known method: Avoid heterogeneous precision in control flow calculations," Intel, Tech. Rep., 2013.
  • N. J. Higham, Accuracy and stability of numerical algorithms. SIAM, 2002, second edition. [Online]. Available: http://www.maths.manchester.ac.uk/~higham/asna
  • (2014, July) N-body: FP atomics v. recomputation. [Online]. Available: http://blog.cudahandbook.com/2012/11/02/n-body-fp-atomics-v-recomputation.aspx
  • J. Allard, S. Cotin, F. Faure, P.-J. Bensoussan, F. Poyer, C. Duriez, H. Delingette, and L. Grisoni, "SOFA - an open source framework for medical simulation," in Medicine Meets Virtual Reality (MMVR'15), Long Beach, USA, February 2007.
  • W.-F. Chiang, G. Gopalakrishnan, Z. Rakamarić, D. H. Ahn, and G. L. Lee, "Determinism and reproducibility in large-scale HPC systems," in Informal Proceedings of the 4th Workshop on Determinism and Correctness in Parallel Programming (WoDet 2013), 2013.
  • P. Bakkum and K. Skadron, "Accelerating SQL database operations on a GPU with CUDA," in Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, GPGPU 2010, Pittsburgh, Pennsylvania, USA, March 14, 2010, ser. ACM International Conference Proceeding Series, D. R. Kaeli and M. Leeser, Eds., vol. 425. ACM, 2010, pp. 94-103. [Online]. Available: http://doi.acm.org/10.1145/1735688.1735706
  • S. Collange, D. Defour, S. Graillat, and R. Iakymchuk, "Full-Speed Deterministic Bit-Accurate Parallel Floating-Point Summation on Multi- and Many-Core Architectures," INRIA, DALI-LIRMM, LIP6, ICS, Tech. Rep. HAL: hal-00949355, Feb. 2014.
  • J. Demmel and H. D. Nguyen, "Parallel reproducible summation," IEEE Trans. Computers, vol. 64, no. 7, pp. 2060-2070, 2015. [Online]. Available: http://doi.ieeecomputersociety.org/10.1109/TC.2014.2345391
  • N. J. Higham, Accuracy and stability of numerical algorithms, 2nd ed. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM), 2002.
  • J.-M. Muller et al., Handbook of floating-point arithmetic. Birkhäuser, 2010.
  • J. Nickolls and W. J. Dally, "The GPU computing era," IEEE Micro, vol. 30, pp. 56-69, March 2010. [Online]. Available: http://dx.doi.org/10.1109/MM.2010.41
  • D. Defour, "Impacting predictability of GPU's," HAL-CCSD, Tech. Rep. hal-00951920, 2013. [Online]. Available: http://hal.archives-ouvertes.fr/hal-00951920
  • V. Volkov and J. W. Demmel, "LU, QR and Cholesky factorizations using vector capabilities of GPUs," Department of Electrical Engineering and Computer Science, University of California, Berkeley, LAPACK Working Note 202, May 2008. [Online]. Available: http://www.netlib.org/lapack/lawnspdf/lawn202.pdf
  • W.-c. Feng and S. Xiao, "To GPU synchronize or not GPU synchronize?" in Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), 2010, pp. 3801-3804.
  • S. Xiao and W.-c. Feng, "Inter-block GPU communication via fast barrier synchronization," in IPDPS. IEEE, 2010, pp. 1-12. [Online]. Available: http://dx.doi.org/10.1109/IPDPS.2010.5470477
  • J. A. Stuart and J. D. Owens, "Efficient synchronization primitives for GPUs," CoRR, vol. abs/1110.4623, 2011. [Online]. Available: http://arxiv.org/abs/1110.4623
  • J. Sanders and E. Kandrot, CUDA by example: an introduction to general-purpose GPU programming. Addison-Wesley, 2010.
  • U. W. Kulisch, Computer arithmetic and validity: theory, implementation, and applications, 2nd ed., ser. de Gruyter Studies in Mathematics, vol. 33. Berlin: Walter de Gruyter & Co., 2013.
  • T. Granlund, "GNU MP: The GNU Multiple Precision Arithmetic Library." [Online]. Available: http://gmplib.org
  • G. Bohlender and U. Kulisch, "Comments on fast and exact accumulation of products," in Applied Parallel and Scientific Computing. Springer, 2012, pp. 148-156.
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.ekon-element-000171424168
