Reproducible Floating-Point Atomic Addition in Data-Parallel Environment

Defour, David; Collange, Sylvain

doi:10.15439/2015F86

Article details

Journal

Annals of Computer Science and Information Systems

2015 | vol. 5 | pp. 721-728

Article title

Reproducible Floating-Point Atomic Addition in Data-Parallel Environment

Authors

David Defour, Sylvain Collange

Abstract

Floating-point additions in a concurrent execution environment are known to be hazardous, because the result depends on the order in which the operations are performed. The problem arises in data-parallel execution environments such as GPUs, where reproducibility involving floating-point atomic addition is challenging. It is caused by the rounding error or cancellation that occurs in each operation, combined with the lack of control over execution order. In this article we propose two solutions: work reassignment and fixed-point accumulation. Work reassignment enforces an execution order, which leads to weak reproducibility. Fixed-point accumulation avoids rounding errors altogether by means of a long accumulator, which enables strong reproducibility. (original abstract)
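
The two observations in the abstract, that a floating-point sum depends on the order of the additions and that a long fixed-point accumulator avoids rounding altogether, can be illustrated with a small Python sketch. This is not the authors' GPU implementation. The names `SCALE_BITS`, `to_fixed`, `naive_sum`, and `fixed_sum` are hypothetical, a Python arbitrary-precision integer stands in for the long accumulator, and permutations of the input list stand in for the unpredictable arrival order of atomic additions.

```python
import itertools

# Assumption for this sketch: every finite binary64 value is an integer
# multiple of 2**-1074 (the smallest subnormal), so scaling by 2**1074
# turns any finite double into an exact integer.
SCALE_BITS = 1074


def naive_sum(values):
    """Accumulate in floating point, rounding after every addition."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def to_fixed(x):
    """Convert a finite double to an exact integer multiple of 2**-SCALE_BITS."""
    num, den = x.as_integer_ratio()  # exact; den is a power of two
    return num * ((1 << SCALE_BITS) // den)


def fixed_sum(values):
    """Accumulate exactly in fixed point, then round once at the end."""
    acc = 0  # Python int: stands in for the long accumulator
    for v in values:
        acc += to_fixed(v)  # integer addition is exact and associative
    return acc / (1 << SCALE_BITS)  # single, correctly rounded conversion


if __name__ == "__main__":
    data = [1e16, 1.0, -1e16, 1.0]

    # Each permutation models one possible arrival order of the additions.
    naive_results = {naive_sum(p) for p in itertools.permutations(data)}
    fixed_results = {fixed_sum(p) for p in itertools.permutations(data)}

    print("floating-point accumulation:", sorted(naive_results))
    print("fixed-point accumulation:   ", sorted(fixed_results))
```

With this input the floating-point loop returns different values for different orders, because adding 1.0 to 1e16 is rounded away, while the fixed-point loop returns the same value for every order. The explicit loop in `naive_sum` is deliberate: recent Python versions apply compensated summation inside the built-in `sum` for floats, which would hide the effect.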

Keywords

Algorithms

Year

2015

Pages

721--728

Creators

Author

David Defour

Laboratoire DALI-LIRMM, France

Author

Sylvain Collange

Bretagne Atlantique Campus de Beaulieu, France

Document type

Bibliography

Identifiers

DOI

10.15439/2015F86

YADDA identifier

bwmeta1.element.ekon-element-000171424168
