Fault-Tolerance Techniques for High-Performance Computing

Author : Thomas Herault
Publisher : Springer
Total Pages : 320
Release : 2015-07-01
ISBN-10 : 3319209434
ISBN-13 : 9783319209432

Book Synopsis: Fault-Tolerance Techniques for High-Performance Computing by Thomas Herault

Fault-Tolerance Techniques for High-Performance Computing was written by Thomas Herault and published by Springer. It was released on 2015-07-01 and has 320 pages. Book excerpt: This timely text presents a comprehensive overview of fault-tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, and silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance (ABFT). Emphasis is placed on analytical performance models. This is followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources of errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.


Books Related to Fault-Tolerance Techniques for High-Performance Computing

Advances in Mathematical Methods and High Performance Computing
Language: en
Pages: 503
Authors: Vinai K. Singh
Categories: Computers
Type: BOOK - Published: 2019-02-14 - Publisher: Springer

This special volume of the conference will be of immense use to the researchers and academicians. In this conference, academicians, technocrats and researchers
High Performance Computing in Science and Engineering
Language: en
Pages: 172
Authors: Tomáš Kozubek
Categories: Computers
Type: BOOK - Published: 2021-01-07 - Publisher: Springer Nature

This book constitutes the thoroughly refereed post-conference proceedings of the 4th International Conference on High Performance Computing in Science and Engin
2018 IEEE/ACM 8th Workshop on Fault Tolerance for HPC at EXtreme Scale (FTXS)
Language: en
Pages:
Authors: IEEE Staff
Categories:
Type: BOOK - Published: 2018-11-16 - Publisher:

Authors are invited to submit original papers on the research and practice of fault tolerance in extreme scale distributed systems (primarily HPC systems, but i
Innovative Research and Applications in Next-Generation High Performance Computing
Language: en
Pages: 488
Authors: Qusay F. Hassan
Categories: Computers
Type: BOOK - Published: 2016-07-05 - Publisher: IGI Global

High-performance computing (HPC) describes the use of connected computing units to perform complex tasks. It relies on parallelization techniques and algorithms