Fault-tolerant computing is a generic term for redundant design techniques, such as duplicated components or repeated computations, that let a system keep operating without interruption (tolerate) when components fail (faults). The central theme of this course is the use of reliability and availability computations as a means of comparing fault-tolerant designs. The course defines fault-tolerant computer systems and shows why such techniques are essential for improving the reliability and availability of digital systems. Topics include an introduction to redundancy theory, limit theorems, and decision theory in redundant systems. Hardware fault tolerance covers computer redundancy, fault detection, replication and compression techniques, self-repairing techniques, concentrated and distributed voters, and models of fault-tolerant computers. Software fault tolerance covers fault tolerance versus fault intolerance, fault-tolerance objectives, errors and strategies for managing them, and the implementation of those strategies; software fault-tolerance techniques, software defence, and protective redundancy; and architectural support for fault-tolerant software, including protection mechanisms and recovery mechanisms.
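As a minimal sketch of the kind of reliability and availability computation used to compare designs, the Python below contrasts a single (simplex) module with triple modular redundancy (TMR) guarded by a majority voter. It assumes constant, independent failure rates, so a module's reliability is R(t) = e^(-λt), and uses the standard steady-state availability formula MTTF / (MTTF + MTTR). The function names and numeric values are illustrative assumptions, not taken from the course material.

```python
import math

# Hypothetical helper names; assumes constant (exponential) failure rates
# and statistically independent module failures.


def simplex_reliability(failure_rate: float, t: float) -> float:
    """Reliability R(t) = exp(-lambda * t) of a single, non-redundant module."""
    return math.exp(-failure_rate * t)


def tmr_reliability(r: float, voter_r: float = 1.0) -> float:
    """Reliability of triple modular redundancy (TMR).

    The system survives if at least 2 of the 3 identical modules survive:
    R_TMR = R^3 + 3R^2(1 - R) = 3R^2 - 2R^3, multiplied by the voter's
    reliability (the default 1.0 models a perfect voter).
    """
    return voter_r * (3 * r**2 - 2 * r**3)


def steady_state_availability(mttf: float, mttr: float) -> float:
    """Long-run fraction of time a repairable system is up: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 voter: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    lam = 1e-4  # illustrative failure rate, in failures per hour
    for hours in (1_000, 10_000):
        r = simplex_reliability(lam, hours)
        print(f"t={hours} h  simplex={r:.4f}  TMR={tmr_reliability(r):.4f}")
    print(f"availability={steady_state_availability(1000, 10):.4f}")
    # One module returns a corrupted word; the voter masks it.
    print(f"voted output={majority_vote(0b1010, 0b1010, 0b0110):#06b}")
```

Run as a script, this compares the two designs at t = 1000 h and t = 10 000 h. TMR (about 0.975) beats the single module (about 0.905) on the short mission but falls below it on the long one, because TMR is only more reliable while R > 0.5, that is, while λt < ln 2. This is the kind of trade-off that reliability computations expose when comparing fault-tolerant designs.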
Course Type | Major |
---|---|
Credit Hours | 3 |
Lecture Hours | 45 |
Assessment: biweekly quizzes, one midterm exam, one final exam, and a project.
Letter Grade | Marks | Grade Point |
---|---|---|
A | 90 - 100 | 4.00 |
A- | 85 - 89 | 3.70 |
B+ | 80 - 84 | 3.30 |
B | 75 - 79 | 3.00 |
B- | 70 - 74 | 2.70 |
C+ | 65 - 69 | 2.30 |
C | 60 - 64 | 2.00 |
C- | 55 - 59 | 1.70 |
D+ | 50 - 54 | 1.30 |
D | 45 - 49 | 1.00 |
F | 00 - 44 | 0.00 |