single-point fail-safe

the notion that, as with the fabled parity bit, any single failure will be detected
and prevented from divulging unsafe (un(en)crypted) data on ordinary paths
[that is, on any path(s) other than those specified/scheduled]
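
The parity analogy can be made concrete. What follows is only a minimal sketch, assuming even parity over a short word of bits; the function names are illustrative and not drawn from any particular system. It shows that any single flipped bit is caught, while a second flip escapes notice.

```python
def parity(bits):
    """Return the even-parity bit for a sequence of 0/1 values."""
    p = 0
    for b in bits:
        p ^= b
    return p

def with_parity(bits):
    """Append the parity bit so the whole word has even parity."""
    return list(bits) + [parity(bits)]

def check(word):
    """True when the word (data plus parity bit) still has even parity."""
    return parity(word) == 0

word = with_parity([1, 0, 1, 1])

single = list(word)
single[2] ^= 1          # one bit fails: detected

double = list(single)
double[0] ^= 1          # a second bit fails: parity is even again, undetected

print(check(word), check(single), check(double))
```

The third result is the uncomfortable one: the scheme is fail-safe for one point only, and says nothing once failures accumulate.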

[under construction]

How do you verify a billion single-point failure possibilities, when only the one culprit is wanted? [Any large composite system has upwards of a billion circuit points in its total topology.] Testing 0.1% merely shows that the system is not altogether 'buggy'; and even that 0.1% is very costly, as a single point may be involved in numerous functions and scenarios, thus multiplying any test procedure, and pressing toward a working model of the famed 'halting' problem. If one point is critical, how do you find it?!
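
The arithmetic behind that 0.1% can be sketched in a few lines. The assumptions here are mine: exactly one culprit, and test points chosen uniformly at random; the billion-point figure is the one quoted above.

```python
def sample_coverage(points, tested):
    """Chance that a uniformly random sample of `tested` points
    contains one specific point out of `points` (assumes one culprit)."""
    return tested / points

points = 10**9
tested = points // 1000              # 0.1% of the points
p_hit = sample_coverage(points, tested)

print(tested)                        # number of points actually exercised
print(p_hit)                         # chance the culprit was among them
```

A million points tested, and the culprit is still missed 999 times in 1000.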

When do you find it? A system declared single-point fail-safe must identify and notify of its failure (or utterly fail so as to alert the operator eventually by its omission if not its direct warning). Pre-testing leaves the final product no more than a communist's parley, measured, estimated, concluded, excused. If a single point does fail, it may go unnoticed (unused) until more points have failed ... the single-point fail-safe is brought to nought beneath the greater expectation of cumulated multi-point failures. Optimally reused components and subcomponents reduce this likelihood. A runtime test may improve detection of failed points (and raise the question of whether to notify and ignore, or to disable avenues to those points, thus isolating the possibility of error), so long as the test program is sufficiently comprehensive of the system configuration (we may later call this meta-programming: the interpretation of overall operability into test-decidability).
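
The choice between notifying and isolating can be sketched as follows. This is a hedged illustration only: the probe table, the route table, and the names `run_self_test` and `isolate` are hypothetical, standing in for whatever a real system's configuration would supply.

```python
def run_self_test(points):
    """points: dict of name -> zero-argument probe returning True when healthy.
    A probe that raises is treated as a failed point."""
    failed = []
    for name, probe in points.items():
        try:
            ok = probe()
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed

def isolate(routes, failed):
    """Drop every route (avenue) that passes through a failed point."""
    bad = set(failed)
    return {r: pts for r, pts in routes.items() if not bad & set(pts)}

# Hypothetical configuration: three points, two routes through them.
points = {
    "adder": lambda: 2 + 2 == 4,
    "latch": lambda: False,          # simulated failed point
    "bus":   lambda: True,
}
routes = {"A": ["adder", "bus"], "B": ["latch", "bus"]}

failed = run_self_test(points)       # notify: report what failed
usable = isolate(routes, failed)     # isolate: close avenues to it
print(failed, sorted(usable))
```

The sketch is only as good as its tables: a point missing from `points` or `routes` is never tested and never isolated, which is the comprehensiveness condition stated above.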

Then the runtime test adds its own time and space allocation, and failures therein are extraneous (accreted) to the initial pre-TEMPEST speculation upon the project objectives. And the test program must itself be tested, either by pretesting or by a second test-watcher-test (quite feasibly on a second machine as well, to ensure the credibility of either or both, with the operational advantage of 'hot-spare' components). That old criticism of early computers, "the pinnacle of engineering: all directions are downhill", characterizes the vexing notion of fail-safe CRYPTO.
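
One way to picture a test-watcher-test is to feed the test known-good and known-bad points and see whether it tells them apart. The sketch below assumes a point can be modeled as a function that reads back a written bit; all names are illustrative.

```python
def self_test(read_point):
    """Pass only when the point reads back both levels correctly."""
    return read_point(0) == 0 and read_point(1) == 1

def lazy_test(read_point):
    """A deficient test: it never exercises the 1 level."""
    return read_point(0) == 0

def watcher(test):
    """Test the test: it must accept a good point and reject known-bad ones."""
    good = lambda v: v
    stuck_at_0 = lambda v: 0
    stuck_at_1 = lambda v: 1
    return test(good) and not test(stuck_at_0) and not test(stuck_at_1)

print(watcher(self_test), watcher(lazy_test))
```

The watcher is, of course, one more program that can fail, which is the regress the paragraph above describes.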

Single-point failures come in all 'flavors'. Typically they are (single) bits that deviate from designer/manufacturer specifications: slower or faster than usual, as when an analog circuit component is stressed and alters, changing time constants; or when digital design-philosophy variances [e.g. the 74LS169 up/down-counter, ca. 1979, differed from the usual 'LS' and the 25LS169 in set-up, hold, and delay timings] stress the marginal synchronization tolerances. Single-point decisions, removing race conditions, hazards, and metastabilities, usually resolve these, albeit with delayed correction: that is, increased failure dwell/duration. Then there is 'sense' obliteration: stuck-on, stuck-off, floating (indeterminate) or half-floating, stuck-to, controlled-by, or interfering-with another bit, effecting one-bit or even two-bit failure modes (but less combinatorially than any-two); intermittent sticking or floating; ... and what may follow logical 'sense' obliteration: changed function, inversion, wired-'or'/'and'/'if'/..., large-scale logic-avalanching (as when the bit affects expected numeric precision, which accumulates iteratively to some delayed bound-failure, while violating short-term statistical variances undetected) ... power surge, defeating process, code intermittency (which may compound program (re)sequencing).
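
The stuck-on and stuck-off flavors lend themselves to a small worked example. The sketch below assumes a toy three-input circuit, y = (a AND b) OR c, invented here purely for illustration, and asks which input vectors expose each stuck-at fault by making the faulty output differ from the good one.

```python
from itertools import product

def circuit(a, b, c, fault=None):
    """y = (a AND b) OR c. `fault` is an optional (node_name, stuck_value)
    pair forcing one named node to 0 or 1."""
    def node(name, value):
        if fault and fault[0] == name:
            return fault[1]
        return value
    a = node("a", a)
    b = node("b", b)
    c = node("c", c)
    n1 = node("n1", a & b)
    return node("y", n1 | c)

# Every single stuck-at fault on the five nodes of the toy circuit.
FAULTS = [(n, v) for n in ("a", "b", "c", "n1", "y") for v in (0, 1)]

def detecting_vectors(fault):
    """Input vectors whose output differs between good and faulty circuit."""
    return [v for v in product((0, 1), repeat=3)
            if circuit(*v) != circuit(*v, fault=fault)]

for f in FAULTS:
    print(f, detecting_vectors(f))
```

Here input `a` stuck-off shows itself on exactly one of the eight vectors, (1, 1, 0); on the other seven the failure sits unnoticed. Floating, intermittent, and bit-to-bit coupling faults are not modeled in this sketch.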

The single-point failure may be 'solid', intermittent, or situationally intermittent, as when code sequences draw excessive local power, diminishing the reliability margin of bits in general ... which means the intermittency may move around, averaging one bit per pass. In the military application the single-bit failure may be linked to a specific (overwhelming) causal (initiating) event, such as: EMP (electromagnetic pulse from asymmetric or pre-magnetized nuclear device detonation); plasma reflection-shorting (especially of external antennae) within fireball range (which may be a ten-mile radius for a moderately large thermonuclear device); subsequent attendant lightning strikes (which alone are to be expected in commercial applications); cosmic 'ray' events (protons or neutrons), which tend to be, or induce, momentary failures and cumulative degradation (leading to modal failures as well); local catastrophic events (usually damaging or destroying some other equipment(s), but which must be limited thereto by 'failure-safe'ing, commonly implemented as a 'fire-wall'); probing (testing or maintenance) during 'live' operation; or intentional self-destruct sequences of certain subcomponents (against tampering attempts).

Some of these single-point failures, those most likely to occur, can be tested for; the remainder count into the billions in modern computerized equipment and cannot be 'random-test'ed, though they may be categorized in small systems. In general the military criteria (have been) reduced to preventing unclear(ed) data from reaching undesirable places (typically, out) immediately: within one message interval, or at most one key-and-message interval, allowing for subencrypted blocks.
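
That reduced criterion, stopping uncleared data at the way out, might be sketched as an output guard that fails closed. Everything here is an assumption for illustration: the `E:` prefix is a hypothetical stand-in for whatever evidence of encryption a real device would check, and each list element stands for one message interval.

```python
def guarded_output(blocks, is_cleared):
    """Yield blocks until the first one that fails the check,
    then stop for good: nothing further leaves on this path."""
    for block in blocks:
        if not is_cleared(block):
            return          # fail closed; the operator notices by omission
        yield block

# Hypothetical traffic: the third block has lost its encryption.
traffic = ["E:aa", "E:bb", "plain", "E:cc"]
sent = list(guarded_output(traffic, lambda b: b.startswith("E:")))
print(sent)
```

The offending block is withheld within its own interval, and the later good block is withheld too, on the view that a path that has failed once is no longer to be trusted.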


[under construction]

A premise discovery under the title,

Grand-Admiral Petry
... cruising the cosmo-net on planetship Earth ...
'Majestic Service in a Solar System'
Nuclear Emergency Management

© 1996