What is Reliability?
Reliability is the probability that an item will perform a
required function under stated conditions for a stated period of time. The
probability of survival, R(t), plus the probability of failure, F(t), is always
unity.
Expressed as a formula: F(t) + R(t) = 1, or equivalently F(t) = 1 - R(t).
The required function includes both a definition of
satisfactory and unsatisfactory operation (failure). The stated conditions are
the total physical environment, including mechanical, thermal, and electrical
conditions. The stated period of time is the time during which satisfactory
operation is desired.
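As a minimal illustration of the complement relationship, the sketch below estimates R(t) and F(t) from observed failure times on a test population. The function names and the sample data are hypothetical, and units that never fail are assumed to survive past the observation window.

```python
def empirical_reliability(failure_times, population_size, t):
    """Fraction of the population still operating at time t (estimate of R(t)).

    failure_times lists the hours at which units failed; units not listed
    are assumed to survive past the observation window.
    """
    failed_by_t = sum(1 for ft in failure_times if ft <= t)
    return (population_size - failed_by_t) / population_size


def empirical_unreliability(failure_times, population_size, t):
    """Fraction of the population that has failed by time t (estimate of F(t))."""
    return 1.0 - empirical_reliability(failure_times, population_size, t)


# Hypothetical data: 10 units on test, 3 failures observed (hours).
failures = [120.0, 850.0, 4000.0]
R = empirical_reliability(failures, 10, 1000.0)    # 8 of 10 units survive 1,000 h
F = empirical_unreliability(failures, 10, 1000.0)  # 2 of 10 units have failed
print(R, F, R + F)
```

Defining F(t) as 1 - R(t) keeps the two estimates consistent with F(t) + R(t) = 1 by construction.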
What is Availability?
Availability is the probability that a system is in its intended functional
condition and therefore capable of being used in a stated environment.
Availability deals with the duration of up-time for operations and is a measure
of how often the system is alive and well. It is often expressed as
(up-time)/(up-time + downtime) with many different variants. Up-time and
downtime refer to dichotomized conditions. Up-time refers to a capability to
perform the task and downtime refers to not being able to perform the task.
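A short sketch of the (up-time)/(up-time + downtime) form, assuming up and down intervals have been logged over the same observation period. The interval values are hypothetical.

```python
def availability(up_hours, down_hours):
    """Availability as up-time / (up-time + downtime).

    up_hours and down_hours are lists of interval durations (hours)
    recorded over the same observation period.
    """
    total_up = sum(up_hours)
    total_down = sum(down_hours)
    return total_up / (total_up + total_down)


# Hypothetical 30-day month: three run intervals and two outages.
uptime = [300.0, 250.0, 160.0]   # 710 h up
downtime = [6.0, 4.0]            # 10 h down
print(f"{availability(uptime, downtime):.4f}")
```

Other variants mentioned above differ mainly in which outages (planned, unplanned, logistics delays) are counted as downtime.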
What is Failure?
Failure is any event that impacts a system in a way that
adversely affects the system criteria. For example, the criteria could include
output in a sold-out condition, maintenance cost or capital resources in a
constrained budget cycle, environmental excursions, or safety. A failure
definition should contain specific criteria and not be ambiguous. The failure
definition for a given system can change over time.
Field failures do not generally occur at a uniform rate, but
follow a distribution in time commonly described as a "bathtub curve." The life
of a device can be divided into three regions: Infant Mortality Period, where
the failure rate progressively improves; Useful Life Period, where the failure
rate remains constant; and Wearout Period, where failure rates begin to
increase.
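One common way to sketch a bathtub curve, assuming independent failure mechanisms whose hazard rates add, is to sum a decreasing Weibull hazard (shape below 1), a constant hazard (shape equal to 1), and an increasing Weibull hazard (shape above 1). All parameter values below are illustrative assumptions, not data for any real device.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard rate h(t) = (beta/eta) * (t/eta)**(beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def bathtub_hazard(t):
    """Hypothetical bathtub curve built from three competing failure processes.

    - infant mortality: Weibull with beta < 1 (decreasing rate)
    - useful life: Weibull with beta = 1 (constant rate)
    - wearout: Weibull with beta > 1 (increasing rate)
    Parameter values are illustrative assumptions; times are in hours.
    """
    infant = weibull_hazard(t, beta=0.5, eta=1.0e6)
    constant = weibull_hazard(t, beta=1.0, eta=2.0e6)  # 5e-7 per hour at all t
    wearout = weibull_hazard(t, beta=4.0, eta=2.0e5)
    return infant + constant + wearout


for hours in (10, 1_000, 10_000, 50_000, 100_000, 300_000):
    print(f"{hours:>7} h: {bathtub_hazard(hours):.3e} failures/hour")
```

With these assumed parameters the rate falls over the first tens of thousands of hours, bottoms out, and then rises as the wearout term dominates.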
Within a population of units is a small sub-group of units
with latent defects that will fail when exposed to a stress that would
otherwise be benign to a good unit. As these weak units fail, the
remaining population becomes more reliable, and the failure rate
decreases.
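The weak sub-group effect can be sketched with a two-part mixture. For illustration only, assume each sub-group fails exponentially at its own constant rate; the population hazard f(t)/R(t) then starts high and settles toward the rate of the good units as the weak units are removed. The fraction and rates are hypothetical.

```python
import math


def mixture_hazard(t, weak_fraction, weak_rate, strong_rate):
    """Population hazard rate for a mix of weak and good units.

    Each sub-group is assumed (for illustration) to fail exponentially at
    its own constant rate. The population hazard is f(t) / R(t).
    """
    weak_surv = weak_fraction * math.exp(-weak_rate * t)
    strong_surv = (1.0 - weak_fraction) * math.exp(-strong_rate * t)
    reliability = weak_surv + strong_surv                        # R(t)
    density = weak_rate * weak_surv + strong_rate * strong_surv  # f(t)
    return density / reliability


# Hypothetical population: 2% weak units failing at 1e-3/h, good units at 1e-7/h.
for hours in (0, 1_000, 5_000, 10_000):
    h = mixture_hazard(hours, weak_fraction=0.02, weak_rate=1e-3, strong_rate=1e-7)
    print(f"{hours:>6} h: {h:.3e} failures/hour")
```

Neither sub-group has a decreasing rate on its own; the decline comes entirely from the changing make-up of the surviving population.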
Units that pass the Infant Mortality Period have a high
probability of surviving the conditions provided by the system and its
environment. Failures that occur during the Useful Life Period are caused by residual
defects that survived Infant Mortality, by unpredictable system or environmental
conditions, or by premature wearout.
Wearout failures are generally associated with such failure
mechanisms as metal migration, hot electron effects, wirebond intermetallics,
or thermal fatigue. Typically, semiconductor wearout occurs only after many
years or even decades, so the component usually outlives the system in which
it is used.
What is Maintainability?
A measure of the ease and rapidity with which a system can be
restored to operational status following a failure. Maintainability deals with
the duration of maintenance outages, that is, how long it takes (ease and
speed) to carry out the maintenance actions compared to a datum. The datum
assumes that maintenance (all actions necessary for retaining an item in, or
restoring an item to, a specified good condition) is performed by personnel
having specified skill levels, using prescribed procedures and resources, at
each prescribed level of maintenance. Maintainability characteristics are
usually determined by equipment design, which sets maintenance procedures and
determines the length of repair times.
What is Failure Mode?
A particular way in which failures occur, independent of the
reason for failure.
What is Early Life Period?
The early life period of device operation is characterized by
a rapidly declining failure rate. It occurs between 0 and 10,000 hours (~1
year) of device operation. Ambient operating temperature is specified to be
55°C. The failure rate during the early life period can be modeled by the
Weibull Distribution:
$\lambda(t) = \lambda_0 t^{-\alpha}$
where $\lambda_0$ is a constant and $0 < \alpha < 1$, so the rate declines with time. $\lambda(t)$ is usually expressed in percent failures per 1,000 hours.
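A minimal sketch of the early-life model above. The values of lambda0 and alpha are hypothetical, and lambda0 is assumed to be chosen so that the result comes out in percent failures per 1,000 hours when t is in hours.

```python
def early_life_failure_rate(t_hours, lambda0, alpha):
    """Early-life failure rate lambda(t) = lambda0 * t**(-alpha).

    Requires 0 < alpha < 1 so the rate declines with time. The units of the
    result follow lambda0; here lambda0 is assumed to give percent failures
    per 1,000 hours.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if t_hours <= 0:
        raise ValueError("t must be positive")
    return lambda0 * t_hours ** (-alpha)


# Hypothetical fit: lambda0 = 0.5, alpha = 0.6 (not data from any real device).
for t in (10, 100, 1_000, 10_000):
    print(f"{t:>6} h: {early_life_failure_rate(t, 0.5, 0.6):.4f} %/1000 h")
```

At t = 1 hour the rate equals lambda0, which is a convenient check when fitting the two parameters to early field or burn-in data.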
What is Useful Life Period?
Beyond the infant mortality period, in the useful life period,
failure times are assumed to follow the exponential distribution, so the
failure rate is constant. The failure rate is also at its lowest during this
period. It begins after 10,000 hours (~1 year) of device operation. Reliability
during this period must be specified as a single, essentially constant failure
rate. An operating temperature of 55°C, an activation energy of 0.62 eV and
normal operating voltage are used for lifetime and reliability calculations.
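Activation energy is commonly used with the Arrhenius model to scale results from an elevated-temperature life test to the use temperature. Assuming that model applies, the sketch below computes the acceleration factor between a hypothetical 125 °C stress test and the 55 °C, 0.62 eV conditions stated above. The test totals are also hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between a stress and a use temperature.

    AF = exp[(Ea / k) * (1/T_use - 1/T_stress)], with temperatures in kelvin.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


# 55 °C and 0.62 eV come from the text; the 125 °C stress temperature and
# the test size below are hypothetical.
af = arrhenius_acceleration_factor(0.62, t_use_c=55.0, t_stress_c=125.0)
stress_device_hours = 1_000 * 1_000   # e.g. 1,000 units for 1,000 h each
equivalent_hours_at_55c = stress_device_hours * af
print(f"AF = {af:.1f}, equivalent device-hours at 55 °C = {equivalent_hours_at_55c:.3e}")
```

Each device-hour at the stress temperature then counts as roughly AF device-hours at 55 °C when estimating the useful-life failure rate.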
What is Failure Rate?
The number of failures of an item per unit measurement of
life. Failure rate is considered constant over the useful life period.
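A small sketch of the calculation, assuming operating life is measured in accumulated unit-hours. It also converts the result to the percent-per-1,000-hours unit used in the early life section. The test figures are hypothetical.

```python
def failure_rate(failures, unit_hours):
    """Failures per unit-hour of accumulated operating life.

    unit_hours is the total operating time summed over all units,
    including the hours failed units ran before failing.
    """
    return failures / unit_hours


def percent_per_1000_hours(rate_per_hour):
    """Convert failures per unit-hour to percent failures per 1,000 hours."""
    return rate_per_hour * 1_000 * 100


# Hypothetical test: 4 failures over 1,000,000 accumulated unit-hours.
lam = failure_rate(4, 1_000_000)
print(lam, percent_per_1000_hours(lam))  # about 4e-06 per hour, about 0.4 %/1000 h
```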
What is Failure Modes and Effects Analysis (FMEA)?
A modified methodology for identifying the modes of failure events,
assigning values to them based on unit cost and frequency, and then
prioritizing the results in order to focus the organization on the significant
few failures.
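The prioritization step can be sketched as a Pareto ranking. This assumes the value of each failure mode is its annual cost (frequency times unit cost) and that the "significant few" are the top modes covering 80% of the total; both the 80% cut-off and the failure-mode data are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    events_per_year: float   # frequency
    cost_per_event: float    # unit cost of one occurrence

    def annual_cost(self):
        return self.events_per_year * self.cost_per_event


def prioritize(modes, cumulative_share=0.8):
    """Rank failure modes by annual cost and return the 'significant few'
    that together account for cumulative_share of the total cost."""
    ranked = sorted(modes, key=FailureMode.annual_cost, reverse=True)
    total = sum(m.annual_cost() for m in ranked)
    selected, running = [], 0.0
    for mode in ranked:
        if running >= cumulative_share * total:
            break
        selected.append(mode)
        running += mode.annual_cost()
    return selected


modes = [
    FailureMode("seal leak", 12, 1_500),        # 18,000 per year
    FailureMode("bearing seizure", 2, 25_000),  # 50,000 per year
    FailureMode("sensor drift", 30, 200),       # 6,000 per year
    FailureMode("motor trip", 5, 800),          # 4,000 per year
]
print([m.name for m in prioritize(modes)])
```

Here the two costliest modes already exceed 80% of the total, so they are the ones the organization would focus on first.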
What is Failure Modes, Effects and Criticality Analysis (FMECA)?
This is the detailed version of FMEA. Instead of examining
the system in terms of larger units, you assign criticality values to each
failure of the smallest units observed in the system.
What is Mean Time Between Failures (MTBF)?
Total operating time divided by the number of failures. MTBF
is the inverse of failure rate when the failure rate is constant, as in the
useful life period.
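A minimal MTBF sketch using hypothetical fleet totals. The survival estimate at the end assumes a constant failure rate (exponential distribution, as in the useful life period), in which case R(t) = exp(-t / MTBF).

```python
import math


def mtbf(total_operating_hours, failures):
    """MTBF = total operating time / number of failures."""
    if failures == 0:
        raise ValueError("MTBF is undefined with zero observed failures")
    return total_operating_hours / failures


# Hypothetical fleet record: 250,000 operating hours, 5 failures.
m = mtbf(250_000, 5)   # 50,000 h
rate = 1.0 / m         # failures per hour, the inverse of MTBF
# Assuming a constant failure rate, the probability of surviving a
# 1-year (8,760 h) mission is R(t) = exp(-t / MTBF).
r_one_year = math.exp(-8_760 / m)
print(m, rate, round(r_one_year, 3))
```

Note that an MTBF of 50,000 hours does not mean a typical unit lasts 50,000 hours; under the exponential assumption only about 37% of units survive to t = MTBF.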
What is Mean Time To Restore (MTTR)?
The mean of the total elapsed time from the initial failure to the restoration
of system operation. Mean Time To Restore includes Mean Time To Repair.
Together with MTBF, it gives the steady-state availability, MTBF / (MTBF + MTTR).
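A sketch of computing MTTR from an outage log and combining it with MTBF in the up-time / (up-time + downtime) form from the availability section. The log entries and the MTBF value are hypothetical.

```python
from datetime import datetime


def mttr_hours(outages):
    """Mean Time To Restore: average of (restored - failed) over all outages, in hours."""
    durations = [(restored - failed).total_seconds() / 3600 for failed, restored in outages]
    return sum(durations) / len(durations)


# Hypothetical outage log: (time of failure, time operation was restored).
outages = [
    (datetime(2024, 3, 1, 8, 0), datetime(2024, 3, 1, 12, 0)),      # 4 h
    (datetime(2024, 4, 10, 22, 30), datetime(2024, 4, 11, 0, 30)),  # 2 h
    (datetime(2024, 6, 2, 14, 0), datetime(2024, 6, 2, 20, 0)),     # 6 h
]
mttr = mttr_hours(outages)   # 4.0 h
mtbf_hours = 2_000.0         # assumed MTBF in hours, for illustration
availability = mtbf_hours / (mtbf_hours + mttr)
print(mttr, availability)
```

Measuring from failure to restoration (rather than only hands-on repair time) is what makes this a time to restore, since it also captures detection, logistics, and restart delays.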
What is Root Cause Failure Analysis (RCFA)?
A technique for uncovering the cause of a failure by deductive
reasoning down to the physical and human root(s), and then using inductive
reasoning to uncover the much broader latent or organizational root(s).