One strategy for interpreting data without stringent assumptions is to use order statistics.
Read HC 4.6. In the following, we will use the notation of HC, where the
pdf of the random variable $X$ is denoted by $f(x)$ rather than $f_X(x)$.
Definition 5.1
Let $X_1, X_2, \ldots, X_n$ denote a random sample from a continuous distribution with pdf $f(x)$. Let $Y_1$ be the smallest of these, $Y_2$ the next in order of magnitude, etc. Then $Y_i$ is called the $i$th order statistic of the sample, and $(Y_1, Y_2, \ldots, Y_n)$ the vector of order statistics. We may write $Y_1 < Y_2 < \cdots < Y_n$. The following alternative notation is also common: $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$.
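As a minimal illustration of the definition, assuming Python's standard-library `random` module and a standard normal parent chosen only for the example, the order statistics are obtained simply by sorting the sample. The helper name `order_statistics` is hypothetical.

```python
import random


def order_statistics(sample):
    """Return the order statistics (Y_1, ..., Y_n) of a sample, smallest first."""
    # For a continuous parent distribution, ties have probability zero,
    # so the sorted values satisfy Y_1 < Y_2 < ... < Y_n almost surely.
    return sorted(sample)


random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(5)]  # X_1, ..., X_5 from N(0, 1)
ys = order_statistics(xs)                          # Y_1, ..., Y_5

# Y_i (1-based, as in the text) is ys[i - 1]; X_(i) denotes the same quantity
print("X:", [round(x, 3) for x in xs])
print("Y:", [round(y, 3) for y in ys])
print("Y_1 is the min:", ys[0] == min(xs), " Y_n is the max:", ys[-1] == max(xs))
```

Sorting discards which $X_i$ produced each value; that lost labelling is exactly what the ordering trades for the extra structure discussed next.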
Order statistics are non-parametric and rely only on the weak assumption that
the data are a sample from a continuous distribution. We pick up information by ordering the data. If we know the underlying distribution, we can
combine that knowledge with the rank of the order statistic of interest. For instance, if the underlying distribution is normal,
the sample median $Y_{51}$ from a sample of size 101 will have a higher probability of being near the population median than $Y_1$ or $Y_{101}$. Without ordering, the same could not be said for $X_{51}$. So the ordering gives us extra information, and we shall now explore the densities of order statistics,
denoted $g_1(y_1)$, $g_2(y_2)$, etc.
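The claim about a normal sample of size 101 can be checked by a short simulation. This is a sketch under stated assumptions: a standard normal parent (population median 0), a closeness window of 0.5, 2000 replicates and a fixed seed, all chosen arbitrarily; the function name `closeness_to_median` is hypothetical.

```python
import random


def closeness_to_median(n=101, reps=2000, window=0.5, seed=0):
    """Estimate P(|T| < window) for several statistics T of an N(0, 1) sample.

    The population median of N(0, 1) is 0, so |T| measures distance from it.
    """
    rng = random.Random(seed)
    mid = (n + 1) // 2  # 51 when n = 101, so Y_mid is the sample median
    labels = ["Y_1", f"Y_{mid}", f"Y_{n}", f"X_{mid}"]
    hits = dict.fromkeys(labels, 0)
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unordered X_1..X_n
        ys = sorted(xs)                               # ordered Y_1..Y_n
        values = [ys[0], ys[mid - 1], ys[-1], xs[mid - 1]]
        for label, v in zip(labels, values):
            hits[label] += abs(v) < window  # a bool counts as 0 or 1
    return {label: count / reps for label, count in hits.items()}


probs = closeness_to_median()
for label, p in probs.items():
    print(f"P(|{label} - median| < 0.5) is about {p:.3f}")
```

One should see $Y_{51}$ inside the window in nearly every replicate, the extremes $Y_1$ and $Y_{101}$ essentially never, and the unordered $X_{51}$ at a rate close to 0.38, which is $P(|Z| < 0.5)$ for a single standard normal.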
Example 1 Suppose you were required to assess the ability to handle a crowd at
a railway station with regard to stair width, staff, etc. The statistic of
interest is the maximum, $Y_n$.
Example 2 An oil product freezes at a certain temperature (in °C), and the company ponders whether it should market it in a cold climate. We would require the density of
the minimum order statistic, $Y_1$,
to assess the risk of the product failing.
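In both examples the statistic of interest is an extreme. For a random sample, $Y_n \le y$ exactly when every $X_i \le y$, and $Y_1 > y$ exactly when every $X_i > y$, so independence gives $P(Y_n \le y) = F(y)^n$ and $P(Y_1 \le y) = 1 - (1 - F(y))^n$, where $F$ is the parent cdf. The sketch below compares these formulas with simulation, assuming a Uniform(0, 1) parent (so $F(y) = y$) and $n = 10$ purely for illustration; the function names are hypothetical.

```python
import random


def extreme_cdfs(F_y, n):
    """Return (P(Y_n <= y), P(Y_1 <= y)) for an iid sample of size n,
    where F_y = F(y) is the parent cdf evaluated at y."""
    max_cdf = F_y ** n                # Y_n <= y  iff  every X_i <= y
    min_cdf = 1.0 - (1.0 - F_y) ** n  # Y_1 > y   iff  every X_i > y
    return max_cdf, min_cdf


def simulate_extremes(y, n=10, reps=40_000, seed=2):
    """Monte Carlo estimate of the same two probabilities for a Uniform(0, 1) parent."""
    rng = random.Random(seed)
    max_hits = min_hits = 0
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        max_hits += max(xs) <= y
        min_hits += min(xs) <= y
    return max_hits / reps, min_hits / reps


results = {}
for y in (0.1, 0.8):
    exact = extreme_cdfs(y, 10)  # F(y) = y for Uniform(0, 1)
    approx = simulate_extremes(y)
    results[y] = (exact, approx)
    print(f"y={y}: exact (max, min) = {exact}, simulated = {approx}")
```

Differentiating these cdfs with respect to $y$ gives the densities of the extremes, which is the kind of quantity the two examples call for.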
Other situations where an order statistic is of interest are listed below.
Order statistics are useful for summarizing data but may be of limited use for a detailed description of the process that was measured. Order statistics are also ingredients of higher-level statistical procedures.
Figure 5.1 shows the sample cdf as a step function increasing by $1/n$
at each order statistic.
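The step function described above can be sketched with the standard-library `bisect` module: once the sample is sorted into its order statistics, the sample cdf at $x$ is the number of $Y_i \le x$ divided by $n$. The helper name `empirical_cdf` and the four data values are hypothetical, chosen only to make the jumps easy to read.

```python
import bisect


def empirical_cdf(sample):
    """Return F_n, the sample cdf: F_n(x) = (number of observations <= x) / n."""
    ys = sorted(sample)  # order statistics Y_1 <= ... <= Y_n
    n = len(ys)

    def F_n(x):
        # bisect_right counts the order statistics that are <= x,
        # so F_n jumps by 1/n at each Y_i and is right-continuous there.
        return bisect.bisect_right(ys, x) / n

    return F_n


F_n = empirical_cdf([2.3, 0.7, 1.5, 3.1])  # Y = (0.7, 1.5, 2.3, 3.1), n = 4
grid = [0.0, 0.7, 1.0, 1.5, 2.3, 3.0, 3.1, 5.0]
values = [F_n(x) for x in grid]
print(values)  # [0.0, 0.25, 0.25, 0.5, 0.75, 0.75, 1.0, 1.0]
```

Using `bisect_right` rather than `bisect_left` is what makes $F_n(Y_i)$ include the jump at $Y_i$ itself.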
We can make statements about individual order statistics
by borrowing information provided by the entire set.
Remember that all we assumed
about the original data was that they came from a continuous distribution; there were no
assumptions about the form of that distribution. But now that the data are ordered,
we can use the extra information provided by the ordering to derive
density functions.
The data, $X_1, X_2, \ldots, X_n$, might be independent, but the ordered data,
$Y_1, Y_2, \ldots, Y_n$, are not.
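A quick way to see the dependence is the smallest case, $n = 2$ with a Uniform(0, 1) parent (an assumption made for the illustration). The raw $X_1, X_2$ are independent, so their sample correlation should be near 0, whereas $Y_1 = \min$ and $Y_2 = \max$ have theoretical correlation $1/2$. A nonzero correlation is enough to show dependence (a zero one would not prove independence). The function names below are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def raw_vs_ordered(reps=20_000, seed=3):
    """Correlation of (X_1, X_2) versus (Y_1, Y_2) for samples of size 2."""
    rng = random.Random(seed)
    x1s, x2s, y1s, y2s = [], [], [], []
    for _ in range(reps):
        x1, x2 = rng.random(), rng.random()  # independent Uniform(0, 1)
        x1s.append(x1)
        x2s.append(x2)
        y1s.append(min(x1, x2))  # Y_1
        y2s.append(max(x1, x2))  # Y_2
    return pearson(x1s, x2s), pearson(y1s, y2s)


corr_x, corr_y = raw_vs_ordered()
print(f"corr(X_1, X_2) = {corr_x:.3f}  corr(Y_1, Y_2) = {corr_y:.3f}")
```

Even without any correlation calculation, the constraint $Y_1 < Y_2$ already rules out independence: learning $Y_1$ restricts the possible values of $Y_2$.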
To begin our study of order statistics, we first want to find the
joint distribution of $Y_1, Y_2, \ldots, Y_n$.
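For orientation, the standard form of this joint density (as derived in HC 4.6) is $g(y_1, \ldots, y_n) = n!\, f(y_1) f(y_2) \cdots f(y_n)$ on the region $y_1 < y_2 < \cdots < y_n$ within the support of $f$, and zero elsewhere. The factor $n!$ reflects that each of the $n!$ arrangements of $X_1, \ldots, X_n$ leading to the same ordered vector is equally likely. The sketch below checks only that equal-likelihood step by simulation, assuming an exponential parent and $n = 3$ (both arbitrary); it is not the derivation, and `ordering_frequencies` is a hypothetical name.

```python
import math
import random
from collections import Counter


def ordering_frequencies(n=3, reps=60_000, seed=4):
    """Relative frequency of each way the unordered X's can be arranged."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]  # any continuous parent
        # perm[k] is the index i of the X_i that became Y_(k+1)
        perm = tuple(sorted(range(n), key=xs.__getitem__))
        counts[perm] += 1
    return {perm: c / reps for perm, c in counts.items()}


freqs = ordering_frequencies()
print(len(freqs), "orderings observed; 1/n! =", round(1 / math.factorial(3), 4))
for perm, p in sorted(freqs.items()):
    print(perm, round(p, 4))
```

Each of the six permutations should appear with frequency close to $1/6$, regardless of which continuous parent is substituted.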