

Likelihood


Definition 7.5
Let $X_1, \ldots, X_n$ be a random sample from $f(x;\theta)$ and $x_1, \ldots, x_n$ the corresponding observed values. The likelihood of the sample is the joint probability function (or the joint probability density function, in the continuous case) evaluated at $x_1, \ldots, x_n$, and is denoted by $L(\theta;x_1,\ldots,x_n)$.

Now the notation emphasizes that, for a given sample ${\bf x}$, the likelihood is a function of $\theta$. Of course

\begin{displaymath}L(\theta; {\bf x})= f({\bf x}; \theta),\hspace*{.5cm}
[\,=L(\theta), \mbox{ in a briefer notation}].\end{displaymath}

The likelihood function is a statistic, depending on the observed sample ${\bf x}$. A statistical inference or procedure should be consistent with the assumption that the best explanation of a set of data is provided by $\hat{\theta}$, a value of $\theta$ that maximizes the likelihood function. This value of $\theta$ is called the maximum likelihood estimate (mle). The relationship of a sufficient statistic for $\theta$ to the mle for $\theta$ is contained in the following theorem.
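
For example, for a random sample from the exponential density $f(x;\theta)=\theta e^{-\theta x}$, $x>0$, the log-likelihood is

\begin{displaymath}\log L(\theta; x_1, \ldots, x_n)= n \log \theta - \theta \sum x_i ,\end{displaymath}

and setting its derivative $n/\theta - \sum x_i$ equal to zero gives the mle $\hat{\theta}=n/\sum x_i = 1/\overline{x}$.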



Theorem 7.2
Let $X_1, \ldots, X_n$ be a random sample from $f(x;\theta)$. If a sufficient statistic $T=t({\bf X})$ for $\theta$ exists, and if a maximum likelihood estimate $\hat{\theta}$ of $\theta$ also exists uniquely, then $\hat{\theta}$ is a function of $T$.

Proof
Let $g(t;\theta)$ be the pdf of $T$. Then, by the definition of sufficiency, the likelihood function can be written

\begin{displaymath}
L(\theta; x_1, \ldots, x_n)=f(x_1; \theta) \cdots f(x_n; \theta)
= g\left(t(x_1, \ldots, x_n); \theta\right) h(x_1, \ldots, x_n)
\end{displaymath} (7.5)

where $h(x_1, \ldots, x_n)$ does not depend on $\theta$. So $L$ and $g$, as functions of $\theta$, are maximized simultaneously. Since there is one and only one value of $\theta$ that maximizes $L$, and hence $g(t(x_1, \ldots, x_n);\theta)$, that value of $\theta$ must be a function of $t(x_1, \ldots, x_n)$. Thus the mle $\hat{\theta}$ is a function of the sufficient statistic $T=t(X_1, \ldots,X_n)$.
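
To illustrate, suppose $X_1, \ldots, X_n$ is a random sample from a Poisson distribution with parameter $\lambda$ (the setting of Example 7.9 below). Then $T=\sum X_i$ has a Poisson distribution with parameter $n\lambda$, and the likelihood factorizes in the form (7.5):

\begin{displaymath}L(\lambda; x_1, \ldots, x_n)=\frac{e^{-n\lambda}\lambda^{\sum x_i}}{\prod x_i!}
=\left[\frac{e^{-n\lambda}(n\lambda)^{\sum x_i}}{\left(\sum x_i\right)!}\right]
\left[\frac{\left(\sum x_i\right)!}{n^{\sum x_i}\prod x_i!}\right]
= g\left(t(x_1, \ldots, x_n); \lambda\right) h(x_1, \ldots, x_n),\end{displaymath}

where $g(t;\lambda)$ is the pdf of $T$ and $h$ does not depend on $\lambda$. The mle $\hat{\lambda}=\overline{x}=t/n$ is, as the theorem asserts, a function of $T$.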


Sometimes we cannot find the maximum likelihood estimator by differentiating the likelihood (or the $\log$ of the likelihood) with respect to $\theta$ and setting the derivative equal to zero. Two possible problems are:

(i)
The likelihood is not differentiable throughout the range space;
(ii)
The likelihood is differentiable, but there is a terminal maximum (that is, at one end of the range space).

For example, consider the uniform distribution on $[0, \theta]$. The likelihood, based on a random sample of size $n$, is

\begin{displaymath}
L(\theta;x_1, \ldots, x_n)=\left\{\begin{array}{ll}
\frac{1}{\theta^n} & 0 \leq x_i \leq \theta, \; i=1, \ldots, n\\
0 & \mbox{ otherwise }. \end{array} \right.
\end{displaymath} (7.6)

Now $1/\theta^n$ is decreasing in $\theta$ over the range of positive values. Hence it will be maximized by choosing $\theta$ as small as possible while still satisfying $0 \leq x_i \leq \theta$ for all $i$. That is, we choose $\hat{\theta}$ equal to $X_{(n)}$, or $Y_n$, the largest order statistic.
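
A small numerical sketch of this terminal maximum (in Python, with purely illustrative sample values) evaluates $L(\theta)$ on a grid of candidate values and confirms that the maximizer sits at the largest observation:

import numpy as np

# Illustrative sample, treated as if drawn from Uniform[0, theta] with theta unknown.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, size=20)

def uniform_likelihood(theta, x):
    # L(theta; x) = theta^(-n) if 0 <= x_i <= theta for all i, and 0 otherwise.
    if theta > 0 and np.all((x >= 0.0) & (x <= theta)):
        return theta ** (-len(x))
    return 0.0

grid = np.linspace(0.1, 10.0, 5000)                  # candidate theta values
L = np.array([uniform_likelihood(t, x) for t in grid])
print(grid[np.argmax(L)], x.max())                   # grid maximizer lies just above max(x_i)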



Example 7.8
Consider the truncated exponential distribution with pdf

\begin{displaymath}f(x;\theta)=e^{-(x-\theta)}I_{[\theta,\infty)}(x).\end{displaymath}

The likelihood is

\begin{displaymath}L(\theta; x_1, \ldots, x_n)=e^{n \theta-\sum x_i}
\prod_{i=1}^n I_{[\theta,\infty)}(x_i). \end{displaymath}

For $\theta \leq \min(x_i)$ the product of indicators equals one, so the likelihood is $e^{n \theta-\sum x_i}$, which is increasing in $\theta$; for $\theta > \min(x_i)$ the likelihood is zero. Hence we choose $\theta$ as large as possible subject to $\theta \leq \min(x_i)$, that is, $\hat{\theta}=\min(x_i)=x_{(1)}$.


Further use is made of the concept of likelihood in Hypothesis Testing (Chapter 3), but here we define the terms likelihood ratio and, in particular, monotone likelihood ratio.



Definition 7.6
Let $\theta_1$ and $\theta_2$ be two competing values of $\theta$ in the density $f(x;\theta)$, where a sample of values ${\bf X}$ leads to the likelihood $L(\theta;{\bf X})$. Then the likelihood ratio is

\begin{displaymath}\Lambda=L(\theta_1; {\bf X})/L(\theta_2;{\bf X}).\end{displaymath}

This ratio can be thought of as comparing the relative merits of the two possible values of $\theta$, in the light of the data ${\bf X}$. Large values of $\Lambda$ would favour $\theta_1$ and small values of $\Lambda$ would favour $\theta_2$. Sometimes the statistic $T$ has the property that for each pair of values $\theta_1$, $\theta_2$, where $\theta_1 > \theta_2$, the likelihood ratio is a monotone function of $T$. If it is monotone increasing, then large values of $T$ tend to be associated with the larger of the two parameter values. This idea is often used in an intuitive approach to hypothesis testing where, for example, a large value of $\overline{X}$ would support the larger of two possible values of $\mu$.



Definition 7.7
A family of distributions indexed by a real parameter $\theta$ is said to have a monotone likelihood ratio if there is a statistic $T$ such that for each pair of values $\theta_1$ and $\theta_2$, where $\theta_1 > \theta_2$, the likelihood ratio $L(\theta_1)/L(\theta_2)$ is a non-decreasing function of $T$.



Example 7.9
Let $X_1, \ldots, X_n$ be a random sample from a Poisson distribution with parameter $\lambda$. Determine whether $(X_1, \ldots, X_n)$ has a monotone likelihood ratio (mlr).

Here the likelihood of the sample is

\begin{displaymath}L(\lambda; x_1, \ldots, x_n)=e^{-n \lambda}\lambda^{\sum x_i}/\prod x_i!. \end{displaymath}

Let $\lambda_1$, $\lambda_2$ be two values of $\lambda$ with $0 < \lambda_1 < \lambda_2 < \infty$. Then for given $x_1, \ldots, x_n$

\begin{displaymath}\frac{L(\lambda_2;{\bf x})}{L(\lambda_1;{\bf x})}=
\frac{e^{-n \lambda_2}\lambda_2^{\sum x_i}}{e^{-n \lambda_1}\lambda_1^{\sum x_i}}
=\left(\frac{\lambda_2}{\lambda_1}\right)^{\sum x_i} e^{-n(\lambda_2-\lambda_1)} .\end{displaymath}

Note that $\lambda_2/\lambda_1 > 1$, so this ratio is increasing as $T({\bf x})=\sum x_i$ increases. Hence $(X_1, \ldots, X_n)$ has a monotone likelihood ratio in $T({\bf x})=\sum x_i$.
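
The monotonicity is easily checked numerically; the sketch below (in Python, with arbitrary illustrative values of $\lambda_1$, $\lambda_2$ and $n$) evaluates the logarithm of the ratio, in which the $\prod x_i!$ terms cancel, over a range of values of $T=\sum x_i$:

import numpy as np

# Arbitrary illustrative values with 0 < lam1 < lam2, and a sample size n.
lam1, lam2, n = 1.0, 2.5, 10

def log_likelihood_ratio(T, n, lam1, lam2):
    # log of L(lam2; x) / L(lam1; x) for a Poisson sample with sum(x_i) = T;
    # the prod(x_i!) terms cancel in the ratio.
    return T * np.log(lam2 / lam1) - n * (lam2 - lam1)

T = np.arange(0, 51)
ratio = log_likelihood_ratio(T, n, lam1, lam2)
print(np.all(np.diff(ratio) > 0))                    # True: the ratio increases with T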


Bob Murison 2000-10-31