Moment estimation
Theoretical foundation: the law of large numbers. As $n \to +\infty$, $A_k \overset{P}{\longrightarrow} \mu_k$ and $B_k \overset{P}{\longrightarrow} \nu_k$, where
$\mu_k = EX^k$, $A_k = \frac{1}{n}\sum^n_{i=1}X_i^k$ and $\nu_k = E(X- EX)^k$, $B_k = \frac{1}{n}\sum^n_{i=1}(X_i-\bar X)^k$, $k =1,2,\dots$
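A quick numerical illustration of this convergence (a minimal numpy sketch; the standard-normal population and the sample sizes are arbitrary choices, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: X ~ N(0, 1), so mu_2 = E[X^2] = 1 and nu_2 = D(X) = 1.
for n in [10, 1_000, 100_000]:
    x = rng.standard_normal(n)
    A2 = np.mean(x**2)                 # sample origin moment A_2
    B2 = np.mean((x - x.mean())**2)    # sample central moment B_2
    print(f"n={n:>6}: A_2={A2:.4f}, B_2={B2:.4f}  (both -> 1)")
```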
Therefore, using $A_k, B_k$ to estimate $\mu_k, \nu_k$, we can solve for the unknown parameters $\theta_1, \theta_2, \dots ,\theta_m$ by setting up the equations:
$\left \{ \begin{array} {lcr} \mu_1 = EX = g_1(\theta_1, \theta_2, … ,\theta_m) \\ \mu_2 = EX^2 = g_2(\theta_1, \theta_2, … ,\theta_m) \\ ... \\ \mu_m = EX^m = g_m(\theta_1, \theta_2, … ,\theta_m) \end{array} \right. \longrightarrow \left \{ \begin{array} {lcr} \theta_1 = h_1(\mu_1, \mu_2, ..., \mu_m) \\ \theta_2 = h_2(\mu_1, \mu_2, ..., \mu_m) \\ ... \\ \theta_m = h_m(\mu_1, \mu_2, ..., \mu_m) \end{array} \right. \longrightarrow \left \{ \begin{array} {lcr} \hat\theta_1 = h_1(A_1, A_2, ..., A_m) \\ \hat\theta_2 = h_2(A_1, A_2, ..., A_m) \\ ... \\ \hat\theta_m = h_m(A_1, A_2, ..., A_m) \end{array} \right.$
Here origin moments and central moments are completely equivalent. Take $\mu, \sigma^2$ as an example; the solution process:
Method 1: Origin moments. (1) $\left \{ \begin{array} {lcr} \mu_1 = E(X) = \mu \\ \mu_2 = E(X^2) = D(X)+E^2(X) = \sigma^2+\mu^2 \end{array} \right.$ (2) $\left \{ \begin{array} {lcr} \mu =\mu_1 \\ \sigma^2 = \mu_2 -\mu_1^2 \end{array} \right.$ (3) use $A_1$ to estimate $\mu_1$ and $A_2$ to estimate $\mu_2$; then $\left \{ \begin{array} {lcr} \hat {\mu} = A_1 \\ \hat{\sigma}^2 = A_2-A_1^2= \frac{1}{n}\sum^n_{i=1}X_i^2-\bar X^2 = \frac{1}{n}\sum^n_{i=1}(X_i-\bar X)^2 = B_2 \end{array} \right.$
Method 2: Central moments. (1) $\left \{ \begin{array} {lcr} \mu_1 = E(X) = \mu \\ \nu_2 = D(X) = \sigma^2 \end{array} \right.$ (2) $\left \{ \begin{array} {lcr} \mu =\mu_1 \\ \sigma^2 = \nu_2 \end{array} \right.$ (3) use $A_1$ to estimate $\mu_1$ and $B_2$ to estimate $\nu_2$; then $\left \{ \begin{array} {lcr} \hat {\mu} = A_1 \\ \hat{\sigma}^2 = B_2 \end{array} \right.$
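A minimal sketch of the resulting estimators $\hat\mu = A_1$, $\hat{\sigma}^2 = A_2 - A_1^2 = B_2$ (the normal population with $\mu = 2$, $\sigma^2 = 9$ is an assumed test case):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=5_000)   # true mu = 2, sigma^2 = 9

A1 = x.mean()                  # A_1 estimates mu_1 = mu
A2 = np.mean(x**2)             # A_2 estimates mu_2 = mu^2 + sigma^2
mu_hat, sigma2_hat = A1, A2 - A1**2

# A_2 - A_1^2 coincides with the sample central moment B_2:
assert np.isclose(sigma2_hat, np.mean((x - A1)**2))
print(mu_hat, sigma2_hat)      # roughly 2 and 9
```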
<aside> ✔️ e.g.1 $X\sim U(\mu-\rho, \mu+\rho)$, where $\mu, \rho>0$ are unknown; give the expressions of $\hat\mu, \hat\rho$. (1) $\left \{ \begin{array} {lcr} \mu_1 = E(X) = \mu \\ \nu_2 = D(X)= \frac{[(\mu+\rho) -(\mu-\rho)]^2}{12}= \frac{1}{3}\rho^2 \end{array} \right.$ # $\mu_1, \nu_2$ are the most commonly used pair (2) $\left \{ \begin{array} {lcr} \mu = \mu_1 \\ \rho = \sqrt{3\nu_2} \end{array} \right.$ (3) use $A_1$ to estimate $\mu_1$ and $B_2$ to estimate $\nu_2$; then $\left \{ \begin{array} {lcr} \hat \mu = A_1 \\ \hat \rho = \sqrt{3B_2} \end{array} \right.$
</aside>
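A numerical check of e.g.1 (a sketch; the true values $\mu = 5$, $\rho = 2$ are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true, rho_true = 5.0, 2.0
x = rng.uniform(mu_true - rho_true, mu_true + rho_true, size=10_000)

A1 = x.mean()
B2 = np.mean((x - A1)**2)
mu_hat  = A1                 # hat(mu)  = A_1
rho_hat = np.sqrt(3 * B2)    # hat(rho) = sqrt(3 B_2)

print(mu_hat, rho_hat)       # roughly 5 and 2
```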
Maximum likelihood estimation
Theoretical foundation: the probability of the observed event $A$ depends on the parameter, $P(A) = F(\theta)$; since $A$ has actually happened, choose as the estimate the value of $\theta$ that maximizes $P(A)$.
Discrete population
$X_1, X_2, \dots, X_n$ are samples from $X$, $x_1, x_2, \dots, x_n$ are the sample values, and the probability mass function is $P\{X=x\} =p(x;\theta)$; then
$P\{X_1 = x_1, X_2 =x_2,…,X_n= x_n\} = P\{X_1 = x_1\}P\{X_2 = x_2\}…P\{X_n = x_n\} = \prod^n_{i=1}P\{X_i =x_i\}$
$= \prod^n_{i=1} p(x_i;\theta) \equiv L(\theta, x_1, x_2, …, x_n)\overset{x_1, x_2, …, x_n\text{ given}}=L(\theta)$, the likelihood function.
Continuous population
$X_1, X_2, \dots, X_n$ are samples from $X$, $x_1, x_2, \dots, x_n$ are the sample values, and the density function is $f(x;\theta)$; then the
likelihood function $L(\theta) = \prod^n_{i=1} f(x_i;\theta)$.
Therefore, the $\hat \theta(x_1,x_2, \dots, x_n)$ satisfying $L(\hat \theta)=\max_{\theta \in \Theta} L(\theta,x_1,x_2, \dots, x_n)$ is called the maximum likelihood estimate, and the corresponding statistic $\hat \theta(X_1,X_2,\dots, X_n)$ is called the maximum likelihood estimator (MLE).
Solution process:
(1) write down $L(\theta)= \prod^n_{i=1} p(x_i;\theta)$ or $L(\theta) = \prod^n_{i=1} f(x_i;\theta)$;
(2) take the logarithm of both sides: $\ln L(\theta) = \sum^n_{i=1} \ln p(x_i;\theta)$ (or $\sum^n_{i=1} \ln f(x_i;\theta)$) $\equiv l(\theta)$, the log-likelihood function;
(3) solve the log-likelihood equation $\frac{dl(\theta)}{d\theta}\big|_{\theta = \hat \theta} = 0$ to get $\hat \theta$.
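When step (3) has no closed form, the maximization can be done numerically. A minimal sketch, assuming an exponential population $f(x;\theta) = \theta e^{-\theta x}$, $x>0$ (not one of the worked examples), where the closed form $\hat\theta_L = 1/\bar x$ is available for comparison:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
theta_true = 2.5                      # assumed true parameter
x = rng.exponential(scale=1 / theta_true, size=2_000)

# l(theta) = n ln(theta) - theta * sum(x_i); we minimize its negative.
def neg_log_lik(theta):
    return -(len(x) * np.log(theta) - theta * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50), method="bounded")
print(res.x, 1 / x.mean())            # numerical and closed-form MLEs agree
```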
<aside> ✔️ e.g.1 Population $X$ has $f(x) = \left \{ \begin{array} {lcr} \sqrt{\theta} x^{\sqrt{\theta}-1} &, 0≤x≤1 \\ 0 & , \text{elsewhere} \end{array} \right.$ with $\theta>0$ unknown, and $(X_1, X_2, \dots, X_n)$ are samples from $X$; solve for the MLE of $\theta$.
$L(\theta) = \prod^n_{i=1} \sqrt{\theta} x_i^{\sqrt{\theta}-1} = \theta^{\frac{n}{2}}(\prod^n_{i=1}x_i)^{\sqrt{\theta}-1}$ # don’t forget the subscript
then $l(\theta) = \frac{n}{2}\ln\theta+(\sqrt{\theta}-1)\sum^n_{i=1} \ln x_i$, and $\frac{dl(\theta)}{d\theta}= \frac{n}{2\theta}+\frac{1}{2\sqrt{\theta}}\sum^n_{i=1} \ln x_i=0$ gives $\theta = n^2(\sum^n_{i=1} \ln x_i)^{-2}$. # usually no need to verify it is a maximum
Therefore, the MLE of $\theta$ is $\hat \theta_L = n^2(\sum^n_{i=1}\ln X_i)^{-2}$. # substitute $x_i$ with $X_i$
</aside>
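A sketch verifying the closed form of e.g.1 on simulated data (true $\theta = 4$ assumed); samples are drawn by inverse-CDF sampling, since $F(x) = x^{\sqrt\theta}$ on $[0,1]$ gives $F^{-1}(u) = u^{1/\sqrt\theta}$:

```python
import numpy as np

rng = np.random.default_rng(4)
theta_true = 4.0
u = rng.uniform(size=5_000)
x = u ** (1 / np.sqrt(theta_true))           # inverse-CDF sampling

theta_hat = len(x)**2 / np.log(x).sum()**2   # hat(theta) = n^2 (sum ln x_i)^-2
print(theta_hat)                             # roughly 4
```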
In some situations the differentiation method does not work, and we return to the definition.
<aside> ✔️ e.g.2 Population $X\sim U[0, \theta]$ with $\theta>0$ unknown, and $(X_1, X_2, \dots, X_n)$ are samples from $X$; solve for the MLE of $\theta$.
$f(x) = \left \{ \begin{array} {lcr} \frac{1}{\theta} &, 0≤x≤\theta \\ 0 & , \text{elsewhere} \end{array} \right.$, then $L(\theta) = \prod^n_{i=1} f(x_i)= \left \{ \begin{array} {lcr} \frac{1}{\theta^n} &, 0≤x_1, x_2, ..., x_n≤\theta \\ 0 & , \text{elsewhere} \end{array} \right.$
Therefore $l(\theta)=-n\ln \theta$ and $\frac{dl(\theta)}{d\theta}= -\frac{n}{\theta} \ne0$, so we cannot use the differentiation method to solve for $\hat \theta_L$.
Definition method: from $L(\theta)$ we see that when $0≤x_1, x_2, ..., x_n≤\theta$, $L(\theta)$ is a decreasing function of $\theta$: the smaller $\theta$ is, the bigger $L(\theta)$ is. However, $\theta$ must be at least as large as every $x_i$, or else $L(\theta) =0$. Therefore, $\hat \theta_L = \max\{X_1, X_2, ..., X_n\}$.
Practice: what if $X\sim U[\theta, 2\theta], \theta>0$?
</aside>
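A sketch of e.g.2 (true $\theta = 3$ assumed): the MLE is the sample maximum, which always sits slightly below the true $\theta$.

```python
import numpy as np

rng = np.random.default_rng(5)
theta_true = 3.0
x = rng.uniform(0, theta_true, size=1_000)

# L(theta) = theta^(-n) for theta >= max(x_i), 0 otherwise: decreasing in
# theta, so it peaks at the smallest admissible value, the sample maximum.
theta_hat = x.max()
print(theta_hat)          # slightly below 3, never above
```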
<aside> ✔️ e.g.3 $f(x) = \left \{ \begin{array} {lcr} \frac{1}{\theta}e^{-\frac{x-\mu}{\theta}} &, x≥\mu \\ 0 & , \text{elsewhere} \end{array} \right.$; give the expressions of the estimators of $\mu$ and $\theta$ by both methods.
Method 1: Moment estimation
$\mu_1 = E(X) = \int^{+\infty}_{-\infty}xf(x)dx = \int^{+\infty}_{\mu} \frac{x}{\theta}e^{-\frac{x-\mu}{\theta}}dx=\mu+\theta$
$\nu_2 = D(X) = E(X-\mu-\theta)^2=\int^{+\infty}_{\mu} (x-\mu-\theta)^2\frac{1}{\theta}e^{-\frac{x-\mu}{\theta}}dx\overset{x-\mu\equiv t}{=}\int^{+\infty}_{0}(t-\theta)^2\frac{1}{\theta}e^{-\frac{t}{\theta}}dt=\theta^2$
therefore $\mu = \mu_1 -\sqrt{\nu_2}, \theta = \sqrt{\nu_2}$; replacing $\mu_1$ with $\bar X$ and $\nu_2$ with $B_2$ gives $\hat \theta = \sqrt{B_2} = \sqrt{\frac{1}{n}\sum^n_{i=1}(X_i-\bar X)^2}$ and $\hat \mu = \bar X - \sqrt{B_2} = \bar X - \sqrt{\frac{1}{n}\sum^n_{i=1}(X_i-\bar X)^2}$.
Method 2: MLE
$L(\mu, \theta)= \prod^n_{i=1}\theta^{-1}e^{-\frac{x_i-\mu}{\theta}} = \theta^{-n}e^{-\frac{1}{\theta}\sum^n_{i=1}(x_i-\mu)},\quad x_1,x_2, ...,x_n\ge\mu,$
then $l(\mu, \theta) = -n\ln \theta-\frac{1}{\theta}\sum^n_{i=1}(x_i-\mu)=-n\ln \theta-\theta^{-1}(n\bar x-n\mu)$.
$\frac{\partial l(\mu, \theta)}{\partial \theta}=0\iff-\frac{n}{\theta}+\frac{n\bar x -n\mu}{\theta^2}=0\Rightarrow \theta = \bar x-\mu.$
$\frac{\partial l(\mu, \theta)}{\partial \mu}=0\iff \frac{n}{\theta}=0$ has no solution, so the differentiation method fails for $\mu$. Since $\frac{\partial l(\mu, \theta)}{\partial \mu}>0$, $L(\mu, \theta)$ is an increasing function of $\mu$; however, $\mu$ must be no larger than every $x_i$, otherwise $L(\mu, \theta)=0$. Therefore, $\hat \mu_L=\min\{X_1, X_2, \dots, X_n\}$ and $\hat \theta_L = \bar X - \min\{X_1, X_2, \dots, X_n\}$.
</aside>
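A sketch comparing the two methods of e.g.3 on simulated data (true values $\mu = 1.5$, $\theta = 2$ assumed; shifted-exponential samples are generated as $\mu + \mathrm{Exp}(\theta)$):

```python
import numpy as np

rng = np.random.default_rng(6)
mu_true, theta_true = 1.5, 2.0
x = mu_true + rng.exponential(scale=theta_true, size=5_000)

# Moment estimates: mu_1 = mu + theta, nu_2 = theta^2
B2 = np.mean((x - x.mean())**2)
theta_mom, mu_mom = np.sqrt(B2), x.mean() - np.sqrt(B2)

# MLEs: hat(mu) = min x_i, hat(theta) = xbar - min x_i
mu_mle = x.min()
theta_mle = x.mean() - x.min()

print(mu_mom, theta_mom)     # roughly 1.5 and 2
print(mu_mle, theta_mle)     # roughly 1.5 and 2
```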