Scoring algorithm, also known as Fisher's scoring,[1] is a form of Newton's method used in statistics to solve maximum likelihood equations numerically, named after Ronald Fisher.
Sketch of derivation
Let $Y_1,\ldots,Y_n$ be random variables, independent and identically distributed with twice differentiable p.d.f. $f(y;\theta)$, and we wish to calculate the maximum likelihood estimator (M.L.E.) $\theta^*$ of $\theta$. First, suppose we have a starting point for our algorithm $\theta_0$, and consider a Taylor expansion of the score function, $V(\theta)$, about $\theta_0$:
$$V(\theta) \approx V(\theta_0) - \mathcal{J}(\theta_0)(\theta - \theta_0),$$
where
$$\mathcal{J}(\theta_0) = -\sum_{i=1}^{n} \left.\nabla \nabla^{\top}\right|_{\theta=\theta_0} \log f(Y_i;\theta)$$
is the observed information matrix at $\theta_0$. Now, setting $\theta = \theta^*$, using that $V(\theta^*) = 0$ and rearranging gives us:
$$\theta^* \approx \theta_0 + \mathcal{J}^{-1}(\theta_0)\,V(\theta_0).$$
We therefore use the algorithm
$$\theta_{m+1} = \theta_m + \mathcal{J}^{-1}(\theta_m)\,V(\theta_m),$$
and under certain regularity conditions, it can be shown that $\theta_m \rightarrow \theta^*$.
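A minimal numerical sketch of the iteration above, under the assumption of an exponential model (a hypothetical example, not taken from the article): for the rate $\theta$ of $f(y;\theta)=\theta e^{-\theta y}$ the score is $V(\theta)=n/\theta-\sum_i Y_i$ and the observed information is $\mathcal{J}(\theta)=n/\theta^2$, so the update can be written out explicitly.

```python
import numpy as np

# Sketch of theta_{m+1} = theta_m + J(theta_m)^{-1} V(theta_m) for the
# (hypothetical) exponential model f(y; theta) = theta * exp(-theta * y):
# score V(theta) = n/theta - sum(y), observed information J(theta) = n/theta^2.
def newton_scoring_exponential(y, theta0, tol=1e-10, max_iter=50):
    n, s = len(y), np.sum(y)
    theta = theta0
    for _ in range(max_iter):
        score = n / theta - s          # V(theta_m)
        obs_info = n / theta ** 2      # J(theta_m)
        step = score / obs_info        # J(theta_m)^{-1} V(theta_m)
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(0)
y = rng.exponential(scale=1 / 2.5, size=1000)     # simulated data, true rate 2.5
print(newton_scoring_exponential(y, theta0=1.0))  # converges to 1 / mean(y)
```

For this particular model $\mathcal{J}(\theta)$ does not depend on the data, so it equals its expectation $\mathcal{I}(\theta)$ and the same sketch also performs the Fisher scoring of the next section.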
Fisher scoring
In practice, $\mathcal{J}(\theta)$ is usually replaced by $\mathcal{I}(\theta) = \mathrm{E}[\mathcal{J}(\theta)]$, the Fisher information, thus giving us the Fisher scoring algorithm:
$$\theta_{m+1} = \theta_m + \mathcal{I}^{-1}(\theta_m)\,V(\theta_m).$$
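As an illustrative sketch (hypothetical, not taken from the cited sources), Fisher scoring applied to logistic regression uses the closed-form expected information $\mathcal{I}(\beta)=X^{\top}WX$ with $W=\operatorname{diag}(p_i(1-p_i))$ and score $V(\beta)=X^{\top}(y-p)$; with the canonical logit link the observed and expected information coincide, so the update below is also the Newton step.

```python
import numpy as np

# Sketch of Fisher scoring, beta_{m+1} = beta_m + I(beta_m)^{-1} V(beta_m),
# for logistic regression (hypothetical example, names are illustrative).
def fisher_scoring_logistic(X, y, tol=1e-8, max_iter=25):
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))  # fitted probabilities
        V = X.T @ (y - p)                    # score vector V(beta)
        W = p * (1.0 - p)                    # weights p_i (1 - p_i)
        I = X.T @ (X * W[:, None])           # Fisher information X^T W X
        step = np.linalg.solve(I, V)         # solve I * step = V
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy data: intercept plus one covariate.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
beta_true = np.array([-0.5, 1.2])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
print(fisher_scoring_logistic(X, y))  # approximately recovers beta_true
```

Solving the linear system $\mathcal{I}\,\delta = V$ rather than forming $\mathcal{I}^{-1}$ explicitly is the usual numerical choice.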
Under some regularity conditions, if $\theta_m$ is a consistent estimator, then $\theta_{m+1}$ (the correction after a single step) is 'optimal' in the sense that its error distribution is asymptotically identical to that of the true maximum likelihood estimate.[2]
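To make the one-step property concrete, here is a hypothetical sketch (not from the cited reference) for a Cauchy location model with unit scale: the sample median is a consistent starting estimator, the expected information is $\mathcal{I}(\theta)=n/2$, and a single scoring step from the median is already asymptotically as accurate as the full M.L.E.

```python
import numpy as np

# One Fisher scoring step from a consistent starting estimator (the sample
# median) in a Cauchy location model with scale 1 (hypothetical example).
# Score: V(theta) = sum 2*(y_i - theta) / (1 + (y_i - theta)^2);
# expected information: I(theta) = n / 2.
def one_step_cauchy(y):
    n = len(y)
    theta0 = np.median(y)                     # consistent starting point
    r = y - theta0
    score = np.sum(2.0 * r / (1.0 + r ** 2))  # V(theta_0)
    return theta0 + score / (n / 2.0)         # theta_1 = theta_0 + I^{-1} V

rng = np.random.default_rng(2)
y = rng.standard_cauchy(1000) + 3.0           # true location = 3
print(np.median(y), one_step_cauchy(y))       # one-step estimate is typically closer
```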
References
^ Longford, Nicholas T. (1987). "A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects". Biometrika. 74 (4): 817–827. doi:10.1093/biomet/74.4.817.
^ Li, Bing; Babu, G. Jogesh (2019). "Bayesian Inference". Springer Texts in Statistics. New York, NY: Springer New York. Theorem 9.4. doi:10.1007/978-1-4939-9761-9_6. ISBN 978-1-4939-9759-6. S2CID 239322258. Retrieved 2023-01-03.