The Davidon–Fletcher–Powell formula (or DFP; named after William C. Davidon, Roger Fletcher, and Michael J. D. Powell) finds the solution to the secant equation that is closest to the current estimate and satisfies the curvature condition. It was the first quasi-Newton method to generalize the secant method to a multidimensional problem. The update maintains the symmetry and positive definiteness of the Hessian approximation.
Given a function $f(x)$, its gradient ($\nabla f$), and a positive-definite Hessian matrix $B$, the Taylor series is
$$f(x_k + s_k) = f(x_k) + \nabla f(x_k)^T s_k + \frac{1}{2} s_k^T B s_k + \dots,$$
and the Taylor series of the gradient itself (secant equation)
$$\nabla f(x_k + s_k) = \nabla f(x_k) + B s_k + \dots$$
is used to update $B$.
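For a quadratic objective the secant relation holds exactly, which gives a quick way to see what the update is matching. Below is a minimal NumPy sketch (the matrix $A$, vector $b$, and test points are illustrative, not from the source) verifying that the gradient difference equals $B s_k$ when the Hessian is the constant matrix $B = A$:

```python
import numpy as np

# Illustrative quadratic f(x) = 1/2 x^T A x - b^T x, whose Hessian is the
# constant positive-definite matrix A, so the secant equation is exact.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

def grad_f(x):
    return A @ x - b  # gradient of the quadratic

x_k = np.array([2.0, 1.0])   # current iterate (arbitrary)
s_k = np.array([0.5, -0.3])  # step (arbitrary)

y_k = grad_f(x_k + s_k) - grad_f(x_k)
print(np.allclose(y_k, A @ s_k))  # True: gradient difference equals B s_k
```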
The DFP formula finds a solution that is symmetric, positive-definite, and closest to the current approximate value of $B_k$:
$$B_{k+1} = \left(I - \gamma_k y_k s_k^T\right) B_k \left(I - \gamma_k s_k y_k^T\right) + \gamma_k y_k y_k^T,$$
where
$$y_k = \nabla f(x_k + s_k) - \nabla f(x_k),$$
$$\gamma_k = \frac{1}{y_k^T s_k},$$
and $B_k$ is a symmetric and positive-definite matrix.
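The rank-two update translates directly into code. A minimal NumPy sketch (the function name dfp_update_B is illustrative):

```python
import numpy as np

def dfp_update_B(B, s, y):
    """One DFP update of the Hessian approximation B (illustrative sketch).

    Assumes the curvature condition y^T s > 0; the result then stays
    symmetric positive-definite and satisfies the secant equation
    B_new @ s == y.
    """
    gamma = 1.0 / (y @ s)
    V = np.eye(len(s)) - gamma * np.outer(y, s)  # I - gamma_k y_k s_k^T
    return V @ B @ V.T + gamma * np.outer(y, y)
```

As a consistency check, the update reproduces the secant equation: with the vectors from the quadratic sketch earlier, `np.allclose(dfp_update_B(np.eye(2), s_k, y_k) @ s_k, y_k)` returns `True`.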
The corresponding update to the inverse Hessian approximation $H_k = B_k^{-1}$ is given by
$$H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{y_k^T s_k}.$$
$B$ is assumed to be positive-definite, and the vectors $s_k$ and $y_k$ must satisfy the curvature condition
$$s_k^T y_k = s_k^T B s_k > 0.$$
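A corresponding sketch for the inverse update, including the common practical safeguard of skipping the update when the curvature condition fails (the function name is illustrative):

```python
import numpy as np

def dfp_update_H(H, s, y):
    """One DFP update of the inverse Hessian approximation H = B^{-1}.

    Returns H unchanged if the curvature condition s^T y > 0 fails,
    a common practical safeguard.
    """
    sy = y @ s
    if sy <= 0:
        return H  # curvature condition violated; keep previous H
    Hy = H @ y
    return H - np.outer(Hy, Hy) / (y @ Hy) + np.outer(s, s) / sy
```

In a quasi-Newton iteration the search direction is then computed directly as $-H_k \nabla f(x_k)$, with no linear solve required.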
The DFP formula is quite effective, but it was soon superseded by the Broyden–Fletcher–Goldfarb–Shanno formula, which is its dual (interchanging the roles of $y$ and $s$).[1]
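Concretely, swapping $s_k \leftrightarrow y_k$ in the DFP update of $B_k$ above yields the BFGS update of the inverse approximation $H_k$:
$$H_{k+1}^{\text{BFGS}} = \left(I - \gamma_k s_k y_k^T\right) H_k \left(I - \gamma_k y_k s_k^T\right) + \gamma_k s_k s_k^T.$$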
Compact representation
By unwinding the matrix recurrence for $B_k$, the DFP formula can be expressed as a compact matrix representation. Specifically, defining
$$S_k = \begin{bmatrix} s_0 & s_1 & \ldots & s_{k-1} \end{bmatrix},$$
$$Y_k = \begin{bmatrix} y_0 & y_1 & \ldots & y_{k-1} \end{bmatrix},$$
and upper triangular and diagonal matrices
$$\big(R_k\big)_{ij} := \big(R_k^{\text{SY}}\big)_{ij} = s_{i-1}^T y_{j-1}, \quad \big(R_k^{\text{YS}}\big)_{ij} = y_{i-1}^T s_{j-1}, \quad \big(D_k\big)_{ii} := \big(D_k^{\text{SY}}\big)_{ii} = s_{i-1}^T y_{i-1} \qquad \text{for } 1 \leq i \leq j \leq k,$$
the DFP matrix has the equivalent formula
$$B_k = B_0 + J_k N_k^{-1} J_k^T,$$
$$J_k = \begin{bmatrix} Y_k & Y_k - B_0 S_k \end{bmatrix},$$
$$N_k = \begin{bmatrix} 0_{k \times k} & R_k^{\text{YS}} \\ \big(R_k^{\text{YS}}\big)^T & R_k + R_k^T - (D_k + S_k^T B_0 S_k) \end{bmatrix}.$$
The inverse compact representation can be found by applying the Sherman–Morrison–Woodbury formula to $B_k$. The compact representation is particularly useful for limited-memory and constrained problems.[2]
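The compact formula can be assembled directly from the stored pairs. A minimal NumPy sketch of the block definitions above (the function name dfp_compact_B is illustrative):

```python
import numpy as np

def dfp_compact_B(B0, S, Y):
    """Compact representation B_k = B0 + J_k N_k^{-1} J_k^T (sketch).

    S and Y are n-by-k matrices whose columns are the stored pairs
    s_0 .. s_{k-1} and y_0 .. y_{k-1}.
    """
    k = S.shape[1]
    SY = S.T @ Y
    R_SY = np.triu(SY)            # (R_k)_{ij} = s_{i-1}^T y_{j-1}, i <= j
    R_YS = np.triu(Y.T @ S)       # (R_k^YS)_{ij} = y_{i-1}^T s_{j-1}, i <= j
    D = np.diag(np.diag(SY))      # (D_k)_{ii} = s_{i-1}^T y_{i-1}
    J = np.hstack([Y, Y - B0 @ S])
    N = np.block([
        [np.zeros((k, k)), R_YS],
        [R_YS.T, R_SY + R_SY.T - (D + S.T @ B0 @ S)],
    ])
    return B0 + J @ np.linalg.solve(N, J.T)
```

For a single stored pair ($k = 1$) this reduces to the rank-two DFP update given earlier, which provides a quick consistency check.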
References
^ Avriel, Mordecai (1976). Nonlinear Programming: Analysis and Methods. Prentice-Hall. pp. 352–353. ISBN 0-13-623603-0.
^ Brust, J. J. (2024). "Useful Compact Representations for Data-Fitting". arXiv:2403.12206 [math.OC].