RESEARCH
ON AN
AUGMENTED LAGRANGIAN
PENALTY FUNCTION
ALGORITHM
FOR
NONLINEAR PROGRAMMING
FINAL REPORT
NASA
RESEARCH GRANT
NSG
1525
April 1978 through December 1978
Principal Investigator:
Dr. Les
Frair
Department
of
IEOR
VPI
& SU
ABSTRACT
This
research
represents
an
extensive study
of the
Augmented
Lagrangian
(ALAG)
Penalty Function Algorithm
for
optimizing
nonlinear
mathematical
models.
The
mathematical models
of
interest
are
deterministic
in
nature
and
finite dimensional optimization
is
assumed.
A
detailed review
of
penalty function techniques
in
general
and the
ALAG
technique
in
particular
is
presented. Numerical experiments
are
conducted
utilizing
a
number
of
nonlinear
optimization problems
to
identify
an
efficient ALAG Penalty Function Technique
for
computer implementation.
TABLE
OF
CONTENTS
1.
INTRODUCTION
2.
REVIEW
0*F
RELATED MINIMIZATION TECHNIQUES
2.1
Penalty Function Technique
2.2
Lagrangian
Primal-Dual
Method
3.
AUGMENTED LAGRANGIAN PENALTY FUNCTION TECHNIQUE
3.1
Introduction
3.2
Review
of
Technique
for
Equality Constrained Problem
3.2.1 Equality Constrained Problem
3.2.2
Powell-Hestenes
Augmented Penalty Function
3.3
Review
of the
Technique
for a
Constrained Problem with
Equalities
and
Inequalities
3.3.1 Constrained Problem
3.3.2 Powell-Hestenes-Rockafeller Penalty Function
4.
NUMERICAL RESULTS
4.1
Introduction
4.2
Numerical Results
for
Unconstrained Algorithms
4.3
Numerical Results
for
Constrained Algorithms
APPENDIX
A.
MATHEMATICAL CONCEPTS
AND
PENALTY FUNCTION TECHNIQUES
APPENDIX
B.
COMPUTERIZED ALAG ALGORITHM
AND
APPLICATION
APPENDIX
C.
COMPUTER PROGRAM DOCUMENTATION
REFERENCES
I.
INTRODUCTION
The
current advanced stage
of
development
of the
theoretical
framework
of
-unconstrained optimization
has
served
as a
powerful force
for
unification
of the
subject which, until some years ago, consisted
of
a
collection-of
disjointed algorithms.
The
evolution
of
these
algorithms depended strongly
on
practical computation
of
solution
to
specific
problems.
The
interplay
of
theory
and
algorithms
has
made
it
possible
to
transfer theoretical progress into improved algorithms.
Powell
(P5)
has
reviewed comprehensively modern algorithms
and
the
effect
of
theoretical
work
on the
design
of
practical algorithms
for
unconstrained optimization. Murray (Mil)
has
presented
the
main-
.stream
of
developments
in
numerical methods
for
unconstrained optimization.
Much
of the
current research
has
been focused
on
understanding, comparing,
improving
and
extending
the
available numerical methods instead
of
devising totally
new
algorithmic concepts. These refinements
and
modifi-
cations
are
.not expected
to
significantly improve
the
efficiency
of
existing
algorithms (G2).
At
present
a
robust collection
of
potent
and
sophisticated general
purpose
algorithms
for
unconstrained optimization
is
available
as
high-
quality
software (G2).
These
algorithms have been tested
and
proven
to
be
efficient
and
reliable
for
solving
a
variety
of
typical test problems
and
practical problems. Successful development
of
such algorithms
for
unconstrained optimization
has
been
the
springboard
for the
more recent
success
in the
design
of
algorithms
for
constrained problems.
Availability
of
efficient numerical methods
for
solving
unconstrained optimization problems
has
motivated
the
design
of
algorithms that convert
a
constrained problem
to a
sequence
of
unconstrained problems which have
the
property
that
successive
solutions
of the
unconstrained problems converge
to the
solution
of
the
constrained problem.
This
transformation approach
has
been
systematically employed
in the
development
of
numerical algorithms
for
constrained optimization
for
more than
a
decade.
In
recent years
a
substantial
body
of
theory
has
been established
for
these transfor-
mation techniques
and
many computational algorithms have been
proposed
(B4), (Fl), (L3).
To
review briefly
the
transformation technique, consider
the
following inequality constrained
nonlinear
programming problem.
Let
(2)
f(X)
and c. (X) i =
l,2,....,m
be
real valued functions
of
class
C
'x/
i "u
on a
nonempty open
set L in an
n-dimensional Euclidean space
E
PI :
Minimize
f)
over
all X £ L .
Subject
to c. (X) > 0, i =
1,2,
. . . . ,m
where feasible region
F is a
nonempty compact set.
F
= {X : c. (X) > 0 i=
1,2,
,m, Xe L, L c. E
n
>
'v
j_ % = 'v
Methods
for
solving
PI via
unconstrained
minimization
have been
classified, described
and
analyzed
in
detail
by
Lodtsma (L3).
Para-
metric
transformation methods solve
PI by
reducing
the
computational
process
to .a
sequence
of
successive unconstrained minimizations
of a
compound
function defined
in
terms
of the
objective function
f
(X),
the
constraint functions
c. (X) i =
l,2,....,m
and one or
more
controlling parameters.
By
gradually removing
the
effect
of the
constraints
in the
compound function
by
controlled changes
in the
value
of
one or
more parameters
a
sequence
of
unconstrained problems
is
generated. Successive solutions
of
these unconstrained problems
converge
to a
solution
of
the. original constrained problem.
The
advantage
of
this approach lies
in the
fact
that
the
constraints need
not be
dealt with separately
and
that
efficient numerical methods
for
computing unconstrained extrema
can be
applied.
During recent years
the
parametric transformation technique
known
as the
Augmented Lagrangian
(ALAG)
Penalty Function Technique
has
gained recognition
as one of the
most effective
type
of
methods
for
•solving
constrained
minimization
problems.
In the
opinion
of
many
researchers
in
this field,
the
ALAG penalty function technique
is the
best
method available
for
solving
problems with nonlinear constraints
in the
absence
of
special structure (B4).
The
disadvantages
of the
method
are
negligible
and the
advantages
are
strong, especially
the
lack
of
numerical difficulties
and the
ease
of
using
the
unconstrained minimi-
zation routine.
The
method
has
global convergence
at an
ultimately
superlinear rate,
the
computational effort
per
minimization falls
off
rapidly,
initial starting point need
not be
feasible
and the
function
is
defined
for all
values
of the
parameters (F7).
The
ALAG penalty function technique
is a
balance between
the
classical penalty function technique
and the
Lagrangian primal-dual
method
which
are
both parametric transformation techniques.
The
design
of
this method
was
motivated
by
efforts
to
overcome
the
numerical
in-
stability
of the
penalty function technique near
the
solution (P3),
(H2)
and
attempts
to
eliminate
the
"duality gap"
in
nonconvex
programming (R6).
The
classical penalty function
technique
and the
Lagrangian p»imal-dual
method
are
briefly reviewed
and the
development
of
the
ALAG penalty function technique
by.the
merger
of the
penalty
idea
with
the
primal-dual philosophy
is
traced
in
section
2. The
ALAG penalty, function technique
is
described, reviewed
and
discussed
in
section
3. The
results
of
numerical
investigations
are
presented
in
section
4. The
symbols, mathematical terms
and
related concepts used
in
this
work
are
defined briefly
in
appendix
A. The
method
of
solving
a
nonlinear
problem using
the
ALAG penalty function technique
is
illustrated with numerical examples
in
appendix
B.
II.
REVIEW
OF
RELATED MINIMIZATION TECHNIQUES
2.1
Penalty Function Technique
The
pertalty
methods have been extensively used
in
numerical optimiza-
tion
for
more than
a
decade.
The
penalty function approach
has
been
popular,
as
evidenced
by
applications
to
practical problems (D3), because
it is
conceptually simple
and
easy
to
implement.
It
permits
a
transparent
program
structure
as it is
fully based
on
unconstrained minimization. These methods
are
applicable
to a
broad class
of
problems, even those involving noncpnvex
constraints.
The
most attractive feature
of
these methods
is the
fact
that
they
take advantage
of the
powerful unconstrained minimization methods
that
have
been developed
in
recent years.
The
penalty function technique
is a
sequential parametric transformation
method.
It is an
iterative algorithm
that
requires
the
solution
of an
unconstrained optimization problem
at
each iteration.
In
these methods
the
objective function f(X)
is
minimized using
an
unconstrained minimization
Oi
technique
while
maintaining
implicit control over
the
constraint violations
by
penalizing
the
objective function
at
points which violate
or
tend
to
violate
the
constraints.
The
solution
X* to the
constrained minimization
-v
problem
Pi is
approached from outside
the
feasible region
f and
these
methods
are
also referred
to as
exterior point methods.
The
penalty
function technique
has
been popularized
mainly
through
the
work
of
Fiacco
and
McCormick (Fl). Fiacco
and
McCormick (Fl) developed
the
Sequential
Unconstrained
Minimization
Techniques
(SUMT)
for
nonlinear programming
using penalty function
and
related concepts.
A
chronological survey
of
the
development
of the
penalty methods
and
detailed discussion
and
analysis
of
penalty
and
related methods
are
presented
in
reference
(Fl).
The
penalty function method
for PI
consists
of
sequential minimizations
of
the
form
minimize
P(X,
a), X e L £ E
n
P(X,
a) is the
penalty function with control parameter
o > 0.
This function
is
designed
to
impose
an
increasing penalty
on the
objective function
as
constraint violation
increases.
The
control parameter
a is
used effectively
to
increase
the
magnitude
of
penalty.
The
penalty function transformation
may be
represented
as
m
P(X,
a) =
f(X)
+ a S
n.(c.(X)),
a > 0
where
[1]
i\j
'V 1=1 3- 3- 'X/
n.(t)
is
defined
as the
loss function with
the
following properties.
(i)
n.(t)
is
continuous
on° < t <
m
(ii)
for
inequality constraint
c.(X)
> 0
1
^ -
n.
(t) -> °° as t
->-
-» and n. (t) = 0 for t > 0
(iii)
for
equality constraint
c.(X)
= 0
1
^
n.(t)
> 0 Vt,
n.(t)
= 0 for t = 0 and
n
(t) -> «
as
t
±°°
Usually
the
loss function,
n.(t),
is
chosen such that when
the
objective
(2)
function
and the
constraint functions
are of
class
C ,
P(X,
a) is
twice
Oi
differentiable.
P(X,
o) is
defined
on an
open
set L S E and
P(X,
o) -> °°
'V /V
as
constraint violation
increases.
Several different loss functions have been proposed
for use in the
penalty
function algorithm
and
these
are
discussed
by
Fiacco
and
McCormick
7
(Fl).
The
most commonly
and
widely used loss function
is the
quadratic
loss function.
For an
inequality constraint
c.(X)
> 0,
quadratic loss
i >\, =
2
function
is n (c
(X))
=
[min
(0, c
(X))]
. For an
equality constraint
1 i
-v
1
^
c.(X)
= 0, th£
quadratic loss function
is
n.(c.(X))
=
(c.(X))
.
i ^ i i -v i %
An
elaborate treatment
of the
penalty function algorithm
can be
found
in
(Fl), (L5)
and
(Zl)
for a
general nonlinear problem.
The
basic algorithm
may
be
represented
as
follows:
(k)
(i)
Select
an
infinite sequence
{a }
which
is
monotonically
increasing
as k
°°.
Find
X i F>
where
F is the
feasible
a.
region defined
by the
constraint functions.
Set k = 0.
(ii)
Set k = k + 1.
(iii)
Minimize
P(X,
a^')
to
find
X(a^)
= X^'
starting
the
f\j
r
O 'Y*
(k-1)
minimization from
X .
Return
to
(ii)
if
convergence
is
not
satisfied.
Convergence tests
in
step
(iii)
are
usually based
on the
magnitude
of
quantities such
as
(f(X
(k)
)
-
f(X
(k
~
1}
))
and ||
X
(k)
-
X
(k
~
1)
||
where
|| X ||
is
the
Euclidean norm
of the
vector
X.
Other convergence criteria
are
discussed
by
Fiacco
and
McCormick'(Fl).
It is
assumed
that
the
function
(k)
f(X)
is
bounded below
so
that
a
solution
X to the
unconstrained minimization
*\, 'Xf
(k)
in
step
(iii)
exists
for
each
a . In
step
(i) the
initial starting point
X^
' is
outside
the
feasible region
F and the
trajectory corresponding
to
<\j
the
sequence
{X . }
generated
by the
algorithm lies outside
F.
Therefore
v
penalty function methods
are
also
known
as
exterior-point methods.
Any
(k)
limit
point
of the
sequence
{X }
generated
by the
penalty method
is a
solution
X* to the
constrained minimization problem
PI
(H4), (L5), (Zl).
The
penalty function technique might
be
regarded
as a
"primal"
approach
to
implicitly account
for the
constraints, although
its
connections with
duality
are
known
(Fl), (L5), (Zl).
The
approximation
of the
constrained
problem
by the
unconstrained penalty problem becomes more
and
more exact
as
the
control parameter
a -> °°.
However considerable computational difficulties
are
experienced with
the
traditional penalty function algorithm
as o -> °°.
These difficulties
are
delineated
in
detail
in
references (L3), (L5), (M5),
(Rll).
The
computational difficulties arise from P(X,
a)
forming
an
increasingly
'v
steep-sided
valley
as the
control parameter
is
increased
to
allow
the
unconstrained solutions
to
approach
the
constrained solution
to Pi
from
outside
the
active constraints.
In
particular,
the
Hessian
matrix
of the
penalty
function
[1]
becomes extremely ill-conditioned
as a
increases.
This leads
to
numerical
instabilities during unconstrained minimizations
of
the
penalty function
and
slow
convergence
of the
algorithm.
Attempts
to
overcome these computational difficulties have resulted
in
several modifications (Fl), (F2), (L3)
to the
penalty function technique.
Hestenes (H2)
and
Powell
(P3),
at
about
the
same time, independently
proposed
modifications that resulted
in a new
method related
to the
penalty
function
technique.
In
this
new
-method
penalty terms
are
added
to the
Lagrangian associated with
the
original
constrained problem. Hestenes (H2)
termed
this
the
"Multiplier Method".
It has
become
known
as the
Augmented
Lagrangian Penalty Function Technique
in
subsequent discussions. This
method
alleviated some
of the
computational difficulties associated with
the
traditional penalty function technique (F8)
and
achieved
better
convergence properties than
the
method
of
penalty functions (HA).
This
method
is
reviewed briefly
in
Chapter
3.
2.2
Lagrangian
Primal-Dual
Method
The
Lagrangian primal-dual method transforms
a
constrained convex
programming
problem into
a
sequence
of
unconstrained minimizations
of the
classical Lagrangian associated with
the
constrained
minimization
problem.
The
constrained problem
PI
becomes
a
convex programming problem when
the
objective
function f(X)
is
convex
and the
constraints c.(X)
i = 1, 2,
...,
m
a.
la.
are
concave.
The
concept
of the
primal-dual method
was
first implemented
by
Arrow,
et al.
(Al)
in the
differential gradient scheme
for
approaching
the
saddle-point
of the
Lagrangian L(X,
X)
associated with
a
convex program.
Oi "(j
The
Lagrangian associated with
the
convex problem
PI may be
represented
as
m _
L(X,
X) = f (X) - E X.
c.(X),
X £ L % E
n
, X e E . [2]
'X*
*\j fYi i 1 •*- 'Xf 'Xl *\»
-v
1=
j_
m
m
where
E is the
nonnegative orthant
of
m-dimensional
Euclidean
space
E
m
and
the
vector
X e E, is
called
a
vector
of
multipliers.
Suppose
that
a
point
X*
satisfies
the
constraints
of the
convex program
PI and the
problem functions
are of
class
C . If
there exists
a
vector
X*
such that
X*
> 0, X.*
c.(X*)
= 0 V i and
VL(X*,
X*) =0, [3]
i i a. a, <\, i\,
then
X* is a
global solution
to the
convex program
PI. The
vector
X* is
a.
^
said
to be the
vector
of
Lagrange multipliers associated with
X*. If the
.
- a.
gradients
of the
active constraints
at X* are
linearly independent, then
X*
<\i
a.
is
a
regular point
of the
feasible region
F and
there exists
a
vector
of
Lagrange multipliers
X*
satisfying [3].
The
conditions
in [3] are
called
a.
the
Kuhn-Tucker first-order necessary conditions
for X* to be a
solution
to
1Q
PI and for the
convex program
PI,
these
are
also sufficient conditions
for
X* to be a
global solution.
For a
nondifferentiable convex problem
PI let
<\j
n
m
there
exist
an X* e E and a X* e E.
such
that
the
pair (X*,
X*) is the
^ >\j ^ ^
saddle-point
of the
Lagrangian L(X,
X)
-associated
with
the
convex program
* . 'x. a.- .
PI,
i.e., L(X*,
X) <
L(X*,
X*) <
L(X, X*).
Then
X* is the
global solution
r\,
r\j ~ ^ ^
1>
r
l>
. r\,
to
the
convex program
PI and X* is the
vector
of
Lagrange multipliers
a.
associated
with
X*.
oj
The
differential
gradient scheme
of
Arrow,
et al.
(Al)
for a
convex
program
may be
viewed
as a
small-step
primal-dual
method
where estimates
of
X* and X* are
modified
at
each iteration
to
exploit
the
saddle-point
'Xi
Oi
nature
of
L(X,
X)
near (X*, X*). This structure
of the
method
is
revealed
f\,
<\, f\j ^ .
by
the
system
of
difference equations formulated
by
Uzawa (Al)
to
represent
the
differential gradient method. Davis (Dl) represents
the
iterations
in
this method
as
B-LCX,
x
(k)
)
r\,
1 1 <Vi 'Vi 'Vi
(k+1)
_ (k) -1 (k) (k)
X
= mm ID, X - a. o
VXL(.X
, X J
where
a and a are
scalars
representing step-size, VL(X,
X) is the
gradient
X 2. . <\( <\, o>
of
L(X,
X)
with respect
to X,
VXL(X,
X) is the
gradient
of
L(X,
X)
with
'X-
% f\j
'X/X< •'X;
'X. 'X/ 'Xi
respect
to X and B and B are
positive definite matrices
of
order
n and
r\j
1 i
(0)
(0)
m
m
respectively.
The
algorithm
may be
started
at any X
v
' e F and X e E,.
'X, »\» "*"
As
the
constrained problem
in the
above method
is
convex,
the
Lagrangian L(X,
X) is
also convex with respect
to X. The
iterations
on
'x/
a. <\/
(k)
X ,
therefore,
are
descent iterations
on
L(X,
X) and
update
of
multipliers
r\j 'Xl "Xl
11
X
may be
viewed
as
ascent iterations
on
L(X,
X). The X
update
may
O.
i\, r\, %
also
be
regarded
as
approximate solutions
to the
associated dual problem
(k)
at
X^ . The
dual associated
with
PI is .
<\,
m
Dl":
Maximize
l/(X)
over
all X e E
a.
^>
+
l/(X)
=
infimum L(X,
X) , X e L .
'Xi
'V 'Vi *X»
The
Lagrangian L(X,
X) is
minimized over
X e L for a
sequence
of
multiplier
"(j
Oi TJ
(k)
vectors
X and the
algorithm
is a
primal-dual method. Methods that
are
a.
similar
in
concept
to
this algorithm
are
described
by
Powell (PA)
,
Bertsekas
(B4)
, and
Lasden (LI)
.
The
algorithms based
on
Lagrangian primal-dual method
are not
susceptible
to
numerical
instabilities
such
as
those
discussed
in
connection
with
the
penalty method. Primal-dual methods
are
based
on the
viewpoint
that
the
Lagrange multipliers
X* are
also fundamental
unknowns
associated with
a
%
constrained problem. This
is due to the
reason
that
Lagrange multipliers
measure sensitivities
and
often have meaningful interpretations
as
prices
associated with constraint resources (H4)
,
(L5). Useful duality results
for
convex programs have been presented
by
Luenberger (L5)
and
Zangwill (Zl)
.
Various formulations
of the
duality theory
for
nonlinear convex programs
using
the
classical
Lagrangian have been reworked
and
extended
by
Geoffrion
(Gl)
so as to
facilitate, more readily, computational
and
theoretical
applications.
Methods based
on the
classical
Lagrangian
for
solving
a
constrained
problem
PI
have been reviewed
by
Lootsma (L3)
.
The
Lagrangian primal-dual method
is
known
to
have serious disadvantages
(R3)
,
(R6)
. The
most restrictive
one is
that
the
constrained problem must
12
be
convex
In
order
for the
dual
problem
to be
well
defined
and X
iterations
<\>
to
be
meaningful.
In
general
inf
(PI)
> sup
(Dl)
and the
equality holds
good
only
for the
convex problem
PI. For
nonconvex problems only
the
inequality
hcvlds
in the
above relationship
and in
such cases
a
duality
gap
is
said
to
exist.
For
nonconvex problems Everett (E2) introduced
a
primal
dual
method called generalized Lagrange multiplier method. This
and
other
associated methods
are
summarized
by
Lootsma (L3). Even though Everett (E2)
suggested
some methods
of
handling
the
duality gap,
the
method
has
been
found
to be
useful only
for
certain
nonlinear
problems with special structure.
The
method
is of
importance
in the
decompoisition
of
large-scale problems
with separable functions.
In
such cases minimization
of the
Lagrangian
can .
be
carried
out
efficiently
due to the
special structure
of the
constrained
problem
(E2), (LI), (L5).
For
a
convex program,
if X* is the
optimal solution
to the
constrained
"u
problem
with corresponding Lagrange multiplier vector
X*,
then
X* is the
>\j
<\,
unconstrained minimizer
of
L(X, X*). However,
if X* is a
local solution
to
a.
a. . ^
a
nonconvex program with corresponding Lagrange multiplier vector
X*,
then
^
X* may not be the
unconstrained local minimizer
of
L(X,
X*) and
L(X,
X*) may
^ r\j ^ ^ Oi
even
have negative second derivatives
at X* in
certain directions normal
Oi
to
the
feasible manifold
F
(R3). Since curvature
at a
point
is
determined
by
the
second partial derivatives, attempts were made
to
make
the
Lagrangian
associated with nonconvex programs
a
convex function
by
adding quadratic
penalty
terms
to it.
This concept
was
first suggested
by
Arrow
and
Solow (Al)
in
connection with
the
solution
of a
nonconvex equality constrained problem
using
the
differential gradient method. Arrow
and
Solow augmented
the
classical Lagrangian with quadratic penalty terms
and
this
'elegant idea
13
made
the new
augmented Lagrangian locally convex. This idea
was
independently
reconsidered
in an
entirely different algorithmic context
for
equality
constrained problems
by
Hestenes (H2),
Powell
(P3)
and
Haarhoff
and
Buys (Hi).
The
algorithms
that
resulted from these efforts belong
to the
Augmented
Lagrangian Penalty Function Technique which
is
reviewed
in
Section
3.
14
III. AUGMENTED LAGRANGIAN PENALTY FUNCTION TECHNIQUE
3.1
Introduction
The
ALA'G
penalty function technique
may be
reviewed from
two
entirely
different
points
of
view.
The
first view-point
is
that
the
methods that
belong
to
this technique modify
the
Lagrangian associated with
a
nonconvex
or
a
weakly
convex constrained problem
to
have
a
local convexity property.
This
is
because
the
characterization
of
solution
to a
constrained problem
in
terms
of a
saddle-point
of the
Lagrangian depends heavily
on
convexity
properties
of the
underlying problem.
The
local saddle-point property
is
obtained
by the
presence
of a
convexifying parameter
in the
Lagrangian
which
makes
the
associated
Hessian
positive definite
for
large enough,
but
•finite,
values
of the
parameter.
Following
this idea
of
local
convexification
many
different modifications
of the
classical
Lagrangian have been proposed
to
close
the
duality
gap in
nonconvex programming (Al), (A2), (M2).
The
second viewpoint
is to
consider
the
technique
and the
quadratic
penalty
function method
within
a
common generalized penalty function frame-
work.
The
approach here
is to
circumvent instabilities associated with
the
classical
penalty function method
by
adding penalty terms
to the
Lagrangian
function.
The
advantages
of
using
a
first-order penalty furnction have
been
listed
by
Lootsma (L3)
and
McCormick (M5)
.
Therefore methods that
augment
the
Lagrangian with quadratic penalty terms
are
considered
in
detail.
The
development
of the
ALAG penalty function technique
is
traced from
the
second
viewpoint.
15
3.2
Review
of the
Technique
for
Equality Constrained Problem
3.2.1 Equality Constrained Problem
The
equality constrained problem
P2 may be
represented
as
follows:
P2:
Minimize f(X)
^
Subject
to
c.(X)
=0 i = 1, 2,
...,
m m < n
1
^ ~ '
(2)
f(X)
and
c.(X)
i = 1, 2,
...,
m are
real-valued functions
of
class
C
defined
on a
nonempty open
set L SL. E . The
Lagrangian associated with
P2
is
m
L(X,
X) =
f(X)
- Z X, c
(X),
X e
E
m
..
[4]
l
\t
*\j
f
\j i 11 f\j f\.
±=1
The
gradient
and
Hessian
of
this
Lagrangian
with respect
to X are
VL(X,
X)
2
and
V
L(X,
X)
respectively.
*\/
f^j
Let
X* be an
optimal solution
to P2 and the
problem functions f(X)
(2)
and
c.(X),
i = 1, 2,
...,
m be of
class
C in an
open neighborhood
of X*.
The
following
are
assumed
to
hold
good
at X*.
(i) The
point
X* is a
regular point
of the
feasible
set
"u
F
= {X:
c.(X)
= 0 i = 1, 2,
...,
m, X e L S E
n
}
Let
N* =
N(X*)
be the nxm
matrix [Vc,, Vc
0
, ...,
Vc ].
r\j
i\j J. >\, Z ^ m
The
regularity
of the
feasible
set at X* is
satisfied when
N* is of
full
rank.
(ii) There exists
an
unique Lagrange multiplier vector
X*
such
that
the
following first-order necessary conditions
for
local optimality
at X* are
satisfied.
X*
e E
m
,
c.(X*)
= 0 Vi and
VL(X*.
X*) = 0. . [5]
f
\j
1 'V;
/
\J *X> *\>
16
(iii)
The
second-order
necessary
conditions
for
local
optimality
at
X* are
that
in
addition
to [5]
^
Y
T
V
2
L(X*,
X*) Y > 0 . V Y e V^ E" [6]
^ 'Xi % Oj ^
V
= {Y: Y
T
Vc. = 0 Vi}
>\>
"u O, 1
(iv)
The
second-order sufficient conditions
for X* to be an
isolated
"c
local minimum
are
that
in
addition
to [5]
Y
T
V
2
L(X*,
X*) Y > 0 V
nonzero
Y e V [7]
^ O> "\j <\, ^
(v)
Strict complementarity holds
at X*,
i.e.
, X.* ^ 0 Vi
it 1-
3.2.2
Powell
-
Hestenes Augmented Penalty Function
Powell
(P3) suggested
the
following penalty function
to
solve
P2.
,
m
2
<Kx,
e, s) =
f(x)
+± z
o.(c.(x)
- e.)
*\i
<\i r\j 2. -I 1 1 O> !
=
f(X)
+ \
(c(X)
-
6)
T
S (c(X)
- 6) [8]
where
9 e E
m
,
C(X)
is a
vector
of
constraint functions
c . (X) i = 1, 2,
...,
m
o>
'Vi 'v ' i *\<
and
S is a
diagonal matrix
of
order
m
with,
diagonal elements
a. > 0. Let
a e E be a
vector with
a. as
components. While
the
classical penalty
/\,
"*•*"
i
function
for P2
contains
at
most
m
control parameters,
the
above function
depends
on 2m
parameters which
are the
components
of 6 and a. The
main
f\j
i\,
difference between
classical
quadratic penalty function
and [8] is the
introduction
of
parameters
9. In [8]
quadratic penalty terms have been
<Vi
added
to the
Lagrangian associated with
P2.
The
augmented penalty function
(j>
is
used
in the
algorithm
as
follows.
17
Algorithm
Al:
(i)
Select 9
(1)
=0.,
k = 0,
S
(1)
= I and
X
(0)
i F.
o> >\j
(ii)
k = k + 1
(iii)
Minimize
<J>(X,
9
(k)
,
S
(k)
)
to
find
=
X(6 , S )
starting
the
unconstrained
*\i
^
minimization
from
X . .
>\,
(k)
(iv)
If C(X ) is
converging sufficiently rapidly
to
zero then
e
(k+l?
=
6
(k)
'Vl
^
=
s
(k)
and
return
to
otherwise
(i/io)
e
(k)
^
1Q
g
(k)
and
return
to
•In
step
(ii)
<|>
is
minimized with respect
to X
without constraints
for
fixed
<v/
(k) (k)
values
of 6 and S and
this
is the
inner iteration
of the
algorithm.
^
(k) (k)
Step
(iii)
is the
outer iteration
in
which
6 and S are
changed
to
%
(k)
force
constraint satisfaction
and
cause
the
sequence
of
solutions
{X }
^
to
converge
to X* at a
reasonably fast rate.
^
The
scheme
for
adjusting
0
parameters
in the
outer iteration
is
(k)
based
on the
observation
that
if
X
vr
"'
is the
minimizer
of
<j>(X,
9
V
~' , S
*v/
a. ^
(k)
in
the
inner
iteration, then
X is
also
a
solution
of the
problem
Minimize
f(X)
X e L <= E
n
Subject
to
C(X)
=
In
order
to
solve
the
equality constrained problem
P2 it is
sufficient
to
find
6
(k)
and
S
(k)
such
that
X
(k)
=
X(9
(k)
,
S
(k)
) solves
the
system
of
'Vi
f\, *\i 1*
nonlinear equations
18
C(X(6
(k)
,
S)) - 0. [9]
The
above system
of
equations
is in
terms
of 2m
parameters
8 and a
i
i
(k) (k)
i = 1, 2, . . . , m. One
vector
of
parameters
0 or. a may be
fixed
and
[9]
then
is a*
system
of m
equations
in m
remaining parameters.
(k)
If
6 is
fixed, then
[8]
reduces
to a
basic penalty transformation.
f\j
Specifically when
0
parameters
are set to
zero,
$
becomes
the
classical
<\j
quadratic
penalty function.
In
such
a
case convergence
of the
sequence
(k)
{X } to X* is
ensured
by
letting
a.
-*
°°, i = 1, 2 . . . , ra.
This leads
a-
'v i
to
numerical instabilities
and
slow convergence. Therefore
in
Powell's
(k) (k)
method
S is
held constant
and 0 is
changed
to
force constraint
<\,
satisfaction through iterative solution
of
[9].
Powell
(P2)
derived
a
(k) (k)
simple correction
for
adjusting
0
parameters when
S is
fixed
by
'V
•applying
generalized Newton iteration
to
solve
[9].
This correction
is
represented
as
(k)
'
(k)
(k) (k)
By
definition
X is the
unconstrained minimizer
of
<j>(X,
6 , S ).
Oj "\> O;
Therefore
V4>(X
(k)
,
0
(k)
,
S
(k)
)
= 0,
i.e.,
Vf(X
(k)
)
+ I
a.(C.(X
(k)
)
-
0.
(k)
)
VC.(X
(k)
)
= 0.
[11]
Continuity
of
C(X)
in the
neighborhood
of the
local minimizer
X* of P2
<X>
'Xi 'Vi
(\r\ df-V
implies
that
the
matrix
N(X
V
') is of
full rank
for X
v
'
sufficiently close
/U\
to
X*.
When
X
v
' is in the
neighborhood
of X* and
when
X
v
'
->-
X* the
estimates
X
(k)
=-S
(k)
(C(X
(k)
)
-0)
[12]
i\,
19
exist
and
have
as
limit points
the
unique values
X* = S* 9*,
where
9* and S*
are the
parameters corresponding
to X*.
Hence
the
final value
of the
product
S 9, in the
limiting sense,
is a
constant
and may
considered
to be
independent
of S
when
S is
fixed
and 9 is
adjusted
to let 9
-
6*. Due to
<\, "" <\,
this
reason, when
S is
increased
in
step (iii)
of
algorithm
Al to
improve
the
rate
of
convergence
of the
sequence
{max|c.(X
. )|) to 0 and {X } to
,
i ^
X*, 6 is
decreased
to
keep
S 9 a
constant.
Convergence
of the
algorithm
is
measured using
the
sequence
{max|c.(X
)|),
Under
the
assumptions
in
3.2.1
and
when
the
Hessian matrix
of
<}>
is
positive
definite
at X*,
Powell
(P2) proved
that
the
rate
of
.convergence
is
linear
and
the
convergence ratio depends
on
I/a.
for a. > a
1
. The
threshold value
a' is a
large
but
finite positive real number. Therefore
by
choosing
S to
be
large
so
that
S is
close
to S',
where
S
1
=
a'I,
the
algorithm
can be
made
to
have linear convergence
at any
arbitrary rate. Superlinear convergence
is
achieved when
a. -> °°. In
Powell's
algorithm
the
rate
of
convergence
is
taken
to be
satisfactory when
the
maximum residual,
max|c.(X
)|, of the
system
of
equations
[9] is
reduced
by a
factor
of
four
on
each iteration.
The
reason
for
preferring
the
slower rate
of
convergence implied
by the use
of
factor four
is
that faster convergence tends
to
make
the
inner iterations
(k)
more difficult (P2).
When
the
sequence
{max|c.(X
)|)
either fails
to
converge
or
converges
to
zero
at too
slow
a
rate,
S is
increased
by a
factor
of
ten.
The
choice
of
factor
ten to
increase
S is
arbitrary.
Numerical evidence indicates
that
the
value
of a is
seldom required
to
2
be
greater than
10 to
ensure rapid convergence
(Rll).
The
Hessian matrix
of 4>
depends
on
both
9 and S. The
change
in
this
matrix
is
dominated
by the
increase
in S
(P2). This
is
another reason
20
for
using
a
factor
of ten to
increase
S
when
the
rate
of
convergence
is
slow
and
keeping
S
constant when rate
of
convergence
is
satisfactory.
If
S is
chosen
to be
large
in the
initial iteration, instead
of
gradually
increasing
S, the
Hessian
of
<|>
becomes ill-conditioned
and the
unconstrained
minimization
of
<J>
in the
inner iteration becomes very difficult
to
perform.
Further
for a
large
S, an
arbitrary starting point
X ' and
arbitrary values
'Vi
(k)
of
0
parameters,
the
sequence
(X } may not
converge
to X*.
Therefore
S
a, . r\, <^
(k) (k)
is
increased
so as to
force
X
into
a
region
in
which
sequence
{X }
^ <\,
locally converges
to X*.
Once this region
is
reached,
S is
kept constant
(k)
and
6
parameters
are
adjusted
so as to let X
X*.
Further
the
gradual
<\,
<x, %
increase
of S is
designed
to
make
<f>
continuous
and
continuously
dif
ferentiable
with respect
to X for all
values
of the
parameters.
In
Powell's
algorithm
the
minimizations
in the
inner iteration
are not
beset
by
computational
difficulties associated with
the
basic penalty function transformations.
The
minimizations
are
well
scaled
and
progressively less computational
(k)
effort
is
required
as k
increases
and X
->
X*.
1j "Vi
Hestenes (H2)
,
independently
of
Powell
and at
about
the
same time,
proposed
a
similar method
for
solving
P2 and he
called
it the
method
of
multipliers.
The
method
is
based
on the
observation
that
if X* is a
^
nonsingular minimum
of P2,
there exists
a
multiplier vector
X* and a
constant
Oj
a
such that
X*
affords
an
unconstrained local minimum
to the
function
T(X,
X*, S) =
f(X)
- X*
T
C(X)
+ 1/2
(C(X))
T
SC(X)
.
[13]
'V.'V.
O>'X<f\,'Vi
Of'X.'V'Xi
where
S = ol.
Conversely,
if
C(X*)
= 0 and X*
affords
a
minimum
to
[13],
^ 'Vi 'Vi
then
X*
affords
a
minimum
to P2. In the
method
of
multipliers
a
large
21
positive
constant
a is
suitably chosen
and is
held fast.
The
augmented
penalty function considered
is
T(X,.X,
S) =
f(X)
- X
T
C(X)
+ 1/2
(C(X))
T
SC(X) [14]
i
r r i
where
X e B and 8 is an
arbitrary compact subset
of E . The
function
in
/
v>
[14]
is
sequentially minimized
for
successive estimates
X of the
unique
a. .
Lagrange multiplier vector
X* at X*.
"Xi
^i
(k) (k)
The
unconstrained minimization
of
T(X,
X , S) for an
estimate
X
r\,
<\, %
(k) (k)
of
X* is the
inner iteration.
Let X = X(X , S) be an
unconstrained
^ -v i,
(k) (k)
minimizer
of
T(X,
X , S) . In the
outer iteration
the
estimate
X is
a.
a/ ^
(k)
updated
so as to
cause
X -> X*.
Hestenes suggested
the
following
formula
'V
O;
(k)
for
adjusting
the
multiplier vector
X
a.
where
S = o I, 0 < a <. a, a = yo and 0 < y < 1- The
relation
(k)
[15]
is
derived from
the
observation
that
X is a
local minimizer
of
-v
(k)
(k+1)
T(X,
X , S) and X is
chosen
so
that first order necessary conditions
^ <\, ^ .
(k)
are
satisfied
at X for P2.
Hestenes
(H2)
did not
analyze
the
convergence
of
the
method,
but
subsequently (H4) established that
the
method converges
linearly
and
superlinear convergence
may be
achieved when
a
°°. In
practical
applications very fast linear convergence occurs
for a
large
but
finite value
of o.
Convergence
is
induced
by not
only
a
large value
of
a but
also
by
multiplier iteration [15] (F8)
.
In
Powell's
method when
S is
fixed
and 6
parameters
are
adjusted
to
'V
(k)
let
X
->-
X*, the
unique Lagrange multiplier vector
X* = S 6*,
where
6*
22
corresponds
to the
vector
of
parameters
at X*.
This implies
that
a
'Xi
connection
can be
established between
the
augmented function
<f>
in [8] and
T in
[14]
using
the
relationship
i
V
i =
1> 2>
'"'
m
'
From
[8], [14]
and
[16],
1
m
2
<KX,
0, S) =
T(X,
X, S) +± Z \
.la.. [17].
r\j
<\, >\j f\j L
._•]
J- 1
The
difference between
<f>
and T is
independent
of X. If
X(6,
S) and
X(X,
S)
'X-
<V; O. 'X/ O>
are
unconstrained
minimizers
of
<j>
and T
respectively
for any S and if 6 and
a.
X
are
related
as in
[16],
then
X(6,
S) =
X(X,
S).
Therefore
the
iterative
<\,
fx, Oi *\J i\,
-methods
suggested
by
Powell
and
Hestenes
for
changing
9 and X
parameters
'Vi
'Xi
are the
same.
In
view
of the
equivalence relationship
[17]
between
cf>
and T, the
numerical algorithm
Al is
discussed
in
terms
of the
augmented penalty
function
T. - In the
outer iteration adjustment
of X
parameters using
[15]
%
is
considered, assuming
that
6 and X are
related
by
[16].
The
algorithm
f\, 'Xi
Al is
discussed
and
analyzed using
X
parameters
to
emphasize
the
primal-dual
<\j
(k)
nature
of the
method which iterates with
an
approximation
X to the
'V,
(k)
Lagrange multipliers
X* in
such
a way as to
make
X •* \*.
'Xi
^
The
algorithm
Al is now
modified
and
denoted
as the
Powell-Hestenes
augmented
penalty function algorithm
A2. The
convergence
of the
algorithm
is
measured
in
terms
of B =
max|c.(X)|.
23
Algorithm
A2:
(i)
Select
X
(1)
=
X
(0)
, S
(1)
=
S
(0)
,
k = 0,
arbitrary starting
point
X
(0)
and
B
(1)
= B_
where
B
0
>
max!c.(X
(0)
)I.
'v.
0 0 = . ' i
<\,
'
i
(ii)
k = k + 1
(iii)
Minimize
T(X,
X
(k)
, S
(k)
)
to
find
X
(k)
=
X(X
(k)
,
S
(k)
)
(k-1)
starting
the
unconstrained minimization from
X
a.
(iv) Find
V = {i:
|c
±
(X
(k)
)|
>
B
(k)
/4}.
If
max|c
(X
(k)
)|
>
B
(k)
,
set
B
(k+1)
-
B
(k)
.
Go to
(vii).
1 *\; ^
1
(v)
B
(k+1)
=
max|c
i
(X
(k)
)|.
If
B
(k+1)
< E
stop.
The E is a
i
specified
tolerance
for B.
(vi)
If
B
(k+1
>
<
B
(k
V4
or
X
(k)
= X
0
^
*\»
^
set
X -
A<
k
>
-
S
(k)
C(X
(k)
)
,
go to
(ii).
(vii)
Set
X
(k+1
>
=
X
(k)
Ck+11
Ck} ^
yixi^j.y
i f*.
VR./ TT'
TI
o.
- 10 o Vi e V,
i i
go
to
(ii).
When second order sufficiency conditions hold good
at X* for P2,
there exists
Oi
a a
1
> 0
such
that
for o. > a
1
Vi, the
Hessian matrix
of
both
(j>(X,
6*, S)
and
T(X,
X*, S) at X* is
positive definite
and X* is a
strong local minimum
of
<j.(X,
6*, S) and
T(X,
X*, S)
(B2), (B7),
(F8),
(H2)
. It
should
be
noted
'V
^ 'V/
that
the
local convexity
of
<j>(X,
9*, S) and
T(X,
X*, S)
near
X* is
established
without
any
assumptions about
the
convexity
of
problem
P2. The aim of the
(k) (k)
algorithm
A2 is to
.keep
S
constant
and
adjust
X so as to
cause
X -* X*.
f\j *V *X»
24
Therefore
in
subsequent discussions
it is
assumed
that
a. > a
1
Vi
have
been
chosen
and
held fast
so
that
(j>
and T are
locally convex.
Due to
this reason
the
explicit dependence
of X on S is
dropped
and
X(X,
S) is
represented
as
f\j
^ 'V
X(X).
%
<\,
Haarhoff
and
Buys (HI) proposed
a
numerical algorithm very similar
to
the
Powell-Hestenes method. They were motivated
by the
following observations
about
the
traditional quadratic penalty function approach
to
solve
P2. Let
the
quadratic penalty
function
for P2 be
m
2
P(X,
a) =
f(X)
+ a Z
(c.(X))
, a > 0.
Let
X(a)
be an
unconstrained
minimizer
of
P(X,
a) for a
large value
of
'o r\j
control
parameter
a and X* be a
local minimizer
of P2. The
gradient
of
<v
P(X,
a) is
zero
at
X(a)
but the
gradient
at X* is
Vf(X*).
Therefore,
in the
'Y/
f\, . 'V . i\, O.
usual case when
Vf(X*)
is
nonzero, X(a)
and X*
have
to be
different.
Let X
'V
^ ^ ^ ^
be
a
solution
to the
under-determined system
of
equations C(X)
=0. At X
a. <\j a.
the
gradient
of
P(X,
a) is
Vf(X) which
is
generally,
not
zero. Therefore
X
>\j
Oi "Xi ^J
and
X(a)
are
different
and for any
finite value
of a,
X(o)
is
neither
a
"u
. ^
solution
to P2 nor
satisfies C(X)
= 0.
Usually X(o) tends
to X*
when
a -> °°
a/ 'v/ *v> a.
(L5), (Zl). From these observations Haarhoff
and
Buys added
a
linear
combination
of
constraints
to
P(X,
a) to
obtain
a.
T(X,
X, S) - f (X) - A
T
C(X)
+ ^
(C(X))
T
SC(X),
S = ol
"
where
X e E and o > 0.
This function achieved their objective, i.e.,
a.
balanced
the
gradient
of
f(X)
in the
vicinity
of the
minimum
by a
linear
^
combination
of
gradients
of
constraint functions C(X).
r\, <\j
25
The
augmented penalty function proposed
by
Haarhoff
and
Buys
is .
identical
to the
Powell-Hestenes
augmented penalty function
for P2.
However
.
the
numerical algorithm
of
Haarhoff
and
Buys
has
some distinct features.
They noted
that
the
multiplier updates
[15]
are
valid only
when
the
function
(k)
(k)
T(X,
X , S) is
minimized exactly
for
each
X and
that
it is
better
to
*V *\j 'X*
(k)
terminate
the
inner iterations when
a
better value
of
T(X,
X , S) is
obtained. They suggested that
the
multipliers.in
the
outer iteration
be
obtained
from
the
first order necessary condition.
Vf(X
(k)
)
=
N(X
(k)
)
X, X e E
m
. .
[18]
The
condition
[18]
represents
an
over-determined system
of n
equations
in m
(k)
parameters.
Taking
the
scalar
product
of
[18]
with each VC.(X
), the
<\,
i a,
following
system
of
equations
is
obtained.
N
T
(X
(k)
)
Vf(X
(k)
)
=
N
T
(X
(k)
) N(X
(k)
)
X, X e E
m
.
[19]
^ <\, ^ ^ a. a. a.
The
expression
in
[19]
represents
a
system
of m
equations
in m
parameters
X
that
may
easily
be
solved
for X.
This,
in
effect,
is a
least squares'
'V
^
solution
to
[18].
The
vector
of
multipliers
X is an
estimate
of the
unique
a/
Lagrange multiplier vector
X* at X* and X
tends
to A*.
% >\j -V ^
Haarhoff
and
Buys were more concerned with computational considerations
than
with convergence
or
duality aspects
of the
algorithm.
They
suggested
that
the
problem functions
be
scaled
so
that
the
gradients
are of the
same
magnitude
and o. be on the
order
of
ten.
In
this algorithm
a. i = 1, 2,
...,
m
are
kept constant
and in the
inner iteration
the
variable metric method
of
26
fir)
Davidon-Fletcher-Powell (DFP)
is
used
to
minimize
T(X,
X ., S). The
^ ^
2-1
approximation
to [V T] is
updated using
the DFP
update formula
(Mil).
A
restoration
step
is
included
in the
inner
iteration
and in
this step
T
is
minimized without using derivatives
in a
direction
that
leads
to the
satisfaction
of
linearized constraints. Other numerical aspects
of the
algorithm,
such
as the
various stopping criteria
for
inner
and
outer
2
-1
iterations
and
updating
the
approximation
to
inverse Hessian
[V T] are
discussed
in
reference (HI).
The
elegant idea
of
local convexification
of the
Lagrangian
was
first
introduced
by
Arrow
and
Solow (Al). They suggested addition
of
quadratic
penalty
terms
to the
classical Lagrangian
to
arrive
at a
modified Lagrangian
that
was
locally convex. They were motivated
by
adaptation
of the
differential gradient scheme, developed
by
Arrow,
et al.
(Al)
for
approaching
saddle points
of
convex programs,
to
nonconvex programs. Their
differential
gradient method
is a
small step-size algorithm while those
of
Hestenes, Powell
and
Haarhoff
and
Buys
are
large step-size methods.
In the
above contributions
to the
augmented penalty function technique
duality
concepts were
not
employed. Primal-dual interpretation
of the
technique
was
analyzed
by
Buys (B7), Luenberger (L5), Rockafellar (R12)
and
Bertsekas (B2), (B3).
A
detailed review
of the
duality results
may
be
found
in
reference (F8).
The
duality results
are
summarized briefly
in the
next section.
3.3
Review
of the
Technique
for a
Constrained Problem with Equalities
and
Inequalities
3.3.1 Constrained Problem
The
problem
P3
with equality
and
inequality constraints
is
represented
as
27
P3:
Minimize
f(X),
'X e L £: E"
Subject
to
c.(X)
= 0 i = 1, 2,
...,
k
i
%
c.(X)
> 0 i =
k+1, ...,
m, 0 < k < n.
i <\, - = =
The
real valued functions
f(X)
and
c.(X)
Vi are
defined
on a
nonempty open
a,
x
>
set
L£ E . Let X* be a
local optimal solution
to P3. The
problem functions
(2)
are
of
class
C on L and
specifically
in an
open neighborhood
of X*. The
Lagrangian associated with
P3 is
rp
m
L(X,
A) =
f(X)
- A
C(X),
A e E , X e L £; E .
[20]
The
following conditions
are
assumed
to
hold good
at X*
(Fl), (M13).
(1) X* is a
regular point
of the
feasible region
a.
F
= (X: C. (X) = 0, 1 < i < k and
C.(X)
> 0, k < i < m}
Let
E = {i: 1 < i < k}
I = {i:
C.(X*)
=0, k < i < m}. The X* is a
regular point
i
<\,
^
of
F
when
{VC.(X*)}
i e E £ I is a
linearly independent
set,
i\,
1 <\,
(2)
There exists
an
unique Lagrange multiplier vector
A* e E
such
that
the
Kuhn-Tucker
conditions
are
satisfied
at
(X*,
A*)
C.(X*)
= 0 i e E
C.(X*)
> 0 A.* > 0, A.*
C.(X*)
=0 i £ e
[21]
1 *\y '1 3. 'Xj
VL(X*,
A*) = 0
'X/
f\j r\j
These
are
first-order necessary conditions
for
local optimality
at
X* and (X* A*) e E°
satisfying
[21]
is
termed
a
Kuhn-
'X,
'X. 'X.
Tucker point.
28
(3)
Second-order necessary conditions
for
local optimality
of X*
a.
are
that
in
addition
to
[21]
Y
T
V
2
L(X*.
A*) Y > 0 VY e /*
SrE
n
.
[22]
~
where
V*
= {Y: Y
T
VC.(X*)
= 0, i e E S I* and
'v
'v ^, i <\,
Y
T
VC.(X*)
> 0, i e
I-I*},
-
I is the
index
set of
active inequalities,
I* is the
index
set
of
strongly active inequalities
and I - I* is the
index
set of
weakley active constraints. However
the
following weaker
second-order necessary condition
is
usually assumed instead
of
[22] (Fl),
(Mil).
Y
T
V
2
L(X*,
A*) Y > 0 V Y e. E
n
[23]
%
%
'v a/ - %
V
= {Y: Y
T
VC.
(X*)
= 0, i e E
<~l}.
i\j
i\, r\j 1 'X,
(4)
Strict complementarity holds
at
(X*,
A*)
when
a. 'x.
A.*
j 0 for
each
1 < i < m for
which
C.(X*)
= 0.
[24]
1 = 1 'V,
A
weaker
form
of
[24]
is
A.*
> 0 and
C.(X*)
= 0, i e I.
[25]
i i -x,
(5)
Second-order sufficient conditions
for X* to be an
isolated
>\,
local minimum
are
that
in
addition
to
[21]
and
[23]
Y
T
V
2
(X*,
A*) Y > 0 V
nonzero
Ye/*. [26]
29
However
the
condition
[26]
is
usually replaced
by the
verifiable
condition
(Mil),
.Y
T
V
2
L(X*,
X*) Y > 0 V
nonzero
Y e V.
[27]
3.3.2 Powell
-
Hestenes
-
Rockafellar Penalty Function
The
augmented Lagrangian penalty function
for P3 is
obtained
by
combining
the
Powell-Hestenes penalty function
T and the
Rockafellar penalty
function
T. The
combined function
may be
represented
as
T
DU1
,(X,
X, a) =
f(X)
- E [X,
C.(X)
- ^ °
C,
2
(X)]
+
ieE
--•*,'--•*.
[28]
\ E [a (C (X) -)
2
- X
2
/a ]
2
where
X. X.
(C.(X)
-) = min
[(C.CX)
- —), 0]
i o, a. la. a.
,-m
c
m
a e c,, , X e E .
"~"
In
[28],
the
factor
X./a.
represents
a
penalizing threshold
for the ith
inequality constraint.
The
multipliers
X. Vi are
unconstrained
and
this
is
an
useful property
of the
augmented penalty function
T .
Further
the
r
HK
function
T
possesses
a
number
of
strong properties
not
exhibited
by the
PHR
classical Lagrangian
L(X,
X). The
following properties
of T
D
make
it
r.
f\j r UK
ideal
for use in. a
primal-dual algorithm
for
solving
P3.
Let
M(X)
be the
index
set of the
inequalities
that
contribute
to
the
quadratic penalty term
in T for an
estimate,
X, of the
Lagrange
Jr
UK f\j
30
multiplier
vector
A*.
'V
•-M(X)
= {i: i i E,
C
±
(X)
< A
/oi}. [29]
Equivalently,,
M(A)
= {i: ± \. E, A. - a.
C.(X)
> 0}. .
[30]
•\»
1 1 1
*\/
At
the
local optimum (X*,
A*) of P3,
M(A*)
is the
index set,
I*, of the
strongly active inequalities.
By the
strict
complementarity assumption,
I = I* and
therefore M(A*) represents
the
active inequality constraints
at
the
local
optimum
(X*, A*). Further
the set
EUM(A)
represents
the
*\»
*\t ^
inactive inequality constraints
at the
intermediate approximation
(X, A) to
the
solution (X*, A*).
Let L = E
LTM(A).
Then,
'v <x, a.
L = {i: i i E.
C.(X)
>
A./a
}.
[31]
1^ 11
Equivalently,
L
= {i: i i E, A. - a.
C.(X)
> 0}.
[32]
•vl
1 1
<X/
Using
the
above results
the
augmented penalty function
T may be
FHK.
represented
as
follows.
r_
UD
(X, A, a) =
f(X)
- E (A.
-~a.
C.(X))
C.(X)
FHK
'V-'X.'X.
-X.
j,T7
vr/->\
X
/ll'X,
1
ieE
M(A)
[33]
1
v
-. 2,
- Z A.
/a..
2 .
T
i i
31
The
representation
of T in
[33]
clearly illustrates
that
it is
obtained
rHK
by
combining
the
Powell-Hestenes penalty function
T and the
Rockafellar
penalty
function
T.
Mangasarian
(M2)
associated
a
wide class
of
Lagrangians with
the
nonconvex program
P3. The
unconstrained stationary points
and
local saddle-
points
of
each Lagrangian were shown
to be
related
to the
Kuhn-Tucker
points
or
local
or
global
solutions
of P3
(M2),
The
Lagrangians considered
by
Mangasarian
(M2)
were
twice differentiable globally.
The
augmented penalty
function
T
belongs
to the
general
class
of
Lagrangians investigated
by
rnK
Mangasarian
(M2)
.
However
the
penalty function
T^.,-,
is
twice continuously
rHK
differentiable
in X
except
at
points where
X. - a.
c.(X)
=0, i e
M(X).
ft 1 1 1
*"\j
f\j
By
the
strict complementarity condition,
X. - a.
c.(X*)
^ 0 for i e
H(X*~)
,
1 X 1 'X/ 'V
i.e.,
i e I.
Therfore
T is
twice continuously differentiable
in an
open
rHK
neighborhood
about
(X*, X*).
Oj Oj
Mangasarian
(M2)
established
the
properties
of the
general class
of
Lagrangians
for P3. As T is a
member
of
this class
of
Lagrangians,
PHR
the
following properties hold
good
for
T_.._.
(M2)
.
.These properties
of T
rnK rHK
also were established
by
Rockafellar
(R6).
For a e E ,
(X*,
X*) is a
f
\j * * *\;
f
\j
Kuhn-Tucker point
of P3 if and
only
if it is a
stationary point
of T.
rHK
2
For
large
but
finite
a., V T_, is
positive definite
(M2), (A3)
and
1 rHK
T
(X*,
X, o) < T
(X*,
X*, a) < T (X, X*, a)
rnK f\j r\j r\j
rni\.
r\j *\, ^v»
rntv
r\j f\,
.V
X e E , X e A
where
A is
some open neighborhood
of X*.
Conversely,
if
(X*,
X*) is a
saddle-point satisfying
[34],
then
X* is a
solution
of P3
for X e A.
[34]
32
A
duality theory
in
terms
of
extended
Lagrangians
was
presented
by
Mangasarian
(M2).
The' augmented dual problem
may be
represented
as
D3:
Maximize
g(X,
a).
a.
g(X,
a) = inf T (X, X, a)
f\j
f\j V /
tllx
i\f r\j
[35]
The
augmented dual function
g(X,
a) is
concave
in (X, a) and is
strictly
nondecreasing
in a. If the
point (X*,.
X*)
satisfies
the
optimality conditions
and
if a is
sufficiently large, then
(X*,
X*) is an
isolated local maximum
of
D3.
Conversely,
if
(X*,
X*) is a
global
or
local solution
of D3,
then
the
optimality conditions
for P3 are
satisfied
at
(X*,
X*)
Let
X(X)
=
X(X,
a) be an
unconstrained minimizer
of T (X, X, a) for
^
r
o 'V 'Y/ *\»
Jrrlix
'X; Oj' 'x/
.X
in an
open neighborhood
of X*.
Then
the
dual function
at
this point
may
be
expressed
as
g(X,
a) = T (X
(X),
X, a) = T
(X).
<\,
r\,
rtiK
'x, % r\j a> rnK <\,
Useful
duality results
for
multiplier iterations
may be
summarized
as
follows
(F8).
f
^
-C
±
(X(X))
i e E
-min(C
i
(X
(X)),
XVo^)
i £ E
[36]
9X.
[37]
Let
N be a
matrix
with
Vc. (X) , i e
E1TM(X)
as
columns
and G be the
Hessian
^ i
<\,
%
oE
T
pRR
.
Then
_ .
PHR
T
-1
-N G N
0
0
-s-
1
i
e
EVM(X)
i
e
EVM(X)
[38]
33
Because
X* =
X(X*,
a) for
large
a, the
optimality conditions
and the
a.
a. ^ ^
expressions [37]
and
[38] imply
that
T is
concave
in A for X
close
to
PHR r\, r^
X*
is a
strong unconstrained maximizer
of
?_..,_,.
% rnK
The
above results indicate that
the
problem
P3 may be
solved
by
*
locating
a
saddle-point
of T
„„_,.
The
saddle-point theory
and
local duality
rnK
results suggest
a
primal-dual algorithm
for
solving
P3. The
algorithm
consists
of
inner
and
outer iterations
and is
similar
to the
algorithm
A2.
(kl fkl (kl
In
the
inner
iteration,
k, for
fixed
X^ ' and o
v
,
T_,
UD
(X, X
v
, <T ') is
% r\j rHR r\j >\, %
minimized with respect
to X
starting
the
unconstrained minimization from
<\,
X . The
initial starting point
X
need
not be
feasible
and may be
'V 'V
(k) (k) (k)
chosen arbitrarily.
Let X = X(X , a ) be the
unconstrained minimizer
'Vi
o> 'V "\J
(k)
(k) (k)
of
T (X, X , a ). In the
outer iteration
a is
increased
so as to
'
'
force
(X
v
, X
v y
)
into
a
region about
(X*,
X*) and X^
J
is
adjusted
so
o>
% ^ f\, %
fk">
^kl
as
to
ensure
X
v
' -> X* and X
v
' -+ X*.
"Vi
'V/ *\/ <\,
The
duality relationships
[37]
and
[38]
suggest gradient
and
Newton
steps
for
adjusting
X in the
outer iteration
so as to
maximize
the
dual
.0.
function;
Mangasarian
(M2)
analyzed
the
method
of
multipliers with
a
gradient step
for
adjusting
X in the
outer iteration.
=
x
(k)
+
gvx
M
[39]
'U
r
VA/
rnK <\j
He
established
the
linear convergence
of
this algorithm with exact
minimizations
in the
inner iteration
and a
large
but
finite
a. He
also
%
investigated
the
relation between
3 and the
speed
of
convergence
of the .
method.
The
convergence
and
duality
analyses
presented
by
Rockafellar
(R6)
also
are
valid
for the
primal-dual algorithm
for
P3..
Rockafellar
(R6)
established
the
convergence
of the
algorithm with inexact minimizations
in
the
inner iteration. Pierre
and
Lowe (P2) comprehensively reviewed
the
technique
for P3 and
presented
a
numerical algorithm,
test
problems
and
computational
results.
In
this implementation
of the
ALAG penalty function
technique
in a
numerical algorithm,
a
simple gradient
step
for
adjusting
X
was
used
in the
outer iteration.
x
(k)
+
svx
(X). [40]
f\,
*\i
rnK.
^
The
penalty parameters
a.
were monotonically increased
in the
outer iteration.
The
linear constraints were also' included
in the
penalty term.
A
constraint
with upper
and
lower bounds
was
treated
as two
separate constraints. This
approach
introduces
two
dual variables
for
such
a
constraint.
Fletcher
(F8) suggested second-order
X
iteration updates.
He
also
^
devised
a
Newton-like iteration
for
updating
X
using estimates
of G in
[38].
a.
In
the
numerical experiments, Fletcher (F8) used
a
quasi-Newton
method
for
unconstrained minimization
of
T_
11T)
and
built-up estimate
of G. The
change
JrHK
in
G was
accounted
for
when
a was
changed.
The
computational results
•Xi
presented
by
Fletcher (F8) indicate
that
the
Newton-like algorithm
for
updating
X is
more efficient that
the
gradient step
for
adjusting
X. In
^ ^
these
numerical experiments
the
penalty constants
a.
were
also
adjusted
(F8). Fletcher (F8) showed that this scheme
for
adjusting
a.
never fails
to
induce convergence
of the
algorithm
and
avoids increasing
o. by an
arbitrary
factor
of 10.
Buys
and
Gonin (B9) performed sensitivity analysis with
the aid of
the
ALAG penalty function
T
D
.
Similar sensitivity results were developed
PHR
35
by
Armacost
and
Fiacco
(A3)
using augmented Lagrangian function
T
p
HR
-
In
these
analyses
the
following parametric mathematical programming problem
was
considered.
P3(a):
Minimize
f(X,
a), X e E
n
, a e E
v
[41]
Subject
to
c.(X,
a) = 0, i = 1, 2,
...,
k
^-
'X/ Oj
c.(X,
a) > 0, i =
k+1, k+2, ...,
m
1 f\/ f\j
0 < k < n
In
[41]
a is the
vector
of
sensitivity parameters.
In
these analyses,
the
problem
functions were assumed
to be
twice continuously differentiable
in
(X, a) in a
neighborhood
of
(X*,
a*) and for
some
a*, the
conditions
in
3.4.1
were assumed
to
hold
at
(X*,
a*,
X*).
The X* is the
vector
of
Lagrange multipliers associated with
a
solution
X* to P3
(a*).
36
IV.
NUMERICAL RESULTS
4.1. Introduction
Numerical experiments have been conducted
to
identify
the
most efficient
ALAG Penalty Function Technique
for
computer implementation. These numerical
exercises, include testing individual unconstrained optimizers
and
constrained
optimizers
utilizing
a
wide range
of
inequality
and
equality constrained
nonlinear
optimization problems. Phase
one of
these numerical experiments
involved testing
a
number
of
popular unconstrained optimization algorithms.
The
most effective
of
these algorithms were then incorporated into ALAG
Penalty Function routines
for the
solution
of
constrained optimization
problems.
4.2
Unconstrained Optimizing Algorithms
Two
different classes
of
algorithms
for
solving
the
unconstrained
optimization
problems have been
tested
on
several sample problems.
The
first
class
of
algorithms tested were those that
do not
require derivative
functions. These algorithms
make
use of
finite difference approximations
for
derivatives
or
work
solely
with
the
given problem function
in
seeking
an
optimum.
The
second
class
of
unconstrained optimizers require explicit
first
derivative functions.
The
unconstrained optimization techniques
are
identified
in the
following
table
and
discussed
in
(L5).
These
algorithms performance
on a
number
of
sample problems
is
described
in
Table
II.
Based
on the
results presented
in
Table
II and
computer
programming considerations algorithms
4, 5 and 7
were incorporated
into
computerized ALAG Penalty Function routines
and
tested
with
a
number
37
TABLE
I
UNCONSTRAINED OPTIMIZERS TESTED
Derivative Free Optimizers
1..
Hooke-Jeeves Pattern Search Algorithm
2.
Powell's
Algorithm
3.
Stewarts Adaptation
of the
Davidon-Fletcher-Powell
Algorithm
4.
Fletcher's Finite Difference Technique
for a
Complimentary Davidon-Fletcher-Powell Algorithm
First
Derivatives Required
5.
Complimentary Davidon-Fletcher-Powell Algorithm
6.
Davidon's Variance Algorithm
7.
Complimentary Davidon-Fletcher-Powell Algorithm
(with
no
line searches)
38
w
en
to
H
PM
O
§
O
H
O
O
M
H
w
o
H
H
O
<u
N
H
e
H
4J
P
O
13
<U
C
-H
4-1
05
ti
O
0
.5
^
vD
m
•^-
CO
CM
rH
0
5
o
P
CO
VD
rH
m
rH
*
m
o
rH
rH
CM
vo
rH
rH
rH
CM
x
\
rH
|
CM
X
•rH
X
CM
O
CM
in
o
-f.
CM
x
^
rH
X
rH
^_x
+
CM
CM
x-v
/"""\
r~H
CM
X
rH
+
X CM
1 CM
CM
X
X
v
-^
^_x OO
0 rH
O 00
rH
rH
."
+
x
to
rH
CO
rH
CO
rH
0
rH
00
^-
^
CO
CM
«^f
'
rH
rH
CM
x-^
CO
oo
rH
vO
CO
1
CM
X
rH
X
vO
rH
o
O
O
_i_
CM
CM
X
CM
rH
rH
0
o
II
X-N
to
CM
CO
rH
00
ON
OO
CO
^
CO
CM
O
rH
CM
rH
X
1
vO
vO
rH
Oi
^>
^^
^
CO
00
*^
in
rH
+ CM
CM
VD
x-v
O
rH
rH
X
0
rH
rH
CM
+ P>l
X
CM
+
s-^ CM
CM
rH
rH
X,
S^
^>
X
1 CM
CM
a\
X
0 rH
O 00
rH
t^
II
+
X
to
CO
CO
K
00
^c
>^-
^
CM
O
CO
0
CO
rH
CM
,*^
^f-
CM
rH
O
rH
1
CM
CM
X
CM
rH
&
ON
O
o
-*
_l_
CM
CO
X
_i.
CM
CM
X
_!_
CM
rH
X
n
P
PM
.
•vl"
CM
r- oo
oo
co in
OO
ON vO
CO
rH CM
-3-
00
CM
* m
CO
ON
m * VD
rH rH
00 00 rH
CO CM ON
rH rH
CO
K
* <r
rH
*^ O^ ' Cvl
m co in
CM
CM <}
m
CM
-f-
CM
x^
1
CM
rH
X
1
*~-s
ON
m
o
0
o
0
.1.
CM
CO
I-l
1
rH
x^
CM
-f X-N CM
CO
X-N
CO
X r-t
^ CM X
0 + 1
CM
CM H
1
X ^
CM CM
X
+ +
N
X 1
tJ
^.^
+
^ CO
rH
co
+ X
O CO CM
rH
X X
1
CM ^
rH
X O
X
rH 0
^ X rH
II II II
S^
!xl b^
^ ^ f"
.
in VD r^»
CM
CM
00
rH
\0
rH
CO
m
X
o*\
s^
1
\,Q
vO
CNJ
CN
x
^
^-X
ro
CNJ
X
1
I
rn
X
m
CM
VD
CM
x
-|-
.CM
/-^
^-^
CM
CM
X
1
i-H
N*-'
rH
X
1
m
CM
CM
-}-
CM
x-^
X-N
CM
X
1
rH
rH
X
|
m
rH
^
II
X
to
.
00
m
m
VD
CM
1
vO
CM
CM
CO
K
0
ON
rH
rH
ON
"""*
"^.^
st
X
1
1-1
X
^~s
o
rH
-f-
•*3"
X-N
CO
'
X
CM
1
CM
X
*~s
-j-
CN
s—\
•*$
X
1
CO
X
in
-}-
CM
CM
X
0
rH
+
rH
X
V
-
X
II
><
(^
f
ON
<J-
ON
^
CM
m
O
rH
ON
ON
ON
ON
ON
O
00
CM
-cr
.
CO
CM
CM
x-^
rH
1
^f
X
00
rH
X
~|-
•»^
/-N
/—
\
*<1"
X
1
CO
X
**
'
c
n]
4-1
v
^
+
vO
x~\
CO
X
1
CM
X
**-S
o
0
rH
•|?
/
s
CM
X
1
rH
X
(!)
^-^
II
X*
PM
0
rH
H
O
p.
I
H
U
0.
O
(1)
00
Vi
0)
c
o
a
C!
T)
39
of
inequality
and
equality constrained nonlinear optimization problems.
4.3
ALAG Constrained Optimizing Algorithms
The
selected ALAG routines were
tested
on
many
of the
example constrained
.
problems
presented
in
(B6). Table
III
summarizes
the
computational results
achieved
for
these example problems where
the
algorithms
tested were
1.
ALAG algorithm with unconstrained optimizer
5
(see Table
I).
2.
ALAG algorithm with unconstrained optimizer
7
(see Table
I).
3.
ALAG algorithm with unconstrained optimizer
4
(see
Table
I).
40
TABLE
III
COMPUTATIONAL RESULTS
FOR
NONLINEAR CONSTRAINED PROBLEMS
Problem (See Reference (B6))
12-1
12-3
12-5
12-8
12-10
12-14
12-15
12-17
12-18
12-23
-12-25
Number
of
Function
and
Gradient
Evaluations
Number
of
Unconstrained
Problems
Algorithm
1
44
19
42
57
42
27
21
80
147
32
172
2
33
24
62
49
31
74
30
120
193
45
174
3
167
96
166
99
72
121
118
*
*
122
326
1
3
2
3
3
2
2
2
5
7
3
7
2
3
3
3
4
,2
3
3
9
9
3
6
3
5
4
5
5
4
5
4
*
*
5
9
*Did
not
converge
to
correct
solution.
APPENDIX
A
MATHEMATICAL CONCEPTS
AND
PENALTY
FUNCTION TECHNIQUES
1.
Introduction
Symbols, mathematical terms
and
related concepts
are
defined
and
briefly
reviewed
in
this
section.
The
topics that
are
directly connected with this
work
are
alone
considered.
The
terms
and
definitions
are
those commonly used
in
standard books
on
Nonlinear Programming (H4), (H5), (L4), (L5), (Ml).
A
detailed
information about
the
following concepts
may be
found
in the
above
references.
2.
Euclidean
n-Dimensional
Space
In
this
work
real-valued
functions
on a set L in an
Euclidean space
E
are
considered.
By an
Euclidean space
E is
meant
a
linear space whose points
T
are
representable
by
n-tuples
X =
(x
1
,
x_,
...,
x ) . The
nonnegative
f\.
-L £ it
n
n
n
orthant
of E is
denoted
as E
+
and the
positive orthant
of E. is
denoted
n
as
£-)-.[_.
A
point
is
represented
as a
column vector
UAJjig
capital letters
with underscore
X, Y,
...,
or
lower case letters with underscore
a, b,
...,
t\i
r\j '\/ *\/
or
Greek letters with underscore
a, X,
....
The
components
of a
vector
are
real numbers represented
by
lower
case letters with subscript.
The set of
real numbers
is
denoted
as E. The
real numbers
in E are
represented
by
lower
case letters
a, b,
...,
and
Greek letters
a, 3,
..., without subscript
or
with subscript
a , a ,
...,
a , a ,
.... Superscript
in
parentheses
-Lt>
used
to
represent
an
element
of a
sequence
of
vectors
or
real numbers.
Subscript
-U>
also used
to
distinguish different vectors
X , X ,
....
A
linear space
E
n
is a set of
elements
X, Y,
..., called vectors,
*\»
'V
for
which
the
operations
of
addition
of
vectors
and
multiplication
of
vectors
by
scalars
a, b, ... are
defined
and the
Euclidean norm
of a
vector
is
defined
as
II
v
II
_ /, 2 2 .
21/2
Linearity
implies
that
if a c E, b e E, x e E and y e E ,
then
ax + by e E
'V
% 'X, <\,
A
subspace
L of E is a
subset
of E
such that
L is a
linear space with
the
n
same operations
as
those defined
in E and
with
the
same scalar field.
A
,-n
subspace
L of E is
also called
a
linear manifold.
3.
Sets
The
set F of
elements
X in E
satisfying
a
property P(X)
is
represented
as
F
= {X:
P(X)}.
a.
A
member
Y of the set F is
denoted
as y e F and if Y is not a
member
of F,
<x/
<v 'v
then
y ^ F. The
union
of two
sets
A and B in E is the set of
elements
that
a.
belong
to
either
A or B.
A
S-B = {X: X e A or X e B}.
a.
a. ^
The
intersection
of two
sets
A and B is the set of
elements that belong
to
both
A and 8.
AS=B
= {X: X e A and X e 8}.
'V;
^ r\j
If
every element
of A is
also
a
member
of 8,
then
A is a
proper subset
of
B,
i.e., A£:8.
If. A B,
then
A may be a
proper subset
of 8 or may be 8
itself.
The
complement
of a set A is
denoted
as ^ and it
consists
of
elements
not in A. If a e E and b e E,
etc.,
[a, b]
denotes
the set of
real numbers
a < x < b. . If x e (a, b]
then
a < x < b.
A
real-valued function
f (X)
defined
on a
subset
F of E is
represented
a/
as
f(X):
E
->
E. The
minimization
of
f(X) over
the set F is
represented
as
Minimize f(X)
X
e F ^
If
F is the
space
E ,
then
the
minimization
is
unconstrained. Otherwise
the
minimization
is
constrained.
4.
Linearly Independent
Set of
Vectors
A set of m
vectors
X.. , X X is
said
to be a set of
linearly
f
\j^-
'V^
'xJfi
independent vectors
if a
relation.of
the
form
a,X..
+
a
0
X
0
+ . .. + a X =0
1^1 2^2 rn^m
holds only
when
the
scalars
a.. , a ,
...»
a are all
zero.
The
vectors
are
linearly dependent
if
they
are not
linearly independent.
A set of n
linearly
independent
vectors
is a
basis
for E . The
dimension
of a
space
is the
number
of
vectors
in a
basis
for
that space.
Let a set of m
linearly
independent vectors
in E
n
define
a
subspace
B of E . The set of all
vectors
in E
which
are
orthogonal
to B is a
subspace called
the
orthogonal
complement
of B and is
denoted
by B . Any
vector
X e E may be
uniquely
"\>
represented
as X = Y + Z
where
Y e B and Z e B .
^b 'V *\t 'V* 'V
5.
Characterization
of
Neighborhood
of a
Point,
Sets
and
Sequences
5.1
Neighborhood
of a
Point
The
e-neighborhood
of a
point
1
X* in E is the set of
points
X
lying
in
the
open sphere
or
ball
of
radius
e > 0 and X*. The
e-neighborhood
of X* =
Oj 'V
{X: ||
x-X* ||«<
e}. In
general
it is not
necessary
to
restrict
a
neighborhood
of
a
point
to be an
e-neighborhood. Therefore
a
neighborhood
of a
point
X*
is
defined
as any
open
set
containing
X*.
5.2
Nature
of a
Point
X
With Respect
to a Set F in £
n
A
point
X is an
interior point
of F. if F
contains
an
e-neighborhood
of
X . A
point
X is an
accumulation point
or a
limit point
of F if
every
e-neighborhood
of X
contains
a
point
X ^ X
belonging
to F. A
limit point
of
F
need
not be in F. A
point
X is an
isolated point
of F if X is in F
but
is not a
limit point
of F. A
point
X is a
boundary point
of F if
every
e-neighborhood
of X
contains points
in F and
points
not in F. A
point
X
is an
exterior point
of F if it is
interior
to the
complement
of F.
5.3
Characterization
of a Set in
Terms
of the
Points
in it
A set F. in E is
open
if all of its
points
are
interior points.
Equivalently,
F is
open
if
given
X e F and 3 and e > 0 3 || Y-X || < e
/
\; f\j f\j
implies
Y e F. It is
closed
if it
contains
its
limit points. Equivalently,
"u
F
is
closed
if X e F and X
->
X
implies
X e F. The
closure
of any set F
in
E
n
is the
smallest closed
set
containing
F. The
boundary
of a set is
that
part
of the
closure
that
is not in the
interior.
A set F is
bounded
if
there exists
a
positive number
r
such that
|| X || < r for
every
X e F.
A
closed
and
bounded
set is
said
to be
compact.
A
neighborhood
of a set
F
is an
open
set V
containing
F. By an
e-neighborhood
of F is
meant
a set
of
points each
of
which
lies
in an
e-neighborhood
of
some point
X in F.
'V/
The
e-neighborhood
of F is the
union
of the
e-neighborhoods
of its
points.
If
A CL, E is a
bounded
set of
real numbers, then
the
smallest real
number
y
such that
x < y Vx e A is
called
the
least
upper bound
or
supremum
of
A and is
denoted
as
y
=
sup(x)
or y =
sup{x:
x e A},
x e A
Similarly,
the
greatest lower bound
or
infimum
y of a set A is
denoted
as
y.
=
inf(x)
or y =
inf{x:
x e A},
x e A
5.4
Characterization
of a
Sequence
/,\
°°
A
sequence
of
vectors
is
represented
as {X }, ' or as {X }
when
the
r
^ k=0 ^
(k)
index
set is
implicitly understood.
The
sequence
{X } is
said
to
converge
^
to
the
limit
X* if || X - X* ||
-*
0 as k
«.
Equivalently
, X* is the
limit
'V f\, Oi *\»
(k)
point
of the
sequence
{X } if for
every
e > 0
there
is an
integer
p
such
.
•X"
(k)
that
X is in the
e-neighborhood
of X*
whenever
k > p .
Each
of the
symbols
a.
^
>
x*",
"lira
X
(k)
=.X*"
and
lira
X
(k)
= X*.
(k) (k)
signifies
that
X* is the
limit
of the
sequence
{X
v
'}. If X
v
•*-
X* and
"V <\» »Vi Oi .
{Y
(k)
}
is a
subsequence
of
{X
(k)
},
then
Y
(k)
X*. A
sequence
{X
(k)
}
is a
•\,
i/ ^ 'v 'v/
Cauchy sequence
if
lim
X
-X =0.
A
sequence
{X '} in E
n
converges
if and
only
if it is a
Cauchy
sequence.
A
sequence
(X } is
bounded
if
there
is a
finite positive number
r
such
a.
that
|j X II < r for
every integer
k. A
point
X* is an
accumulation
point
f\j
^
(k)
or
.a
cluster point
of a
sequence
{X } if it is the
limit
of a
subsequence
of{X
(k)
>.
.
•x.
A set F in E is
closed
if and
only
if the
limits
of
convergent sequences
in
F are in F.
Every bounded sequence
{X^ } of
points
in E
n
possesses
a
a.
(k)
convergent subsequence.
Let {r } be a
bounded sequence
of
real numbers
and
I/ =
supfr
: i > k}. .
Then
{I/ }
converges
to a
real number
q*
called
the
limit superior
of {r } and I/* = lim
k-*»
5.4.1 Order
of
Convergence
of a
Sequence
(k)
Let
{r } be a
sequence
of.
real numbers converging
to r*. The
order
(k)
of
convergence
of {r } is
defined
as the
supremum
of
nonnegative numbers
p
satisfying
-
*
0 < lim
-
r*|p
This
definition
of the
order
of
convergence
is a
step-wise concept
as it
defines bounds
on the
progress made
in
moving from
kth
term
to
(k+l)th
term.
The
order
of
convergence
is
determined only
by the
properties
of the
sequence
when
k
->
°°. It is a
measure
of the
speed
of
convergence
of the
"tail"
of
(k)
the
sequence
{r }.' A
large value
of p
implies
a
high speed
of
convergence.
If the
sequence
has pth
order
of
convergence
and if
0
- lim ^ -
r
*'
then
asymptotically
_
r
*|
=
3|r
(k)
.-r*|
p
.
When
p= 2 the
sequence
has
second order convergence.
(k)
If
the
sequence
{r } has an
order
of
convergence equal
to
unity,
then.
it
is
said
to
converge
linearly
to r*. The
sequence converges
to r*
linearly
with convergence ratio
3 if
(k+1)
_ -
lim -L- ^-L = 3 < 1.
_
*
A
linearly
convergent sequence with convergence ratio
3 is
said
to
have
a
(k)
tail
that
converges
at
least
as
fast
as the
geometric sequence
{d3 } for
some constant
d.
Therefore
linear
convergence
is
sometimes referred
to as
geometric
convergence.
The
smaller
the
convergence ratio,
the
faster
is the
rate
of
convergence. When
p = 1 and
3=0,
the
rate
of
convergence
is
said
to
be
superlinear.
The
convergence
of any
order greater than unity
is
also
superlinear.
The
average convergence rates
may be
used
to
place bounds
on the
average
progress
per
step over
a
large number
of
steps. However
in
comparing
convergence
of
different sequences,
the
step-wise convergence rates
are
usually used. When
the
'sequences
are
well
behaved
and the
limits involved
in
the
definition
of
convergence rates exist
the
step-wise
and
average
convergence rates coincide. Additional information
on the
convergence
of
sequences
may be
found
in
(L5).
(k)
The
convergence properties
of a
sequence
of
vectors
(X } are
defined
with respect
to a
function
that
converts
the
sequence
of
vectors into
a
sequence
of
numbers.
If
f(X):
E
-*
E is
defined
on E , the
convergence
r\,
(k)
of
.{X } to X* can be
defined
in
terms
of the
convergence
of
f(X)
to
f(X*).'
a, >\, O. -\/
(k)
The
function
f(X)
used
in
this
way to
measure
the
convergence
of {X } is
^
^
called
the
error function.
In
optimization theory,
the
objective function
f(X)
or the
function
"u
j
||
X - X* || is
chosen
as the
error function
to
analyze
the
convergence
of
r\j
^ .
(k)
the
sequence
of
intermediate solutions
(X } to X*. The
order
of
convergence
^
^
of
a
sequence
is
insensitive
to the
particular error function used
and
hence
the
particular error function used
to
measure convergence
is not
really
very
important
(L5).
The
order
of
convergence
of a
sequence
is a
local convergence
property
and
is a
measure
of the
ultimate speed
of
convergence.
It is
generally used
to
determine
the
relative advantage
of one
algorithm
to
another.
The
global
convergence property
is
concerned with whether starting
at an
arbitrary
point
the
sequence generated
will
converge
to a
limit point
or a
solution.
6.
Matrix Notation
A
matrix with
m
rows
and n
.columns
is
denoted
as an mxn
matrix.
A
diagonal matrix with
n
rows
is
denoted
as a
diagonal matrix
of
order
n.
A
diagonal matrix with unity
as
diagonal elements
is
denoted
as the
identity
matrix
I. The
double subscript notation
is
used
to
represent
the
elements
of
a
matrix.
A
matrix
H
with
elements
h.. is
represented
as H =
{h..}.
T
The
transpose
of a
matrix
B is
written
as B . A
square matrix
is
said
to
be
nonsingular
if its
determinant
is not
zero.
The
inverse
of a
nonsingular
square
matrix
G is
denoted
as G . A
matrix
N
whose columns
are X, , X ,
...,
X
'
is
represented
as N = [X , X X ]. A
vector
X e E
n
is a
matrix with
n
rows
and one
column.
A row
vector
is
represented
as the
transpose
of a
column vector.
The
determinant
of a
matrix
H is
denoted
as
|H|.
7.
Eigenvalues
and
Quadratic Forms
Let
H be a
square matrix
of
order
n. A
scalar
X and a
nonzero vector
X e E
satisfying
HX = XX are
said
to be an
eigenvalue
and an
eigenvector
respectively
of H. The
number
X is the
eigenvalue
of H
corresponding
to
the
eigenvector
X. All the
eigenvalues
of H are
obtained
by
solving
the
characteristic polynomial
of
degree
n in X, |H - IX| =0.
T
If
the
square matrix
H of
order
n is
symmetric, i.e.,
H = H ,
then
(i) The
eigenvalues
of H are
real.
(ii)
Let X and X be
distinct eigenvalues
of H and X and X be the
T
corresponding eigenvectors, then
X X = 0.
The
matrix
H is
positive definite when
T
(a) The
quadratic form
X H X is
positive definite, i.e.,
X
T
H X > 0 V
nonzero
X e E
n
.
(b) All its
eigenvalues
are
positive,-
i.e.,
X. > 0 Vi.
(c) The
determinants
of the
leading principal minors
of H are
positive.
The
leading principal minors
of H are
represented
as
H.
=
{h..} (i,j
= 1, 2,
....
p).
P
1
3
The
matrix
H is
positive semidefinite
when
T
(a) The
quadratic form
X H X is
positive semidefinite, i.e.,
X
T
H X > 0 V
nonzero
X e E
n
and
T n
X
H X = 0 for
some
nonzero
X e E .
10
(b) The
eigenvalues
X. > 0 Vi and X. = 0 for at
least
one but
not
all i.
The
leading principal minor
test
cannot
be
used
to
determine semidef initeness
of
the
matrix.
H.
When some
of the
determinants
of the
leading principal minors
are
zero, the. test
will
not
provide information about
the def
initeness
of H.
A
matrix
H .is
indefinite
when
(a).
The
quadratic form
X
T
H X is
indefinite, i.e.,
X
T
H X < 0 for
'V
'X/ ^ %
some nonzero
X e E and X H X > 0 for
other nonzero
X e E .
^ Oi \> ^
(b) The
eigenvalues
X . < 0 for
some
i and X . > 0 for
some
j .
J ,
(c).
Let |H. | , i = 1, 2,
...,nbe
determinants
of the
leading principal
minors
of H. The
matrix
H is
indefinite
if
|H.|
^ 0 Vi and
IH
|/|H
I < 0 for
some
i and IH
I/JH
I > 0 for
some
j.
i'
i-l j ' j-1
8.
Norm
and
Condition Number
of a
Matrix
The
norm
of a
square matrix
H of
order
n,
subordinate
to the
vector
i,
I|H
S
|!
norm
X , is
defined
as H = max u
n
. The
norm
H
relative
II
x H # o
HJ.II
"Xi
to
the
Euclidean
norm
|| X || is
T
T
||H||
= max
[i_!_-V/
2
,
X
£
E
n
.
X
* 0 X
T
X ^
Therefore
the
norm ||.H
||
relative
to the
Euclidean norm
of a
vector
X in E
a.
T
is
the
square root
of the
largest eigenvalue
of H H. If H is a
symmetric
matrix, then
|| H || is the
largest eigenvalue
of H and || H || is the
reciprocal
of the
smallest eigenvalue
of H. Let X and X be the
largest
J6
-O
and
smallest
eigenvalues
of H.
Then
the
condition number
r of the
matrix
11
H is
defined
as r = X /X .
.The matrix
H is
said
to be
well-conditioned
if
i(j
-O
the
value
of r is
close
to 1. If the
value
of r is
very large,
the
matrix
H is
said
to be
ill-conditioned.
The
ill-conditioning
of H
increases
as
the
value
of r
increases
.
9.
Functions
A
real valued function
f (X)
defined
on a
subset
L of E is
represented
^
as
f(X):
E -> E. A
function f(X)
is
said
to be
continuous
on a set L if it
<\t
TJ
is
continuous
at
each point
X in L. It is
continuous
at a
point
X in L if
%o ^o
f(X)
> f(X )
whenever
X e L and X '
->
X .
Equivalently
,
f(X)
is
continuous
at
r\j
<\j°
^ *\i
i\fl
^
X if
given
e > 0
there
is a 6 > 0
such that
II
X - X
II
< 6
implies
0,0 . ' -\,
-\,o
"
|f(X)
- f (X ) | < e. A set of
real-valued functions
c.(X),
i = 1, 2,
....
m
^ o>° i \,
may be
regarded
as a
single vector function C(X)
: E
•->
E .
Such
a
vector
o>
a/
function
is
said
to be
continuous
on an
open
set L C. E if
each
of its
component functions
is
continuous
on L.
(k) (k)
A
real valued function f(X)
is
said
to be of
class
C or f e C on
a.
an
open
set L Sr E if it is
continuous
and
possesses continuous partial
(k+1)
(k)
derivatives
of all
orders
< k. If f e C on L, it is of
class
C on
L.
The
gradient
of f e c' ' at X* is the
column vector
'Y,
no.
C2")
If
f e C ,
the-
Hessian
of f at X* is the
square symmetric matrix
of
order
<\i
n
denoted
as
V
2
f(X*)
or
F(X*)
12
If
the
vector-valued function
C(X):
£
n
->-
E
m
is of
class
C^ , its
gradient
f
\l 'X/ r» y~i
OL,
.
at
X* is the mxn
matrix,
VC(X)
=
(7-^>
v
*
Vi, j = 1, 2,
....
n,
called
t\j
f\j
*~Vj
o X, A.
Jacobian
of C at X*. If a
vector
X e E
m
and if the
real-valued function
X
C: E -> E is of
class
C^ ', the
gradient
of X C at any
point
X is
V[X
T
C]
=
[VC]
T
X.
T m
The
Hessian
of X C at any
point
X is
equal
to Z X. V C
(X).
i=l ""
The set of
points satisfying
the
equation
f(X)
= c,
where
c e E and
a.
f:
E •* E,
forms
a
level surface
of f. If f is of the
form
f(X)
=
n . ^
£
a x + b, a. not all
zero,
then
the
level surfaces
of f are
(n-1)
dimensional hyperplanes
and Vf is the
normal
to the
hyperplanes.
In
general,
if
f e C^ ' and Vf ± 0 at X in L,
then
Vf(X
) is the
.normal
at X to
.the
level surface
f(X)
= f(X ). If f e C^ ', d is a
direction vector
in E
n
*\j
Oj^ ^
and
F is the
Hessian
of f,
then
the
directional derivative
of f at a
point
X in the d
r\,
is d'
T
F d.
T
X in the
direction
d is d Vf and the
second derivative
of f in
that
direction
Let f e" C be
defined
on an
open
set L 5z E and X e L. In an
open
neighborhood
of X , f may be
represented using
the
following
Taylor series
f(X)
= f(X ) +
(X-X
)
T
Vf(X
) + j
(X-X
)
T
V
2
f(X,)(X-X
)
Oj *\^
f
\j
f
\j^'
'X; *\t ^\J
f
\J *\J *\t
+ r(X X-X )
•x,-
1
-
>\,
f\
J
-
L
where
r(X ,
X-X
1
)
is the
remainder term.
The
remainder term satisfies
the
relation
(H4)
r(X,,
AX) .
lim
^ ^- = 0
where
AX = X -. X .
AX-+0
|| AX || * ^ ^
13
Therefore
the
quadratic approximation
to
f(X)
about
X_ is the
Taylor series
f(X)
=.f(X
) +
AX
T
Vf(X.)
+ i
AX
T
V
2
f(X.)
AX,
AX
= X - X, .
r\j
r\jl.
t\,
r\,- r\jL
f. r\,
/^J.
<X, 'Vi 'Xi
"Xi-l
10.
Implicit*Function Theorem
The
implicit function theorem
is
concerned with
the
conditions under
which
a set of
equations g.(X,
A) = 0
1=1,
2,
...,
n, X e E
n
, A e E
m
,
g.: E
E Vi can be
solved
for X as a
function
of A,
i.e.,
as
X(A).
Let
1 *\j <\j <\, 'X*
g.
Vi be
continuous
and
have continuous first
and
second order partial
derivatives with respect
to X on an
open
set 8
E
n+m
.
Let g:
E
n+m
-> E
n
be
a
vector-valued function with
g. as
elements.
Let Vg be the nxn
Jacobian
matrix
of g
with respect
to X.
Suppose
that
g.(X,
A) =.0 i = 1, 2,
...,
n and
|Vg|
^ 0 at a
point
(X*,
A*) in 8.
Then there exists
a
continuous function X(A)
on a
neighborhood
'X*
*\j 'X' *V
A£E.E
m
of A* and a
constant
e > 0
such, that X(A*)
= X*,
g.(X(A),
A) = 0 Vi,
'X'
i\j ^ r\j 1- f\, <\j ^
A
e A.
Further g.(X,
A) =0, || X -
X(A)
|| < e, A e A
only when
X =
X(A)
.
^'X.'X.'X.
If
the
functions g.(X,
A) Vi are of
class
C on 8,
then
the
function X(A)
(2)
is
also
of
class
C on A.
11.
Local
and
Global Minima
of a
Function
on a Set
Let
f(X):
E
-*-
E be
defined
on an
open bounded
set. E . A
point
a.
X* e L is
said
to be a
relative minimum point
or a
local minimum point
of
a,
f
over
L if
there
is an e > 0
such
that
f(X*)
<
f(X)
V X e L and || X - X* || < e,
The
point
X* is
said
to be a
strict local minimum point
or
strict relative
i>
minimum point
or an
isolated local minimum point
of f
over
if
there exists
an e > 0
such
that
f(X*)
<
f(X)
V X e L, X i X* and || X - X* |j < e.
14
A
point
X* e L is
said
to be a
global minimum point
of f
over
L if
f(X*)
<
f(X)
V X e L. The
point
X* is
said
to be a
strict global minimum
f\j *\j Vi
point
of f
over
L if
f(X*)
<
f(X)
V X e L, X t X*.
A
point
X* is a
local
(global)
maximum point
of
f(X) over
L if it is a
local
(global)
minimum point
of
-f.(X)
. . A
point that maximizes
or
minimizes
f oh L
is
called
an
extreme point
of f on L.
12.
Infimum
and
Supremum
of a
Function
on a Set
Let
f(X):
E
->
E be
defined
on an
open bounded
set L S^E . The
infimum
a.
of
f on L is the
greatest lower bound
of f on S. It is the
largest number,
~oo
<
a
< oo
}
suc
h
that
f (X) > a
holds
for all X e L. It is
denoted
as
"inf f(X)"
or
"inf f(X)
on L" or
"inf
f(X)".
Equivalently,
V
r~
I
f
\J '
f
\J *\J
AG L.
a = inf
{f(X):
X e L} if
(i) a <
f(X)
V X e L
(ii) there
is a
sequence
{X } e L
such that
lim
f(X
(k)
)
= a
A
point
X* in L
minimizes f(X)
on L if and
only
if
f(X*)
= inf
f(X). When
a
minimizing
point
X* e L
exists, f(X*)
is the
infimum
as
well
as the
minimum
of
f(X)
on L. If
f(X):
E
->
E is a
continuous function defined
'V 'V
On
a
compact
set
F^.E
,
then there exists
a
point
X*
such that f(X*)
=
inf
f(X)
on L.
15
The
supremum
of f
over
L is the
least upper bound
of f on L. It is the
smallest real number,
° < 3 < °°,
such
that
f(X)
< 3 ¥ X e L. It is
denoted
as
"sup f(X)"
or
"sup f(X)"
or
"sup f(X)
on L".
Equivalently,
XeL . -v.
*
sup{f(X):
X e L} =
inf{-f(X):
XeL}.
a.
A
point
X*
maximizes
f(X)
on L if and
only
if
f(X*)
= sup
f(X).
13. .
Convex Sets
and
Convex Functions
13.1 Convex Sets
A set FJ£ E is
said
to be a
convex
set if for
every
X.. , X e F and
0 < a < 1,
.
ax, +
(1-a)
X. e F.
Geometrically,
a set is a
convex
set if the
line segment
joining
any two
points
in the set
lies
in the
interior
of
that
set.
If 3X. +
(1-3)
X. e F
rx,!
^2.
for
every
X , X e F and 3 e E,
then
the set F is
said
to be an
affine
set
Or"-.
O/
or
a
linear variety.
The
closure
of a
convex
set is
convex.
The
intersection
and
union
of
any
number
of
convex sets
is
convex.
The
null
set is
assumed
to be
convex.
The
convex
set
defined
by
every convex linear combination
of a
finite number
of
points
in E is a
simplex
in E . The
convex hull
of a set S is the
smallest convex
set
containing
S. The
closure
of a
convex hull
of S is
the
closed convex hull
of S.
13.2 Convex
and
Concave Functions
16
A
function
f(X):
E -> E
defined
on a
convex
set L is
said
to be
convex
on L if for
every
X , X e L and 0 < a < 1,
Ojl
r\,2.
f(aX
+
(1-a)
X ) <
af(X
) +
(1-a)
f(X ).
If
for X ± X , 0 < a < 1, X , X e L
fXaXj^
+
(1-a)
X
2
) <
aftXj^)
+
(1-a) f(X
2
),
then
f(X)
is
said
to be
strictly convex
on L. A
function
f(X)
is
said
to be
(strictly)
concave
on L if
-f(X)
is
(strictly) convex
on L. A
positive linear
combination
of
convex functions
is
convex.
If
f(X):
E
E
defined
on a
convex
set L^. E is of
class
C on L,
a.
then
f(X)
is
convex
on L if and
only
if
f(X
2
)
>
f(X
x
)
+
Vf(X
1
)
(X
2
- ^
for
all
points
X , X e L.
a.1 Oj2
If
for all X
n
, X
0
e L,
f(X
) > f(X ) +
Vf(X
) (X - X )
"V-2
0^1 Ojl Ojl
<\,Z
/
9^
then
f is
strictly convex
on L. If
f(X)
is of
class
C
v
' on a
convex
set L,
%
then
f(X)
is
convex
on L if and
only
if at
each point
X e L the
Hessian
a, 'x,
matrix
F of f is
positive
semidefinite.
If F is
positive definite
V X e L,
then
f is
strictly convex
on L. .
13.3 Convex Sets Defined
by
Convex
and
Concave Functions
17
Let
f(X):
E
E be a
convex function defined
on a
convex
set L. The
'V'
set
F = {X:
f(X)
< a, X e L} is a
convex
set for
every
a e E. If
f(X)
is
a
concave function defined
on a
convex
set L,
then
the set
9
F
= {X:
f(X)
> a, X e L}
is
a
convex
set for
every
a e E.
If
f(X)
is
linear
or
affine, then f(X)
< a
defines
an
open half space,
f(X)
< a
defines
a
closed half space
and
f(X)
= 0
defines
an
(n-1) dimensional
<\,
~ 'b
hyperplane.
The
intersection
of a
finite number
of
closed half spaces
is a
convex polytope.
A
nonempty bounded convex polytope
is a
convex polyhedron.
A
convex
set may be
defined
by
linear equalities. However
nonlinear
equalities
cannot define
a
convex set.
A
detailed treatment
of
convex
sets
and
convex functions
may be
found
in
references (H4), (L5), (Ml), (Rl), (Zl),
(Z2).
14.
Penalty
and
Barrier Function Methods
Consider
the
inequality constrained problem
PI. The
feasible region
F
is
defined
as
follows.
F
= {X:
c.(X)
> 0, 1 < i < m}.
^ 1
-X.
= = =
The
interior
of the
feasible region
F is
defined
as
F
= {X:
c.(X)
> 0, 1 < i < m}.
The
exterior
of the
feasible region
F is
denoted
as F.
18
14.1 Barrier Function Method
The
barrier function method
is a
transformation technique.
The
barrier
function transformation
for PI may be
represented
as
B(X,u)
=
f(X)
+ I
P.(C.(X)),
U> 0.
^
~ ^ . i i ^ /"
1=
The
function
B(X,u)
is
defined
so
that
a
barrier
is
constructed
at the
<\,
/ .
boundary
of the
feasible region
F. A
solution
X* to PI is
approached from
'V
the
set F by
modifying
the
barrier function using
the
control parameter
tl.
The set F is
assumed
to be
nonempty
and
this means
that
any
boundary point
of
F may be
approached from
a
point
in the set F.
This also implies
that
the
barrier function
is not a
suitable transformation
for
equality constraints.
In the
function
B(X,Ll),
the
second term
is the
barrier term.
For IX > 0,
i,
'
this
term
is
bounded
and is
defined continuously
on the
interval
c.(X)
> 0.
i
^
Further
p . (t) -* », as t
->
0 .
=,
The
commonly used barrier functions
are
(Fl)
,
(RID
(i) The
inverse
barrier function
p.(c.(X))
=
(c.(X))~
.
i i ^ i
-\,
(ii)
The
logarithmic barrier function
p.(c.(X)
=
-£n(c.(X)).
1 1 r\j . 1 i\,
The
function
B(X,u
) is
defined
on F and
twice continuously differentiable
'V/ -L
in
F
T
.
Further
B(X,u)
> 0 and
B(X,u.)
°° as c
(X*)
-> 0 for any i.
I r\j I r^ I 1
Therefore
a
barrier
is
established
at the
boundary
of the
feasible region.
This
barrier prevents
a
search procedure
for
locating
a
solution
X* to PI
r\,
from
leaving
the
feasible region.
As
B(X,U.)
is
defined
on F and the
method
^
'
!
operates
in F , the
barrier function method
is
also called
an
interior-point
method.
If
c.(X*)
= 0,
then
as X
X*, the
growth
of
p.(c.(X))
is
controlled
i >\i i i f\,
a/
or
cancelled
by
decreasing
U.
19
The
barrier function method
may be
summarized
as
follows.
Select
a
(k)
sequence
{u. }
such
that
for
each
k,
For
each
k,
minimize
B(X,
u ) to
find
X
v
' = X(^ '),
starting
the
l\j
* f\j *\j F
•unconstrained
minimization from
X ~ . The
initial starting point
X
must
be in F . The
stopping criteria
for
each unconstrained minimization
i
Ck\ (\r ~\} . ii (k^
Clf-1
">
i.
*
* j
l^/Tr\"'/\
*-
/TT
V
IV
-^ / \ I
ll\r\/
xrV^^yi
may
be
based
on | f
(X
) - f
(X
) | or || X - X ||.
(k)
Let
{X } be the
sequence generated
by the
method.
Then
any
limit
point
of
this sequence
is
optimal
for PI
(Zl)
. The
behavior
of
B(X,n)
may
be
interpreted
in the
following
way
(Rll).
Let
c.(X*)
= 0 for
some
i. As
-*
c
(X*)
= 0,
p.(c.(X
v
^))
°° and
O>
'X'.lfx,
1
ll/X,
0.
(k)
However
if U. is
decreased,
then
p
(c.(X
)) can be
allowed
to
increase
' i i
<\j
(k) (k)
without
increasing
B(X , U.) The
monotonically decreasing sequence
{ n
is
chosen
in
such
a way
that
(k)
(k)
(i) B(X , ix )
monotonically decreases.
'V
(ii)
B(X,
,. ') is
twice continuously differentiable
in F .
a.
(iii)
c.(X
(k)
)
» 0,
X
(k)
i- X*, and
f(X
(k)
)
-f
f(X*).
i-
% 'Vi ^ o> . 'v
As
the
search
for X* is
started
at X e F , the
barrier
at the
boundary
of
f\,
t\, L
F
restricts
the
search procedure
and the
sequence,
{X
v
}, of
minimizing
points
of
B(X,
U ) to the
interior
of F. The
method
is
therefore called
an
interior-point method.
The
strengths
and
weaknesses
of the
method
are
discussed
in
detail
in
reference
(Rll).
The
method facilitates
the
solution
of PI
using
an
unconstrained
minimization
technique
and the
constraints need
not be
20
accounted
for
explicitly.
The
convergence
of the
method
has
been established
(Fl) when
the
problem functions
are
continuous
and X* is at the
boundary
of
•Xi
F
or in the
closure
of F .
Fiacco
and
McCormick (Fl) established
that
there
(k)
exist
a
sequence
{M } and a
corresponding sequence
of
minimizing points
Ck")
generated
by the
algorithm
such that
X
v
'
-*
X* as k
«.
Similar
convergence
properties
and
convergence
of the
other related sequences have been proved
by
Luenberger (L5)
and
Zangwill
(Zl).
The
method does
not
require very strong constraint qualifications
and
it
converges
to a
local
minimum
of PI
where
the
Kuhn-Tucker conditions
may
(k)
or
may not
hold.
By
monitoring
the
convergence
of the
sequences {C(X
)}
(k)
and
{X },
structural information about
the
problem
PI may be
obtained.
The
most commonly sought structural information
is the set of
active
(k) (k) (k)
constraints
at X . The
vector
X is an
estimate
of X of the
f\j
"\» 'V
Lagrange multiplier vector
X* at X*. The
method converges even when
the
(k)
minimization
of
B(X,
u ) is
inexact
for
each
k
(Rll).
The
weaknesses
of the
barrier function method
are of a
computational
nature
and are
most serious
when
the
controlling parameter
u is
small.
The
numerical difficulties associated with
the
algorithm arise
due to the
ill-
(k)
conditioning
of the
Hessian
of
B(X,
M ). The
condition number
of the
^ *
(k) (k)
Hessian
of
B(X,
^ )
increases
as
decreases.
This
causes B(X,
n )
to
have steep-sided
valleys
and
makes
the
search
for an
unconstrained minimum
(k) (k)
of
B(X,
IJL
')
difficult.
In the
algorithm,
u is
gradually decreased
so
(k)
as
to
make B(X,
n )
twice continuously differentiable
and to
reduce
the
(k)
ill-conditioning,
of the
Hessian
of
B(X,
U ). The
feature
that
restricts
f
\j
f
the
general application
of the
method
is
that
it
requires
the
initial
point
21
to
be
feasible
and the
search
for X^
J
is as
difficult
as the
problem
to
ti
be
solved. Further
the
method cannot handle equality constraints.
14.2 Penalty Function Method
.
The
penalty function transformation
for PI may be
represented
as
m
P(X,
o) =
f(x)
+ a E
n.(c.(X)),
a > 0.
^ ^ . -, i i ^
properties
of the
loss
functions
n.(c.(X))
and
P(X,
a) are
discussed
in
i i 'v 'Xi
detail
in
Chapter
2.
Additional
information
may be
found
in
references
(Fl)
,
(L5)
,
(Zl)
,
(Z2)
. The
penalty function designed
to
impose
an
increasing
penalty
on the
objective function
as the
search point
X
moves away from
F
^
and
the
constraint violation increases.
The
loss functions
n.(t)
are
defined
•for
-oo < t < «> and
therefore
the
penalty function
is
defined
on E .
This
implies
that
both
equality
and
inequality constraints
can be
handled
by the
penalty function transformation technique. When
X e F, the
loss term
is
^
zero
and
when
X £ F
penalty
is
imposed
on
F(X)
depending
on how far X is
away
a,
>\, ^
from
F.
Therefore
the
algorithm
may be
started
at any X e E and
specially
X
(0
>
e
F or
X<°>
£
F.
^ r\,
The
loss functions
n. Vi are
usually
chosen
so
that
P(X,
a) is
twice
i
^
dif
ferentiable. However
the
following
loss functions also
are
used
in
souie
algorithms.
(i)
Zangwill's
loss
function
for
inequalities
c . (X) > 0
i
^ =
n,(c.(X))
=
-min
(0,
c.(X))
iio,
i »v
(ii) Absolute value loss function
for
equalities
c.(X)
= 0
...
i %
n.(c
1 1
=
c
a,
22
The
basic penalty function algorithm
is
described
in
Chapter
2.
(k)
The use of
monotonically increasing control parameter
o in the
algorithm
may be
interpreted
as
follows. When
X is in F, the
increase
'v
(k)
in
a
increases
the
penalty weight associated with
the
loss term
(k)
o Z n.
(c.(X)).
Due to
this increase
in the
penalty weight associated
±
* i *
with
the
loss term,
in the
subsequent unconstrained minimization
of
(X,
o>
(k.)
v
'
(k+1)
(k)
P(X,
o
v
') the
loss term
is
reduced
and
hence
c . (X
v
')
->-
0,
permitting
X*.
.The structure
of
P(X,
o)
also implies
that
for
large
a, the
a>
a. .a.
minimum
of
P(X,
a)
will
be in a
region where
a I
n.(c.(X))
is
small.
The
<\,
.
1
!
'V
gradual increase
in a is
designed
to
make P(X,
a)
continuously
dif
ferentiable
'Vi
and
reduce
the
ill-conditioning
of the
Hessian
of
P(X,
o).
(k)
The
convergence
of X to X* and the
existence
of a
corresponding
a. a,
(k)
monotonically increasing sequence
{a }
have been established
by
Fiacco
and
McCormick (Fl)
,
Luenberger (L5)
,
Zangwill
(Zl)
. The
condition number
of
the
Hessian
of P
increases
as 0
increases.
The
penalty function P(X,
a)
a.
forms increasingly steep-sided valley
as a
increases
and
this leads
to
numerical instabilities
in the
unconstrained minimization
of
P(X,
a). Due
a,
to
this
reason,
it is not
possible
to
solve
PI in one
step
via
P(X,
a) by
<\,
choosing
a
large
a. The
gradual increase
in o
makes
the
successive
unconstrained minimization problems easily
to
solve.
In the
penalty
function method
the
solution
X* is
approached from outside
F and
therefore
i)
the
method also
is
known
as the
exterior-point method. Lootsma (L3)
has
comprehensively reviewed
and
classified
the
loss functions
and
barrier
functions. Duality analysis
of the
methods
is
developed
in
references
(Fl), (L5)
and
(Zl)
.
23
14.3 Mixed
Interior
Point
-
Exterior
Point
Method
Fiacco
and
McCormick (Fl) proposed
and
developed
a
mixed interior
point
-
exterior point method
for
solving
P3. The
equality constraints
are
handled
by the
penalty function method
and
inequalities
are
taken into
account
using
the
barrier function method.
The
methods
that
solve
a
constrained problem
by
sequential unconstrained minimizations were termed
Sequential
Unconstrained Minimization Techniques
(SUMT)
by
Fiacco
and .
McCormick (Fl).
The
most popular form
of
SUMT uses
a
quadratic loss function
to
handle equalities
and a
logarithmic barrier function
for
inequalities.
P(X,
a) =
f(X)
= ^ E Sin
c
±
(X)
+ a I
(c
±
(X))
2
a > 0.
.
^ .. ^ i^E ^ ieE ^
In the
above function
E is the
index
set of
equality constraints.
The
properties
of the
mixed function
are the
same those reviewed above
for
(k)
penalty
and
barrier transformations.
The
sequence
X
converges
to X*
when
o
°°.
Additional information about
the
properties, convergence
and
computational considerations
of the
mixed methods
may be
found
in
reference
(L3), (Fl).
15.
Duality Theory
and
Duality
Gap
15.1
The
Primal Problem
Let the
primal problem
be
defined
as
follows.
P:
Minimize f(X),
X e L
^.E
n
subject
to c.
(X).
> 0 i = 1, 2, . . . , m
H.
f
\j " " "
f(X):
E
n
-> E,
c.(X):
E
n
+ E Vi.
The
problem functions
are
defined
on the
nonempty open convex
set L. The
problem
P is
assumed
to
have
at
least
one
feasible solution
and the set
{X:
X e L,
c
±
(X)
> cO
-n
in
E is
compact
and
nonnull
for
every choice
of a. e E .
These assumptions
imply
that
a
finite optimal value
of P is
attained
in the
feasible region
F
(R13)
.
Equivalently,
° < min (P) < °°. The
optimal value
of P, in
general,
is
inf (P) .
Equivalently,
the
optimal value
of P is the inf
f(X) subject
to
'Yi
X e L and c . (X) > 0.
However,
if X* is a
minimizer
of P in F,
then
min (P) =
% i i\, ^i
inf
(P) . The
conditions imposed
on P
imply
the
existence
of a
solution
X* to
a.
P.
Therefore
in
subsequent discussions
the
optimal value
of P is
denoted
as
min
(P) .
The
classical
Lagrangian, L(X,
X) ,
associated with
P is
defined
as
>\, i\j
follows
(R13).
L(X,
X) : E + E
<v <\j
- X
C(X),
X e
L(X,
X) = ^
°
otherwise
Since f(X)
>
L(X,
X), sup
L(X,
X) =
f(X) when
X is
feasible.
The
optimal
^ "~ % ^ %% ^ ^
value
of P
also
may be
represented
as
min
(P) = inf (P) = inf sup
L(X,
X).
XeL
A
vector
X* is a
Kuhn-Tucker vector
for P if
'V,
25
inf
(P) = inf
L(X, X*).
The
optimality
in P may be
characterized
by the
general saddle point condition,
X* e E
^ +
Min
L(X,
X*) =
L(X*,
X*) = Max
L(X*,
X)
XeL
^ ^ ^ * jn ^ ^
^
Xefc
15.2
The
Dual Problem
and
Duality
Gap
The
dual problem
is
defined
as
D:
Maximize v(X)
, X e E
m
v(X)
= inf
L(X,
X)
^ XeL ^ ^
The
optimal value
of the
dual
is
sup
(D) = sup inf
L(X,
X)
,
m XeL ^ ^
Xe
^
Since.
min (P) = inf sup
L(X,
X), min (P) > sup (D) or in
general,
XeL
c
m ^ ^
^ XeE
r\,
inf
(P) > sup (D) . If inf (P) > sup (D) , a
duality
gap is
said
to
exist
between
the
primal problem
P and the
dual problem
D. If
there exists
a X
'
'Y.
at
which
the
maximum
in D is
attained, then
sup (D) = max (D) . If X*
'Xj
solves
D and min (P) = max (D) ,
then
X* is a
Kuhn- Tucker vector
of P.
'Xj
15.3 Global Optimality
and
Primal-Dual
Method
The
necessary condition
for
optimality
may be
expressed
as
follows.
-25-
26
If
X* is a
global minimum
of P and min (P) = max
(D),
then
the
above saddle
point
condition holds.
The
sufficient condition
may be
reformulated
as
follows.
If X*
satisfies
the
above saddle point condition, then
it is a
global
min of^ P and min (P) - max
(D). Further
the
vector
X* in the
saddle
point
relation
is a
global maximizer
of D.
This vector
X* is a
Kuhn-Tucker
vector
for P.
The
saddle point condition
is
always
sufficient condition
for
optimality.
However
it is a
necessary condition that
is
required
to
establish
the
duality
relation
min (P) = max
(D). This duality relation
is
equivalent
to the
existence
of a
Kuhn-Tucker
vector
X* of P. The
primal-dual methods exploit
this
duality relation
to
solve
the
associated nonlinear problem.
In the
ideal case,
the
dual function v(X)
may be
maximized
to get X* and
then
L(X,
X*) may be
minimized
to get X*.
This
method
of
solving
P is
possible
only
for
some simple problems.
The
numerical algorithms based
on the
duality
(k)
(k)
relationship generate
a
maximizing
sequence
{X } for D and for
each
X ,
(k)
(k)
generate
X as a
solution
to min
L(X,
X ). The
sequences
are
generated
(k)
(k)
so
that
X ->' X* and X -> X*. The
saddle-point condition
may be
used
to
design primal-dual
numerical
algorithms
for
solving
P
only
if the
duality
relationship
min (P) = max (D)
holds.
The
satisfaction
of
this duality
relationship depends
on the
nature
of
problem functions
and the
form
of
the
Lagrangian function L(X,
X)
associated with
P.
15.4 Convex Duality
If
P is a
convex, program then
the
compactness assumption
is
fulfilled
when
the set
{X: X e I,
C(X)
> 0}
27
is
compact
and
nonnull.
The
duality theory
for
convex programs
has
been
reviewed
in
detail
by
Geoffrion
(Gl)
and
Rockafellar
(R12), (R13).
For a
convex program
a
Kuhn-Tucker vector
X*
ususally exists
and the
saddle-point
condition
is
always sufficient
for the
optimality
of P at X*.
Rockafellar
(R13)
established
that
for a
convex program
P, min (P) = sup (D)
(R13).
The
point
(X*,
X*) is a
saddle-point
of
L(X,
X) on L x E™, if and
only
if
X*
solves
P and X*
solves
D. If X*
solves
D,
then
for X* to
solve
P it is
necessary
and
sufficient
that
(R13)
.
(i) X*
minimizes
L(X,
X*) on X e L
(ii)
c
±
(X*)
> 0, X* > 0 for
c.(X*)
=0.
Geoffrion
(Gl)
and
Lasden
(Ll)
presented
the
computational applications
of
the
duality theory
for
convex programs. Several
other
possible "duals"
of
P
have been proposed using
the
Lagrangian function
L(X,
X) =
f(X)
-
*\j
'X/ 'X/
T m
X
C(X),
X e
E_j_
. The
following dual formulations
are
reviewed
and
compared
by
Geoffrion
(Gl).
(i)
Geoffrion dual
G
G:
Maximize
{ inf
(f(X)
- X
T
C(X))}
F
m
X e L ^ ^ ^ ^
A E t <v,
*\/
*4"
(ii)
Wolfe dual
W
W:
Maximize
f(X)
- X
T
C(X)
X
> 0
f\j
=
X
e L S E
n
Vf(X)
- E X Vc = 0
*x»
*\/ , i f\, i
i
28
(iii)
Stoer,
Mangasarian
and
Ponstein dual
SMP:
Maximize
f(X)
- X
T
C(X)
>\j
f\, f\, f\,
X
> 0
Oi
=
Subject
to
X
minimize
f(X)
- X
T
C(X)
over
L.
15.5 Duality
in
Nonconvex Programs
The
dual formulation
D of P is
based
on
L(X,
A) and has
inherent
limitations
(R13).
The
implicit feasible
set in D is
m
{A:
X e E
+
,
v(X)
>}
and it is
difficult
to
determine
a
representation
of
this
set.
This implies
that
it is
difficult
to
determine whether
the inf
L(X,
X)
over
X e L is
*"b
*"V/
*\*
finite
and
attained.' Further even
if X*
minimizes
L(X,
X*) and X*
solves
D,
X* may not
solve
P
unless
there
is
only
one
solution
to P. The
dual formulation
D is
meaningful only
in the
convex case, since only
in
this
case
it is
possible
to
establish
the
relation
min (P) = sup (D)
(R13).
Rockafellar
(R12), (R13)
and
Mangasarian
(M2)
showed
that
by
associating
a
different Lagrangian with
P, the
duality
gap in
nonconvex programs
may be
eliminated.
A
wide variety
of
Lagrangians
may be
associated with
P and
each
choice corresponds
to a
different dual problem. Even though great flexibility
is
afforded
by the
theory
in the
choice
of the
Lagrangian
for P, not all of
these
are of
practical value
in
computation.
The
Rockafellar's
augmented
Lagrangian
f(X,
A, o) is a
member
of a
wide class
of
Lagrangians
and has
*\r
*\j f\t
proved
to be
useful
to
developing primal-dual numerical algorithms
for
solving
P. The
duality theory
in
terms
of
y(X,
A, o) for
nonconvex programs
'V* *\t *\i
'29
is
reviewed
in
Chapter
3.
Detailed analysis
of the
duality theory based
on
, X, o) may be
found
in
(R6), (R12)
and
(R13).
The
duality theory based
'v.
,
X, o) for
convex programs
was
investigated
by
Rockafellar (R4), (R5)
15.6 Partial Duality
It
is not
necessary
to
include the.Lagrange multipliers
of all the
constraints
of a
problem
in the
definition
of the
dual function (Gl), (L5).
The
duality
can be
defined with respect
to any
subset
of the
constraints.
If
a
constraint
is
used
to
define
the
Lagrangian associated with
P, it has
a
dual variable
of its
own.
If a
constraint
is
assigned
to
define
the set
L,
it
will
not
possess
a
dual
variable.
Consider
the
convex problem
P
with
the
constraints partitioned
so
that
the
dual
is
defined with respect
to the
constraints
belonging
to the
index
set J. Let p be the
number
of
indices
in
J.
Then
the
partial dual
of P in
terms
of
L(X,
X) and
with respect
to the
set J may be
represented
as . . .
PD:
maximize v(X),
X e E
P
v(.X)
= inf
L(X,
X) X e L, X e E
P
c.(X)
> 0 i ^ J.
The
choice
of
assignment
of a
constraint depends
on the
structure
of the
problem,
or the
nature
of the
theoretical
analysis
or the
ease
of
evaluating
v(X).
'V-
30
APPENDIX
B
COMPUTERIZED
ALAG
ALGORITHM
AND
APPLICATION
1.
Introduction
An
ALAG penalty function algorithm
to
solve
the
equality
and
inequality constrained problem
P3'
(see Chapter
3) is
presented.
An
equality constrained problem
and an
inequality constrained problem
are
solved using this numerical algorithm.
The
algorithm
and the
examples
supplement
the
review
of the
ALAG penalty function technique reviewed
within
this
report.
This
numerical algorithm
was
investigated
by
Fletcher (F8)
.
This
algorithm incorporates
the
parameter iterations
that
have been proven
to be
efficient (F8)
. A
Quasi-Newton
method
that
_utilizes
a
complimentary
Davidson-Fletcher-Powell
update [F4]
for
solving
unconstrained problems
is
used
in the
inner iterations,
2.
ALAG Penalty Function Algorithm
for
Equality
and
Inequality
Constrained Problem.
The
equality
and
inequality constrained problem
P3 is
defined
in
section
3.4.1.
To
simplify
the
presentation
of the
numerical algorithm
in
the
next section,
following
notations
are
used.
E
: The
index
set of
equalities
Sc
. : The
scale factor
for ith
constraint
i
(k)
WW. : The
scaled constraint violation
for ith
constraint
in
i
iteration
k
Sc
i
WW.
31
(k)
n
E
and
c
>
g
i . Sc.
T
i = i
(k)
AKK
: The
largest
scaled .constraint
violation
in
iteration
k
AKK
(k
>
C
=
max!
WW.
(k)
{
1
I
l
J
AK
V
' :
Initial value
of
AKK
V/
in
iteration
k
AKMIN
: The
relative error tolerance required
in the
constraint
.
fkl
residuals
c..
When AKJC
' <
AKMIN
the
algorithm
is
terminated.
This
is the
stopping criterion
for the
outer
iteration.
EPS.
The
tolerance
in x. for
unconstrained minimization
i
i
(k)
c.
: The
current value
or
residual
of ith
constraint
in
iteration
k.
(k)
M(X
): The
index
set of
constraints
that
contribute
to the
ALAG
(k) I"
penalty
function. M(,X
) = 1 i : i e E or
i
i E and
c.
(k)
<
9.
(k)
j
(k)
p
: The
number
of
indices
in M(X ).
(k)-
(k)
(AX.)
: The
Powell-Hestenes correction
for X. in kth
iteration
(k) (k)
(AX
) : The
Newton correction
for X. in kth
iteration,
i N i
Algorithm
A3
(i)
Select
the
initial starting point
X ,
the
initial
estimate
of
parameter vector
9
the
initial penalty constants
a. , V i
32
(ii)
k = k + 1
(iii) Minimize
0 (X, 9 ', o
v
' ) to
find
(k)
(k) (k)
X = X_ (0 , a ),
starting
the
unconstrained
minimization
from
X
Use
Broyden's Quasi-Newton method
for
unconstrained
minimization
of 4> (X, 6
c/
k)
)
$ (X,
9
(k)
, o
(k)
)
=
f(X)
+ 1/2 -L
a.
(k)
[C.
-9.
(k)
]
ieE
"""
L 1
+
i E
0
,ao
(c
_
e
.oo
2
0, C. - 9. > 0
(c
- e )_ = 11-
(c
-
e.),
c. - 9. < o
During
the
unconstrained minimization
of $, an
estimate
of the
Hessian
of
$ is
built-up using
the
first order information about
f (X) , c. (X)
(k) (k)
and
the
change
in X. The
estimate
of the
Hessian
of $ at (X ,9 ,
(k)
(k)
o
) is
represented
as G(o ).
(iv) Estimate
(k)
(k) (k)
the
Lagrange multiplier estimates
X. = a. 9.
(k)
the
constraint residuals
c. ,
(k)
the
scaled constraint violations
WW. "" ,
(k)
the
largest scaled constraint violation
AKK
Ck")
If
AKK^
' <
AKMIN, stop.
Ck")
If
AKK
V
' > AK
V J
go to
(viii)
.
Otherwise
go to (v) .
(v)
Estimate
(AA
. )
i PH
i|E X. * 0, C. < 0,
X.
(k)
-
o.
(k)
C.
(k)
> 0
* i i i i .1
(k) _ (k) ,
PH
- ~
X
i
1
*
E
'
A
i * °'
C
i
<
°'
1
(k)
n
(k) (k)
A . O . - L: *• I
1
11
33
the
constraint tolerance AKMIN
the
tolerance EPS.
on
variable
x.
i i
the
constraint scale factors
Sc.
i
the
initial
upper
bound
on
constraint violation
AK
k = 0
If
k = 1, go to
(vi).
(k)
(k-1)
If
<
EPS., stop.
Otherwise
go to
(vi).
(vi) Find
R = j i : i e E or i i E and
A
. + 0 and c .
i
i
(k)
<
Let p be the
number
of
indices
in R and
Y
(k)
_
(Y
(k)
Y
(k) (k)
T
p
I ~ *
Y
! '
Y
2
'
----
'
Y
p
J e h
Estimate
Y. =
(AX.)
, i e R.
(k)
The Y. , i e R are
determined
by
solving
the
following
subproblem.
Min
Q(Y.
(k
>)
-
Z
C.
Y.
(k)
+
|
Y(N
G^
N)
ieR
1 x
where
G is the
estimate
of the
Hessian
of $ and the
columns
of N are the
gradients
of c.,
ieR.
34
(vii)
X.
(k+1)
=
A.
(k)
+
Y.
(k)
i e R.
2 <
k
> = 4
i
If
Z.(k)
> 1,
o.<
k+1
>
=
Z
.
(k
> o.
(k)
and
d.
(k)
=
(
2
.
(k)
-
l)a.
(k)
,
is R.
(k)
< 1
<
1, o.
n
(k) . , (k) _ ,
= o. and d. = U, i e R.
l i
)
=G(a
(k)
)
+
ND
(k
V
*
AK
(k)
=
AKK
(k)
go
to
(ix).
(viii)
Find
D = j i
Set
o.
i
(k+1)
AK
(k)
or
WW
<
4 WW
(k)?
.
andA.
11
i J
(k+1)
_ . (k)
= A .
1
The
change
in c is
=
9 , i e D, and
Ykl i
d
±
v
' = 0, i ^ D. Let
be
the
diagonal matrix
(k)
(k)
with
d. as
elements.
The
estimate
G(o ) of the
Hessian
of $ is
'adjusted
to
account
for the
change
in o
(k)
as
follows.
-
G(a
(k)
)
D
(k)
N
(k)T
35
(k)
The
columns
of N are the
gradients
of
constraints whose
indices.are
in D
(k) ..
(k-1)
If
- x.
<
EPS.,
stop.
Otherwise
go to
(ix).
(ix)
9.
(k+1)
= A
(k+1
Va
(k+1)
V
i
i i i
go
to
(ii).
3.
Numerical Examples
3.1. Example
1:
Equality constrained Problem
Minimize
: f (X) =
(
X;L
- I)
2
+ (^ - x^
2
+ (x
2
- x
Subject
to
C;L
(X) =
X]L
+ x
2
+ x^ - 2 - 3 /2 = 0
c
2
(X) = x
2
- x
2
+ X
A
+ 2 - 2 /2 = 0
c
3
(X) =
Xl
x
5
-2=0
Starting
point
X
(0)
=
(2,2,2,2,2)
Solution point
X =
(1.1911,
1.3626,
1.4728,
1.635,
1.679)
* _2
Optimal
objective
function
value
f =
7.8776
X 10
The
relative error tolerance
in
constraint residuals
AKMIN
=
0.0008
The
error tolerance
in
variables
EPS.
=
0.00001
V i
i
36
Outer
Iteration
1
X° =
(2,2,2,2,2)
9
(1)
=
(0,0,0)
.
SC
=
(7.75736, 1.0, 2.0)
o
(1)
=
(0.03323, 2.0, 0.5)
.AK
(1
>
= 10 X
10
6
°
Inner
iteration
X
(1)
=
(1.15955, 1.28716, 1.38550, 1.46505, 1.70426)
C
(1)
=
(-.76667, 0.00417, -0.023827)
WW
(1)
=
(0.09883, 0.00417,
0.01191)
=
0.09883
Updating
of
parameters
3
Active Constraints
i =
1,2,3
AX
(1)
=
(0.08186,
-
0.06302, -.12599), X
(1)
= 0
o
increased
to
0.29414
'
increased
to
52.44928
increased
to
23.15088
37
Outer
Iteration
2
X
(1)
=
(1.15955, 1.28716, 1.38550, 1.46505, 1.70426)
9
(2)
=
(.27830,-
-.00120, -.00544)
(2)
a
v
' =
(0.29414, 52.44928, 23.15088)
AK
(2)
=
0.09883
Inner iteration
=
(1.19807, 1.37601, 1.48774, 1.66416, 1.66479)
(2)
C
v
' =
(0.14175, -.00162, -0,00546)
WW
(2)
=
(0.01827, 0.00162, 0.00273)
AKK^
2
"*
=
0.01827
Updating
of
parameters
f
71
A
v
' =
(0.08186,
-0.06302, -0.1259.9)
AA
(2)
=
(-0.04160, 0.06244, 0.10922)
3
active constraints
i =
1,2,3.
(2)
a
increased
to
55.921
Outer
Iteration
3
X
(2)
=
(1.19807,
1.37601, 1.48774, 1.66416,
1.66479)
38
9 =
(.13687, -0.00001, -0.00072)
'
(•3-)
cT
' =
(0-.29414,
55.921,
23.1509)
=
0.0182)
Inner iteration
X
(3)
=
(1.1914, 1.36313, 1.47324, 1.63544, 1.67807)
C
v
' =
(0.00452, -0.00031,
-0.00074)
WW
(3)
=
(0.00058, 0.00031, 0.00037.),
AKK
V
~"
=
0..00058
f
(X
(3)
) =
0.07895.
This
is the
optimal solution
for
specified stopping criterion
AKMIN
=
0.0008.
3.2
Example
2:
Inequality constrained problem
Minimize
f (X) = 2 - -^
(x^x^x
)
Subject
to c. (X) = x. > 0 i =
1,2,
,5
C
i+5
(X) = i
~
X
i + °
i =
1>2
'
----
'
5
Starting
point
X =?
(2,2,2,2,2)
Solution
point
X =
(1,2,3,4,5)
'V
ft
Optimal
objective function value
f
=1.0.
The
relative error tolerance
in
constraint residuals AKMIN
=
0.0008
The
error tolerance
in
variables
EPS.
=
0.0001
i
39
Outer
Iteration
1
X°
=(2,2,2,2,2)
Sc
= 1.0 V i
i
a
(1)
=
3.46667
V i
AK
(1)
= 10 X
10
6
°
Inner iteration
X
(1)
=
(1.35159, 2.21458, 3.15082, 4.11547, 5.0933)
WW
(1)
=
(0,0,0,0,0,0,35159, .21458, .15082, .11547, .0933)
'
AKK
(1)
=
.35159
Updating
of
parameters
5
Active constraints
1=6,
7, 8, 9, 10.
AX
(1)
=
(0,0,0,0,0,
.99039, .4847412, .3084211, .2188432,
.1978085)
a
?
increased
to
4.83063
o
0
increased
to
5.6865
o
a
increased
to
6.2857
a
increased
to
5.3862
40
Outer
Iteration
2
X
(i;>
=
(1.35159,
2.21458,
3.15082,
4.11547,
5.0933)
(2)
T
' =
(0,0,0,0,0,
.28569,
0.10035,
0.05424, 0.03482, 0.03673)
(2)
cT
' =
(3:46667, 3.46667, 3.46667, 3.46667, 3.46667, 3.46667,
4.8306, 5.68664, 6.28569, 5.3862)
AK
(2)
=
0.35159
Inner iteration
X/
'' =
(1.00438,
2.004, 3.0049, 4.0054, 5.00085)
(2)
WW^
J
=
(0,0,0,0,0,
0.00438, 0.00395, 0.00486, 0.005399, 0.000854)
(2)
AKK
V
}
=
0.00539
Updating
of
parameters
(2)
X
v
' =
(0,0,0,0,0,
0.99039, 0.48474, 0.308421, 0.21884,
0.197808)
=
(0,0,0,0,0,
0.01006,
0.01533, 0.025028, 0.031898, 0.00273)
5
active constraints
i =
6,7,8,9,10
(2)
a,
increased
to
4.68405
.
6
C2) '
a
'
increased
to
8.76983
Outer
Iteration
3
=
C1.0043, 2.0039, 3.0049, 4.0054,
5.0009)
41
9
(3)
=
(0,0,0,0,0, 0.2136, 0.1035, 0.05864, 0.03989, 0.02287)
cT
' =
(3.46667,. 3.46667, 3.46667, 3.46667, 3.46667,
4.68405, 4.8306, 5.6866, 6.2857, 8.7698)
AK
(3)
=
0.00540
Inner iteration
=
(1,2,3,4,5)
WW
(3)
=
(0,0,0,0,0, 0.00011, 0.000031, 0.000031, 0.00013,
0.000067)
f
(X
(3)
)
=
1.00018,
AKK
(3)
=
0.00013
This
is the
optimal solution
for
specified stopping criterion
AKMIN
=
0.0008.
42
APPENDIX
C
COMPUTER PROGRAM DOCUMENTATION
This particular section
of the
report contains
the
pertinent
documentation
for the
computer programs designed
and
implemented
in
conjunction with this research grant. Three different computer programs
were developed
all
based upon
the
Augmented Lagrangian Penalty Function
technique
for
Nonlinear Programming. These programs differ from each
other
primarily
as a
function
of the
type
of
unconstrained optimizer
used.
These
programs
are
entitled
ALAG1
through ALAG3. ALAG1
and
ALAG2
require
closed form gradient equations
for the
functions
to be
optimized.
Whereas ALAG3 does
not
require gradient information
be
supplied
by the .
"user.
TABLE
I.
Unconstrained Optimizers
for
ALAG
Computer
Programs
Computer
Program
. .
Unconstrained Optimizer
ALAG
1
Fletcher algorithm using
a
quasi-
Newton complimentary Davidon-Fletcher-
Powell update formula (P4).
ALAG
2
Variable metric
method
without line
searches
as
proposed
and
analyzed
by
Powell
(P5)
ALAG
3
Same method
as
ALAG1
except derivatives
are
estimated
by
differences
43
COMPUTER PROGRAM: ALAG
1
LANGUAGE:
FORTRAN
TECHNICAL
REFERENCES: (F8), (P4)
ALAG
1
1.
PURPOSE:
To
minimize
a
function
F (x) = f
(X-, ...,
X )
~~
1 n
subject
to
both equality
and
inequality constraints.
Derivatives
of all
functions must
be
supplied
in a
user
subroutine
entitled ALAGB (see item
5). An
initial estimate
of the
solution (not necessarily
feasible) must
be
specified. This computer program
is
developed from algorithm
of
section
2.
USE: CALL ALAG1
(N,M,K,X,EPS,
AKMIN, DFN, MAXFN, IPR1,
IPR2,
IW,
MODE)
N An
INTEGER
set to the
number
of
variables
n (N _> 2).
M An
INTEGER
set to the
total number
of
constraints
m (M >_ 1) .
K An
INTEGER
set to the
total
number
of
equality constraints
k.
X A
REAL
array
of N
elements
in
which
the
initial
estimate
of the
solution must
be
set. ALAG1
returns
the
solution
x in X.
EPS A
REAL array
of N
elements,
in
which
the
tolerances
for the
unconstrained
minimizations
must
be
set.
EPS (I)
should
be set so
that
EPS
(I)/X
(I) =
AKMIN, roughly
speaking.
AKMIN
A
REAL number
in
which
the
relative error
.-
tolerance required
in the
constraint residuals
must
be
set. ALAG1
will
exit when
max{|c.(x)|/
scaling factor
for c.}
_<_
AKMIN
for the
active
constraints {i}.
DFN A.
REAL number
in
which
the
likely reaction
in
F(x)
must
be
set. This
is
done
in the
same
way as for
QNWTA
- see the
QNWTA description.
MAXFN
An
INTEGER
in
which
the
maximum number
of
calls
of
ALAGB
on any one
unconstrained minimization
must
be
set.
IPR1
An
INTEGER controlling
the
frequency
of
printing
from
ALAG1.
Printing occurs every IPR1 iterations,
except
for
details
of
increases
to the c.
which
are
always
printed.
No
printing
at all
occurs
(except
for
error diagnostics)
if
IPR1
= 0.
IPR2
An
INTEGER controlling
the
frequency
of
printing
fr'om
QNWTA. IPR2 should
be set as
described
in
the
QNWTA documentation.
IW
An
INTEGER
giving
the
amount
of
storage available
in
COMMON/ALAGL/W(.).
Set to
2500
unless
wishing
to
change
the
restrictions (see Section
5).
MODE
An
INTEGER controlling
the
mode
of
operation
of
ALAG1.
If any
positive definite estimate
is
available
of the
hessian matrix
of the
penalty
46
function,
set
|MODE|
= 2 or 3,
otherwise
set
|MODE|
= 1
(see
QNWTA description).
If
estimates
of the a. and 0.
parameters
are
available
(see
item
8) set
MODE
< 0,
otherwise
set
MODE
> 0. A
normal setting
for a
one-off
job
with
no
information available
is
MODE
= 1.
3. .
LABELED
COMMON AREAS:
.
Certain labeled COMMON areas must
be
declared
and set on
entry
to
ALAG1.
COMMON/ALGAGE/C(150)
Set
scale factors
(>0)
for the
constraints
in
C(l),
C(2),....,C(M).
Choose
the
magnitude
of
these scale factors
to
give
an
indication
of
the
constraints evaluated about
the
initial
approximation
x. If any
constraints
are
violated
by
an
amount greater
in
modulus than
that
which
is
set,
then
the
setting
is
increased accordingly.
These scale factors
are
transferred
to
C(M+1),
C(M+2)
,C(2M)
by
ALAG1.
COMMON/ALAGF/GC(25,50)
Set the
derivatives
of any
linear constraints
on
entry rather than
in
ALAGB. This
is the
most
efficient
and the
numbers
are not
disturbed.
The
manner
of
setting
is
described
in
item
4.
If
MODE
< 0 is
used, then
set the
parameters
G
1
,0
2
,
...,
0
m
in
T(l), T(2),...,T(M)
and the
parameters
o^c^,
.. . ,o^ in
T(M+1),T(M+2),...,T(2M).
The
meaning
of
these parameters
may be
found
in
section
of
this report.
COMMON/ALAGG/T(150)
COMMON/ALAGI/G2P(325)
If
|MODE|
= 2 or 3 set the
estimated
hessian
matrix
of the
penalty function
in
G2P(1)
,'.'..,
G2P(N-(N+l)/2).
The
manner
of
setting
is
that
described
in
QNWTA
under
the
heading
MODE.
Local
storage
for
ALAG1
is
through labeled COMMON
areas.
These
have been
set on the
assumption
that
N <_ 25 and M
<_-50.
If it is
desired
to
remove either
or
both
of
these .restrictions, then
it is
necessary
to
increase
the
storage
available
in
some
or all of
these areas.
This
can
be
done
by
defining
the
named COMMON areas
in the
users
MAIN
with
the
increased storage settings,
in
which
case
the
extra storage
will
be
effective throughout
the
whole program.
The
complete list
of
labeled
COMMON used
by
ALAG1
and the
corresponding
values
of N and M are as
follows.
COMMON/ALAGC/F,M,K,IS,MK,NU
independent
of N and M
D/G(50)
2N
E/C(150)
3M
" .
F/GC(25,50)
N,M
"
G/T(150)
3M
H/GP(50)
u (u =
max(M,N))
"
I(G2P(325)
. N-
(N+D/2
J/V(50)
u
K/WW(150)
3y
L/W(2500)
u
2
M/ZZUOO)
2p
N/LTUOO)
. 2M
48
4.
ACCURACY: This iterative.algorithm terminates normally when
.the
following convergence condition
is
met:
max { | c . (x) |
/scaling factor
for c.}
_<_
AKMIN
for
i an
element
of the set of
active constraint indices.
A
diagnostic message
for
abnormal termination
is
printed
when
the
program
is
unable
to
achieve
the
requested
accuracy. This
may be due to (i) a
mistake
in
programming ALAGB,
(ii)
there
is no
feasible
point
(in
which case
a
.-*•<*>
and c.
->
constant
^ 0),
(iii)
EPS has
been
set too
large relative
to
AKMIN,
(iv)
the
problem
is too
ill-conditioned.
OTHER ROUTINES: ALAG1 requires
the use of
ALAGB, ALAGZ, BQDMA,
MULDA, MULDB, MULDE,
and
QNWTA
ALAGB: USER SUBROUTINE
The
user must define
a
subroutine headed
by
SUBROUTINE
ALAGB(N,M,X)
REAL
X(l)
COMMON/ALAGC/F
COMMON/ALAGD/G(50)
"
COMMON/ALAGE/C(150)
COMMON/ALAGF/GC(25,50)
This subroutine takes
the
vector
X and
sets
(1)
F(x)
in F; (2)
CjCx),...,
c
m
(x)
in
C(l),...,C(M);
(3)
(3F/
3X
,...,3F/
9X
)|- in
G(l),...,G(N);
In
(4) (8c /
,...,3c
/ )|- in
A
l n
GC
(N,I)
for I =
1,;..,M.
ALAGZ: This subroutine evaluates
the
augmented function comprised
of
the
original objective function
and
penalty terms
that
is to be
optimized.
SUBROUTINE ALAGZ
(N, X,
PHI, GPHI)
N
and X as
previously defined.
PHI is the
value
of the
augmented function evaluated
at X.
GPHI
is the
gradient
of the
augmented function evaluated
at X.
BQDMA:
The
purpose
of
BQDMA
is to
find
the
values
that
minimize
a
quadratic
of n
variables subject
to
upper
and
lower bounds
on
some
or
all of the
variables.
.
The
quadratic
is
defined
by
Q(X)
= 1/2 X*- AX - B^t
Subject
to:
BL. <_ X. <_ BU. i = 1,. . . ,N.
SUBROUTINE BQDMA (N,A,IA,B,BL,BU,X,Q,LT,K,G)
N
. an
INTEGER which must
be.set
by the
user
to the
number
of
variables.
A
a
REAL,
two
dimensional array, each dimension
at
least
N;
the
elements
in the
upper triangle
A(I,J)
Ij^J^N must
be set by the
user
to the
corresponding
A., in
(1), and.
.
will
remain untouched
by the
subroutine. Elements
A(I,J)
N>I>J
are
used,
as
working
space.
IA
an
INTEGER giving
the
first dimension
of A in the
statement which
assigns
space
to A.
B
a
REAL
array
of at
least
N
elements.
The
user must
set
B(I).
B is not
overwritten
by
BQDMA.
50
BL
a
REAL array
of at
least
N
elements.
The
user must
set
BL(I)
to the
lower bound
on the I
variable.
If
the
bound
is
non-existent,
set it to a
very small
number like
-1E75.
BL is not
overwritten
by
BQDMA.
BU
a
REAL array
of at
least
N
elements.
The
user must
set
BU(I)
to the
upper bound
on the I
variable.
If
the
bound
is
non-existent,
set it to a
very large
number.
BU is not
overwritten
by
BQDMA.
X a
REAL array
of at
least
N
elements. BQDMA returns
the
solution
in
X(I).
Q a
REAL variable
in
which BQDMA returns
the
solution
value
of the
quadratic.
LT an
INTEGER array
of at
least
N
elements,
set by
BQDMA
to a
permutation
of the
integers
1,2,...,N
(see
K and G
below)
K an
INTEGER
set by
BQDMA
to the
number
of
free
variables
at the
solution (those
not 'on
their bounds).
These
are the
variables
LT(1), LT(2),...,LT(K).
G a
REAL array
of at
least
3*N
elements.
G(l), ,G(N)
are set by
BQDMA
to the
gradient evaluated
at the
solution point.
G is
indirectly addressed
so
that
G(I)
contains
the
gradient with respect
to the
LT(I)
variable, whence
G(l),....,G(K)
will
be
found
to be
zero.
G(N+1),...,G(3*N)
are
used
by
BQDMA
as
working
space.
51
MULDA
is a
subroutine
for use in
problems which involve
the
T
addition
or
subtraction
of
rank
one
matrices
a zz to
positive
definite
or
semi-definite symmetric matrices
A
stored
in
factored
T
form
A = LDL ,
such that
the
resulting
N x N
matrix
T
A
= A + a zz
is
also
known
to be
positive definite
or
semi-definite. Note
that
L is
lower triangular with £..=1,
and D is
diagonal with
d. > 0.
11
i
SUBROUTINE MULDA
(A, N, Z,
SIG,
W, IR, MK,
EPS)
A
A
REAL
one
dimensional array
of
N*(N+l)/2
elements
T
in
which
the
matrix A=LDL must
be
given
in
factored
form.
The
order
in
which elements
of L and D are
stored
is
d
1
>
£
21'
£
31'""
£
Nl'
d
2'
£
32"
<
"
£
N2'
d,,
,,£,„,
,
,rd,,.
The
factors
of the
matrix
N-l
N,N-1
N .
T
A
= A + a zz
will overwrite those
of A on
exit.
N
An
INTEGER (N>jL) which must
be set to the
dimension
of
the
problem.
Z A
REAL
one
dimensional array
of N
elements
in
which
the
vector
z
must
be
set,
The
array
Z is
overwritten
by the
routine.
SIG
A
REAL variable
in
which
the
scalar
a
must
be
set.
SIG is not
restricted
to +., but if
SIG<0 then
it
must
be
known
from other considerations that
A is
positive definite
or
semi-definite, apart from
the
effects
of
round-off error.
52
W
A
REAL array
of N
elements.
If
SIG>0 then
W is
not
used,
and the
name
of any pne
dimensional array
can be
inserted
in the
calling sequence.
If
SIG<0 then
W is
used
as.
work space.
In
addition
for .
SIG<0
it may be
possible
to
save time
by
setting
in
W the
vector
v
defined
by
Lv=z.
The
ways
in
which
this
can
occur
are
described under
MK
below.
IR An
INTEGER
to be
.set
so
that ,|lR|
is the
rank
of A.
If
the
rank
of A is
expected
to be
different from
that
of
A, set
IR<0.
On
exit from MULDA,
IR(^O)
will
contain
the
rank
of A.
MK An
INTEGER
to be set
only when SIG<0,
as
follows.
If
the
vector
v
defined
by
Lv=z
has not
been calculated
previously,
set
MK=0.
If
MULDA
has
been used previously
to
calculate
A ' z,
then
v is a
by-product
of
this
calculation
and is
stored
in the W
parameter
of
MULDE.
In
this case transfer
v to the W
parameter
of
MULDA
and
set
MK=1.
If z has
been calculated
as z = Au for
some arbitrary vector
u
using
MULDD,
then again
v is
a
by-product
of the
calculation
and is
available
in the
W
parameter
of
MULDD.
In
this case
(or any
other
in
which
v is
known)
set v in the W
parameter
of
MULDA
and
set
MK=2.
EPS A
REAL variable
to be set
only
when
SIG<0
and A is
expected
to
have
the
same
rank
as A. In
certain ill-
53
conditioned cases
a
non-zero diagonal element
of
D
might become
so
small
as to be
indeterminate.
. Two
courses
of
action
are
possible.
One is to
introduce
a
small perturbation
in
order that
A
keeps
the
same rank
as A.
This
is the
normal course
of
action
and is
achieved
by
setting
EPS
equal
to
the
relative machine precision
e. The
other course
of
action
is to let the
rank
of A be one
less than
the.
rank
of A.
This
is
achieved
by
setting
EPS
equal zero.
MULDB
-
factorizes
a
positive definite symmetric matrix given
in
.
. A.
This matrix
is
then
used
in
MULDA.
SUBROUTINE MULDB
(A, N, 1R)
A
Must contain
the
elements
of A in the
order
a
ll'
a
21'
' ' *
'
a
Nl'
a
22'
a
32'
' '
-
3
N2'
' ' '
'Vl.N-l^N.N-l^
that
is as
successive columns
of its
lower triangle).
On
exit
A
will
be
overwritten
by the
factors
L and D
in the
form described
in
MULDA.
N .
Order
of the
matrix
A,
IR
An
INTEGER
set by
MULDB
to the
rank
of the
factori-
zation.
If the
factorization
has
been performed
successfully IR=N will
be
set.
.If
IR<N
then
the
factorization
has
failed because
A is not
positive
definite (possibly
due to
round-off
error).
In
this
case
the
factors
of a
positive semi-definite matrix
54
of
rank
IR
will
be
found
in A.
However
the
results
of
this calculation
are
unpredictable,
. and
MULDB should
not be
used
in an
attempt
to
factorize
positive semi-definite matrices.
MULDE calculates
the
vector
z* = A z
where
A is in
factored form
SUBROUTINE MULDE
(A, N, Z, W, IR)
A
Must
be set in
factored form.
N
Order
of the
matrix
A.
2 A
REAL array
of N
elements
to be set to the
vector
z.
_1
On
exit
Z
contains
the
vector
z* = A z.
W A
REAL array
of N
elements which
is set by
MULDE
to
be
vector
v
defined
by
Lv=z.
If
this vector
is
not
of
interest, replace
W by Z in the
calling
sequence
to
obviate
the
need
to
supply extra storage.
IR An
INTEGER which must
be set to the
rank
of A.
QNWTA finds
the
minimum
of a
function F(x)
of
several variables
given that
the
gradient vector
can be
calculated. This routine
is
based
upon
a
quasi-Newton method described
by
Fletcher
in
(F8).
SUBROUTINE QNWTA
(FUNCT,
N, X, F, G, H, W,
DFN, EPS, MODE, MAXFN,
IPRINT,
IEXIT).
FUNCT
An
IDENTIFIER
of the
users subroutine.
N An
INTEGER
to be set to the
number
of
variables
(N
_>_
2) .
X A
REAL ARRAY
of N
elements
in
which
the
current estimate
of
the
solution
is
stored.
An
initial approximation
55
must
be set in X on
entry
to
QNWTA
and the
best
estimate
obtained will
be
returned
on
exit.
F A
REAL number
in
which
the
best value
of
F(x)
corresponding
to X
above
will
be
returned.
G A
REAL ARRAY
of N
elements
in
which
the
gradient
vector
corresponding
to X
above will
be
returned.
Not
to be set on
entry.
H A
REAL ARRAY
of
N*(N+l)/2
elements
in
which
an
estimate
of the
hessian matrix
is
stored.
The
matrix
is
represented
in the
product
form
LDL
where
L is a
lower triangular matrix with unit
diagonals
and D is a
diagonal matrix.
The
lower
triangle
of L is
stored
by
columns
in H
excepting
that
the
unit diagonal elements
are
replaced
by
the
corresponding elements
of D. The
setting
of
H on
entry
is
controlled
by the
parameter MODE.
W
A
REAL ARRAY
of 3*N
elements used
as
working space.
DFN ' . A
REAL number which must
be set so as to
give QNWTA
an
estimate
of the
likely
reduction
to be
obtained
in
F (x) .. DFN is
used only
on the
first
iteration
so
an
order
of
magnitude estimate will suffice.
The
information
can be
provided
in
different ways
depending upon
the
sign
of DFN
which should
be set
in
one of the
following
ways:
56
DFN>0
the
setting
of DFN
itself
will
be
taken
as the
likely reduction
to be
obtained
in
F(x).
DFN=0
it
will
be
assumed that
an
estimate
of
the
minimum value
of
F(x)
has
been
set
in
argument
F, and the
likely reduction
in
F(x)
will
be
computed
according
to
the
initial function value.
DFN<0
a
multiple |DFN|
of the
modulus
of the
initial function value will
be
taken
as
an
estimate
of the
likely reduction.
EPS A
REAL ARRAY
of N
elements
to be set on
entry
to
the
accuracy required
in
each element
of X.
MODE
An
INTEGER
which controls
the
setting
of the
initial
estimate
of the
hessian matrix
in the
parameter
H.
The
following settings
of
MODE
are
permitted.
MODE=1
An
estimate corresponding
to a
unit
matrix
is set in H by
QNWTA.
MODE=2 QNWTA assumes
that
the
hessian matrix
itself
has
been
set in H by
columns
of
its
lower triangle,
and the
conversion
.
p
x
to
LDL
form
is
carried
out by
QNWTA.
The
hessian matrix must
be
positive definite.
MODE=3 QNWTA
assumes
that
the
hessian matrix
has
been
set in H in
product form.
This
is.
57
MAXFN
IPRINT
IEXIT
convenient when using
the H
matrix
from
one
problem
as an
initial
estimate
for
another,
in
which case
the
contents
of H are
passed
on
unchanged.
An
INTEGER
set to the
maximum number
of
calls
of
FUNCT
permitted.
An
INTEGER controlling printing. Printing occurs
every
|IPRINT|
iterations
and
also
on
exit,
in the
form
Iteration
No, No of
calls
of
FUNCT,
IEXIT
(on
exit
only).
Function value
X(1),X(2),...,X(N)
8 to a
line
.
G(l),G(2),...,G(N)
8 to a
line
The
values
of. X and G can be
suppressed
on
inter-
mediate iterations
by
setting IPRINT<0.
All
intermediate printing
can be
suppressed
by
setting
IPRINT=MAXFN+1.
All
printing
can be
suppressed
by
setting
.IPRINT=0.
An
INTEGER giving
the
reason
for
exit from QNWTA.
This will
be set by
QNWTA
as
follows:
IEXIT=0
(MODE=2
only).
The
estimate
of the
hessian matrix
is not
positive definite.
IEXIT=1
The
normal exit
in
which
|DX(1)|<EPS(I)
for
all
1=1,2,...N,
where
DX(I)
is the
change
in X on an
iteration.
58
T
IEXIT=2
G
DX>^0.
Not
possible without rounding
error. Probable cause
is
that
EPS is
set
too
small
for
computer
word length.
IEXIT=3
FUNCT called MAXFN times.
59
COMPUTER PROGRAM: ALAG
2
LANGUAGE: FORTRAN
TECHNICAL
REFERENCES: (F8), (P5)
60
The
ALAG2 program differs from
ALAG1
only
in the
type
of
uncon-
strained
optimizer routine employed. Therefore,
this
section will only
document
this, routine
and the
user
is
referred
to the
documentation
on
ALAG1 (except
for the
ALAG1 routine,
QNWTA)
as
being applicable
to
ALAG2.
The
unconstrained
optimizer
routine
for
ALAG2
is
VAMMA. The.
purpose
of
VAMM
is to
calculate
the
minimum value
of a
multivariate
function. This routine uses
the
BFGS variable metric
method
without
line searches
of the
type
analyzed
by
Powell (P5).
SUBROUTINE VAMMA (FUNG,
N, X, F, G,
SCALE, ACC,
W,
MAXFN)"
FUNG
The
name
of the
subroutine provided
by the
user.
It
must
be
declared
in an
EXTERNAL
statement.
N
An
integer whose value must
be set to the
number
of
variables.
X An
array
of at
least
n
elements,
set by the
user
to
initial values
of the
variables
(x, ,x
,...,x
).
Usually computing time
is
1 2. n
saved
if
these
estimates
are
close
to the
final
solution.
They
are
changed automatically
to
the
values that give
the
least calculated
value
of the
objective function.
F A
real variable
that
is set
automatically
to the
least calculated value
of the
objective function.
G An
array
of at
least
n
elements
that
are set
automatically
to the
components
of the
first
61
SCALE
ACC
W
MAXFN
derivative
vector
of F for the
final values
of
the
variables. Small values indicate
a
successful calculation.
An
array
of at
least
n
elements, whose
i
component
(l<i<n)
must
be set to a
positive value
that
is a
suitable change
to
make
to x.
initially
in
the
minimization calculation. About
10% of
the
total
expected
change
in x. is
often
a
good
value. This array
is
called SCALE because
its
elements should reflect
the
relative
sizes
of
V.X.,
, X . . . , X ) .
1 2. n
A
real number that defines
the
required accuracy.
The
calculation finishes when,
for
i=l,2,...,n.
changes
in x. of
size
ACC*SCALE(i)
do not
reduce
the
objective function. When
in
doubt
about
the
value
of ACC it is
usually best
to
choose
a
small
value.
An
array
of at
least
-rn(n+13)
elements that
is
used
as
working
space.
On
exit from
the
subroutine
the
first
yn(n+l)
locations
of W
give
the
final approxi-
mation
of the
second derivative matrix, stored
in
the
factored form used
by
subroutine
MULDA.
An
INTEGER
set to the
maximum number
of
calls
of
FUNC permitted.
62
BLOCK COMMON
for
VAMMA
COMMON/VAMMA/IPRINT,LP,MAXFUN,MODE,NFUN
The
five integers called IPRINT,LP,MAXFUN,MODE
and
NFUN
are
present
in
a
common block
in
order that they
can be
reached
by the
user.
In
most calculations they
can be
ignored,
but
sometimes they
are
useful, their purpose being
as
follows.
IPRINT
This
has a
default value
of
zero,
and is
unchanged
by
VAMMA.
If
IPRINT=0,
then
no
printing
occurs except perhaps
the
diagnostic
message mentioned below. Otherwise
the
value
of
the
.objective function
is
printed every
|IPRINT|
iterations.
If
IPRINT>0
the
values
of
X(.)
and
G(.)
are
printed also.
If
IPRINT/0
the
final values
of
F,X(.)
and
G(.)
are
always
printed.
LP .
This
has a
default value
of 6, and is the
stream
number
for any
output
from VAMMA.
MAXFUN This
has a
default value
of
zero,
in
which case
it
does
not
influence
the
calculation. However,
if
it is
positive, then VAMMA
finishes
automatically
when
the
user subroutine
is
called MAXFUN times.
Normal convergence
can
occur earlier.
MODE This
has a
default value
of
one,
in
which case
the
initial approximation
to the
second derivative
matrix
is set
automatically
to a
positive diagonal
63
matrix.
However,
if a
suitable positive
definite approximation
is
known,
then
it may
be
passed
to
VAMMA
in the
first
-rn(n+l)
locations
of W by
setting MODE=2
or
MODE=3.
When MODE=2 these elements
of W
must
contain
the
lower triangle
of the
Hessian approximation,
B
say,
in the
order
B
u>
B
2
i'
B
3i'
»
B
n
i'
B
,B
,...,B ,...,B
B B .
When
2.2.
32 n2 n-1 n-1 n n-1 n n
MODE=3
the
Hessian approximation
must
be
given
in
the
factored form used
by
subroutine MULDA,
which
is
also
the
form used
to
provide
the
Hessian
approximation
in W at the
return from
VAMMA.
A
check
for
positive
definiteness
is
made
automatically
by
VAMMA,
and if it
fails
a
diagnostic
message
is
printed.
In
this case
the
calculation
proceeds
as
though
MODE-1,
but the
actual value
of
MODE
is not
changed.
NFUN
This
integer
is set by
VAMMA
to the
number
.of
times
it-calls
the
user subroutine.
64
COMPUTER
PROGRAM:
ALAG3
LANGUAGE: FORTRAN
TECHNICAL REFERENCES: (F8)
65
The
ALAG3 program differs from ALAG1
and
ALAG2
in the
type
of
unconstrained optimizer routine employed. Therefore, this section
.
will
only document this routine
and the
user
is
referred
to the
documentation
on
ALAG1
as
being applicable
to
ALAG3.
The
unconstrained
optimizer routine employed within ALAG3
is
referred
to as
FDQNW.
The
purpose
of
FDQNW
is to
calculate
the
minimum value
of a
multivariate
function.
The
method
used
is the
quasi-Newton method
of
ALAG1
in
which
derivatives
are
estimated
by
finite difference techniques.
SUBROUTINE FDQNW (FUNCT,
N, X, F, G, H, W,
DFN,
XM, HH,
EPS, MODE,
MAXFN, IPRINT,
IEXIT)
FUNCT
The
name
of the
subroutine provided
by the
user.
It
must
be
declared
in an
EXTERNAL
statement.
N ' An
INTEGER
to be set to the
number
of
variables
(N>.2).
X . . " A
REAL ARRAY
of N
elements
in
which
the
current
estimate
of the
solution
is
stored.
An
initial
approximation must
be set in X on
entry
to
FNQNW
and
the
best estimate obtained will
be
returned
on
exit.
F ' A
REAL number
in
which
the
best value
of
F(x)
corresponding
to X
above will
be
returned.
G A
REAL ARRAY
of N
elements which
is
used
to
store
an
estimate
of the
gradient vector
VF(x).
Not
to
be set on
entry.
66
H A
REAL ARRAY
of
N*(N+l)/2
elements
in
which
an
2
estimate
of the
hessian matrix
9
F/(3x.8x.)
is
stored.
The
matrix
is
represented
in the
product
T
form
LDL
where
L is a
lower triangular matrix
with unit diagonals
and D is a
diagonal matrix.
The
lower triangle
of L is
stored
by
columns
in
H
excepting
that
the
unit diagonal elements
are
replaced
by the
corresponding elements
of D. The
setting
of H on
entry
is
controlled
by the
parameter MODE.
W A
REAL ARRAY
of 3*N
elements used
as
working
space.
DFN A
REAL number which must
be set so as to
give FDQNW
an
estimate
of the
likely reduction
to be
obtained
in
F
(x).
DFN is
used only
on the
first iteration
so
an
order
of
magnitude estimate
will
suffice.
The
information
can be
provided
in
different
ways
depending
upon
the
sign
of DFN
which should
be set
in one of the
following
ways:
DFN>0
the
setting
of DFN
itself
will
be
taken
as the
likely reduction
to be
obtained
in F
(x).
DFN=0
it.will
be
assumed
that
an
estimate
of
the
minimum value
of F (x) has
been
set
in
argument
F, and the
likely reduction
in
F (x)
will
be
computed according
to
the
initial
function value.
67
DFN<0
a
multiple
|DFN|
of the
modulus
of
the
initial function value will
be
taken
as
an
estimate
of the
likely reduction.
XM . A
REAL ARRAY
of N
elements
to be set on
entry
so
that
XM(I)
> 0
contains
an
indication
of the
magnitude
of
X(I).
This quantity need
not be set
precisely
as it is
merely
used
in
scaling
the
problem.
HH A
REAL number
to be set so
that
HH*XM(I)
contains
a
step
length
to be
used
in
calculating
G(I)
by
differences.
Set HH
equal
to 2
where
t is the
number
of
significant binary digits
in the
calculation
•of
F.
EPS A
REAL number
to be set on
entry
so
that
the
accuracy
required
in
X(I)
is
EPS*XM(I)
for all I,
(EPS
> 0).
MODE
An
INTEGER which controls
the
setting
of the
initial
estimate
of the
hessian matrix
in the
parameter
H.
The
following settings
of
MODE
are
permitted.
MODE=1
An
estimate corresponding
to a
unit
matrix
is set in H by
FDQNW.
MODE=2 FDQNW assumes that
the
hessian matrix
itself
has
been
set in H by
columns
of its
lower triangle,
and the
conversion
T
to
LDL
form
is
carried
out by
FDQNW.
The
hessian matrix must
be
positive definite.
68
MAXFN
IPRINT
IEXIT
MODE=3 FDQNW assumes
that
the
hessian matrix
has
been
set in H in
product
form.
This
is
convenient when using
the H
matrix from
one
problem
as an
initial estimate
for
another,
in
which case
the
contents
of H are
passed
on
unchanged.
An
INTEGER
set to. the
maximum number
of
calls
of
FUNCT permitted.
Up to 2N
more calls
may be
taken
if
the
limit
is
exceeded whilst evaluating
a
gradient
vector
by
differences.
An
INTEGER controlling printing. Printing occurs
every
|IPRINT|
iterations
and
also
on
exit,
in the
form
Iteration
No.,
No of
calls
of
FUNCT,
IEXIT
(on
exit
only).
Function value
A(1),X(2),
.-.
.,X(N)
8 to a
line.
G(1),G(2),...,G(N)
8 to a
line
The
values
of X and G can be
suppressed
on
inter-
mediate iterations
by
setting IPRINT<0.
All
intermediate
printing
can be
suppressed
by
setting
IPRINT=MAXFN+1.
All
printing
can be
suppressed
by
setting
IPRINT=0.
An
INTEGER giving
the
reason
for
exit from FDQNW.
This
will
be set by
FDQNW
as
follows:
69
IEXIT=0
(MODE=2
only).
The
estimate
of
the
hessian matrix
is not
positive
definite.
IEXIT=1
The
normal exit
in
which
|DX(I)|<EPS(I)
for
all
1=1,2,...,N, where
DX(I)
is
the
change
in X on an
iteration.
T
IEXIT=2
G
DX>0.
Either
due to
rounding errors
because
EPS is set too
small
for the
computer
word length,
or to the
truncation error
in the
finite difference
formula
for G
being dominant.
IEXIT=3
FUNCT called MAXFN times.
70
REFERENCES
Al.
Arrow, K.J.,
et al. ,
Studies
in
Linear
and
Nonlinear Programming,
Stanford
University
Press,
Stanford, California, 1958.
A2.
Arrow, K.J.,
et
al.,
"A
General Saddle Point Result
for
Cons'trained
Optimization", Mathematical Programming,
5, pp.
225-234,
1973.
A3.
Armacost, R.L.,
and
A.V. Fiacco, "Exact Sensitivity Using
Augmented
Lagrangians",
The
George Washington University, School
of
Engineering
and
Applied
Science,
Scri.nl
T-349,
22
March
1977.
Bl.
Bertsekas, D.P., "Convergence
Rate
of
Penalty
and
Multiplier
Methods",
Proceedings
of
1973 IEEE Conference
on
Decision
and
Control,
S.an Diego, California,
pp.
260-264.
IV2.
Hertsekas, D.P., "Combined Primal-Dual
and
Penalty Methods
for
Constrained
Minimization", SIAM Jourmal
on
Control,
13(3),
pp.
521-544,
May
1975.
B3.
Bertsekas, D.P.,
"On
Penalty
and
Multiplier
Methods
for
Constrained
Minimization",
in
Nonlinear Programming
2, (
eds.
,
O.L. Mangasarian,
et
al. ),
Academic Press,
pp.
165-189,
1975.
B4.
Bertsekas, D.P., "Multiplier Methods
: A
Survey", Automatica,
12,
pp.
133-145, 1976.
B5.
Betts,
J.T.,
"An
Improved Penalty Function
Method
for
Solving
Constrained
Parameter Optimization Problems", Journal
of
Optimization
Theory
and
Applications,
16(1/2),
pp.
1-24, July 1975.
B6.
Betts, J.T.,
"An
Accelerated Multiplier
Method
for
Nonlinear
Programming", Journal
of
Optimization Theory
and
Applications,.
21(2),
pp.
137-174, Feb. 1977.
B7.
Buys, J.D., Dual Algorithms
for
Constrained Optimization,
University
of
Leiden, Leiden, Holland,
Ph. D.
Thesis, 1972.
B8.
Bertsekas, D.P.,
"On the
Convergence Properties
of
Second-order
Multiplier
Methods", Journal
of
Optimization Theory
and
Applications,
25(3)",
pp.
443-449,
July 1978.
B9.
Buys,
J.I).,
and R.
Gonin,
"The
use of
Augmented
Lagrnngian
Functions
Tor
Sensitivity Analysis
in
Nonlinear Programming",
Mathematical
Programming,
12, pp.
281-284,
1977.
.
71
Dl.
Davis,
D., and
W.H. Swann, "Review
of
Constrained Optimization",
in
Optimization, (ed.
, R.
Fletcher), Academic Press,
pp.
187-202, 1969.
D2.
Dixon,
L.C.W.,
"Nonlinear Programming
: A
View
of the
State
of
the
Art",
in
Towards Global Optimization, (eds., L.C.W..Dixon
and
G.P.-
Szego), North Holland Publishing Co., 1975.
D3.
Dixon,
L.C.W.,
"Optimization
in
Action", Academic
Press,
1976.
El.
Evans, J.P.,
et al. ,
"Exact Penalty Functions
in
Nonlinear
Programming", Mathematical Programming,
4, pp.
72-97,
1973.
E2.
Everett,
H. ,
"Generalized Lagrange Multiplier Method
for
Solving
Problems
of
Optimum Allocation Resources",
Operations
Research,
11, pp.
399-417,
1963.
Fl.
Fiacco, A.V.,
and
G.P. McCormick, Nonlinear Programming
:
Sequential
Unconstrained Minimization Techniques, John Wiley, 1968.
F2.
Fletcher,R.,
and A. P.
McCann, "Acceleration Techniques
for
Nonlinear
Programming",
in
Optimization, (ed.
, R.
Fletcher),
Academic
Press,
pp.
203-214, 1969.
F3.
Fletcher,
R., "A
Class
of
Methods
for
Nonlinear Programming
with
Termination
and
Convergence Properties",
in
Integer
and
Nonlinear
Programming, (ed.,
J.
Abadie),
North
Holland
Publishing
Company,
pp.
157-175, 1970.
F4.
Fletcher,
R., and
S.A. Lill,
"A
Class
of
Methods
for
Nonlinear
Programming,
II
Computational Experience",
in
Nonlinear Programming,
(eds.,
J.B. Rosen,
et
al.), Academic Press,
pp.
67-92,
1970.
F5.
Fletcher,
R., "A
Class
of
Methods
for
Nonlinear Programming,
III
Rates
of
Convergence",
in
Numerical Methods
for
Nonlinear
Optimization, (ed., F.A. Lootsma), Academic Press,
pp.
371-
1972.
F6.
Fletcher,
R., "An
Exact Penalty Function
for
Nonlinear
Programming with
Inequalities",
Mathematical Programming,
5,
pp.
129-150, 1973.
F7.
Fletcher,
R.,
"Methods Related
to
Lagrangian Functions",
in
Numerical
Methods
for
Constrained Optimization,
(eds.,
P.E. Gill
and
W.
Murray), Academic Press,
pp.
219-240, 1974.
F8.
Fletcher,
R., "An
Ideal Penalty Function
for
Constrained
Optimization",
in
Nonlinear Programming,
2,
(eds., O.L. Mangasarian
et
nl.),
Academic
Press,
pp.
121-163, 1975.
72
Gl.
Geoffrion, A.M., "Duality
in
Nonlinear Programming
: A
Simplified
Applications-Oriented Development",
in
Perspectives
on
Optimization, (ed., A.M.
Geoffrion),
Addison Wesley Publishing
Company,
1972.
G2.
Gill, P.E.,
and W.
Murray, "Algorithms
and
Future Developments"
in
Numerical Methods
for
Constrained Optimization, (eds.
,
P.E."
Gill
and W.
Murray),
Academic Press,
pp.
249-251, 1974.
G3.
Gill, P.E.,
and W.
Murray, Numerical Methods
for
Constrained
Optimization,
Academic Press, 1974.
G4.
Gruver, W.A.,
and
N.H. Engersbach,
"A
Variable Metric Algorithm
for
Constrained Minimization Based
on an
Augmented
Lagrangian",
International
Journal
for
Numerical
Methods
in
Engineering,
10, pp.
1047-1056,
1976.
G5.
Griver, W.A.,
and C.
Shafroth, "Gradient Projection
and
Reduced
Gradient
Algorithms Based
on an
Augmented
Lagrangian",
North
Carolina
State
University
at
Raleigh, O.R.
Report
No.
108,
March
25,
1976.
HI.
Haarhoff, P.C.,
and
J.D. Buys,
"A New
Method
for the
Optimization
of
a
Nonlinear Function Subject
to
Nonlinear Constraints",
Computer
Journal,
13(2),
pp.
178-184,
May
1970.
H2.
Ilestenes,
M.R., "Multiplier
and
Gradient Methods",
in
Computing
Methods
in
Optimization Problems,
2,
(eds.,
L.A. Zadeh,
et
al.)
,
Academic Press,
New
York,
pp.
143-164,
1969.
H3.
Hestenes, M.R., "Multiplier
and
Gradient Methods", Journal
of
Optimization Theory
and
Applications, 4(5),
pp.
303-320,
1969.
H4.
Hes-tenes,
M.R.
,
Optimization
Theory
: The
Finite Dimensional
Case, John Wiley, 1975.
H5.
Householder, A.S.,
The
Theory
of
Matrices
in
Numerical Analysis,
Blaisdell,
New
York, 1964.
J.I.
John,
F. ,
"Extreme Problems with Inequalities
and
Side
Conditions",
in
Studies
in
Essays, Courant Anniversary
Volume
(eds.,
K.O.
Friedricks
et
al.), Wiley,
New
York,
pp.
187-204,
1948.
Kl.
Karush,
W.,
"Minima
of
Several Variables with Inequalities
as
side
Conditions",
Master
of
Science Dissertation,
Department
of
Mntlienuitics,
University
of
Chicago,
December
1939.
73
K2.
Kort, B.W.
, and
D.P. Bertsekns,
"A New
Penalty
Function
Method
for
Constrained Minimization", Proceedings
of
1972
Conference
on
Decision
and
Control,
New
Orleans, La.,
pp.
162-166,
Dec. 1972.
K.3.
Kort, B.W.
, and
D.P. Bertsekas, "Multiplier
Methods
for
Convex Programming", Proceedings
of
1973 IEEE Conference
on
Decision
and
Control, San.Diego, California,
pp.
428-432,
Dec. 1973.
K4.
Kort, B.W., "Rate
of
Convergence
of the
Method
of
Multipliers
with Inexact Minimization",
in
Nonlinear Programming,
2,
(eds., O.L. Mangasarian,
et
al.), Academic Press,
pp.
193-214, 1975.
K5.
Kuhn, H.W.,
and
A.W. Tucker, Nonlinear Programming,
Proceedings
of the
Second Berkeley Symposium
on
Mathematical
Statistics
and
Probability
(ed.
, J.
Neymnn),
University
.of
California Press, Berkeley,
pp.
481-492,
1951.
K6.
Kuhn, H.W.
,
"Nonlinear Programming
: A
Historical Review",
in
Nonlinear Programming, Volume
IX,
SIAM
- AMS
Proceedings,
AMS, 1976.
LI.
Lasdon,
L. ,
Optimization Theory
for
Large Systems Macmillan,
New
York, 1970.
L2.
Lill, S.A., "Generalization
of an
Exact Method
for
Solving
Equality
Constrained Problem
to
Deal with Inequality Constraints",
in
Numerical Methods
for
Nonlinear Optimization, (ed., F.A.
Lootsma), Academic
Press,
pp.
383-394,
1973.
L3..
Lootsma, F.A.
, "A
Survey
of
Methods
for
Solving Constrained
Minimization Problems
via
Unconstrained Minimization",
in
Numerical
Methods
for
Nonlinear Optimization, (ed.
,
F.A. Lootsma),
Academic
Press,
pp.
313-348,
1972.
L4.
Luenberger, D.G., Optimization
by
Vector Space Methods,
John
Wiley,
New
York,
1969.
L5.
Luenberger, D.G., Introduction
to
Linear
and
Nonlinear
Programming, Addison Wesley Company, 1973.
Ml.
Mangasarian, O.L.
,
Nonlinear Programming, McGraw Hill, 1969.
M2.
Mangasarinn, O.L.
,
"Unconstrained Lagrangians
in
Nonlinear
Programming", SIAM Journal
on
Control,
13(4),
pp.
772-791,1975.
M3.
Mangasarinn, O.L. /'Unconstrained Methods
in
Nonlinear Programming",
SIAM
- AMS
Proceedings,
9, pp.
169-184, 1976.
74
MA.
Martensson,
K., "A New
Approach
to
Constrained Function
Optimization", Journal
of
Optimization Theory
and
Applications,.
12(6),
pp.
531-554,
1973.
M5.
McCormick, G.P., "Penalty Function Versus Nonpenalty Function
•Methods
for
Constrained Nonlinear Programming Problems",
Mathematical Programming,
1, pp.
217-238,
Nov. 1971.
M6.
Miele,
A., et
al., "Use
of the
Augmented Penalty Function
in
Mathematical
Programming Problems, Part
1",
Journal
of
Optimization Theory
and
Applications, 8(2),
pp.
115-130, 1971.
M7.
Miele,
A., et
al., "Use
of the
Augmented Penalty Function
in
Mathematical
Programming Problems, Part
2",
Journal
of
Optimization
Theory
and
Applications, 8(2),
pp.
131-153, 1971.
M8.
Miele,
A., et
al.,
"A
Modification
of the
Method
of
Multipliers
for
Mathematical
Programming Problems",
in
Techniques
of
Optimization,
(ed., A.V. Balakrishnan), Academic Press,
pp.
247-260,
1972.
M9.
Miele,
A., et
al.,
"On the
Method
of
Multipliers
for
Mathematical Programming
Problems",
Journal
of
Optimization
Theory
and
Applications,
10, pp.
1-33, 1972.
M10. Murray,
W., "An
Algorithm
for
Constrained Minimization",
in
Optimization (ed.
, R.
Fletcher), Academic Press, 1969.
Mil.
Murray,
W.,
Numerical Methods
for
Unconstrained Optimization,
Academic
Press, 1972.
M12. Murray,
W.,
"Methods
for
Constrained Optimization",
in
Optimization.in Action, (ed.,
L.C.W.
Dixon),
Academic Press,
pp.
217-354,
352-363,
1976.
M.13.
McCormick, G.P., "Optimality Criteria
in
Nonlinear Programming",
.
in
Nonlinear Programming, Volume
IX,
SIAM -.AMS Proceedings,
AMS, 1976.
01.
Osborne, M.R.,
and
D.M. Ryan,
"A
Hybrid Algorithm
for
Nonlinear Programming",
in
Numerical Methods
for
Nonlinear
Optimization, (ed., F.A.
Lootsma),
Chapter
28,
Academic
Press,
1972.
PI.
Pietrzykowski,
T., "An
Exact Potential
Method
for
Constrained
Maxima",
SIAM Journal
on
Numerical Analysis,
6, pp.
299-304,
1969.
P2.
Pierre, D.A.,
and
M.J.
Lowe,
Mathematical Programming
via
Augmented
l-.-igranginns
: An
Introduction
with Computer Programs
Addison-Wesley Publishing Company, 1975.
75
P3.
Powell,
M.J.D.,
"A
Method
for
Nonlinear Constraints
in
Minimization Problems",
in
Optimization, (ed.
, R.
Fletcher),
Academic
Press,
pp.
283-293,
1969.
P4.
Powell, M.J.D., "Problems
Related
to
Unconstrained Optimization",
in
Numerical Methods
for
Unconstrained Optimization,
(ed.',
W.
Murray),
Academic Press, 1972.
P5.
Powell,
M.J.D.,"A
View
of
Unconstrained Optimization",
in
Optimization
in
Action
,
(ed.,
L.C.W.
Dixon),
Academic
Press,
1976.
P6.
Powell, M.J.D., "Algorithms
for
Nonlinear Constraints
that
use
Lagrangian Functions", Mathematical Programming,
14,
pp.
224-248,
1978.
. . .
Rl.
Rockafellar, R.T., Convex Analysis, Princeton University
Press, 1970.
R2.
Rockafellar, R.T., "New Applications
of
Duality
in
Convex
Programming", Proceedings
of
Fourth Conference
on
Probability,
Brasov, Romania, 1971.
R3.
Rockafellar, R.T., "Penalty Functions
and
Augmented
Lagrangians
in
Nonlinear Programming",
in
Lecture Notes
in
Computer
Science
: 5th
Conference
on
Optimization
Techniques, Part
I,
(eds.,
R.
Conti
and A.
Ruberti),
Springler-Verlag,
pp.
418-425,
1973.
R4.
Rockafellar R.T., "The Multiplier Method
of
Hestenes
and
Powell
Applied
to
Convex Programming", Journal
of
Optimization Theory
and
Applications,
12(6),
pp.
555-562,
1973.
R5.
Rockafellar, R.T.,
"A
Dual Approach
to
Solving Nonlinear
Programming Problems
by
Unconstrained Optimization",
Mathematical Programming,
5, pp.
354-373,
1973.
R6.
Rockafellar, R.T., "Augmented Lagrange Multiplier Functions
and
Duality
in
Nonconvex Programming", SIAM Journal
on
Control,
12(2),
pp.
268-285,
1974.
R7.
Rockafellar, R.T., "Augmented Lagrangians
and
Applications
of
the
Proximal Point Algorithm
in
Convex Programming",
Mathemntica
of
Operations Research, 1(2),
pp.
97-116, 1976.
R8.
Kootle,
.I.D.
,
"Generalized Lagrangian Functions
and
Mathematical
Programming",
in
Optimization, (ed.,
R.
Fletcher),
pp.
327-338,
1969.
R9.
Hupp,
R.I).,
"On the
Combination
of the
Multiplier
Method
of
Hestenes
and
Powell with Newton Methods", Journal
of
Optimization Theory
and
Applications,
15, pp.
167-187,
1975.
76
RIO.
Rupp,
R.D., "Convergence
and
Duality
for the
Multiplier
and
Penalty Methods", Journal
of
Optimization Theory
and
Applications,
16(1/2),
pp.
99-118,
1975.
Rll. Ryan, D.M., "Penalty
and
Barrier Functions",
in
Numerical
Methods
for
Constrained Optimization,
(eds.,
P.E. Gill
and
W.
Murray),
Academic
Press,
pp.
175-190, 1974.
R12. Rockafellar, R.T., "Lagrange Multipliers
in
Nonlinear
Programming",
in
Nonlinear Programming, Volume
IX,
SIAM
- AMS
Proceedings, AMS, 1976.
R13. Rockafellar, R.T., "Solving
a
Nonlinear Problem
by Way of a
Dual Problem", Symposia Mathematica,
V , pp.
135-160
51.
Schuldt,
S.B.,
"A
Method
of
Multipliers
for
Mathematical
Programming
Problems with Equality
and
Inequality Constraints",
Journal
of
Optimization Theory
and
Applications, 17(1/2),
pp.
155-161, 1975.
52.
Sciutti,
D., "On a
Characterization
o£
Convergence
for the
Hestenes
Method
of
Multipliers", Journal
of
Optimization
.Theory
and
Applications,
22(2),
pp.
227-236,
June 1977.
53.
Shis-Ping Han, "Dual Variable Matrix Algorithms
for
Constrained
Optimization", SIAM
J.
Control
and
Optimization,
15(4),
pp.
546-565,
1977.
Tl.
Tapia, R.A., "Diagonalized Multiplier Methods
and
Ouasi-Newton
Methods
for
Constrained Optimization", Journal
of
Optimization
Theory
and
Applications,
22(2),
pp.
135-190, June 1977.
T2.
Tripathi, S.S.,
and
K.S. Narendrea, "Constrained Optimization
Problems Using Multiplier Methods", Journal
of
Optimization
Theory
and
Applications, 9(1),
pp.
59-70,
1972.
Zl.
Znagwill, W.I.,
Nonlinear
Programming
: A
Unified Approach,
Prentice-Hall, Englewood Cliffs, 1969.
Z2.
Zoutendijk,
G.,
Mathematical Programming Methods, North
Holland Publishing Company, 1976.