arXiv:1412.0791v1 [math.OC] 2 Dec 2014
A Simple Convergence Time Analysis of
Drift-Plus-Penalty for Stochastic Optimization and
Convex Programs
Michael J. Neely
University of Southern California
Abstract—This paper considers the problem of minimizing the time average of a stochastic process subject to time average constraints on other processes. A canonical example is minimizing average power in a data network subject to multi-user throughput constraints. Another example is a (static) convex program. Under a Slater condition, the drift-plus-penalty algorithm is known to provide an O(ε) approximation to optimality with a convergence time of O(1/ε²). This paper proves the same result with a simpler technique and in a more general context that does not require the Slater condition. This paper also emphasizes application to basic convex programs, linear programs, and distributed optimization problems.
I. INTRODUCTION
Fix K as a positive integer. Consider a discrete time system that operates over time slots t ∈ {0, 1, 2, ...}. Every slot t, the controller observes a random event ω(t). Assume that events ω(t) are elements in an abstract set Ω, and that they are independent and identically distributed (i.i.d.) over slots. The set Ω can have arbitrary (possibly uncountably infinite) cardinality. Every slot t, a system controller observes the current ω(t) and then chooses a decision vector y(t) = (y_0(t), y_1(t), ..., y_K(t)) within an option set Y(ω(t)) ⊆ R^{K+1} that possibly depends on ω(t). That is, Y(ω(t)) is the set of vector options available under the random event ω(t). The sets Y(ω(t)) are arbitrary and are only assumed to have a mild boundedness property (specified in Section II).
The goal is to minimize the expected time average of the resulting y_0(t) process subject to time average constraints on the y_k(t) processes for k ∈ {1, ..., K}. Specifically, for integers t > 0, and for each k ∈ {0, 1, ..., K}, define:

    ȳ_k(t) = (1/t) Σ_{τ=0}^{t−1} E[y_k(τ)]
Let c_1, ..., c_K be a given collection of real numbers. The goal is to solve the following stochastic optimization problem:

    Minimize:   lim sup_{t→∞} ȳ_0(t)                                  (1)
    Subject to: lim sup_{t→∞} ȳ_k(t) ≤ c_k  ∀k ∈ {1, ..., K}          (2)
                y(t) ∈ Y(ω(t))  ∀t ∈ {0, 1, 2, ...}                   (3)
Assume the problem is feasible, so that it is possible to satisfy the constraints (2)-(3). Define y_0^opt as the infimum value of the objective (1) over all algorithms that satisfy the constraints (2)-(3). The drift-plus-penalty algorithm from [1] is known to satisfy constraints (2)-(3) and to ensure:

    lim sup_{t→∞} ȳ_0(t) ≤ y_0^opt + ε                               (4)

where ε > 0 is a parameter used in the algorithm. This is done by defining virtual queues Q_k(t) for each constraint k ∈ {1, ..., K} in (2):

    Q_k(t + 1) = max[Q_k(t) + y_k(t) − c_k, 0]                        (5)

where y_k(t) acts as a virtual arrival process and c_k acts as a constant virtual service rate.¹

The author is with the Electrical Engineering department at the University of Southern California, Los Angeles, CA.
This work is supported in part by the NSF Career grant CCF-0747525.
The intuition behind (5) is that if Q_k(t) is stable, the time average arrival rate must be less than or equal to the time average service rate, which implies the desired time average constraint (2). Under an additional Slater condition, it is also known that the drift-plus-penalty algorithm provides an O(1/ε) bound on the time average expected size of all virtual queues:

    lim sup_{t→∞} Q̄_k(t) ≤ O(1/ε)  ∀k ∈ {1, ..., K}                  (6)
where Q̄_k(t) is defined for t > 0 by:

    Q̄_k(t) = (1/t) Σ_{τ=0}^{t−1} E[Q_k(τ)]
More recently, it was shown that the convergence time required for the desired time averages to “kick in” is O(1/ε²), provided that the Slater condition holds (see Appendix C in [2]). Specifically, an algorithm is said to produce an O(ε) approximation with convergence time T if for all t ≥ T one has:

    ȳ_0(t) ≤ y_0^opt + O(ε)                                          (7)
    ȳ_k(t) ≤ c_k + O(ε)  ∀k ∈ {1, ..., K}                            (8)
A. Contributions of the current paper
The current paper focuses on the issue of convergence time. The main result is a proof that convergence time is O(1/ε²) for general problems that have an associated Lagrange multiplier. It can be shown that a Lagrange multiplier exists whenever the Slater condition holds, but not vice-versa. Hence the proof in the current paper is more general than the prior result [2] that uses a Slater condition. To appreciate this distinction, note that a Slater condition is equivalent to assuming there exists a
¹In an actual queueing system, arrivals and service rates are always non-negative. However, in this virtual queue, the y_k(t) and c_k values can possibly be negative.
value δ > 0 and a decision policy under which all constraints can be satisfied with at least δ slackness:

    lim sup_{t→∞} ȳ_k(t) ≤ c_k − δ  ∀k ∈ {1, ..., K}
This Slater condition is impossible in many problems of interest. For example, a problem with a time average equality constraint lim_{t→∞} x̄(t) = c can be treated using two inequality constraints of the type (2):

    lim sup_{t→∞} x̄(t) ≤ c
    lim sup_{t→∞} [−x̄(t)] ≤ −c

However, it is impossible for a Slater condition to exist with the above two inequality constraints. Indeed, that would require:

    lim sup_{t→∞} x̄(t) ≤ c − δ                                       (9)
    lim sup_{t→∞} [−x̄(t)] ≤ −c − δ                                   (10)

Yet, (10) implies:

    c + δ ≤ −lim sup_{t→∞} [−x̄(t)] = lim inf_{t→∞} x̄(t) ≤ lim sup_{t→∞} x̄(t) ≤ c − δ

where the final inequality follows from (9). This means that c + δ ≤ c − δ, a contradiction when δ > 0.
Another contribution of the current paper is the application of this stochastic result to standard (static) convex programs and linear programs. Of course, static problems are a special case of stochastic problems. Nevertheless, this paper clearly illustrates that point, and shows that the drift-plus-penalty algorithm can be applied to convex programs and linear programs to produce an ε-approximation with convergence time O(1/ε²). This was previously shown in [3] under a Slater condition. A collection of simplified example problems of distributed optimization, similar to those presented in [3], are given to demonstrate the method.
B. Applications
The problem (1)-(3) is useful in a variety of settings,
including problems of stochastic network utility maximization
[4][5][6] and problems of minimizing average power in a
network subject to queue stability [7]. Indeed, the drift-plus-
penalty technique was developed in [4][5][6][7] for use in
these particular applications.
As an example, consider a multi-user wireless downlink problem where random data arrivals a_k(t) arrive to the base station every slot t, intended for different users k ∈ {1, ..., K}. Suppose the network controller can observe the current channel state vector S(t) = (S_1(t), ..., S_K(t)), which specifies current conditions on the channel for each user. The controller also observes the vector of new data arrivals a(t) = (a_1(t), ..., a_K(t)). Let ω(t) = (S(t); a(t)) be a concatenated vector with this channel and arrival information, and let Ω be the set of all possible ω(t) vectors. Let p(t) = (p_1(t), ..., p_K(t)) be the power used for transmission, chosen within some abstract set P every slot t. Let μ_k(p(t), S(t)) be a function that specifies the transmission rate on channel k under the power vector p(t) and the channel state vector S(t) [7]. Define r_k(t) = μ_k(p(t), S(t)). The goal is to minimize total average power expenditure subject to ensuring the average transmission rate for each channel is greater than or equal to the arrival rate:
    Minimize:   lim sup_{t→∞} Σ_{k=1}^K p̄_k(t)
    Subject to: lim sup_{t→∞} [ā_k(t) − r̄_k(t)] ≤ 0  ∀k ∈ {1, ..., K}
                p(t) ∈ P  ∀t ∈ {0, 1, 2, ...}
This problem has the form (1)-(3) by defining:

    y_0(t) = Σ_{k=1}^K p_k(t)
    y_k(t) = a_k(t) − μ_k(p(t), S(t))  ∀k ∈ {1, ..., K}               (11)
    c_k = 0  ∀k ∈ {1, ..., K}
and by defining Y(ω) for each ω = (S, a) ∈ Ω as the set of all (y_0, y_1, ..., y_K) ∈ R^{K+1} such that there is a vector p ∈ P that satisfies:

    y_0 = Σ_{k=1}^K p_k
    y_k = a_k − μ_k(p, S)  ∀k ∈ {1, ..., K}
In this example, the virtual queue equations (5) reduce to the following for all k ∈ {1, ..., K}:

    Q_k(t + 1) = max[Q_k(t) + a_k(t) − μ_k(p(t), S(t)), 0]

This “virtual queue” corresponds to an actual network queue, where a_k(t) is the actual arriving data on slot t, and μ_k(p(t), S(t)) is the actual transmission rate offered on slot t.
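The queue update above can be sketched in code. The following is a minimal sketch, not the paper's implementation: the Bernoulli arrivals and the fixed service rates standing in for μ_k(p(t), S(t)) are hypothetical choices used only to exercise the recursion.

```python
import random

def queue_update(Q, arrivals, rates):
    """One slot of the recursion Q_k(t+1) = max[Q_k(t) + a_k(t) - mu_k, 0]."""
    return [max(q + a - mu, 0.0) for q, a, mu in zip(Q, arrivals, rates)]

# Hypothetical 2-user example: Bernoulli arrivals, fixed service rates.
random.seed(0)
Q = [0.0, 0.0]
for t in range(1000):
    a = [1.0 if random.random() < 0.3 else 0.0,
         1.0 if random.random() < 0.5 else 0.0]
    mu = [0.4, 0.6]  # stand-in for mu_k(p(t), S(t)); service rate exceeds arrival rate
    Q = queue_update(Q, a, mu)

print(Q)  # queues stay small because each arrival rate is below its service rate
```

Since each arrival rate is strictly below its service rate, the queues have negative drift and remain small, illustrating the stability intuition behind (5).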
C. Prior work
The drift method for queue stability was developed in [8][9],
which resulted in max-weight and backpressure algorithms for
data networks. The drift-plus-penalty method was developed
for network utility maximization problems in [4][5] and energy
optimization problems in [7]. Generalized tutorial results are
in [1][6]. The works [1][6] prove that, under a Slater condition, the drift-plus-penalty algorithm gives an O(ε) approximation to optimality with an average queue size tradeoff of O(1/ε). Recent work in [2] shows that convergence time is O(1/ε²) under a Slater condition. Application to convex programs is given in [3], again under a Slater condition.
Related work in [10] derives a similar algorithm for utility
maximization in a wireless downlink via a different analysis
that uses Lagrange multipliers. Lagrange multiplier analysis
was used in [11] to improve queue bounds to O(log(1/ε)) in certain piecewise linear cases. Work in [12] demonstrates near-optimal convergence time of O(log(1/ε)) for one-link problems with piecewise linearity. Improved convergence time bounds of O(1/ε) are recently shown in [13] for deterministic problems with piecewise linearity assumptions. Work in [14]
considers the special case of a deterministic convex program with linear constraints, and uses a different method for obtaining O(1/ε) convergence time. The work [14] also considers distributed implementation over a graph. While the works [12][13][14] demonstrate convergence time that is superior to the O(1/ε²) result of the current paper, those results hold only for special case systems.
The drift-plus-penalty algorithm is closely related to the
dual subgradient algorithm for convex programs [15]. Related
work in [16] uses a dual subgradient approach for non-
stochastic problems of network scheduling for utility maxi-
mization. Network scheduling with stochastic approximation is
considered in [17]. A different primal-dual approach is consid-
ered for network utility maximization in [18][19][20][21][22].
II. ALGORITHM AND BASIC ANALYSIS
This section presents the basic results needed from [1].
A. Boundedness assumption
Assume there are non-negative constants h_0, h_1, ..., h_K such that under any policy for making decisions and for any given slot t, the first moment of y_0(t) and the second moments of y_k(t) for k ∈ {1, ..., K} satisfy:

    E[|y_0(t)|] ≤ h_0                                                 (12)
    E[y_k(t)²] ≤ h_k  ∀k ∈ {1, ..., K}                                (13)

That is, the first moment of y_0(t) is uniformly bounded for all t, and the second moments of y_k(t) for k ∈ {1, ..., K} are also uniformly bounded.
These boundedness conditions (12)-(13) are satisfied, for example, whenever there is a bounded set Y ⊆ R^{K+1} such that Y(ω) ⊆ Y for all ω ∈ Ω. They can also hold when y_k(t) is not necessarily bounded. This is useful in the wireless downlink example with y_k(t) = a_k(t) − μ_k(p(t), S(t)), as defined by (11). Suppose that μ_k(·) always takes values in the bounded interval [0, r_max] for some real number r_max > 0. In this case, y_k(t) satisfies (13) whenever E[a_k(t)²] is finite. However, particular values of y_k(t) can be arbitrarily large if the arrivals a_k(t) can be arbitrarily large. For example, if a_k(t) is a Poisson random variable, it can take arbitrarily large values but has a finite second moment.
B. Compactness assumption
Assume that for all ω ∈ Ω, the set Y(ω) is a compact subset of R^{K+1} (recall that a subset of R^{K+1} is compact if it is closed and bounded). This compactness assumption is not crucial to the analysis, but it simplifies exposition.² Indeed, such compactness ensures that, given any ω ∈ Ω, there is always an optimal solution to problems of the following type:

    Minimize:   Σ_{k=0}^K w_k y_k
    Subject to: (y_0, ..., y_K) ∈ Y(ω)

²If Y(ω) is not compact, one can still obtain optimality results by assuming the drift-plus-penalty algorithm comes within an additive constant C of minimizing the desired expression for all slots t. This is called a C-additive approximation [1].
where w_0, ..., w_K are a given set of real numbers. The drift-plus-penalty algorithm will be shown to make decisions every slot t according to such a minimization.

The sets Y(ω(t)) are not required to have any additional structure beyond the boundedness and compactness assumptions specified in Sections II-A and II-B. In particular, the sets Y(ω(t)) might be finite, infinite, convex, or non-convex.
C. The set R of all average vectors
Recall that random events ω(t) are i.i.d. over slots. The distribution for ω(t) is possibly unknown. Imagine observing ω(t) and randomly choosing vector y(t) in the set Y(ω(t)) according to a distribution that depends on ω(t). The expectation vector E[y(t)] is with respect to the randomness of ω(t) and the conditional randomness of y(t) given ω(t). Define R as the set of all expectation vectors E[y(t)] = E[(y_0(t), ..., y_K(t))] that can be achieved, considering all ω ∈ Ω and all possible conditional distributions over the set Y(ω) given that ω(t) = ω. A probabilistic mixture of two randomized choices is again a randomized choice, and so the set R is a convex subset of R^{K+1}. The boundedness assumptions (12)-(13) further imply that R is bounded.
Every slot τ ∈ {0, 1, 2, ...}, a general algorithm chooses y(τ) as a (possibly random) vector in the set Y(ω(τ)) (with distribution that possibly depends on the observed history), and so E[y(τ)] ∈ R for all slots τ. Fix t > 0. It follows that ȳ(t) = (1/t) Σ_{τ=0}^{t−1} E[y(τ)] is a convex combination of vectors in R, and so it is also in R (since R is a convex set). That is:

    ȳ(t) ∈ R  ∀t ∈ {1, 2, 3, ...}                                    (14)
D. Optimality
Define R̄ as the closure of R. Since R is a bounded and convex subset of R^{K+1}, the set R̄ is a compact and convex subset of R^{K+1}. Consider the problem:

    Minimize:   y_0                                                   (15)
    Subject to: y_k ≤ c_k  ∀k ∈ {1, ..., K}                           (16)
                (y_0, y_1, ..., y_K) ∈ R̄                              (17)

In [1] it is shown that the above problem (15)-(17) is feasible if and only if the original stochastic optimization problem (1)-(3) is feasible. Further, assuming feasibility, the problems (15)-(17) and (1)-(3) have the same optimal objective value y_0^opt.
Throughout this paper it is assumed that problem (1)-(3) is feasible, and hence problem (15)-(17) is feasible. Let (y_0^opt, y_1^opt, ..., y_K^opt) be an optimal solution to (15)-(17). Such an optimal solution exists because the problem (15)-(17) is feasible and the set R̄ is compact. This optimal solution must satisfy the constraints of problem (15)-(17), and so:

    y_k^opt ≤ c_k  ∀k ∈ {1, ..., K}                                   (18)
E. Lyapunov optimization
Define Q(t) = (Q_1(t), ..., Q_K(t)) as the vector of queue backlogs. The squared norm of the backlog vector is:

    ‖Q(t)‖² = Σ_{k=1}^K Q_k(t)²
Define L(t) = (1/2)‖Q(t)‖², called a Lyapunov function, and define the drift Δ(t) = L(t + 1) − L(t). The drift-plus-penalty algorithm observes the current vector Q(t) and random event ω(t) every slot t, and then makes a decision y(t) ∈ Y(ω(t)) to greedily minimize a bound on the drift-plus-penalty expression:

    Δ(t) + V y_0(t)

where V is a positive weight that affects a performance tradeoff. Setting V = 1/ε results in an O(ε) approximation to optimality [1]. This fact is reviewed in the remainder of this section, as several of the key results are needed in the new convergence analysis of Section III.
To bound Δ(t), fix k ∈ {1, ..., K}, square the queue equation (5), and use the fact that max[z, 0]² ≤ z² to obtain:

    Q_k(t + 1)² ≤ Q_k(t)² + (y_k(t) − c_k)² + 2Q_k(t)(y_k(t) − c_k)
Summing the above over k ∈ {1, ..., K} and dividing by 2 gives:

    Δ(t) ≤ B(t) + Σ_{k=1}^K Q_k(t)(y_k(t) − c_k)

where B(t) is defined:

    B(t) = (1/2) Σ_{k=1}^K (y_k(t) − c_k)²                            (19)
Adding V y_0(t) to both sides gives the following bound:

    Δ(t) + V y_0(t) ≤ B(t) + V y_0(t) + Σ_{k=1}^K Q_k(t)(y_k(t) − c_k)   (20)
Every slot t, the drift-plus-penalty algorithm observes Q(t), ω(t) and chooses (y_0(t), y_1(t), ..., y_K(t)) in the set Y(ω(t)) to minimize the last two terms on the right-hand-side of (20).
F. Drift-plus-penalty algorithm
Initialize Q_k(0) = 0 for all k ∈ {1, ..., K}. Perform the following steps every slot t ∈ {0, 1, 2, ...}:

• Observe Q(t) = (Q_1(t), ..., Q_K(t)) and ω(t), and choose (y_0(t), ..., y_K(t)) ∈ Y(ω(t)) to minimize:

    V y_0(t) + Σ_{k=1}^K Q_k(t) y_k(t)                                (21)

• Update queues Q_k(t) for k ∈ {1, ..., K} via:

    Q_k(t + 1) = max[Q_k(t) + y_k(t) − c_k, 0]                        (22)
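The two steps above can be sketched as a short loop. This is a minimal sketch, assuming a finite option set Y(ω(t)) given as a list of candidate (y_0, ..., y_K) tuples; the event sampler and the two-option example at the bottom are hypothetical stand-ins, not from the paper.

```python
import random

def drift_plus_penalty(sample_options, c, V, T, K, seed=0):
    """Run the drift-plus-penalty algorithm for T slots.

    sample_options(rng) returns the finite option set Y(omega(t)) for the
    current random event: a list of (y_0, y_1, ..., y_K) tuples.
    """
    rng = random.Random(seed)
    Q = [0.0] * K
    history = []
    for t in range(T):
        options = sample_options(rng)
        # Step 1: minimize V*y_0 + sum_k Q_k * y_k over the option set (21).
        y = min(options, key=lambda v: V * v[0] + sum(Q[k] * v[k + 1] for k in range(K)))
        # Step 2: virtual queue update (22).
        Q = [max(Q[k] + y[k + 1] - c[k], 0.0) for k in range(K)]
        history.append(y)
    return Q, history

# Hypothetical example: y_0 is a cost, y_1 must time-average to at most 0.5.
# Each slot offers two options: (cost 0, y_1 = 1) or (cost 1, y_1 = 0).
opts = lambda rng: [(0.0, 1.0), (1.0, 0.0)]
Q, hist = drift_plus_penalty(opts, c=[0.5], V=10.0, T=2000, K=1)
avg_y1 = sum(y[1] for y in hist) / len(hist)
print(avg_y1)  # settles near the constraint level 0.5
```

In the example, the queue Q_1 grows until it is expensive enough to force the costly option, after which the algorithm alternates and the time average of y_1 settles near c_1 = 0.5, illustrating how queue weights enforce the constraint.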
A key feature of this algorithm is that it reacts to the observed state ω(t), and does not require knowledge of the probability distribution associated with ω(t). Notice that once the queue vector Q(t) is observed on slot t, its components act as known weights in the minimization of (21). Hence, this minimization indeed has the form specified in Section II-B. Specifically, every slot a vector y(t) ∈ Y(ω(t)) is chosen to minimize a linear function of the components y_0(t), y_1(t), ..., y_K(t). Complexity of this decision depends on the structure of the sets Y(ω(t)). If these sets consist of a finite and small number of points, the decision amounts to testing each option and choosing the one with the least weighted sum. The decision can be complex if the sets Y(ω(t)) consist of a finite but large number of points, or if these sets are infinite but non-convex.

For simplicity, it is assumed throughout that y(t) is chosen to exactly minimize the expression (21) (this is possible via the compactness assumption of Section II-B). Similar analytical results can be obtained under the weaker assumption that y(t) comes within an additive constant of minimizing (21), called a C-additive approximation (see [1]).
G. Constraint satisfaction via queue stability
The queue backlog gives a simple bound on constraint violation. Indeed, for all slots τ ∈ {0, 1, 2, ...} one has from (22) and the fact that max[z, 0] ≥ z:

    Q_k(τ + 1) ≥ Q_k(τ) + y_k(τ) − c_k

Thus:

    Q_k(τ + 1) − Q_k(τ) ≥ y_k(τ) − c_k

Summing over τ ∈ {0, 1, ..., t − 1} for some integer t > 0 gives:

    Q_k(t) − Q_k(0) ≥ Σ_{τ=0}^{t−1} y_k(τ) − t c_k

Dividing by t and using the fact that Q_k(0) = 0 gives:

    Q_k(t)/t ≥ (1/t) Σ_{τ=0}^{t−1} y_k(τ) − c_k

Taking expectations gives:

    E[Q_k(t)]/t ≥ ȳ_k(t) − c_k

Rearranging terms gives the desired constraint violation bound:

    ȳ_k(t) ≤ c_k + E[Q_k(t)]/t                                       (23)

It follows that the desired constraints (2) hold if all queues k ∈ {1, ..., K} satisfy:

    lim_{t→∞} E[Q_k(t)]/t = 0                                        (24)

A queue that satisfies (24) is said to be mean rate stable [1].
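The sample-path version of bound (23) holds deterministically for the queue update (22), which is easy to check numerically. A minimal sketch, using an arbitrary bounded process and a constraint level chosen for illustration:

```python
import random

# Empirical check of the sample-path version of bound (23):
# (1/t) * sum_{tau<t} y(tau)  <=  c + Q(t)/t, for the queue update (22).
random.seed(1)
c = 0.3            # hypothetical constraint level
Q = 0.0
total = 0.0
for t in range(1, 10001):
    y = random.uniform(-1.0, 1.0)  # arbitrary bounded process
    total += y
    Q = max(Q + y - c, 0.0)        # update (22)
    assert total / t <= c + Q / t + 1e-9
print("bound (23) held on every slot")
```

The assertion never fires: the derivation above is a telescoping argument on (22), so the bound holds slot by slot regardless of the y process.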
H. Objective function analysis
Fix τ ∈ {0, 1, 2, ...}. Because the drift-plus-penalty decision minimizes the last two terms on the right-hand-side of the drift-plus-penalty bound (20), one has:

    Δ(τ) + V y_0(τ) ≤ B(τ) + V y*_0(τ) + Σ_{k=1}^K Q_k(τ)(y*_k(τ) − c_k)   (25)

for all vectors (y*_0(τ), ..., y*_K(τ)) ∈ Y(ω(τ)), including vectors that are chosen randomly over Y(ω(τ)). Fix a vector (y_0, ..., y_K) ∈ R. Let y*(τ) = (y*_0(τ), ..., y*_K(τ)) be chosen as a random function of ω(τ) according to a conditional distribution that yields expectation E[y*(τ)] = (y_0, ..., y_K), but with conditional decisions that are independent of history. Since ω(τ) is itself independent of history, it follows that for all k ∈ {1, ..., K}, y*_k(τ) is independent of Q_k(τ), and:

    E[y*_k(τ) Q_k(τ)] = E[y*_k(τ)] E[Q_k(τ)] = y_k E[Q_k(τ)]          (26)
Taking expectations of (25) (assuming y*(τ) is this randomized policy) and substituting (26) gives:

    E[Δ(τ)] + V E[y_0(τ)] ≤ E[B(τ)] + V y_0 + Σ_{k=1}^K E[Q_k(τ)](y_k − c_k)   (27)

Let B ≥ 0 be a finite constant that satisfies the following for all slots τ:

    E[B(τ)] ≤ B                                                      (28)

Such a constant B exists by the second moment boundedness assumption (13). Substituting B into (27) gives:

    E[Δ(τ)] + V E[y_0(τ)] ≤ B + V y_0 + Σ_{k=1}^K E[Q_k(τ)](y_k − c_k)

The above inequality holds for all (y_0, ..., y_K) ∈ R. Take a limit as (y_0, ..., y_K) approaches the point (y_0^opt, ..., y_K^opt) ∈ R̄ to obtain:

    E[Δ(τ)] + V E[y_0(τ)] ≤ B + V y_0^opt + Σ_{k=1}^K E[Q_k(τ)](y_k^opt − c_k)

Substituting (18) into the right-hand-side of the above inequality gives:

    E[Δ(τ)] + V E[y_0(τ)] ≤ B + V y_0^opt                             (29)
The inequality (29) holds for all slots τ ∈ {0, 1, 2, ...}. Fix t > 0. Summing (29) over τ ∈ {0, 1, ..., t − 1} gives:

    E[L(t)] − E[L(0)] + V Σ_{τ=0}^{t−1} E[y_0(τ)] ≤ (B + V y_0^opt) t

Dividing by t and using the fact that E[L(0)] = 0 gives:

    E[L(t)]/t + V ȳ_0(t) ≤ B + V y_0^opt                              (30)

Dividing by V and using E[L(t)] ≥ 0 gives:

    ȳ_0(t) ≤ y_0^opt + B/V                                            (31)
That is, (31) ensures that for all slots t > 0, the time average expectation ȳ_0(t) is at most O(1/V) larger than the optimal objective function value y_0^opt. Fix ε > 0. Using the parameter V = 1/ε gives an O(ε) approximation to optimal utility.

It remains to show that the desired constraints are also satisfied. If a Slater assumption holds, it can be shown that queue averages are O(1/ε). The Slater assumption also ensures convergence time is O(1/ε²) [2]. The next section presents a new analysis to develop O(1/ε²) convergence time without the Slater assumption.
III. CONVERGENCE TIME ANALYSIS
A. Lagrange multipliers
Assume the problem (15)-(17) is feasible. Since this problem is convex, a hyperplane in R^{K+1} exists that passes through the point (y_0^opt, c_1, ..., c_K) and that contains the set R̄ on one side [15]. Specifically, there are non-negative values γ_0, γ_1, ..., γ_K such that:

    γ_0 y_0 + Σ_{k=1}^K γ_k y_k ≥ γ_0 y_0^opt + Σ_{k=1}^K γ_k c_k  ∀(y_0, ..., y_K) ∈ R̄

The hyperplane is said to be non-vertical if γ_0 ≠ 0 [15].
If the hyperplane is non-vertical, one can divide the above inequality by γ_0, define μ_k = γ_k/γ_0 for all k ∈ {1, ..., K}, and conclude:

    y_0 + Σ_{k=1}^K μ_k y_k ≥ y_0^opt + Σ_{k=1}^K μ_k c_k  ∀(y_0, ..., y_K) ∈ R̄   (32)

The non-negative vector (μ_1, ..., μ_K) in (32) is called a Lagrange multiplier vector. A Lagrange multiplier vector that satisfies (32) exists whenever the separating hyperplane is non-vertical. It can be shown that the separating hyperplane is non-vertical whenever a Slater condition holds. Such a non-vertical hyperplane also exists in more general situations without a Slater condition (see “regularity conditions” specified in [15]). Thus, the assumption that a Lagrange multiplier vector exists is a mild assumption.
B. Bounding the violations
Assume a (non-negative) Lagrange multiplier vector (μ_1, ..., μ_K) exists so that (32) holds. Fix t > 0. Recall that (14) ensures ȳ(t) = (ȳ_0(t), ..., ȳ_K(t)) ∈ R. Since R ⊆ R̄, by (32) one has:

    ȳ_0(t) + Σ_{k=1}^K μ_k ȳ_k(t) ≥ y_0^opt + Σ_{k=1}^K μ_k c_k

Rearranging the above gives:

    y_0^opt − ȳ_0(t) ≤ Σ_{k=1}^K μ_k (ȳ_k(t) − c_k) ≤ Σ_{k=1}^K μ_k E[Q_k(t)]/t   (33)

where the final inequality holds by (23).
On the other hand, one has by (30):

    E[L(t)]/t ≤ B + V (y_0^opt − ȳ_0(t))
              ≤ B + V Σ_{k=1}^K μ_k E[Q_k(t)]/t                       (34)
              ≤ B + (V/t) ‖μ‖·‖E[Q(t)]‖                               (35)

where (34) is obtained by substituting (33), and (35) is due to the fact that the dot product of two vectors is less than or
equal to the product of their norms. Substituting the definition L(t) = (1/2)‖Q(t)‖² in the left-hand-side of (35) gives:

    (1/2t) E[‖Q(t)‖²] ≤ B + (V/t) ‖μ‖·‖E[Q(t)]‖

Since E[‖Q(t)‖²] ≥ ‖E[Q(t)]‖², one has:

    (1/2t) ‖E[Q(t)]‖² ≤ B + (V/t) ‖μ‖·‖E[Q(t)]‖

Therefore:

    ‖E[Q(t)]‖² − 2V‖μ‖·‖E[Q(t)]‖ − 2Bt ≤ 0

Define x = ‖E[Q(t)]‖, b = 2V‖μ‖, c = 2Bt. Then:

    x² − bx − c ≤ 0                                                   (36)

The largest value of x that satisfies (36) is equal to the largest root of the quadratic equation x² − bx − c = 0, and so:

    x ≤ (b + √(b² + 4c))/2 = V‖μ‖ + √(V²‖μ‖² + 2Bt)

Therefore, for all t > 0 one has:

    ‖E[Q(t)]‖ ≤ V‖μ‖ + √(V²‖μ‖² + 2Bt)
It follows from (23) that for all k ∈ {1, ..., K} the constraint violations satisfy:

    ȳ_k(t) ≤ c_k + E[Q_k(t)]/t
           ≤ c_k + ‖E[Q(t)]‖/t
           ≤ c_k + (V‖μ‖ + √(V²‖μ‖² + 2Bt))/t                         (37)

This leads to the following theorem.
This leads to the following theorem.
Theorem 1: Fix ǫ > 0 and define V = 1. If the problem
(1)-(3) is feasible and the Lagrange multiplier assumption (32)
holds, then for all t 1
2
one has:
y
0
(t) y
opt
0
+ O(ǫ) (38)
y
k
(t) c
k
+ O(ǫ) k {1, . . . , K} (39)
and so the drift-plus-penalty algorithm with V = 1 provides
an O(ǫ ) approximation with convergence time O(1
2
).
Proof: Inequality (38) holds from (31) and the fact that B/V = Bε = O(ε). Inequality (39) holds from (37) and the fact that for t ≥ 1/ε²:

    (V‖μ‖ + √(V²‖μ‖² + 2Bt))/t
        = ‖μ‖/(εt) + √(‖μ‖²/(ε²t²) + 2B/t)
        ≤ ‖μ‖ε + √(‖μ‖²ε² + 2Bε²)
        = ‖μ‖ε + ε√(‖μ‖² + 2B)
        = O(ε)
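The largest-root step used before the theorem can be sanity-checked numerically: for non-negative b and c, any x ≥ 0 satisfying x² − bx − c ≤ 0 must lie below the larger root (b + √(b² + 4c))/2. A minimal check with randomly sampled coefficients (the sampling ranges are arbitrary):

```python
import math
import random

random.seed(2)
for _ in range(1000):
    b = random.uniform(0.0, 10.0)   # plays the role of 2*V*||mu||
    c = random.uniform(0.0, 10.0)   # plays the role of 2*B*t
    root = (b + math.sqrt(b * b + 4.0 * c)) / 2.0
    # Whenever the quadratic constraint (36) holds, x is below the root.
    for _ in range(100):
        x = random.uniform(0.0, 20.0)
        if x * x - b * x - c <= 0.0:
            assert x <= root + 1e-9
print("largest-root bound verified")
```

This is exactly the argument that turns the quadratic inequality in ‖E[Q(t)]‖ into the explicit bound V‖μ‖ + √(V²‖μ‖² + 2Bt).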
IV. EQUALITY CONSTRAINTS
A similar analysis can be used to treat problems with explicit equality constraints. Specifically, consider choosing a vector h(t) = (y_0(t), y_1(t), ..., y_K(t), w_1(t), ..., w_M(t)) in a set H(ω(t)) to solve:

    Minimize:   lim sup_{t→∞} ȳ_0(t)                                  (40)
    Subject to: lim sup_{t→∞} ȳ_k(t) ≤ c_k  ∀k ∈ {1, ..., K}          (41)
                lim_{t→∞} w̄_i(t) = d_i  ∀i ∈ {1, ..., M}              (42)
                h(t) ∈ H(ω(t))  ∀t ∈ {0, 1, 2, ...}                   (43)
where c_1, ..., c_K and d_1, ..., d_M are given real numbers. One approach is to change each equality constraint (42) into two inequality constraints:

    lim sup_{t→∞} w̄_i(t) ≤ d_i
    lim sup_{t→∞} [−w̄_i(t)] ≤ −d_i

This would involve two virtual queues for each i ∈ {1, ..., M}. A notationally easier method is to simply change the structure of the virtual queue for equality constraints i ∈ {1, ..., M} as follows [1]:

    Z_i(t + 1) = Z_i(t) + w_i(t) − d_i  ∀i ∈ {1, ..., M}              (44)
The inequality constraints (41) have the same virtual queues from before:

    Q_k(t + 1) = max[Q_k(t) + y_k(t) − c_k, 0]                        (45)

The resulting algorithm is as follows: Initialize Z_i(0) = Q_k(0) = 0 for all i ∈ {1, ..., M} and k ∈ {1, ..., K}. Every slot t ∈ {0, 1, 2, ...} do:

• Observe Q_1(t), ..., Q_K(t) and Z_1(t), ..., Z_M(t) and ω(t), and choose h(t) ∈ H(ω(t)) to minimize:

    V y_0(t) + Σ_{k=1}^K Q_k(t) y_k(t) + Σ_{i=1}^M Z_i(t) w_i(t)

• Update Q_k(t) for k ∈ {1, ..., K} and Z_i(t) for i ∈ {1, ..., M} via (45) and (44).

The analysis of this scenario with equality constraints is similar and is omitted for brevity (see [1]).
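The signed queue (44) differs from (45) in that it can go negative, which is what lets it push the time average of w_i(t) back up toward d_i. A minimal sketch with a hypothetical one-constraint setup (binary decision w(t), cost y_0(t) = w(t), target average d):

```python
# Sketch of the signed virtual queue (44) for a time average equality
# constraint. Hypothetical setup: per-slot decision w(t) in {0, 1} with
# cost y_0(t) = w(t); the constraint is lim w-bar(t) = d.
d = 0.25
V = 20.0
Z = 0.0
w_sum = 0.0
T = 5000
for t in range(T):
    # Choose w to minimize V*y_0 + Z*w = (V + Z)*w over w in {0, 1}.
    w = 1.0 if V + Z < 0.0 else 0.0
    w_sum += w
    Z = Z + w - d  # update (44); Z may go negative, unlike Q_k in (45)
w_bar = w_sum / T
print(w_bar)  # settles near d = 0.25
```

Here Z drifts down to about −V and then hovers there, toggling w(t) so that the long-run fraction of w = 1 slots matches d, the behavior the equality-constraint analysis formalizes.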
V. CONVEX PROGRAMS
Fix N as a positive integer. Consider the problem of finding a vector x = (x_1, ..., x_N) ∈ R^N to solve:

    Minimize:   f(x)                                                  (46)
    Subject to: g_k(x) ≤ c_k  ∀k ∈ {1, ..., K}                        (47)
                x ∈ X                                                 (48)

where X is a convex and compact subset of R^N, the functions f(x), g_1(x), ..., g_K(x) are continuous and convex over x ∈ X, and c_1, ..., c_K are given real numbers. The problem (46)-(48) is a convex program. Assume the problem is feasible, so that there exists a vector that satisfies the constraints (47)-(48). The compactness and continuity assumptions ensure there is an optimal solution x* ∈ X that solves the problem (46)-(48). Define f* = f(x*) as the optimal objective function value.
This convex program is equivalent to a problem of the form (1)-(3), and hence can be solved by the drift-plus-penalty method [3]. To see this, define Y as the set of all (y_0, y_1, ..., y_K) vectors in R^{K+1} such that there exists a vector x ∈ X that satisfies:

    y_0 = f(x)
    y_k = g_k(x)  ∀k ∈ {1, ..., K}

Consider a system defined over slots t ∈ {0, 1, 2, ...}. Every slot t, a controller chooses a vector x(t) = (x_1(t), ..., x_N(t)) in the (deterministic) set X. Define:

    y_0(t) = f(x(t))
    y_k(t) = g_k(x(t))  ∀k ∈ {1, ..., K}
The goal is to choose x(t) over slots to solve:

    Minimize:   lim sup_{t→∞} ȳ_0(t)                                  (49)
    Subject to: lim sup_{t→∞} ȳ_k(t) ≤ c_k  ∀k ∈ {1, ..., K}          (50)
                x(t) ∈ X  ∀t ∈ {0, 1, 2, ...}                         (51)
Lemma 1: If {x(t)}_{t=0}^∞ is a random or deterministic process that satisfies x(t) ∈ X for all t, then:

a) For all t > 0, one has (1/t) Σ_{τ=0}^{t−1} x(τ) ∈ X, and:

    f((1/t) Σ_{τ=0}^{t−1} x(τ)) ≤ (1/t) Σ_{τ=0}^{t−1} y_0(τ)
    g_k((1/t) Σ_{τ=0}^{t−1} x(τ)) ≤ (1/t) Σ_{τ=0}^{t−1} y_k(τ)  ∀k ∈ {1, ..., K}

b) For all t > 0, x̄(t) ∈ X, and:

    f(x̄(t)) ≤ ȳ_0(t)
    g_k(x̄(t)) ≤ ȳ_k(t)  ∀k ∈ {1, ..., K}

Proof: Part (a) follows immediately from convexity of X and Jensen's inequality applied to the convex functions f(x) and g_k(x). Part (b) follows by taking expectations of the inequalities in part (a) and again using Jensen's inequality. Formally, it also uses the fact that if X is a random vector that takes values in a convex set X, and if E[X] is finite, then E[X] ∈ X.
Lemma 2: If x* is an optimal solution to the convex program (46)-(48), then x(t) = x* for all t ∈ {0, 1, 2, ...} is an optimal solution to (49)-(51). Further, the optimal objective function value in both problems (46)-(48) and (49)-(51) is f*.

Proof: Recall that f* is defined as the optimal objective function value for (46)-(48). Let x* be an optimal solution to (46)-(48), so that x* ∈ X, g_k(x*) ≤ c_k for all k ∈ {1, ..., K}, and f(x*) = f*. Define x(t) = x* for all t. Then (51) clearly holds. Further, for all t > 0 one has:

    ȳ_k(t) = (1/t) Σ_{τ=0}^{t−1} g_k(x*) = g_k(x*) ≤ c_k  ∀k ∈ {1, ..., K}

and so the constraints (50) hold. Similarly, ȳ_0(t) = f(x*) = f* for all t. Thus, x(t) satisfies the constraints of problem (49)-(51) and gives an objective function value of f*. It follows that f* ≥ y_0^opt, where y_0^opt is defined as the infimum objective function value over all x(t) processes that meet the constraints of problem (49)-(51).
It remains to show that f* ≤ y_0^opt (so that f* = y_0^opt). To this end, let x(t) be any (possibly random) process that satisfies the constraints of problem (49)-(51). Since x(t) ∈ X for all t, it follows that x̄(t) ∈ X for all t. Since X is compact, the Bolzano-Weierstrass theorem implies there is a subsequence of times t_m that increase to infinity such that:

    lim_{m→∞} x̄(t_m) = x̂                                             (52)

for some fixed vector x̂ = (x̂_1, ..., x̂_N) ∈ X. Furthermore, Jensen's inequality (specifically, Lemma 1b) implies that for any time t_m > 0:

    f(x̄(t_m)) ≤ ȳ_0(t_m)                                             (53)
    g_k(x̄(t_m)) ≤ ȳ_k(t_m)  ∀k ∈ {1, ..., K}                         (54)

Therefore, by continuity of g_k(x):

    g_k(x̂) = lim_{m→∞} g_k(x̄(t_m))                                   (55)
           ≤ lim_{m→∞} ȳ_k(t_m)                                       (56)
           ≤ lim sup_{t→∞} ȳ_k(t)
           ≤ c_k                                                      (57)

where (55) holds by (52), (56) holds by (54), and (57) holds because (50) is satisfied. Thus, x̂ satisfies the constraints of problem (46)-(48). It follows that f(x̂) ≥ f*, and so:

    f* ≤ f(x̂)
       = lim_{m→∞} f(x̄(t_m))                                          (58)
       ≤ lim_{m→∞} ȳ_0(t_m)                                           (59)
       ≤ lim sup_{t→∞} ȳ_0(t)

where (58) holds by (52) and continuity of f(x), and (59) holds by (53). Thus:

    f* ≤ lim sup_{t→∞} ȳ_0(t)

This says that f* is less than or equal to the objective function value for any random process x(t) that satisfies the constraints of problem (49)-(51). It follows that f* ≤ y_0^opt.
A. Drift-plus-penalty for convex programs
The drift-plus-penalty algorithm to solve (49)-(51) defines virtual queues Q_k(t) for k ∈ {1, ..., K} by:

    Q_k(t + 1) = max[Q_k(t) + y_k(t) − c_k, 0]

Since y_k(t) = g_k(x(t)), this is equivalent to:

    Q_k(t + 1) = max[Q_k(t) + g_k(x(t)) − c_k, 0]                     (60)

The queues are initialized to zero. Then every slot t ∈ {0, 1, 2, ...}:

• Observe (Q_1(t), ..., Q_K(t)) and choose x(t) ∈ X to minimize:

    V f(x(t)) + Σ_{k=1}^K Q_k(t) g_k(x(t))                            (61)
• Update Q_k(t) via (60) for each k ∈ {1, ..., K}.
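The iteration (60)-(61) can be sketched on a toy convex program. All problem data below are hypothetical, chosen so that the per-slot minimization (61) has a closed form: minimize f(x) = (x − 2)² subject to g(x) = x ≤ 1 with X = [0, 3], whose optimum is x* = 1.

```python
# Sketch of the drift-plus-penalty iteration (60)-(61) on a toy convex
# program (hypothetical data): minimize f(x) = (x - 2)^2 subject to
# g(x) = x <= 1, with X = [0, 3]. The optimum is x* = 1, f* = 1.
eps = 0.01
V = 1.0 / eps
Q = 0.0
x_sum = 0.0
T = int(1.0 / eps**2)  # convergence time O(1/eps^2) from Theorem 1
for t in range(T):
    # Step (61): minimize V*(x-2)^2 + Q*x over [0, 3]; the unconstrained
    # minimizer is x = 2 - Q/(2V), clipped to the interval.
    x = min(max(2.0 - Q / (2.0 * V), 0.0), 3.0)
    x_sum += x
    # Step (60): virtual queue update with g(x) = x and c = 1.
    Q = max(Q + x - 1.0, 0.0)
x_bar = x_sum / T
print(x_bar)  # time-averaged primal, close to x* = 1
```

The per-slot iterates start near the unconstrained minimizer x = 2 and are pulled down as Q grows; it is the time average of the primals, as emphasized below, that approaches the optimum.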
Fix ε > 0. The next subsection shows that by defining V = 1/ε, the averages x̄(t) = (1/t) Σ_{τ=0}^{t−1} x(τ) obtained from the above algorithm converge to an O(ε) approximation of (46)-(48) with convergence time O(1/ε²). The above drift-plus-penalty algorithm in this special case of a (deterministic) convex program is similar to the basic dual subgradient algorithm with step size 1/V (see, for example, [15]). However, a traditional analysis of the dual subgradient algorithm relies on strict convexity assumptions to ensure that the primal values x(t) converge to an O(ε)-approximation of a (unique) optimal solution x*. The above requires only convexity (not strict convexity), and so there may be more than one optimal solution to (46)-(48). It then takes a time average of the primals to obtain an O(ε)-approximation.
B. Convex program performance
There is no random event process ω(t) for this convex programming problem, and so the drift-plus-penalty algorithm makes purely deterministic decisions to minimize (61) every slot t. Indeed, assume that if there are ties in the decision (61), the tie is broken using some deterministic method. The resulting sequence {x(t)}_{t=0}^∞ is deterministic. It follows that all expectations in the analysis of the previous section can be removed.³ Thus, for all t > 0:

    ȳ(t) = (1/t) Σ_{τ=0}^{t−1} y(τ)
    x̄(t) = (1/t) Σ_{τ=0}^{t−1} x(τ)
For this convex programming problem, the Lagrange multiplier condition (32) reduces to the existence of a vector $(\mu_1, \ldots, \mu_K)$ with non-negative components such that:
$$f(x) + \sum_{k=1}^{K} \mu_k g_k(x) \geq f(x^*) + \sum_{k=1}^{K} \mu_k c_k \quad \forall x \in \mathcal{X}$$
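As a concrete illustration (an assumed instance, not from the paper), the condition can be checked by hand on a one-dimensional program:

```latex
% Assumed instance: X = [-2, 2], f(x) = x^2, g_1(x) = -x, c_1 = -1,
% i.e. minimize x^2 subject to x >= 1. The optimum is x^* = 1 with
% f(x^*) = 1, and the multiplier \mu_1 = 2 satisfies the condition:
f(x) + \mu_1 g_1(x) = x^2 - 2x \;\geq\; -1 = f(x^*) + \mu_1 c_1
\quad \forall x \in \mathcal{X},
% which holds because x^2 - 2x + 1 = (x - 1)^2 \geq 0.
```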
Fix $\epsilon > 0$. It follows by Theorem 1 that if the problem is feasible and has a Lagrange multiplier vector, then the drift-plus-penalty method with $V = 1/\epsilon$ yields the following for all $t \geq 1/\epsilon^2$:
$$\overline{y}_0(t) \leq f^* + O(\epsilon)$$
$$\overline{y}_k(t) \leq c_k + O(\epsilon) \quad \forall k \in \{1, \ldots, K\}$$
On the other hand, it is clear by Lemma 1 (Jensen's inequality) that for all $t > 0$:
$$f(\overline{x}(t)) \leq \overline{y}_0(t)$$
$$g_k(\overline{x}(t)) \leq \overline{y}_k(t) \quad \forall k \in \{1, \ldots, K\}$$
and hence $\overline{x}(t) \in \mathcal{X}$ for all $t > 0$, and:
$$f(\overline{x}(t)) \leq f^* + O(\epsilon)$$
$$g_k(\overline{x}(t)) \leq c_k + O(\epsilon) \quad \forall k \in \{1, \ldots, K\}$$
³Alternatively, one can repeat the same analysis of the previous section in the special case of no randomness, redefining $\overline{x}(t)$ and $\overline{y}_k(t)$ to be pure time averages without an expectation, to obtain the same results for this deterministic convex program.
Thus, the drift-plus-penalty algorithm produces an $O(\epsilon)$ approximation to the convex program with convergence time $O(1/\epsilon^2)$.
C. Application to linear programs

Consider the special case of a linear program, so that the $f(x)$ and $g_k(x)$ functions are linear and the set $\mathcal{X}$ is replaced by a hyper-rectangle:

Minimize: $\sum_{i=1}^{N} b_i x_i$
Subject to: $\sum_{i=1}^{N} a_{ki} x_i \leq c_k \quad \forall k \in \{1, \ldots, K\}$
$\qquad\qquad x_{i,min} \leq x_i \leq x_{i,max} \quad \forall i \in \{1, \ldots, N\}$

where $x_{i,min}$, $x_{i,max}$, $b_i$, $a_{ki}$, and $c_k$ are given real numbers for all $i \in \{1, \ldots, N\}$ and $k \in \{1, \ldots, K\}$. It is assumed that $x_{i,min} < x_{i,max}$ for all $i \in \{1, \ldots, N\}$. This fits the form of the convex program (46)-(48) via:
$$f(x) = \sum_{i=1}^{N} b_i x_i$$
$$g_k(x) = \sum_{i=1}^{N} a_{ki} x_i$$
$$\mathcal{X} = \{x \in \mathbb{R}^N \mid x_{i,min} \leq x_i \leq x_{i,max} \ \forall i \in \{1, \ldots, N\}\}$$
The resulting drift-plus-penalty algorithm defines virtual queues:
$$Q_k(t+1) = \max\left[Q_k(t) + \sum_{i=1}^{N} a_{ki} x_i(t) - c_k, \; 0\right] \qquad (62)$$
The queues are initialized to 0. Then every slot $t \in \{0, 1, 2, \ldots\}$, a vector $x(t) \in \mathcal{X}$ is chosen to minimize:
$$V \sum_{i=1}^{N} b_i x_i(t) + \sum_{k=1}^{K} Q_k(t)\left[\sum_{i=1}^{N} a_{ki} x_i(t)\right]$$
This results in the following simple and separable optimization over each variable $x_i(t)$. Every slot $t \in \{0, 1, 2, \ldots\}$:
• Observe $Q_1(t), \ldots, Q_K(t)$. For each $i \in \{1, \ldots, N\}$ choose:
$$x_i(t) = \begin{cases} x_{i,max} & \text{if } V b_i + \sum_{k=1}^{K} Q_k(t) a_{ki} \leq 0 \\ x_{i,min} & \text{otherwise} \end{cases}$$
• Update $Q_k(t)$ for $k \in \{1, \ldots, K\}$ via (62).
• Update $\overline{x}(t)$ via $\overline{x}(t+1) = \overline{x}(t)\frac{t}{t+1} + \frac{x(t)}{t+1}$.
This algorithm always chooses $x_i(t)$ within the 2-element set $\{x_{i,min}, x_{i,max}\}$. Thus, the $x(t)$ vectors themselves cannot converge to an approximate solution if the resulting solution is not a corner point on the hyper-rectangle $\mathcal{X}$ (for example, optimality might require $x_1 = (x_{1,min} + x_{1,max})/2$). However, Theorem 1 ensures the time averages $\overline{x}(t)$ converge to an $O(\epsilon)$-approximation with convergence time $O(1/\epsilon^2)$.
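The bang-bang behavior and the role of the running average are easy to see in simulation. The instance below (maximize $x_1$ subject to $x_1 \leq 0.5$ on $[0,1]$, written as minimizing $b^\mathsf{T} x$) is an illustrative assumption chosen so that the optimum $x_1^* = 0.5$ is not a corner of the box: the per-slot decisions bang between 0 and 1, yet $\overline{x}(t)$ still converges.

```python
import numpy as np

# Drift-plus-penalty sketch for the linear program of this subsection.
# The threshold rule, queue update (62), and running-average update are
# as in the text; the specific problem data below are assumptions.

def lp_drift_plus_penalty(b, A, c, x_min, x_max, eps=0.01):
    b, A, c = map(np.asarray, (b, A, c))
    V = 1.0 / eps
    T = int(1.0 / eps ** 2)
    K, N = A.shape
    Q = np.zeros(K)                       # one virtual queue per constraint
    x_bar = np.zeros(N)
    for t in range(T):
        # Threshold rule: x_i = x_max if V*b_i + sum_k Q_k(t)*a_ki <= 0
        weights = V * b + Q @ A           # length-N vector of thresholds
        x = np.where(weights <= 0.0, x_max, x_min)
        # Queue update (62)
        Q = np.maximum(Q + A @ x - c, 0.0)
        # Running average: x_bar(t+1) = x_bar(t)*t/(t+1) + x(t)/(t+1)
        x_bar = x_bar * (t / (t + 1.0)) + x / (t + 1.0)
    return x_bar

# Assumed instance: minimize -x1 (i.e. maximize x1) s.t. x1 <= 0.5, x1 in [0,1].
x_bar = lp_drift_plus_penalty(b=[-1.0], A=[[1.0]], c=[0.5],
                              x_min=0.0, x_max=1.0)
print(x_bar)   # close to the non-corner optimum [0.5]
```

Every slot-wise decision here is either 0 or 1, so $x(t)$ itself never converges; the time average does, illustrating why Theorem 1 is stated for $\overline{x}(t)$.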
VI. DISTRIBUTED OPTIMIZATION OVER A CONNECTED GRAPH

Consider a directed graph with $N$ nodes. Let $\mathcal{N} = \{1, \ldots, N\}$ be the set of nodes. Let $\mathcal{L}$ be the set of all directed links. Each node $n \in \mathcal{N}$ has a vector of its own variables $x^{(n)} = (x^{(n)}_1, \ldots, x^{(n)}_{M_n}) \in \mathbb{R}^{M_n}$, where $M_n$ is a positive integer for each $n \in \mathcal{N}$. In addition, there is a vector $\theta = (\theta_1, \ldots, \theta_G) \in \mathbb{R}^G$ of common variables (for some positive integer $G$). The goal is to solve the problem in a distributed way, so that each node makes decisions based only on information available from its neighbors. The problem and approach in this section are a variation on the work in [3].
Each node $n \in \mathcal{N}$ must choose variables $x^{(n)} \in \mathcal{X}^{(n)}$, where $\mathcal{X}^{(n)}$ is a convex and compact subset of $\mathbb{R}^{M_n}$. In addition, the nodes must collectively choose $\theta \in \Theta$, where $\Theta$ is a convex and compact subset of $\mathbb{R}^G$. The goal is to solve:

Minimize: $\sum_{n=1}^{N} f^{(n)}(x^{(n)}, \theta)$  (63)
Subject to: $g^{(n)}(x^{(n)}, \theta) \leq c^{(n)} \quad \forall n \in \mathcal{N}$  (64)
$\qquad\qquad x^{(n)} \in \mathcal{X}^{(n)} \quad \forall n \in \mathcal{N}$  (65)
$\qquad\qquad \theta \in \Theta$  (66)

where $f^{(n)}(x^{(n)}, \theta)$ and $g^{(n)}(x^{(n)}, \theta)$ are convex functions over $\mathcal{X}^{(n)} \times \Theta$, defined for each $n \in \mathcal{N}$.
The goal is to solve this problem by making distributed
decisions at each node. The difficulty is that the θ variables
must be chosen collectively. The next subsection clarifies
the challenges by specifying the drift-plus-penalty algorithm.
Subsection VI-B modifies the problem (without affecting
optimality) to produce a distributed solution.
A. The direct drift-plus-penalty approach

The problem (63)-(66) is a convex program. The drift-plus-penalty method defines virtual queues $Q^{(n)}(t)$ for each $n \in \mathcal{N}$ to enforce the constraints (64):
$$Q^{(n)}(t+1) = \max[Q^{(n)}(t) + g^{(n)}(x^{(n)}(t), \theta(t)) - c^{(n)}, \; 0] \quad \forall n \in \mathcal{N} \qquad (67)$$
Every slot $t \in \{0, 1, 2, \ldots\}$, the algorithm chooses $x^{(n)}(t) \in \mathcal{X}^{(n)}$ for all $n \in \mathcal{N}$, and chooses $\theta(t) \in \Theta$ to minimize:
$$\sum_{n=1}^{N} V f^{(n)}(x^{(n)}(t), \theta(t)) + \sum_{n=1}^{N} Q^{(n)}(t) g^{(n)}(x^{(n)}(t), \theta(t))$$
The difficulty is the joint selection of the θ(t) variables, which
couples all terms together in a centralized optimization.
B. A distributed approach

This subsection specifies a distributed solution, along the lines of the general solution methodology from [3]. The idea is to introduce estimation vectors $\theta^{(n)}(t) \in \Theta$ at each node $n \in \mathcal{N}$. Consider the following problem:

Minimize: $\sum_{n=1}^{N} f^{(n)}(x^{(n)}, \theta^{(n)})$  (68)
Subject to: $g^{(n)}(x^{(n)}, \theta^{(n)}) \leq c^{(n)} \quad \forall n \in \mathcal{N}$  (69)
$\qquad\qquad \theta^{(n)} = \theta^{(j)} \quad \forall (n, j) \in \mathcal{L}$  (70)
$\qquad\qquad x^{(n)} \in \mathcal{X}^{(n)} \quad \forall n \in \mathcal{N}$  (71)
$\qquad\qquad \theta^{(n)} \in \Theta \quad \forall n \in \mathcal{N}$  (72)

The constraints (70) are vector equality constraints. Specifically, if $\theta^{(n)} = (\theta^{(n)}_1, \ldots, \theta^{(n)}_G)$, then the constraints are:
$$\theta^{(n)}_i = \theta^{(j)}_i \quad \forall i \in \{1, \ldots, G\}, \ \forall (n, j) \in \mathcal{L} \qquad (73)$$
Now assume that if one changes the directed graph to an undirected graph by changing all directed links to undirected links, then the resulting undirected graph is connected (so that there is a path from every node to every other node in the undirected graph). With this connectedness assumption, the problem (68)-(72) is equivalent to the original problem (63)-(66). That is because for any nodes $n$ and $m$ in $\mathcal{N}$, there is a path in the undirected graph from $n$ to $m$, and the equality constraints (70) ensure that each node $j$ on this path has $\theta^{(j)} = \theta^{(n)}$. It follows that the constraints (70) ensure that the estimation vectors $\theta^{(n)}$ are the same for all nodes $n \in \mathcal{N}$.
The problem (68)-(72) can be solved via the drift-plus-penalty framework of Section IV. For each inequality constraint (69) (that is, for each $n \in \mathcal{N}$), define:
$$Q^{(n)}(t+1) = \max[Q^{(n)}(t) + g^{(n)}(x^{(n)}(t), \theta^{(n)}(t)) - c^{(n)}, \; 0] \qquad (74)$$
For each equality constraint (73) (that is, for each $i \in \{1, \ldots, G\}$ and $(n, j) \in \mathcal{L}$) define:
$$Z^{(n,j)}_i(t+1) = Z^{(n,j)}_i(t) + \theta^{(n)}_i(t) - \theta^{(j)}_i(t) \qquad (75)$$
Each node $n \in \mathcal{N}$ is responsible for updating the queues $Q^{(n)}(t)$ and $Z^{(n,j)}_i(t)$ for all $i \in \{1, \ldots, G\}$ and all $j$ such that $(n, j) \in \mathcal{L}$. Every slot $t$, decisions are made to minimize:
$$\sum_{n=1}^{N} V f^{(n)}(x^{(n)}(t), \theta^{(n)}(t)) + \sum_{n \in \mathcal{N}} Q^{(n)}(t) g^{(n)}(x^{(n)}(t), \theta^{(n)}(t)) + \sum_{i=1}^{G} \sum_{(n,j) \in \mathcal{L}} Z^{(n,j)}_i(t)\left(\theta^{(n)}_i(t) - \theta^{(j)}_i(t)\right)$$
This is a separable optimization in each of the local variables $x^{(n)}(t)$ and $\theta^{(n)}(t)$ associated with individual nodes $n \in \mathcal{N}$. Each node $n \in \mathcal{N}$ needs to know only its own internal queues and the queue values $Z^{(a,n)}_i(t)$ of its neighbors. It is assumed that these values can be obtained via message passing on the links associated with each neighbor. The resulting algorithm is as follows: Initialize all queues to 0. Every slot $t \in \{0, 1, 2, \ldots\}$ do:
• Each node $n \in \mathcal{N}$ observes $Q^{(n)}(t)$ and the queues $Z^{(n,j)}_i(t)$ and $Z^{(a,n)}_i(t)$ for all $(n, j) \in \mathcal{L}$, all $(a, n) \in \mathcal{L}$, and all $i \in \{1, \ldots, G\}$. It then chooses $(x^{(n)}(t), \theta^{(n)}(t)) \in \mathcal{X}^{(n)} \times \Theta$ to minimize:
$$V f^{(n)}(x^{(n)}(t), \theta^{(n)}(t)) + Q^{(n)}(t) g^{(n)}(x^{(n)}(t), \theta^{(n)}(t)) + \sum_{i=1}^{G} \theta^{(n)}_i(t)\left[\sum_{j|(n,j) \in \mathcal{L}} Z^{(n,j)}_i(t) - \sum_{a|(a,n) \in \mathcal{L}} Z^{(a,n)}_i(t)\right]$$
• Each node $n \in \mathcal{N}$ updates $Q^{(n)}(t)$ via (74) and updates $Z^{(n,j)}_i(t)$ for $(n, j) \in \mathcal{L}$ via (75). The $Z^{(n,j)}_i(t)$ update for node $n$ requires all neighbors $j$ such that $(n, j) \in \mathcal{L}$ to first pass their chosen $\theta^{(j)}(t)$ vectors to node $n$, so that the right-hand side of (75) can be computed.
Fix $\epsilon > 0$. Using $V = 1/\epsilon$, the resulting time averages $\overline{x}^{(n)}(t)$ and $\overline{\theta}^{(n)}(t)$ converge to an $O(\epsilon)$ approximation with convergence time $O(1/\epsilon^2)$.
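The distributed iteration above can be sketched on a small example. Everything problem-specific below is an illustrative assumption: a directed line graph $1 \to 2 \to 3$, scalar estimates $\theta^{(n)} \in [0,1]$, local objectives $f^{(n)}(\theta) = (\theta - a_n)^2$, and no inequality constraints $g^{(n)}$ (so the $Q^{(n)}$ queues are omitted and only the consensus queues $Z$ of (75) remain). The common minimizer of $\sum_n (\theta - a_n)^2$ is $\theta = \mathrm{mean}(a)$.

```python
import numpy as np

# Distributed drift-plus-penalty consensus sketch (assumed instance).
# Each node n keeps its own estimate theta_n and minimizes, per slot,
#   V*(theta - a_n)^2 + theta * (sum of outgoing Z - sum of incoming Z)
# over [0, 1]; the Z queues (75) push the estimates toward equality.

def distributed_consensus(a, links, eps=0.02):
    a = np.asarray(a, dtype=float)
    V = 1.0 / eps
    T = int(1.0 / eps ** 2)
    N = len(a)
    Z = {link: 0.0 for link in links}      # one queue per directed link
    theta = np.zeros(N)
    theta_bar = np.zeros(N)
    for t in range(T):
        for n in range(N):
            # Net queue weight seen by node n: outgoing minus incoming.
            d = (sum(Z[(i, j)] for (i, j) in links if i == n)
                 - sum(Z[(i, j)] for (i, j) in links if j == n))
            # Minimize V*(theta - a_n)^2 + d*theta over [0, 1]:
            # unconstrained minimizer is a_n - d/(2V), then clip.
            theta[n] = min(max(a[n] - d / (2.0 * V), 0.0), 1.0)
        # Queue update (75), after neighbors exchange their theta values.
        for (i, j) in links:
            Z[(i, j)] += theta[i] - theta[j]
        theta_bar = theta_bar * (t / (t + 1.0)) + theta / (t + 1.0)
    return theta_bar

theta_bar = distributed_consensus(a=[0.0, 0.3, 0.9],
                                  links=[(0, 1), (1, 2)])
print(theta_bar)   # all entries near the consensus value mean(a) = 0.4
```

Each node's slot-wise decision uses only its own queues and its neighbors' queue values, matching the message-passing structure described above.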
C. A different type of constraint

The problem (68)-(72) specifies one constraint of the form $g^{(n)}(x^{(n)}, \theta^{(n)}) \leq c^{(n)}$ for each node $n \in \mathcal{N}$. Suppose the problem is changed so that these constraints (69) are replaced by a single constraint of the form:
$$\sum_{n \in \mathcal{N}} g^{(n)}(x^{(n)}, \theta^{(n)}) \leq c \qquad (76)$$
for some given real number $c$. In principle, this could be treated using a virtual queue:
$$J(t+1) = \max\left[J(t) + \sum_{n \in \mathcal{N}} g^{(n)}(x^{(n)}(t), \theta^{(n)}(t)) - c, \; 0\right]$$
However, it is not clear which node should implement this queue. Further, every slot $t$, that node would need to know the values of $g^{(n)}(x^{(n)}(t), \theta^{(n)}(t))$ for all nodes $n \in \mathcal{N}$, which is difficult in a distributed context.
One way to avoid this difficulty is as follows: Form new variables $x^{(n,m)} \in \mathcal{X}^{(n)}$ for all $n, m \in \mathcal{N}$. The variable $x^{(n,m)}$ can be interpreted as node $m$'s estimate of the optimal value of $x^{(n)}$. The constraint (76) is then replaced by:
$$\sum_{n \in \mathcal{N}} g^{(n)}(x^{(n,1)}, \theta^{(1)}) \leq c \qquad (77)$$
$$x^{(n,m)} = x^{(n,j)} \quad \forall n \in \mathcal{N}, \ \forall (m, j) \in \mathcal{L} \qquad (78)$$
$$x^{(n,m)} \in \mathcal{X}^{(n)} \quad \forall n, m \in \mathcal{N} \qquad (79)$$
Node 1 is responsible for the constraint (77) and maintains a virtual queue:
$$J(t+1) = \max\left[J(t) + \sum_{n \in \mathcal{N}} g^{(n)}(x^{(n,1)}(t), \theta^{(1)}(t)) - c, \; 0\right]$$
Each node $m \in \mathcal{N}$ is responsible for the vector equality constraints $x^{(n,m)} = x^{(n,j)}$ for all $n \in \mathcal{N}$ and all $(m, j) \in \mathcal{L}$. These are enforced in the same manner as the constraints (70).
VII. CONCLUSIONS

This paper proves an $O(1/\epsilon^2)$ convergence time for the drift-plus-penalty algorithm in a general situation where a Lagrange multiplier vector exists, without requiring a Slater condition. This holds both for stochastic optimization problems and for (deterministic) convex programs. Special case implementations were given for convex programs, including linear programs. Example solutions were also presented for solving convex programs in a distributed way over a connected graph.
REFERENCES
[1] M. J. Neely. Stochastic Network Optimization with Application to
Communication and Queueing Systems. Morgan & Claypool, 2010.
[2] M. J. Neely. Distributed stochastic optimization via correlated schedul-
ing. ArXiv technical report, arXiv:1304.7727v2, May 2013.
[3] M. J. Neely. Distributed and secure computation of convex programs
over a network of connected processors. DCDIS Conf., Guelph, Ontario,
July 2005.
[4] M. J. Neely, E. Modiano, and C. Li. Fairness and optimal stochastic
control for heterogeneous networks. IEEE/ACM Transactions on Net-
working, vol. 16, no. 2, pp. 396-409, April 2008.
[5] M. J. Neely. Dynamic Power Allocation and Routing for Satellite
and Wireless Networks with Time Varying Channels. PhD thesis,
Massachusetts Institute of Technology, LIDS, 2003.
[6] L. Georgiadis, M. J. Neely, and L. Tassiulas. Resource allocation and
cross-layer control in wireless networks. Foundations and Trends in
Networking, vol. 1, no. 1, pp. 1-149, 2006.
[7] M. J. Neely. Energy optimal control for time varying wireless networks.
IEEE Transactions on Information Theory, vol. 52, no. 7, pp. 2915-2934,
July 2006.
[8] L. Tassiulas and A. Ephremides. Stability properties of constrained
queueing systems and scheduling policies for maximum throughput in
multihop radio networks. IEEE Transactions on Automatic Control,
vol. 37, no. 12, pp. 1936-1948, Dec. 1992.
[9] L. Tassiulas and A. Ephremides. Dynamic server allocation to parallel
queues with randomly varying connectivity. IEEE Transactions on
Information Theory, vol. 39, no. 2, pp. 466-478, March 1993.
[10] A. Eryilmaz and R. Srikant. Fair resource allocation in wireless networks
using queue-length-based scheduling and congestion control. IEEE/ACM
Transactions on Networking, vol. 15, no. 6, pp. 1333-1344, Dec. 2007.
[11] L. Huang and M. J. Neely. Delay reduction via Lagrange multipliers
in stochastic network optimization. IEEE Transactions on Automatic
Control, vol. 56, no. 4, pp. 842-857, April 2011.
[12] M. J. Neely. Energy-aware wireless scheduling with near optimal
backlog and convergence time tradeoffs. ArXiv technical report,
arXiv:1411.4740, Nov. 2014.
[13] S. Supittayapornpong, L. Huang, and M. J. Neely. Time-average
optimization with nonconvex decision set and its convergence. In Proc.
IEEE Conf. on Decision and Control (CDC), Los Angeles, California,
Dec. 2014.
[14] E. Wei and A. Ozdaglar. On the O(1/k) convergence of asynchronous
distributed alternating direction method of multipliers. In Proc. IEEE
Global Conference on Signal and Information Processing, 2013.
[15] D. P. Bertsekas, A. Nedic, and A. E. Ozdaglar. Convex Analysis and
Optimization. Boston: Athena Scientific, 2003.
[16] X. Lin and N. B. Shroff. Joint rate control and scheduling in multihop
wireless networks. Proc. of 43rd IEEE Conf. on Decision and Control,
Paradise Island, Bahamas, Dec. 2004.
[17] J. W. Lee, R. R. Mazumdar, and N. B. Shroff. Opportunistic power
scheduling for dynamic multiserver wireless systems. IEEE Transactions
on Wireless Communications, vol. 5, no.6, pp. 1506-1515, June 2006.
[18] H. Kushner and P. Whiting. Asymptotic properties of proportional-fair
sharing algorithms. Proc. 40th Annual Allerton Conf. on Communica-
tion, Control, and Computing, Monticello, IL, Oct. 2002.
[19] R. Agrawal and V. Subramanian. Optimality of certain channel aware
scheduling policies. Proc. 40th Annual Allerton Conf. on Communica-
tion, Control, and Computing, Monticello, IL, Oct. 2002.
[20] A. Stolyar. Maximizing queueing network utility subject to stability:
Greedy primal-dual algorithm. Queueing Systems, vol. 50, no. 4, pp.
401-457, 2005.
[21] A. Stolyar. Greedy primal-dual algorithm for dynamic resource alloca-
tion in complex networks. Queueing Systems, vol. 54, no. 3, pp. 203-220,
2006.
[22] A. Eryilmaz and R. Srikant. Joint congestion control, routing, and MAC
for stability and fairness in wireless networks. IEEE Journal on Selected
Areas in Communications, Special Issue on Nonlinear Optimization of
Communication Systems, vol. 14, pp. 1514-1524, Aug. 2006.