arXiv:1412.0791v1 [math.OC] 2 Dec 2014
A Simple Convergence Time Analysis of
Drift-Plus-Penalty for Stochastic Optimization and
Convex Programs
Michael J. Neely
University of Southern California
Abstract—This paper considers the problem of minimizing the time average of a stochastic process subject to time average constraints on other processes. A canonical example is minimizing average power in a data network subject to multi-user throughput constraints. Another example is a (static) convex program. Under a Slater condition, the drift-plus-penalty algorithm is known to provide an O(ε) approximation to optimality with a convergence time of O(1/ε²). This paper proves the same result with a simpler technique and in a more general context that does not require the Slater condition. This paper also emphasizes application to basic convex programs, linear programs, and distributed optimization problems.
I. INTRODUCTION
Fix K as a positive integer. Consider a discrete time system that operates over time slots t ∈ {0, 1, 2, ...}. Every slot t, the controller observes a random event ω(t). Assume that events ω(t) are elements in an abstract set Ω, and that they are independent and identically distributed (i.i.d.) over slots. The set Ω can have arbitrary (possibly uncountably infinite) cardinality. Every slot t, a system controller observes the current ω(t) and then chooses a decision vector y(t) = (y_0(t), y_1(t), ..., y_K(t)) within an option set Y(ω(t)) ⊆ R^{K+1} that possibly depends on ω(t). That is, Y(ω(t)) is the set of vector options available under the random event ω(t). The sets Y(ω(t)) are arbitrary and are only assumed to have a mild boundedness property (specified in Section II).
The goal is to minimize the expected time average of the resulting y_0(t) process subject to time average constraints on the y_k(t) processes for k ∈ {1, ..., K}. Specifically, for integers t > 0, and for each k ∈ {0, 1, ..., K}, define:

    ȳ_k(t) = (1/t) Σ_{τ=0}^{t−1} E[y_k(τ)]
Let c_1, ..., c_K be a given collection of real numbers. The goal is to solve the following stochastic optimization problem:

    Minimize:   lim sup_{t→∞} ȳ_0(t)                                  (1)
    Subject to: lim sup_{t→∞} ȳ_k(t) ≤ c_k  ∀k ∈ {1, ..., K}          (2)
                y(t) ∈ Y(ω(t))  ∀t ∈ {0, 1, 2, ...}                   (3)
Assume the problem is feasible, so that it is possible to satisfy the constraints (2)-(3). Define y_0^opt as the infimum value of the objective (1) over all algorithms that satisfy the constraints (2)-(3). The drift-plus-penalty algorithm from [1] is known to satisfy constraints (2)-(3) and to ensure:

    lim sup_{t→∞} ȳ_0(t) ≤ y_0^opt + ε                               (4)

where ε > 0 is a parameter used in the algorithm. This is done by defining virtual queues Q_k(t) for each constraint k ∈ {1, ..., K} in (2):

    Q_k(t + 1) = max[Q_k(t) + y_k(t) − c_k, 0]                        (5)

where y_k(t) acts as a virtual arrival process and c_k acts as a constant virtual service rate.¹

The author is with the Electrical Engineering department at the University of Southern California, Los Angeles, CA.
This work is supported in part by the NSF Career grant CCF-0747525.
The intuition behind (5) is that if Q_k(t) is stable, the time average arrival rate must be less than or equal to the time average service rate, which implies the desired time average constraint (2). Under an additional Slater condition, it is also known that the drift-plus-penalty algorithm provides an O(1/ε) bound on the time average expected size of all virtual queues:

    lim sup_{t→∞} Q̄_k(t) ≤ O(1/ε)  ∀k ∈ {1, ..., K}                  (6)
where Q̄_k(t) is defined for t > 0 by:

    Q̄_k(t) = (1/t) Σ_{τ=0}^{t−1} E[Q_k(τ)]
More recently, it was shown that the convergence time required for the desired time averages to “kick in” is O(1/ε²), provided that the Slater condition holds (see Appendix C in [2]). Specifically, an algorithm is said to produce an O(ε) approximation with convergence time T if for all t ≥ T one has:

    ȳ_0(t) ≤ y_0^opt + O(ε)                                          (7)
    ȳ_k(t) ≤ c_k + O(ε)  ∀k ∈ {1, ..., K}                            (8)
A. Contributions of the current paper
The current paper focuses on the issue of convergence time. The main result is a proof that convergence time is O(1/ε²) for general problems that have an associated Lagrange multiplier. It can be shown that a Lagrange multiplier exists whenever the Slater condition holds, but not vice-versa. Hence the proof in the current paper is more general than the prior result [2] that uses a Slater condition. To appreciate this distinction, note that a Slater condition is equivalent to assuming there exists a
¹In an actual queueing system, arrivals and service rates are always non-negative. However, in this virtual queue, the y_k(t) and c_k values can possibly be negative.
value δ > 0 and a decision policy under which all constraints can be satisfied with at least δ slackness:

    lim sup_{t→∞} ȳ_k(t) ≤ c_k − δ  ∀k ∈ {1, ..., K}
This Slater condition is impossible in many problems of interest. For example, a problem with a time average equality constraint lim_{t→∞} x̄(t) = c can be treated using two inequality constraints of the type (2):

    lim sup_{t→∞} x̄(t) ≤ c
    lim sup_{t→∞} [−x̄(t)] ≤ −c

However, it is impossible for a Slater condition to exist with the above two inequality constraints. Indeed, that would require:

    lim sup_{t→∞} x̄(t) ≤ c − δ                                       (9)
    lim sup_{t→∞} [−x̄(t)] ≤ −c − δ                                   (10)

Yet, (10) implies:

    c + δ ≤ −lim sup_{t→∞} [−x̄(t)] = lim inf_{t→∞} x̄(t) ≤ lim sup_{t→∞} x̄(t) ≤ c − δ

where the final inequality follows from (9). This means that c + δ ≤ c − δ, a contradiction when δ > 0.
Another contribution of the current paper is the application of this stochastic result to standard (static) convex programs and linear programs. Of course, static problems are a special case of stochastic problems. Nevertheless, this paper clearly illustrates that point, and shows that the drift-plus-penalty algorithm can be applied to convex programs and linear programs to produce an ε-approximation with convergence time O(1/ε²). This was previously shown in [3] under a Slater condition. A collection of simplified example problems of distributed optimization, similar to those presented in [3], are given to demonstrate the method.
B. Applications
The problem (1)-(3) is useful in a variety of settings,
including problems of stochastic network utility maximization
[4][5][6] and problems of minimizing average power in a
network subject to queue stability [7]. Indeed, the drift-plus-
penalty technique was developed in [4][5][6][7] for use in
these particular applications.
As an example, consider a multi-user wireless downlink problem where random data arrivals a_k(t) arrive to the base station every slot t, intended for different users k ∈ {1, ..., K}. Suppose the network controller can observe the current channel state vector S(t) = (S_1(t), ..., S_K(t)), which specifies current conditions on the channel for each user. The controller also observes the vector of new data arrivals a(t) = (a_1(t), ..., a_K(t)). Let ω(t) = (S(t); a(t)) be a concatenated vector with this channel and arrival information, and let Ω be the set of all possible ω(t) vectors. Let p(t) = (p_1(t), ..., p_K(t)) be the power used for transmission, chosen within some abstract set P every slot t. Let μ_k(p(t), S(t)) be a function that specifies the transmission rate on channel k under the power vector p(t) and the channel state vector S(t) [7]. Define r_k(t) = μ_k(p(t), S(t)). The goal is to minimize total average power expenditure subject to ensuring the average transmission rate for each channel is greater than or equal to the arrival rate:
    Minimize:   lim sup_{t→∞} Σ_{k=1}^K p̄_k(t)
    Subject to: lim sup_{t→∞} [ā_k(t) − r̄_k(t)] ≤ 0  ∀k ∈ {1, ..., K}
                p(t) ∈ P  ∀t ∈ {0, 1, 2, ...}
This problem has the form (1)-(3) by defining:

    y_0(t) = Σ_{k=1}^K p_k(t)
    y_k(t) = a_k(t) − μ_k(p(t), S(t))  ∀k ∈ {1, ..., K}               (11)
    c_k = 0  ∀k ∈ {1, ..., K}
and by defining Y(ω) for each ω = (S, a) ∈ Ω as the set of all (y_0, y_1, ..., y_K) ∈ R^{K+1} such that there is a vector p ∈ P that satisfies:

    y_0 = Σ_{k=1}^K p_k
    y_k = a_k − μ_k(p, S)  ∀k ∈ {1, ..., K}
In this example, the virtual queue equations (5) reduce to the following for all k ∈ {1, ..., K}:

    Q_k(t + 1) = max[Q_k(t) + a_k(t) − μ_k(p(t), S(t)), 0]

This “virtual queue” corresponds to an actual network queue, where a_k(t) is the actual arriving data on slot t, and μ_k(p(t), S(t)) is the actual transmission rate offered on slot t.
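The queue update above can be sketched in code. The following is a minimal sketch, not the paper's implementation: the Bernoulli arrivals and the fixed service rates standing in for μ_k(p(t), S(t)) are hypothetical choices used only to exercise the recursion.

```python
import random

def queue_update(Q, arrivals, rates):
    """One slot of the recursion Q_k(t+1) = max[Q_k(t) + a_k(t) - mu_k, 0]."""
    return [max(q + a - mu, 0.0) for q, a, mu in zip(Q, arrivals, rates)]

# Hypothetical 2-user example: Bernoulli arrivals, fixed service rates.
random.seed(0)
Q = [0.0, 0.0]
for t in range(1000):
    a = [1.0 if random.random() < 0.3 else 0.0,
         1.0 if random.random() < 0.5 else 0.0]
    mu = [0.4, 0.6]  # stand-in for mu_k(p(t), S(t)); service rate exceeds arrival rate
    Q = queue_update(Q, a, mu)

print(Q)  # queues stay small because each arrival rate is below its service rate
```

Since each arrival rate is strictly below its service rate, the queues have negative drift and remain small, illustrating the stability intuition behind (5).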
C. Prior work
The drift method for queue stability was developed in [8][9],
which resulted in max-weight and backpressure algorithms for
data networks. The drift-plus-penalty method was developed
for network utility maximization problems in [4][5] and energy
optimization problems in [7]. Generalized tutorial results are
in [1][6]. The works [1][6] prove that, under a Slater condition, the drift-plus-penalty algorithm gives an O(ε) approximation to optimality with an average queue size tradeoff of O(1/ε). Recent work in [2] shows that convergence time is O(1/ε²) under a Slater condition. Application to convex programs is given in [3], again under a Slater condition.
Related work in [10] derives a similar algorithm for utility
maximization in a wireless downlink via a different analysis
that uses Lagrange multipliers. Lagrange multiplier analysis
was used in [11] to improve queue bounds to O(log(1/ε)) in certain piecewise linear cases. Work in [12] demonstrates near-optimal convergence time of O(log(1/ε)) for one-link problems with piecewise linearity. Improved convergence time bounds of O(1/ε) are recently shown in [13] for deterministic problems with piecewise linearity assumptions. Work in [14]
considers the special case of a deterministic convex program with linear constraints, and uses a different method for obtaining O(1/ε) convergence time. The work [14] also considers distributed implementation over a graph. While the works [12][13][14] demonstrate convergence time that is superior to the O(1/ε²) result of the current paper, those results hold only for special case systems.
The drift-plus-penalty algorithm is closely related to the
dual subgradient algorithm for convex programs [15]. Related
work in [16] uses a dual subgradient approach for non-
stochastic problems of network scheduling for utility maxi-
mization. Network scheduling with stochastic approximation is
considered in [17]. A different primal-dual approach is consid-
ered for network utility maximization in [18][19][20][21][22].
II. ALGORITHM AND BASIC ANALYSIS
This section presents the basic results needed from [1].
A. Boundedness assumption
Assume there are non-negative constants h_0, h_1, ..., h_K such that under any policy for making decisions and for any given slot t, the first moment of y_0(t) and the second moments of y_k(t) for k ∈ {1, ..., K} satisfy:

    E[|y_0(t)|] ≤ h_0                                                 (12)
    E[y_k(t)²] ≤ h_k  ∀k ∈ {1, ..., K}                                (13)

That is, the first moment of y_0(t) is uniformly bounded for all t, and the second moments of y_k(t) for k ∈ {1, ..., K} are also uniformly bounded.
These boundedness conditions (12)-(13) are satisfied, for example, whenever there is a bounded set Y ⊆ R^{K+1} such that Y(ω) ⊆ Y for all ω ∈ Ω. They can also hold when y_k(t) is not necessarily bounded. This is useful in the wireless downlink example with y_k(t) = a_k(t) − μ_k(p(t), S(t)), as defined by (11). Suppose that μ_k(·) always takes values in the bounded interval [0, r_max] for some real number r_max > 0. In this case, y_k(t) satisfies (13) whenever E[a_k(t)²] is finite. However, particular values of y_k(t) can be arbitrarily large if the arrivals a_k(t) can be arbitrarily large. For example, if a_k(t) is a Poisson random variable, it can take arbitrarily large values but has a finite second moment.
B. Compactness assumption
Assume that for all ω ∈ Ω, the set Y(ω) is a compact subset of R^{K+1} (recall that a subset of R^{K+1} is compact if it is closed and bounded). This compactness assumption is not crucial to the analysis, but it simplifies exposition.² Indeed, such compactness ensures that, given any ω ∈ Ω, there is always an optimal solution to problems of the following type:

    Minimize:   Σ_{k=0}^K w_k y_k
    Subject to: (y_0, ..., y_K) ∈ Y(ω)

²If Y(ω) is not compact, one can still obtain optimality results by assuming the drift-plus-penalty algorithm comes within an additive constant C of minimizing the desired expression for all slots t. This is called a C-additive approximation [1].
where w_0, ..., w_K are a given set of real numbers. The drift-plus-penalty algorithm will be shown to make decisions every slot t according to such a minimization.

The sets Y(ω(t)) are not required to have any additional structure beyond the boundedness and compactness assumptions specified in Sections II-A and II-B. In particular, the sets Y(ω(t)) might be finite, infinite, convex, or non-convex.
C. The set R of all average vectors
Recall that random events ω(t) are i.i.d. over slots. The distribution for ω(t) is possibly unknown. Imagine observing ω(t) and randomly choosing vector y(t) in the set Y(ω(t)) according to a distribution that depends on ω(t). The expectation vector E[y(t)] is with respect to the randomness of ω(t) and the conditional randomness of y(t) given ω(t). Define R as the set of all expectation vectors E[y(t)] = E[(y_0(t), ..., y_K(t))] that can be achieved, considering all ω ∈ Ω and all possible conditional distributions over the set Y(ω) given that ω(t) = ω. A probabilistic mixture of two randomized choices is again a randomized choice, and so the set R is a convex subset of R^{K+1}. The boundedness assumptions (12)-(13) further imply that R is bounded.
Every slot τ ∈ {0, 1, 2, ...}, a general algorithm chooses y(τ) as a (possibly random) vector in the set Y(ω(τ)) (with distribution that possibly depends on the observed history), and so E[y(τ)] ∈ R for all slots τ. Fix t > 0. It follows that ȳ(t) = (1/t) Σ_{τ=0}^{t−1} E[y(τ)] is a convex combination of vectors in R, and so it is also in R (since R is a convex set). That is:

    ȳ(t) ∈ R  ∀t ∈ {1, 2, 3, ...}                                    (14)
D. Optimality
Define R̄ as the closure of R. Since R is a bounded and convex subset of R^{K+1}, the set R̄ is a compact and convex subset of R^{K+1}. Consider the problem:

    Minimize:   y_0                                                   (15)
    Subject to: y_k ≤ c_k  ∀k ∈ {1, ..., K}                           (16)
                (y_0, y_1, ..., y_K) ∈ R̄                              (17)

In [1] it is shown that the above problem (15)-(17) is feasible if and only if the original stochastic optimization problem (1)-(3) is feasible. Further, assuming feasibility, the problems (15)-(17) and (1)-(3) have the same optimal objective value y_0^opt.
Throughout this paper it is assumed that problem (1)-(3) is feasible, and hence problem (15)-(17) is feasible. Let (y_0^opt, y_1^opt, ..., y_K^opt) be an optimal solution to (15)-(17). Such an optimal solution exists because the problem (15)-(17) is feasible and the set R̄ is compact. This optimal solution must satisfy the constraints of problem (15)-(17), and so:

    y_k^opt ≤ c_k  ∀k ∈ {1, ..., K}                                   (18)
E. Lyapunov optimization
Define Q(t) = (Q_1(t), ..., Q_K(t)) as the vector of queue backlogs. The squared norm of the backlog vector is:

    ‖Q(t)‖² = Σ_{k=1}^K Q_k(t)²
Define L(t) = (1/2)‖Q(t)‖², called a Lyapunov function, and define the drift Δ(t) = L(t + 1) − L(t). The drift-plus-penalty algorithm observes the current vector Q(t) and random event ω(t) every slot t, and then makes a decision y(t) ∈ Y(ω(t)) to greedily minimize a bound on the drift-plus-penalty expression:

    Δ(t) + V y_0(t)

where V is a positive weight that affects a performance tradeoff. Setting V = 1/ε results in an O(ε) approximation to optimality [1]. This fact is reviewed in the remainder of this section, as several of the key results are needed in the new convergence analysis of Section III.
To bound Δ(t), fix k ∈ {1, ..., K}, square the queue equation (5), and use the fact that max[z, 0]² ≤ z² to obtain:

    Q_k(t + 1)² ≤ Q_k(t)² + (y_k(t) − c_k)² + 2Q_k(t)(y_k(t) − c_k)
Summing the above over k ∈ {1, ..., K} and dividing by 2 gives:

    Δ(t) ≤ B(t) + Σ_{k=1}^K Q_k(t)(y_k(t) − c_k)

where B(t) is defined:

    B(t) = (1/2) Σ_{k=1}^K (y_k(t) − c_k)²                            (19)
Adding V y_0(t) to both sides gives the following bound:

    Δ(t) + V y_0(t) ≤ B(t) + V y_0(t) + Σ_{k=1}^K Q_k(t)(y_k(t) − c_k)   (20)
Every slot t, the drift-plus-penalty algorithm observes Q(t), ω(t) and chooses (y_0(t), y_1(t), ..., y_K(t)) in the set Y(ω(t)) to minimize the last two terms on the right-hand-side of (20).
F. Drift-plus-penalty algorithm
Initialize Q_k(0) = 0 for all k ∈ {1, ..., K}. Perform the following steps every slot t ∈ {0, 1, 2, ...}:

• Observe Q(t) = (Q_1(t), ..., Q_K(t)) and ω(t), and choose (y_0(t), ..., y_K(t)) ∈ Y(ω(t)) to minimize:

    V y_0(t) + Σ_{k=1}^K Q_k(t) y_k(t)                                (21)

• Update queues Q_k(t) for k ∈ {1, ..., K} via:

    Q_k(t + 1) = max[Q_k(t) + y_k(t) − c_k, 0]                        (22)
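The two steps above can be sketched as a short loop. This is a minimal sketch, assuming a finite option set Y(ω(t)) given as a list of candidate (y_0, ..., y_K) tuples; the event sampler and the two-option example at the bottom are hypothetical stand-ins, not from the paper.

```python
import random

def drift_plus_penalty(sample_options, c, V, T, K, seed=0):
    """Run the drift-plus-penalty algorithm for T slots.

    sample_options(rng) returns the finite option set Y(omega(t)) for the
    current random event: a list of (y_0, y_1, ..., y_K) tuples.
    """
    rng = random.Random(seed)
    Q = [0.0] * K
    history = []
    for t in range(T):
        options = sample_options(rng)
        # Step 1: minimize V*y_0 + sum_k Q_k * y_k over the option set (21).
        y = min(options, key=lambda v: V * v[0] + sum(Q[k] * v[k + 1] for k in range(K)))
        # Step 2: virtual queue update (22).
        Q = [max(Q[k] + y[k + 1] - c[k], 0.0) for k in range(K)]
        history.append(y)
    return Q, history

# Hypothetical example: y_0 is a cost, y_1 must time-average to at most 0.5.
# Each slot offers two options: (cost 0, y_1 = 1) or (cost 1, y_1 = 0).
opts = lambda rng: [(0.0, 1.0), (1.0, 0.0)]
Q, hist = drift_plus_penalty(opts, c=[0.5], V=10.0, T=2000, K=1)
avg_y1 = sum(y[1] for y in hist) / len(hist)
print(avg_y1)  # settles near the constraint level 0.5
```

In the example, the queue Q_1 grows until it is expensive enough to force the costly option, after which the algorithm alternates and the time average of y_1 settles near c_1 = 0.5, illustrating how queue weights enforce the constraint.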
A key feature of this algorithm is that it reacts to the observed state ω(t), and does not require knowledge of the probability distribution associated with ω(t). Notice that once the queue vector Q(t) is observed on slot t, its components act as known weights in the minimization of (21). Hence, this minimization indeed has the form specified in Section II-B. Specifically, every slot a vector y(t) ∈ Y(ω(t)) is chosen to minimize a linear function of the components y_0(t), y_1(t), ..., y_K(t). Complexity of this decision depends on the structure of the sets Y(ω(t)). If these sets consist of a finite and small number of points, the decision amounts to testing each option and choosing the one with the least weighted sum. The decision can be complex if the sets Y(ω(t)) consist of a finite but large number of points, or if these sets are infinite but non-convex.

For simplicity, it is assumed throughout that y(t) is chosen to exactly minimize the expression (21) (this is possible via the compactness assumption of Section II-B). Similar analytical results can be obtained under the weaker assumption that y(t) comes within an additive constant of minimizing (21), called a C-additive approximation (see [1]).
G. Constraint satisfaction via queue stability
The queue backlog gives a simple bound on constraint violation. Indeed, for all slots τ ∈ {0, 1, 2, ...} one has from (22) and the fact that max[z, 0] ≥ z:

    Q_k(τ + 1) ≥ Q_k(τ) + y_k(τ) − c_k

Thus:

    Q_k(τ + 1) − Q_k(τ) ≥ y_k(τ) − c_k

Summing over τ ∈ {0, 1, ..., t − 1} for some integer t > 0 gives:

    Q_k(t) − Q_k(0) ≥ Σ_{τ=0}^{t−1} y_k(τ) − t c_k

Dividing by t and using the fact that Q_k(0) = 0 gives:

    Q_k(t)/t ≥ (1/t) Σ_{τ=0}^{t−1} y_k(τ) − c_k

Taking expectations gives:

    E[Q_k(t)]/t ≥ ȳ_k(t) − c_k

Rearranging terms gives the desired constraint violation bound:

    ȳ_k(t) ≤ c_k + E[Q_k(t)]/t                                       (23)

It follows that the desired constraints (2) hold if all queues k ∈ {1, ..., K} satisfy:

    lim_{t→∞} E[Q_k(t)]/t = 0                                        (24)

A queue that satisfies (24) is said to be mean rate stable [1].
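The sample-path version of bound (23) holds deterministically for the queue update (22), which is easy to check numerically. A minimal sketch, using an arbitrary bounded process and a constraint level chosen for illustration:

```python
import random

# Empirical check of the sample-path version of bound (23):
# (1/t) * sum_{tau<t} y(tau)  <=  c + Q(t)/t, for the queue update (22).
random.seed(1)
c = 0.3            # hypothetical constraint level
Q = 0.0
total = 0.0
for t in range(1, 10001):
    y = random.uniform(-1.0, 1.0)  # arbitrary bounded process
    total += y
    Q = max(Q + y - c, 0.0)        # update (22)
    assert total / t <= c + Q / t + 1e-9
print("bound (23) held on every slot")
```

The assertion never fires: the derivation above is a telescoping argument on (22), so the bound holds slot by slot regardless of the y process.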
H. Objective function analysis
Fix τ ∈ {0, 1, 2, ...}. Because the drift-plus-penalty decision minimizes the last two terms on the right-hand-side of the drift-plus-penalty bound (20), one has:

    Δ(τ) + V y_0(τ) ≤ B(τ) + V y*_0(τ) + Σ_{k=1}^K Q_k(τ)(y*_k(τ) − c_k)   (25)

for all vectors (y*_0(τ), ..., y*_K(τ)) ∈ Y(ω(τ)), including vectors that are chosen randomly over Y(ω(τ)). Fix a vector (y_0, ..., y_K) ∈ R. Let y*(τ) = (y*_0(τ), ..., y*_K(τ)) be chosen as a random function of ω(τ) according to a conditional distribution that yields expectation E[y*(τ)] = (y_0, ..., y_K), but with conditional decisions that are independent of history. Since ω(τ) is itself independent of history, it follows that for all k ∈ {1, ..., K}, y*_k(τ) is independent of Q_k(τ), and:

    E[y*_k(τ) Q_k(τ)] = E[y*_k(τ)] E[Q_k(τ)] = y_k E[Q_k(τ)]          (26)
Taking expectations of (25) (assuming y*(τ) is this randomized policy) and substituting (26) gives:

    E[Δ(τ)] + V E[y_0(τ)] ≤ E[B(τ)] + V y_0 + Σ_{k=1}^K E[Q_k(τ)](y_k − c_k)   (27)

Let B ≥ 0 be a finite constant that satisfies the following for all slots τ:

    E[B(τ)] ≤ B                                                      (28)

Such a constant B exists by the second moment boundedness assumption (13). Substituting B into (27) gives:

    E[Δ(τ)] + V E[y_0(τ)] ≤ B + V y_0 + Σ_{k=1}^K E[Q_k(τ)](y_k − c_k)

The above inequality holds for all (y_0, ..., y_K) ∈ R. Take a limit as (y_0, ..., y_K) approaches the point (y_0^opt, ..., y_K^opt) ∈ R̄ to obtain:

    E[Δ(τ)] + V E[y_0(τ)] ≤ B + V y_0^opt + Σ_{k=1}^K E[Q_k(τ)](y_k^opt − c_k)

Substituting (18) into the right-hand-side of the above inequality gives:

    E[Δ(τ)] + V E[y_0(τ)] ≤ B + V y_0^opt                             (29)
The inequality (29) holds for all slots τ ∈ {0, 1, 2, ...}. Fix t > 0. Summing (29) over τ ∈ {0, 1, ..., t − 1} gives:

    E[L(t)] − E[L(0)] + V Σ_{τ=0}^{t−1} E[y_0(τ)] ≤ (B + V y_0^opt) t

Dividing by t and using the fact that E[L(0)] = 0 gives:

    E[L(t)]/t + V ȳ_0(t) ≤ B + V y_0^opt                              (30)

Dividing by V and using E[L(t)] ≥ 0 gives:

    ȳ_0(t) ≤ y_0^opt + B/V                                            (31)
That is, (31) ensures that for all slots t > 0, the time average expectation ȳ_0(t) is at most O(1/V) larger than the optimal objective function value y_0^opt. Fix ε > 0. Using the parameter V = 1/ε gives an O(ε) approximation to optimal utility.

It remains to show that the desired constraints are also satisfied. If a Slater assumption holds, it can be shown that queue averages are O(1/ε). The Slater assumption also ensures convergence time is O(1/ε²) [2]. The next section presents a new analysis to develop O(1/ε²) convergence time without the Slater assumption.
III. CONVERGENCE TIME ANALYSIS
A. Lagrange multipliers
Assume the problem (15)-(17) is feasible. Since this problem is convex, a hyperplane in R^{K+1} exists that passes through the point (y_0^opt, c_1, ..., c_K) and that contains the set R̄ on one side [15]. Specifically, there are non-negative values γ_0, γ_1, ..., γ_K such that:

    γ_0 y_0 + Σ_{k=1}^K γ_k y_k ≥ γ_0 y_0^opt + Σ_{k=1}^K γ_k c_k  ∀(y_0, ..., y_K) ∈ R̄

The hyperplane is said to be non-vertical if γ_0 ≠ 0 [15].
If the hyperplane is non-vertical, one can divide the above inequality by γ_0, define μ_k = γ_k/γ_0 for all k ∈ {1, ..., K}, and conclude:

    y_0 + Σ_{k=1}^K μ_k y_k ≥ y_0^opt + Σ_{k=1}^K μ_k c_k  ∀(y_0, ..., y_K) ∈ R̄   (32)

The non-negative vector (μ_1, ..., μ_K) in (32) is called a Lagrange multiplier vector. A Lagrange multiplier vector that satisfies (32) exists whenever the separating hyperplane is non-vertical. It can be shown that the separating hyperplane is non-vertical whenever a Slater condition holds. Such a non-vertical hyperplane also exists in more general situations without a Slater condition (see “regularity conditions” specified in [15]). Thus, the assumption that a Lagrange multiplier vector exists is a mild assumption.
B. Bounding the violations
Assume a (non-negative) Lagrange multiplier vector (μ_1, ..., μ_K) exists so that (32) holds. Fix t > 0. Recall that (14) ensures ȳ(t) = (ȳ_0(t), ..., ȳ_K(t)) ∈ R. Since R ⊆ R̄, by (32) one has:

    ȳ_0(t) + Σ_{k=1}^K μ_k ȳ_k(t) ≥ y_0^opt + Σ_{k=1}^K μ_k c_k

Rearranging the above gives:

    y_0^opt − ȳ_0(t) ≤ Σ_{k=1}^K μ_k (ȳ_k(t) − c_k) ≤ Σ_{k=1}^K μ_k E[Q_k(t)]/t   (33)

where the final inequality holds by (23).
On the other hand, one has by (30):

    E[L(t)]/t ≤ B + V (y_0^opt − ȳ_0(t))
              ≤ B + V Σ_{k=1}^K μ_k E[Q_k(t)]/t                       (34)
              ≤ B + (V/t) ‖μ‖·‖E[Q(t)]‖                               (35)

where (34) is obtained by substituting (33), and (35) is due to the fact that the dot product of two vectors is less than or
equal to the product of their norms. Substituting the definition L(t) = (1/2)‖Q(t)‖² in the left-hand-side of (35) gives:

    (1/2t) E[‖Q(t)‖²] ≤ B + (V/t) ‖μ‖·‖E[Q(t)]‖

Since E[‖Q(t)‖²] ≥ ‖E[Q(t)]‖², one has:

    (1/2t) ‖E[Q(t)]‖² ≤ B + (V/t) ‖μ‖·‖E[Q(t)]‖

Therefore:

    ‖E[Q(t)]‖² − 2V‖μ‖·‖E[Q(t)]‖ − 2Bt ≤ 0

Define x = ‖E[Q(t)]‖, b = 2V‖μ‖, c = 2Bt. Then:

    x² − bx − c ≤ 0                                                   (36)

The largest value of x that satisfies (36) is equal to the largest root of the quadratic equation x² − bx − c = 0, and so:

    x ≤ (b + √(b² + 4c))/2 = V‖μ‖ + √(V²‖μ‖² + 2Bt)

Therefore, for all t > 0 one has:

    ‖E[Q(t)]‖ ≤ V‖μ‖ + √(V²‖μ‖² + 2Bt)
It follows from (23) that for all k ∈ {1, ..., K} the constraint violations satisfy:

    ȳ_k(t) ≤ c_k + E[Q_k(t)]/t
           ≤ c_k + ‖E[Q(t)]‖/t
           ≤ c_k + (V‖μ‖ + √(V²‖μ‖² + 2Bt))/t                         (37)

This leads to the following theorem.
This leads to the following theorem.
Theorem 1: Fix ǫ > 0 and define V = 1. If the problem
(1)-(3) is feasible and the Lagrange multiplier assumption (32)
holds, then for all t 1
2
one has:
y
0
(t) y
opt
0
+ O(ǫ) (38)
y
k
(t) c
k
+ O(ǫ) k {1, . . . , K} (39)
and so the drift-plus-penalty algorithm with V = 1 provides
an O(ǫ ) approximation with convergence time O(1
2
).
Proof: Inequality (38) holds from (31) and the fact that B/V = Bε = O(ε). Inequality (39) holds from (37) and the fact that for t ≥ 1/ε²:

    (V‖μ‖ + √(V²‖μ‖² + 2Bt))/t
        = ‖μ‖/(εt) + √(‖μ‖²/(ε²t²) + 2B/t)
        ≤ ‖μ‖ε + √(‖μ‖²ε² + 2Bε²)
        = ‖μ‖ε + ε√(‖μ‖² + 2B)
        = O(ε)
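The largest-root step used before the theorem can be sanity-checked numerically: for non-negative b and c, any x ≥ 0 satisfying x² − bx − c ≤ 0 must lie below the larger root (b + √(b² + 4c))/2. A minimal check with randomly sampled coefficients (the sampling ranges are arbitrary):

```python
import math
import random

random.seed(2)
for _ in range(1000):
    b = random.uniform(0.0, 10.0)   # plays the role of 2*V*||mu||
    c = random.uniform(0.0, 10.0)   # plays the role of 2*B*t
    root = (b + math.sqrt(b * b + 4.0 * c)) / 2.0
    # Whenever the quadratic constraint (36) holds, x is below the root.
    for _ in range(100):
        x = random.uniform(0.0, 20.0)
        if x * x - b * x - c <= 0.0:
            assert x <= root + 1e-9
print("largest-root bound verified")
```

This is exactly the argument that turns the quadratic inequality in ‖E[Q(t)]‖ into the explicit bound V‖μ‖ + √(V²‖μ‖² + 2Bt).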
IV. EQUALITY CONSTRAINTS
A similar analysis can be used to treat problems with explicit equality constraints. Specifically, consider choosing a vector h(t) = (y_0(t), y_1(t), ..., y_K(t), w_1(t), ..., w_M(t)) in a set H(ω(t)) to solve:

    Minimize:   lim sup_{t→∞} ȳ_0(t)                                  (40)
    Subject to: lim sup_{t→∞} ȳ_k(t) ≤ c_k  ∀k ∈ {1, ..., K}          (41)
                lim_{t→∞} w̄_i(t) = d_i  ∀i ∈ {1, ..., M}              (42)
                h(t) ∈ H(ω(t))  ∀t ∈ {0, 1, 2, ...}                   (43)
where c_1, ..., c_K and d_1, ..., d_M are given real numbers. One approach is to change each equality constraint (42) into two inequality constraints:

    lim sup_{t→∞} w̄_i(t) ≤ d_i
    lim sup_{t→∞} [−w̄_i(t)] ≤ −d_i

This would involve two virtual queues for each i ∈ {1, ..., M}. A notationally easier method is to simply change the structure of the virtual queue for equality constraints i ∈ {1, ..., M} as follows [1]:

    Z_i(t + 1) = Z_i(t) + w_i(t) − d_i  ∀i ∈ {1, ..., M}              (44)
The inequality constraints (41) have the same virtual queues from before:

    Q_k(t + 1) = max[Q_k(t) + y_k(t) − c_k, 0]                        (45)

The resulting algorithm is as follows: Initialize Z_i(0) = Q_k(0) = 0 for all i ∈ {1, ..., M} and k ∈ {1, ..., K}. Every slot t ∈ {0, 1, 2, ...} do:

• Observe Q_1(t), ..., Q_K(t) and Z_1(t), ..., Z_M(t) and ω(t), and choose h(t) ∈ H(ω(t)) to minimize:

    V y_0(t) + Σ_{k=1}^K Q_k(t) y_k(t) + Σ_{i=1}^M Z_i(t) w_i(t)

• Update Q_k(t) for k ∈ {1, ..., K} and Z_i(t) for i ∈ {1, ..., M} via (45) and (44).

The analysis of this scenario with equality constraints is similar and is omitted for brevity (see [1]).
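The signed queue (44) differs from (45) in that it can go negative, which is what lets it push the time average of w_i(t) back up toward d_i. A minimal sketch with a hypothetical one-constraint setup (binary decision w(t), cost y_0(t) = w(t), target average d):

```python
# Sketch of the signed virtual queue (44) for a time average equality
# constraint. Hypothetical setup: per-slot decision w(t) in {0, 1} with
# cost y_0(t) = w(t); the constraint is lim w-bar(t) = d.
d = 0.25
V = 20.0
Z = 0.0
w_sum = 0.0
T = 5000
for t in range(T):
    # Choose w to minimize V*y_0 + Z*w = (V + Z)*w over w in {0, 1}.
    w = 1.0 if V + Z < 0.0 else 0.0
    w_sum += w
    Z = Z + w - d  # update (44); Z may go negative, unlike Q_k in (45)
w_bar = w_sum / T
print(w_bar)  # settles near d = 0.25
```

Here Z drifts down to about −V and then hovers there, toggling w(t) so that the long-run fraction of w = 1 slots matches d, the behavior the equality-constraint analysis formalizes.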
V. CONVEX PROGRAMS
Fix N as a positive integer. Consider the problem of finding a vector x = (x_1, ..., x_N) ∈ R^N to solve:

    Minimize:   f(x)                                                  (46)
    Subject to: g_k(x) ≤ c_k  ∀k ∈ {1, ..., K}                        (47)
                x ∈ X                                                 (48)

where X is a convex and compact subset of R^N, the functions f(x), g_1(x), ..., g_K(x) are continuous and convex over x ∈ X, and c_1, ..., c_K are given real numbers. The problem (46)-(48) is a convex program. Assume the problem is feasible, so that there exists a vector that satisfies the constraints (47)-(48). The compactness and continuity assumptions ensure there is an optimal solution x* ∈ X that solves the problem (46)-(48). Define f* = f(x*) as the optimal objective function value.
This convex program is equivalent to a problem of the form (1)-(3), and hence can be solved by the drift-plus-penalty method [3]. To see this, define Y as the set of all (y_0, y_1, ..., y_K) vectors in R^{K+1} such that there exists a vector x ∈ X that satisfies:

    y_0 = f(x)
    y_k = g_k(x)  ∀k ∈ {1, ..., K}

Consider a system defined over slots t ∈ {0, 1, 2, ...}. Every slot t, a controller chooses a vector x(t) = (x_1(t), ..., x_N(t)) in the (deterministic) set X. Define:

    y_0(t) = f(x(t))
    y_k(t) = g_k(x(t))  ∀k ∈ {1, ..., K}
The goal is to choose x(t) over slots to solve:

    Minimize:   lim sup_{t→∞} ȳ_0(t)                                  (49)
    Subject to: lim sup_{t→∞} ȳ_k(t) ≤ c_k  ∀k ∈ {1, ..., K}          (50)
                x(t) ∈ X  ∀t ∈ {0, 1, 2, ...}                         (51)
Lemma 1: If {x(t)}_{t=0}^∞ is a random or deterministic process that satisfies x(t) ∈ X for all t, then:

a) For all t > 0, one has (1/t) Σ_{τ=0}^{t−1} x(τ) ∈ X, and:

    f((1/t) Σ_{τ=0}^{t−1} x(τ)) ≤ (1/t) Σ_{τ=0}^{t−1} y_0(τ)
    g_k((1/t) Σ_{τ=0}^{t−1} x(τ)) ≤ (1/t) Σ_{τ=0}^{t−1} y_k(τ)  ∀k ∈ {1, ..., K}

b) For all t > 0, x̄(t) ∈ X, and:

    f(x̄(t)) ≤ ȳ_0(t)
    g_k(x̄(t)) ≤ ȳ_k(t)  ∀k ∈ {1, ..., K}

Proof: Part (a) follows immediately from convexity of X and Jensen's inequality applied to the convex functions f(x) and g_k(x). Part (b) follows by taking expectations of the inequalities in part (a) and again using Jensen's inequality. Formally, it also uses the fact that if X is a random vector that takes values in a convex set X, and if E[X] is finite, then E[X] ∈ X.
Lemma 2: If x* is an optimal solution to the convex program (46)-(48), then x(t) = x* for all t ∈ {0, 1, 2, ...} is an optimal solution to (49)-(51). Further, the optimal objective function value in both problems (46)-(48) and (49)-(51) is f*.

Proof: Recall that f* is defined as the optimal objective function value for (46)-(48). Let x* be an optimal solution to (46)-(48), so that x* ∈ X, g_k(x*) ≤ c_k for all k ∈ {1, ..., K}, and f(x*) = f*. Define x(t) = x* for all t. Then (51) clearly holds. Further, for all t > 0 one has:

    ȳ_k(t) = (1/t) Σ_{τ=0}^{t−1} g_k(x*) = g_k(x*) ≤ c_k  ∀k ∈ {1, ..., K}

and so the constraints (50) hold. Similarly, ȳ_0(t) = f(x*) = f* for all t. Thus, x(t) satisfies the constraints of problem (49)-(51) and gives an objective function value of f*. It follows that f* ≥ y_0^opt, where y_0^opt is defined as the infimum objective function value over all x(t) processes that meet the constraints of problem (49)-(51).
It remains to show that f* ≤ y_0^opt (so that f* = y_0^opt). To this end, let x(t) be any (possibly random) process that satisfies the constraints of problem (49)-(51). Since x(t) ∈ X for all t, it follows that x̄(t) ∈ X for all t. Since X is compact, the Bolzano-Weierstrass theorem implies there is a subsequence of times t_m that increase to infinity such that:

    lim_{m→∞} x̄(t_m) = x̂                                             (52)

for some fixed vector x̂ = (x̂_1, ..., x̂_N) ∈ X. Furthermore, Jensen's inequality (specifically, Lemma 1b) implies that for any time t_m > 0:

    f(x̄(t_m)) ≤ ȳ_0(t_m)                                             (53)
    g_k(x̄(t_m)) ≤ ȳ_k(t_m)  ∀k ∈ {1, ..., K}                         (54)

Therefore, by continuity of g_k(x):

    g_k(x̂) = lim_{m→∞} g_k(x̄(t_m))                                   (55)
           ≤ lim_{m→∞} ȳ_k(t_m)                                       (56)
           ≤ lim sup_{t→∞} ȳ_k(t)
           ≤ c_k                                                      (57)

where (55) holds by (52), (56) holds by (54), and (57) holds because (50) is satisfied. Thus, x̂ satisfies the constraints of problem (46)-(48). It follows that f(x̂) ≥ f*, and so:

    f* ≤ f(x̂)
       = lim_{m→∞} f(x̄(t_m))                                          (58)
       ≤ lim_{m→∞} ȳ_0(t_m)                                           (59)
       ≤ lim sup_{t→∞} ȳ_0(t)

where (58) holds by (52) and continuity of f(x), and (59) holds by (53). Thus:

    f* ≤ lim sup_{t→∞} ȳ_0(t)

This says that f* is less than or equal to the objective function value for any random process x(t) that satisfies the constraints of problem (49)-(51). It follows that f* ≤ y_0^opt.
A. Drift-plus-penalty for convex programs
The drift-plus-penalty algorithm to solve (49)-(51) defines virtual queues Q_k(t) for k ∈ {1, ..., K} by:

    Q_k(t + 1) = max[Q_k(t) + y_k(t) − c_k, 0]

Since y_k(t) = g_k(x(t)), this is equivalent to:

    Q_k(t + 1) = max[Q_k(t) + g_k(x(t)) − c_k, 0]                     (60)

The queues are initialized to zero. Then every slot t ∈ {0, 1, 2, ...}:

• Observe (Q_1(t), ..., Q_K(t)) and choose x(t) ∈ X to minimize:

    V f(x(t)) + Σ_{k=1}^K Q_k(t) g_k(x(t))                            (61)
• Update Q_k(t) via (60) for each k ∈ {1, ..., K}.
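The iteration (60)-(61) can be sketched on a toy convex program. All problem data below are hypothetical, chosen so that the per-slot minimization (61) has a closed form: minimize f(x) = (x − 2)² subject to g(x) = x ≤ 1 with X = [0, 3], whose optimum is x* = 1.

```python
# Sketch of the drift-plus-penalty iteration (60)-(61) on a toy convex
# program (hypothetical data): minimize f(x) = (x - 2)^2 subject to
# g(x) = x <= 1, with X = [0, 3]. The optimum is x* = 1, f* = 1.
eps = 0.01
V = 1.0 / eps
Q = 0.0
x_sum = 0.0
T = int(1.0 / eps**2)  # convergence time O(1/eps^2) from Theorem 1
for t in range(T):
    # Step (61): minimize V*(x-2)^2 + Q*x over [0, 3]; the unconstrained
    # minimizer is x = 2 - Q/(2V), clipped to the interval.
    x = min(max(2.0 - Q / (2.0 * V), 0.0), 3.0)
    x_sum += x
    # Step (60): virtual queue update with g(x) = x and c = 1.
    Q = max(Q + x - 1.0, 0.0)
x_bar = x_sum / T
print(x_bar)  # time-averaged primal, close to x* = 1
```

The per-slot iterates start near the unconstrained minimizer x = 2 and are pulled down as Q grows; it is the time average of the primals, as emphasized below, that approaches the optimum.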
Fix ε > 0. The next subsection shows that by defining V = 1/ε, the averages x̄(t) = (1/t) Σ_{τ=0}^{t−1} x(τ) obtained from the above algorithm converge to an O(ε) approximation of (46)-(48) with convergence time O(1/ε²). The above drift-plus-penalty algorithm in this special case of a (deterministic) convex program is similar to the basic dual subgradient algorithm with step size 1/V (see, for example, [15]). However, a traditional analysis of the dual subgradient algorithm relies on strict convexity assumptions to ensure that the primal values x(t) converge to an O(ε)-approximation of a (unique) optimal solution x*. The above requires only convexity (not strict convexity), and so there may be more than one optimal solution to (46)-(48). It then takes a time average of the primals to obtain an O(ε)-approximation.
B. Convex program performance
There is no random event process ω(t) for this convex programming problem, and so the drift-plus-penalty algorithm makes purely deterministic decisions to minimize (61) every slot t. Indeed, assume that if there are ties in the decision (61), the tie is broken using some deterministic method. The resulting sequence {x(t)}_{t=0}^∞ is deterministic. It follows that all expectations in the analysis of the previous section can be removed.³ Thus, for all t > 0:

    ȳ(t) = (1/t) Σ_{τ=0}^{t−1} y(τ)
    x̄(t) = (1/t) Σ_{τ=0}^{t−1} x(τ)
For this convex programming problem, the Lagrange multiplier condition (32) reduces to the existence of a vector $(\mu_1, \ldots, \mu_K)$ with non-negative components such that:
$$f(x) + \sum_{k=1}^{K} \mu_k g_k(x) \geq f(x^*) + \sum_{k=1}^{K} \mu_k c_k \quad \forall x \in \mathcal{X}$$
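As a concrete illustration (an assumed instance, not from the paper), the condition can be checked by hand on a one-dimensional program:

```latex
% Assumed instance: X = [-2, 2], f(x) = x^2, g_1(x) = -x, c_1 = -1,
% i.e. minimize x^2 subject to x >= 1. The optimum is x^* = 1 with
% f(x^*) = 1, and the multiplier \mu_1 = 2 satisfies the condition:
f(x) + \mu_1 g_1(x) = x^2 - 2x \;\geq\; -1 = f(x^*) + \mu_1 c_1
\quad \forall x \in \mathcal{X},
% which holds because x^2 - 2x + 1 = (x - 1)^2 \geq 0.
```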
Fix $\epsilon > 0$. It follows by Theorem 1 that if the problem is feasible and has a Lagrange multiplier vector, then the drift-plus-penalty method with $V = 1/\epsilon$ yields the following for all $t \geq 1/\epsilon^2$:
$$\overline{y}_0(t) \leq f^* + O(\epsilon)$$
$$\overline{y}_k(t) \leq c_k + O(\epsilon) \quad \forall k \in \{1, \ldots, K\}$$
On the other hand, it is clear by Lemma 1 (Jensen's inequality) that for all $t > 0$:
$$f(\overline{x}(t)) \leq \overline{y}_0(t)$$
$$g_k(\overline{x}(t)) \leq \overline{y}_k(t) \quad \forall k \in \{1, \ldots, K\}$$
and hence $\overline{x}(t) \in \mathcal{X}$ for all $t > 0$, and:
$$f(\overline{x}(t)) \leq f^* + O(\epsilon)$$
$$g_k(\overline{x}(t)) \leq c_k + O(\epsilon) \quad \forall k \in \{1, \ldots, K\}$$
³Alternatively, one can repeat the same analysis of the previous section in the special case of no randomness, redefining $\overline{x}(t)$ and $\overline{y}_k(t)$ to be pure time averages without an expectation, to obtain the same results for this deterministic convex program.
Thus, the drift-plus-penalty algorithm produces an $O(\epsilon)$ approximation to the convex program with convergence time $O(1/\epsilon^2)$.
C. Application to linear programs

Consider the special case of a linear program, so that the $f(x)$ and $g_k(x)$ functions are linear and the set $\mathcal{X}$ is replaced by a hyper-rectangle:

Minimize: $\sum_{i=1}^{N} b_i x_i$
Subject to: $\sum_{i=1}^{N} a_{ki} x_i \leq c_k \quad \forall k \in \{1, \ldots, K\}$
$\qquad\qquad x_{i,min} \leq x_i \leq x_{i,max} \quad \forall i \in \{1, \ldots, N\}$

where $x_{i,min}$, $x_{i,max}$, $b_i$, $a_{ki}$, and $c_k$ are given real numbers for all $i \in \{1, \ldots, N\}$ and $k \in \{1, \ldots, K\}$. It is assumed that $x_{i,min} < x_{i,max}$ for all $i \in \{1, \ldots, N\}$. This fits the form of the convex program (46)-(48) via:
$$f(x) = \sum_{i=1}^{N} b_i x_i$$
$$g_k(x) = \sum_{i=1}^{N} a_{ki} x_i$$
$$\mathcal{X} = \{x \in \mathbb{R}^N \mid x_{i,min} \leq x_i \leq x_{i,max} \ \forall i \in \{1, \ldots, N\}\}$$
The resulting drift-plus-penalty algorithm defines virtual queues:
$$Q_k(t+1) = \max\left[Q_k(t) + \sum_{i=1}^{N} a_{ki} x_i(t) - c_k, \; 0\right] \qquad (62)$$
The queues are initialized to 0. Then every slot $t \in \{0, 1, 2, \ldots\}$, a vector $x(t) \in \mathcal{X}$ is chosen to minimize:
$$V \sum_{i=1}^{N} b_i x_i(t) + \sum_{k=1}^{K} Q_k(t)\left[\sum_{i=1}^{N} a_{ki} x_i(t)\right]$$
This results in the following simple and separable optimization over each variable $x_i(t)$. Every slot $t \in \{0, 1, 2, \ldots\}$:
• Observe $Q_1(t), \ldots, Q_K(t)$. For each $i \in \{1, \ldots, N\}$ choose:
$$x_i(t) = \begin{cases} x_{i,max} & \text{if } V b_i + \sum_{k=1}^{K} Q_k(t) a_{ki} \leq 0 \\ x_{i,min} & \text{otherwise} \end{cases}$$
• Update $Q_k(t)$ for $k \in \{1, \ldots, K\}$ via (62).
• Update $\overline{x}(t)$ via $\overline{x}(t+1) = \overline{x}(t)\frac{t}{t+1} + \frac{x(t)}{t+1}$.
This algorithm always chooses $x_i(t)$ within the 2-element set $\{x_{i,min}, x_{i,max}\}$. Thus, the $x(t)$ vectors themselves cannot converge to an approximate solution if the resulting solution is not a corner point on the hyper-rectangle $\mathcal{X}$ (for example, optimality might require $x_1 = (x_{1,min} + x_{1,max})/2$). However, Theorem 1 ensures the time averages $\overline{x}(t)$ converge to an $O(\epsilon)$-approximation with convergence time $O(1/\epsilon^2)$.
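The bang-bang behavior and the role of the running average are easy to see in simulation. The instance below (maximize $x_1$ subject to $x_1 \leq 0.5$ on $[0,1]$, written as minimizing $b^\mathsf{T} x$) is an illustrative assumption chosen so that the optimum $x_1^* = 0.5$ is not a corner of the box: the per-slot decisions bang between 0 and 1, yet $\overline{x}(t)$ still converges.

```python
import numpy as np

# Drift-plus-penalty sketch for the linear program of this subsection.
# The threshold rule, queue update (62), and running-average update are
# as in the text; the specific problem data below are assumptions.

def lp_drift_plus_penalty(b, A, c, x_min, x_max, eps=0.01):
    b, A, c = map(np.asarray, (b, A, c))
    V = 1.0 / eps
    T = int(1.0 / eps ** 2)
    K, N = A.shape
    Q = np.zeros(K)                       # one virtual queue per constraint
    x_bar = np.zeros(N)
    for t in range(T):
        # Threshold rule: x_i = x_max if V*b_i + sum_k Q_k(t)*a_ki <= 0
        weights = V * b + Q @ A           # length-N vector of thresholds
        x = np.where(weights <= 0.0, x_max, x_min)
        # Queue update (62)
        Q = np.maximum(Q + A @ x - c, 0.0)
        # Running average: x_bar(t+1) = x_bar(t)*t/(t+1) + x(t)/(t+1)
        x_bar = x_bar * (t / (t + 1.0)) + x / (t + 1.0)
    return x_bar

# Assumed instance: minimize -x1 (i.e. maximize x1) s.t. x1 <= 0.5, x1 in [0,1].
x_bar = lp_drift_plus_penalty(b=[-1.0], A=[[1.0]], c=[0.5],
                              x_min=0.0, x_max=1.0)
print(x_bar)   # close to the non-corner optimum [0.5]
```

Every slot-wise decision here is either 0 or 1, so $x(t)$ itself never converges; the time average does, illustrating why Theorem 1 is stated for $\overline{x}(t)$.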
VI. DISTRIBUTED OPTIMIZATION OVER A CONNECTED GRAPH

Consider a directed graph with $N$ nodes. Let $\mathcal{N} = \{1, \ldots, N\}$ be the set of nodes. Let $\mathcal{L}$ be the set of all directed links. Each node $n \in \mathcal{N}$ has a vector of its own variables $x^{(n)} = (x^{(n)}_1, \ldots, x^{(n)}_{M_n}) \in \mathbb{R}^{M_n}$, where $M_n$ is a positive integer for each $n \in \mathcal{N}$. In addition, there is a vector $\theta = (\theta_1, \ldots, \theta_G) \in \mathbb{R}^G$ of common variables (for some positive integer $G$). The goal is to solve the problem in a distributed way, so that each node makes decisions based only on information available from its neighbors. The problem and approach in this section are a variation on the work in [3].
Each node $n \in \mathcal{N}$ must choose variables $x^{(n)} \in \mathcal{X}^{(n)}$, where $\mathcal{X}^{(n)}$ is a convex and compact subset of $\mathbb{R}^{M_n}$. In addition, the nodes must collectively choose $\theta \in \Theta$, where $\Theta$ is a convex and compact subset of $\mathbb{R}^G$. The goal is to solve:

Minimize: $\sum_{n=1}^{N} f^{(n)}(x^{(n)}, \theta)$  (63)
Subject to: $g^{(n)}(x^{(n)}, \theta) \leq c^{(n)} \quad \forall n \in \mathcal{N}$  (64)
$\qquad\qquad x^{(n)} \in \mathcal{X}^{(n)} \quad \forall n \in \mathcal{N}$  (65)
$\qquad\qquad \theta \in \Theta$  (66)

where $f^{(n)}(x^{(n)}, \theta)$ and $g^{(n)}(x^{(n)}, \theta)$ are convex functions over $\mathcal{X}^{(n)} \times \Theta$, defined for each $n \in \mathcal{N}$.
The goal is to solve this problem by making distributed
decisions at each node. The difficulty is that the θ variables
must be chosen collectively. The next subsection clarifies
the challenges by specifying the drift-plus-penalty algorithm.
Subsection VI-B modifies the problem (without affecting
optimality) to produce a distributed solution.
A. The direct drift-plus-penalty approach

The problem (63)-(66) is a convex program. The drift-plus-penalty method defines virtual queues $Q^{(n)}(t)$ for each $n \in \mathcal{N}$ to enforce the constraints (64):
$$Q^{(n)}(t+1) = \max[Q^{(n)}(t) + g^{(n)}(x^{(n)}(t), \theta(t)) - c^{(n)}, \; 0] \quad \forall n \in \mathcal{N} \qquad (67)$$
Every slot $t \in \{0, 1, 2, \ldots\}$, the algorithm chooses $x^{(n)}(t) \in \mathcal{X}^{(n)}$ for all $n \in \mathcal{N}$, and chooses $\theta(t) \in \Theta$ to minimize:
$$\sum_{n=1}^{N} V f^{(n)}(x^{(n)}(t), \theta(t)) + \sum_{n=1}^{N} Q^{(n)}(t) g^{(n)}(x^{(n)}(t), \theta(t))$$
The difficulty is the joint selection of the θ(t) variables, which
couples all terms together in a centralized optimization.
B. A distributed approach

This subsection specifies a distributed solution, along the lines of the general solution methodology from [3]. The idea is to introduce estimation vectors $\theta^{(n)}(t) \in \Theta$ at each node $n \in \mathcal{N}$. Consider the following problem:

Minimize: $\sum_{n=1}^{N} f^{(n)}(x^{(n)}, \theta^{(n)})$  (68)
Subject to: $g^{(n)}(x^{(n)}, \theta^{(n)}) \leq c^{(n)} \quad \forall n \in \mathcal{N}$  (69)
$\qquad\qquad \theta^{(n)} = \theta^{(j)} \quad \forall (n, j) \in \mathcal{L}$  (70)
$\qquad\qquad x^{(n)} \in \mathcal{X}^{(n)} \quad \forall n \in \mathcal{N}$  (71)
$\qquad\qquad \theta^{(n)} \in \Theta \quad \forall n \in \mathcal{N}$  (72)

The constraints (70) are vector equality constraints. Specifically, if $\theta^{(n)} = (\theta^{(n)}_1, \ldots, \theta^{(n)}_G)$, then the constraints are:
$$\theta^{(n)}_i = \theta^{(j)}_i \quad \forall i \in \{1, \ldots, G\}, \ \forall (n, j) \in \mathcal{L} \qquad (73)$$
Now assume that if one changes the directed graph to an undirected graph by changing all directed links to undirected links, then the resulting undirected graph is connected (so that there is a path from every node to every other node in the undirected graph). With this connectedness assumption, the problem (68)-(72) is equivalent to the original problem (63)-(66). That is because for any nodes $n$ and $m$ in $\mathcal{N}$, there is a path in the undirected graph from $n$ to $m$, and the equality constraints (70) ensure that each node $j$ on this path has $\theta^{(j)} = \theta^{(n)}$. It follows that the constraints (70) ensure that the estimation vectors $\theta^{(n)}$ are the same for all nodes $n \in \mathcal{N}$.
The problem (68)-(72) can be solved via the drift-plus-penalty framework of Section IV. For each inequality constraint (69) (that is, for each $n \in \mathcal{N}$), define:
$$Q^{(n)}(t+1) = \max[Q^{(n)}(t) + g^{(n)}(x^{(n)}(t), \theta^{(n)}(t)) - c^{(n)}, \; 0] \qquad (74)$$
For each equality constraint (73) (that is, for each $i \in \{1, \ldots, G\}$ and $(n, j) \in \mathcal{L}$) define:
$$Z^{(n,j)}_i(t+1) = Z^{(n,j)}_i(t) + \theta^{(n)}_i(t) - \theta^{(j)}_i(t) \qquad (75)$$
Each node $n \in \mathcal{N}$ is responsible for updating the queues $Q^{(n)}(t)$ and $Z^{(n,j)}_i(t)$ for all $i \in \{1, \ldots, G\}$ and all $j$ such that $(n, j) \in \mathcal{L}$. Every slot $t$, decisions are made to minimize:
$$\sum_{n=1}^{N} V f^{(n)}(x^{(n)}(t), \theta^{(n)}(t)) + \sum_{n \in \mathcal{N}} Q^{(n)}(t) g^{(n)}(x^{(n)}(t), \theta^{(n)}(t)) + \sum_{i=1}^{G} \sum_{(n,j) \in \mathcal{L}} Z^{(n,j)}_i(t)\left(\theta^{(n)}_i(t) - \theta^{(j)}_i(t)\right)$$
This is a separable optimization in each of the local variables $x^{(n)}(t)$ and $\theta^{(n)}(t)$ associated with individual nodes $n \in \mathcal{N}$. Each node $n \in \mathcal{N}$ needs to know only its own internal queues and the queue values $Z^{(a,n)}_i(t)$ of its neighbors. It is assumed that these values can be obtained via message passing on the links associated with each neighbor. The resulting algorithm is as follows: Initialize all queues to 0. Every slot $t \in \{0, 1, 2, \ldots\}$ do:
• Each node $n \in \mathcal{N}$ observes $Q^{(n)}(t)$ and the queues $Z^{(n,j)}_i(t)$ and $Z^{(a,n)}_i(t)$ for all $(n, j) \in \mathcal{L}$, all $(a, n) \in \mathcal{L}$, and all $i \in \{1, \ldots, G\}$. It then chooses $(x^{(n)}(t), \theta^{(n)}(t)) \in \mathcal{X}^{(n)} \times \Theta$ to minimize:
$$V f^{(n)}(x^{(n)}(t), \theta^{(n)}(t)) + Q^{(n)}(t) g^{(n)}(x^{(n)}(t), \theta^{(n)}(t)) + \sum_{i=1}^{G} \theta^{(n)}_i(t)\left[\sum_{j|(n,j) \in \mathcal{L}} Z^{(n,j)}_i(t) - \sum_{a|(a,n) \in \mathcal{L}} Z^{(a,n)}_i(t)\right]$$
• Each node $n \in \mathcal{N}$ updates $Q^{(n)}(t)$ via (74) and updates $Z^{(n,j)}_i(t)$ for $(n, j) \in \mathcal{L}$ via (75). The $Z^{(n,j)}_i(t)$ update for node $n$ requires all neighbors $j$ such that $(n, j) \in \mathcal{L}$ to first pass their chosen $\theta^{(j)}(t)$ vectors to node $n$, so that the right-hand side of (75) can be computed.
Fix $\epsilon > 0$. Using $V = 1/\epsilon$, the resulting time averages $\overline{x}^{(n)}(t)$ and $\overline{\theta}^{(n)}(t)$ converge to an $O(\epsilon)$ approximation with convergence time $O(1/\epsilon^2)$.
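The distributed iteration above can be sketched on a small example. Everything problem-specific below is an illustrative assumption: a directed line graph $1 \to 2 \to 3$, scalar estimates $\theta^{(n)} \in [0,1]$, local objectives $f^{(n)}(\theta) = (\theta - a_n)^2$, and no inequality constraints $g^{(n)}$ (so the $Q^{(n)}$ queues are omitted and only the consensus queues $Z$ of (75) remain). The common minimizer of $\sum_n (\theta - a_n)^2$ is $\theta = \mathrm{mean}(a)$.

```python
import numpy as np

# Distributed drift-plus-penalty consensus sketch (assumed instance).
# Each node n keeps its own estimate theta_n and minimizes, per slot,
#   V*(theta - a_n)^2 + theta * (sum of outgoing Z - sum of incoming Z)
# over [0, 1]; the Z queues (75) push the estimates toward equality.

def distributed_consensus(a, links, eps=0.02):
    a = np.asarray(a, dtype=float)
    V = 1.0 / eps
    T = int(1.0 / eps ** 2)
    N = len(a)
    Z = {link: 0.0 for link in links}      # one queue per directed link
    theta = np.zeros(N)
    theta_bar = np.zeros(N)
    for t in range(T):
        for n in range(N):
            # Net queue weight seen by node n: outgoing minus incoming.
            d = (sum(Z[(i, j)] for (i, j) in links if i == n)
                 - sum(Z[(i, j)] for (i, j) in links if j == n))
            # Minimize V*(theta - a_n)^2 + d*theta over [0, 1]:
            # unconstrained minimizer is a_n - d/(2V), then clip.
            theta[n] = min(max(a[n] - d / (2.0 * V), 0.0), 1.0)
        # Queue update (75), after neighbors exchange their theta values.
        for (i, j) in links:
            Z[(i, j)] += theta[i] - theta[j]
        theta_bar = theta_bar * (t / (t + 1.0)) + theta / (t + 1.0)
    return theta_bar

theta_bar = distributed_consensus(a=[0.0, 0.3, 0.9],
                                  links=[(0, 1), (1, 2)])
print(theta_bar)   # all entries near the consensus value mean(a) = 0.4
```

Each node's slot-wise decision uses only its own queues and its neighbors' queue values, matching the message-passing structure described above.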
C. A different type of constraint

The problem (68)-(72) specifies one constraint of the form $g^{(n)}(x^{(n)}, \theta^{(n)}) \leq c^{(n)}$ for each node $n \in \mathcal{N}$. Suppose the problem is changed so that these constraints (69) are replaced by a single constraint of the form:
$$\sum_{n \in \mathcal{N}} g^{(n)}(x^{(n)}, \theta^{(n)}) \leq c \qquad (76)$$
for some given real number $c$. In principle, this could be treated using a virtual queue:
$$J(t+1) = \max\left[J(t) + \sum_{n \in \mathcal{N}} g^{(n)}(x^{(n)}(t), \theta^{(n)}(t)) - c, \; 0\right]$$
However, it is not clear which node should implement this queue. Further, every slot $t$, that node would need to know the values of $g^{(n)}(x^{(n)}(t), \theta^{(n)}(t))$ for all nodes $n \in \mathcal{N}$, which is difficult in a distributed context.
One way to avoid this difficulty is as follows: Form new variables $x^{(n,m)} \in \mathcal{X}^{(n)}$ for all $n, m \in \mathcal{N}$. The variable $x^{(n,m)}$ can be interpreted as node $m$'s estimate of the optimal value of $x^{(n)}$. The constraint (76) is then replaced by:
$$\sum_{n \in \mathcal{N}} g^{(n)}(x^{(n,1)}, \theta^{(1)}) \leq c \qquad (77)$$
$$x^{(n,m)} = x^{(n,j)} \quad \forall n \in \mathcal{N}, \ \forall (m, j) \in \mathcal{L} \qquad (78)$$
$$x^{(n,m)} \in \mathcal{X}^{(n)} \quad \forall n, m \in \mathcal{N} \qquad (79)$$
Node 1 is responsible for the constraint (77) and maintains a virtual queue:
$$J(t+1) = \max\left[J(t) + \sum_{n \in \mathcal{N}} g^{(n)}(x^{(n,1)}(t), \theta^{(1)}(t)) - c, \; 0\right]$$
Each node $m \in \mathcal{N}$ is responsible for the vector equality constraints $x^{(n,m)} = x^{(n,j)}$ for all $n \in \mathcal{N}$ and all $(m, j) \in \mathcal{L}$. These are enforced in the same manner as the constraints (70).
VII. CONCLUSIONS

This paper proves an $O(1/\epsilon^2)$ convergence time for the drift-plus-penalty algorithm in a general situation where a Lagrange multiplier vector exists, without requiring a Slater condition. This holds both for stochastic optimization problems and for (deterministic) convex programs. Special case implementations were given for convex programs, including linear programs. Example solutions were also presented for solving convex programs in a distributed way over a connected graph.
REFERENCES
[1] M. J. Neely. Stochastic Network Optimization with Application to
Communication and Queueing Systems. Morgan & Claypool, 2010.
[2] M. J. Neely. Distributed stochastic optimization via correlated schedul-
ing. ArXiv technical report, arXiv:1304.7727v2, May 2013.
[3] M. J. Neely. Distributed and secure computation of convex programs
over a network of connected processors. DCDIS Conf., Guelph, Ontario,
July 2005.
[4] M. J. Neely, E. Modiano, and C. Li. Fairness and optimal stochastic
control for heterogeneous networks. IEEE/ACM Transactions on Net-
working, vol. 16, no. 2, pp. 396-409, April 2008.
[5] M. J. Neely. Dynamic Power Allocation and Routing for Satellite
and Wireless Networks with Time Varying Channels. PhD thesis,
Massachusetts Institute of Technology, LIDS, 2003.
[6] L. Georgiadis, M. J. Neely, and L. Tassiulas. Resource allocation and
cross-layer control in wireless networks. Foundations and Trends in
Networking, vol. 1, no. 1, pp. 1-149, 2006.
[7] M. J. Neely. Energy optimal control for time varying wireless networks.
IEEE Transactions on Information Theory, vol. 52, no. 7, pp. 2915-2934,
July 2006.
[8] L. Tassiulas and A. Ephremides. Stability properties of constrained
queueing systems and scheduling policies for maximum throughput in
multihop radio networks. IEEE Transactions on Automatic Control,
vol. 37, no. 12, pp. 1936-1948, Dec. 1992.
[9] L. Tassiulas and A. Ephremides. Dynamic server allocation to parallel
queues with randomly varying connectivity. IEEE Transactions on
Information Theory, vol. 39, no. 2, pp. 466-478, March 1993.
[10] A. Eryilmaz and R. Srikant. Fair resource allocation in wireless networks
using queue-length-based scheduling and congestion control. IEEE/ACM
Transactions on Networking, vol. 15, no. 6, pp. 1333-1344, Dec. 2007.
[11] L. Huang and M. J. Neely. Delay reduction via Lagrange multipliers
in stochastic network optimization. IEEE Transactions on Automatic
Control, vol. 56, no. 4, pp. 842-857, April 2011.
[12] M. J. Neely. Energy-aware wireless scheduling with near optimal
backlog and convergence time tradeoffs. ArXiv technical report,
arXiv:1411.4740, Nov. 2014.
[13] S. Supittayapornpong, L. Huang, and M. J. Neely. Time-average
optimization with nonconvex decision set and its convergence. In Proc.
IEEE Conf. on Decision and Control (CDC), Los Angeles, California,
Dec. 2014.
[14] E. Wei and A. Ozdaglar. On the O(1/k) convergence of asynchronous
distributed alternating direction method of multipliers. In Proc. IEEE
Global Conference on Signal and Information Processing, 2013.
[15] D. P. Bertsekas, A. Nedic, and A. E. Ozdaglar. Convex Analysis and
Optimization. Boston: Athena Scientific, 2003.
[16] X. Lin and N. B. Shroff. Joint rate control and scheduling in multihop
wireless networks. Proc. of 43rd IEEE Conf. on Decision and Control,
Paradise Island, Bahamas, Dec. 2004.
[17] J. W. Lee, R. R. Mazumdar, and N. B. Shroff. Opportunistic power
scheduling for dynamic multiserver wireless systems. IEEE Transactions
on Wireless Communications, vol. 5, no.6, pp. 1506-1515, June 2006.
[18] H. Kushner and P. Whiting. Asymptotic properties of proportional-fair
sharing algorithms. Proc. 40th Annual Allerton Conf. on Communica-
tion, Control, and Computing, Monticello, IL, Oct. 2002.
[19] R. Agrawal and V. Subramanian. Optimality of certain channel aware
scheduling policies. Proc. 40th Annual Allerton Conf. on Communica-
tion, Control, and Computing, Monticello, IL, Oct. 2002.
[20] A. Stolyar. Maximizing queueing network utility subject to stability:
Greedy primal-dual algorithm. Queueing Systems, vol. 50, no. 4, pp.
401-457, 2005.
[21] A. Stolyar. Greedy primal-dual algorithm for dynamic resource alloca-
tion in complex networks. Queueing Systems, vol. 54, no. 3, pp. 203-220,
2006.
[22] A. Eryilmaz and R. Srikant. Joint congestion control, routing, and MAC
for stability and fairness in wireless networks. IEEE Journal on Selected
Areas in Communications, Special Issue on Nonlinear Optimization of
Communication Systems, vol. 14, pp. 1514-1524, Aug. 2006.