gsort — Ascending and descending sort

Title stata.com

Syntax Menu Description Options

Remarks and examples Also see

Syntax

gsort [ + | - ] varname



[ + | - ] varname . . .

 

, generate(newvar) mfirst



Data > Sort

Description

gsort arranges observations to be in ascending or descending order of the speciﬁed variables and

so differs from sort in that sort produces ascending-order arrangements only; see [D] sort.

Each varname can be numeric or a string.

The observations are placed in ascending order of varname if + or nothing is typed in front of the

name and are placed in descending order if - is typed.

Options

generate(newvar) creates newvar containing 1, 2, 3, . . . for each group denoted by the ordered

data. This is useful when using the ordering in a subsequent by operation; see [U] 11.5 by varlist:

construct and examples below.

mfirst speciﬁes that missing values be placed ﬁrst in descending orderings rather than last.

Remarks and examples stata.com

gsort is almost a plug-compatible replacement for sort, except that you cannot specify a general

varlist with gsort. For instance, sort alpha-gamma means to sort the data in ascending order of

alpha, within equal values of alpha; sort on the next variable in the dataset (presumably beta),

within equal values of alpha and beta; etc. gsort alpha-gamma would be interpreted as gsort

alpha -gamma, meaning to sort the data in ascending order of alpha and, within equal values of

alpha, in descending order of gamma.

Example 1

The difference in varlist interpretation aside, gsort can be used in place of sort. To list the 10

lowest-priced cars in the data, we might type

. use http://www.stata-press.com/data/r13/auto

. gsort price

. list make price in 1/10

2 gsort — Ascending and descending sort

or, if we prefer,

. gsort +price

. list make price in 1/10

To list the 10 highest-priced cars in the data, we could type

. gsort -price

. list make price in 1/10

gsort can also be used with string variables. To list all the makes in reverse alphabetical order,

we might type

. gsort -make

. list make

Example 2

gsort can be used with multiple variables. Given a dataset on hospital patients with multiple

observations per patient, typing

. use http://www.stata-press.com/data/r13/bp3

. gsort id time

. list id time bp

lists each patient’s blood pressures in the order the measurements were taken. If we typed

. gsort id -time

. list id time bp

then each patient’s blood pressures would be listed in reverse time order.

Technical note

Say that we wished to attach to each patient’s records the lowest and highest blood pressures

observed during the hospital stay. The easier way to achieve this result is with egen’s min() and

max() functions:

. egen lo_bp = min(bp), by(id)

. egen hi_bp = max(bp), by(id)

See [D] egen. Here is how we could do it with gsort:

. use http://www.stata-press.com/data/r13/bp3, clear

. gsort id bp

. by id: gen lo_bp = bp[1]

. gsort id -bp

. by id: gen hi_bp = bp[1]

. list, sepby(id)

This works, even in the presence of missing values of bp, because such missing values are placed

last within arrangements, regardless of the direction of the sort.

gsort — Ascending and descending sort 3

Technical note

Assume that we have a dataset containing x for which we wish to obtain the forward and reverse

cumulatives. The forward cumulative is deﬁned as F (X) = the fraction of observations such that

x ≤ X . Again let’s ignore the easier way to obtain the forward cumulative, which would be to use

Stata’s cumul command,

. set obs 100

. generate x = rnormal()

. cumul x, gen(cum)

(see [R] cumul). Eschewing cumul, we could type

. sort x

. by x: gen cum = _N if _n==1

. replace cum = sum(cum)

. replace cum = cum/cum[_N]

That is, we ﬁrst place the data in ascending order of x; we used sort but could have used gsort.

Next, for each observed value of x, we generated cum containing the number of observations that

take on that value (you can think of this as the discrete density). We summed the density, obtaining

the distribution, and ﬁnally normalized it to sum to 1.

The reverse cumulative G(X) is deﬁned as the fraction of data such that x ≥ X. To obtain this,

we could try simply reversing the sort:

. gsort -x

. by x: gen rcum = _N if _n==1

. replace rcum = sum(rcum)

. replace rcum = rcum/rcum[_N]

This would work, except for one detail: Stata will complain that the data are not sorted in the second

line. Stata complains because it does not understand descending sorts (gsort is an ado-ﬁle). To

remedy this problem, gsort’s generate() option will create a new grouping variable that is in

ascending order (thus satisfying Stata’s narrow deﬁnition) and that is, in terms of the groups it deﬁnes,

identical to that of the true sort variables:

. gsort -x, gen(revx)

. by revx: gen rcum = _N if _n==1

. replace rcum = sum(rcum)

. replace rcum = rcum/rcum[_N]

Also see

[D] sort — Sort data