Eliciting People’s First-Order Concerns:

Text Analysis of Open-Ended Survey Questions

By Beatrice Ferrario and Stefanie Stantcheva∗

Surveys are a key tool for understanding

people’s views on public policies. They let

us slip into people’s minds and reveal other-

wise invisible things such as attitudes, per-

ceptions, reasonings, and beliefs. They can

shed light on how people reason about im-

portant policies that shape their daily lives,

such as health care, taxation, and trade pol-

icy. What eﬃciency and distributional im-

pacts do people have in mind when thinking

about these policies? What are their per-

ceived goals and social objectives?

To some extent, we can learn about sup-

port for some policies by observing citizens’

political behaviors. Yet, we lack data on

their more detailed policy preferences since

voting rarely happens on speciﬁc and sep-

arate issues. Furthermore, it is diﬃcult to

infer the reasoning underlying people’s pol-

icy views using observational data. Survey

methods are thus an invaluable complement

to our other research methods.

The backbone of surveys often consists

of closed-ended questions that provide a

ﬁxed set of answer options. The advantages

of these questions are that answer options

are standardized and streamlined across

respondents and they easily lend them-

selves to quantitative analysis. However, in

some settings, we may prime respondents

to think about (and, subsequently, perhaps

select) answer options that they would oth-

erwise not have thought about. Conversely,

we may omit relevant options that we do

not know about. In open-ended questions, respondents are not offered answer options, but rather, an empty text entry field in which they can write freely. Open-ended survey questions can therefore circumvent some of the above-mentioned issues. By being less guided, they may teach us things that we may otherwise have missed and that we may not be used to thinking about as economists. The answers to these open-ended questions can be analyzed using text analysis methods to shed light on the first-order considerations that come to people’s minds without constraining them to think about a limited set of answer options.

∗ Ferrario: Harvard University, 1805 Cambridge Street, Cambridge, MA 02138; Stantcheva (corresponding author): Harvard University, CEPR, and NBER, 1805 Cambridge Street, Cambridge, MA 02138. We thank Chantal Pezold and Martha Fiehn for exceptional research assistance.

This paper illustrates the design and use

of open-ended survey questions, focusing

on the topics of income and estate taxation.

An abundant literature leverages sur-

vey data to explore people’s perceptions

and preferences about tax policy and

redistribution (Gimpelson and Treisman,

2018; Alesina, Stantcheva and Teso, 2018;

Stantcheva, 2021; Fisman et al., 2020;

Cruces, Perez-Truglia and Tetaz, 2013;

Karadja, Mollerstrom and Seim, 2017; Roth

and Wohlfart, 2018; Hvidberg, Kreiner and

Stantcheva, 2020). Perceptions (and mis-

perceptions) of tax rates are documented

in De Bartolome (1995), Gideon (2017),

Ballard and Gupta (2018), Rees-Jones

and Taubinsky (2019), Chetty, Friedman

and Saez (2013), Feldman, Katuščák and

Kawano (2016), and Stantcheva (2021).

Text analysis methods of non-survey

data, such as online media and newspa-

per coverage, have been applied in ﬁnance

(Antweiler and Frank, 2004), macroeco-

nomics (Baker, Bloom and Davis, 2016),

and political economy (Groseclose and Mi-

lyo, 2005; Gentzkow and Shapiro, 2010; Te-

sei, Durante and Pinotti, 2018; Gentzkow,

Kelly and Taddy, 2019). Our goal is to

apply text analysis methods to data de-

rived from answers to open-ended survey

questions. A few papers in political science (Roberts et al., 2014; Brugidou, 2003) leverage open-ended survey questions, and the practice is also starting to spread to economics (Stantcheva, 2020; Houde and Wekhof, 2021).

2 PAPERS AND PROCEEDINGS MONTH YEAR

The data for this paper comes from two

surveys on income and estate taxes, con-

ducted in 2019 on 5140 U.S. residents aged

18 to 70. The sample is representative of the U.S. population along the dimensions of gender, age, income, political affiliation, and employment (see Appendix

OA-1). Section I presents the application

of text analysis to open-ended survey ques-

tions. Section II summarizes key results

about people’s views on income and estate

taxation.

I. Using Open-Ended Survey Questions

A. What do Open-ended Questions Measure?

Open-ended questions can go from broad

to narrow. Broader open-ended questions

are useful to elicit ﬁrst-order, intrinsic con-

cerns that people have before they are

prompted to think of a particular policy as-

pect with more directed questions. Thus, it

makes sense to start by asking people big

picture questions such as the “main consid-

erations” that come to their minds when

they think about an issue (e.g., the income

or estate tax). In our application, we then

narrow the focus by asking people what

a “good” tax system means to them and

what its goals should be, as well as what

their main perceived shortcomings of the

current U.S. tax system are. Finally, one

can ask targeted questions, such as about

the eﬀects on the U.S. economy and on dif-

ferent groups of people if the policy were

changed (e.g., “What would be the eﬀects

on the economy if taxes on high earners

were raised?”).

Ideally, open-ended questions should be complemented with closed-ended questions for cross-validation. Appendix Section OA-2 provides all the questions asked.

It is useful to think about what the answers to open-ended questions capture. The answers of respondents who have not previously thought carefully about the

topic may be “gut reactions.” These reac-

tions are informative, as they reﬂect what

a respondent thinks and will keep think-

ing, absent more learning or targeted re-

ﬂection. The answers of respondents who

have already thought about the topic pre-

viously or take time to think about it dur-

ing the survey before answering may reﬂect

more profound views.

Either way, answers

to open-ended questions capture the ﬁrst-

order considerations that matter to people

and the aspects of an issue that are top of

mind for them.

B. Text Analysis Methods for Open-Ended

Questions

Data pre-processing

To prepare the data for text analysis, we

ﬁrst parse the answers to reduce the number

of distinct text elements. We remove punc-

tuation, excess spaces, numbers, misspelled

words, and so-called “stop words,” which

are common words that carry no intrinsic

meaning such as “and” or “the.” The re-

maining words in each answer are then lem-

matized to group all inﬂected forms of a

word.

Words appearing in the question it-

self or that occur generically in answers can

also be removed (e.g., “think,” “believe,”

and “should”). Appendix OA-4 describes

the data pre-processing in detail. We now

brieﬂy present three text analysis methods,

with more details in Appendix OA-5.
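The pre-processing steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline documented in Appendix OA-4: the stop-word list, the generic-word list, and the lemma lookup table are tiny hypothetical stand-ins, and the removal of misspelled words is omitted.

```python
import re

# Hypothetical, deliberately tiny word lists (assumptions for illustration only).
STOP_WORDS = {"and", "the", "a", "an", "of", "to", "on", "in", "is", "are", "by", "i"}
GENERIC_WORDS = {"think", "believe", "should"}          # generic answer words
LEMMAS = {"policies": "policy", "taxes": "tax"}          # toy lemma lookup


def preprocess(answer: str) -> list[str]:
    """Return the cleaned, lemmatized tokens of one open-ended answer."""
    text = answer.lower()
    text = re.sub(r"[^a-z\s]", " ", text)                # drop punctuation and numbers
    tokens = text.split()                                # split() also collapses excess spaces
    tokens = [t for t in tokens if t not in STOP_WORDS]  # remove stop words
    tokens = [LEMMAS.get(t, t) for t in tokens]          # group inflected forms
    return [t for t in tokens if t not in GENERIC_WORDS]


example = "I think the policies are unfair, and taxes on the rich should rise by 10%!"
print(preprocess(example))
```

In practice the word lists would come from a full stop-word dictionary and a proper lemmatizer; the toy lookup only makes the order of operations concrete.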

Note: The time spent on each question can be measured, and thus it is possible to distinguish between gut reactions and more considered responses.

Note: As an example of lemmatization, “policies” becomes “policy” and “were” becomes “be.”

VOL. VOL NO. ISSUE PEOPLE’S FIRST-ORDER CONCERNS ON TAX POLICY 3

Word Clouds

For each of the methods presented, a decision has to be made on the basic unit of analysis, i.e., the size of word groups that will be considered as a set. “N-grams” are groups of n words. In word clouds, the font size for each n-gram is proportional to its frequency. Word clouds are best used as a first step in visualizing the data and for scanning answers quickly. Their weakness is that they do not account for synonyms. Hence, topics for which there are many possible words to express the same thought may be artificially diluted, while niche topics that feature clear buzzwords may be inflated in importance.
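The counting that underlies a word cloud can be illustrated as follows. This is a sketch under simple assumptions: answers are already tokenized, n-grams are runs of consecutive tokens within one answer, and the function names and the maximum font size of 48 are hypothetical choices made here for illustration.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int = 2) -> list[str]:
    """Return all runs of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def font_sizes(answers: list[list[str]], n: int = 2,
               max_size: float = 48.0) -> dict[str, float]:
    """Map each n-gram to a font size proportional to its frequency."""
    counts = Counter(g for tokens in answers for g in ngrams(tokens, n))
    if not counts:
        return {}
    top = max(counts.values())
    return {gram: max_size * c / top for gram, c in counts.items()}


answers = [
    ["lower", "tax", "middle", "class"],
    ["higher", "tax", "rich"],
    ["middle", "class", "fair", "share"],
]
sizes = font_sizes(answers)
print(sizes["middle class"], sizes["lower tax"])
```

Because “middle class” occurs twice and “lower tax” once, the former is drawn at twice the size of the latter. The synonym problem is visible here too: “rich” and “wealthy” would be counted as unrelated terms.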

Keyness analysis

Keyness analysis is based on a relative

frequency analysis that compares the use

of n-grams between two groups (a reference

and a target/study group). The keyness

scores of an n-gram are based on the χ²

test statistic for the null hypothesis that the

propensity to use the n-gram is the same for

the reference and target groups. In a nut-

shell, the keyness score of a term measures

how characteristic this term is of the refer-

ence group. Words that are common, but

used relatively equally by the two groups do

not have a high keyness score.
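To make the keyness score concrete, the sketch below computes a Pearson χ² statistic from a 2×2 table of counts (n-gram versus all other n-grams, target versus reference group). It is an illustration under stated assumptions, not the exact procedure of Appendix OA-5.2: no continuity correction is applied (some implementations apply one), the sign convention (positive when the target group over-uses the n-gram) is chosen here, and the function and argument names are hypothetical.

```python
def keyness_chi2(target_count: int, target_total: int,
                 reference_count: int, reference_total: int) -> float:
    """Signed Pearson chi-squared statistic for one n-gram.

    target_count / reference_count: occurrences of the n-gram in each group.
    target_total / reference_total: total number of n-grams in each group.
    """
    a = target_count                        # n-gram, target group
    b = reference_count                     # n-gram, reference group
    c = target_total - target_count         # other n-grams, target group
    d = reference_total - reference_count   # other n-grams, reference group
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    expected_a = (a + b) * (a + c) / n      # expected count under the null
    return chi2 if a >= expected_a else -chi2


print(round(keyness_chi2(30, 1000, 10, 1000), 2))
print(keyness_chi2(20, 1000, 20, 1000))
```

An n-gram used equally often by both groups scores zero regardless of how common it is, which matches the point made above. With one degree of freedom, an absolute score above roughly 3.84 corresponds to rejection at the 5 percent level.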

Topic Analysis

The topic analysis is based on a

keywords-count model. Topics are deﬁned

by sets of keywords. To extract the top-

ics and associated keywords, approaches

range from manual to semi-supervised or

unsupervised (see Appendix ?? for a sum-

mary of some key methods). Many of these

methods are developed for longer text and

are not that well-suited for survey answers

which are shorter in length. In practice,

given the manageable sample sizes, a more

guided approach does better. We recom-

mend extracting the “document-term ma-

trix” (matrix of frequencies of terms in each

answer), plotting the distributions of words,

and checking many sample answers to bet-

ter understand how words are used by re-

spondents. Oftentimes, themes and com-

monly used words appear quite clearly from

the frequency distributions. It is, however,

important to do sensitivity checks on the

topics delineated and on the keywords in-

cluded. Among other decisions that need

to be made (and which warrant sensitiv-

ity analysis) are whether to count a topic

that is mentioned multiple times by a re-

spondent only once or not, and whether

to ﬁlter out diﬀerences in answer lengths

across groups by computing topic distribu-

tions within groups.
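The document-term matrix recommended above can be built directly from tokenized answers. The following is a minimal sketch in plain Python, assuming answers have already been pre-processed into token lists; the function names are hypothetical.

```python
def document_term_matrix(answers: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Return the vocabulary and a matrix with one row per answer, one column per term."""
    vocab = sorted({tok for tokens in answers for tok in tokens})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in answers]
    for i, tokens in enumerate(answers):
        for tok in tokens:
            matrix[i][index[tok]] += 1
    return vocab, matrix


def term_frequencies(vocab: list[str], matrix: list[list[int]]) -> list[tuple[str, int]]:
    """Total frequency of each term, most frequent first (ties broken alphabetically)."""
    totals = [sum(column) for column in zip(*matrix)]
    return sorted(zip(vocab, totals), key=lambda pair: (-pair[1], pair[0]))


answers = [["tax", "fair", "tax"], ["rich", "tax"], ["fair", "share"]]
vocab, dtm = document_term_matrix(answers)
print(vocab)
print(dtm)
print(term_frequencies(vocab, dtm))
```

Sorting the column totals gives the frequency distribution from which candidate themes and keywords can be read off before checking sample answers by hand.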

II. Application: How Do People Think

About Taxes?

To apply these methods to how peo-

ple think about income and estate taxa-

tion, we focus on answers to the broad

question “What are your main considera-

tions?” when thinking about income or es-

tate taxes, respectively. The other open-

ended questions are analyzed in Appendix

OA-8. Figure 1 shows the word clouds derived from the responses. For the income tax, respondents express disagreement with the current levels of taxes and views on the direction in which to change them (“lower tax” or “higher tax,” depending on the groups they refer to) and concern about the impacts on the “middle class.” For the estate tax, respondents’ first-order reaction centers around “double taxation” and the fairness of the estate tax, followed closely by concerns about the “middle class” and “family.”

Figure 1. : Main Considerations about Income and Estate Taxes
Panels: (a) Income Tax; (b) Estate Tax. [Word cloud images not reproduced.]
Note: Word clouds based on answers to the open-ended question about respondents’ main considerations about income and estate taxes.

A. Key topics mentioned

Figure 2 shows the distribution of topics in respondents’ answers by political affiliation. For the income tax, there are eight distinct topics plus a residual Don’t Know category: Distribution, Fairness, Government Spending, Social Insurance, Efficiency, Loopholes, Flat Tax, Public Goods, and Don’t Know (the labels are chosen by us and need not be words used by respondents). For instance, the Distribution topic contains keywords such as “middle class,” “low income,” and “millionaire.” The

Fairness issue contains the words “fair” and

“unfair.” Public Goods captures “infras-

tructure,” “education,” and “health care.”

Eﬃciency is represented by words such as

“hurt economy,” “work less,” “competi-

tion,” and “spend less,” among others. The

ﬁnal category is for respondents who ex-

press that they do not know enough about

the policy to give a meaningful answer.

Appendix OA-6 lists the full set of key-

words identifying each topic, and example

answers are in Appendix OA-7. An answer

is counted as mentioning a topic if it con-

tains at least one of the topic keywords. An-

swers can thus contain multiple topics.
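The counting rule just described, together with the within-group shares plotted in Figure 2, can be sketched as follows. The keyword sets are only the few examples quoted in the text, not the full lists of Appendix OA-6; matching on whole words in lower-cased raw text, the group labels, and the function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Example keywords quoted in the text; the full lists are in Appendix OA-6.
TOPIC_KEYWORDS = {
    "Distribution": ["middle class", "low income", "millionaire"],
    "Fairness": ["fair", "unfair"],
    "Public Goods": ["infrastructure", "education", "health care"],
}


def topics_mentioned(answer: str) -> set[str]:
    """Topics for which the answer contains at least one keyword (each counted once)."""
    text = answer.lower()
    return {
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if any(re.search(rf"\b{re.escape(kw)}\b", text) for kw in keywords)
    }


def topic_shares(answers_by_group: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Share of each topic in all topic mentions, computed within each group."""
    shares = {}
    for group, answers in answers_by_group.items():
        counts = Counter(t for ans in answers for t in topics_mentioned(ans))
        total = sum(counts.values())
        shares[group] = {t: c / total for t, c in counts.items()} if total else {}
    return shares


data = {
    "Group A": ["The middle class pays too much and it is unfair.", "Fund education."],
    "Group B": ["Everyone should pay a fair share."],
}
shares = topic_shares(data)
print(shares)
```

Computing shares within each group, rather than raw counts, keeps groups that write longer answers from mechanically dominating the comparison.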

The topics of Distribution and Fairness,

as well as Government Spending and Loop-

holes are most frequently mentioned. The

Eﬃciency topic does not appear to be top

of mind for most people. These results

echo Stantcheva (2021), which uses a finer decomposition analysis that leverages a series of (non open-ended) survey

questions. The paper ﬁnds that distribu-

tional and fairness considerations dominate

eﬃciency concerns in shaping people’s tax

policy views. Thus, people’s responses to

the open-ended questions already provide

meaningful insights into what matters to

them.

Figure 2. : Main Topics Respondents Think About Regarding the Income and Estate Tax by Political Affiliation
Panel (a) Income Tax. Topics shown: Distribution, Fairness, Gov. Spending, Social Safety, Efficiency, Flat Tax, Loopholes, Public Goods, Don't Know.
Panel (b) Estate Tax. Topics shown: Distribution, Fairness, Gov. Spending, Efficiency, Loopholes, Double Tax, Grieve, Public Goods, Don't Know.
Groups in both panels: Clinton Liberal, Clinton Moderate, Trump Moderate, Trump Conservative. [Bar charts not reproduced.]
Note: The figure shows the distribution of topics mentioned in the answers about main considerations about income and estate taxes. The bars represent the number of times a topic was mentioned out of the total mentions of any topic by political group.

For the estate tax in Panel B, the top-

ics are similar to those from the income

tax survey. Speciﬁc to the estate tax is

the Double Tax topic, captured by key-

words such as “already taxed,” “already

paid,” and “taxed/paid twice” and the

Grieve topic, which is captured by terms

such as “grieve,” “bury,” and “funeral.”

The distribution of topics is again heav-

ily centered around issues of Distribution

and Fairness, but Double Tax also carries

a lot of weight. Much rarer are mentions

of Government Spending, Eﬃciency, Loop-

holes, Grieve, and Public Goods. Further-

more, more people express a lack of knowl-

edge about the estate tax than the income

tax.

B. Political diﬀerences in topics mentioned

It is useful to plot topic distributions for

diﬀerent groups to see whether ﬁrst-order

concerns are heterogeneous across respon-

dents. Figure 2 does this by political aﬃlia-

tion, while Appendix OA-8 provides cuts by

age, income, and education groups. Political leanings are captured by both the vote

in 2016 (Clinton vs. Trump) and by a self-

reported degree of conservatism vs. liberal-

ism (see Appendix OA-3 for details). The

four main categories that emerge are: Clin-

ton Liberals, Clinton Moderates, Trump

Moderates, and Trump Conservatives.

There are clear political diﬀerences in the

topics mentioned. Issues related to Dis-

tribution are much more prevalent on the

left side of the political spectrum than on

the right: they represent 50% of topics

mentioned by Clinton Liberals and 25% by

Trump Conservatives. Government spend-

ing, on the other hand, is a more pressing

concern for respondents on the right, ac-

counting for less than 4% of topic mentions

for Clinton Liberals and 22% for Trump

Conservatives. Eﬃciency is not frequently

mentioned by any political group. Fair-

ness mentions are quite evenly distributed

across the political spectrum. However,

even though everyone cares about fair-

ness, the meaning of this concept greatly

diﬀers across respondents: fairness is in

the eye of the beholder. The answers

to this broad open-ended question thus

prompt us to ask more detailed (open-

and closed-ended) questions about fairness

(Stantcheva, 2021).

Figure 3 shows the keywords that are

most speciﬁc to Democrats and Repub-

licans. Consistent with the topic distri-

bution, the keywords on the Democrat

side center around issues of Distribution

(“poor rich,” “lower class,” “middle class,”

and “tax wealthy”). Republicans tend

to emphasize Government Spending, Gov-

ernment Waste, and the Economic Costs

(“people work,” “economy spend”).

For the estate tax, there are also clear dif-

ferences in the topics mentioned by politi-

cal leanings. Distribution issues are most

prevalent among Clinton Liberals and di-

minish rapidly and monotonically towards

Trump Conservatives. Conversely, Dou-

ble Taxation concerns are prevalent among

Trump Conservatives, but quite rare among

Clinton Liberals. The mentions of Fairness

are again evenly distributed across the po-

litical spectrum. These patterns are con-

ﬁrmed by the keyness analysis (Panel B of

Figure 3).

Figure 3. : Keywords Mentioned by Democrats and Republicans Regarding the Income and Estate Tax
Panels: (a) Income Tax; (b) Estate Tax. Axis labels: Considerations and Chi2; legend: Democrat, Republican. [Keyness plots not reproduced.]
Note: The figure shows keywords among Democrats and Republicans in answers to the question about respondents’ main considerations on the income and estate taxes. The score reported for a set of two words is the χ² test statistic, testing the null hypothesis that the occurrence of the given keywords is the same among Democrats and Republicans. * p < 0.1, ** p < 0.05, *** p < 0.01. See Appendix OA-5.2.

Figure OA-2 shows the distribution of responses to the question “Who gains if taxes on high earners were increased?” Among Democrats, prevalent answers are “the middle class” and “everybody.” Among Republicans, common answers are “nobody” and “government and politicians.” This shows that beliefs in “trickle-down” economics are more prevalent on the right. Figure OA-6 shows similar perceived winners and losers from estate tax increases.

On the perceived shortcomings of the in-

come tax, Figure OA-3 shows that key-

words speciﬁc to Democrats relate to Loop-

holes and Tax Breaks for the Rich and Cor-

porations, while for Republicans they cen-

ter around Government Spending and Work

(i.e., that taxes discourage work and that it

is unfair to tax hard-working people).

III. Conclusion

Open-ended survey questions oﬀer the

potential to elicit people’s ﬁrst-order con-

siderations on policy issues. By not con-

straining respondents to a given set of

answer choices, they avoid priming them

to think of otherwise non-salient options

or omitting relevant options. Open-ended

questions can range from broader to more

targeted. Leveraging recent advances in text analysis, their answers can be visualized and studied quantitatively to shed light on questions across many areas of economics.

REFERENCES

Alesina, Alberto, Stefanie Stantcheva, and

Edoardo Teso. 2018. “Intergenerational mobil-

ity and preferences for redistribution.” American

Economic Review, 108(2): 521–54.

Antweiler, Werner, and Murray Z. Frank.

2004. “Is All That Talk Just Noise? The In-

formation Content of Internet Stock Message

Boards.” The Journal of Finance, 59(3): 1259–

1294.

Baker, Scott R, Nicholas Bloom, and

Steven J Davis. 2016. “Measuring economic

policy uncertainty.” The Quarterly Journal of

Economics, 131(4): 1593–1636.

Ballard, Charles L, and Sanjay Gupta. 2018.

“Perceptions and realities of average tax rates

in the federal income tax: evidence from Michi-

gan.” National Tax Journal, 71(2): 263–294.

Brugidou, Mathieu. 2003. “Argumentation and

Values: an analysis of ordinary political compe-

tence via an open-ended question.” International

Journal of Public Opinion Research, 15(4): 413–

430.

Chetty, Raj, John N Friedman, and Em-

manuel Saez. 2013. “Using Diﬀerences in

Knowledge across Neighborhoods to Uncover the

Impacts of the EITC on Earnings.” American

Economic Review, 103(7): 2683–2721.

Cruces, Guillermo, Ricardo Perez-Truglia,

and Martin Tetaz. 2013. “Biased Perceptions

of Income Distribution and Preferences for Re-

distribution: Evidence from a Survey Experi-

ment.” Journal of Public Economics, 98(C): 100–

112.

De Bartolome, Charles AM. 1995. “Which tax

rate do people use: Average or marginal?” Jour-

nal of Public Economics, 56(1): 79–96.

Feldman, Naomi E, Peter Katuščák, and

Laura Kawano. 2016. “Taxpayer confusion:

Evidence from the child tax credit.” American

Economic Review, 106(3): 807–35.

Fisman, Raymond, Keith Gladstone, Ilyana

Kuziemko, and Suresh Naidu. 2020. “Do

americans want to tax wealth? evidence from

online surveys.” Journal of Public Economics,

188: 104207.

Gentzkow, Matthew, and Jesse M Shapiro.

2010. “What drives media slant? Evidence from

US daily newspapers.” Econometrica, 78(1): 35–

71.

Gentzkow, Matthew, Bryan Kelly, and Matt

Taddy. 2019. “Text as Data.” Journal of Eco-

nomic Literature, 57(3): 535–74.

Gideon, Michael. 2017. “Do Individuals Perceive

Income Tax Rates Correctly?” Public Finance

Review, 45(1): 97–117.

Gimpelson, Vladimir, and Daniel Treisman.

2018. “Misperceiving inequality.” Economics &

Politics, 30(1): 27–54.

Groseclose, Tim, and Jeﬀrey Milyo. 2005. “A

measure of media bias.” The Quarterly Journal

of Economics, 120(4): 1191–1237.

Houde, Sébastien, and Tobias Wekhof. 2021.

“The narrative of the energy eﬃciency gap.”

Economics Working Paper Series, 21.

Hvidberg, Kristoﬀer, Claus Kreiner, and

Stefanie Stantcheva. 2020. “Social Position

and Fairness Views.” NBER Working Paper

28099.

Karadja, Mounir, Johanna Mollerstrom, and

David Seim. 2017. “Richer (and Holier) than

Thou? The Eﬀect of Relative Income Improve-

ments on Demand for Redistribution.” Review of

Economics and Statistics, 99(2): 201–212.

Rees-Jones, Alex, and Dmitry Taubinsky.

2019. “Measuring “Schmeduling”.” The Review

of Economic Studies, 87(5): 2399–2438.

Roberts, Margaret E, Brandon M Stewart,

Dustin Tingley, Christopher Lucas, Jet-

son Leder-Luis, Shana Kushner Gadarian,

Bethany Albertson, and David G Rand.

2014. “Structural topic models for open-ended

survey responses.” American Journal of Politi-

cal Science, 58(4): 1064–1082.

Roth, Christopher, and Johannes Wohl-

fart. 2018. “Experienced Inequality and Prefer-

ences for Redistribution.” Journal of Public Eco-

nomics, 167: 251–262.

Stantcheva, Stefanie. 2020. “Understanding Eco-

nomic Policies: What Do People Know and How

Can They Learn?” Harvard University Working

Paper.

Stantcheva, Stefanie. 2021. “Understanding tax

policy: How do people reason?” The Quarterly

Journal of Economics, 136(4): 2309–2369.

Tesei, A, R Durante, and P Pinotti. 2018.

“The Political Legacy of Entertainment TV.”

American Economic Review.

ONLINE APPENDIX

for “Eliciting People’s First-Order Concerns:

Text Analysis of Open-Ended Survey Questions”

by Beatrice Ferrario and Stefanie Stantcheva

Contents

OA-1 Representativeness of the Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-2

OA-2 Open-ended Questions Wording . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-3

OA-3 Political Aﬃliation Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-3

OA-4 Preprocessing Steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-5

OA-4.1 Word clouds and Keyness Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-5

OA-4.2 Topic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-5

OA-5 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-6

OA-5.1 Word clouds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-6

OA-5.2 Keyness and keyword graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-6

OA-5.3 Topic analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-7

OA-6 Topic Keywords . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-8

OA-7 Sample of Answers to the Open-ended Question “What are your Main Considerations about

the Policy?” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-9

OA-7.1 Income Taxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-9

OA-7.2 Estate Tax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-10

OA-8 Additional Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-11

List of Figures

OA-1 Word clouds for the Income Tax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-11

OA-2 Distributional eﬀects of Income Tax Increase: Who Gains if Taxes on High Earners were to

be Increased? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-13

OA-3 Keywords of Income Taxation by Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-14

OA-4 Word clouds for the Estate Tax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-16

OA-5 Keywords of Estate Tax by Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-17

OA-6 Distributional eﬀects of Estate Tax Increase: Who Loses if the Estate Tax were Increased? OA-19

OA-7 Probability of Mentioning Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . OA-20

OA-8 Topic Distribution by Political Aﬃliation for the Income Tax . . . . . . . . . . . . . . . . . . OA-22

OA-9 Topic Distribution by Income Groups for the Income Tax . . . . . . . . . . . . . . . . . . . OA-23

OA-10 Topic Distribution by Age Group for the Income Tax . . . . . . . . . . . . . . . . . . . . . . OA-24

OA-11 Topic Distribution by Political Aﬃliation of the Estate Tax . . . . . . . . . . . . . . . . . . OA-25

OA-12 Topic Distribution by Income Groups of the Estate Tax . . . . . . . . . . . . . . . . . . . . OA-26

OA-13 Topic Distribution by Age Group of the Estate Tax . . . . . . . . . . . . . . . . . . . . . . . OA-27


OA-1 Representativeness of the Sample

For more details on the surveys, sample, data collection, and data analysis, see Stantcheva (2021). The full

questionnaire is in Section OA-2 of the Online Appendix of that paper.

Table OA-1: Sample Characteristics

                                                      US Population   Income Tax Survey   Estate Tax Survey
Male                                                  0.49            0.48                0.46
18-29 years old                                       0.24            0.23                0.22
30-39 years old                                       0.20            0.20                0.20
40-49 years old                                       0.18            0.19                0.19
50-59 years old                                       0.19            0.21                0.19
60-69 years old                                       0.19            0.18                0.19
$0-$19,999                                            0.13            0.15                0.16
$20,000-$39,999                                       0.16            0.19                0.19
$40,000-$69,999                                       0.21            0.23                0.24
$70,000-$109,999                                      0.20            0.19                0.19
$110,000+                                             0.31            0.24                0.20
Four-year college degree or more                      0.34            0.48                0.46
High-school graduate or less                          0.38            0.19                0.19
Employed                                              0.70            0.63                0.62
Unemployed                                            0.03            0.07                0.06
Self-employed                                         0.07            0.07                0.06
Married                                               0.53            0.55                0.53
White                                                 0.61            0.76                0.76
Black/African-American                                0.12            0.06                0.06
Hispanic/Latino                                       0.18            0.06                0.07
Asian/Asian-American                                  0.06            0.07                0.07
Democrat                                              0.30            0.34                0.35
Republican                                            0.26            0.31                0.30
Independent                                           0.42            0.33                0.33
Voted for Clinton in the 2016 Presidential Election   0.48            0.44                0.44
Voted for Trump in the 2016 Presidential Election     0.46            0.44                0.44
Sample size                                                           2784                2360

Notes: This table displays statistics for the overall US population (column 1) and compares it to the characteristics of the

samples of the income tax and estate tax surveys (columns 2 and 3). National statistics on gender, age, income brackets, race,

education, marital status, and employment status are from the IPUMS-CPS-ASEC dataset for March 2019 (Flood et al., 2020).

National statistics on party aﬃliation for March 2019 are from Gallup (2019). Statistics on 2016 Presidential Election Results

are from Leip (2019). See Stantcheva (2021) for details on how the summary statistics are constructed.


OA-2 Open-ended Questions Wording

This section lists the open-ended questions that respondents were asked in the surveys. See Stantcheva (2021) for the full questionnaires.

We now want to ask you a few broader questions. Please use the text boxes below and write as much as you

feel like. Your opinion and thoughts are important to us! There is no right or wrong answer.

Income Taxation Survey

1. When you think about federal personal income taxation and whether the U.S. should have higher or

lower federal personal income taxes, what are the main considerations that come to your mind?

2. What would be a “good” federal tax system in your view? What would be the goal of a good tax system?

3. What do you think are the issues with or shortcomings of the U.S. federal income tax system?

4. Which important aspects of the U.S. federal income tax system would you say are not discussed enough

in the current policy debate?

5. What do you think would be the eﬀects on the U.S. economy if the federal personal income taxes were

increased?

6. Which groups of people do you think would gain if federal personal income taxes on high earners were

increased?

7. Which groups of people do you think would lose if federal personal income taxes on high earners were

increased?

Estate Tax Survey

1. The federal estate tax is a tax imposed on the transfer of wealth from a deceased person to his or her

heirs. When you think about the federal estate tax and whether the U.S. should have a higher or a

lower federal estate tax, what are the main considerations that come to your mind?

2. In your view, what would be a “good” federal estate tax that you would be satisﬁed with? What would

be the goal of a good estate tax system?

3. What do you think are the shortcomings of the U.S. federal estate tax?

4. What do you think would be the eﬀects on the U.S. economy if the federal estate tax were increased?

5. Which groups of people do you think would gain if the federal estate tax were increased?

6. Which groups of people do you think would lose if the federal estate tax were increased?

OA-3 Political Aﬃliation Categories

This section reports the questions that we used to construct the political categories of Figure 3, the definition of the political categories, and summary statistics of the political leanings in the sample.

1. Did you vote in the last presidential election?

Yes; No

2. (If “Yes” to 1) In the last presidential election, supported:

Hillary Clinton; Donald Trump; Jill Stein; Gary Johnson; Other

(If “No” to 1) Even if you did NOT vote, please indicate the candidate that you were most likely to

have voted for or who represents your views more closely.

Hillary Clinton; Donald Trump; Jill Stein; Gary Johnson; Other


3. On economic policy matters, where do you see yourself on the liberal/conservative spectrum?

Very liberal; Liberal; Moderate; Conservative; Very conservative

Deﬁnition of Variables:

Trump Conservative: respondent supported Donald Trump in the 2016 presidential elections and has

“conservative” or “very conservative” views on economic policy matters.

Trump Moderate: respondent supported Donald Trump in the 2016 presidential elections and has

“moderate” views on economic policy matters.

Clinton Moderate: respondent supported Hillary Clinton in the 2016 presidential elections and has

“moderate” views on economic policy matters.

Clinton Liberal: respondent supported Hillary Clinton in the 2016 presidential elections and has “liberal” or “very liberal” views on economic policy matters.

Table OA-2: Political Affiliation and Views on Economic Policy Matters in the Income Tax Survey

               D. Trump   H. Clinton   G. Johnson   J. Stein   Other
Liberal        7%         54%          12%          53%        15%     29%
Moderate       32%        38%          62%          37%        57%     39%
Conservative   61%        7%           26%          10%        28%     32%
Total          100%       100%         100%         100%       100%    100%

Notes: The table shows the distribution of the respondents’ views on economic policy matters by political candidate supported

in the 2016 presidential elections in the income tax survey sample.

Table OA-3: Political Affiliation and Views on Economic Policy Matters in the Estate Tax Survey

               D. Trump   H. Clinton   G. Johnson   J. Stein   Other
Liberal        8%         55%          16%          43%        19%     30%
Moderate       31%        38%          54%          45%        58%     38%
Conservative   61%        7%           31%          12%        23%     32%
Total          100%       100%         100%         100%       100%    100%

Notes: The table shows the distribution of the respondents’ views on economic policy matters by political candidate supported

in the 2016 presidential elections in the estate tax survey sample.


OA-4 Preprocessing Steps

This section summarizes the preprocessing steps performed to carry out the text analysis. The ﬁrst step,

whose goal is to reduce the number of distinct elements, is common to all three methods, whereas the

subsequent steps are speciﬁc to each of the methods.

Given an answer d:

1. Parse d: lower-case every word and remove punctuation, excess spaces, numbers, misspelled words, and very common words that carry no intrinsic meaning (“stopwords”) such as “and,” “the,” “each,” and “then.”
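As an illustration only, the parsing step could be sketched in Python as follows. The stopword list, the function name `parse_answer`, and the optional `vocabulary` argument (a crude stand-in for a spell checker) are assumptions made for this sketch, not the lists or tools used in the paper.

```python
import re

# Tiny illustrative stopword list (an assumption); real lists are much longer.
STOPWORDS = {"and", "the", "each", "then", "we", "a", "of", "to"}


def parse_answer(text, stopwords=STOPWORDS, vocabulary=None):
    """Lower-case an answer and strip punctuation, numbers, extra spaces,
    and stopwords.

    If `vocabulary` is given, tokens outside it are treated as misspelled
    and dropped (a hypothetical stand-in for a real spell checker).
    """
    text = text.lower()
    # Replace anything that is not an ASCII letter or whitespace by a space.
    # This removes punctuation and digits (and also splits contractions).
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()  # split() also collapses repeated spaces
    tokens = [t for t in tokens if t not in stopwords]
    if vocabulary is not None:
        tokens = [t for t in tokens if t in vocabulary]
    return tokens


print(parse_answer("We should tax the wealthy more, and tax the poor less!"))
```

Words tied to the question wording or to the structure of the answer (such as “should”) are not removed here; the method-specific steps handle them later.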

OA-4.1 Word clouds and Keyness Graphs

2. Lemmatize the remaining words, i.e., group together the inflected forms of a word so they can be analysed as a single item.

– Use the English lemmatization list available in the lexicon package.

– e.g., “policies” becomes “policy” and “were” becomes “be” → reduces the number of distinct textual elements.

→ Output: the lemmatized answer.

3. Remove words coming from the question as well as extra words related to the structure of the answer.

– e.g., for the question “What are your main considerations about the income tax system?”: remove “main,” “considerations,” and “income” from the answers, as well as “think,” “believe,” “should,” etc.

4. Transform the processed answer into a numerical vector c in which each element is the count of a 2-gram, i.e., a two-component expression made of two words that were separated by 0 or 1 word in the original text. Group together 2-grams that consist of the same two words in inverted order. Manually remove 2-grams that have no grammatical foundation and duplicated 2-grams (e.g., “tax tax”).

– e.g., take d = “We should tax the wealthy more and tax the poor less.” After steps 1-2-3 it becomes “tax wealthy more tax poor less”. After step 4 it becomes: [‘tax wealthy’ = 1, ‘tax more’ = 2, ‘wealthy more’ = 0 (because it is not grammatically coherent), ‘tax poor’ = 1, ‘poor people’ = 0, ...]
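The mechanical part of the 2-gram step can be sketched as below. This is a minimal illustration under stated assumptions: distances are measured in the already-processed token sequence, inverted pairs are merged by sorting the two words alphabetically, and the manual pruning of ungrammatical pairs is not attempted. The counts it produces therefore need not coincide with the illustrative counts given in the example above. The function name `skip_bigrams` and the parameter `max_skip` are hypothetical.

```python
from collections import Counter


def skip_bigrams(tokens, max_skip=1):
    """Count unordered word pairs separated by at most `max_skip` tokens."""
    counts = Counter()
    for i, left in enumerate(tokens):
        # Partners of `left`: the next token and up to `max_skip` tokens beyond it.
        for right in tokens[i + 1 : i + 2 + max_skip]:
            if left == right:
                continue  # drop duplicated 2-grams such as "tax tax"
            # Sorting merges a pair with its inverted counterpart.
            counts[tuple(sorted((left, right)))] += 1
    return counts


tokens = "tax wealthy more tax poor less".split()
for pair, n in sorted(skip_bigrams(tokens).items()):
    print(" ".join(pair), n)
```

A researcher would then inspect the resulting pairs and discard those without grammatical foundation by hand, as described in step 4.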

OA-4.2 Topic Analysis

2. Reduce the remaining words to their common root (stemming).

– Use the Snowball stemming algorithm.

– e.g., “policies” and “policy” both become “polici” → reduces the number of distinct textual elements.

→ Output: the stemmed answer.

3. Transform the stemmed answer into a numerical vector c in which each element is the count of a distinct word.

– e.g., take d = “We should tax the wealthy more and the poor less.”

∗ After Steps 1-2 it becomes “tax wealthi more poor less”.

∗ After Step 3 it becomes: [‘tax’ = 1, ‘wealthi’ = 1, ‘more’ = 1, ‘poor’ = 1, ‘less’ = 1, ...]

4. Generate topic dummy variables equal to 1 when an element of c matches a custom-made topic dictionary.
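The count vector of step 3 can be illustrated with a few lines of Python. The sketch assumes the answer has already been parsed and stemmed; the vocabulary (including the extra term “rich”) and the function name `count_vector` are hypothetical and serve only to show that terms absent from an answer receive a zero.

```python
from collections import Counter


def count_vector(stemmed_tokens, vocabulary):
    """Return the count of each vocabulary term in one stemmed answer."""
    counts = Counter(stemmed_tokens)
    return [counts[term] for term in vocabulary]


tokens = "tax wealthi more poor less".split()
vocabulary = sorted(set(tokens) | {"rich"})  # "rich" does not occur in this answer
print(vocabulary)
print(count_vector(tokens, vocabulary))
```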


OA-5 Methods

OA-5.1 Word clouds

The simplest text analysis consists of plotting “word clouds,” which are based on a frequency analysis. The frequency of each identified n-gram is computed, and its label is plotted with a transparency and font size proportional to the frequency of that group of words relative to the total. The word clouds and keyness figures are based on the package by Benoit et al. (2018).
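To make the scaling concrete, the sketch below converts n-gram counts into relative shares, font sizes, and opacities. It is not the plotting routine of the package cited above; the counts are made up, and the function name `cloud_weights` and the 48-point maximum font size are arbitrary assumptions.

```python
def cloud_weights(counts, max_font=48.0):
    """Map n-gram counts to a relative share, a font size, and an opacity.

    Font size and opacity are scaled so that the most frequent n-gram
    receives `max_font` points and full opacity.
    """
    if not counts:
        return {}
    total = sum(counts.values())
    top = max(counts.values())
    return {
        term: {
            "share": n / total,               # frequency relative to the total
            "font_size": max_font * n / top,  # proportional to frequency
            "alpha": n / top,                 # 1.0 means fully opaque
        }
        for term, n in counts.items()
    }


# Hypothetical counts, for illustration only.
weights = cloud_weights({"flat tax": 40, "fair share": 20, "middle class": 20})
print(weights["flat tax"], weights["fair share"])
```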

Pros and Cons. Word clouds are a key tool for visualizing responses to open-ended survey questions. The

font size of a word group is displayed proportionally to its frequency, yielding an insightful starting point for

the analysis of text (Heimerl et al., 2014).

By clearly presenting which themes seem to be common among people’s answers, word clouds can equip

researchers with a ﬁrst intuition for topics that could be fruitful to explore further. This quick assessment

of answers is particularly useful when analyzing large text corpora in the age of Big Data, or in our case, a

large number of open-ended responses to survey questions.

This approach, however, comes with certain drawbacks. For instance, word clouds do not account for

synonyms, i.e., they will fail to link only slightly diﬀerent words to the same underlying idea. As a result,

themes that can be described in various ways will be less prevalent in the word cloud, while the importance

of topics that are clearly associated with certain “buzzwords” might be over-represented. Beyond that, word

clouds take the individual words out of context, potentially leading to a loss of interpretability or meaning.

Overall, word clouds are a useful way to visualize and explore textual data. It would be premature, however,

to draw any ﬁnal conclusions about the importance of topics simply based on the font size of occurring

words. Instead, the themes communicated in word clouds should be seen as a ﬁrst step in text analysis and

used as cautiously-interpreted guidance for further analysis.

OA-5.2 Keyness and keyword graphs

Related Literature. We draw on an in-depth overview by Gabrielatos (2018), which describes the history

of keyness analysis and provides a critical review of its applications. In addition, Stubbs (2010) digs into the

notion of “keywords,” which parallels the idea of keyness.

Among the first to conduct keyness analysis are Leech and Fallon (1992), who study the drivers of differences between cultures by comparing corpora in American and British English. Today, keyness analysis is understood primarily as a comparison of frequencies of words in the so-called study and reference groups. This approach can be used to elicit degrees of similarity or difference between the studied groups (Gentzkow and Shapiro, 2010; Taylor, 2013; Partington, 2014).

The keyness graphs in this paper are based on a relative frequency analysis that compares the use of n-grams identified in the corpus between two groups (a reference and a target group). To establish the keyness score of a given item, we compute its frequency relative to the number of words in the group corpus (i.e., the number of words in all the answers of a given group). Those frequencies are cross-tabulated and a $\chi^2$ test is conducted to test the independence between the two groups’ distributions.

Consider a given n-gram $i$. Let $j$ be the group index, with $j = 0$ for the reference group and $j = 1$ for the target group. Let $A_{i,j}$ be the observed number of occurrences of the n-gram $i$ in group $j$ and $A_{-i,j}$ the observed number of occurrences of all other n-grams (except the one we consider) in this group. Let $R_i$ be the total number of occurrences of n-gram $i$ in both groups, $R_{-i}$ the total number of occurrences of all other n-grams in both groups, $C_j$ the number of occurrences of all n-grams in group $j$, and $N$ the overall number of occurrences of n-grams in both groups.

To perform the test, we first establish the theoretical distribution under the independence hypothesis by computing $E_{i,j}$, the expected frequency of a given n-gram $i$ in group $j$:

$$E_{i,j} = \frac{R_i \times C_j}{N}$$

and the expected frequency of all other n-grams in group $j$:

$$E_{-i,j} = \frac{R_{-i} \times C_j}{N}$$

The $\chi^2$ test statistic is:

$$\chi^2 = (-1)^{\mathbb{1}\{A_{i,1} < E_{i,1}\}} \sum_{k \in \{-i,\, i\}} \sum_{j=0}^{1} \frac{(A_{k,j} - E_{k,j})^2}{E_{k,j}}$$

We compare this statistic to a $\chi^2$ distribution with one degree of freedom (i.e., number of groups − 1). A given n-gram is significant when the independence hypothesis is rejected, and the n-gram is then considered a “keyword.” A negative $\chi^2$ indicates that the word is significantly more frequent in the reference group. In absolute value terms, the null hypothesis is rejected at the 10% level when $|\chi^2| > 2.71$ (*), at the 5% level when $|\chi^2| > 3.84$ (**), and at the 1% level when $|\chi^2| > 6.63$ (***).
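The signed statistic described above can be sketched in a few lines of Python. The sketch follows the 2×2 cross-tabulation literally, applies no continuity correction, and assumes all expected frequencies are strictly positive. The function names `signed_chi2` and `stars` and the counts in the usage line are hypothetical.

```python
def signed_chi2(a_i_ref, a_i_tgt, c_ref, c_tgt):
    """Signed chi-squared keyness of one n-gram.

    a_i_ref, a_i_tgt: occurrences of the n-gram in the reference (j = 0)
                      and target (j = 1) groups.
    c_ref, c_tgt:     occurrences of all n-grams in each group.
    """
    n = c_ref + c_tgt
    observed = {
        ("i", 0): a_i_ref,
        ("i", 1): a_i_tgt,
        ("-i", 0): c_ref - a_i_ref,
        ("-i", 1): c_tgt - a_i_tgt,
    }
    row_totals = {"i": a_i_ref + a_i_tgt, "-i": n - (a_i_ref + a_i_tgt)}
    col_totals = {0: c_ref, 1: c_tgt}
    # Expected frequencies under independence: row total x column total / N.
    expected = {
        (k, j): row_totals[k] * col_totals[j] / n
        for k in row_totals
        for j in col_totals
    }
    chi2 = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
    # Negative sign when the n-gram is under-represented in the target group.
    sign = -1 if observed[("i", 1)] < expected[("i", 1)] else 1
    return sign * chi2


def stars(stat):
    """Significance stars for a chi-squared statistic with one degree of freedom."""
    value = abs(stat)
    if value > 6.63:
        return "***"
    if value > 3.84:
        return "**"
    if value > 2.71:
        return "*"
    return ""


stat = signed_chi2(a_i_ref=10, a_i_tgt=30, c_ref=1000, c_tgt=1000)
print(round(stat, 3), stars(stat))
```

Swapping the two groups flips the sign of the statistic but leaves its absolute value, and hence the significance level, unchanged.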

Pros and Cons. Keyness analysis is particularly useful when the goal is to compare text corpora associated

with diﬀerent groups. It is based on a relative frequency analysis that compares the use of words between

two groups, for example, Democrats and Republicans in our study.

This simple method allows us to compute a keyness score that measures how characteristic an n-gram is for

a given group. Based on this, keyness analysis can be used to establish diﬀerences (similarities) between

groups. It also allows the researcher to extract notions and attitudes that are at the heart of a given group’s

text corpus. Thus, this method can, for instance, help elevate our understanding of what drives polarization

within the political spectrum.

One common critique of keyness analysis is that it is based purely on statistical signiﬁcance and does not

take into account the eﬀect size (Gabrielatos and Marchi, 2011; Kilgarriﬀ, 2001). In particular, we do not

learn about the size of a frequency diﬀerence (Gabrielatos, 2018).

Overall, keyness analysis serves as a valuable tool when the goal is to explore potential heterogeneity, e.g., in attitudes across groups. Yet, results need to be interpreted with caution, given the limitations of the method (Gabrielatos, 2018). Indeed, while statistical significance is a useful metric, its utility is limited to expressing the accuracy of a given frequency difference. Furthermore, keyness is not an intrinsic attribute of words but depends on several subjective decisions regarding, for instance, the size of the linguistic units that are the focus of the analysis, the target and reference text corpora, and the statistical significance thresholds.

OA-5.3 Topic analysis

Related Literature. Topic analysis was originally developed as a tool for text mining, with the goal of imposing structure on the rapidly expanding textual data on the internet (Baeza-Yates et al., 1999). Salton and McGill (1983) introduced a now widely applied method that compares a normalized word frequency count within one document to an inverse document frequency count within a text corpus containing all considered documents. To reduce the dimensionality of vast text corpora, Deerwester et al. (1990) introduced latent semantic indexing (LSI), a method later advanced by Hofmann (1999) into probabilistic LSI (pLSI). Another approach to topic modeling is latent Dirichlet allocation (LDA), developed by Blei et al. (2003). Further developments include dynamic topic models (Blei and Lafferty, 2006) and supervised topic models (Blei and Jon, 2007). A more recent advancement was driven by Roberts et al. (2013) and Roberts et al. (2014), who introduce structural topic models and apply the method to open-ended survey responses.

Topic analysis is a versatile tool that can be applied to a wide range of bodies of text. Like keyness analysis, it allows us to study heterogeneity in topic use across groups, but its advantage is that we can also flag topics for which the frequency of use is similar across the different groups (Roberts et al., 2014). For more comprehensive reviews of the limitations of topic analysis and potential solutions, see Tang et al. (2014) and Agrawal et al. (2018).

Semi-supervised approaches to topic modeling allow the researcher to suggest a set of words around which the algorithm attempts to build topics. Including additional information, e.g., from external data sources, in the topic modeling process can significantly improve classification results. Banerjee et al. (2007) and Schönhofen (2009) draw on the titles and categories of Wikipedia articles as an external data source to improve the clustering accuracy of their documents.

To improve the interpretability of topics, Lu et al. (2011) propose an algorithm that is closely related to latent Dirichlet allocation (LDA). Another approach, referred to as “Newsmap” and based on naive Bayes classifiers, was introduced by Watanabe (2018b) and implemented in Watanabe (2018a). Both the seeded-LDA and the Newsmap models were employed in Watanabe and Zhou (2020). Finally, Gallagher et al. (2017) develop a novel approach that does not require assumptions about the underlying data generation process: the “Correlation Explanation” (CorEx) Topic Model.

Semi-supervised algorithms reduce the amount of hand-coding required of the researcher and are, by nature, data-driven. Nevertheless, there are always decisions to be made. For instance, one needs to choose how fine-grained or coarse to make the topics, and whether to use the same keywords across time (e.g., newspaper articles across decades) and across people (e.g., groups that may use different language) or rather to adapt them.

For open-ended survey questions, it can make sense to pick topics manually, because the answers are shorter than most of the texts that topic analysis has been used for. It is of course then possible to cross-validate the results using less supervised methods. When topics are chosen with a more hands-on approach, multiple hypothesis testing may become a concern (Roberts et al., 2014), but it could possibly be alleviated by a pre-analysis plan. In practice, we use a manual approach.

Our approach. As a preliminary step, we extract the document-term matrix (DTM). The DTM is a matrix that describes the frequency of terms that occur in a given set of documents (in our case, a document coincides with the answer of a given respondent to a given question). The rows and columns correspond to documents and terms, respectively. Hence, the element (x,y) in the DTM corresponds to the frequency of the term y in question answer x.

Topics are then defined by sets of keywords that aim to capture the particular aspects of the policy with which the respondent is concerned when answering the question. We fix these topics based on the distribution of words that we see in the answers (i.e., in the DTM) and on our understanding of the issues studied. It is critically important to read many sample answers in order to better understand how respondents are interpreting the question and how they are using specific terms. The topic indicator variable is equal to one if the document contains at least one of the keywords that define the topic. Documents can thus contain more than one topic if the respondent used keywords belonging to different topics.
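A minimal sketch of the two objects described above, the DTM and the topic indicators, is given below. It assumes answers are already tokenized and stemmed, and it uses simplified single-word keyword sets as stand-ins for the actual topic dictionaries, which also contain multi-word expressions and combined conditions. All names, documents, and keywords are hypothetical.

```python
from collections import Counter


def document_term_matrix(docs):
    """Rows are documents, columns are terms; entry (x, y) is the count of
    term y in document x."""
    vocabulary = sorted({term for doc in docs for term in doc})
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


def topic_indicators(matrix, vocabulary, topics):
    """For each document, set a topic to 1 if at least one of its keywords
    has a positive count in that document's row of the DTM."""
    column = {term: k for k, term in enumerate(vocabulary)}
    return [
        {
            topic: int(any(row[column[w]] > 0 for w in keywords if w in column))
            for topic, keywords in topics.items()
        }
        for row in matrix
    ]


# Hypothetical stemmed answers and simplified single-word keyword sets.
docs = [["flat", "tax", "fair"], ["loophol", "rich", "unfair"], ["idk"]]
topics = {
    "Fairness": {"fair", "unfair"},
    "Loopholes": {"loophol", "evad"},
    "Flat tax": {"flat"},
}
vocabulary, dtm = document_term_matrix(docs)
indicators = topic_indicators(dtm, vocabulary, topics)
print(indicators)
```

Averaging an indicator over documents gives the share of answers that mention the topic; the first two hypothetical answers illustrate that one document can carry several topics.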

OA-6 Topic Keywords

In this section, we provide the full list of keywords used to deﬁne the topics presented in Figure 2 and in

Figures OA-8 to OA-13 of this Online Appendix.


Table OA-4: Keywords Defining the Income Tax Topics

Distribution: Middle class; working class; low income; wealthy; millionaire; rich; billionaire; corporations & pay/tax

Fairness: Fair; unfair

Government spending: Government spending & high; government spending & cut; deficit; debt; government & waste; balance & budget; government & budget; government & control & spend

Social insurance: Social services; governmental services; governmental program & fund; governmental program & cover; help & poor; pay & poor; social program; poor work; live & paycheck; provide & family

Efficiency: Hurt & economy; work hard; work less; work more; create & job; depress; trickle down; negative/detrimental/destroy/damage & economy; competition; innovation; create & business; boost & economy; discourage; spend less

Flat tax: Flat tax

Loopholes: Loopholes; lawyer; account; tax evasion; evade; avoid taxes

Public goods: Infrastructure; education; healthcare

Don’t know: Not know; knowledgeable enough; idk; not sure; know enough; unsure

Table OA-5: Keywords Defining the Estate Tax Topics

Distribution: Middle class; working class; low income; wealthy; millionaire; rich; billionaire; corporations & pay/tax; poor; inequality

Fairness: Fair; unfair

Government spending: Government spending & high; government spending & cut; deficit; debt; government & waste; balance & budget; government & budget; government & control & spend

Efficiency: Hurt economy; work hard; flat

Loopholes: Loopholes; lawyer; account; tax evasion; evade; avoid taxes

Double tax: Already taxed/paid; twice & tax/pay

Grieve: Grieve; bury; funeral

Public goods: Infrastructure; education; health care

Don’t know: Not know; knowledgeable enough; idk; not sure; know enough; unsure

OA-7 Sample of Answers to the Open-ended Question “What are your Main Considerations about the Policy?”

OA-7.1 Income Taxation

Distribution: “That the rich and wealthy do not pay their fair share of taxes.”

“Everyone, including the rich and corporations should pay their fair share.”

“I would want working class and middle class people to get tax cuts and I’d be willing to pay more in taxes

for that to happen.”

Fairness: “I have trouble with the concept of tax brackets that punish an individual for being successful.”

“I believe Everyone should be taxed fairly and the most wealthy should not escape carrying their weight.”


Government Spending: “Current tax rates being raised are a result of government mismanagement of funds and over spending without appropriate oversight. Taxes really can’t effectively be lowered until government spending is properly controlled.”

“I am okay with raising personal income tax to reduce deﬁcit but not for entitlement programs.”

Social safety net: “What are the taxes going towards? I strongly believe in funding going towards education

and infrastructure.”

“Cut government spending on social welfare programs for lower taxes and privatize most government services

for lower taxes e.g., mail, law enforcement, parks, schools...”

Efficiency: “I am concerned about the push to raise taxes on persons with higher incomes. I do believe in

trickle down economics and that government should pretty much keep their hands oﬀ.”

“I want the U.S. to be competitive for businesses, but also know there needs to be an appropriate amount of

money to fund the government necessities.”

Flat Tax: “We need a ﬂat tax. Tax forms are complex.”

“I think tax Rates are not fairly representative for most taxpayers. I support a ﬂat tax rate for all except the

totally disabled and indigent.”

Loopholes: “I think the more you make, the more you should pay. We need to close the loopholes that are

there to make sure that those who make more actually pay more.”

“The wealthy oligarchs who own this country will never allow their taxes to be raised, or will hire tax lawyers

to get out of paying them, so any raise in federal personal income taxes will fall on the middle class.”

Don’t know: “I don’t know much about this topic.”

OA-7.2 Estate Tax

Distribution: “It can help keep the ultra wealthy accountable for their wealth.”

“Passing wealth from one generation to the next contributes to wealth inequality. Federal estate tax should

be much higher.”

Fairness: “I don’t think there should be a federal estate tax because it’s kind of unfair to have to pay taxes

on money that already belongs to your family and has most likely had taxes paid on it already.”

Government spending: “I believe in smaller government, so all taxes should be lower. I actually think we

should have a ﬂat tax for income - period. Then estate taxes wouldn’t even be an issue.”

Public goods: “I would like higher taxes to pay for more domestic spending such as education, healthcare,

etc.”

Eﬃciency: “Lower taxes mean I have more disposable income to spend therefore more products can be mad

and more jobs created. I feel it is wrong to penalize people for increased wealth.”

Loopholes: “The wealthy don’t ususally pay these taxes, they ﬁnd a loophole. Why should my children have

to pay taxes on things I’ve already paid taxes on during my lifetime?”

Double taxation: “I think it is ridiculous, you pay taxes twice.”

Grief: “I don’t think we should have one at all. You’re taxing a family member for the death of their loved

one? That’s messed up.”


OA-8 Additional Figures

Figure OA-1: Word clouds for the Income Tax

(a) What are your Main Considerations about the Income Tax?

(b) What would be the Goal of a Good Income Tax System?

(c) ... Income Tax System?

wealthy loophole

people high

high low

loophole high

write offs

low end

far too many

too many corporation

corporation rich

people decision

super wealthy

people afford

very fair

rich too many

corporation people

too much take

loophole deduction

people receive

bracket fair

fair too many

OA-11

(d) Which important aspects of the U.S.

federal income tax system would you say

are not discussed enough in the current

policy debate?

middle class

lower class

working class

fair share

lower middle

social security

small business

rich poor

government spending

long term

loophole rich

big business

rich share

rich little

upper class

cost live

hard working

class rich

minimum wage

help middle

rich fair

rich avoid

close loophole

rich middle

large corporation

top one

poor class

class family

middle lower

big company

average american

rich corporation

national debt

loophole wealthy

loophole corporation

capital gain

higher bracket

class class

rich no

corporation wealthy

cant anything

working poor

business owner

every year

work hard

working middle

cut spending

higher class

class lower

high earner

wealthy break

effect lower

write offs

percentage take

large company

very rich

wealthy share

federal government

higher earner

wealthy fair

break corporation

break rich

action take

higher lower

average citizen

class wealthy

class too much

big corporation

help class

very wealthy

wealthy corporation

across board

cut wealthy

way too much

break big

take away

come mind

rich class

favor rich

one thousand

special interest

receive break

loophole avoid

loophole allow

benefit wealthy

class citizen

small owner

low class

class poor

right now

higher higher

low earner

break wealthy

raise rich

cut class

wealthy individual

lower working

(e) What do you think would be the

effects on the U.S. economy if the federal

personal income taxes were increased?

middle class

lower class

working class

hurt middle

lower middle

small business

poor class

negative impact

higher tax

end meet

middle lower

hurt lower

lower tax

across board

economic growth

class suffer

poor middle

government spending

consumer spending

hurt class

cant afford

decrease spending

working poor

class poor

quality life

cost live

national debt

able afford

live paycheck

class class

upper class

class struggle

spending decrease

put back

take hit

very bad

stock market

long run

class lower

already struggle

good service

rich poor

poor rich

lower spending

work hard

hard working

working middle

class rich

slow growth

social program

live check

fair share

good good

class family

tax bracket

lose home

class working

very good

right now

poor working

slow spending

lower working

tax class

good life

struggle end

negative affect

hard time

class middle

rich tax

class spending

hurt business

low class

minimum wage

find way

higher lower

working family

poverty level

hurt working

paycheck paycheck

affect rich

long term

spending good

lower bracket

spending hurt

working suffer

buy good

able buy

stop spending

government waste

social service

work government

buy power

lower suffer

class even

double tax

spending power

struggle meet

spending consumer

poor lower

struggle even

cut spending

credit card

good infrastructure

already live

lose job

use wisely

slow economic

poor suffer

class hard

spending government

universal healthcare

hurt already

class citizen

class hurt

cut back

affect lower

class afford

create job

hard end

stop working

afford live

house market

able bill

middle suffer

government just

hard live

very wealthy

stop buy

middle poor

move country

day day

government assistance

lower family

hurt poor

middle struggle

middle working

good use

rich find

lower consumer

good working

fund government

class worker

negative way

class work

discretionary spending

very negative

hard now

higher earner

give government

class see

good bad

good debt

government program

working american

few job

business suffer

super wealthy

government use

food stamp

poor even

lower deficit

reduce spending

negative lower

education healthcare

good quality

spending class

spending business

tax middle

economic slowdown

able live

take home

free market

class affect

hard middle

one hundred

class live

Notes: Word clouds based on answers to the open-ended questions stated in the captions. See Section OA-5.1.
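The two-word terms in these word clouds are consistent with counts of adjacent word pairs in the cleaned responses. The following is a minimal sketch of such a count, not the procedure of Section OA-5.1: the stopword list, the tokenization rule, and the example responses are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "are", "in", "on", "for"}


def bigram_counts(responses):
    """Count adjacent word pairs across responses.

    Text is lowercased and stopwords are dropped before pairing, so a pair
    can join two words that were separated by a stopword in the raw text.
    """
    counts = Counter()
    for text in responses:
        tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
        counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return counts


# Made-up responses, not survey data.
responses = [
    "The middle class pays too much",
    "Tax breaks hurt the middle class",
    "The working class and the middle class",
]
counts = bigram_counts(responses)
print(counts.most_common(3))
```

In a word cloud, the font size of each term would then be scaled by its count.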


Figure OA-2: Distributional effects of Income Tax Increase: Who Gains if Taxes on High Earners were to be Increased?

Notes: The figure reports raw frequencies, among Democrat and Republican respondents, of the groups mentioned in answers to the question “Which groups of people do you think would gain if federal personal income taxes on high earners were increased?”


Figure OA-3: Keywords of Income Taxation by Groups

(a) By Political Affiliation (b) By Age Group


Notes: The figure shows keywords among different groups of respondents in answers to the questions about respondents’ main considerations, desired goals, and shortcomings of the income tax. See Section OA-2 for the full text of the questions. The groups are defined by political affiliation, age, perceived social class, and education level, respectively. The score reported for a set of two words is the χ² test statistic, testing the null hypothesis that the occurrence of the given keywords is the same in the two groups. * p < 0.1, ** p < 0.05, *** p < 0.01. See Section OA-5.2.
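To make the reported score concrete, the sketch below computes a Pearson χ² statistic for one keyword and two groups from a 2×2 table, together with the significance stars used in the figure. It is an illustration under stated assumptions, not the implementation of Section OA-5.2: it counts responses that contain the keyword, applies no continuity correction, and uses made-up counts. The function names `chi2_keyness` and `stars` are hypothetical.

```python
import math


def chi2_keyness(k_a, n_a, k_b, n_b):
    """Pearson chi-squared test (1 d.f., no continuity correction).

    k_a, k_b: responses in groups A and B that contain the keyword.
    n_a, n_b: total responses in each group.
    Assumes the keyword occurs in at least one response and not in all.
    Returns (statistic, p_value).
    """
    table = [[k_a, n_a - k_a], [k_b, n_b - k_b]]
    total = n_a + n_b
    row_sums = [n_a, n_b]
    col_sums = [k_a + k_b, total - k_a - k_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-squared variable with 1 d.f.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def stars(p):
    """Significance stars with the thresholds stated in the figure notes."""
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.1:
        return "*"
    return ""


# Made-up counts: the keyword appears in 30 of 100 responses in group A
# and in 10 of 100 responses in group B.
stat, p = chi2_keyness(30, 100, 10, 100)
print(round(stat, 2), stars(p))
```

Under the null hypothesis that the keyword is equally likely in both groups, the statistic follows a χ² distribution with one degree of freedom, which is what the p-value line uses.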


Figure OA-4: Word clouds for the Estate Tax

(a) What are your Main Considerations about the Estate Tax?

[Word cloud image.]

(b) What would be the Goal of a Good Estate Tax System?

[Word cloud image.]

(c) [...] Estate Tax System?

[Word cloud image.]

(d) What do you think would be the effects on the U.S. economy if the federal estate tax were increased?

[Word cloud image.]

Notes: Word clouds based on answers to the open-ended questions stated in the captions. See Section OA-5.1.


Figure OA-5: Keywords of Estate Tax by Groups

(a) By Political Affiliation

(b) By Age Group

[Bar chart. Axis: Chi2. Series: Considerations, Goals, Shortcomings. Groups: 18−39, 50+. Keyword labels as extracted, with significance stars: higher lower *; income bracket **; dont know ***; income bracket *; two thousand **; dont know ***; already tax ***; double tax ***; tax free ***; already no tax ***; not fair ***; government tax **; not tax **; government right **; life work **; hard work *; higher rich *; tax twice *; zero percent ***; ten million *; double tax ***; already tax ***; not fair **; not tax *.]

(d) By Education Level

Notes: The figure shows keywords among different groups of respondents in answers to the questions about respondents’ main considerations, desired goals, and shortcomings of the estate tax. See Section OA-2 for the full text of the questions. The groups are defined by political affiliation, age, perceived social class, and education level, respectively. The score reported for a set of two words is the χ² test statistic, testing the null hypothesis that the occurrence of the given keywords is the same in the two groups. * p < 0.1, ** p < 0.05, *** p < 0.01. See Section OA-5.2.


Figure OA-6: Distributional effects of Estate Tax Increase: Who Loses if the Estate Tax were Increased?

Notes: The figure reports raw frequencies, among Democrat and Republican respondents, of the groups mentioned in answers to the question “Which groups of people do you think would lose if the federal estate tax were increased?”


Figure OA-7: Probability of Mentioning Topics

(a) Income Tax

(b) Estate Tax

[Bar chart. Axis: Probability (%). Topics: Distribution, Fairness, Gov. Spending, Efficiency, Loopholes, Double Tax, Grieve, Public Goods, Don't Know. Series: Main Considerations about Estate Tax?; Goals of a Good Estate Tax System?; Shortcomings of Estate Tax System?]

Notes: The figure presents, based on the keyword-based topic analysis, the probability of mentioning topics for three open-ended income and estate tax questions. Each reported topic corresponds to an indicator variable in the keyword-count model, which equals 1 if the respondent mentions at least one of the topic-defining keywords in their response. For the list of keywords, see Section OA-6; for details on the text analysis methodology, see Section OA-5.3.

Panel A: Considerations: When you think about federal personal income taxation and whether the U.S. should have higher or lower federal personal income taxes, what are the main considerations that come to your mind?; Goals: What would be the goal of a good tax system?; Shortcomings: What do you think are the issues with or shortcomings of the U.S. federal income tax system?

Panel B: Considerations: When you think about the federal estate tax and whether the U.S. should have a higher or a lower federal estate tax, what are the main considerations that come to your mind?; Goals: What would be the goal of a good estate tax system?; Shortcomings: What do you think are the shortcomings of the U.S. federal estate tax?
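The indicator construction described in these notes can be sketched as follows. This is a minimal illustration, not the keyword-count model of Section OA-5.3: the keyword lists below are hypothetical stand-ins for those in Section OA-6, the example responses are made up, and keywords are matched as plain substrings (so "fair" would also match "unfair").

```python
# Hypothetical topic-defining keywords, for illustration only.
TOPIC_KEYWORDS = {
    "Loopholes": ["loophole", "deduction"],
    "Fairness": ["fair", "equal"],
}


def topic_indicators(response, topic_keywords):
    """Return {topic: 1 or 0}: 1 if the response contains at least one keyword."""
    text = response.lower()
    return {
        topic: int(any(keyword in text for keyword in keywords))
        for topic, keywords in topic_keywords.items()
    }


def mention_probability(responses, topic_keywords):
    """Share of responses (in percent) for which each topic indicator equals 1."""
    totals = {topic: 0 for topic in topic_keywords}
    for response in responses:
        for topic, flag in topic_indicators(response, topic_keywords).items():
            totals[topic] += flag
    return {topic: 100.0 * count / len(responses) for topic, count in totals.items()}


# Made-up responses, not survey data.
responses = [
    "Too many loopholes for the rich",
    "Everyone should pay a fair share",
    "It is not fair and has loopholes",
    "I don't know",
]
probabilities = mention_probability(responses, TOPIC_KEYWORDS)
print(probabilities)
```

Because a response can mention several topics, the probabilities across topics need not sum to 100.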


Figure OA-8: Topic Distribution by Political Affiliation for the Income Tax

(a) What are your Main Considerations about the U.S. federal income tax system?

(b) What would be the goal of a good income tax system?

[Bar charts, three panels; the title of the third panel is missing. Topics: Distribution, Fairness, Gov. Spending, Social Safety, Efficiency, Flat Tax, Loopholes, Public Goods, Don't Know. Groups: Clinton Liberal, Clinton Moderate, Trump Moderate, Trump Conservative.]

Notes: The figure shows the distribution of topics mentioned in the answers to the open-ended questions about the income tax by political affiliation. See Section OA-2 for the full text of the questions. The bars represent the number of times a topic was mentioned as a share of the total mentions of any topic within each political group. For the list of keywords, see Section OA-6; for details on the methodology, see Section OA-5.3.
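The bar heights described in these notes are within-group shares of topic mentions. A minimal sketch of that calculation is given below, assuming each respondent has already been assigned a set of mentioned topics; the function name `topic_shares_by_group` and the example records are hypothetical.

```python
from collections import Counter, defaultdict


def topic_shares_by_group(records):
    """Compute each topic's share (in percent) of all topic mentions in a group.

    records: iterable of (group, set_of_topics_mentioned) pairs, one per respondent.
    """
    counts = defaultdict(Counter)
    for group, topics in records:
        counts[group].update(topics)
    shares = {}
    for group, topic_counts in counts.items():
        total = sum(topic_counts.values())
        # A group with no topic mentions at all gets an empty distribution.
        shares[group] = (
            {topic: 100.0 * n / total for topic, n in topic_counts.items()} if total else {}
        )
    return shares


# Made-up records, not survey data.
records = [
    ("Clinton Liberal", {"Distribution", "Fairness"}),
    ("Clinton Liberal", {"Distribution"}),
    ("Trump Conservative", {"Gov. Spending"}),
    ("Trump Conservative", {"Fairness", "Gov. Spending", "Flat Tax"}),
]
shares = topic_shares_by_group(records)
print(shares["Trump Conservative"])
```

Unlike the mention probabilities in Figure OA-7, these shares sum to 100 within each group, since the denominator is the total number of topic mentions rather than the number of respondents.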


Figure OA-9: Topic Distribution by Income Groups for the Income Tax

(a) What are your Main Considerations about the U.S. federal income tax system?

(b) What would be the goal of a good income tax system?

[Bar charts, three panels; the title of the third panel is missing. Topics: Distribution, Fairness, Gov. Spending, Social Safety, Efficiency, Flat Tax, Loopholes, Public Goods, Don't Know. Groups: Low income, Middle income, High income.]

Notes: See the notes to Figure OA-8. Low income (High income) corresponds to respondents who report a pre-tax household income below 39,000 (above 70,000) U.S. dollars; Middle income corresponds to respondents who report a pre-tax household income between 40,000 and 69,000 U.S. dollars.


Figure OA-10: Topic Distribution by Age Group for the Income Tax

(a) What are your Main Considerations about the U.S. federal income tax system?

(b) What would be the goal of a good income tax system?

[Bar charts, three panels; the title of the third panel is missing. Topics: Distribution, Fairness, Gov. Spending, Social Safety, Efficiency, Flat Tax, Loopholes, Public Goods, Don't Know. Groups: Age 18-29, Age 30-49, Age 50-69. Axis range: 0 to 40.]

Notes: See the notes to Figure OA-8.


Figure OA-11: Topic Distribution by Political Affiliation of the Estate Tax

(a) What are your Main Considerations about the U.S. federal estate tax system?

(b) What would be the goal of a good estate tax system?

[Bar charts, three panels; the title of the third panel is missing. Topics: Distribution, Fairness, Gov. Spending, Efficiency, Loopholes, Double Tax, Grieve, Public Goods, Don't Know. Groups: Clinton Liberal, Clinton Moderate, Trump Moderate, Trump Conservative.]

Notes: See the notes to Figure OA-8.


Figure OA-12: Topic Distribution by Income Groups of the Estate Tax

(a) What are your Main Considerations about the U.S. federal estate tax system?

(b) What would be the goal of a good estate tax system?

[Bar charts, three panels; the title of the third panel is missing. Topics: Distribution, Fairness, Gov. Spending, Efficiency, Loopholes, Double Tax, Grieve, Public Goods, Don't Know. Groups: Low income, Middle income, High income.]

Notes: See the notes to Figure OA-9.


Figure OA-13: Topic Distribution by Age Group of the Estate Tax

(a) What are your Main Considerations about the U.S. federal estate tax system?

(b) What would be the goal of a good estate tax system?

[Bar charts, three panels; the title of the third panel is missing. Topics: Distribution, Fairness, Gov. Spending, Efficiency, Loopholes, Double Tax, Grieve, Public Goods, Don't Know. Groups: Age 18-29, Age 30-49, Age 50-69. Axis range: 0 to 40.]

Notes: See the notes to Figure OA-8.


References

Agrawal, A., W. Fu, and T. Menzies (2018). What is wrong with topic modeling? And how to fix it using search-based software engineering. Information and Software Technology 98, 74–88.

Baeza-Yates, R., B. Ribeiro-Neto, et al. (1999). Modern information retrieval, Volume 463. ACM Press New York.

Banerjee, S., K. Ramanathan, and A. Gupta (2007). Clustering short texts using Wikipedia. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 787–788.

Benoit, K., K. Watanabe, H. Wang, P. Nulty, A. Obeng, S. Müller, and A. Matsuo (2018). quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software 3 (30), 774.

Blei, D. M., A. Y. Ng, and M. I. Jordan (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3, 993–1022.

Blei, D. M. and J. D. McAuliffe (2007). Supervised topic models. Advances in Neural Information Processing Systems 20, 121–128.

Blei, D. M. and J. D. Lafferty (2006). Dynamic topic models. In Proceedings of the 23rd international conference on Machine learning, pp. 113–120.

Deerwester, S., S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science 41 (6), 391–407.

Flood, S., M. King, R. Rodgers, S. Ruggles, and J. R. Warren (2020). Integrated public use microdata series, current population survey: Version 8.0 [dataset]. Minneapolis, MN: IPUMS. https://doi.org/10.18128/D030.V8.0.

Gabrielatos, C. (2018, 02). Keyness analysis: Nature, metrics and techniques, pp. 225–258.

Gabrielatos, C. and A. Marchi (2011). Keyness: Matching metrics to definitions. In Theoretical-methodological challenges in corpus approaches to discourse studies and some ways of addressing them.

Gallagher, R. J., K. Reing, D. Kale, and G. Ver Steeg (2017). Anchored correlation explanation: Topic modeling with minimal domain knowledge. Transactions of the Association for Computational Linguistics 5, 529–542.

Gallup (2019). Party Affiliation, In Depth: Topics A to Z. https://news.gallup.com/poll/15370/party-affiliation.aspx.

Gentzkow, M. and J. M. Shapiro (2010). What Drives Media Slant? Evidence from US Daily Newspapers. Econometrica 78 (1), 35–71.

Heimerl, F., S. Lohmann, S. Lange, and T. Ertl (2014). Word Cloud Explorer: Text Analytics Based on Word Clouds. In 2014 47th Hawaii International Conference on System Sciences, pp. 1833–1842.

Hofmann, T. (1999). Probabilistic latent semantic indexing. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pp. 50–57.

Kilgarriff, A. (2001). Comparing corpora. International Journal of Corpus Linguistics 6 (1), 97–133. Publisher: John Benjamins.

Leech, G. and R. Fallon (1992). Computer corpora-what do they tell us about culture. ICAME Journal 16.

Leip, D. (2019). Election Results from the Associated Press. Atlas of U.S. Presidential Elections and David Wasserman of the Cook Political Report. https://uselectionatlas.org/RESULTS/.

Lu, B., M. Ott, C. Cardie, and B. K. Tsou (2011). Multi-aspect sentiment analysis with topic models. In 2011 IEEE 11th international conference on data mining workshops, pp. 81–88. IEEE.

Partington, A. (2014). Mind the gaps: The role of corpus linguistics in researching absences. International Journal of Corpus Linguistics 19 (1), 118–146.

Roberts, M. E., B. M. Stewart, D. Tingley, E. M. Airoldi, et al. (2013). The structural topic model and applied social science. In Advances in neural information processing systems workshop on topic models: computation, application, and evaluation, Volume 4, pp. 1–20. Harrahs and Harveys, Lake Tahoe.

Roberts, M. E., B. M. Stewart, D. Tingley, C. Lucas, J. Leder-Luis, S. K. Gadarian, B. Albertson, and D. G. Rand (2014). Structural topic models for open-ended survey responses. American Journal of Political Science 58 (4), 1064–1082.

Salton, G. and M. J. McGill (1983). Introduction to modern information retrieval. McGraw-Hill.

Schönhofen, P. (2009). Identifying document topics using the Wikipedia category network. Web Intelligence and Agent Systems: An International Journal 7 (2), 195–207.

Stantcheva, S. (2021). Understanding tax policy: How do people reason? The Quarterly Journal of Economics 136 (4), 2309–2369.

Stubbs, M. (2010). Three concepts of keywords. Keyness in texts, 21–42.

Tang, J., Z. Meng, X. Nguyen, Q. Mei, and M. Zhang (2014, 22–24 Jun). Understanding the limiting factors of topic modeling via posterior contraction analysis. In E. P. Xing and T. Jebara (Eds.), Proceedings of the 31st International Conference on Machine Learning, Volume 32 of Proceedings of Machine Learning Research, Beijing, China, pp. 190–198. PMLR.

Taylor, C. (2013). Searching for similarity using corpus-assisted discourse studies. Corpora 8 (1), 81–113.

Watanabe, K. (2018a). Conspiracist propaganda: How Russia promotes anti-establishment sentiment online. In ECPR General Conference, Hamburg.

Watanabe, K. (2018b, March). Newsmap: A semi-supervised approach to geographical news classification. Digital Journalism 6 (3), 294–309.

Watanabe, K. and Y. Zhou (2020). Theory-driven analysis of large corpora: Semisupervised topic classification of the UN speeches. Social Science Computer Review, 0894439320907027.