ONLINE AUDIENCE ENGAGEMENT WITH LEGACY AND DIGITAL-BORN NEWS MEDIA IN THE 2019 INDIAN ELECTIONS
Twitter
Figure A2 summarises the Twitter data collection
process. Broadly, we aimed to collect all tweets related
to the general election during our time window. To do
so, we relied on the Twitter Streaming API and used as
search queries the names and usernames of political
parties and candidates and the news outlets in our
list. (In our analysis here we focus specifically on the
sample of legacy and digital-born news media.)
We also used the most important hashtags
during the election period to track conversation
around the election. Because an electoral event unfolds
across many parallel threads, and to avoid missing important
tweets mentioning outlets on our list, we monitored
Twitter conversations in India with external tools
to identify relevant hashtags every day, and updated
our list of queries daily (for a complete description of
the process see Majó-Vázquez et al. 2017).
In total, we collected 63,252,755 tweets, from which
we subset 50,965,208 tweets matching the
above-mentioned criteria.⁷ In sum, we followed the Twitter
activity of 73 news outlets. However, for the actual
analysis, we excluded those outlets that tweeted less
than once a day on average during the time window of
the study.
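
The activity threshold described above can be sketched as follows. This is a minimal illustration in Python, not the code used in the study: the function name `active_outlets`, the outlet handles and the window dates are all hypothetical, and the sketch assumes that each collected tweet can be attributed to one outlet account.

```python
from collections import Counter
from datetime import date


def active_outlets(tweet_authors, start, end, min_per_day=1.0):
    """Return the outlets that average at least `min_per_day` tweets per day.

    tweet_authors: iterable with one outlet handle per collected tweet.
    start, end: first and last day of the study window (both inclusive).
    """
    n_days = (end - start).days + 1
    counts = Counter(tweet_authors)
    return {outlet for outlet, n in counts.items() if n / n_days >= min_per_day}


# Hypothetical data: a 10-day window and two made-up outlets.
authors = ["outlet_a"] * 25 + ["outlet_b"] * 4
kept = active_outlets(authors, date(2019, 4, 1), date(2019, 4, 10))
print(kept)
```

In this made-up example only `outlet_a` is kept, since it averages 2.5 tweets per day while `outlet_b` averages 0.4.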
Facebook
As with the Twitter sample, we first manually verified
each of the news outlets in our list that had active
Facebook pages. Then, we used a third-party tool
called CrowdTangle to gather all posts published
during the election period by those pages. For our
analysis, we only kept those pages that posted on
average at least one news piece per day about the
election. Finally, we narrowed the sample down to posts
relevant to the elections by applying a set of keyword
filters in the English, Hindi, Bengali, Tamil, Malayalam,
Telugu, Marathi, Kannada and Gujarati languages (see
Table A2). In total, we studied 65,941 posts published
by 78 news outlets’ Facebook pages.
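
The keyword filtering step can be illustrated with a short Python sketch. It assumes simple case-insensitive substring matching, which may differ from how the filters were actually applied; the function name and the three keywords are placeholders, and the real keyword lists are those reported in Table A2.

```python
def matches_election_keywords(text, keywords):
    """Return True if the post text contains any keyword (case-insensitive)."""
    lowered = text.casefold()
    return any(kw.casefold() in lowered for kw in keywords)


# Placeholder keywords for illustration only; see Table A2 for the actual lists.
KEYWORDS = ["election", "lok sabha", "चुनाव"]

# Made-up post texts.
posts = [
    "Lok Sabha polls: turnout figures for phase 2",
    "Cricket: weekend match report",
    "चुनाव आयोग ने नई तारीखों की घोषणा की",
]

relevant = [p for p in posts if matches_election_keywords(p, KEYWORDS)]
print(len(relevant), "of", len(posts), "posts kept")
```

Substring matching is the simplest possible choice here; a filter working across nine languages would probably also need to handle spelling variants and script normalisation, which this sketch does not attempt.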
Comscore
The final dataset of our analysis was obtained from
Comscore, an online audience and traffic metrics
firm. We collected the audience data for all the media
outlets on our list (when available), for the months
of January, February and March 2019. Although the
elections officially started on 11 April, the discourse
and news coverage surrounding the elections had
already been very contentious since at least the state
elections in Chhattisgarh, Madhya Pradesh, and
Mizoram in November 2018.
We averaged the available data for the three-month
period and used it to trace audience navigation
patterns across news media outlets. This web-browsing
data offered us a benchmark to assess
the differences, if any, between the distribution of
audience on the general web and on social media
platforms, particularly Twitter.
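
As a rough illustration of the averaging step, the Python sketch below takes the mean of whichever monthly figures are available for each outlet. The data layout, the function name `average_audience` and all numbers are made up for the example; the Comscore metrics themselves are not reproduced here.

```python
from statistics import mean


def average_audience(monthly):
    """Average the available monthly figures; None marks a missing month."""
    available = [v for v in monthly.values() if v is not None]
    return mean(available) if available else None


# Made-up audience figures for January to March 2019.
data = {
    "outlet_a": {"2019-01": 1200, "2019-02": 1500, "2019-03": 1800},
    "outlet_b": {"2019-01": None, "2019-02": 400, "2019-03": 600},
}

averages = {outlet: average_audience(m) for outlet, m in data.items()}
print(averages)
```

Averaging only over the months with data keeps outlets with partial coverage in the sample, at the cost of basing their figure on fewer observations.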
Results
Figures 2 and 3 summarise the news content provision
by media type on social media. As is clear from the
visualisation, regional news outlets dominated
the provision of political information on Facebook
throughout the election period. During each election
phase, vernacular language newspapers led the
ranking of most active media category on the platform
by a large margin. The second polling day, 18 April, was
the busiest in terms of content produced by regional
outlets, despite not being the phase when the largest
number of constituencies went to vote. In total, during
phase 2, regional news outlets published 1,294 posts,
a significantly greater number than the 549 posts
published by national broadcasters, the 361 posts
published by the national newspapers and the 302
posts published by digital-born outlets. It is worth
mentioning again here that our sample included
38 regional outlets and 37 digital-born sites. Even
during the sixth phase of polling, which included Delhi
and the surrounding national capital region (NCR),
national broadcasters and newspapers still trailed
behind vernacular outlets in terms of the volume of
news content posted on Facebook.
⁷ Notably, the extensive use of Twitter during the polling days, combined with the high percentage of Twitter users, at least among the Indian
English-speaking population, pushed us beyond the 1% limit allowed by the Streaming API on several occasions during the data gathering
process. An essential drawback of the Twitter Streaming API is the lack of information concerning what and how much data one gets once
it reaches the 1% threshold (for an in-depth discussion see Morstatter et al. 2013).