Help Me Read! Expanding Students’ Reading with
Wikipedia Articles
Arun-Balajiee Lekshmi-Narayanan, Khushboo Thaker, Peter Brusilovsky, Jordan Barria-Pineda
School of Computing and Information, University of Pittsburgh
135 N Bellefield Avenue
Pittsburgh, PA, USA
{arl122,kmt81,peterb,jab464}@pitt.edu
ABSTRACT
In this demo paper, we present an implementation of an in-
telligent digital textbook integrated with external readings
for students, such as Wikipedia articles. Our system applies
concept extraction to a digital textbook on topics in cognitive
psychology and computer science, used in a graduate class at a
large US-based university, to generate search terms that can be
linked with Wikipedia articles. Finally,
we integrate these articles into the textbook reading inter-
face, enabling students to quickly refer to Wikipedia articles
in connection with the reading material of the course to un-
derstand a concept or topic that they struggle with or are
interested in exploring further. With this demo, we present
a system that can be utilized for data collection in a real-
world classroom setup.
Keywords
Intelligent Textbooks, Digital Reading Systems, Wikipedia,
Concept Extraction, Data Collection
1. INTRODUCTION
The rapid development of science and technology has created
a problem for college instructors who want to ensure that
students receive up-to-date knowledge of the subject. While
textbooks once served as the predominant source of class
readings, they frequently lag behind the state of the art.
At present, many courses, especially at the graduate level,
use a collection of recent research papers rather than text-
books as course readings. Unlike textbooks, which introduce
domain knowledge gradually, taking care to explain critical
concepts, research papers are written for audiences who are
already familiar with core domain knowledge. Hence, re-
search papers are challenging to read for unprepared stu-
dents. Several authors have suggested that recommending
relevant Wikipedia articles to explain complicated concepts
could facilitate reading [1, 4]. Moreover, as an added bene-
fit, the recommendations could make reading more personal-
ized by encouraging students to explore readings related to
their interests. However, implementing Wikipedia recom-
mendations is not straightforward, since only some of the
“concepts” mentioned in a research paper are useful recom-
mendations in the context of a specific course. In this demo,
we present a course reading system for research papers that
uses advances in text mining to recommend the most rele-
vant Wikipedia pages for every page of assigned readings.
The system was tested in a full-term graduate course, where
we also collected student feedback on the relevance and dif-
ficulty of recommended Wikipedia articles.
2. A READING SYSTEM WITH WIKIPEDIA
RECOMMENDATIONS
To explore the opportunity to extend online reading with
Wikipedia articles, we modified an online digital textbook
reading platform, ReadingMirror [2], customizing it for research
paper readings. The modified system inherited sev-
eral useful features from the digital textbook platform, such
as a table of contents (now course reading plan), annota-
tions, and social comparison (Fig. 1). To extend the reading
system with the recommendations of Wikipedia articles, we
used text mining to extract entities from each reading page
(see Section 3). Page-level extraction was used to provide
recommendations on the page where the relevant concept
is mentioned. Recommendations are provided using an ex-
pandable tab on a page margin. Clicking on this tab reveals
a list of links to recommended articles, which can be opened
next to the article page. For example, if a page of an assigned
article mentions “Allen Newell”, it is recognized as a useful
Wikipedia concept, and a link to the Wikipedia article on
Allen Newell is offered, along with other recommendations
for further exploration and reading (Fig. 2).
To instrument the classroom study reviewed below, all stu-
dent work with recommendations (opening, scrolling, and
closing the recommendation tab) is logged. In addition, we
provide a simple interface for students to rate the relevance
and difficulty of recommended Wikipedia articles
(bottom left in Fig. 2). To encourage ratings, the list of
Wikipedia articles that the student has rated or read ap-
pears in a separate tab above the Wikipedia links tab.
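To illustrate the kind of data this instrumentation collects, below is a minimal sketch, in Python, of an interaction-event record as one JSON line per event. The field names, action labels, and rating values here are illustrative assumptions and do not reflect the actual logging schema of the system.

from dataclasses import dataclass, asdict
from typing import Optional
from datetime import datetime, timezone
import json

# Illustrative event record for student interactions with the recommendation
# tab and the rating bar; names and values are assumptions, not the deployed schema.
@dataclass
class RecommendationEvent:
    student_id: str      # anonymized student identifier
    reading_id: str      # assigned research article
    page: int            # page of the reading where the event occurred
    concept: str         # recommended Wikipedia concept involved, if any
    action: str          # e.g., "open_tab", "scroll_tab", "close_tab",
                         # "open_article", "rate_relevance", "rate_difficulty"
    value: Optional[int] = None   # rating value for rating actions
    timestamp: str = ""

def log_event(event: RecommendationEvent) -> str:
    """Serialize an interaction event as one JSON line for the study log."""
    event.timestamp = datetime.now(timezone.utc).isoformat()
    return json.dumps(asdict(event))

# Example: a student rates the "Allen Newell" recommendation as highly relevant.
print(log_event(RecommendationEvent("s042", "reading-07", 3, "Allen Newell",
                                    "rate_relevance", value=4)))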
3. ENTITY EXTRACTION
Previous work on Wikipedia linking compared the content of
the page in the textbook that the student reads with the relevant Wikipedia articles [1, 4].
Figure 1: The interface of the reading system, ReadingMirror, with the course reading plan on the left and a page of the
assigned reading on the right. A tab on the right of the reading page shows a list of recommended Wikipedia articles related to
this page.
However, these approaches
could be noisy and generate relatively few recommendations.
Since one of the goals of our project was to explore the
feasibility of generating personalized recommendations that
could engage students with different interests, we deliberately
generated a relatively large number of recommendations
targeting the most relevant concepts mentioned on
each page. To achieve this goal, we combined automatic
concept extraction with heuristic filtering and embedding-
based ranking for each reading page.
The first step in this process is to find Wikipedia concepts
and entities mentioned on the target page. For each reading
page, we extracted the entities mentioned on the page using
the DBpedia Spotlight API (https://github.com/dbpedia/spotlight-docker). DBpedia Spotlight generates
a list of entities in the submitted text along with correspond-
ing Wikipedia pages linked to those entities. This list is usu-
ally large and noisy, so it requires post-processing. In the
first step of post-processing, we filtered this list based on the
semantic types of these entities, removing several irrelevant
types of entities, such as “Event”, “Website”, “Film”, “Location”,
and “Country”. We also removed entities that did not
have a corresponding Wikipedia page in English. After the
cleaning, we ranked the remaining entities. Since DBpedia
Spotlight does not rank entities according to their relevance
to the target page, we used EmbedRank [3]. For ranking
with EmbedRank, we generated embeddings of the text on
the page for which the recommendations are generated and
of the first paragraph of each candidate Wikipedia page. The top-N
Wikipedia pages were recommended to the students.
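As a rough illustration of this pipeline, the following Python sketch extracts entities with DBpedia Spotlight, filters them by semantic type, and ranks the survivors by embedding similarity between the reading page and each article's lead paragraph. The public Spotlight endpoint, the confidence value, the all-MiniLM-L6-v2 sentence-transformer model (a stand-in for the EmbedRank embeddings used in the actual system), and the Wikipedia summary API are assumptions made for this sketch rather than the deployed configuration.

import requests
import numpy as np
from sentence_transformers import SentenceTransformer

# Public Spotlight endpoint; the deployed system used the Docker image cited above.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"
# Semantic types filtered out in Section 3; the matching is simplified here.
EXCLUDED_TYPES = ("Event", "Website", "Film", "Location", "Country")

def extract_entities(page_text, confidence=0.5):
    """Annotate one reading page with DBpedia Spotlight and return raw entity mentions."""
    resp = requests.post(SPOTLIGHT_URL,
                         data={"text": page_text, "confidence": confidence},
                         headers={"Accept": "application/json"})
    resp.raise_for_status()
    return resp.json().get("Resources", [])

def filter_entities(resources):
    """Drop entities whose semantic types are irrelevant to the course."""
    titles = []
    for r in resources:
        if any(t in r.get("@types", "") for t in EXCLUDED_TYPES):
            continue
        # A DBpedia URI maps to an English Wikipedia title via its last path segment.
        titles.append(r["@URI"].rsplit("/", 1)[-1])
    return list(dict.fromkeys(titles))  # deduplicate while preserving order

def first_paragraph(title):
    """Fetch the lead paragraph of the English Wikipedia article (REST summary API)."""
    resp = requests.get(f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}")
    return resp.json().get("extract", "") if resp.ok else ""

def rank_candidates(page_text, titles, top_n=10):
    """Rank candidate articles by embedding similarity between the page and each lead paragraph."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for the EmbedRank-style embeddings
    candidates = [(t, first_paragraph(t)) for t in titles]
    candidates = [(t, s) for t, s in candidates if s]  # drop entities without an English article
    if not candidates:
        return []
    page_vec = model.encode([page_text])                  # shape (1, d)
    cand_vecs = model.encode([s for _, s in candidates])  # shape (n, d)
    sims = (cand_vecs @ page_vec.T).ravel() / (
        np.linalg.norm(cand_vecs, axis=1) * np.linalg.norm(page_vec) + 1e-9)
    return [candidates[i][0] for i in np.argsort(-sims)[:top_n]]

# Example: recommendations for one page of an assigned reading.
# top = rank_candidates(page_text, filter_entities(extract_entities(page_text)))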
4. A CLASSROOM DEPLOYMENT
To assess the usefulness of our idea and the quality of gener-
ated recommendations, we deployed the system as the course
reading system in a graduate course on human information
processing at a large US-based university. In this lecture-
based course, students were requested to read one or two
assigned research articles prior to each lecture to prepare
for a discussion. In the earlier offerings of this course, the
articles were distributed to students in PDF form through
a learning management system. In our study, the same ar-
ticles were provided to students through the course read-
ing system, which allowed us to generate a large number of
page-level Wikipedia article recommendations for each as-
signed research article. The class had 11 lectures with a
total of 17 research articles assigned for the required read-
ings. The pages of these articles provided recommendations
for 1,238 concepts linked to Wikipedia articles. As part of
the learning process, we asked students to read at least 3
Wikipedia articles each week, selecting those most interesting
to them from the set of recommended articles. In turn,
to select these three most interesting articles, students were
instructed to examine and rate (by relevance and difficulty)
at least 10 recommended articles each week. For this work,
students could earn up to one course credit point.
5. PRELIMINARY RESULTS
We collected learning data from 42 students enrolled in the
class. In total, 772 out of the 1,238 recommended concepts linked
to Wikipedia articles were explored and rated by students.
An average of 12 students (mean = 12.73, std = 8.73) rated
each concept for difficulty and 13 students (mean = 13.05,
std = 9.05) rated each for relevance. The 10 most popular concepts
rated for relevance and those rated the most difficult are
shown in Table 1. Since the students were guided by their
interests, this list likely indicates the concepts in the course
that students found most interesting. Analysis of stu-
dent rating data indicates that each student rated on average
242 concepts (mean = 241.87, std = 132.12) for difficulty and
242 (mean = 242.97, std = 130.07) for relevance throughout the course.
Figure 2: Once the student clicks on a link to a recommended Wikipedia article, it opens on the left side of the reading interface.
The rating bar at the bottom allows the student to rate the relevance and difficulty of the recommended article.
Table 1: 10 Most Popular Wikipedia articles by number of
students rating them as Relevant or Highly Relevant and as
Medium or Hard Difficulty
Relevance               | Difficulty
Change Blindness        | Cognitive Science
Cognitive Science       | Memory
Visual Perception       | Change Blindness
Cognitive Psychology    | Visual Perception
Saccade                 | Flicker
Experimental Psychology | Saccade
Cognitive Revolution    | Cognitive Psychology
Iconic Memory           | Distractions
Memory                  | Metadata
Hybrid Image            | MyLifeBits
Note that this is considerably more than the 110 ratings
(10 per week over 11 weeks) that the students were required
to make to earn the full score. This indicates that the
students were considerably engaged in examining and rating
recommended Wikipedia articles.
The distribution of relevance and difficulty ratings for recommended
articles is shown in Figure 3. As the data
show, the majority of recommended articles were judged
easy or medium difficulty by the class, although a noticeable
number of articles were considered hard. In terms
of relevance, the majority of articles were rated
as relevant or highly relevant, although a good number were
rated only somewhat relevant or even not relevant.
To examine the articles rated as relevant or highly relevant,
we counted the number of ratings for each of these articles
(i.e., the number of students who rated this article as rele-
vant or highly relevant) and plotted this data by ordering articles by the number of ratings (Fig. 4).
Figure 3: Distribution of Difficulty (left) and Relevance
(right) ratings for recommended Wikipedia articles.
The data show that
while a good number of concepts such as “Cognitive Science”
and “Memory” were universally popular, approximately half
of the relevant concepts such as “Probabilistic Reasoning”
and “Knowledge Visualization” covered in Wikipedia articles
were selected for examination by five or fewer students. This
confirms our hypothesis that students in the same class have
considerably different interests and opens up an opportunity
for personalized rather than class-level recommendations.
As Fig. 3 shows, a considerable number of recommended
Wikipedia articles were judged as not relevant. To under-
stand how we can improve the recommendation process,
we examined the concepts covered by these Wikipedia ar-
ticles. The analysis revealed several problems. The dom-
inant source of irrelevant recommendations was the PDF
source of research articles. First, hyphenation frequently
produces partial words such as “mecha” or “illus”, which
sometimes have perfectly valid Wikipedia articles unrelated
to the content of the course. Second, beyond their true content,
Figure 4: Relevant or highly relevant Wikipedia articles ranked by the number of ratings
all articles have publication data, including named en-
tities for publishers (“Princeton University Press”, “SAGE”,
“IEEE”) and places of publication (“Hershey”, “Princeton”),
which are usually present in Wikipedia. Another problem
was the result of our attempt to recognize the names of re-
searchers mentioned in the articles to offer students more in-
formation about them. Unfortunately, in a number of cases,
these researchers were not prominent enough to appear in
Wikipedia, while a different famous person with the same
name was listed (e.g., “George Eyser”, “Terry Crews”), which
resulted in referring to the wrong people. Finally, some
perfectly valid concepts such as “priming” (in psychology)
had different meanings in different areas and corresponded
to Wikipedia “disambiguation pages” with links to different
meanings. Some students considered these pages irrelevant.
The analysis demonstrated that most of the observed prob-
lems could be resolved by adding additional heuristics to our
filtering process.
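As an illustration, the following Python snippets sketch heuristics of the kind this analysis suggests: rejoining hyphenated words from the PDF source, discarding short non-word fragments, blocking publication metadata, and detecting disambiguation pages via the Wikipedia summary response. The specific word list, blocklist, threshold, and field names are illustrative examples, not the filters actually added to the system.

import re

# Illustrative heuristics motivated by the error analysis above; the blocklist
# and length threshold are examples, not the filters used in the deployed system.

PUBLISHER_BLOCKLIST = {"Princeton University Press", "SAGE Publishing", "IEEE"}

def repair_hyphenation(page_text):
    """Rejoin words split across line breaks in the PDF source
    (e.g., 'mecha-' followed by 'nism' on the next line becomes 'mechanism')."""
    return re.sub(r"(\w+)-\s*\n\s*(\w+)", r"\1\2", page_text)

def looks_like_fragment(surface_form, english_words):
    """Flag short mentions that are not standalone English words (e.g., 'mecha', 'illus')."""
    return len(surface_form) <= 5 and surface_form.lower() not in english_words

def is_publication_metadata(title):
    """Drop publishers and places of publication picked up from reference sections."""
    return title.replace("_", " ") in PUBLISHER_BLOCKLIST

def is_disambiguation(summary_json):
    """The Wikipedia REST summary response marks disambiguation pages by their 'type' field."""
    return summary_json.get("type") == "disambiguation"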
6. CONCLUSION
In this demo, we present a system that uses text mining
to expand student reading options in graduate classes by
recommending relevant Wikipedia articles for research pa-
pers assigned for mandatory reading. This approach en-
riches student course knowledge and allows students to per-
sonalize their readings by focusing on the most interesting
concepts covered in the recommended articles. The system
was used as a primary reading tool in a semester-long grad-
uate course, enabling us to gain several interesting insights
into student work with recommendations. In particular, we
observed that about half of the articles rated as relevant
or highly relevant were examined and rated by 5 or fewer
students. This confirms that different students might be inter-
ested in different aspects of the course and opens opportu-
nities for personalized recommendations. The current demo
used a relatively simple text mining approach to extract in-
teresting concepts mentioned in the text of the mandatory
readings, yet the majority of recommended Wikipedia arti-
cles (and their concepts) were judged as relevant or highly
relevant. The analysis of concepts judged as not relevant re-
vealed several heuristics that could be used to improve our
text-mining approach.
7. REFERENCES
[1] R. Agrawal, S. Gollapudi, K. Kenthapadi,
N. Srivastava, and R. Velu. Enriching textbooks
through data mining. In Proceedings of the First ACM
Symposium on Computing for Development, pages 1–9,
2010.
[2] J. Barria-Pineda, P. Brusilovsky, and D. He. Reading
mirror: Social navigation and social comparison for
electronic textbooks. In First Workshop on Intelligent
Textbooks at 20th International Conference on Artificial
Intelligence in Education (AIED 2019), volume 2225,
pages 30–37. CEUR, 2019.
[3] K. Bennani-Smires, C. Musat, A. Hossmann,
M. Baeriswyl, and M. Jaggi. Simple unsupervised
keyphrase extraction using sentence embeddings. In
Proceedings of the 22nd Conference on Computational
Natural Language Learning, pages 221–229, Brussels,
Belgium, Oct. 2018. Association for Computational
Linguistics.
[4] X. Liu and H. Jia. Answering academic questions for
education by recommending cyberlearning resources.
Journal of the American Society for Information
Science and Technology, 64(8):1707–1722, 2013.