prespecified rules (listed in Multimedia Appendix 2). This was
done in duplicate and differences were resolved by consensus.
All searches were conducted between May and July 2010. We
restricted each search to the search dates provided in the
methods section of each systematic review. For each search
result, we calculated the total number of records retrieved, the
number of relevant articles retrieved, and the position of the
relevant records in the search results. For each relevant article,
we followed all links for full-text access and documented
whether the full-text version could be retrieved for free. We did
not pay for privileged access to any article. To ensure that we did
not inadvertently make use of our institution’s licensing when
searching, all searches were conducted on a computer with
Internet access provided by a local service provider and not our
institution. We tested and validated our search methodology in
a pilot phase. Two assessors (graduate students with expertise
in computer science and biomedical science) independently
conducted 10 searches in PubMed and Google Scholar and
achieved a percent agreement of 99%.
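The pilot's inter-rater reliability reduces to a simple proportion. The sketch below is illustrative only (the ratings are hypothetical, not the study's data) and shows how percent agreement between two assessors' relevance judgments can be computed:

```python
def percent_agreement(ratings_a, ratings_b):
    """Proportion of items on which both assessors gave the same rating, as a percentage."""
    assert len(ratings_a) == len(ratings_b)
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)

# Hypothetical relevance judgments for 10 records (1 = relevant, 0 = not relevant)
assessor_1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
assessor_2 = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
print(percent_agreement(assessor_1, assessor_2))  # 90.0
```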
Content Coverage
To assess the potential for bias due to the absence of articles in
one source over the other, we evaluated the content coverage
for each database. A content coverage analysis determines
whether pertinent literature is contained within a specific
bibliographic database [31]. There are two potential reasons for
not finding an important article when searching a database such
as PubMed: either the article of interest is not included in the
content holdings of the database (referred to as a lack of content
coverage), or the article is present, but the search mechanism
fails to retrieve it when a search phrase is typed into the
database. To determine content coverage, we searched for each
primary article using advanced search strategies as outlined in
other coverage studies [32,33]. This involved various
combinations of the manuscript’s title (both English and
non-English), the authors’ names, the journal title, page numbers,
and the year published. We selected all links to candidate
matches to confirm a match. In Google Scholar, the option to
view all versions for a candidate article was always selected
and all links were attempted. If a primary article was not found
in one of the resources, further searches were performed by
another rater to confirm its absence. We previously published
a more comprehensive content coverage analysis of renal
literature that applied the same methods [34].
General Statistical Analytic Strategy and Sample Size
Primary Analysis
The two most prominent performance metrics of searching are
recall and precision (Table 2). Results from our survey indicated
that 80% of nephrologists do not review beyond 40 search
results, which is the equivalent of 2 default search pages in
PubMed [28]. Thus, for the primary analysis, we calculated the
recall and precision for the first 40 retrieved records in each
search. We used a 2-sided paired t test to compare search
outcomes between PubMed and Google Scholar. To reduce the
risk of type I error, we used a conservative P value of .025 to
interpret significance for all comparisons. We used SAS,
Version 9.2 for all statistical analyses.
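The primary comparison pairs the two engines' recall values search by search. The study used SAS 9.2; the sketch below (with hypothetical recall-at-40 values, not the study's results) shows how the paired t statistic behind that comparison is formed:

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic: mean per-pair difference divided by its standard error."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

# Hypothetical recall-at-40 values for the same 5 searches run in each engine
pubmed  = [0.50, 0.40, 0.60, 0.30, 0.55]
scholar = [0.70, 0.65, 0.60, 0.50, 0.75]
t = paired_t(pubmed, scholar)  # for a 2-sided test, compare |t| against the t distribution, df = 4
```

Significance is then judged against the t distribution with n − 1 degrees of freedom, using the study's threshold of P = .025.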
Secondary Analysis
We repeated the recall calculation considering only relevant
articles that were freely accessible. For each
physician-generated search, we also calculated the recall and
precision for all retrieved records (not just the first 40).
Table 2. Formulas for calculating search recall^a and precision^b.

Search in PubMed or Google Scholar
for a clinical question                 Relevant articles^c     Nonrelevant articles
  Articles found                        TP                      FP
  Articles not found                    FN                      TN

^a Search recall = TP/(TP + FN): the number of relevant articles found as a proportion of the total number of relevant articles.
^b Search precision = TP/(TP + FP) (also referred to as the positive predictive value in diagnostic test terminology): the number of relevant articles found as a proportion of the total number of articles found.
^c For each search, the set of relevant articles was the collection of primary studies included in the original systematic review from which the clinical question was derived.
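The Table 2 formulas translate directly into code. The counts below are hypothetical and for illustration only:

```python
def recall(tp, fn):
    """Relevant articles found as a proportion of all relevant articles: TP/(TP + FN)."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Relevant articles found as a proportion of all articles found: TP/(TP + FP)."""
    return tp / (tp + fp)

# Hypothetical search: 40 records retrieved, 8 of 10 relevant articles found
tp, fp, fn = 8, 32, 2
print(recall(tp, fn), precision(tp, fp))  # 0.8 0.2
```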
Results
Nephrologist and Search Characteristics
Participating nephrologists were an average of 48 years old and
had practiced nephrology for an average of 15 years. All
respondents had used an online resource to guide the treatment
of their patients in the previous year. Approximately 90% used
PubMed to search, while 40% used Google Scholar; 32%
indicated using both bibliographic resources. Searches provided
by the nephrologists contained an average of three concept
terms, with each term embodying a single concept, for example,
myocardial infarction. Forty-eight percent of nephrologists used
Boolean terms such as AND, OR, and NOT in their searches.
Seven percent of searches included advanced search features
such as search limits, search filters, and truncation (inclusion
of multiple endings achieved by typing in an asterisk “*” in
PubMed, eg, nephr*). No substantive differences were observed
in searches provided by older versus younger nephrologists,
males versus females, or by those practicing in an academic
versus community setting.
Content Coverage
PubMed and Google Scholar contained similar proportions of
tested articles in their database holdings: each contained 78%
of the 1574 unique citations collected. Google Scholar contained
an additional 5% of the articles not included in PubMed and
PubMed contained an additional 2% of the articles not included in Google Scholar.
J Med Internet Res 2013 | vol. 15 | iss. 8 | e164 | http://www.jmir.org/2013/8/e164/
Shariff et al, Journal of Medical Internet Research