Identifying potential headings for Authority work
using III Sierra, MS Excel and OpenRefine
Lynn Whittenberger, Metadata Librarian
Raynor Memorial Library, Marquette University
In order to participate in a newly established Wisconsin NACO funnel project, Marquette University’s
Raynor Library needed to identify author/corporate names currently in the Catalog
(http://libus.csd.mu.edu/) or Institutional Repository (http://epublications.marquette.edu/) that might
be good candidates for creating new Authority records.
Two collections immediately came to mind that would benefit from additional authority control:
Marquette Faculty authors (represented in the Institutional Repository and in our Faculty Publications
Collection), and Milwaukee musicians (represented in the Cujé Collection)
III Sierra: pulling lists & exporting records
Lists of names represented in the catalog were easy to pull. Faculty Publications and Cujé Collection
materials have unique location codes in our catalog, which made it easy to gather the relevant
bibliographic records using the create lists function in III Sierra. From those lists, I exported only the
author and added author fields (MARC fields 100/110 and 700/710). I stripped subfields e and t from the
export, as that made it easier to identify and remove duplicates in Excel and OpenRefine. The author
and added author fields were exported as a | (pipe) delimited file. Sierra offers the option of specifying
delimiters for repeated fields, and I used the same delimiter for both fields and repeated fields, which
made it easier to convert the resulting file into one long list in Excel.
MS Excel: consolidating the catalog data
Once the file exported, I used the Excel wizard to convert the delimited file into an Excel spreadsheet.
The initial conversion gave me many columns of data. To consolidate the columns of data down to one, I
sorted the last column, used the Excel “remove duplicates” function on that column, then copied the de-
duplicated cells and pasted them onto the bottom of the column to the left. I repeated the process of
sorting and deduplicating, then copying and pasting, moving left on the spreadsheet until I was left with
one column of data. A time consuming process, but not terrible.
Digital Commons data
Our Institutional Repository (IR) is built on the Digital Commons platform and the list of faculty names is
natively exported as an Excel spreadsheet. Here, the author name exports into three columns: last
name, first name, middle name/initial. Using the concatenate function in Excel, I was able to ‘glue’ the
pieces of the author name together into the same format as the data from the catalog (Last Name
(comma space) First Name (space) middle name/initial). To convert the column of built names from a
formula to text, I had to copy the column, and then paste as values into a new column.