This example shows that coding data (as with M for male and F for female above) or using
other forms of shorthand can confuse people trying to read and analyze the data if the data
collector does not document or define the codes. This is especially true for datasets that are
not in the same language as the archival documents or that use jargon introduced by the data
collector. For this reason, document your choices in the field definitions table/data dictionary
and in the Methodology section of Journal of Slavery and Data Preservation data articles.
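One way to catch undocumented codes before submission is to compare the values in a coded field against the definitions in the data dictionary. The following is a minimal sketch, not an Enslaved.org tool: the `DATA_DICTIONARY` structure, the `undocumented_codes` helper, the field name `sex`, and the sample records (including the undefined code "V") are all hypothetical and chosen only for illustration.

```python
# Hypothetical data dictionary: field name -> {code used in dataset: definition}.
DATA_DICTIONARY = {
    "sex": {"M": "Male", "F": "Female"},
}


def undocumented_codes(records, field, dictionary=DATA_DICTIONARY):
    """Return the set of values in `field` that the data dictionary does not define."""
    defined = dictionary.get(field, {})
    missing = set()
    for record in records:
        value = record.get(field)
        if value in (None, ""):
            continue  # blank cells are not codes
        if value not in defined:
            missing.add(value)
    return missing


# Invented sample rows; "V" stands in for a code nobody documented.
records = [
    {"name": "Person A", "sex": "M"},
    {"name": "Person B", "sex": "F"},
    {"name": "Person C", "sex": "V"},
]

print(undocumented_codes(records, "sex"))  # {'V'}
```

Any value the check returns needs either a definition in the data dictionary or a correction in the dataset.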
Applying Controlled Vocabularies
Controlled vocabularies are used to standardize data values within like fields and provide
consistency across a dataset.[6] These carefully selected and defined lists of words and phrases
provide a consistent way to categorize data values within a field so that similar data in disparate
datasets can be searched effectively. For example, when describing Sex within the Enslaved.org
controlled vocabulary, the terms “Female,” “Male,” and “Intersex” are used. These controlled
terms standardize the many words used for this category in historical documents, such as woman,
man, lady, gentleman, boy, girl, mujer, hombre, niña, niño, or abbreviations like M and F used in
some datasets. Aligning with Enslaved.org’s controlled vocabularies is recommended, but not
required.[7] As in the following example, the key is both to establish a standardized process for
extracting and recording the data for each field and to document that process.
Although controlled vocabularies encourage shared language, the current list of terms will not
accommodate the specifics of every historical example. Where it falls short, use the terms you
think best represent your evidence and, after initial submission, talk to Enslaved.org staff about
the possibility of adding them to the controlled vocabularies.
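The standardization process described above can be sketched as a lookup table that maps each term found in the source to a controlled term, plus a list of the terms that did not map and still need a decision. This is only an illustration under stated assumptions: the controlled terms and source words come from the Sex example in this section, but the names `SEX_VOCABULARY`, `SEX_TERM_MAP`, `standardize_sex`, and `standardize_column` are hypothetical, and the sample value "unclear" is invented.

```python
# Controlled terms for Sex, as listed in this section.
SEX_VOCABULARY = {"Female", "Male", "Intersex"}

# Hypothetical mapping from lowercased source terms to controlled terms.
# Keeping this table is itself documentation of the choices made.
SEX_TERM_MAP = {
    "f": "Female", "woman": "Female", "lady": "Female",
    "girl": "Female", "mujer": "Female", "niña": "Female",
    "m": "Male", "man": "Male", "gentleman": "Male",
    "boy": "Male", "hombre": "Male", "niño": "Male",
}


def standardize_sex(source_term):
    """Return the controlled term for a source term, or None if it is unmapped."""
    return SEX_TERM_MAP.get(source_term.strip().casefold())


def standardize_column(values):
    """Map a column of source terms; also collect terms that need a decision."""
    standardized = []
    unmapped = set()
    for value in values:
        term = standardize_sex(value)
        standardized.append(term)
        if term is None:
            unmapped.add(value)
    return standardized, unmapped


column, needs_review = standardize_column(["M", " Mujer ", "niño", "unclear"])
print(column)        # ['Male', 'Female', 'Male', None]
print(needs_review)  # {'unclear'}
```

Unmapped terms are returned rather than guessed at, so each one can be given a documented mapping or raised with Enslaved.org staff as a candidate new term.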
Pro-tip: Specify and define terms appropriate for a category. These terms can be based on the
data in the historical document but standardized so that they can be consistently used across
records in the dataset. You might consider listing specific terms that you mapped to a single
term and/or how you determined the appropriate controlled vocabulary to apply based on other
aspects of the dataset. For the Sex example, you could specify that a historical ledger included a
heading named “Gender” and recorded that data as “M” or “F” but that you decided to record the
data as Male (for “M”) and Female (for “F”). Or you could specify that the Sex column you
included in the dataset did not appear in the original document but is data you inferred (or
interpreted/imputed) based on other information in the source document, such as the first or
given name of a person or gendered phrases in a description field, for example “gentleman,”
“boy,” “hombre,” and “niño,” which you coded as Male, while “woman,” “lady,” “girl,” “mujer,”
and “niña” were categorized
[6] Six Person fields use a controlled vocabulary: Sex, Age Category, Occupation, Relationships, Person
Status (freedom status), and Role within Event. Event has one controlled vocabulary that categorizes the
Type of slavery/slave trade event, for example a manumission, marriage, or sale. The Place Type
vocabulary defines terms such as county or parish, domicile, maroon community, plantation, port, etc.
Definitions for all controlled vocabulary terms are located at
https://docs.enslaved.org/controlledVocabulary/.
[7] Enslaved.org continues to expand and refine the controlled vocabularies as contributors submit new
data.
Enslaved.org Recommended Practices for Historical Slavery Data (CC BY-NC-SA 4.0) - 15/30