The Lab Space at DSG

The Digital Scholarship Group is developing a digital Lab space to showcase experiments with data analysis and visualization. The Primary Source Cooperative editions are providing early pilot projects for the Lab, where the edition data will be remixed and reshaped to reveal new patterns and interconnections. 

Scholarly editors working with the Primary Source Cooperative create digital editions using the TEI Guidelines. The resulting data captures the structure of each individual document together with detailed metadata and information about the document’s contents, including the people and topics discussed. A data pipeline designed by the Lab transfers that data into the Digital Scholarship Group’s technical infrastructure. The raw files are stored in Amazon Web Services, and the data is indexed in an XML database that supports querying and extracting specific features of the data for further analysis. From either of those endpoints, the TEI data is available for creative development by DSG’s analysts.
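As a rough illustration of what extracting specific features from TEI can look like, the sketch below pulls person references out of a tiny, invented TEI fragment using Python's standard library. The element layout (`div type="entry"`, `persName` with a `ref` attribute), the identifiers, and the function name `names_by_entry` are assumptions made for this example, not the Cooperative's actual encoding or the Lab's pipeline.

```python
import xml.etree.ElementTree as ET

# The TEI namespace is standard; everything in SAMPLE is invented for illustration.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="entry" xml:id="e1">
      <p>Dined with <persName ref="adams-abigail">my mother</persName> and
         <persName ref="adams-john">my father</persName>.</p>
    </div>
    <div type="entry" xml:id="e2">
      <p>Walked with <persName ref="adams-john">my father</persName>.</p>
    </div>
  </body></text>
</TEI>"""


def names_by_entry(tei_xml):
    """Map each entry's xml:id to the list of persName @ref values it contains."""
    root = ET.fromstring(tei_xml)
    result = {}
    for div in root.iterfind(".//tei:div[@type='entry']", TEI_NS):
        refs = [
            name.get("ref")
            for name in div.iterfind(".//tei:persName", TEI_NS)
            if name.get("ref")  # skip names that carry no reference key
        ]
        result[div.get(XML_ID)] = refs
    return result


print(names_by_entry(SAMPLE))
```

In practice a query against the XML database could serve the same purpose; the point here is only that the encoded structure makes features such as names directly addressable.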

Using a combination of well-known open-source and proprietary tools for natural language processing, network analysis, and visualization in Python and JavaScript, the Lab is exploring ways to start investigating researchers’ questions and to help them develop their own.

Below are some screenshots of prototype visualizations produced using Jupyter notebooks and the D3 visualization framework, based on data from the John Quincy Adams Digital Diaries project. 

Co-occurrence of Subject Headings

Editors at the Massachusetts Historical Society assign subject headings to each entry in John Quincy Adams’s diary. The headings describe the general content of JQA’s daily journal-keeping. When aggregated, these subject headings can reveal an overview of JQA’s focus: the subjects he most frequently thought about when he recorded his thoughts at the end of the day.
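One simple way to aggregate headings into co-occurrence counts is to count every pair of headings that appears on the same entry. The following is a minimal sketch under that assumption; the function name `cooccurrence_counts` and the sample entries are hypothetical, and the prototype itself may compute its counts differently.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(entries):
    """Count how often each pair of subject headings shares a diary entry."""
    counts = Counter()
    for headings in entries:
        # Deduplicate and sort so each pair has one canonical (a, b) order.
        for pair in combinations(sorted(set(headings)), 2):
            counts[pair] += 1
    return counts


# Invented sample data: one list of headings per diary entry.
entries = [
    ["Recreation", "Family"],
    ["Family", "Politics", "Recreation"],
    ["Politics"],
]
counts = cooccurrence_counts(entries)
for pair, n in counts.most_common():
    print(pair, n)
```

Entries with a single heading contribute no pairs, so they affect heading totals but not the co-occurrence counts.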

Co-occurrence of Names

The editors at MHS have also encoded references to names, both to disambiguate them and to make them available for machine analysis. Mousing over a name highlights its co-occurrences with other names and displays statistics about its network connections.
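The text does not say which network statistics the prototype reports. As an assumption for illustration, the sketch below computes two common ones from weighted co-occurrence edges: degree (how many distinct names a name appears with) and strength (the total number of co-occurrences). The function name `network_stats` and the sample names are hypothetical.

```python
from collections import defaultdict


def network_stats(edge_weights):
    """Compute per-name degree and strength from {(name_a, name_b): weight}."""
    neighbors = defaultdict(set)
    strength = defaultdict(int)
    for (a, b), weight in edge_weights.items():
        neighbors[a].add(b)
        neighbors[b].add(a)
        strength[a] += weight
        strength[b] += weight
    return {
        name: {"degree": len(neighbors[name]), "strength": strength[name]}
        for name in neighbors
    }


# Invented sample: each key is a pair of names, each value a co-occurrence count.
edges = {("Person A", "Person B"): 2, ("Person A", "Person C"): 1}
stats = network_stats(edges)
print(stats["Person A"])
```

A graph library could supply richer measures such as centrality; plain dictionaries keep the sketch free of dependencies.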

Subject Headings Across Time: Recreation

John Quincy Adams apparently wasn’t all work and no play. This chart shows the use of “Recreation” as a subject heading across time. The empty circles represent the total number of subject headings during that time period. The green circles represent usage of “Recreation”. This shows that not only does the subject appear across the range of diaries edited, but also that it is frequently among the most used subjects for each particular time period, especially early on.
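The two series in such a chart can be derived by counting, for each time period, both the total number of heading assignments and the number that match the heading of interest. This is a minimal sketch assuming each entry is tagged with a period label; the function name `heading_usage_by_period`, the period labels, and the sample data are invented for illustration.

```python
from collections import Counter


def heading_usage_by_period(entries, heading):
    """Return {period: (uses_of_heading, total_headings)} in sorted period order."""
    totals = Counter()
    hits = Counter()
    for period, headings in entries:
        totals[period] += len(headings)         # the "empty circle" value
        hits[period] += headings.count(heading)  # the "green circle" value
    return {period: (hits[period], totals[period]) for period in sorted(totals)}


# Invented sample: (period label, headings assigned to one entry).
sample_entries = [
    ("period-1", ["Recreation", "Family"]),
    ("period-1", ["Recreation"]),
    ("period-2", ["Politics"]),
]
usage = heading_usage_by_period(sample_entries, "Recreation")
print(usage)
```

Keeping both counts per period lets a chart show the heading's usage against the overall volume of headings, as the circles described above do.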