Data sets extracted from McGill's digitized collections, available for computational text analysis and research.
This data set consists of plain text files containing the full text of the publication Canadian Architect and Builder (1888–1908), which was digitized by McGill University Libraries in the late 1990s and is accessible at Canadian Architect and Builder. The Canadian Architect and Builder (CAB) was the only professional architectural journal published in Canada before World War I. Because the text files include both advertisements and articles, CAB provides a wealth of information on the state of architecture and building in Canada during the late 19th and early 20th centuries.
This data set consists of XML files from the digitization of a small collection of Chinese gynaecological works held by McGill University Library Rare Books and Special Collections. One of these texts is unique; the others are well-known works that exercised considerable influence on the practice of gynaecology in late imperial China and were reprinted many times. The original digital collection project was carried out in the early 2000s and is accessible at Gynaecology in Traditional Chinese Medicine: Selected Texts.
This data set consists of metadata and (in some cases) full text of the McGill Thesis and Dissertation collections from 1881–2018. McGill holds theses and dissertations written by McGill students from 1881 to the present day. The historical print collection is housed in the McGill University Library’s Rare Books and Special Collections. Since 2009, theses have been submitted electronically and made available in our institutional repository. In 2016, a massive retrospective digitization project was completed, as a result of which the full text of the historical theses was also made available online in the institutional repository. All the digitized and born-digital theses are now publicly available. Find more information about the collection on its website, Highlights from McGill theses and dissertations.
The Fur Trade in Canada and the North West Company data set provides access to the full-text XML files of 38 manuscripts collectively known as the Masson Papers, held in McGill University Library Rare Books and Special Collections. The Masson Papers comprise letters, diaries, travel narratives, and other textual documents relating to the North West Company and the colonial-era fur trade more generally. The papers represent a settler perspective on North American places and peoples. The source site, In Pursuit of Adventure: The Fur Trade in Canada and the North West Company, was created in the late 1990s. More information about the manuscripts and the transcription standards is available on the website.
In Search of Your Canadian Past: The Canadian County Atlas Digital Project, created by McGill University Library in the late 1990s, provides access to 43 Ontario county atlases that were produced between 1874 and 1881 and are housed in McGill’s Rare Books and Special Collections. Of interest to genealogists, the atlases contain indexes of persons residing in each county; these indexes have been digitized and are searchable on the project website. This data set is an extract of the people index used by the website, along with URLs for each record. The CSV contains 172,927 records with the following fields: title (e.g. Mr., Mrs., Prof.), first name, last name, township name, town name, county name, atlas date, URL.
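As an illustration of how the people index might be explored, the following Python sketch reads records with the eight fields listed above and counts them per county. The column names (`title`, `first_name`, `county`, and so on) and the sample rows are invented placeholders, since the actual header row of the CSV is not reproduced here; they would need to be adjusted to match the downloaded file.

```python
import csv
import io
from collections import Counter

# Invented sample standing in for the real CSV. The header names below are
# assumptions; the actual file may label its columns differently.
SAMPLE = """title,first_name,last_name,township,town,county,atlas_date,url
Mr.,Example,Person,Township A,Town A,County A,1878,https://example.org/record/1
Mrs.,Sample,Resident,Township B,,County A,1878,https://example.org/record/2
,Another,Name,Township C,Town C,County B,1880,https://example.org/record/3
"""


def read_people(handle):
    """Return a list with one dict per record, keyed by the header row."""
    return list(csv.DictReader(handle))


def count_by_county(rows, county_field="county"):
    """Count records per county; county_field is the assumed column name."""
    return Counter(row[county_field] for row in rows)


rows = read_people(io.StringIO(SAMPLE))
counts = count_by_county(rows)
for county, total in counts.most_common():
    print(county, total)
```

To run the same functions against the downloaded file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to `read_people`; the file encoding is also an assumption.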
This data set is the result of a Text Encoding Initiative (TEI) project built around the McGill Library’s Chapbook Digital Collection. Rare Books and Special Collections created a TEI XML file for most of the chapbooks on this site using TEI P5: Guidelines for Electronic Text Encoding and Interchange by the TEI Consortium. Level 4 coding from Best Practices for TEI in Libraries was used to guide the encoding. Note that the woodcuts in each chapbook were assigned a classification code from the Iconclass thesaurus to describe the subject of the image. The McGill Library’s Chapbook Collection was assembled from chapbooks in three special collections in the Rare Books and Special Collections Library. The majority of the imprints (955 titles) are from the 19th century, published in England and the Northeastern United States. There are 74 Scottish and 19 Irish chapbooks in the collection. Most of the collection’s 18th-century titles were published in London, England.
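For readers new to TEI, a minimal Python sketch of pulling a title and paragraph text out of a TEI P5 file is given below. It relies only on the standard TEI namespace and the usual `teiHeader/fileDesc/titleStmt/title` path; the sample document is invented, and how the chapbook files themselves record details such as Iconclass codes is not described here, so the sketch does not attempt to extract them.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"  # standard TEI P5 namespace
NS = {"tei": TEI}

# Invented placeholder document, not taken from the chapbook collection.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>An Invented Chapbook Title</title></titleStmt>
      <publicationStmt><p>Placeholder</p></publicationStmt>
      <sourceDesc><p>Placeholder</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <p>First paragraph of   placeholder text.</p>
        <p>Second paragraph.</p>
      </div>
    </body>
  </text>
</TEI>"""


def tei_title(root):
    """Return the title from the TEI header, or an empty string if absent."""
    node = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", NS)
    return "".join(node.itertext()).strip() if node is not None else ""


def tei_paragraphs(root):
    """Return the text of each <p> inside <text>, with whitespace collapsed."""
    text_el = root.find("tei:text", NS)
    if text_el is None:
        return []
    paragraphs = []
    for p in text_el.iter("{%s}p" % TEI):
        cleaned = " ".join("".join(p.itertext()).split())
        if cleaned:
            paragraphs.append(cleaned)
    return paragraphs


root = ET.fromstring(SAMPLE_TEI)
print(tei_title(root))
print(tei_paragraphs(root))
```

For a file on disk, `ET.parse(path).getroot()` can replace the `ET.fromstring` call.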
This data set includes the OCR’d full text of the complete run of the student-run publications McGill University Gazette (1874–1890) and McGill Fortnightly (1892–1896). Digital copies of the full corpus of the McGill Student Publications are available in the McGill Student Publications Collection on the Internet Archive. The physical collection is housed in the McGill University Archives.