Name: Creating large text datasets for humanities research using reproducible methods
Start: 2018-10-06T10:45:00-0400
End: 2018-10-06T12:15:00-0400

In this interactive session, I will guide the audience through methods for preparing a large, uniform dataset from the writings of Philip K. Dick, suitable for using algorithmic methods to discover how language around gender is used within the corpus. I will work with a variety of source media (scanned manuscript pages, printed books, PDFs, web-based text, etc.) and will use the Python programming language, along with various libraries, to extract the text, format the data, and retain the metadata required to reference the original sources. I will also document my work so that future researchers can read and understand the nuances of my methodology in enough detail to recreate the dataset from the original sources (whether or not those sources appear in the same medium) using an entirely different programming language or suite of tools.
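The abstract does not name the libraries used in the session, so the following is only a minimal sketch, in standard-library Python, of one way to keep provenance metadata attached to extracted text and make the result checkable by someone using other tools. It assumes the text has already been pulled out of the scan, book, PDF, or web page (by OCR, a PDF parser, etc.) into a string. The record fields (`source_id`, `medium`, `location`), the three normalization rules, the SHA-256 checksum, and the JSON Lines output are all illustrative assumptions, not the speaker's method.

```python
import hashlib
import json
import re
import unicodedata
from dataclasses import asdict, dataclass


# Hypothetical provenance fields (an assumption for illustration); adapt them
# to whatever citation details your original sources actually require.
@dataclass
class SourceRecord:
    source_id: str  # e.g. a catalogue number or bibliographic key
    medium: str     # "scan", "print", "pdf", "web", ...
    location: str   # page, folio, or URL where the passage was found
    text: str       # the normalized text
    sha256: str     # checksum of the normalized text, for verification


def normalize(raw: str) -> str:
    """Apply a small, explicitly documented set of normalization rules."""
    # 1. Unicode NFC, so visually identical characters compare equal.
    text = unicodedata.normalize("NFC", raw)
    # 2. Rejoin words hyphenated across line breaks ("an-\ndroid" -> "android").
    text = re.sub(r"(\w)-\r?\n(\w)", r"\1\2", text)
    # 3. Collapse every run of whitespace (including newlines) to one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def make_record(raw: str, source_id: str, medium: str, location: str) -> SourceRecord:
    """Normalize extracted text and bundle it with its provenance metadata."""
    text = normalize(raw)
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return SourceRecord(source_id, medium, location, text, digest)


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines: one record per line, keys sorted."""
    return "\n".join(
        json.dumps(asdict(r), ensure_ascii=False, sort_keys=True) for r in records
    )


if __name__ == "__main__":
    # Placeholder passage and identifiers, not taken from the actual corpus.
    rec = make_record("The an-\ndroid   dreamed.\n", "example-001", "pdf", "p. 12")
    print(rec.text)  # The android dreamed.
    print(to_jsonl([rec]))
```

Because the normalization rules are written down and the checksum covers only the normalized text, a researcher working in another language can apply the same rules to their own extraction and compare hashes passage by passage. Rule 2 is a trade-off: it will also merge compounds that were legitimately hyphenated at a line end (e.g. "self-\naware"), which is the kind of nuance such documentation would need to record.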

Speakers

Austin L. Meyers

Saturday October 6, 2018 10:45am - 12:15pm EDT
Room 241

Interactive Session, Session 3B

BUDSC18

Attendees (9)
