OpenRefine is a free, open-source, powerful tool for working with messy data. Using a spreadsheet-like interface, it allows quick exploration of large datasets through features such as faceting and filtering. Where it really shines, though, is in helping fix inconsistencies in your data, such as differences in spelling, date format, and capitalization.
This workshop will introduce the most powerful features of OpenRefine using a sample bibliographic dataset. Participants are encouraged to install the tool on their own computers before the workshop and to follow along. We will conclude with a short discussion of use cases: participants are welcome to share examples of datasets from their own work that need to be cleaned up and to discuss how OpenRefine can help with this process.
There are no particular prerequisites for this session. Familiarity with regular expressions can be useful for applying more advanced text-matching functions, but is not required.
This workshop will draw inspiration from the Library Carpentry OpenRefine lesson.
By the end of this class, students will be able to:
Utilize facets and filters to explore large datasets
Identify inconsistencies in a dataset and correct them
Combine advanced operations such as clustering and text matching to quickly edit large datasets
Thomas Guignard (he/him) is an independent library technology consultant and project manager, with over 15 years of experience working in academic libraries and consortia. Since 2014, he has been a volunteer instructor with Software Carpentry and Library Carpentry, building software and data skills within research, library and information-related communities. Thomas is also an avid traveler and library architecture photographer; his work can be found on his Instagram account @concretelibraries. Thomas has an MLS from the University of Aberystwyth (UK), and an MSc and PhD in Engineering from the Swiss Federal Institutes of Technology in Zurich and Lausanne. He lives with his family in Quebec City, Canada.