Do your datasets spark joy? Working with semi-structured or malformed data requires skills that cross disciplines. Fortunately, there are tools that make cleaning and standardizing data easier. With a spreadsheet-like interface, OpenRefine is visually familiar to most users. But where a spreadsheet may be good for entering data, OpenRefine is built to tidy malformed data. This workshop's goals are to introduce participants to OpenRefine, cover basic strategies for its use, and offer exercises that familiarize them with its capabilities.
Participants will learn some of the rationale behind cleaning data and complete guided exercises on sample datasets. They will practice basic strategies for using the tool and, ideally, gain an understanding of the amount of work required to standardize and clean datasets.
Participants are expected to have access to a laptop running Windows, macOS, or Linux; a current installation of Java SE Runtime Environment 8; and an installation of the most recent stable release of OpenRefine, currently version 3.1 (http://openrefine.org/download.html). Chrome or Firefox is recommended for known compatibility with OpenRefine's browser-based user interface.
Instructor: Jerry Waller
Jerry Waller became Systems Librarian at Elon University in 2014, just in time for an ILS migration and all the joy that such an endeavor brings. He's been working with data standardization techniques since graduate school and a subsequent stint as a biostatistics programmer. No matter the source, Jerry wants information to tell a clear, objective story.