Digitization Workflows: Scanning, OCR, and Audio Transcription

Converting documents, text, images, and sound files to digital or machine-readable formats is a prerequisite for many digital humanities projects. Digitization is the process of capturing analog materials as digital images. Optical Character Recognition (OCR) programs “read” these images and convert them to text documents that can be easily searched, copied, edited, or used for computational text analysis. Transcription is the process of converting the speech in audio or video files into text. Explore more tools for these tasks in the ‘Capture’ category on the DiRT Directory.


