Assessing and Improving OCR Quality in the HathiTrust
The rise of large-scale digitized book collections, such as those provided by Google Books, the HathiTrust, and the Internet Archive, is enabling a fundamentally new kind of text analysis that exploits the scale of these collections to ask questions that cannot be answered with smaller corpora.