Document Image Processing

Applications

Document Image Analysis can be applied to many applications beyond the desktop OCR packages that come with most commercial scanners or PDF readers. Some applications include:

  • Reading books and documents for the visually impaired
  • Conversion of books to digital libraries
  • Signature verification
  • Reading license plates or cargo container numbers
  • Reading road signs for autonomous or semi-autonomous vehicles
  • PDA or tablet PC technology
  • Sorting of large document datasets (legal, historical, security)
  • Search engines on the Web

Motivation

Document Image Analysis aims to develop algorithms and processes through which machines (computers) can automatically read documents and develop some basic understanding of them. Documents include:

  • Machine-printed documents – such as memos, letters, technical reports, and books.
  • Handwritten documents – personal letters, addresses on postal mail, notes in the margins of documents.
  • On-line handwritten documents – writing on PDAs or tablet PCs.
  • Video documents – annotating videos based on text in the video clips.
  • Music scores – turning sheet music into MIDI or other electronic music formats.

The growth of the World Wide Web has made it easier to make information publicly available, but for that information to be useful it must be in computer-readable form so that it can be searched and the items of interest retrieved. Documents are converted to computer-readable form through the process of Document Image Analysis (DIA), which encompasses Optical Character Recognition (OCR). An image of the document is made, and its text content must then be recognized in order to be searchable. An automated OCR system can reduce the time needed to convert a document to computer-readable form to 25% of the time a human needs to enter the same data by hand. Although much effort has been dedicated to developing methods of automatically converting paper documents into electronic form, and OCR products are commercially available, often for free, many documents that are easy for humans to read still reach only 92% recognition accuracy. The remaining error rate is too high to remove the human from the process, which increases the time and cost of document conversion. Thus there is a need for further research in this field.

Low accuracy rates are most common in documents with image degradations caused by printing, scanning, photocopying, and/or faxing. These four operations all involve spatial and intensity quantization, which are the primary sources of change in the appearance of bilevel images such as characters and line drawings. Camera-based acquisition (such as with a cell phone) adds further degradation in the form of defocus blur and perspective distortion. To date, the most common method of overcoming these degradations is to train the classifier on a wide enough variety of samples that it can recognize the degraded characters. However, by understanding the degradation and estimating its characteristics for each document, a more effective method of preprocessing or recognizing the characters can be developed.
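To make the effect of spatial and intensity quantization concrete, here is a minimal Python sketch. It is an illustrative toy, not a degradation model taken from this page: the block-averaging sampler, the single global threshold, and the names sample_and_threshold and vertical_bar are all assumptions made for the example. A stroke 5 fine pixels wide is sampled on a grid of 4x4 cells and then thresholded back to a bilevel image.

```python
def sample_and_threshold(ideal, cell, threshold, offset=0):
    """Degrade a high-resolution bilevel image (rows of 0/1, 1 = ink).

    Spatial quantization: average the ink coverage over cell x cell blocks,
    starting `offset` fine pixels in from the top-left corner.
    Intensity quantization: a block becomes ink (1) when its average
    coverage is at least `threshold`.
    """
    height, width = len(ideal), len(ideal[0])
    out = []
    for top in range(offset, height - cell + 1, cell):
        row = []
        for left in range(offset, width - cell + 1, cell):
            ink = sum(ideal[y][x]
                      for y in range(top, top + cell)
                      for x in range(left, left + cell))
            row.append(1 if ink / (cell * cell) >= threshold else 0)
        out.append(row)
    return out


def vertical_bar(size, left, right):
    """Hypothetical test pattern: a full-height bar covering columns [left, right)."""
    return [[1 if left <= x < right else 0 for x in range(size)]
            for _ in range(size)]


ideal = vertical_bar(16, 5, 10)  # a stroke 5 fine pixels wide
for threshold in (0.4, 0.6, 0.8):
    scanned = sample_and_threshold(ideal, cell=4, threshold=threshold)
    print(threshold, scanned[0])
# 0.4 [0, 1, 1, 0]
# 0.6 [0, 1, 0, 0]
# 0.8 [0, 0, 0, 0]
```

In this toy setup the same stroke comes out two pixels wide, one pixel wide, or missing entirely, depending only on the threshold. Changing offset shifts the sampling grid relative to the stroke and changes the result in a similar way, which suggests why estimating such parameters for each document could help a recognizer.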