While pictures of nature or scientific experiments are more interesting than pictures of pages of text, documents are far more plentiful in this world. The goal of Document Image Processing is usually to convert an image of a document into text that can be edited (for example, in MS Word) or searched (for example, with Google). Several commercial software packages can do this, but when the image quality is low, the error rate is high.
Document Image Analysis (DIA) research at Boise State focuses primarily on the image processing stage of this conversion. Modeling and analyzing the degradations introduced when an image is created (printing) and acquired (scanning) helps explain why the quality is lower, and supports better design of image processing algorithms that can either improve that quality or compensate for it. Research projects have included image binarization, page segmentation, developing new OCR algorithms, adding to open-source DIA code sets, and developing a quality ruler for printing. A current project is working with the Melville Marginalia Online to add OCR output to its queries.
Funding for this research is provided by NSF, Hewlett Packard, and the Osher Institute.
“Enhancement of historical printed document images by combining Total Variation regularization and Non-Local Means filtering,” L. Likforman-Sulem, J. Darbon, E. H. Barney Smith, Image and Vision Computing, Elsevier, Vol. 29, No. 5, 2011, pp. 351-363.
“Statistical Image Differences, Degradation Features and Character Distance Metrics,” E. H. Barney Smith and X. Qiu, International Journal of Document Analysis and Recognition, Springer Verlag, Vol. 6, No. 3, 2004, pp. 146-153.
“A new metric describes the edge noise in bilevel images,” E. H. Barney Smith, SPIE online Newsroom, 13 October 2009, DOI: 10.1117/2.1200910.1829.
Find out more about Elisa H. Barney Smith’s research.
Find out more about the Signal Processing Lab.