
Improvements to adaptive thresholding algorithms to binarize poorly illuminated documents

Participants:

Students: Tessa Triolo and Dede Russell
Dr. Elisa Barney Smith & Dr. Tim Andersen

Funding Source:

This research was funded by a grant from the Computing Research Association, Committee on the Status of Women in Computing Research’s CREU: Collaborative Research Experience for Undergraduates in Computer Science and Engineering project.

Description:

Documents are often poorly illuminated when they are scanned, or have yellowed with age, which causes an uneven background color.

 

Image of a poorly illuminated document

To convert the image into a text document, the image is passed through an Optical Character Recognition (OCR) algorithm. Most OCR algorithms process only input images that are black and white, without intermediate gray levels. Therefore the image must first be thresholded. The simplest thresholding algorithm is a global threshold, which applies a single cutoff value to every pixel. It does not work well on images whose background brightness varies across the page.
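As a minimal sketch of why a single cutoff struggles with uneven illumination, the example below binarizes a tiny synthetic grayscale "page". The function name `global_threshold`, the cutoff of 128, and the pixel values are all assumptions made for illustration; they are not taken from the project.

```python
def global_threshold(image, threshold=128):
    """Binarize a grayscale image (list of rows of 0-255 ints) with one fixed cutoff.

    Pixels darker than the cutoff become black (0), all others white (255).
    """
    return [[0 if pixel < threshold else 255 for pixel in row] for row in image]


# Hypothetical page whose background darkens from left (200) to right (100).
# The 40 is ink on the bright side; the 20 is ink on the dark side.
page = [
    [200, 40, 200, 100, 20, 100],
    [200, 200, 200, 100, 100, 100],
]

binary = global_threshold(page, threshold=128)
for row in binary:
    print(row)
```

On the bright half the ink pixel is separated from the background, but on the dark half the background itself falls below the cutoff, so ink and background both come out black.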

 

Image of a document that has gone through global thresholding

Adaptive thresholding algorithms, which compute a separate threshold for each region of the image, can work around this, but they often leave the background with a peppered texture.

 

Image of a document that has gone through adaptive thresholding

We are working to improve a common adaptive thresholding algorithm, due to Niblack, to overcome this problem. Preliminary results are promising.

 

Image of a document that has gone through Niblack adaptive thresholding
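For reference, the sketch below shows the standard, unmodified Niblack rule, in which each pixel is compared against a local threshold T = m + k * s, where m and s are the mean and standard deviation of the pixel's neighborhood. It is not the improved method developed in this project. The function name `niblack_threshold`, the 3x3 window, and k = -0.2 (a commonly used value for dark text on a light background) are assumptions chosen for illustration.

```python
import math


def niblack_threshold(image, window=3, k=-0.2):
    """Binarize a grayscale image (list of rows of 0-255 ints) with Niblack's rule.

    Each pixel is compared against mean + k * std of its window-by-window
    neighborhood; the neighborhood is clipped at the image borders.
    """
    height, width = len(image), len(image[0])
    half = window // 2
    result = []
    for y in range(height):
        row_out = []
        for x in range(width):
            values = [
                image[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            local_t = mean + k * math.sqrt(variance)
            # Darker than the local threshold means ink (black).
            row_out.append(0 if image[y][x] < local_t else 255)
        result.append(row_out)
    return result


# Hypothetical test patches: one dark ink pixel on a bright background,
# and a background-only patch with one gray level of noise.
ink_patch = [[200] * 5 for _ in range(5)]
ink_patch[2][2] = 50
noisy_background = [
    [200, 201, 200],
    [201, 200, 201],
    [200, 201, 200],
]

ink_result = niblack_threshold(ink_patch)
noise_result = niblack_threshold(noisy_background)
```

In the ink patch the dark center pixel is marked black and its bright neighbors white, regardless of the overall brightness of the page. In the noisy background patch there is no ink at all, yet the center pixel is still marked black because it sits slightly below its local mean. This is one way the peppered background texture described above can arise.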