Binarization Data

E. H. Barney Smith. “An analysis of binarization ground truthing,” Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, Boston, MA, 9-11 June 2010, pp. 27-33.

The accuracy of a binarization algorithm is often calculated relative to a ground truth image. Except for synthetically generated images, no ground truth image exists, yet evaluating binarization on real images is preferred. The ground truthing between and among different operators is compared using four direct metrics. The variability of the results of five different automatic binarization algorithms was compared to that of the manual ground truth results. Significant variability in the ground truth results was found.
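As an illustration of scoring a binarization against a ground truth image, the following minimal Python sketch computes a pixel-level F-measure. The F-measure is chosen here only as one commonly used metric; it is an assumption and not necessarily one of the four metrics used in the paper. The function name `f_measure` and the tiny example images are hypothetical, and images are assumed to be nested lists in which 1 marks foreground (ink) and 0 marks background.

```python
def f_measure(result, truth):
    """Pixel-level F-measure of a binary image against a reference image.

    Both images are lists of rows; 1 = foreground (ink), 0 = background.
    This encoding is an assumption made for the example.
    """
    if len(result) != len(truth) or any(
        len(r) != len(t) for r, t in zip(result, truth)
    ):
        raise ValueError("images must have the same dimensions")

    tp = fp = fn = 0
    for row_r, row_t in zip(result, truth):
        for r, t in zip(row_r, row_t):
            if r and t:
                tp += 1   # foreground in both images
            elif r:
                fp += 1   # foreground only in the result
            elif t:
                fn += 1   # foreground only in the reference

    if tp == 0:
        return 0.0        # no shared foreground pixels
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 2x4 images, for illustration only.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
result = [[0, 1, 1, 1],
          [0, 1, 0, 0]]
score = f_measure(result, truth)
```

The same function can be applied to two manual ground truth images of the same page, for example one per operator, to quantify how much the ground truths themselves disagree.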

Fourteen images were made available for the DIBCO 2009 contest: 7 machine print and 7 hand print images. These 14 images and the ground truth from Basilis Gatos are available at

The original DIBCO 2009 dataset

The rebinarization of these 14 images, done at BSU for the aforementioned paper, is available at

The new BSU ground truth DIBCO 2009 dataset (160kb)