Feasibility of Melville Marginalia Authorship Differentiation

Master's Thesis, Boise State University, August 2017

Aaron Burdin

Abstract:

We examine the feasibility of using image processing techniques to differentiate the authorship of historical pencil marks. Pencil marks with unattributed and
attributed authorship are segmented from digital images of historical books. Analysis
is performed on five features extracted from the “vertical” pencil marks,
and those features are used as a basis for attributing authorship of the marks. These marks consist
of single-stroke marks that are interspersed in the same document. We describe
the challenges of the digital format that we were given and the steps taken in using
autonomous segmentation to save pixel locations of marks. Five mark features are
chosen and extracted: Average Intensity, Stroke Width, Blurriness, Stroke Curvature,
and Stroke Angle. The features are then analyzed using histograms,
2D scatter plots of the feature space, and comparisons between the two groups
of marks. C-means clustering is performed on the feature spaces of both groups.
Semi-supervised clustering is used to test whether we can predict the clustering. We then
use two measures of cluster validity, the Davies-Bouldin Index and Silhouette, to
produce a confidence value for the number of clusters and their membership. We then
examine the histograms and 2D scatter plots with the attributed and unattributed
labels from Melville’s Marginalia Online applied. The extracted features show patterns and
trends within the marks that could be used to group them. Specifically, Stroke
Curvature emerged as a dominant feature that showed promise for differentiating marks
created by different authors. Feature extraction has the potential to be used with
high confidence in separating marks by author.
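
As a rough illustration of the clustering and cluster-validity steps described in the abstract, the following Python sketch clusters a small two-feature space and compares candidate cluster counts with the Davies-Bouldin Index (lower values indicate tighter, better-separated clusters). It is not the thesis's implementation. It assumes hard c-means (equivalent to k-means), whereas the thesis may use a fuzzy variant; the feature values are made up for illustration; and the function names `c_means` and `davies_bouldin` are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def c_means(points, c, iters=50):
    """Hard c-means with a deterministic start: c points spread evenly through the list."""
    centers = [points[i * len(points) // c] for i in range(c)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each mark goes to its nearest center.
        labels = [min(range(c), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(c):
            members = [p for p, l in zip(points, labels) if l == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def davies_bouldin(points, labels, centers):
    """Davies-Bouldin Index; assumes no two centers coincide."""
    c = len(centers)
    scatter = []
    for j in range(c):
        members = [p for p, l in zip(points, labels) if l == j]
        # Average distance of members to their center (0.0 for an empty cluster).
        scatter.append(sum(dist(p, centers[j]) for p in members) / len(members)
                       if members else 0.0)
    worst = [max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                 for j in range(c) if j != i)
             for i in range(c)]
    return sum(worst) / c

# Made-up (curvature, angle) values scaled to [0, 1]; NOT data from the thesis.
group_a = [(0.10, 0.80), (0.12, 0.85), (0.08, 0.82), (0.11, 0.78)]
group_b = [(0.60, 0.30), (0.62, 0.35), (0.58, 0.28), (0.65, 0.33)]
points = group_a + group_b

labels2, centers2 = c_means(points, 2)
labels3, centers3 = c_means(points, 3)
db2 = davies_bouldin(points, labels2, centers2)
db3 = davies_bouldin(points, labels3, centers3)
print("c=2 labels:", labels2, "DB:", db2)
print("c=3 labels:", labels3, "DB:", db3)
```

On this toy data the two-cluster solution scores a lower Davies-Bouldin value than the three-cluster one, which is the kind of comparison a cluster-validity index supports when choosing the number of clusters.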