📝 Original Info
- Title: Construction and evaluation of classifiers for forensic document analysis
- ArXiv ID: 1004.0678
- Date: 2015-03-14
- Authors: Christopher P. Saunders, Linda J. Davis, Andrea C. Lamas, John J. Miller, Donald T. Gantz
📝 Abstract
In this study we illustrate a statistical approach to questioned document examination. Specifically, we consider the construction of three classifiers that predict the writer of a sample document based on categorical data. To evaluate these classifiers, we use a data set with a large number of writers and a small number of writing samples per writer. Since the resulting classifiers were found to have near perfect accuracy using leave-one-out cross-validation, we propose a novel Bayesian-based cross-validation method for evaluating the classifiers.
📄 Full Content
arXiv:1004.0678v2 [stat.AP] 28 Jun 2011
The Annals of Applied Statistics
2011, Vol. 5, No. 1, 381–399
DOI: 10.1214/10-AOAS379
© Institute of Mathematical Statistics, 2011
CONSTRUCTION AND EVALUATION OF CLASSIFIERS FOR
FORENSIC DOCUMENT ANALYSIS¹
By Christopher P. Saunders², Linda J. Davis³, Andrea C.
Lamas³, John J. Miller³ and Donald T. Gantz³
George Mason University
In this study we illustrate a statistical approach to questioned
document examination. Specifically, we consider the construction of
three classifiers that predict the writer of a sample document based
on categorical data. To evaluate these classifiers, we use a data set
with a large number of writers and a small number of writing samples
per writer. Since the resulting classifiers were found to have near perfect
accuracy using leave-one-out cross-validation, we propose a novel
Bayesian-based cross-validation method for evaluating the classifiers.
1. Introduction.
A common goal of forensic handwriting examination is
the determination, by a forensic document examiner, of which individual is
the actual writer of a given document. Recently, there has been a growing
interest in the development of forensic handwriting biometric systems that
can assist with this determination process. Forensic handwriting biometric
systems tend to focus on two main tasks. The first task, known as writer
verification, is the determination of whether or not two documents were
written by a single writer. The second task, commonly referred to as handwriting
biometric identification, is the selection from a set of known writers
of a short list of potential writers for a given document. (Another example
of a biometric identification problem in forensics is searching fingerprint
databases to find a match for a latent fingerprint.)
Received May 2008; revised June 2010.
¹ Supported in part under a contract award from the Counterterrorism and Forensic
Science Research Unit of the Federal Bureau of Investigation’s Laboratory Division. Names
of commercial manufacturers are provided for information only and inclusion does not
imply endorsement by the FBI. Points of view in this document are those of the authors
and do not necessarily represent the official position of the FBI or the US Government.
² Supported by IC Postdoctoral Research Fellowship, NGIA HM1582-06-1-2016.
³ Supported by Gannon Technologies Group.
Key words and phrases. Classification, handwriting identification, cross-validation, Bayesian statistics.
This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Applied Statistics,
2011, Vol. 5, No. 1, 381–399. This reprint differs from the original in pagination
and typographic detail.
In this paper we focus on closed-set biometric identification, which assumes
that the writer of a document of unknown writership is one of W
known writers with handwriting styles that have been modeled by the biometric
system. It is important to note that the fundamental forensic writer
identification problem, which is to verify that a document of questioned writership
came from a “suspect” to the exclusion of all other possible writers,
is not addressed in this paper. The “exclusion of all other possible writers”
requires an assumption that the suspect writer has a unique handwriting
profile and, further, that the handwriting quantification contains enough information
to uniquely associate the writing sample of unknown writership
with the suspect’s writing profile. These issues are addressed in handwriting
individuality studies. [See Srihari et al. (2002) and related discussion papers
in the Journal of Forensic Sciences.] Ongoing research by Saunders et
al. (2008) explores some of the issues associated with studying handwriting
individuality using computational biometric systems.
At a basic level, closed-set biometric identification is similar to a traditional
multi-group statistical discriminant analysis problem. In this paper,
we implement three different discriminant functions (or classification procedures)
for categorical data resulting from the quantification of a handwritten
document. We determine the accuracy of these three classification
procedures with respect to a database of 100 writers provided by the FBI.
Each of the three classification procedures is shown to identify with close to
100% accuracy the writer of a short handwritten note.
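To make the closed-set setting concrete, the following minimal sketch ranks known writers for a held-out document and estimates accuracy by leave-one-out cross-validation. It is an illustration only and not one of the three classifiers studied in this paper: the smoothed multinomial scoring rule, the function names, the smoothing parameter alpha and the toy category counts are all assumptions introduced here.

```python
import math
from collections import Counter

def fit_profiles(samples, categories, alpha=1.0):
    """Pool category counts per writer; return smoothed log-probabilities.

    Assumption: each document is a bag of categorical features and each
    writer is modeled by one multinomial with add-alpha smoothing.
    """
    pooled = {}
    for writer, counts in samples:
        pooled.setdefault(writer, Counter()).update(counts)
    profiles = {}
    for writer, counts in pooled.items():
        total = sum(counts.values()) + alpha * len(categories)
        profiles[writer] = {
            c: math.log((counts[c] + alpha) / total) for c in categories
        }
    return profiles

def rank_writers(profiles, counts):
    """Sort the known writers from most to least likely for one document."""
    scores = {
        writer: sum(n * log_p[c] for c, n in counts.items())
        for writer, log_p in profiles.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

def leave_one_out_accuracy(samples, categories):
    """Hold out each document in turn, refit, and check the top-ranked writer."""
    hits = 0
    for i, (writer, counts) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        profiles = fit_profiles(training, categories)
        hits += rank_writers(profiles, counts)[0] == writer
    return hits / len(samples)

# Hypothetical toy data: two writers, two documents each, three categories.
categories = ["a", "b", "c"]
samples = [
    ("w1", Counter(a=8, b=1, c=1)),
    ("w1", Counter(a=7, b=2, c=1)),
    ("w2", Counter(a=1, b=2, c=7)),
    ("w2", Counter(a=2, b=1, c=7)),
]

print(leave_one_out_accuracy(samples, categories))
print(rank_writers(fit_profiles(samples, categories), Counter(a=5, c=1)))
```

Returning the full ranking rather than a single label mirrors the short-list output of an identification system: truncating the ranked list to its first few entries gives the candidate writers passed on for further examination.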
The quantification technology used in this study is a derivative of the
handwriting biometric identification system developed and implemented by
the Gannon Technologies Group and the George Mason University Document
Forensics Laboratory. Components of the system are described as
needed. For a document of unknown writership, the system returns a short
list of potential writers from a set of known writers. This functionality is
the common goal of most forensic biometric systems [Dessimoz and Champod
(2008)]. A forensic document examiner can pursue a fina
…(Full text truncated)…