Forensic Speaker Identification
A Likelihood Ratio-based Approach using Vowel Formants
Tony Alderman
Australian National University
This monograph describes an experiment in Forensic Speaker Identification, showing how speech samples from the same speaker can be discriminated from speech from different speakers with acoustic features commonly used in forensics. It also explains what is now considered the legally and logically correct approach to Forensic Speaker Identification, and presents data that can be used both in real casework and in further testing.
Forensic Speaker Identification is typically concerned with addressing the question of whether two or more speech samples have been produced by the same, or different, speakers. It is clear from recent research that the legally and logically correct way of doing this is by using a Bayesian Likelihood Ratio. The monograph explains what a Likelihood Ratio is; why its use is now considered correct; and how it can be used to successfully discriminate same-speaker pairs from different-speaker pairs.
The monograph shows how the Likelihood Ratio is the ratio of the probability of the evidence given a hypothesis (e.g. that the two samples are from the same speaker) to the probability of the evidence given a competing hypothesis (e.g. that the speech samples are from different speakers). This can be seen as a ratio expressing the similarity of the samples, divided by the typicality of the samples (i.e. how common these similarities are in the rest of the population). Since same-speaker pairs are predicted by theory to have Likelihood Ratios greater than unity, and different-speaker pairs are predicted to have Likelihood Ratios smaller than unity, the Likelihood Ratio lends itself to use as a discriminant function for separating same-speaker from different-speaker speech samples. The extent to which this is possible is vital knowledge, given the legal evidentiary standards now accepted in the wake of the well-known Daubert rulings.
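The definition above can be written compactly as follows. The symbols are assumptions introduced here for illustration and are not taken from the monograph: E stands for the evidence (the observed differences between the speech samples), H_ss for the same-speaker hypothesis, and H_ds for the different-speaker hypothesis. The threshold of unity is the one used when the Likelihood Ratio serves as a discriminant function in testing. The block assumes the amsmath package.

```latex
\[
\mathrm{LR} = \frac{p(E \mid H_{\mathrm{ss}})}{p(E \mid H_{\mathrm{ds}})},
\qquad
\text{pair classified as }
\begin{cases}
\text{same speaker} & \text{if } \mathrm{LR} > 1,\\
\text{different speakers} & \text{if } \mathrm{LR} < 1.
\end{cases}
\]
```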
One stumbling block in the implementation of Bayesian Forensic Speaker Identification is the general lack of adequate background distributions for the assessment of the typicality of the similarities; that is, while two forensic speech samples may be similar, how common are the similarities in the general population?
Vowel formants are typically among the most important acoustic features used to compare forensic speech samples. These are the resonant frequencies of the speaker’s vocal tract when they are producing vowels. Bernard’s early study on the formants of male Australian English vowels, although now relatively old, provides potential background distribution data from a large number of speakers. The first goal of the monograph, therefore, is to describe, in adequate detail for forensic-phonetic investigation, the distributions of formant values for a subset of the vowels from the Bernard data set.
Many of the analytical methods used within Forensic Speaker Identification have an inherent assumption of normality for the distribution of the feature being analysed, whereas acoustic parameters from speech are often not normally distributed. Kernel Density estimation is one method that can take this non-normality into account. The second goal of the monograph, therefore, is to compare the performance of a Likelihood Ratio formula which assumes normality in the background distribution with that of one using Kernel Density estimation.
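The difference between the two ways of modelling a background distribution can be illustrated with a minimal univariate sketch. It is not the Likelihood Ratio formula used in the monograph: the function names, the bandwidth, and the formant values are all hypothetical, and the data are made up to form two clusters so that the non-normality is obvious.

```python
import math
import statistics


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_density(x, sample):
    """Background density at x, assuming the sample is normally distributed."""
    return normal_pdf(x, statistics.mean(sample), statistics.stdev(sample))


def kernel_density(x, sample, bandwidth):
    """Background density at x from a Gaussian kernel density estimate:
    the average of one normal kernel centred on each observation."""
    return sum(normal_pdf(x, xi, bandwidth) for xi in sample) / len(sample)


# Hypothetical, made-up speaker means for one formant (Hz), in two clusters.
background = [1500, 1520, 1540, 1560, 1580, 1900, 1920, 1940, 1960, 1980]
bandwidth = 40.0  # assumed smoothing parameter, chosen only for illustration

for x in (1540, 1740, 1940):
    print(x, normal_density(x, background), kernel_density(x, background, bandwidth))
```

With these made-up values, the normal model assigns its highest density to 1740 Hz, the gap between the clusters where no speaker was observed, while the kernel estimate assigns a much lower density there and higher densities at the cluster centres. Since typicality enters the Likelihood Ratio through the background density, the choice of model can change the resulting ratio.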
The third aim of the research described is, through interpretation of the results of the discrimination tests, to add to the growing corpus of knowledge of the strength of various vowel-formant combinations as parameters for use in Bayesian Forensic Speaker Identification.
The tests are performed on data from eleven male speakers of Australian English, with non-contemporaneous samples, using formant values at target for their long monophthongal vowels to estimate the Likelihood Ratios. The results show that a formula assuming normality performs different-speaker comparisons more successfully than same-speaker comparisons, while a Kernel Density formula performs differently depending on the values chosen for within-speaker variance. Optimal results are found using the Kernel Density formula with within-speaker variance estimated from the test data. Suggestions for forensic practice are made with reference to these results. These include the possible use of both formulae for an analysis of different parameters, and further investigation of how normality values and within-speaker variance affect research outcomes.
ISBN 9783895867156. LINCOM Studies in Phonetics 01. 160pp. 2005.