Acoustic Properties as Predictors of Perceptual Responses: a Study of Swedish Voiced Stops

 PERILUS VII, Diana Krull PhD Thesis (19098 Kb)


In speech recognition algorithms and certain theories of speech perception, the interpretation of the signal is based on "distance scores" obtained by comparing the signal with stored references; in these theories, perception is seen as a product of stimulus and experience. The aim of the present thesis is to evaluate such distance measures by investigating the perceptual confusions of the Swedish voiced stops [b,d,ɖ,g] in systematically varied fragments of vowel-consonant-vowel stimuli providing 25 vowel contexts for each consonant. To what extent can perceptual identifications be accounted for in terms of the acoustic properties of the stimuli? Short stimulus segments following stop release, chosen to elicit perceptual confusions, constituted the main material for this investigation. The resulting confusions were shown to form a regular pattern depending mainly on the acute/grave dimension of the following vowel. The acoustic distances were calculated from two sources: formant frequencies at the consonant-vowel boundary, and filter-band spectra. Both models provided distance measures which revealed regular patterns related in their essentials to the confusions. However, the predictive capacity of both models was improved by including the dynamic properties of the stimuli in the distance measures. The highest correlation between predicted and observed percent confusions, r = .85, was obtained with the formant-based model. The asymmetries in the listeners' confusions were also shown to be predictable from acoustic data on the following vowel, and were included in the calculations.
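As a rough illustration of the kind of computation the abstract describes, the sketch below computes a distance between two formant vectors and correlates a set of predicted values with observed ones. It is not the thesis's method: the Euclidean metric, the use of raw Hz values, and the function names `formant_distance` and `pearson_r` are all assumptions made for the example, and the numbers in the usage section are invented placeholders, not data from the study.

```python
import math


def formant_distance(a, b):
    """Euclidean distance between two equal-length formant vectors.

    Assumption: formants are given in Hz and weighted equally. The thesis
    may use a different scale or weighting.
    """
    if len(a) != len(b):
        raise ValueError("formant vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder (F2, F3) values in Hz at the consonant-vowel boundary.
    # These are invented for illustration only.
    stimulus = (1700.0, 2600.0)
    reference = (1500.0, 2500.0)
    print(formant_distance(stimulus, reference))

    # Placeholder predicted vs. observed percent confusions (invented).
    predicted = [10.0, 25.0, 40.0, 5.0]
    observed = [12.0, 20.0, 45.0, 8.0]
    print(pearson_r(predicted, observed))
```

In a setup like this, smaller distances between a stimulus and the reference for a competing consonant would be expected to go with more confusions; how distances are mapped to predicted confusion percentages is left out here because the abstract does not specify it.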

An errata list is included in the PDF file.