Exercises 7 and 8: Sample vowels for measurement of F0 and formant values
Exercise 7 (can also be used for exercise 8): Some short signals for practice in lowpass filtering, gating, monotonization, multiplying F0, and prolongation
See also under chapter 9 below for more on the prime/probe, or prime/target, design for perception experiments, which has taken on considerable importance in cognitive phonetic experiments recently.
Exercise 1: Samples of consonants in different vocalic environments for practice in measuring and distinguishing similar consonants. Note that the order of the consonants varies in the different samples.
Exercise 3: Adult male and child utterances of [s] and [ʃ].
Exercise 4: Samples for practice measuring VOT.
Erratum: On p. 148, the last sentence of the caption for Figure 5.8 should read "Arrows represent the gliding of diphthongs."
Special Feature: Tips on measuring vowels in poor-quality recordings.
7 wav files, 7 TextGrids - zip
Chapter 2 gives you some information about how to measure the formants with LPC, FFT spectra, and other methods, and chapter 5 shows you how to mark the onset and offset of vowels in common situations and where to take measurements within vowels. The examples given in the book are all from recordings that were made in quiet situations. The field recordings used for sociolinguistic studies and forensic linguistic applications often have a lot of noise in them, though. In addition, you may wish to analyze archival recordings that either have degraded over time or were made with equipment that didn't meet today's standards. Here are some tips on how to handle recordings like that.
One tactic that works for a wide range of recordings is useful when you can't get a formant to track without also getting a false formant at some other frequency. This often happens when two formants lie close together, making them hard to separate with LPC. In a case like that, the easiest solution is to raise the number of LPC coefficients until the problem formant appears on the formant track. Then compare the formant track against the spectrogram and simply ignore the false formant(s).
Sample 1 shows an example where the background noise is relatively constant. This kind of situation may occur when a fan or other machine, in this case a vending machine, is running in the room where the recording was made. Another problem in this recording is that the microphone wasn't close enough to the speaker's mouth. In spite of these problems, the vowel formants are often clear enough to read. The louder syllables certainly have readable vowel formants. As a rule, if you can understand the speech auditorily, you should be able to get usable readings from the recording. The softer syllables, particularly in function words, don't have clear formants. There's also a band of especially loud machine noise around 500 Hz. Formants that are close to this band of noise have to be scrutinized carefully. Note how LPC gives you an irregular formant track that follows the band of noise except where there's a sufficiently loud vowel. Anywhere that you see that irregular track, don't trust the readings for it. Thus, in the words if and enough, F1 readings are unreliable. However, in the word she'll, F1 looks straighter and can be trusted (for the most part), and in the words I and fast, F1 is clearly away from the band of noise and readings of it are somewhat usable (though the formant track for F1 is probably pulled down a little by the noise). Of course, phonetic phenomena of lower amplitudes, such as fricative spectra, and those that are highly sensitive to noise, such as voice quality, cannot be measured reliably in this recording at all.
Sample 2 shows an example in which the background noise, here vehicular traffic, comes and goes. As with sample 1, the microphone in sample 2 wasn't close enough to the subject's mouth. Even so, vowel formants in parts of the recording are quite readable. When the vehicular noise gets really loud, as when the subject says I have, the formants become hard or impossible to read. In a case like this, it's best to exclude vowels spoken during the loud noises.
Sample 3 exhibits a couple of problems. First, there's some sort of buzz, probably caused by a bad connection between the microphone and the recorder. Second, a telephone rings. These two problems affect the LPC formant track in different ways. The buzzing gives the formant track a saw-edge pattern where it occurs. Formants are clearly visible during the buzzing, but remember that you can't get much precision in the formant readings because the buzzing itself distorts the frequencies; some such problems may distort them even more seriously, rendering any measurements unreliable. The telephone ring, which consists of a series of tones at steady frequencies, dominates the formant track in places. For the most part, the speech has greater amplitude than the telephone, so you can get formant readings, but the ring tones probably pull the formant track toward them, and thus the readings aren't completely reliable.
Sample 4 has two different species of birds in the background. The grackle, in the earlier part of the sample, has great enough amplitude that it obliterates the subject's higher formants. The lower formants can be measured reliably, though. The dove has much lower amplitude and doesn't dominate the formant track when the subject is speaking. The dove call may exert some pull on the formant track, but not much because it's so soft. For the most part, you can trust the subject's formants while the dove calls.
Sample 5 has other kinds of background noise. Here, people are talking and a radio is playing during an interview. You have to be especially careful with this sort of background noise. You can often distinguish bird vocalizations or mechanical noises from a human voice by their patterning, both on the spectrogram and in the formant track. Extra human voices, though, show formant tracks just like those of the interviewee, so you don't always know for sure whether a particular reading is from your subject or from somebody in the background. It's important to double-check, both by examining the spectrogram and by listening to the recording, so that you're absolutely certain that a measurement is from the right person. If in doubt, don't use the measurement.
Sample 6 shows a very different recording. This one was made in a quiet situation, but it's an old recording, originally made on an acetate disc in 1941. It's also been copied over a number of generations, and some of the copying may have involved amplification of certain frequencies. As you can see when you make a spectrogram of sample 6, frequencies below about 1000 Hz have relatively high amplitude, while those above 1000 Hz are faint. Compounding the problem is the speaker's high F0. This recording is usable, but you may have to break a rule or two to get readings out of it. When you use LPC on this recording, you'll notice that you get too many formant readings below 1000 Hz and not enough above 1000 Hz. You could try to amplify the higher frequencies and damp the lower ones to equalize the amplitude. Another strategy is to use fewer LPC coefficients for formants (mostly F1) below 1000 Hz and more coefficients for the higher formants. This is usually a bad practice, but it's necessary for this recording. One other thing you can try is to change the upper limit of the LPC frequency range. It's crucially important to know where to expect the formants for each vowel so that you recognize when you have too many or too few formant readings.
Sample 7 shows part of the same interview as sample 6, but with filtering to correct the problems seen in sample 6: amplitudes below 1000 Hz have been damped and those above 1000 Hz have been amplified. LPC has an easier time with this version. Again, though, you have to know where to expect the formants.
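One simple way to impose a spectral tilt in this direction is a first-order pre-emphasis filter, which damps low frequencies and boosts high ones. The sketch below (plain NumPy; the sampling rate, coefficient, and test frequencies are invented, and a real application would shape the filter to the recording's actual spectrum) verifies the effect on two sine waves:

```python
import numpy as np

fs = 8000      # assumed sampling rate
alpha = 0.97   # typical pre-emphasis coefficient

def preemphasize(x, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]: attenuates lows, amplifies highs."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

t = np.arange(0, 0.1, 1.0 / fs)
low = np.sin(2 * np.pi * 300 * t)    # below the loud 1000 Hz region
high = np.sin(2 * np.pi * 3000 * t)  # above it

rms = lambda x: np.sqrt(np.mean(x ** 2))
print(rms(preemphasize(low)) / rms(low))    # well under 1: attenuated
print(rms(preemphasize(high)) / rms(high))  # above 1: amplified
```

The same idea underlies the pre-emphasis option built into most spectrogram software; the filtering applied to sample 7 presumably did something comparable, though its exact shape isn't stated here.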
Exercise 1: Samples of one individual's speech for measurement of the duration of vowel tokens
Exercise 2: Vowel tokens for practice on varying LPC settings
Exercise 3: Examples of heed, hid, hayed, head, had, hod, hawed, HUD, hoed, hood, who'd, heard, and hold, spoken normally (in a modal voice setting) and while yawning.
Exercise 4: Files for practice with vowel normalization techniques. Soundfiles of a male and a female speaker are provided. Spreadsheets with vowel formant data from another two speakers, a male and a female from the same community, are also provided. Normalization can be performed at https://slaap.chass.ncsu.edu/tools/norm/
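Among the normalization techniques, Lobanov's z-score method is easy to compute by hand and makes a good first exercise: each measurement is expressed in standard deviations from that speaker's own mean for that formant. A minimal sketch with invented formant values (not the data from the spreadsheets):

```python
import numpy as np

def lobanov(formant_values):
    """Z-score one speaker's measurements for one formant (Lobanov's method)."""
    v = np.asarray(formant_values, dtype=float)
    return (v - v.mean()) / v.std()

# Hypothetical F1 means (Hz) for the same five vowels from two speakers
male_f1 = [300, 430, 550, 680, 750]
female_f1 = [380, 540, 690, 850, 940]

print(np.round(lobanov(male_f1), 2))
print(np.round(lobanov(female_f1), 2))
```

After normalization, both speakers' values lie on a common scale with mean 0 and standard deviation 1, so cross-speaker comparison no longer depends on vocal-tract size. The NORM site linked above implements this method along with several others.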
Exercise 5: Samples of speech for measuring the onset, midpoint, offset, and duration for fifty tokens of a vowel for a single speaker. You should choose a vowel for which there are adequate numbers of tokens.
Exercise 6: Samples of speech for measuring the duration and formant values of the nucleus or glide of at least 25 tokens of a particular diphthong for a single speaker.
Special Feature: Mergers
Determine how many phonological contrasts each of the following speakers makes from the lot, cloth, and thought classes and, if more than one contrasting class is involved, how the words are classified.
Exercise 1: Examples of conversational speech for construction of "Henderson graphs".
Exercise 2: Examples of conversational speech for comparison of speaking rate and articulation rate.
Exercise 3: Examples of sentences, read by a native speaker of Mandarin, for practice measuring F0 of lexical tones.
Exercise 4: Examples of utterances for practice measuring prosodic rhythm with different computational methods.
Exercise 5: Examples of various tones. For some files, textgrids are provided. Look at the blank textgrids first for practice labeling the tones and break indices, and then compare your results with the answer textgrids.
- Edge tones in MAE_ToBI
- Pitch accents in MAE_ToBI
- Boundaries. Note the differences between an IP boundary, an ip boundary, and no boundary.
- Accentual Phrases. Accentual Phrases are recognized for certain languages, such as French, Tongan, and Korean. In most languages with Accentual Phrases, the phrase is marked by a high tone at or near its end, as in the French examples given here. In the French examples, the tonal transcription follows that of Jun and Fougeron (2002), in which the default pattern is for each Accentual Phrase to have four tones called L, H, L, and H*. However, the H and one of the L tones are optional, and the H* can be superseded by an edge tone. African American English, in contrast, usually has low tones at the end of what might qualify as Accentual Phrases. Compare the samples from African American English and see what you think. Does African American English have Accentual Phrases, or is there another explanation for its patterning? La and Ha represent edge tones for Accentual Phrases in the textgrids for African American English. Note that not all syllables with lexical stress get a high tone in African American English.
- Practice sentences. Each of the following sentences is spoken using different intonational patterns. Fill in the tones on the blank textgrids and then compare your results with the answer textgrids.
Exercises 6 and 7: Analyze the samples that were used for Exercises 1 and 2.
Exercise 8: Examples of utterances for computing linear regression slopes of F0 across the entire utterance.
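A regression slope of this kind reduces to an ordinary least-squares fit once F0 has been sampled at a series of time points. A sketch with an invented F0 track (times in seconds, F0 in Hz; real values would come from your pitch-tracking software):

```python
import numpy as np

# Hypothetical F0 measurements across one utterance
times = np.array([0.0, 0.3, 0.6, 0.9, 1.2, 1.5])
f0 = np.array([210, 204, 199, 193, 188, 181])

slope, intercept = np.polyfit(times, f0, 1)
print(round(slope, 1))  # Hz per second; negative = overall declination
```

Voiceless stretches simply drop out of the sample, so the time points need not be evenly spaced.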
Exercises 1 and 2: Samples for comparing mean and median F0 values and performing LTAS analysis. Remember to convert Hz to ERB before computing the mean.
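The conversion can be sketched as follows, using the Glasberg and Moore (1990) ERB-rate formula, a common choice (check that it matches the formula given in the text); the F0 values here are invented:

```python
import math

def hz_to_erb(f):
    """ERB-rate scale of Glasberg and Moore (1990)."""
    return 21.4 * math.log10(0.00437 * f + 1.0)

f0_hz = [110, 120, 130, 220]  # invented F0 samples, one of them an outlier
mean_hz = sum(f0_hz) / len(f0_hz)
mean_erb = sum(hz_to_erb(f) for f in f0_hz) / len(f0_hz)

print(round(mean_hz, 1))   # plain-Hz mean, inflated by the outlier
print(round(mean_erb, 2))  # mean on the auditory ERB-rate scale
```

Because the ERB scale compresses high frequencies, averaging in ERB weights outlying high values less heavily than averaging in Hz does.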
Exercise 3: Examples of modal and breathy realizations of a few words.
Exercise 4: Examples of creaky realizations of a few words.
Exercise 5: Analyze examples from the earlier exercises.
Exercise 6: Example utterances of bud and bun for analysis of nasality. You may also find useful the nasal samples from Chapter 5, Exercise 2, which you can compare with the modal samples for Exercise 3 in this chapter.
In recent years, a fair amount of experimental work has examined factors that influence vowel duration and vowel reduction, and a number of studies postdate those cited in the text. New factors have come to light in the process, including phonological neighborhood density (the number of words that are phonologically similar to the word in question), the commonness of the word, and the number of times it is repeated in a given situation. Smaller neighborhood densities, greater commonness, and more repetitions shorten duration and increase the amount of reduction. Most of these studies have been conducted in laboratory situations, and it's uncertain to what extent the effects play out in real-life conversations, though Smiljanic and Bradlow (2005) compared clear laboratory speech against conversational speech. Here are some of these studies:
Aylett, Matthew, and Alice Turk. 2004. The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech 47:31-56.
Baker, Rachel E., and Ann R. Bradlow. 2009. Variability in word duration as a function of probability, speech style, and prosody. Language and Speech 52:391-413.
Bell, Alan, Jason M. Brenier, Michelle Gregory, Cynthia Girand, and Dan Jurafsky. 2009. Predictability effects on durations of content and function words in conversational speech. Journal of Memory and Language 60:92-111.
Drager, Katie K. 2010. Sensitivity to grammatical and sociophonetic variability in perception. Laboratory Phonology 1:93-120.
Jurafsky, Dan, Alan Bell, Michelle Gregory, and William D. Raymond. 2001. Probabilistic relations between words: Evidence from reduction in lexical production. In Joan Bybee and Paul J. Hopper (eds.), Frequency and the Emergence of Linguistic Structure. Typological Studies in Language 45. Amsterdam/Philadelphia: John Benjamins. 229-54.
Munson, Benjamin, and Nancy Pearl Solomon. 2004. The effect of phonological neighborhood density on vowel articulation. Journal of Speech, Language, & Hearing Research 47:1048-58.
Smiljanic, Rajka, and Ann R. Bradlow. 2005. Production and perception of clear speech in Croatian and English. Journal of the Acoustical Society of America 118:1677-88.
van Bergem, Dick R. 1993. Acoustic vowel reduction as a function of sentence accent, word stress, and word class. Speech Communication 12:1-21.
There have also been experimental studies besides those cited in the book arguing for or against Exemplar Theory and for or against the related issue of underspecification. McLennan, Luce, and Charles-Luce (2003; see the reference in the book) argue against Exemplar models. Lahiri and Marslen-Wilson (1991) favor underspecification. Scott and Cutler (1984), Evans and Iverson (2004), Floccia et al. (2006), and Sumner and Samuel (2009; see the reference in the book), however, found that perception of a dialect improves with greater exposure to it. Similarly, Clopper and Pisoni (2004, 2006) found that listeners who had lived in more than one region performed better at dialect identification. Coupled with evidence from Munro et al. (1999; see the reference in the book) and Evans and Iverson (2007) that subtle changes occur in adults' speech production when they move to a new dialect area, these findings would favor an Exemplar account. Some of these studies have employed the prime/probe, or prime/target, method, which was briefly described in chapter 3. It involves playing two words, the first called the prime and the second called the probe or target, and then asking subjects to react to the probe/target in some way, such as identifying whether or not it's a real word. The prime may be semantically related, phonologically similar, or unrelated and dissimilar to the probe/target. Note that the prime, though usually heard immediately before the probe/target, may instead be heard many minutes or even hours beforehand. In general, semantic relatedness shortens listeners' reaction times to the probe/target, while phonological similarity does not, and this difference is used to examine cognitive processing.
Clopper, Cynthia G., and David B. Pisoni. 2004. Homebodies and army brats: Some effects of early linguistic experience and residential history on dialect categorization. Language Variation and Change 16:31-48.
Clopper, Cynthia G., and David B. Pisoni. 2006. Effects of region of origin and geographic mobility on perceptual dialect categorization. Language Variation and Change 18:193-221.
Evans, Bronwen G., and Paul Iverson. 2004. Vowel normalization for accent: An investigation of best exemplar locations in northern and southern British English sentences. Journal of the Acoustical Society of America 115:352-61.
Evans, Bronwen G., and Paul Iverson. 2007. Plasticity in vowel perception and production: A study of accent change in young adults. Journal of the Acoustical Society of America 121:3814-26.
Floccia, Caroline, Jeremy Goslin, Frédérique Girard, and Gabriel Konopczynski. 2006. Does a regional accent perturb speech processing? Journal of Experimental Psychology: Human Perception and Performance 32:1276-93.
Lahiri, Aditi, and William Marslen-Wilson. 1991. The mental representation of lexical form: A phonological approach to the recognition lexicon. Cognition 38:245-94.
Scott, Donia R., and Anne Cutler. 1984. Segmental phonology and the perception of syntactic structure. Journal of Verbal Learning and Verbal Behavior 23:450-66.
Experimental research on how variation is cognitively processed has been increasing over the past few years. McLennan et al. (2003; see the reference in the book) and Pitt (2009; see the reference in the book) examined the tapping of intervocalic coronal stops in American English, and Gaskell and Marslen-Wilson (1996) and Gow (2002, 2003) examined assimilation in consonantal sequences. These studies, often measuring processing speeds and/or using a prime/target design, have tested whether lexical representations or phonological abstractions are involved in the cognition of speech, with mixed results.
Gaskell, M. Gareth, and William D. Marslen-Wilson. 1996. Phonological variation and inference in lexical access. Journal of Experimental Psychology: Human Perception and Performance 22:144-58.
Gow, David W., Jr. 2002. Does English coronal place assimilation create lexical ambiguity? Journal of Experimental Psychology: Human Perception and Performance 28:163-79.
Gow, David W., Jr. 2003. Feature parsing: Feature cue mapping in spoken word recognition. Perception & Psychophysics 65:575-89.
© Erik R. Thomas
last mod: 4/10/2014 TK