dc.contributor.advisor: Younan, Nicholas H.
dc.contributor.author: Chandra, Nishant
dc.date: 2007
dc.date.accessioned: 2020-05-07T18:24:57Z
dc.date.available: 2020-05-07T18:24:57Z
dc.identifier.uri: https://hdl.handle.net/11668/17269
dc.description.abstract: Traditional and commercial speech synthesizers cannot synthesize speech with appropriate emotion or prosody. Conveying prosody in artificially synthesized speech is difficult because human speech is extremely variable: an arbitrary natural-language sentence can carry different meanings depending on the speaker, speaking style, context, and many other factors. Most concatenative speech synthesizers use phonemes, the phonetic units defined by the International Phonetic Alphabet (IPA). The 50 phonemes of English are standardized, unique units of sound, but not of expression. An earlier work proposed an analogy between speech and music: "speech is music, music is speech." Speech data obtained from master practitioners trained in kinesensic voice is marked on a five-level intonation scale, similar to a musical scale. From these data, 1324 unique expressive units, called expressemes®, are identified. Expressemes consist of melody and rhythm, which in digital signal processing correspond to the pitch, duration, and energy of the signal. Because expressemes have less acoustic and phonetic variability than phonemes, they convey prosody better. The goal is to develop a speech synthesizer that exploits the prosodic content of expressemes to synthesize expressive speech from a small speech database. Building a reasonably small database that captures multiple expressions is challenging, because a complete set of speech segments for a given emotion may not be available. Methods are proposed that use acoustic mathematical modeling to create missing prosodic speech segments from the base prosody unit. A new hybrid concatenated-formant speech synthesizer architecture is developed for this purpose. A prosody manipulation algorithm based on a pitch-synchronous, time-varying, frequency-warped wavelet transform is developed to transform one prosody into another.
A time-varying frequency-warping transform is developed to smoothly concatenate the temporal and spectral parameters of adjacent expressemes and produce intelligible speech. Several issues specific to expressive speech synthesis with expressemes are also resolved, for example expresseme segmentation based on an ergodic hidden Markov model, models for F0 and segment duration, and target- and join-cost calculation. The performance of the hybrid synthesizer is measured against a commercially available synthesizer using objective and perceptual evaluations. Subjects consistently rated the hybrid synthesizer better in five different perceptual tests. Seventy percent of subjects rated the hybrid synthesis as more expressive, and 72% preferred it over the commercial synthesizer. The hybrid synthesizer also achieved a comparable mean opinion score.
dc.publisher: Mississippi State University
dc.subject.lcsh: Speech processing systems--Mathematical models.
dc.subject.lcsh: Speech synthesis--Mathematical models.
dc.subject.lcsh: Computer input-output equipment--Mathematical models.
dc.subject.lcsh: Hidden Markov models.
dc.subject.lcsh: Ergodic theory.
dc.subject.other: Emotional speech synthesis
dc.subject.other: Speech morphing
dc.subject.other: Time-varying frequency-warping
dc.subject.other: Expressive speech synthesizer
dc.subject.other: TTS
dc.title: Hybrid Concatenated-Formant Expressive Speech Synthesizer For Kinesensic Voices
dc.type: Dissertation
dc.publisher.department: Department of Electrical and Computer Engineering.
dc.publisher.college: Bagley College of Engineering
dc.date.authorbirth: 1977
dc.subject.degree: Doctor of Philosophy
dc.subject.major: Electrical and Computer Engineering
dc.contributor.committee: Fowler, James E.
dc.contributor.committee: Du, Jenny Q.
dc.contributor.committee: Marple, Gary A.
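The abstract mentions target and join cost calculation. In unit-selection synthesis these two costs are commonly combined in a dynamic-programming (Viterbi) search over candidate units. The sketch below illustrates that generic search under stated assumptions: each hypothetical `Unit` carries only a mean F0, a duration, and its boundary F0 values, and the cost formulas and weights (`w_f0`, `w_dur`, `w_join`) are illustrative choices, not the cost functions used in the dissertation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    # Hypothetical candidate-unit description; real systems use richer features.
    name: str
    f0: float        # mean F0 of the unit (Hz)
    dur: float       # duration (s)
    f0_start: float  # F0 at the left boundary (Hz)
    f0_end: float    # F0 at the right boundary (Hz)


def target_cost(unit, target_f0, target_dur, w_f0=1.0, w_dur=100.0):
    # Assumed form: weighted distance between unit prosody and predicted target.
    return w_f0 * abs(unit.f0 - target_f0) + w_dur * abs(unit.dur - target_dur)


def join_cost(left, right, w_join=1.0):
    # Assumed form: penalize the F0 discontinuity at the concatenation point.
    return w_join * abs(left.f0_end - right.f0_start)


def select_units(candidates, targets):
    """candidates[i] lists the Units for slot i; targets[i] = (f0, dur).

    Returns (total_cost, chosen_units) minimizing summed target and join costs.
    """
    # Each entry is (best path cost ending at this candidate, back-pointer).
    history = [[(target_cost(u, *targets[0]), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        prev_units = candidates[i - 1]
        row = []
        for u in candidates[i]:
            # Cheapest predecessor, including the cost of joining it to u.
            cost, arg = min(
                (history[-1][k][0] + join_cost(prev_units[k], u), k)
                for k in range(len(prev_units))
            )
            row.append((cost + target_cost(u, *targets[i]), arg))
        history.append(row)
    # Trace back from the cheapest candidate in the final slot.
    j = min(range(len(history[-1])), key=lambda k: history[-1][k][0])
    total = history[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = history[i][j][1]
    return total, path[::-1]
```

The search keeps only the best path into each candidate, so its cost grows with the number of slots times the square of the candidates per slot, rather than exponentially as an exhaustive search would.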
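The abstract also mentions expresseme segmentation based on an ergodic hidden Markov model (HMM). In an ergodic HMM every state can transition to every other state, and Viterbi decoding of a frame-level feature yields a state sequence whose changes mark candidate segment boundaries. This minimal sketch assumes one-dimensional Gaussian emissions over a per-frame F0 track and hand-set parameters; the feature set, number of states, and training procedure used in the dissertation are not specified here, and the function names are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    # Log density of a 1-D Gaussian emission.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi(obs, log_pi, log_A, means, variances):
    """Most likely state sequence for an HMM with 1-D Gaussian emissions.

    log_A is a full (ergodic) log-transition matrix: log_A[r][s] = log P(s | r).
    """
    n = len(log_pi)
    delta = [log_pi[s] + gaussian_logpdf(obs[0], means[s], variances[s])
             for s in range(n)]
    back = []
    for x in obs[1:]:
        ptr, new = [], []
        for s in range(n):
            # Best predecessor state for s (any state may precede any other).
            r = max(range(n), key=lambda q: delta[q] + log_A[q][s])
            ptr.append(r)
            new.append(delta[r] + log_A[r][s]
                       + gaussian_logpdf(x, means[s], variances[s]))
        back.append(ptr)
        delta = new
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def segments(path):
    # Collapse a state path into (start_frame, end_frame_exclusive, state) runs.
    segs, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segs.append((start, t, path[start]))
            start = t
    return segs
```

For example, with three states centred on 100, 150, and 200 Hz, self-transition probability 0.8, and the F0 track `[98, 102, 101, 149, 152, 151, 199, 203]`, the decoded path splits into three runs of frames, one per state.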
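The abstract describes a time-varying frequency-warping transform used to join adjacent expressemes smoothly. The sketch below is not the dissertation's pitch-synchronous wavelet-based method. It substitutes a generic first-order all-pass (bilinear) frequency warp, whose warping factor `alpha` is assumed to be interpolated linearly across the frames of a join. The functions `warp_frequency`, `alpha_schedule`, and `warp_spectrum` are hypothetical illustrations.

```python
import math


def warp_frequency(omega, alpha):
    """Warp normalized frequency omega in [0, pi] with a first-order all-pass.

    alpha = 0 is the identity, positive alpha expands the low-frequency region,
    and warping with -alpha inverts warping with +alpha (for |alpha| < 1).
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def alpha_schedule(alpha_left, alpha_right, n_frames):
    # Assumed: warping factor moves linearly from one unit's value to the next.
    if n_frames == 1:
        return [alpha_left]
    step = (alpha_right - alpha_left) / (n_frames - 1)
    return [alpha_left + k * step for k in range(n_frames)]


def warp_spectrum(mag, alpha):
    """Resample a magnitude spectrum (bins spanning 0..pi, len >= 2).

    Output at frequency w is read from the input at the inverse warp of w,
    using linear interpolation between neighbouring bins.
    """
    n = len(mag)
    out = []
    for k in range(n):
        omega = math.pi * k / (n - 1)
        src = warp_frequency(omega, -alpha) * (n - 1) / math.pi  # fractional bin
        i = min(int(src), n - 2)
        frac = src - i
        out.append((1 - frac) * mag[i] + frac * mag[i + 1])
    return out
```

Applied frame by frame, for instance `[warp_spectrum(f, a) for f, a in zip(frames, alpha_schedule(a0, a1, len(frames)))]`, the warp changes gradually across the join instead of switching abruptly at the unit boundary.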