Andrea Schnall: Speaker Adaptation for Word Prominence Detection
ISBN: 978-3-8440-5921-2
Series: Informationstechnik
Keywords: Prosody; Word Prominence Detection; Speaker Adaptation; fMLLR; SVM
Type of publication: Thesis
Language: English
Pages: 140 pages
Figures: 32 figures
Weight: 197 g
Format: 21 × 14.8 cm
Binding: Paperback
Price: 45.80 € / 57.30 SFr
Published: May 2018
Abstract: The goal of this dissertation is to investigate methods for word prominence detection in speech. In human communication, prosodic cues such as word prominence play an important role: we emphasize words to mark them as important and to indicate the informational focus of a sentence. Speech recognition systems currently do not use this information, which makes them less intuitive and more error-prone.
In this thesis, a system to distinguish prominent from non-prominent words is presented. Several feature choices in the audio and video domains are investigated, and several classifiers with different characteristics are examined. One aspect evaluated here is the use of context information at both the feature level and the classifier level. It will be shown that a great deal of information is carried by the neighboring words; therefore, the whole word sequence should be used for classification. The study is especially concerned with the performance difference between speaker-dependent and speaker-independent trained systems. To overcome the variation introduced by a pool of speakers and the resulting performance loss, a new adaptation method is presented. Common speaker adaptation methods used in speech processing are designed for classifiers based on Gaussian Mixture Models and Hidden Markov Models. This thesis shows that for word prominence detection a discriminative classifier, such as a Support Vector Machine, performs best, but until now such classifiers have not been combined adequately with common speaker adaptation methods. Therefore, a new method based on Support Vector Machines with a Radial Basis Function kernel, together with two extensions, is presented and evaluated. Ultimately, the thesis shows that this method can significantly improve speaker-independent classification performance when only a small amount of speaker-specific data is available.
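To illustrate the kernel named in the abstract: the Radial Basis Function kernel measures the similarity of two feature vectors as K(x, y) = exp(-γ · ||x − y||²), so identical vectors score 1 and distant vectors approach 0. The sketch below is a minimal, standalone illustration; the feature vectors and their interpretation (e.g. normalized duration, energy, F0 range per word) are hypothetical examples, not the thesis's actual feature set.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical prosodic feature vectors for two words
# (illustrative values only: duration, energy, F0 range, each normalized)
prominent_word = [0.9, 0.8, 0.7]
reduced_word = [0.2, 0.1, 0.3]

print(rbf_kernel(prominent_word, prominent_word))  # identical vectors -> 1.0
print(rbf_kernel(prominent_word, reduced_word))    # dissimilar vectors -> closer to 0
```

In an SVM, such kernel values between a test word's features and the support vectors determine the prominent/non-prominent decision; γ controls how quickly similarity decays with feature distance.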