Andrea Schnall: Speaker Adaptation for Word Prominence Detection
ISBN: 978-3-8440-5921-2
Series: Informationstechnik
Keywords: Prosody; Word Prominence Detection; Speaker Adaptation; fMLLR; SVM
Type of publication: Thesis
Language: English
Pages: 140 pages
Figures: 32 figures
Weight: 197 g
Format: 21 × 14.8 cm
Binding: Paperback
Price: 45.80 € / 57.30 SFr
Published: May 2018
Abstract: The goal of this dissertation is to investigate methods for word prominence detection in speech. In human communication, prosodic cues such as word prominence play an important role: we emphasize words to mark them as important and to indicate the informational focus of a sentence. Speech recognition systems currently do not use this information, which makes them less intuitive and more error-prone.
In this thesis, a system to distinguish prominent from non-prominent words is presented. Several feature choices in the audio and video domains are investigated, and several classifiers with different characteristics are examined. One aspect evaluated here is the use of context information at both the feature level and the classifier level. It will be shown that a great deal of information is carried by the neighboring words; therefore, the whole word sequence should be used for classification. The study is especially concerned with the performance difference between speaker-dependent and speaker-independent trained systems. To overcome the variation introduced by a pool of speakers and the resulting performance loss, a new adaptation method is presented. Common speaker adaptation methods used in speech processing are designed for classifiers based on Gaussian Mixture Models and Hidden Markov Models. This thesis shows that for word prominence detection a discriminative classifier, such as a Support Vector Machine, performs best, but until now such classifiers have not been combined adequately with common speaker adaptation methods. Therefore, a new method based on Support Vector Machines with a Radial Basis Function kernel, together with two extensions, is presented and evaluated. Ultimately, the thesis shows that this method can significantly improve speaker-independent classification performance when only a small amount of speaker-specific data is available.
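To illustrate the kernel named in the abstract: the Radial Basis Function kernel measures the similarity of two feature vectors as K(x, y) = exp(-γ · ||x − y||²), so identical vectors score 1 and distant vectors approach 0. The sketch below is a minimal, standalone illustration; the feature vectors and their interpretation (e.g. normalized duration, energy, F0 range per word) are hypothetical examples, not the thesis's actual feature set.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical prosodic feature vectors for two words
# (illustrative values only: duration, energy, F0 range, each normalized)
prominent_word = [0.9, 0.8, 0.7]
reduced_word = [0.2, 0.1, 0.3]

print(rbf_kernel(prominent_word, prominent_word))  # identical vectors -> 1.0
print(rbf_kernel(prominent_word, reduced_word))    # dissimilar vectors -> closer to 0
```

In an SVM, such kernel values between a test word's features and the support vectors determine the prominent/non-prominent decision; γ controls how quickly similarity decays with feature distance.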