48,80 €
ISBN 978-3-8440-8779-6
Softcover
154 pages
28 figures
228 g
21 x 14,8 cm
English
Thesis
October 2022
Ziyue Zhao
Contributions to Neural Network-Based Speech Processing: Nonlinear Speech Prediction, Decoder Postprocessing, and Perceptual Loss Functions
Speech processing technologies are omnipresent in our daily communication products and services. Neural networks, as powerful data-driven models, have shown promising performance in various research fields, including speech processing. This thesis focuses on neural network-based speech processing, and it can be divided into three parts as follows.

First, in the field of speech prediction, a nonlinear speech predictor based on the echo state network (ESN) is proposed as a novel adaptive prediction approach. In simulations, the proposed predictor outperforms all baseline prediction methods, including a predictor based on a long short-term memory (LSTM) structure.

Second, in the field of neural network-based speech enhancement, the focus is on loss functions. A novel perceptual weighting filter (PWF) loss function, motivated by the weighting filter used in code-excited linear prediction (CELP) speech coding, is proposed. Both a fully connected neural network (FCNN) and a convolutional neural network (CNN) are used to evaluate the proposed loss functions, and the simulation results show superior performance compared to the baselines.

Finally, neural network-based postprocessing for the enhancement of coded speech is studied. CNN-based postprocessors are proposed either to enhance the raw waveform directly in an end-to-end fashion or to enhance cepstral-domain features using analysis-synthesis. Furthermore, an advanced network structure, the fully convolutional recurrent network (FCRN), is used to enhance coded speech in the frequency domain, with the PWF loss function applied to advantage. The experimental results confirm the effectiveness of the proposed postprocessors in improving speech quality.
Keywords: Speech Coding; Speech Enhancement; Neural Networks
Mitteilungen aus dem Institut für Nachrichtentechnik der Technischen Universität Braunschweig
Edited by Prof. Dr.-Ing. U. Reimers, Prof. Dr.-Ing. T. Kürner, Prof. Dr.-Ing. T. Fingscheidt and Prof. Dr.-Ing. Eduard A. Jorswieck, Braunschweig
Volume 70
Available online documents for this title
Please note that the online documents cannot be printed or edited.
 
Document: Document | Type: PDF | Costs: 36,60 €
Document: Table of contents | Type: PDF | Costs: free
     
Shaker Verlag GmbH
Am Langen Graben 15a
52353 Düren
Germany
  +49 2421 99011 9
Mon. - Thurs. 8:00 a.m. to 4:00 p.m.
Fri. 8:00 a.m. to 3:00 p.m.