Analysis of unsupervised cross-lingual speaker adaptation. Thousands of voices for HMM-based speech synthesis. US6076057A: unsupervised HMM adaptation based on speech. Unsupervised cross-lingual speaker adaptation for. A block diagram of the HMM-based speech synthesis system. So far, research has been conducted into unsupervised and. The task of speech synthesis is to convert normal language text into speech.
Unsupervised acoustic model adaptation algorithm using. We integrate two techniques, unsupervised adaptation for HMM-based TTS using a word. Silence and speech regions are determined either using a speech endpointer or the segmentation obtained from the recognizer in a first pass. Analysis of unsupervised and noise-robust speaker-adaptive. HMM-based speech synthesis mini-tutorial: HMMs are used to generate sequences of speech in a parameterised form; from the parameterised form, we can generate a waveform; the parameterised form contains suf. The application of our research is the personalisation of speech-to-speech translation, in which we employ an HMM statistical framework for both speech recognition and synthesis. This paper demonstrates how unsupervised cross-lingual adaptation of HMM-based speech synthesis models may be performed without explicit knowledge of the adaptation data language. Unsupervised intralingual and cross-lingual speaker. Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis, conference paper in Acoustics, Speech, and Signal Processing, 1988. For unsupervised LM adaptation with limited adaptation data, overtraining may. Supervised adaptation: the use of adaptation to create new voices for speech synthesis makes HMM-based speech synthesis very attractive.
Byrne (1); (1) Cambridge University Engineering Department, (2) Helsinki University of Technology. Introduction; two-pass decision tree construction; evaluation. Thus, an unsupervised cross-lingual speaker adaptation system can be developed. Speech database, excitation parameter extraction, spectral. Unsupervised clustering for expressive speech synthesis. Unsupervised speaker adaptation of DNN-HMM by selecting similar speakers for lecture transcription, Masato Mimura and Tatsuya Kawahara, Kyoto University, Academic Center for Computing and Media Studies, Sakyo-ku, Kyoto 606-8501, Japan. Abstract: Unsupervised speaker adaptation of deep neural network (DNN) is investigated for lecture transcription. In this paper we develop, analyze, and evaluate unsupervised interpolation methods that can be used to generate intermediate stages of two language varieties. Flexible speech synthesis based on hidden Markov models.
Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction, M. A unified speaker adaptation method for speech synthesis. In HMM-based speech synthesis, speaker adaptation techniques can be used to adapt the. Unsupervised speaker adaptation of DNN-HMM by selecting. Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis, by Keiichiro Oura, Keiichi Tokuda, Junichi Yamagishi, Mirjam Wester and Simon King. It is now possible to synthesise speech using HMMs with a comparable quality to. Speaker adaptation for speech synthesis is the task of creating a new voice for a TTS system by adjusting parameters of an initial model.
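As a rough illustration of "adjusting parameters of an initial model", the sketch below applies one shared affine transform to the Gaussian mean vectors of an initial model, which is the general form used by MLLR-style mean adaptation. The function name `adapt_means`, the symbols `A` and `b`, and all numbers are assumptions made for illustration; estimating the transform from adaptation data is not shown, and this is not taken from any of the papers listed here.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def adapt_means(means: List[Vector], A: Matrix, b: Vector) -> List[Vector]:
    """Return new mean vectors mu' = A * mu + b, leaving the input untouched."""
    adapted = []
    for mu in means:
        new_mu = [
            sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)
        ]
        adapted.append(new_mu)
    return adapted


# Toy 2-dimensional example; every value here is made up.
initial_means = [[1.0, 2.0], [0.0, -1.0]]
A = [[1.0, 0.5], [0.0, 2.0]]  # hypothetical transform matrix
b = [0.5, -1.0]               # hypothetical bias vector
adapted_means = adapt_means(initial_means, A, b)
print(adapted_means)
```

Sharing one transform across many Gaussians is what lets this kind of adaptation work with little data: only `A` and `b` need to be estimated, not every mean separately.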
This paper firstly presents an approach to the unsupervised speaker adaptation task for HMM-based speech synthesis models which avoids the need for such supplementary acoustic models. Junichi Yamagishi, October 2006. main synthesis, in Chinese Spoken Language Processing, 2008. Listening tests show very promising results, demonstrating that adapted. Thus, a core goal of EMIME is the development of unsupervised cross-lingual speaker adaptation for HMM-based TTS. In this paper we present results of unsupervised cross-lingual speaker adaptation applied to text-to-speech synthesis. It has been shown that supervised speaker adaptation can yield high-quality synthetic voices with an order of magnitude less data than required to train a speaker-dependent model or to build a basic unit-selection. Two-pass decision tree construction for unsupervised adaptation of HMM-based synthesis models, Matthew Gibson, Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ, U.K. Index terms: speaker adaptation, unsupervised adaptation. Analysis of unsupervised and noise-robust speaker-adaptive HMM-based speech synthesis systems toward a uni. Unsupervised intralingual and cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction. Unsupervised adaptation for HMM-based speech synthesis. Cabral, Trinity College Dublin, Ireland. The ADAPT Centre is funded under the SFI Research Centres Programme (grant rc2106) and is co-funded under the European Regional Development Fund. The training part of HTS has been implemented as a modified version of HTK and released as a form of patch code to HTK.
This distinctiveness makes unsupervised cross-lingual speaker adaptation one key to the project's success. Two-pass decision tree construction for unsupervised adaptation of HMM-based synthesis models. The unsupervised LM adaptation method we employ is based on the statistics of occurrence rates of part-of-speech (POS) classes [9]. Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm. A comparison of supervised and unsupervised cross-lingual speaker adaptation approaches for HMM-based speech synthesis, Hui Liang (1,2), John Dines (1), Lakshmi Saheer (1,2); (1) Idiap Research Institute, Martigny, Switzerland; (2) Ecole Polytechnique Fe. Then, by combining the method with unsupervised speaker adaptation based on sufficient statistics and the speaker distance, and HMM synthesis, a highly precise unsupervised integrated adaptation system is constructed. Unsupervised adaptation for HMM-based speech synthesis. This paper presents an automatic speech recognition based unsupervised adaptation method for hidden Markov model (HMM) speech synthesis and its quality evaluation. However, it still requires high-quality audio data with low signal-to-noise ratio and precise labeling. This paper describes the integration of these developments into a single architecture which achieves unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis. Some aspects of ASR transcription based unsupervised. Frequency warping for speaker adaptation in HMM-based.
Context adaptive training with factorized decision trees for HMM-based speech synthesis, Kai Yu (1), Heiga Zen (2), Francois Mairesse, and Steve Young; (1) Cambridge University Engineering Department, Trumpington Street, Cambridge, CB2 1PZ, UK. The EMIME project aims to build a personalized speech-to-speech translator, such that spoken input of a user in one language is used to produce spoken output that still sounds like the user's voice, but in another language. Unsupervised intralingual and cross-lingual speaker. A comparison of supervised and unsupervised cross-lingual. Unsupervised adaptation for HMM-based speech synthesis. Similarly to other data-driven speech synthesis approaches, HTS has a compact language. Excitation and spectral parameters are extracted from the waveform, and context-dependent labels are calculated based on the phonetic transcription. Unsupervised cross-lingual speaker adaptation for HMM. Speaker adaptation is not a new topic but a well-researched one, especially for HMM-based acoustic models of speech synthesis [8] and speech recognition [9]. Unsupervised intralingual and cross-lingual speaker adaptation for HMM-based speech. On Jan 1, 2009, Junichi Yamagishi and others published Thousands of voices for. The discriminative training procedure using a GPD or any other discriminative training algorithm, employed in conjunction with the HMM. Analysis of speaker adaptation algorithms for HMM-based. It is now possible to synthesise speech using HMMs with a comparable quality to unit-selection techniques.
By defining a mapping between HMM-based synthesis models and ASR-style models, this paper introduces an approach to the unsupervised speaker adaptation task for HMM-based speech synthesis models which avoids the need for supplementary acoustic models. Generating speech from a model has many potential advantages. Unsupervised adaptation for HMM-based speech synthesis. LM adaptation methods have been proposed to cope with the sparseness of the LM data [7, 8]. Two-pass decision tree construction for unsupervised. Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic. Mixed-language synthesis for Indian languages with dual acoustic. The patch code is released under a free software license. We demonstrate an end-to-end speech-to-speech translation system built for four languages: American English, Mandarin, Japanese, and Finnish. Context adaptive training with factorized decision trees.
Finally, listener evaluations reveal that the proposed unsupervised adaptation methods deliver performance approaching that of supervised adaptation. Some aspects of ASR transcription based unsupervised. Multimodal speech synthesis architecture for unsupervised speaker adaptation, Hieu-Thi Luong (1) and Junichi Yamagishi. Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction, conference paper in Acoustics, Speech, and Signal Processing.
Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm. This paper describes an HMM-based speech synthesis system (HTS), in which speech waveform is generated from HMMs themselves, and applies it to English speech synthesis using the general speech synthesis architecture of Festival. Speaker adaptation for HMM-based speech synthesis system using MLLR, Masatsune Tamura†, Takashi Masuko, Keiichi Tokuda, and Takao Kobayashi; †Tokyo Institute of Technology, Yokohama, 226-8502 Japan; ††Nagoya Institute of Technology, Nagoya, 466-8555 Japan. Abstract. In recent years, hidden Markov model (HMM) has been successfully applied to acoustic modeling for speech synthesis, and HMM-based parametric speech synthesis has become a. The HMM/DNN-based speech synthesis system (HTS) has been developed by the HTS working group and others (see Who we are and Acknowledgments). Flexible speech synthesis based on hidden Markov models, Keiichi Tokuda, Nagoya Institute of Technology, APSIPA ASC 20, Kaohsiung. The HMM-based speech synthesis system (HTS) version 2. An unsupervised, discriminative, sentence-level HMM adaptation based on speech-silence classification is presented.
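The snippets above mention a speech endpointer for separating speech from silence but do not say how it works. The following is a minimal sketch of one simple stand-in, a frame-energy threshold; the function names, frame length, threshold, and toy signal are all assumptions for illustration and do not describe the method of the cited patent or papers.

```python
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each non-overlapping, full-length frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def speech_regions(
    samples: List[float], frame_len: int, threshold: float
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame_exclusive) runs whose energy exceeds threshold."""
    regions = []
    run_start = None
    energies = frame_energies(samples, frame_len)
    for i, energy in enumerate(energies):
        if energy > threshold and run_start is None:
            run_start = i  # a speech run begins
        elif energy <= threshold and run_start is not None:
            regions.append((run_start, i))  # the run ended at frame i
            run_start = None
    if run_start is not None:
        regions.append((run_start, len(energies)))
    return regions


# Toy signal: one silent frame, two loud frames, one silent frame.
signal = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] * 2 + [0.0] * 4
regions = speech_regions(signal, frame_len=4, threshold=0.01)
print(regions)
```

A practical endpointer would typically add smoothing and hangover frames so that short pauses inside an utterance are not split off; the alternative named in the text, segmentation from a first recognition pass, avoids hand-tuned thresholds.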