Flexible speech: Harnessing prosodic variation



The project pursues the parallel development of three components: optimization-based hierarchical speech modeling, a hierarchical analysis tool based on the continuous wavelet transform, and model-adaptation techniques for statistical parametric speech synthesis. Together they target a better theoretical understanding and improved synthesis of speaking-style modification. The optimization modeling conceptualizes speech as an action emerging from the interplay among adjustable demands of production efficiency, perceptual efficacy and temporal cohesion. Time-scale wavelet analysis offers a means of disentangling the system of hierarchically organized parameters that quantify continuous adjustments among these different types of influence. Because these parameters describe external factors (for example, time running out or a noisy environment) rather than surface signal characteristics, statistical parametric synthesis systems can use them to modify the speech output in a faithful and theoretically interpretable manner. Subsequent perceptual evaluation of the synthetic speech can in turn inform the underlying modeling effort.
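
As a rough illustration of the time-scale idea above, the following sketch decomposes a synthetic, normalized pitch-like contour with a Mexican-hat (Ricker) continuous wavelet transform. It is not the project's analysis tool: the contour, the frame rate, the two scales and all function names are assumptions chosen for illustration, and only the Python standard library is used so that the example runs anywhere.

```python
import math

def ricker(points, a):
    """Mexican-hat (Ricker) wavelet of width `a`, sampled at `points` positions."""
    norm = 2.0 / (math.sqrt(3.0 * a) * math.pi ** 0.25)
    mid = (points - 1) / 2.0
    return [norm * (1.0 - ((i - mid) / a) ** 2) * math.exp(-0.5 * ((i - mid) / a) ** 2)
            for i in range(points)]

def cwt_row(signal, a):
    """Wavelet coefficients of `signal` at one scale `a` (zero-padded at the edges)."""
    n = len(signal)
    points = min(10 * int(a), n)
    if points % 2 == 0:
        points -= 1  # odd length keeps the wavelet centred on a sample
    w = ricker(points, a)
    half = points // 2
    row = []
    for t in range(n):
        acc = 0.0
        for j, wj in enumerate(w):
            k = t + j - half
            if 0 <= k < n:
                acc += signal[k] * wj
        row.append(acc)
    return row

def corr(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((p - mx) * (q - my) for p, q in zip(x, y))
    den = math.sqrt(sum((p - mx) ** 2 for p in x) * sum((q - my) ** 2 for q in y))
    return num / den

# Hypothetical contour: 4 s at an assumed 200 frames per second.
FRAME_RATE = 200
times = [i / FRAME_RATE for i in range(4 * FRAME_RATE)]
slow = [math.sin(2 * math.pi * 0.5 * t) for t in times]        # phrase-like, 2 s period
fast = [0.5 * math.sin(2 * math.pi * 5.0 * t) for t in times]  # syllable-like, 0.2 s period
contour = [s + f for s, f in zip(slow, fast)]

fine = cwt_row(contour, 9)     # scale tuned to periods of roughly 40 frames
coarse = cwt_row(contour, 90)  # scale tuned to periods of roughly 400 frames

print("fine scale vs fast component:  ", round(corr(fine, fast), 3))
print("coarse scale vs slow component:", round(corr(coarse, slow), 3))
```

The small scale tracks the fast, syllable-like movement and the large scale tracks the slow, phrase-like movement, which is the sense in which a wavelet analysis separates hierarchical levels of a prosodic signal. A real analysis would work on measured contours and a denser set of scales.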

By combining theoretical and technological approaches, the project seeks to explore and exploit synergies between the sister fields of theoretical phonetics and speech synthesis. This harmonized effort promises to increase our understanding of how humans vary and modify prosody to accommodate external influences and adjust their speaking style. In addition, the research program has the potential to prepare the way for novel applications in speech technology and industry.
Effective start/end date: 01/09/2017 → …