Workshop Theme

Research on various aspects of paralinguistic and extralinguistic speech has gained considerable importance in recent years. On the one hand, models have been proposed for describing and modifying voice quality and prosody related to factors such as emotional states or personality. Such models often start with high-intensity states (e.g., full-blown emotions) in clean lab speech, and are difficult to generalise to everyday speech. On the other hand, systems have been built to work with moderate states in real-world data, e.g. for the recognition of speaker emotion, age, or gender. Such models often rely on statistical methods, and are not necessarily based on any theoretical models.

While both research traditions are obviously valid and can be justified by their different aims, it seems worth asking whether there is anything they can learn from each other. For example: "Can models become more robust by incorporating methods used for dealing with real-world data?"; "Can recognition rates be improved by including ideas from theoretical models?"; "How would a database need to be structured so that it can be used both for research on model-based synthesis and for research on recognition?"

While the workshop will be open to any kind of research on paralinguistic speech, the workshop structure will support the presentation and creation of cross-links in several ways:

  • papers with an explicit contribution to cross-linking issues will stand a higher chance of being accepted as oral papers;
  • sessions and proceedings will include space for peer comments and answers from authors;
  • poster sessions will be organised around cross-cutting issues rather than traditional research fields, where possible.

We therefore encourage prospective participants to place their research into a wider perspective. This can happen in many ways; as illustrations, we outline two possible approaches.

1. In application-oriented research, such as synthesis or recognition, a guiding principle could be the requirements of the "ideal" application: for example, the recognition of finely graded shades of emotions, for all speakers in all situations; or fully natural-sounding synthesis with freely specifiable expressivity; etc. This perspective is likely to highlight the hard problems of today's state of the art, and a cross-cutting perspective may lead to innovative approaches yielding concrete steps to reduce the distance towards the "ideal".

2. A second way of attaining a wider perspective would be to cross-link work in generative modelling (e.g., expressive speech synthesis) and analysis (e.g., recognition of expressivity from speech). Researchers on generation are invited to investigate the relevance of their work for analysis, and vice versa. What methodologies, corpora or descriptive inventories exist that could be shared between analysis and generation, or at least mapped onto each other? If certain parameters have proven to be relevant in one area, to what degree is it possible to transfer them to the other area? Issues of relevance in this area may include, among other things, personalisation, speaker dependence vs. independence, and links between voice conversion in synthesis and speaker calibration in (automatic) recognition or (human) perception.


TOPICS


Papers are invited in all areas related to paralinguistic speech, including, but not limited to, the following topics:

  • prosody of paralinguistic speech
  • voice quality and paralinguistic speech
  • synthesis of paralinguistic speech (model-based, data-driven, ...)
  • recognition/classification of paralinguistic properties of speech
  • analysis of paralinguistic speech (acoustics, physiology, ...)
  • assessment and perception of paralinguistic speech
  • typology of paralinguistic speech (emotion, expression, attitude, physical states, ...)

While all papers must be related to paralinguistic speech, papers making the link with a related area, e.g. investigating the interaction of the speech signal with the meaning of the verbal content, are explicitly welcome.