This is the demo page for the paper "Improving Emotional TTS with an Emotion Intensity Input from Unsupervised Extraction", submitted to SSW'21. It is currently for review purposes only.
| System | Samples | | | | | | |
|---|---|---|---|---|---|---|---|
| | angry | sad | happy | fearful | surprised | happy | neutral |
| baseline | | | | | | | |
| attention | | | | | | | |
| transformer | | | | | | | |
| rank | | | | | | | |
| copy synth | | | | | | | |
UI of the listening test. Twenty-five samples were randomly selected. Each sample had to be rated on a 5-point MOS scale and, at the same time, in terms of perceived emotion.
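As a rough illustration of how ratings collected this way could be aggregated, the sketch below computes a mean opinion score and an emotion recognition rate per system. It is not the evaluation code used for the paper: the record layout, the `summarize` helper, and the example ratings are all assumptions made up for this illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up example records, NOT results from the listening test.
# Assumed layout: (system, intended emotion, MOS rating 1-5, perceived emotion)
ratings = [
    ("baseline", "angry", 3, "angry"),
    ("baseline", "sad", 4, "neutral"),
    ("rank", "angry", 4, "angry"),
    ("rank", "sad", 5, "sad"),
]


def summarize(records):
    """Return {system: (mean MOS, fraction of samples whose emotion was recognized)}."""
    scores = defaultdict(list)
    hits = defaultdict(list)
    for system, intended, mos, perceived in records:
        if not 1 <= mos <= 5:
            raise ValueError(f"MOS rating out of range: {mos}")
        scores[system].append(mos)
        # 1.0 when the listener picked the intended emotion, else 0.0
        hits[system].append(1.0 if perceived == intended else 0.0)
    return {s: (mean(scores[s]), mean(hits[s])) for s in scores}


summary = summarize(ratings)
for system, (mos, accuracy) in summary.items():
    print(f"{system}: MOS={mos:.2f}, emotion accuracy={accuracy:.2f}")
```

Keeping both judgments in one record mirrors the test setup, where each sample receives its quality rating and its perceived-emotion label in the same step.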
| Scaling | Samples | | | | | | |
|---|---|---|---|---|---|---|---|
| | angry | sad | happy | fearful | surprised | happy | neutral |
| 0 | | | | | | | |
| 1 | | | | | | | |
| 4 | | | | | | | |
| 7 | | | | | | | |
| 10 | | | | | | | |