Because semantic labels such as arousal, valence, or categorical emotion labels (e.g., happy, sad, angry) are usually annotated by hand, speech emotion recognition (SER) research often suffers from a scarcity of emotion-labeled speech data.

Our few-shot techniques, based on transfer learning and multi-task learning, can be applied under these data-scarce conditions, allowing an artificial neural network (ANN) model to achieve high performance even when labeled data are limited.
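To illustrate the multi-task idea in general terms, the sketch below combines a categorical emotion loss with an arousal/valence regression loss, so that both label types supervise one shared model. This is a minimal, hypothetical sketch and not humelo's actual method: the choice of tasks, the function names (`cross_entropy`, `mse`, `multitask_loss`) and the weighting parameter `alpha` are all assumptions made for illustration.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of one example, computed with a numerically stable log-softmax."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, e.g. over (arousal, valence) predictions."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(
    emotion_logits: Sequence[float],
    emotion_label: int,
    av_pred: Sequence[float],
    av_target: Sequence[float],
    alpha: float = 0.5,  # hypothetical weight balancing the auxiliary regression task
) -> float:
    """Joint loss: categorical emotion classification plus weighted arousal/valence regression."""
    return cross_entropy(emotion_logits, emotion_label) + alpha * mse(av_pred, av_target)


# Usage: logits over (happy, sad, angry) with label "happy", plus an arousal/valence pair.
loss = multitask_loss([2.0, 0.5, -1.0], 0, [0.3, -0.2], [0.5, 0.0])
```

Sharing one model across both objectives lets scarce labels of each type contribute to a common representation. In a transfer-learning setup, that shared part would typically be pretrained on a larger corpus first and then fine-tuned with a loss of this kind.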

http://humelo.dothome.co.kr/data/file/research/2038883233_bD9f5CJi_1bb6daba3a1bf667635e929f10779a987fbee590.png

http://humelo.dothome.co.kr/data/file/research/2038883233_m3KLYk6q_828b34c005a53fa06b622db7aa45b7749cd288e8.png

(Blue: our DNN model without humelo's few-shot technology)

(Red: our DNN model with humelo's few-shot technology applied)