User’s Choice of Images and Text to Express Emotions in Twitter and Reddit (ITEM)

Emotions are, next to propositional information, a main ingredient of human interaction. In contrast to information extraction methods, which focus on facts and relations, emotion analysis has received comparatively little attention and is not yet well understood computationally. Two popular subtasks of emotion analysis in natural language processing (NLP) are emotion categorization and emotion stimulus detection. In emotion categorization, a text is classified into predefined categories, for instance joy, sadness, fear, anger, disgust, and surprise. In stimulus detection, the textual segments that describe the event that caused the associated emotion need to be identified. For instance, the text "I am so happy that my mother will visit me" is associated with joy, and the phrase "my mother will visit me" describes the stimulus event. Beyond natural language processing, visual computing has also been applied to emotion categorization, for instance to interpret facial emotion expressions, to estimate the impact of artistic pieces on a person, or to evaluate depicted events or objects. Stimulus detection, too, has a counterpart in visual computing, in which relevant regions in images are detected. However, no previous work in visual computing interprets whole scenes (with relations between depicted objects and places) for emotion stimulus detection, and in particular none does so informed by emotion theories, as has been done in NLP.

In the project, we advance the state of the art in several directions: (1) we will develop appraisal-theory-based interpretations of images from social media regarding their emotional connotation and stimulus depiction; (2) we will combine this research with our previous work on emotion categorization and stimulus detection in text to develop multimodal approaches; (3) we will do so both from the perspective of the author of a social media post (which emotion is she expressing?) and from the perspective of the intended or probable emotion of a reader (which emotion does the author want to cause, and which emotion might a reader feel?). We will thereby contribute to multimodal emotion analysis and help ensure that emotion-related information in social media communication is not missed or misinterpreted because computational models, so far, do not have access to the complete picture. Further, we will answer research questions about how users of social media communicate their emotions, what influences their choice of modality, and how the modalities relate to each other.

The project starts in May 2024 and is funded by the German Research Foundation (DFG). Carina Silberer (Uni Stuttgart) is co-project leader.


Khlyzova, Anna/Silberer, Carina/Klinger, Roman (2022): On the Complementarity of Images and Text for the Expression of Emotions in Social Media. In: Proceedings of the 12th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis. Association for Computational Linguistics. pp. 1–15.

Cevher, Deniz/Zepf, Sebastian/Klinger, Roman (2019): Towards Multimodal Emotion Recognition in German Speech Events in Cars using Transfer Learning. In: Proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019). German Society for Computational Linguistics & Language Technology. pp. 79–90.

Klinger, Roman (2017): Does Optical Character Recognition and Caption Generation Improve Emotion Detection in Microblog Posts? In: Natural Language Processing and Information Systems: 22nd International Conference on Applications of Natural Language to Information Systems, NLDB 2017, Liège, Belgium, June 21–23, 2017, Proceedings. Cham: Springer International Publishing. pp. 313–319.