Donate or Create Dataset

This page describes the supplementary material for the following paper:

Christopher Bagdon, Aidan Combs, Carina Silberer, and Roman Klinger. 2025. Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 17307–17330, Vienna, Austria. Association for Computational Linguistics. https://aclanthology.org/2025.acl-long.847/

The dataset is available upon request: here

1. Overview

This page documents the Donate or Create multimodal (text + image) dataset. We collected the dataset using two main strategies: (1) Donate: study participants donated social media posts they had previously written about an event that triggered the target emotion, then annotated them. (2) Create: study participants were prompted to remember an event that triggered the target emotion, then asked to create a social media post about the event, and finally to annotate the post. Additionally, there is a smaller subset of Recent posts, for which participants were asked to donate their most recent posts and annotate them for the emotion they experienced, rather than searching for posts about a target emotion. The corpus contains 1,200 posts for each of the Donate and Create strategies, balanced by emotion, and 200 Recent posts. More details on the data and methods can be found in the associated paper.

2. Folder content

The main folder contains the following files:

all.csv: contains the annotations

images: subfolders contain images associated with posts, divided by collection strategy. Images are linked to posts via post_id in all.csv and image file name.

README.md: this file

LICENSE: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
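The link between all.csv and the images folder can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column name post_id comes from the description above, but the function name link_images is hypothetical, and the sketch assumes that each image's file name without its extension is exactly the post_id. Check the actual file names before relying on this rule.

```python
import csv
from pathlib import Path


def link_images(csv_path, images_dir):
    """Map each post_id in the CSV to the image files whose name matches it."""
    # Index every file under the images folder, including the subfolders
    # for each collection strategy, by its file name without extension.
    # ASSUMPTION: that stem is identical to the post_id in all.csv.
    by_stem = {}
    for path in Path(images_dir).rglob("*"):
        if path.is_file():
            by_stem.setdefault(path.stem, []).append(path)

    linked = {}
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            post_id = row["post_id"]
            # Posts without a matching image get an empty list.
            linked[post_id] = sorted(by_stem.get(post_id, []))
    return linked
```

Calling link_images("all.csv", "images") from the main folder would return a dictionary from post_id to a list of image paths. If the file names carry a prefix or suffix around the post_id, the matching rule needs to be adapted accordingly.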

3. Citation

Please cite this paper as:

@inproceedings{bagdon-etal-2025-donate,
   title = "Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts",
   author = "Bagdon, Christopher  and
     Combs, Aidan  and
     Silberer, Carina  and
     Klinger, Roman",
   editor = "Che, Wanxiang  and
     Nabende, Joyce  and
     Shutova, Ekaterina  and
     Pilehvar, Mohammad Taher",
   booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
   month = jul,
   year = "2025",
   address = "Vienna, Austria",
   publisher = "Association for Computational Linguistics",
   url = "https://aclanthology.org/2025.acl-long.847/",
   doi = "10.18653/v1/2025.acl-long.847",
   pages = "17307--17330",
   ISBN = "979-8-89176-251-0",
}

4. License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This means that the repository is freely available for academic purposes or individual research, but any other use is explicitly prohibited. Moreover, any derivative work (e.g., re-using or modifying the existing dataset) has to be distributed under the same terms and conditions. If you want to use the data for commercial purposes, please contact the authors (see contact details below).

5. Contact

For any questions regarding the dataset, do not hesitate to contact us at:

Christopher.Bagdon(at)uni-bamberg.de

Roman.Klinger(at)uni-bamberg.de