Donate or Create Dataset
This page refers you to the supplementary material for the following paper:
Christopher Bagdon, Aidan Combs, Carina Silberer, and Roman Klinger. 2025. Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 17307–17330, Vienna, Austria. Association for Computational Linguistics. https://aclanthology.org/2025.acl-long.847/
The dataset is available upon request: here
1. Overview
This page describes the Donate or Create multimodal (text + image) dataset. We collected the dataset using two main strategies: (1) Donate: study participants donated social media posts they had previously written about an event that triggered the target emotion, then annotated them. (2) Create: study participants were prompted to remember an event that triggered the target emotion, asked to create a social media post about that event, and finally asked to annotate the post. Additionally, there is a smaller subset of Recent posts, for which participants donated their most recent posts and annotated them for the emotion they experienced, rather than searching for posts about a target emotion. The corpus contains 1,200 posts for each of the Donate and Create strategies, balanced by emotion, and 200 Recent posts. More details on the data and methods can be found in the associated paper.
2. Folder contents
The main folder contains the following files and folders:
all.csv: contains the annotations
images: subfolders contain images associated with posts, divided by collection strategy. Images are linked to posts via post_id in all.csv and image file name.
README.md: this file
LICENSE: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
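The link between all.csv and the image files can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a loader shipped with the dataset: it assumes that an image's file name without its extension equals the post_id of its post, and it searches every subfolder of images rather than relying on particular subfolder names. The function names index_images and link_posts are hypothetical, and only the post_id column is taken from the description above.

```python
import csv
from pathlib import Path


def index_images(images_dir):
    """Map each image file stem to its path, searching all subfolders.

    Assumption: the file stem (name without extension) equals the post_id.
    """
    index = {}
    for path in sorted(Path(images_dir).rglob("*")):
        if path.is_file():
            index[path.stem] = path
    return index


def link_posts(csv_path, images_dir, id_column="post_id"):
    """Return one dict per row of the CSV, with an added 'image_path' entry.

    'image_path' is None when no file with a matching stem is found.
    """
    images = index_images(images_dir)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    for row in rows:
        row["image_path"] = images.get(row[id_column])
    return rows
```

Calling link_posts("all.csv", "images") from the main folder would then return the annotation rows with an extra image_path entry each. If many rows come back with image_path set to None, the file naming assumption above probably does not hold and the matching rule needs adjusting.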
3. Citation
Please cite this paper as:
@inproceedings{bagdon-etal-2025-donate,
    title = "Donate or Create? Comparing Data Collection Strategies for Emotion-labeled Multimodal Social Media Posts",
    author = "Bagdon, Christopher and
      Combs, Aidan and
      Silberer, Carina and
      Klinger, Roman",
    editor = "Che, Wanxiang and
      Nabende, Joyce and
      Shutova, Ekaterina and
      Pilehvar, Mohammad Taher",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.847/",
    doi = "10.18653/v1/2025.acl-long.847",
    pages = "17307--17330",
    ISBN = "979-8-89176-251-0",
}
4. License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This means that the repository is freely available for academic purposes or individual research, but any other use is explicitly prohibited. Moreover, any derivative work (e.g., re-using or modifying the existing dataset) has to be distributed under the same terms and conditions. If you want to use the data for commercial purposes, please contact the authors (see contact details below).
5. Contact
For any questions regarding the dataset, do not hesitate to contact us at:
Christopher.Bagdon(at)uni-bamberg.de
Roman.Klinger(at)uni-bamberg.de
