Interactive Prompt Optimization with the Human in the Loop for Natural Language Understanding Model Development and Intervention - Fundamentals of Natural Language Processing

Interactive Prompt Optimization with the Human in the Loop for Natural Language Understanding Model Development and Intervention (INPROMPT)

The few-shot and zero-shot learning paradigm for building natural language understanding models assumes that little or no annotated text is available for the problem at hand. Methods in this area therefore address the challenge of relaxing the high data requirements that optimizing deep neural networks entails. A typical approach is to take a pre-trained neural language model and use a prompt to generate a word that describes a text instance. For example, sentiment polarity classification can be performed by combining a text instance such as "The person is very satisfied with the product." with a prompt and checking whether the sentence "The product is good" or "The product is bad" receives the higher probability. Creating such prompts has the advantage that it does not necessarily require technical expertise, but creating good prompts is still not trivial. Existing research has approached the problem from two perspectives: (1) adapting existing language models using (few) annotated data points and manually created prompt sets, and (2) data-driven automatic prompt generation. Our project combines these two research directions and starts from the typical situation in which a language understanding task is formulated only vaguely, a more precise specification is still missing, and no annotated texts (but certainly unannotated ones) are available. Our goal is to develop and analyze systems that automatically guide domain experts without technical training in machine learning in creating well-functioning prompts. To do this, we use optimization methods that change prompts iteratively and estimate their quality with an objective function. This estimate is based on automatic predictions on text instances, on the readability of the prompt, and on the conclusiveness of an explanation of the decision-making.
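The sentiment example in the description above can be illustrated with a minimal sketch. It is not the project's implementation: `toy_log_prob`, the cue word sets, the template, and the verbalizer sentences are all hypothetical stand-ins, and a real system would obtain the log-probabilities from a pre-trained language model instead of counting cue words.

```python
import math
from typing import Callable, Dict, Tuple

# Hypothetical cue words; they only exist so the toy scorer has a signal.
POSITIVE_CUES = {"satisfied", "great", "love", "excellent"}
NEGATIVE_CUES = {"disappointed", "broken", "hate", "terrible"}


def toy_log_prob(context: str, continuation: str) -> float:
    """Stand-in for a language model: pseudo log-probability of a continuation.

    Assumption of this sketch: cue-word counts with add-one smoothing
    replace the probabilities a pre-trained model would provide.
    """
    tokens = {t.strip('.,!?"').lower() for t in context.split()}
    pos = len(tokens & POSITIVE_CUES) + 1
    neg = len(tokens & NEGATIVE_CUES) + 1
    cont = continuation.lower()
    if "good" in cont:
        return math.log(pos / (pos + neg))
    if "bad" in cont:
        return math.log(neg / (pos + neg))
    return math.log(0.5)


def classify(
    text: str,
    template: str,
    verbalizers: Dict[str, str],
    log_prob: Callable[[str, str], float] = toy_log_prob,
) -> Tuple[str, Dict[str, float]]:
    """Pick the label whose verbalizer sentence is most probable after the prompt."""
    context = template.format(text=text)
    scores = {label: log_prob(context, sentence)
              for label, sentence in verbalizers.items()}
    return max(scores, key=scores.get), scores


# Hypothetical prompt template and label sentences.
TEMPLATE = 'Review: "{text}" Summary:'
VERBALIZERS = {
    "positive": "The product is good.",
    "negative": "The product is bad.",
}

label, scores = classify(
    "The person is very satisfied with the product.", TEMPLATE, VERBALIZERS
)
print(label, scores)
```

Passing the scorer in as `log_prob` keeps the classification logic independent of the model, so the toy function could be swapped for a call to an actual language model without changing `classify`.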
In our project, the objective function based on these three factors (predictions, prompt readability, and conclusiveness of the explanation) is not evaluated automatically but replaced by a "human in the loop". However, to study the iterative optimization of prompts at a larger scale, we also simulate human decisions using automatic approximations of the human objective function. We expect our project to significantly improve the transparency of prompt-based models and to contribute to democratizing the use of machine learning algorithms.
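The interplay of iterative prompt changes and a human (or simulated) decision can be sketched as a simple hill-climbing loop. This is an illustration under stated assumptions, not the method used in the project: `mutate`, `EDIT_WORDS`, `simulated_human_score`, and its weights are hypothetical, and the simulated score only approximates two of the three factors (task fit as a proxy for prediction quality, and readability); the conclusiveness of explanations is left out.

```python
import random
from typing import Callable, List

# Hypothetical vocabulary used for word-level prompt edits.
EDIT_WORDS = ["sentiment", "review", "classify", "the", "of", "this", "please"]
TASK_TERMS = {"sentiment", "review", "classify"}


def mutate(prompt: str, rng: random.Random) -> str:
    """Return a variant of the prompt with one word inserted, deleted, or replaced."""
    words = prompt.split()
    op = rng.choice(["insert", "delete", "replace"])
    pos = rng.randrange(len(words))
    if op == "insert":
        words.insert(pos, rng.choice(EDIT_WORDS))
    elif op == "delete" and len(words) > 1:
        del words[pos]
    else:
        words[pos] = rng.choice(EDIT_WORDS)
    return " ".join(words)


def simulated_human_score(prompt: str) -> float:
    """Automatic approximation of a human rating (an assumption of this sketch)."""
    words = prompt.lower().split()
    # Proxy for prediction quality: share of task-relevant terms mentioned.
    task_fit = len(TASK_TERMS & set(words)) / len(TASK_TERMS)
    # Proxy for readability: prompts longer than six words are penalized.
    readability = 1.0 / (1.0 + max(0, len(words) - 6))
    return 0.7 * task_fit + 0.3 * readability


def simulated_choice(candidates: List[str]) -> str:
    """Simulated human decision: prefer the highest-scoring candidate."""
    return max(candidates, key=simulated_human_score)


def optimize(
    seed_prompt: str,
    choose: Callable[[List[str]], str],
    rounds: int = 20,
    n_candidates: int = 4,
    seed: int = 0,
) -> str:
    """Iteratively propose prompt variants and keep the one `choose` prefers."""
    rng = random.Random(seed)
    current = seed_prompt
    for _ in range(rounds):
        # The current prompt stays among the candidates, so it can be kept.
        candidates = [current] + [mutate(current, rng) for _ in range(n_candidates)]
        current = choose(candidates)
    return current


seed_prompt = "Tell me about text"
best = optimize(seed_prompt, simulated_choice, rounds=30)
print(best, simulated_human_score(best))
```

In an interactive setting, `choose` would present the candidates to a domain expert and return their selection; `simulated_choice` stands in for that decision so the loop can be run at scale.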

The project started in July 2024 and is funded by the German Research Foundation (DFG, KL 2869/13-1).

Data

Here you can find supplementary material for publications from the INPROMPT project:

Data for “Are Humans as Brittle as LLMs” (Li, Papay, Klinger, AACL 2025)

Publications related to this project

Menchaca Resendiz, Yarik/Klinger, Roman (2026): PARL: Prompt-based Agents for Reinforcement Learning. In: Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026). Paris: European Language Resources Association (ELRA). pp. 6166–6184.

Rauf, Moiz/Papay, Sean (2026): Medical Summarization in Practice: Design, Deployment, and Analysis of a Clinical Summarization System for a German Hospital. In: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics. pp. 455–466.

Li, Jiahui/Klinger, Roman (2025): iPrOp: Interactive Prompt Optimization for Large Language Models with a Human in the Loop. In: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. pp. 276–285.

Li, Jiahui/Papay, Sean/Klinger, Roman (2025): Are Humans as Brittle as Large Language Models? In: Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics. Mumbai, India: The Asian Federation of Natural Language Processing and The Association for Computational Linguistics. pp. 2130–2155.

Menchaca Resendiz, Yarik et al. (2025): Supporting Plain Language Summarization of Psychological Meta-Analyses with Large Language Models. In: Proceedings of The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: System Demonstrations. Mumbai, India: Association for Computational Linguistics. pp. 25–35.

Menchaca Resendiz, Yarik/Klinger, Roman (2025): MOPO: Multi-Objective Prompt Optimization for Affective Text Generation. In: Proceedings of the 31st International Conference on Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics. pp. 5588–5606.

Papay, Sean/Klinger, Roman/Padó, Sebastian (2025): Regular-pattern-sensitive CRFs for Distant Label Interactions. In: Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025). Association for Computational Linguistics. pp. 26–35.

Schäfer, Johannes et al. (2025): Which Demographics do LLMs Default to During Annotation? In: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. pp. 17331–17348.

Bareiss, Patrick/Klinger, Roman/Barnes, Jeremy (2024): English Prompts are Better for NLI-based Zero-Shot Emotion Classification than Target-Language Prompts. In: WWW ’24: Companion Proceedings of the ACM on Web Conference 2024. New York. pp. 1318–1326.

Menchaca Resendiz, Yarik/Klinger, Roman (2023a): Affective Natural Language Generation of Event Descriptions through Fine-grained Appraisal Conditions. In: Proceedings of the 16th International Natural Language Generation Conference. Prague: Association for Computational Linguistics. pp. 375–387.

Menchaca Resendiz, Yarik/Klinger, Roman (2023b): Emotion-Conditioned Text Generation through Automatic Prompt Optimization. In: Proceedings of the 1st Workshop on Taming Large Language Models: Controllability in the era of Interactive Assistants! Prague: Association for Computational Linguistics. pp. 24–30.

Kadiķis, Emīls/Srivastav, Vaibhav/Klinger, Roman (2022): Embarrassingly Simple Performance Prediction for Abductive Natural Language Inference. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Seattle: Association for Computational Linguistics. pp. 6031–6037.

Plaza-del-Arco, Flor Miriam/Martín-Valdivia, María-Teresa/Klinger, Roman (2022): Natural Language Inference Prompts for Zero-shot Emotion Classification in Text across Corpora. In: Proceedings of the 29th International Conference on Computational Linguistics. Gyeongju: International Committee on Computational Linguistics. pp. 6805–6817.