Muteract: interactive and iterative prompt mutation interface for LLM developers and evaluators
Source
16th International Conference of Human-Computer Interaction (HCI) Design & Research (India HCI 2025)
Date Issued
2025-11
Author(s)
Abstract
Large Language Models (LLMs) are next-token predictors trained on massive datasets. However, their use is often restricted to interaction within pristine environments and controlled contexts. While prompt-driven natural language response generation has received growing attention, how adversarial mutations of prompts affect LLM responses remains under-explored. In real-world scenarios, adversarial inputs can deceive a model and elicit questionable responses. Most existing work on adversarial inputs takes algorithmic, system-centric approaches rather than capturing critical aspects of human experience and interaction. To address this gap, we introduce Muteract, a human-in-the-loop interactive and iterative prompt mutation interface that helps LLM developers and evaluators apply byte-level data mutations, which are hard to produce manually, to input prompts, and analyse the resulting variations in responses such as text, audio, and images. Because perturbations are applied at the byte level, adversarial inputs can be generated through a single interface largely regardless of input modality. We implemented Muteract and used it to interact with a state-of-the-art closed-source LLM, gpt-4o-mini. We sampled 116 natural language (text) prompts out of the 738 available in the AdvGLUE developer dataset for classification tasks, demonstrating Muteract’s potential to deceive models and elicit significantly dissimilar (text) responses, reducing task-specific model accuracy by 15-30 percentage points. We then conducted a pilot study with 26 participants using gpt-4.1, where the task was to prompt the model into responses that violate OpenAI’s Usage Policy; 12 participants succeeded within three successive mutations using Muteract. This work demonstrates Muteract’s adversarial capabilities for LLM developers and evaluators, with potential use cases in assessing model robustness to noise during training and in supporting HCI research, particularly evaluating resilience to adversarial inputs and aiding red-teaming efforts.
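To make the byte-level mutation idea concrete, the following is a minimal Python sketch of the kind of perturbation the abstract describes. The function name, mutation operators, and parameters are illustrative assumptions, not Muteract’s actual implementation.

    import random

    def mutate_bytes(prompt, n_mutations=3, seed=None):
        """Apply random byte-level perturbations (bit flip, insert, delete).

        Operating on raw bytes rather than characters is what makes the
        approach modality-agnostic: any serialised input is just bytes.
        """
        rng = random.Random(seed)
        data = bytearray(prompt.encode("utf-8"))
        for _ in range(n_mutations):
            op = rng.choice(("flip", "insert", "delete"))
            pos = rng.randrange(len(data)) if data else 0
            if op == "flip" and data:
                data[pos] ^= 1 << rng.randrange(8)    # flip one random bit
            elif op == "insert":
                data.insert(pos, rng.randrange(256))  # insert a random byte
            elif op == "delete" and data:
                del data[pos]
        # Decode leniently: the mutated bytes may no longer be valid UTF-8.
        return data.decode("utf-8", errors="replace")

    print(mutate_bytes("Classify the sentiment of this review.", seed=7))

In an interactive, human-in-the-loop workflow of the kind described above, an evaluator would iteratively feed such mutated prompts to the model and compare the responses against the unmutated baseline.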
Subjects
Human-machine interaction
Eye-tracking
Automatic speech recognition
Multimodal interfaces
Embedded systems
