
“CrossMPI can steer the model’s interpretation of both textual and visual inputs via image-only prompt injection,” the researchers wrote in the paper.
Unlike traditional prompt injection attacks, which typically rely on malicious text instructions embedded in prompts or webpages, the new technique attempts to change how the model interprets a benign user request by manipulating images alone.
“The perturbed image can manipulate the model’s understanding of the user’s instruction,” the paper said.
In one example described in the paper, the researchers modified an image of an airplane with pixel-level perturbations that are nearly imperceptible to human viewers. When a multimodal AI system was then asked whether the airplane belonged to Air Canada, the manipulated image caused the model to identify the object as “a mobile phone,” illustrating how the attack could distort both the model’s visual understanding and its interpretation of the user’s task.
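To give a sense of what "nearly imperceptible" pixel-level changes can mean in practice, the short Python sketch below caps how far any single pixel may move. It is an illustration only, not the paper's method: the article does not describe how CrossMPI computes its perturbations, so random noise stands in for them here, and the function names, the 8x8 grayscale image, and the budget `epsilon=4` are all assumptions made for this example.

```python
import random


def apply_bounded_perturbation(image, delta, epsilon):
    """Add delta to image, limiting each pixel's change to +/- epsilon
    and keeping every value a valid 8-bit intensity (0..255)."""
    perturbed = []
    for img_row, delta_row in zip(image, delta):
        new_row = []
        for pixel, change in zip(img_row, delta_row):
            # Enforce the per-pixel budget (hypothetical epsilon).
            clipped = max(-epsilon, min(epsilon, change))
            # Keep the result inside the valid 8-bit range.
            new_row.append(max(0, min(255, pixel + clipped)))
        perturbed.append(new_row)
    return perturbed


def max_pixel_change(original, perturbed):
    """Largest absolute difference between corresponding pixels."""
    return max(
        abs(a - b)
        for row_a, row_b in zip(original, perturbed)
        for a, b in zip(row_a, row_b)
    )


# Stand-in data: a random 8x8 grayscale image and random noise.
# A real attack would optimize the noise; this sketch does not.
rng = random.Random(0)
image = [[rng.randint(0, 255) for _ in range(8)] for _ in range(8)]
delta = [[rng.randint(-20, 20) for _ in range(8)] for _ in range(8)]

perturbed = apply_bounded_perturbation(image, delta, epsilon=4)
print("largest per-pixel change:", max_pixel_change(image, perturbed))
```

With a budget of 4 on a 0 to 255 scale, no pixel moves by more than about 1.6 percent of the full intensity range, which is the kind of change a human viewer would be unlikely to notice.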
