[Paper]
The advancement of Large Vision-Language Models (LVLMs) has increasingly
highlighted the critical issue of their tendency to hallucinate non-existing
objects in the images. To address this issue, previous works focused on using
specially curated datasets or powerful LLMs (e.g., GPT-3.5) to rectify the
outputs of LVLMs. However, these approaches require either expensive
training/fine-tuning or API access to advanced LLMs to correct the model’s
output post-generation. In this paper, we tackle this challenge by introducing
a framework called Mitigating hallucinAtion via classifieR-Free guIdaNcE
(MARINE), which is both training-free and API-free, and can effectively and
efficiently reduce object hallucinations during the generation process.
Specifically, MARINE enriches the visual context of LVLMs by integrating
existing open-source vision models, and employs classifier-free guidance to
incorporate the additional object grounding features to improve the precision
of LVLMs’ generations. Through comprehensive evaluations across