Fake News in Sheep's Clothing: Robust Fake News Detection Against LLM-Empowered Style Attacks

Jiaying Wu, Jiafeng Guo, Bryan Hooi. arXiv 2023 – 20 citations

Tags: Reinforcement Learning, Security, Training Techniques

It is commonly perceived that fake news and real news exhibit distinct writing styles, such as the use of sensationalist versus objective language. However, we emphasize that style-related features can also be exploited for style-based attacks. Notably, the advent of powerful Large Language Models (LLMs) has empowered malicious actors to mimic the style of trustworthy news sources, doing so swiftly, cost-effectively, and at scale. Our analysis reveals that LLM-camouflaged fake news content significantly undermines the effectiveness of state-of-the-art text-based detectors (up to 38% decrease in F1 score), implying a severe vulnerability to stylistic variations. To address this, we introduce SheepDog, a style-robust fake news detector that prioritizes content over style in determining news veracity. SheepDog achieves this resilience through (1) LLM-empowered news reframings that inject style diversity into the training process by customizing articles to match different styles; (2) a style-agnostic training scheme that ensures consistent veracity predictions across style-diverse reframings; and (3) content-focused veracity attributions that distill content-centric guidelines from LLMs for debunking fake news, offering supplementary cues and potential interpretability that assist veracity prediction. Extensive experiments on three real-world benchmarks demonstrate SheepDog’s style robustness and adaptability to various backbones.
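The abstract does not spell out the training objective, so the following is only a minimal sketch of what a style-agnostic consistency objective could look like, not the authors' implementation. It assumes a binary veracity classifier that emits logits, combines cross-entropy on the original article with a KL-divergence penalty between the prediction on the original and the predictions on its style reframings, and uses hypothetical names throughout (`style_agnostic_loss`, `weight`).

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label."""
    return -math.log(max(probs[label], 1e-12))


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log(pi / max(qi, 1e-12))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def style_agnostic_loss(orig_logits, reframed_logits_list, label, weight=1.0):
    """Supervised loss on the original article plus a consistency penalty.

    Assumption (not from the paper): the consistency term is the mean
    KL divergence between the prediction on the original article and
    the prediction on each style reframing of the same article.
    """
    p_orig = softmax(orig_logits)
    supervised = cross_entropy(p_orig, label)
    consistency = sum(
        kl_divergence(p_orig, softmax(r)) for r in reframed_logits_list
    ) / len(reframed_logits_list)
    return supervised + weight * consistency


if __name__ == "__main__":
    # Hypothetical logits for one article (class 0 = real, class 1 = fake).
    original = [2.0, -1.0]
    stable = [[2.0, -1.0], [1.9, -0.9]]   # predictions barely move
    unstable = [[-1.0, 2.0]]              # prediction flips under reframing
    print("stable reframings:  ", style_agnostic_loss(original, stable, 0))
    print("unstable reframings:", style_agnostic_loss(original, unstable, 0))
```

Under these assumptions, a model whose prediction flips when only the writing style changes incurs a larger loss than one whose prediction stays put, which is the behavior a style-robust detector is meant to encourage.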
