Large Language Models (LLMs) have catalyzed transformative advances across a
spectrum of natural language processing tasks through few-shot or zero-shot
prompting, bypassing the need for parameter tuning. While convenient, this
modus operandi aggravates ``hallucination'' concerns, particularly given the
enigmatic ``black-box'' nature that comes with their gigantic model sizes. Such concerns
are exacerbated in high-stakes applications (e.g., healthcare), where
unaccountable decision errors can lead to devastating consequences. In
contrast, human decision-making relies on nuanced cognitive processes, such as
the ability to sense and adaptively correct misjudgments through conceptual
understanding. Drawing inspiration from human cognition, we propose an
innovative \textit{metacognitive} approach, dubbed \textbf{CLEAR}, to equip
LLMs with capabilities for self-aware error identification and correction. Our
framework facilitates the construction of concept-specific sparse subnetworks
that illuminate transparent decision pathways. This provides a novel interface
for model \textit{intervention} after deployment. Our intervention offers
compelling advantages: (\textit{i})~at deployment or inference time, our
metacognitive LLMs can self-consciously identify potential mispredictions with
minimal human involvement, (\textit{ii})~the model can self-correct its
errors efficiently, obviating the need for additional tuning,
and (\textit{iii})~the rectification procedure is not only self-explanatory but
also user-friendly, enhancing the interpretability and accessibility of the
model. By integrating these metacognitive features, our approach pioneers a new
path toward greater trustworthiness and accountability in the
deployment of LLMs.
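
To make the intervention interface concrete, the following is a minimal, hypothetical sketch rather than the paper's actual implementation: it assumes a design in which each concept is scored through its own sparse subnetwork (a binary mask over hidden features) and a suspect concept can be overwritten at inference time without any parameter tuning. All names here (\texttt{ConceptSubnetworks}, \texttt{concept\_masks}, \texttt{intervention}) are illustrative assumptions, and the masking, self-check, and correction rules stand in for whatever CLEAR actually uses.

\begin{verbatim}
# Illustrative sketch only -- not CLEAR's actual implementation.
from typing import Dict, Optional

import torch
import torch.nn as nn


class ConceptSubnetworks(nn.Module):
    """Backbone features -> per-concept sparse pathways -> label head."""

    def __init__(self, hidden_dim: int, num_concepts: int, num_labels: int):
        super().__init__()
        # One fixed binary mask per concept selects a sparse slice of features.
        masks = (torch.rand(num_concepts, hidden_dim) > 0.9).float()
        self.register_buffer("concept_masks", masks)
        self.concept_scorer = nn.Linear(hidden_dim, num_concepts)
        self.label_head = nn.Linear(num_concepts, num_labels)

    def forward(self, h, intervention: Optional[Dict[int, float]] = None):
        # Score each concept only from its own masked (sparse) pathway.
        masked = h.unsqueeze(1) * self.concept_masks               # (B, C, H)
        logits_c = (masked * self.concept_scorer.weight).sum(-1)   # (B, C)
        concepts = torch.sigmoid(logits_c + self.concept_scorer.bias)
        if intervention is not None:
            # Post-hoc correction: overwrite suspect concept activations
            # directly, with no gradient updates to any parameters.
            concepts = concepts.clone()
            for idx, value in intervention.items():
                concepts[:, idx] = value
        return self.label_head(concepts), concepts


# Flag a prediction whose concept activations look uncertain, then let a
# user (or a rule) overwrite the offending concept and re-read the label.
model = ConceptSubnetworks(hidden_dim=768, num_concepts=8, num_labels=3)
h = torch.randn(1, 768)                 # stand-in for an LLM hidden state
with torch.no_grad():
    label_logits, concepts = model(h)
    if (concepts - 0.5).abs().min() < 0.1:              # crude self-check
        label_logits, _ = model(h, intervention={0: 1.0})
\end{verbatim}

The sketch only illustrates why concept-level sparse pathways make intervention cheap: correcting a concept amounts to editing one activation and re-reading a small label head, so no retraining or fine-tuning is required.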