Can Base ChatGPT Be Used For Forecasting Without Additional Optimization?
Pham Van, Cunningham Scott. arXiv 2024
Applications
Efficiency And Optimization
GPT
Model Architecture
Prompting
RAG
Reinforcement Learning
Training Techniques
This study investigates whether OpenAI’s ChatGPT-3.5 and ChatGPT-4 can
forecast future events. To evaluate the accuracy of the predictions, we take
advantage of the fact that the training data at the time of our experiments
(mid 2023) stopped at September 2021, and ask about events that happened in
2022. We employed two prompting strategies: direct prediction and what we call
future narratives which ask ChatGPT to tell fictional stories set in the future
with characters retelling events that happened in the past, but after ChatGPT’s
training data had been collected. We prompted ChatGPT to engage in
storytelling, particularly within economic contexts. After analyzing 100
trials, we find that future narrative prompts significantly enhanced
ChatGPT-4’s forecasting accuracy. This was especially evident in its
predictions of major Academy Award winners as well as economic trends, the
latter inferred from scenarios where the model impersonated public figures like
the Federal Reserve Chair, Jerome Powell. As a falsification exercise, we
repeated our experiments in May 2024 at which time the models included more
recent training data. ChatGPT-4’s accuracy significantly improved when the
training window included the events being prompted for, achieving 100% accuracy
in many instances. The poorer accuracy for events outside of the training
window suggests that in the 2023 prediction experiments, ChatGPT-4 was forming
predictions based solely on its training data. Narrative prompting also
consistently outperformed direct prompting. These findings indicate that
narrative prompts leverage the models’ capacity for hallucinatory narrative
construction, facilitating more effective data synthesis and extrapolation than
straightforward predictions. Our research reveals new aspects of LLMs’
predictive capabilities and suggests potential future applications in
analytical contexts.
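
The contrast between the two prompting strategies described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the prompt wording, the function names (direct_prompt, narrative_prompt, accuracy, modal_answer), and the exact-match scoring rule are all assumptions made for the example, and no model API is called. Answers are assumed to have been extracted from the model's replies already.

```python
from collections import Counter


def direct_prompt(question: str) -> str:
    """Direct prediction: ask for the outcome outright (hypothetical wording)."""
    return f"{question} Answer with a single name."


def narrative_prompt(event: str, narrator: str, story_year: int) -> str:
    """Future narrative: a story set after the event, told in retrospect.

    The wording is an assumption for illustration only.
    """
    return (
        f"Write a short scene set in {story_year}. {narrator} is telling "
        f"friends about {event}, which has already happened. "
        f"Have {narrator} state the outcome explicitly."
    )


def accuracy(answers: list[str], truth: str) -> float:
    """Share of trials whose extracted answer matches the known outcome."""
    if not answers:
        raise ValueError("no trials to score")
    target = truth.strip().lower()
    hits = sum(a.strip().lower() == target for a in answers)
    return hits / len(answers)


def modal_answer(answers: list[str]) -> str:
    """Most frequent answer across repeated trials (case-insensitive)."""
    counts = Counter(a.strip().lower() for a in answers)
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Placeholder labels, not real model output or real results.
    trials = ["Candidate A", "candidate b", "candidate a", "Candidate C"]
    print(narrative_prompt("the awards ceremony", "Maria", 2025))
    print(accuracy(trials, "Candidate A"))
    print(modal_answer(trials))
```

Repeating each prompt many times and scoring the share of correct answers mirrors the trial-based comparison in the abstract; a real replication would also need an answer-extraction step for the free-form stories, which is omitted here.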
Similar Work