A new artificial intelligence system has demonstrated the ability to predict labour market shocks up to two weeks before official government statistics are published, by analysing the “digital traces” of job seekers on social media.
The study, published in PNAS Nexus, reveals that the “JoblessBERT” model can identify subtle signals of economic distress — including slang and misspelt posts — that traditional rule-based systems miss. By processing data from 31.5 million X (formerly Twitter) users, the system outperforms professional forecasters and provides critical early warnings during economic crises.
“This episode epitomises that timely and disaggregated information about the labour market is vital for economic well-being,” the authors note.
Stress tests
The system’s capabilities were starkly illustrated during the onset of the COVID-19 pandemic, a period the authors describe as a “stress test” for forecasting models. In the week ending 21 March 2020, professional consensus models predicted just 327,200 unemployment claims, completely missing the magnitude of the unfolding crisis.
In contrast, JoblessBERT detected a substantial surge in unemployment disclosures and forecast 2.66 million claims, close to the 2.9 million claims the government reported days later.
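To put the gap between the two forecasts in perspective, the short calculation below compares each with the reported figure. It is only a sketch using the rounded numbers quoted above; the choice of absolute percentage error as the yardstick is an assumption made for illustration and is not necessarily the metric used in the study.

```python
# Figures quoted in the article for the week ending 21 March 2020.
actual_claims = 2_900_000      # government figure, as rounded in the article
consensus_forecast = 327_200   # professional consensus forecast
model_forecast = 2_660_000     # JoblessBERT forecast


def absolute_percentage_error(forecast, actual):
    """Return |forecast - actual| as a percentage of the actual value."""
    return abs(forecast - actual) / actual * 100


consensus_ape = absolute_percentage_error(consensus_forecast, actual_claims)
model_ape = absolute_percentage_error(model_forecast, actual_claims)

print(f"Consensus error:   {consensus_ape:.1f}%")   # 88.7%
print(f"JoblessBERT error: {model_ape:.1f}%")       # 8.3%
```

This single-week comparison is separate from the study's overall error-reduction figure, which is measured across the full evaluation period.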
Unlike previous attempts to track unemployment via social media, which relied on rigid keyword lists, JoblessBERT uses a fine-tuned transformer model trained via “Active Learning”. This approach enables the system to understand context and capture 13 times as many relevant users as rule-based models.
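The article does not say which Active Learning strategy the researchers used. One common variant is uncertainty sampling, in which the posts the current model is least sure about are sent to human annotators first. The sketch below assumes that variant; the function name, the example posts and their scores are all hypothetical.

```python
def select_for_labelling(scored_posts, batch_size):
    """Pick the posts whose predicted probability is closest to 0.5.

    scored_posts: list of (text, probability) pairs, where probability is a
    model's estimate that the post discloses unemployment (hypothetical).
    """
    ranked = sorted(scored_posts, key=lambda item: abs(item[1] - 0.5))
    return [text for text, _ in ranked[:batch_size]]


# Hypothetical posts with made-up model scores.
scored_posts = [
    ("just lost my job today", 0.97),
    ("needa job asap", 0.55),
    ("great job on the game last night", 0.04),
    ("got let go, anyone hiring?", 0.48),
]

to_label = select_for_labelling(scored_posts, batch_size=2)
print(to_label)
```

Labelling the ambiguous posts rather than the obvious ones is what lets a model of this kind learn from relatively few human annotations.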
Signs of distress
The model successfully identifies non-standard expressions of distress such as “needa job” or “neeeeeed a job”, which standard tools typically ignore.
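A small example shows why fixed rules struggle with such posts. The sketch below is not the researchers' method: it uses a hypothetical exact-phrase rule and a hypothetical clean-up step that collapses stretched letters, to illustrate how each hand-written patch still leaves variants uncaught.

```python
import re

# Hypothetical rigid rule of the kind a keyword list would contain.
KEYWORD_RULE = re.compile(r"\bneed a job\b")

posts = ["i need a job", "needa job", "neeeeeed a job"]

# Exact matching only catches the standard spelling.
exact = {post: bool(KEYWORD_RULE.search(post)) for post in posts}


def collapse_repeats(text):
    """Shrink any run of three or more identical characters to two."""
    return re.sub(r"(.)\1{2,}", r"\1\1", text)


# A hand-written patch rescues the stretched spelling but not the slang.
patched = {post: bool(KEYWORD_RULE.search(collapse_repeats(post))) for post in posts}

print(exact)
print(patched)
```

After the patch, “neeeeeed a job” is caught but “needa job” is still missed, which is the gap a model that learns from context is meant to close.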
To ensure accuracy, the researchers applied “post-stratification” techniques to adjust the data for the platform’s skewed demographics. While social media users do not perfectly represent the general population, the model reweights inputs based on inferred age, gender and location to align with US Census Bureau estimates.
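The reweighting idea can be sketched in a few lines. The example below is a minimal illustration that assumes a single demographic dimension (age group) and invented sample data and population shares; the study itself adjusts for inferred age, gender and location using Census Bureau estimates.

```python
from collections import defaultdict


def poststratified_rate(sample, population_shares):
    """Weight each group's disclosure rate by that group's population share."""
    counts = defaultdict(lambda: [0, 0])  # group -> [disclosures, users]
    for group, disclosed in sample:
        counts[group][0] += int(disclosed)
        counts[group][1] += 1
    return sum(
        population_shares[group] * disclosures / users
        for group, (disclosures, users) in counts.items()
    )


# Hypothetical sample: younger users are over-represented on the platform.
sample = (
    [("18-29", True)] * 3 + [("18-29", False)] * 3
    + [("30+", True)] * 1 + [("30+", False)] * 3
)
# Hypothetical population shares standing in for census estimates.
population_shares = {"18-29": 0.25, "30+": 0.75}

raw_rate = sum(disclosed for _, disclosed in sample) / len(sample)
adjusted_rate = poststratified_rate(sample, population_shares)

print(f"Raw rate:      {raw_rate:.4f}")       # 0.4000
print(f"Adjusted rate: {adjusted_rate:.4f}")  # 0.3125
```

In this toy case the raw rate overstates the signal because the over-represented younger group discloses more often; weighting by population share pulls the estimate down.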
The result is a 54.3 per cent reduction in forecasting errors compared with industry baselines. The researchers also demonstrated that the model works at the subnational level, accurately tracking unemployment trends across individual states and cities, where official data are often delayed or irregular.