An AI forecasting tournament tried to predict 2025. It couldn't.

Two of the smartest people I follow in the AI world recently sat down to take stock of how the field is going.
One was François Chollet, creator of the widely used Keras library and author of the ARC-AGI benchmark, which tests whether AI has achieved "general," human-level intelligence. Chollet has a reputation as an AI bear who delights in deflating the most boosterish, over-optimistic predictions about where the technology is headed. In the discussion, however, Chollet said his timelines have recently gotten shorter. Researchers have made major progress on what he saw as the main obstacles to achieving artificial general intelligence, such as the models' weakness at remembering and applying things they have learned before.
Chollet's conversation partner, Dwarkesh Patel, whose podcast has become the most important venue for following what top AI researchers are thinking, has moved in the opposite direction in response to his own reporting. While humans are great at learning continuously, or "on the job," Patel has become more pessimistic that AI models will gain this ability anytime soon.
"[Humans are] learning from their mistakes. They're picking up small improvements and efficiencies as they work," Patel noted.
All of which is to say: two very plugged-in, smart people who know the field as well as anyone can come to entirely reasonable yet contradictory conclusions about the pace of AI progress.
So how is someone like me, who is certainly less knowledgeable than Chollet or Patel, supposed to figure out who's right?
The forecasting wars, three years in
One of the most promising approaches I've found for settling, or at least making progress on, the question comes from a small group called the Forecasting Research Institute.
In the summer of 2022, the institute launched what it calls the Existential Risk Persuasion Tournament (XPT for short). The XPT was meant "to produce high-quality forecasts of the risks facing humanity over the next century." To that end, the researchers (including Penn psychologist and forecasting pioneer Philip Tetlock and FRI head Josh Rosenberg) surveyed subject-matter experts who study threats that could at least conceivably endanger humanity's survival (like AI).
But they also asked "superforecasters," a group of people identified by Tetlock and others as having been unusually accurate at predicting events in the past. The superforecaster group consisted not of experts on existential threats to humanity but of generalists from a variety of professions with solid track records of prediction.
On every risk, including AI, there were large gaps between the subject-matter experts and the generalist forecasters. The experts were far likelier than the generalists to say that the risk they study could lead either to human extinction or to mass casualties. The gap persisted even after the researchers had the two groups engage in structured discussions meant to identify why they disagreed.
The two groups simply had different worldviews. In the case of AI, the experts tended to think the burden of proof should be on skeptics to show why a hyper-intelligent digital species would not be dangerous. The generalists thought the burden of proof should be on the experts to explain why a technology that doesn't yet exist could kill us all.
So far, so irresolvable. Fortunately for us observers, each group was asked to estimate not only long-run risks over the next century, which can't be confirmed anytime soon, but also events in the nearer future. They were specifically tasked with predicting the pace of AI progress over short, medium, and long time horizons.
In a new paper, the authors (Tetlock, Rosenberg, Simas Kučinskas, Rebecca Ceppas de Castro, Zach Jacobs, and Ezra Karger) go back and evaluate how well the two groups predicted the three years of AI progress since the summer of 2022.
In theory, this could tell us which group we should believe. If the worried AI experts did a much better job of predicting what happened between 2022 and 2025, that might be a sign that they have a better read on the technology's longer-run future, and that we should therefore give their warnings more credence.
Unfortunately, in the words of Ralph Fiennes, "Would that it were so simple!" It turns out the three-year results don't leave us with much more clarity about whom to believe.
Both the AI experts and the superforecasters systematically underestimated the pace of AI progress. On four benchmarks, the actual performance of state-of-the-art models in the summer of 2025 was better than either the superforecasters or the AI experts had predicted (though the latter came closer). For instance, superforecasters thought an AI would win gold at the International Mathematical Olympiad in 2035. Experts thought 2030. It happened this summer.
"Overall, superforecasters assigned an average probability of just 9.7 percent to the observed outcomes on these four AI benchmarks," the report concludes, "compared to 24.6 percent for domain experts."
That makes the domain experts look better; they did place somewhat higher odds on what actually happened. But when they crunched the numbers across all the questions, the authors concluded that there was no statistically significant difference in overall accuracy between the domain experts and the superforecasters. What's more, there was no correlation between how accurately someone predicted 2025 and how dangerous they believed AI or other risks to be. Prediction remains hard, especially about the future, and especially about the future of AI.
The only trick that reliably worked was aggregating everyone's forecasts: pooling all the predictions and taking the median produced considerably more accurate forecasts than any individual or group. We may not know which of these soothsayers is wise, but the crowd remains wise.
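To make the aggregation idea concrete, here's a minimal sketch of what "pool the predictions and take the median" looks like. The questions, probabilities, and scoring rule below are invented for illustration; the XPT used its own questions, forecasters, and scoring.

```python
import statistics

# Hypothetical probability forecasts (one per forecaster) for two yes/no questions.
forecasts = {
    "AI wins IMO gold by summer 2025": [0.05, 0.10, 0.30, 0.20, 0.15],
    "Benchmark X surpassed by 2025":   [0.40, 0.55, 0.35, 0.60, 0.50],
}
outcomes = {  # 1 = the event happened, 0 = it didn't
    "AI wins IMO gold by summer 2025": 1,
    "Benchmark X surpassed by 2025":   1,
}

def brier(prob: float, outcome: int) -> float:
    """Squared error of a probability forecast; lower is better."""
    return (prob - outcome) ** 2

# Score each individual forecaster across all questions...
n_forecasters = len(next(iter(forecasts.values())))
individual_scores = [
    statistics.mean(brier(forecasts[q][i], outcomes[q]) for q in forecasts)
    for i in range(n_forecasters)
]

# ...and score the crowd's median forecast on each question.
median_score = statistics.mean(
    brier(statistics.median(forecasts[q]), outcomes[q]) for q in forecasts
)

print("individual Brier scores:", [round(s, 3) for s in individual_scores])
print("median-of-crowd Brier score:", round(median_score, 3))
```

This only shows the mechanics; the paper's finding is that, on their real questions, this kind of pooled median tended to beat the individuals and groups being pooled.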
Maybe I should have seen this result coming. Ezra Karger, an economist and co-author on both the original and the new XPT papers, told me when the first paper was published in 2023 that "over the next 10 years, there really wasn't that much disagreement between groups of people who disagreed about those longer-run questions." That is, they already knew that the near-term predictions of the people worried about AI and those of the less worried were quite similar.
So it shouldn't surprise us too much that neither group was dramatically better than the other at predicting the years 2022-2025. The real disagreement wasn't about AI's short-term future but about the danger it poses over the medium and long term, which is inherently harder to judge.
There may be some valuable information in the fact that both groups underestimated the rate of AI progress: maybe that's a sign we're all underrating the technology, and it will keep improving faster than expected. Then again, the predictions were made in the summer of 2022, before the release of ChatGPT that November. Who among us remembers predicting, before that app rolled out, that AI chatbots would become ubiquitous in work and school? Don't we already know that AI made big leaps in capability over 2022-2025? Does that tell us anything about whether the technology will or won't slow down, which in turn is the key to forecasting its long-term threat?
Reading the latest FRI report left me in a place similar to where my former colleague Kelsey Piper landed last year. Piper noted that people have historically failed to extrapolate trends, especially exponential ones, into the future. The fact that relatively few Americans had Covid in January 2020 didn't mean Covid posed no threat; it meant the country was at the start of an exponential growth curve. A similar kind of failure would lead one to underestimate AI progress, and with it a potential existential risk.
At the same time, exponential growth in most contexts can't continue forever; it tops out at some point. It's remarkable, for instance, that Moore's law has broadly predicted the growth of microprocessor density for decades; Moore's law is famous partly because it's so unusual for a trend in human technology to follow a pattern that cleanly.
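As a toy illustration of why this is so hard (purely hypothetical numbers, not any real benchmark or forecast), an unbounded exponential curve and a saturating logistic curve can look nearly identical early on and still end up in very different places:

```python
import math

def exponential(t: float, start: float = 100.0, doubling_time: float = 3.0) -> float:
    """Unbounded growth: doubles every `doubling_time` periods."""
    return start * 2 ** (t / doubling_time)

def logistic(t: float, ceiling: float = 1e6, start: float = 100.0, rate: float = 0.231) -> float:
    """Growth that looks exponential early on but flattens near `ceiling`."""
    return ceiling / (1 + (ceiling / start - 1) * math.exp(-rate * t))

# Early on the two curves are nearly indistinguishable; later they diverge sharply.
for t in (0, 3, 6, 12, 24, 48):
    print(f"t={t:>2}: exponential={exponential(t):>12,.0f}  logistic={logistic(t):>12,.0f}")
```

That early agreement is exactly why short-horizon forecasting accuracy tells us so little about where the trend ultimately goes.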
"I'm increasingly convinced that there's no substitute for getting deep into the weeds when considering these questions," Piper concluded. "While there are questions we can answer from first principles, [AI progress] isn't one of them."
I fear she's right, and, worse, that mere deference to experts isn't enough either, not when the experts disagree with one another on both the specifics and the broad strokes. We don't really have a good alternative to trying to learn as much as we can as individuals and, failing that, waiting and seeing. That's not a satisfying conclusion for a newsletter, or a reassuring answer to one of the most important questions facing humanity, but it's the best I can do.