Did xAI Lie About Grok 3's Benchmarks?

Azad News | February 23, 2025 | 4 mins

Debates over AI benchmarks, and how they are reported by AI labs, are spilling out into public view.

This week, an OpenAI employee accused Elon Musk's AI company, xAI, of publishing misleading benchmark results for its latest AI model, Grok 3. One of the co-founders of xAI, Igor Babushkin, insisted that the company was in the right.

The truth lies somewhere in between.

In a post on xAI's blog, the company published a graph showing Grok 3's performance on AIME 2025, a collection of challenging math questions from a recent invitational mathematics exam. Some experts have questioned AIME's validity as an AI benchmark. Nevertheless, AIME 2025 and older versions of the test are commonly used to probe a model's math ability.

xAI's graph showed two variants of Grok 3, Grok 3 Reasoning Beta and Grok 3 mini Reasoning, beating OpenAI's best-performing available model, o3-mini-high, on AIME 2025. But OpenAI employees on X were quick to point out that xAI's graph did not include o3-mini-high's AIME 2025 score at "cons@64."

What is cons@64, you might ask? Well, it's short for "consensus@64," and it basically gives a model 64 tries to answer each problem in a benchmark and takes the answers generated most frequently as the final answers. As you can imagine, cons@64 tends to boost models' benchmark scores quite a bit, and omitting it from a graph can make it appear as though one model surpasses another when in reality that isn't the case.
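To make the scoring rule concrete, here is a minimal Python sketch of consensus scoring as described above, next to single-attempt scoring for contrast. It is an illustration under stated assumptions, not the evaluation code used by xAI or OpenAI: the function names, the exact-string comparison of answers, and the toy data are all hypothetical.

```python
from collections import Counter


def consensus_answer(samples):
    """Return the answer that appears most often among a model's attempts."""
    # most_common(1) yields [(answer, count)]; on a tie, the answer
    # that was seen first wins (Counter keeps insertion order).
    return Counter(samples).most_common(1)[0][0]


def cons_at_k_score(samples_per_problem, reference_answers, k=64):
    """Fraction of problems where the consensus of the first k attempts is correct."""
    correct = 0
    for samples, reference in zip(samples_per_problem, reference_answers):
        if consensus_answer(samples[:k]) == reference:
            correct += 1
    return correct / len(reference_answers)


def first_sample_score(samples_per_problem, reference_answers):
    """Fraction of problems where the very first attempt is correct."""
    correct = 0
    for samples, reference in zip(samples_per_problem, reference_answers):
        if samples[0] == reference:
            correct += 1
    return correct / len(reference_answers)


# Hypothetical toy data: two problems, four attempts each (not real AIME results).
toy_samples = [
    ["204", "210", "204", "204"],  # first attempt right, consensus right
    ["25", "17", "17", "17"],      # first attempt wrong, consensus right
]
toy_references = ["204", "17"]

print(cons_at_k_score(toy_samples, toy_references, k=4))  # 1.0
print(first_sample_score(toy_samples, toy_references))    # 0.5
```

In this toy case the same set of attempts scores 1.0 under consensus and 0.5 on the first attempt alone, which is why a chart that mixes the two settings can be hard to read fairly.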

Grok 3 Reasoning Beta and Grok 3 mini Reasoning's scores for AIME 2025 at "@1" (meaning the first score the models got on the benchmark) fall below o3-mini-high's score. Grok 3 Reasoning Beta also trails ever-so-slightly behind OpenAI's o1 model set to "medium" computing. Yet xAI is advertising Grok 3 as the "world's smartest AI."

Babushkin argued on X that OpenAI has published similarly misleading benchmark charts in the past, albeit charts comparing the performance of its own models. A more neutral party in the debate put together a more "accurate" graph showing nearly every model's performance at cons@64:

Hilarious how some people see my plot as attack on OpenAI and others as attack on Grok while in reality it's DeepSeek propaganda
(I actually believe Grok looks good there, and OpenAI's TTC chicanery behind o3-mini-*high*-pass@"""1""" deserves more scrutiny.) https://t.co/djqljpcjh8 pic.twitter.com/3WH8FOUFIC

– Teortaxes ▶ ️ (Deepseek 推特🐋铁粉 2023 – ∞) (@Teortaxestex) February 20, 2025

But as AI researcher Nathan Lambert pointed out in a post, perhaps the most important metric remains a mystery: the computational (and monetary) cost it took for each model to achieve its best score. That just goes to show how little most AI benchmarks communicate about models' limitations, and their strengths.


