Grok 4 by xAI was released on July 9, and it's surged ahead of competitors like DeepSeek and Claude at LMArena, a leaderboard for ranking generative AI models. However, these types of AI rankings ...
Overfitting to benchmarks is a common problem for all AI companies. xAI's Grok 4 has some problems with prompt adherence. xAI's overfitting could have resulted from the reinforcement ...
Elon Musk-owned xAI is testing Grok 4.20, a new model update to Grok 4, which already competes with GPT-5 in some benchmarks, such as ARC-AGI 2. GPT-5 is one of the best models for coding, and it ...
Artificial Intelligence (AI) is becoming an integral part of daily life, including everyday calculations. But how well do these systems actually handle basic math? And how much should users trust them ...
xAI is preparing the rollout of Grok 4, which replaces Grok 3 as the new state-of-the-art model. Ahead of the rollout, testers on X have again spotted references to a few new Grok 4 models. One of the ...
A grade of 45 might not seem gold star-worthy by old school human exam standards, but that's how xAI's Grok 3 chose to illustrate this column when I interviewed the chatbot on "leaked" rumors that its ...
Is Grok 4.2 the most intelligent coding model we’ve seen yet? With its release in January 2026, this AI powerhouse has already sparked conversations across the tech world. In this comparison, World of ...
What if the future of AI could not only dream up stunning web designs but also code them into reality with unmatched precision? In this overview, Universe of AI explores how Grok 4.2, codenamed ...
A mathematician with early access to xAI's Grok 4.20 found a new Bellman function for one of the problems he had been working on with his student N. Alpay. Not an Erdős problem, but original research.
In what appeared to be a bid to soak up some of Google's limelight prior to the launch of its new Gemini 3 flagship AI model — now recorded as the most powerful LLM in the world by multiple ...