Grok dropped recently and I thought there was something interesting people were missing in the noise.
The groundbreaking discovery is how Grok 4 Heavy approaches problems. It’s a novel approach, and we’ll see it in other models. Everyone is expecting ChatGPT 5 soon, and Gemini will answer in some way before long.
SuperGrok 4 Heavy (gotta love the naming) uses a mixture of agents to get maximum performance. It excels on some of the hardest tests, including HLE (Humanity’s Last Exam) and the AIME exam. These tests are getting so hard that a single human would be unable to solve the problems they pose.
To get the best scores, Grok spins up multiple agents that each try to solve the problem. The agents then compare their answers and agree on which one of them solved it best.
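To make the idea concrete, here’s a tiny Python sketch of agents answering independently and then settling on one answer. This is not how Grok does it internally (those details aren’t public); the majority vote, the function name solve_with_agents and the toy agents are all assumptions made up for illustration.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical type: an "agent" is any function that maps a problem to an answer.
Agent = Callable[[str], str]


def solve_with_agents(problem: str, agents: Sequence[Agent]) -> str:
    """Ask every agent independently, then return the most common answer."""
    if not agents:
        raise ValueError("need at least one agent")
    # Each agent works on the same problem without seeing the others' answers.
    answers = [agent(problem) for agent in agents]
    # Assumption: "agreeing" is modeled as a simple majority vote.
    # On a tie, Counter keeps the answer that was encountered first.
    winner, _votes = Counter(answers).most_common(1)[0]
    return winner


# Toy stand-ins for real model calls (illustration only).
def careful_agent(problem: str) -> str:
    return "4"


def sloppy_agent(problem: str) -> str:
    return "5"


def second_careful_agent(problem: str) -> str:
    return "4"


final_answer = solve_with_agents(
    "What is 2 + 2?",
    [careful_agent, sloppy_agent, second_careful_agent],
)
print(final_answer)
```

In a real system each agent would be a separate model run with its own tools, and the comparison step would likely be richer than a vote, but the shape is the same: fan out, compare, pick one.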
This is a great way to solve those extremely complex problems, but how about solving real-world problems?
That’s where the Vending Bench test comes in, pitting different models against each other. Google is now sitting at the top of that benchmark, with Claude famously having failed a similar test in the real world.
The multi-agent approach is similar to something we had before called Mixture of Experts, where a single model tried to solve a problem by using different sub-models. But now the agents are autonomous, separate entities with access to tools.
The Evolution of AI
So here’s the evolution of AI models and technologies that we’ve seen over time. With Grok, we’ve arrived at a new way of dealing with complex problems.
- Large Language Models
- Omnimodels (many modalities)
- Mixture of Experts
- Reasoning
- Tool Usage
- Multi Agents
On the downside, this is clearly a way to score higher on the hardest tests, the ones that give Grok the best-looking benchmarks. Grok’s strength has always been its access to real-time data. Because of its sometimes controversial leader and its list of faux pas (it’s not the only one whose models have gone off the rails), it’s often ignored, but that’s a mistake.

AI leaders like Musk and Altman often give their projects aspirational names, like Stargate and, in Elon’s case, Colossus. I think that one clearly refers to Colossus: The Forbin Project, an amazing sci-fi movie from the 1970s that shows you a dystopian fantasy of what might happen, for better or worse, if we give an AI the tools to control governments.