When DeepSeek released its R1 model, claiming it had trained the generative AI large language model for just $6 million, the billions being spent by U.S. AI market leaders, including Microsoft-funded OpenAI, immediately came under scrutiny.

The DeepSeek cost analysis remains dogged by skepticism, and investors' faith in OpenAI hasn't wavered. The company is reportedly set to raise a $40 billion financing round at a valuation as high as $300 billion and says revenue will triple this year to $12.7 billion. Hot AI chip name CoreWeave, meanwhile, is hoping its offering this week will revive a shaky IPO market and kickstart a boom in AI stock listings. But the worries about whether the AI market is moving too fast, and at spending levels that are too high, haven't stopped either.

The “Magnificent 7” tech stocks have been among the worst performers in the market year-to-date, and just this week, Alibaba co-founder Joe Tsai warned of an AI bubble he sees forming in the U.S. As expectations for AI development and America’s lead in the AI race continue to be recalibrated, the repercussions have reached far and wide, from calls for even tougher chip embargoes to slow China to, on the other side, venture capitalists pouring even more money into Chinese AI developers.

But for some in the AI field, it’s full speed ahead within the U.S., as bargain-basement training costs in gen AI allow researchers to push large language model building in ways that hadn’t seemed open to them pre-DeepSeek.

UC Berkeley researchers were among the first to develop a small-scale language model reproduction of DeepSeek’s approach – for just $30. That’s the price to rent two Nvidia H200 GPUs on a public cloud and use a simple game to train the “3B” model — a reference to its roughly 3 billion parameters, a far lower count than the most complex LLMs, which can reach into the hundreds of billions or even trillions.

“We started this project immediately after the release of DeepSeek R1,” said TinyZero project leader and campus graduate researcher Jiayi Pan.

Breakthroughs from OpenAI were just as critical to the team’s interest, with Pan saying they were fascinated by a new reasoning paradigm for AI “designed to spend more time thinking before they respond.”

But DeepSeek R1 was the first open research that helped to explain how to achieve this ability to “think” before answering, which improves an AI model’s capability. “We were very curious about how this algorithm works,” Pan said. But far from helping on the cost hurdles, even DeepSeek’s reported $6 million for training its R1 was “too expensive for us,” Pan added.

The main intuition behind the TinyZero project was that if the task complexity was reduced alongside the model size, the model would still be capable of showing emergent reasoning behavior. These reductions would greatly lower costs, while still allowing researchers to test and observe the reasoning behavior in action.

The AI ‘aha’ moment

To test this intuition, the team reproduced the DeepSeek R1-Zero algorithm in a math game called “Countdown,” which rewards the ability to reason more than preexisting domain knowledge (in this case, math). To play the game, the AI needs to combine a set of given numbers into a target number using addition, subtraction, multiplication, or division.
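TinyZero’s actual reward code lives in its GitHub repository; purely as an illustration of how a Countdown answer can be scored by rule rather than by a learned judge, a minimal checker might look like the sketch below (the function name and scoring scheme are assumptions, not TinyZero’s implementation):

```python
import re

def countdown_reward(expr: str, numbers: list[int], target: int) -> float:
    """Hypothetical rule-based reward: 1.0 if `expr` uses exactly the
    given numbers (each once) and evaluates to `target`, else 0.0."""
    # The expression must use each provided number exactly once.
    used = [int(n) for n in re.findall(r"\d+", expr)]
    if sorted(used) != sorted(numbers):
        return 0.0
    # Allow only arithmetic tokens before evaluating the string.
    if not re.fullmatch(r"[\d+\-*/(). ]+", expr):
        return 0.0
    try:
        value = eval(expr)  # restricted above to arithmetic characters
    except (SyntaxError, ZeroDivisionError):
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0

# Example round: reach 50 using 4, 5, and 6.
print(countdown_reward("(6 + 4) * 5", [4, 5, 6], 50))  # 1.0
print(countdown_reward("6 + 4 * 5", [4, 5, 6], 50))    # 0.0 (evaluates to 26)
```

Because correctness is checkable mechanically like this, the game provides a clean reinforcement-learning signal without any hand-labeled answers.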

At first, TinyZero took a random approach to finding the target number; with training, however, it started to learn to adjust its approach and find better, faster solutions. And even though the task complexity and model size were reduced, the model was still able to show emergent reasoning behavior. It learned to reason by learning to play the game, within the parameters of the game.

“We show that with a model as small as 3B, it can learn to reason about simple problems and start to learn to self-verify and search for better solutions,” Pan said. And that is a key result in both the DeepSeek R1 and OpenAI o1 releases, one that Pan said is usually known as the “aha moment.”

While there are significant differences between the largest AI models, DeepSeek, and a project like TinyZero, the emergent reasoning behavior is similar, and successes like TinyZero show that frontier AI algorithms can be accessible to researchers, engineers, and hobbyists with limited budgets.

“Our project has attracted many people to our GitHub page to reproduce the experiments and experience the ‘aha’ moment themselves,” Pan said.

Researchers from Stanford recently released a preprint paper on experiments that use the Countdown game to see how AI learns, overcoming engineering challenges that had previously held back their progress.

“TinyZero was great,” said Kanishk Gandhi, lead researcher of the project, since it used Countdown, a task that the Stanford team had introduced and was studying.

Open sourcing of other AI projects was also instrumental, including what’s known as the volcano engine reinforcement learning system (VERL) created by TikTok corporate parent ByteDance. “VERL was essential for running our experiments,” Gandhi said. “This alignment significantly helped us with our experiments and enabled faster iteration cycles.”

Besting ‘the big labs’ but banking on open source

The Stanford team is trying to understand why some LLMs show dramatic improvements in reasoning, while others plateau, and Gandhi says he doesn’t expect the computer science breakthroughs related to reasoning, intelligence and improvement to necessarily come from the big labs anymore. “A scientific understanding of current LLMs is missing, even within the big labs, as capabilities keep improving. There is a lot of room for DIY AI, open source and academia to contribute here,” he said.

Projects like those at Stanford and Berkeley will result in more shared development based on the research on how to train models that can improve their reasoning on their own.

But even these ultra-low-cost models are more expensive than their headline numbers suggest.

Nina Singer, senior lead machine learning scientist at AI business consultant OneSix, said the open source aspect of projects such as TinyZero relies on training atop other foundational models, which include not only VERL but Alibaba Cloud’s open-sourced Qwen LLM. “The quoted $30 training cost does not include the original training time for Qwen, which Alibaba invested millions into before releasing it as open weights,” she said.

That’s not meant as a critique of TinyZero, Singer said; rather, it underscores the importance of open-weight models — which release trained parameters to the public even when the training data and full architecture are not open-sourced — in enabling further research and innovation.

“Smaller AI models that are tuned to specific tasks are able to rival much larger models at a fraction of the size and cost,” Singer said.

As more individuals, academics, and smaller companies expect to be able to engage with AI without requiring massive infrastructure investments, the trend of trying to mimic the performance of foundational models and tune them to specific tasks is growing. Singer cited the examples of Sky-T1, which lets users train their own o1-style model for $450, and Alibaba’s Qwen, which offers AI model fine-tuning for as little as $6.

Singer expects the open-weight models of smaller projects to push major players to adopt more open approaches. “The success of DIY fine-tuning and community-driven model improvements puts pressure on companies like OpenAI and Anthropic to justify their API-restricted models, particularly as open alternatives begin to match or exceed their capabilities in specific domains,” she said.

One of the most significant findings from TinyZero is that data quality and task-specific training matter more than sheer model size. 

“This is a major revelation because it challenges the prevailing industry belief that only massive models like ChatGPT or [Anthropic’s] Claude, with hundreds of billions of parameters, are capable of self-correction and iterative learning,” Singer said. “This project suggests that we may have already crossed the threshold where additional parameters provide diminishing returns — at least for certain tasks.”

That means the AI landscape may be shifting focus from size to efficiency, accessibility, and targeted intelligence.

Or, as TinyZero’s team put it on the project page: you can experience the “aha” moment yourself for less than $30.


