[LINK] Berkeley researchers replicate DeepSeek R1 for $30

Stephen Loosley stephenloosley at zoho.com
Sun Feb 2 01:30:43 AEDT 2025


DeepSeek R1 reproduced for $30: 

Berkeley researchers replicate DeepSeek R1 for $30 — casting doubt on H100 claims and controversy


By Nickie Louise, January 31, 2025: https://techstartups.com/2025/01/31/deepseek-r1-reproduced-for-30-berkeley-researchers-replicate-deepseek-r1-for-30-casting-doubt-on-h100-claims-and-controversy/



The rise of Chinese AI startup DeepSeek has been nothing short of remarkable. After its app surpassed ChatGPT on the App Store, DeepSeek sent shockwaves through the tech world, triggering a frenzy in the markets.

But the attention hasn’t all been positive. DeepSeek’s website faced an attack that forced the company to suspend registrations, and some skeptics questioned whether the startup had relied on export-restricted Nvidia H100 chips rather than the H800 chips it claimed to use—raising concerns about compliance and cost efficiency.

Now, a breakthrough from researchers at the University of California, Berkeley, is challenging some of these assumptions. 

A team led by Ph.D. candidate Jiayi Pan has managed to replicate DeepSeek R1-Zero's core capabilities for less than $30, roughly the cost of a night out. Their work could spark a revolution in small-model reinforcement learning research.

Their findings suggest that sophisticated AI reasoning doesn’t have to come with a massive price tag, potentially shifting the balance between AI research and accessibility.


Berkeley Researchers Recreate DeepSeek R1 for Just $30: A Challenge to the H100 Narrative

The Berkeley team says they worked with a small, 3-billion-parameter base language model, training it through reinforcement learning to develop self-verification and search abilities. The goal was to solve arithmetic challenges by reaching a target number, an experiment they completed for just $30.

By comparison, OpenAI's o1 API costs $15 per million input tokens, more than 27 times the $0.55 per million tokens that DeepSeek-R1 charges. Pan sees this project as a step toward lowering the barrier to research on scaling reinforcement learning, especially given its minimal cost.
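The quoted ratio checks out against the article's own figures; a quick back-of-the-envelope check in Python:

    # Price ratio computed from the figures quoted above.
    o1_per_million_input = 15.00  # USD per million input tokens, OpenAI o1
    r1_per_million = 0.55         # USD per million tokens, DeepSeek-R1
    print(f"{o1_per_million_input / r1_per_million:.1f}x")  # -> 27.3x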

But not everyone is on board. Machine learning expert Nathan Lambert questions DeepSeek's claim that training its 671-billion-parameter model cost only about $5 million.

He argues that the figure likely excludes key expenses such as research personnel, infrastructure, and electricity, and estimates DeepSeek AI's annual operating costs at between $500 million and more than $1 billion.

Even so, the achievement stands out—especially considering that top U.S. AI firms are pouring $10 billion a year into their AI efforts.


Breaking Down the Experiment: Small Models, Big Impact

According to Jiayi Pan's post on X, the team successfully reproduced DeepSeek R1-Zero using a small language model with 3 billion parameters.

Running reinforcement learning on the Countdown game, the model developed self-verification and search strategies—key abilities in advanced AI systems.


Key takeaways from their work:


    They successfully reproduced DeepSeek R1-Zero’s methods for under $30.
    Models as small as 1.5 billion parameters demonstrated advanced reasoning skills.
    Performance was on par with larger AI systems.


    “We reproduced DeepSeek R1-Zero in the CountDown game, and it just works. Through RL, the 3B base LM develops self-verification and search abilities all on its own. You can experience the Ahah moment yourself for < $30,” Pan said on X on January 24, 2025.


Reinforcement Learning Breakthrough

The researchers began with a base language model, a structured prompt, and a ground-truth reward. They then introduced reinforcement learning through Countdown, a logic-based game adapted from a British TV show. In this challenge, players must reach a target number using arithmetic operations—a setup that encourages AI models to refine their reasoning skills.
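To make the setup concrete, here is a minimal sketch of what such a rule-based, ground-truth reward could look like. It assumes the model is prompted to put its final arithmetic expression between <answer> tags and that each given number must be used exactly once; the tag format and scoring are illustrative stand-ins, not the TinyZero implementation itself:

    import ast
    import operator
    import re

    # Allowed binary operators for the Countdown arithmetic.
    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def safe_eval(expr):
        """Evaluate a pure arithmetic expression without calling eval()."""
        def walk(node):
            if isinstance(node, ast.Expression):
                return walk(node.body)
            if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                return OPS[type(node.op)](walk(node.left), walk(node.right))
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            raise ValueError("disallowed syntax")
        return walk(ast.parse(expr, mode="eval"))

    def countdown_reward(completion, numbers, target):
        """Reward 1.0 iff the tagged expression uses the given numbers
        exactly once each and evaluates to the target, else 0.0."""
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        if not match:
            return 0.0
        expr = match.group(1).strip()
        used = sorted(int(tok) for tok in re.findall(r"\d+", expr))
        if used != sorted(numbers):
            return 0.0
        try:
            return 1.0 if abs(safe_eval(expr) - target) < 1e-6 else 0.0
        except (ValueError, SyntaxError, ZeroDivisionError):
            return 0.0

    # Example: reach 14 from the numbers 2, 3, 4.
    print(countdown_reward("<answer>(4 + 3) * 2</answer>", [2, 3, 4], 14))  # 1.0

Because a reward like this is computed from the task definition alone, no human labels or learned reward model are needed, which goes a long way toward explaining the experiment's tiny budget.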

Initially, the AI produced random answers. Through trial and error, it began verifying its own responses and adjusting its approach with each iteration, mirroring how humans solve problems. The smallest, 0.5-billion-parameter model never got beyond simple guesses, but from 1.5 billion parameters upward the models began to exhibit more advanced reasoning.

Code: https://github.com/Jiayi-Pan/TinyZero

Surprising Discoveries

One of the most interesting findings was how different tasks led the model to develop distinct problem-solving techniques. In Countdown, it refined its search and verification strategies, learning to iterate and improve its answers. When tackling multiplication problems, it applied the distributive law—breaking numbers down much like humans do when solving complex calculations mentally.

Another notable finding was that the choice of reinforcement learning algorithm—whether PPO, GRPO, or PRIME—had little impact on overall performance. The results were consistent across different methods, suggesting that structured learning and model size play a greater role in shaping AI capabilities than the specific algorithm used. 

This challenges the notion that sophisticated AI reasoning requires vast computational resources, suggesting that complex reasoning can emerge from efficient training techniques and well-structured models.



Smarter AI Through Task-Specific Learning

One of the most interesting takeaways is how the AI adapted to different challenges. For the Countdown game, the model learned search and self-verification techniques. When tested with multiplication problems, it approached them differently—using the distributive law to break down calculations before solving them step by step.
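As a concrete illustration of that decomposition (the arithmetic here is ours, not sampled model output):

    # Distributive-law breakdown of a multiplication, as described above:
    # 23 * 17 = 23 * (10 + 7) = 23*10 + 23*7 = 230 + 161 = 391
    assert 23 * 17 == 23 * 10 + 23 * 7 == 391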

Instead of blindly guessing, the AI refined its approach over multiple iterations, verifying and revising its own answers until it landed on the correct solution. This suggests that models can evolve specialized skills depending on the task, rather than relying on a one-size-fits-all reasoning method.
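For intuition about what "search and verification" means on this task, the same propose-check-revise pattern can be written as a naive brute-force loop; a trained policy explores far more selectively, but the structure is the same. (The sketch below only builds left-nested expressions, which is enough for this toy case.)

    import itertools

    # Exhaustively propose left-nested Countdown expressions and keep the
    # ones that verify against the target: reach 14 from the numbers 2, 3, 4.
    def solve(numbers, target):
        hits = set()
        for perm in itertools.permutations(numbers):
            for ops in itertools.product("+-*/", repeat=len(numbers) - 1):
                expr = str(perm[0])
                for op, n in zip(ops, perm[1:]):
                    expr = f"({expr} {op} {n})"
                try:
                    if abs(eval(expr) - target) < 1e-6:  # verify the proposal
                        hits.add(expr)
                except ZeroDivisionError:
                    pass
        return hits

    print(solve((2, 3, 4), 14))  # {'((3 + 4) * 2)', '((4 + 3) * 2)'}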

A Shift in AI Accessibility

With the full project costing less than $30 and the code publicly available on GitHub, this research makes advanced AI more accessible to a wider range of developers and researchers. It challenges the notion that groundbreaking progress requires billion-dollar budgets, reinforcing the idea that smart engineering can often outpace brute-force spending.

This work reflects a vision long championed by Richard Sutton, a leading figure in reinforcement learning, who argued that simple learning frameworks can yield powerful results. The Berkeley team’s findings suggest he was right—complex AI capabilities don’t necessarily require massive-scale computing, just the right training environment.


Conclusion

As AI development accelerates, breakthroughs like this could reshape how researchers think about efficiency, cost, and accessibility. What started as an effort to understand DeepSeek’s methods may end up setting new standards for the field.

 --





More information about the Link mailing list