In an impressive demonstration of cost-effective AI research, a group of researchers has successfully replicated DeepSeek’s R1-Zero model for just $30.
Dubbed TinyZero, this project focuses on countdown and multiplication tasks, leveraging reinforcement learning (RL) to enable a 3-billion-parameter (3B) base language model (LM) to develop self-verification and search abilities autonomously.
Built on the veRL framework, TinyZero showcases how reinforcement learning can help large language models (LLMs) evolve reasoning capabilities independently.
The researchers behind this project highlight an “Aha!” moment that users can experience firsthand with minimal computational costs.
For those interested in exploring the methodology, a detailed experiment log is available on Weights & Biases, with further insights shared in a Twitter thread. The team has also confirmed that a formal research paper is forthcoming.
The research team selected the “countdown game” as their test environment, a mathematical challenge where the AI generates equations from a set of numbers to reach a specific target.
This game is well suited to testing problem-solving, since it rewards logical reasoning and strategic trial and error. Initially, the model produced random outputs with no clear strategy.
However, through reinforcement learning, it gradually refined its approach, developing logical reasoning skills independently.
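To make this concrete, below is a minimal sketch of the kind of rule-based reward the countdown game lends itself to: an equation scores only if it uses exactly the given numbers and evaluates to the target. The function names and the simple 0/1 reward shape are illustrative assumptions, not TinyZero's actual implementation.

# Illustrative countdown reward check (not TinyZero's actual code): an equation
# earns reward only if it uses exactly the given numbers and hits the target,
# so the model has to learn to search for and verify its own answers.
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(node):
    """Evaluate an arithmetic AST limited to numbers and + - * /."""
    if isinstance(node, ast.Expression):
        return safe_eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](safe_eval(node.left), safe_eval(node.right))
    raise ValueError("disallowed expression")

def countdown_reward(equation: str, nums: list, target: int) -> float:
    """Return 1.0 for a correct equation, 0.0 otherwise (hypothetical reward shape)."""
    try:
        tree = ast.parse(equation, mode="eval")
        used = sorted(int(n.value) for n in ast.walk(tree) if isinstance(n, ast.Constant))
        if used != sorted(nums):            # every given number used exactly once
            return 0.0
        return 1.0 if abs(safe_eval(tree) - target) < 1e-6 else 0.0
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0

print(countdown_reward("(6 * 4) + 1", [1, 4, 6], 25))  # 1.0
print(countdown_reward("6 + 4 + 1", [1, 4, 6], 25))    # 0.0

Because the reward depends only on a verifiable arithmetic check, no human labels or learned reward model are needed, which is a large part of what keeps the experiment so cheap.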
Running TinyZero: Installation and Setup
To replicate TinyZero, users can follow a straightforward setup process:
Installation Steps
- Create Environment:
conda create -n zero python=3.9
- Install Torch (optional):
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
- Install vLLM:
pip3 install vllm==0.6.3
- Install veRL and Dependencies:
pip install -e .
pip3 install flash-attn --no-build-isolation
pip install wandb IPython matplotlib
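With the dependencies in place, a quick import check (an optional sanity step, not part of the official instructions) confirms the pinned versions resolved and that CUDA is visible:

# Optional sanity check that the pinned dependencies installed correctly.
import torch
import vllm
import flash_attn

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("vllm:", vllm.__version__)
print("flash-attn:", flash_attn.__version__)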
Countdown Task: Training TinyZero
Data Preparation
Activate the environment and preprocess the dataset:
conda activate zero
python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}
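Before training, it can help to spot-check the preprocessed data. The sketch below assumes the script writes parquet files (e.g. train.parquet) under the directory passed to --local_dir, as is common in veRL pipelines; the file name and column names here are assumptions to verify against the actual output.

# Spot-check the preprocessed countdown dataset (file and column names assumed).
import pandas as pd

df = pd.read_parquet("{path_to_your_dataset}/train.parquet")  # assumed output file
print(df.columns.tolist())  # expect prompt- and target-style fields
print(df.iloc[0])           # inspect one countdown example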
Training on a Single GPU
For models up to 1.5B parameters, a single GPU setup works effectively:
export N_GPUS=1
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=1
export EXPERIMENT_NAME=countdown-qwen2.5-0.5b
export VLLM_ATTENTION_BACKEND=XFORMERS
bash ./scripts/train_tiny_zero.sh
Scaling Up: Training a 3B+ Model
For larger models that exhibit more advanced reasoning skills, a two-GPU configuration is recommended:
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b
export VLLM_ATTENTION_BACKEND=XFORMERS
bash ./scripts/train_tiny_zero.sh
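Here, ROLLOUT_TP_SIZE=2 tells the rollout engine to shard the model across both GPUs with tensor parallelism. The snippet below is a standalone illustration of what that setting corresponds to in vLLM itself, not the TinyZero training loop; the prompt is made up.

# Standalone illustration of tensor-parallel rollout in vLLM (not the training script).
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-3B", tensor_parallel_size=2)  # one shard per GPU
params = SamplingParams(temperature=1.0, max_tokens=256)
prompt = "Using the numbers [1, 4, 6], create an equation that equals 25."
print(llm.generate([prompt], params)[0].outputs[0].text)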
Instruct Ablation: Experimenting with Qwen-2.5-3B
The team also experimented with the instruction-tuned Qwen-2.5-3B-Instruct model. This requires reprocessing the data so that prompts follow the instruct chat template:
conda activate zero
python examples/data_preprocess/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}
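The --template_type=qwen-instruct flag wraps each question in the instruct model's chat template instead of feeding it as raw text. The sketch below uses the standard transformers chat-template API to show roughly what that wrapping looks like; it is an illustration, not the project's preprocessing code.

# Rough illustration of the qwen-instruct chat template (not the project's code).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
question = "Using the numbers [1, 4, 6], create an equation that equals 25."
chat = tok.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)
print(chat)  # the question wrapped in <|im_start|>/<|im_end|> markers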
Training follows a similar two-GPU setup:
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct
export VLLM_ATTENTION_BACKEND=XFORMERS
bash ./scripts/train_tiny_zero.sh
TinyZero was developed on the veRL framework and uses Qwen2.5-series base models. The research team, comprising Jiayi Pan, Junjie Zhang, Xingyao Wang, Lifan Yuan, Hao Peng, and Alane Suhr, has made the project open source on GitHub.
TinyZero's success demonstrates that cutting-edge reasoning behaviors can be reproduced and studied on a remarkably small budget, potentially paving the way for more affordable AI research.