Chinese AI lab DeepSeek has released an open source version of DeepSeek-R1, its so-called reasoning model, which it claims performs as well as OpenAI's o1 on certain AI benchmarks.

R1 is available from the AI dev platform Hugging Face under an MIT license, meaning it can be used commercially without restrictions. According to DeepSeek, R1 beats o1 on the AIME, MATH-500, and SWE-bench Verified benchmarks. AIME is a set of challenging competition math problems, MATH-500 is a collection of word problems, and SWE-bench Verified focuses on programming tasks.

As a reasoning model, R1 effectively checks its own work, which helps it avoid some of the pitfalls that commonly trip up models. Reasoning models take longer to arrive at solutions, typically seconds to minutes more than a typical non-reasoning model. The upside is that they tend to be more reliable in domains such as physics, science, and math.

R1 contains 671 billion parameters, DeepSeek revealed in a technical report. Parameters roughly correspond to a model's problem-solving skills, and models with more parameters generally perform better than those with fewer. 671 billion parameters is massive, but DeepSeek has also released "distilled" versions of R1, ranging in size from 1.5 billion to 70 billion parameters. The smallest can run on a laptop. As for the full R1, it requires beefier hardware, but it is available through DeepSeek's API at prices 90%-95% lower than OpenAI's o1.

There are drawbacks to R1. Because it is a Chinese model, it is subject to benchmarking by China's internet regulator to ensure that its responses "embody core socialist values." R1 won't answer questions about Tiananmen Square, for example, or Taiwanese autonomy.

R1's filter in action. Image Credits: DeepSeek

Many Chinese AI systems, including other reasoning models, decline to respond to topics that might anger the country's regulators, such as speculation about the Xi Jinping regime.

R1 arrives days after the outgoing Biden administration proposed harsher export rules and restrictions on AI technologies for Chinese ventures. Companies in China were already prevented from buying advanced AI chips, but if the new rules go into effect as written, companies will face tighter caps on both the semiconductor technology and the models needed to bootstrap sophisticated AI systems.

In a policy document last week, OpenAI urged the US government to support US AI development, lest Chinese models match or surpass it in capability. In an interview with The Information, OpenAI's VP of policy Chris Lehane singled out High Flyer Capital Management, DeepSeek's corporate parent, as an organization of particular concern.

So far, at least three Chinese labs have produced models that they claim rival o1: DeepSeek, Alibaba, and Kimi, which is owned by Chinese unicorn Moonshot AI. (Notably, DeepSeek was the first, announcing an R1 preview in late November.)

In a post on X, Dean Ball, an AI researcher at George Mason University, said the trend suggests China's AI labs will continue to be "fast followers."

"The impressive performance of DeepSeek's distilled models [...] means that very capable reasoners will continue to proliferate widely and be runnable on local hardware," Ball wrote, "far from the eyes of any top-down control regime."
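
To illustrate what "runnable on local hardware" might look like in practice, here is a minimal sketch of loading the smallest distilled R1 checkpoint with Hugging Face's transformers library. The repository name, chat-template usage, and generation settings are assumptions based on DeepSeek's Hugging Face listings rather than details from this article, and even the 1.5-billion-parameter model runs more comfortably with a recent GPU or plenty of RAM.

```python
# Minimal sketch (not from the article): trying a distilled R1 checkpoint locally
# with Hugging Face's transformers library. The repo ID and settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed name of the smallest distilled variant

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # use a GPU if present, otherwise fall back to CPU
)

# Reasoning models write out a chain of thought before the final answer,
# so leave a generous token budget even for a short question.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The full 671-billion-parameter R1, by contrast, is far beyond consumer hardware, which is why DeepSeek points heavier users to its hosted API.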