So-called reasoning AI models are becoming easier, and cheaper, to develop.
On Friday, NovaSky, a team of researchers from UC Berkeley's Sky Computing Lab, released Sky-T1-32B-Preview, a reasoning model that is competitive with an early preview version of OpenAI's o1 on several key benchmarks.
Sky-T1 appears to be the first truly open-source reasoning model in the sense that it can be replicated from scratch: the team released the data set they used to train it as well as the necessary training code.
"Indeed, Sky-T1-32B-Preview was trained for less than $450," the team wrote in a blog post, "demonstrating that it is possible to replicate high-level reasoning capabilities affordably and efficiently."
$450 might not seem that affordable. But not so long ago, training a model with comparable performance often cost millions of dollars.
Unlike most AI models, reasoning models effectively fact-check themselves, which helps them avoid some of the pitfalls that normally trip models up. The trade-off is speed: reasoning models take longer to arrive at a solution than typical non-reasoning models, usually by seconds to minutes. The upside is that they tend to be more reliable in domains such as physics, science, and math.
The NovaSky team said it used another reasoning model, Alibaba's QwQ-32B-Preview, to generate the initial training data for Sky-T1, then "curated" the data mix and used OpenAI's GPT-4o-mini to refactor the data into a more usable format.
Training the 32-billion-parameter Sky-T1 took about 19 hours on a rack of 8 Nvidia H100 GPUs. (Parameters roughly correspond to a model's problem-solving skills.)
According to the NovaSky team, Sky-T1 performed better than an early preview version of o1 on MATH500, a collection of "competition-level" math challenges. The model also beat the o1 preview on a set of difficult problems from LiveCodeBench, a coding evaluation.
However, Sky-T1 falls short of the o1 preview on GPQA-Diamond, which contains physics, biology, and chemistry questions that a PhD graduate would be expected to know.
It is also important to note that OpenAI's generally available (GA) release of o1 is a stronger model than the preview version of o1, and OpenAI is expected to release an even better-performing reasoning model, o3, in the coming weeks.
But the NovaSky team says Sky-T1 marks only the beginning of its journey to develop open-source models with more advanced reasoning capabilities.
"Moving forward, we will focus on developing more efficient models that maintain strong reasoning performance and on exploring advanced techniques that further improve the models' efficiency and accuracy at test time," the team wrote in the post. "Stay tuned as we make progress on these exciting initiatives."