
Researchers are using NPR Sunday Puzzle questions to benchmark AI 'reasoning' models


Every Sunday, NPR host Will Shortz, the New York Times' crossword puzzle guru, quizzes thousands of listeners in a long-running segment called the Sunday Puzzle. Written to be solvable without too much foreknowledge, the brainteasers are usually challenging even for skilled contestants.

That's why some experts think they're a promising way to test the limits of AI's problem-solving abilities.

In a recent study, researchers hailing from Wellesley College, Oberlin College, the University of Texas, and Northeastern University, among others, created an AI benchmark using riddles from Sunday Puzzle episodes. The team says their test uncovered surprising insights, such as that so-called reasoning models, including OpenAI's o1, occasionally "give up" and give answers they know aren't correct.

"We wanted to develop a benchmark with problems that humans can understand with only general knowledge," Guha, a co-author at Northeastern University, told TechCrunch.

The AI industry is in a bit of a benchmarking quandary at the moment. Most tests commonly used to evaluate AI models probe for skills that aren't relevant to the average user, and many benchmarks, even recently released ones, are quickly approaching the saturation point.

The advantage of a public radio quiz game like the Sunday Puzzle is that it doesn't test for esoteric knowledge, and the challenges are phrased so that models can't rely on rote memory. "You can't make progress on one of these problems until you solve it; that's when everything clicks together all at once," Guha said. "That requires a combination of insight and a process of elimination."

No benchmark is perfect, of course. The Sunday Puzzle is U.S.-centric and English-only. And because the quizzes are publicly available, models trained on them could "cheat" in a sense, although Guha says he hasn't seen evidence of this.

"New questions are released every week, and we can expect the latest questions to be truly unseen," he said. "We intend to keep the benchmark fresh and track how model performance changes over time."

On the researchers' benchmark, which consists of around 600 Sunday Puzzle riddles, reasoning models such as o1 and DeepSeek's R1 far outperform the rest.
Reasoning models thoroughly check themselves before giving out results, which helps them avoid the pitfalls that normally trip up AI models. The trade-off is that reasoning models take longer to arrive at solutions, typically seconds to minutes longer.

At least one model, DeepSeek's R1, willingly gives solutions it knows to be wrong for some Sunday Puzzle questions. R1 will state verbatim "I give up," followed by an incorrect answer chosen seemingly at random, behavior many humans can relate to.

The models make other bizarre choices, too, like giving a wrong answer only to quickly retract it, attempt to tease out a better one, and fail again. They also get stuck "thinking" forever, give nonsensical explanations for answers, or arrive at a correct answer right away but then go on to consider alternative answers for no obvious reason.

"On hard problems, R1 literally says that it's getting 'frustrated,'" Guha said. "It was funny to see how a model emulates what a human might say. It remains to be seen how 'frustration' in reasoning can affect the quality of model results."

R1 getting "frustrated" on a question in the Sunday Puzzle challenge set. Credits: Guha et al.

The current best-performing model on the benchmark is o1 with a score of 59%, followed by the recently released o3-mini set to high "reasoning effort." (R1 scored 35%.) The researchers plan to broaden their testing to additional reasoning models, which they hope will help identify areas where these models might be improved.

The scores of the models the team tested on its benchmark. Credits: Guha et al.

"You don't need a PhD to be good at reasoning, so it should be possible to design reasoning benchmarks that don't require PhD-level knowledge," Guha said. "A benchmark with broader access allows a wider set of researchers to comprehend and analyze the results, which may in turn lead to better solutions in the future. Furthermore, as state-of-the-art models are increasingly deployed in settings that affect everyone, we believe everyone should be able to intuit what these models are, and aren't, capable of."
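To make the scoring described above concrete, here is a minimal sketch of how a puzzle-style benchmark might compute a model's accuracy. The puzzles, the `toy_model` stand-in, and the exact-match grading are all illustrative assumptions, not the authors' actual dataset or evaluation harness.

```python
# Minimal sketch of scoring a model on a puzzle benchmark.
# All names and data here are hypothetical stand-ins.

def score_model(answer_fn, puzzles):
    """Return the percentage of puzzles answered correctly.

    answer_fn: callable taking a question string and returning the
               model's answer string (a stand-in for a real API call).
    puzzles:   list of (question, expected_answer) pairs.
    """
    correct = 0
    for question, expected in puzzles:
        # Normalize casing and whitespace before comparing, since
        # models vary surface form even when the answer is right.
        if answer_fn(question).strip().lower() == expected.strip().lower():
            correct += 1
    return 100.0 * correct / len(puzzles)

# Illustrative mini-dataset in the spirit of the Sunday Puzzle.
puzzles = [
    ("Name a word that is pronounced the same after removing four "
     "of its five letters.", "queue"),
    ("What 5-letter word reads the same upside down?", "swims"),
]

# A toy "model" that knows one answer and otherwise gives up,
# echoing the R1 behavior described above.
def toy_model(question):
    return "SWIMS" if "upside down" in question else "I give up"

print(score_model(toy_model, puzzles))  # 50.0
```

A real harness would replace `toy_model` with an API call and would likely need a more forgiving answer-extraction step, since reasoning models wrap their final answer in lengthy chains of thought.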
