Eminetra.com

People are using Super Mario to benchmark AI now

Thought Pokémon was a tough benchmark for AI? One group of researchers argues that Super Mario Bros. is even tougher.

Hao AI Lab, a research org at the University of California San Diego, on Friday put AI models into live games of Super Mario Bros. Anthropic’s Claude 3.7 performed the best, followed by Claude 3.5. Google’s Gemini 1.5 Pro and OpenAI’s GPT-4o struggled.

To be clear, it wasn’t quite the same version of Super Mario Bros. as the original 1985 release. The game ran in an emulator and was integrated with a framework, GamingAgent, to give the AIs control over Mario.

GamingAgent, which Hao developed in-house, fed the AI basic instructions, like “If an obstacle or enemy is near, move/jump left to dodge,” along with in-game screenshots. The AI then generated inputs in the form of Python code to control Mario.

Still, Hao says the game forced each model to “learn” to plan complex maneuvers and develop gameplay strategies. Interestingly, the lab found that reasoning models like OpenAI’s o1, which “think” through problems step by step, performed worse than “non-reasoning” models, despite being generally stronger on most benchmarks.

One of the main reasons reasoning models have trouble playing real-time games like this is that they take a while (seconds, usually) to decide on actions. In Super Mario Bros., timing is everything. A second can mean the difference between a jump safely cleared and a plummet to your death.

Games have been used to benchmark AI for decades. But some experts have questioned the wisdom of drawing connections between AI’s gaming skills and technological advancement. Unlike the real world, games tend to be abstract and relatively simple, and they provide a theoretically infinite amount of data to train AI.

The recent flashy gaming benchmarks point to what Andrej Karpathy, a research scientist and founding member at OpenAI, called an “evaluation crisis.”

“I don’t really know what [AI] metrics to look at right now,” he wrote in a post on X. “TLDR my reaction is I don’t really know how good these models are right now.”

At least we can watch AI play Mario.
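To make the timing argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from GamingAgent or from Hao AI Lab's setup: the 60 frames-per-second rate, the `Obstacle` type, the pixel distances, and the latency figures are all illustrative assumptions. It only shows how a model's decision time turns into game frames that pass before its input arrives.

```python
from dataclasses import dataclass

# Assumed emulator frame rate; the article does not state one.
FPS = 60


@dataclass
class Obstacle:
    """Hypothetical hazard approaching Mario (illustrative values only)."""
    distance_px: float       # horizontal gap between Mario and the hazard
    closing_speed_px: float  # pixels per frame by which the gap shrinks


def frames_elapsed(latency_s: float, fps: int = FPS) -> int:
    """Frames the game advances while the model is still deciding."""
    return round(latency_s * fps)


def reacts_in_time(obstacle: Obstacle, latency_s: float, fps: int = FPS) -> bool:
    """True if the chosen input arrives before the gap has closed."""
    frames_until_contact = obstacle.distance_px / obstacle.closing_speed_px
    return frames_elapsed(latency_s, fps) < frames_until_contact


if __name__ == "__main__":
    goomba = Obstacle(distance_px=48, closing_speed_px=1.5)  # made-up numbers
    for label, latency in [("fast model", 0.3), ("slow reasoning model", 2.0)]:
        verdict = "dodges" if reacts_in_time(goomba, latency) else "too late"
        print(f"{label}: {frames_elapsed(latency)} frames pass, {verdict}")
```

Under these made-up numbers, a model that answers in a fraction of a second still has room to jump, while one that deliberates for a couple of seconds responds long after the hazard has arrived, which is the effect the researchers describe.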
