Eminetra.com

OpenAI launches Operator, an agent that can use a computer for you


Like Anthropic’s Computer Use and Google DeepMind’s Mariner, Operator takes screenshots of a computer screen and scans the pixels to work out what actions it can take. CUA, the model behind it, is trained to interact with the same graphical user interfaces (buttons, text boxes, menus) that people use when they do things online. It scans the screen, takes an action, scans the screen again, takes another action, and so on. That lets the model carry out tasks on most websites that people use.

“Traditionally the way models use software is through custom APIs,” said Reiichiro Nakano, a scientist at OpenAI. (APIs, or application programming interfaces, are pieces of code that act as a kind of connector, allowing different pieces of software to talk to one another.) That puts many apps and most websites off-limits, he said: “But if you create a model that can use the same interface that humans use every day, it opens up some new software that was previously inaccessible.”

CUA also breaks tasks down into smaller steps and tries to complete them one at a time, backtracking when it gets stuck. OpenAI says that CUA was trained using the same techniques used for its reasoning models, o1 and o3.

Operator can be asked to search for campsites in Yosemite with a good picnic table. (Image: OpenAI)

OpenAI has tested CUA against several industry benchmarks designed to assess an agent’s ability to carry out tasks on a computer. The company claims that its model beats Computer Use and Mariner on all of them. For example, on OSWorld, which tests how well agents perform tasks such as merging PDF files or manipulating images, CUA scores 38.1% to Computer Use’s 22.0%. In comparison, humans score 72.4%. On a benchmark called WebVoyager, which tests how well agents perform tasks in a browser, CUA scores 87%, Mariner 83.5%, and Computer Use 56%. (Mariner can only carry out tasks in a browser and therefore has no OSWorld score.) For now, Operator can also only carry out tasks in a browser.
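
The scan-and-act cycle described above can be illustrated with a toy sketch. Everything in it is hypothetical: `ToyScreen` stands in for a browser, `choose_action` stands in for the model, and the "screenshot" is a small dictionary rather than pixels. It shows only the shape of the loop (observe, act, observe again), not how CUA itself works, and it leaves out backtracking.

```python
from dataclasses import dataclass


@dataclass
class ToyScreen:
    """Hypothetical stand-in for a browser page: one text box, one button."""
    text: str = ""
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would receive raw pixels; this toy returns a dict.
        return {"text": self.text, "submitted": self.submitted}

    def apply(self, action: tuple) -> None:
        kind, value = action
        if kind == "type":
            self.text = value
        elif kind == "click" and value == "submit" and self.text:
            self.submitted = True


def choose_action(observation: dict, query: str):
    """Hypothetical policy: pick the next GUI action from the latest observation."""
    if observation["submitted"]:
        return None  # nothing left to do
    if observation["text"] != query:
        return ("type", query)
    return ("click", "submit")


def run_agent(screen: ToyScreen, query: str, max_steps: int = 10) -> list:
    """Observe, act, and repeat until the policy reports it is done."""
    history = []
    for _ in range(max_steps):
        observation = screen.screenshot()           # scan the screen
        action = choose_action(observation, query)  # decide what to do
        if action is None:
            break
        screen.apply(action)                        # take the action
        history.append(action)
    return history


screen = ToyScreen()
steps = run_agent(screen, "Yosemite campsites")
print(steps)
```

The `max_steps` cap is a design choice of this sketch: it keeps the loop from running forever if the policy never decides it is finished.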
OpenAI plans to make more of CUA’s capabilities available in the future through an API that other developers can use to build their own applications. This is how Anthropic released Computer Use in December.

OpenAI says it has tested CUA’s safety, using red teams to explore what happens when users ask it to perform unacceptable tasks (such as researching how to create a bioweapon), when websites contain hidden instructions designed to derail it, and when the model itself breaks down. “We have trained the model to stop and ask the user for information before doing anything with external side effects,” said Casey Chu, another researcher on the team.

Look! No hands

To use Operator, you simply type instructions into a text box. But instead of calling up the browser on your computer, Operator sends its instructions to a remote browser running on an OpenAI server. OpenAI says this makes the system more efficient. This is the main difference between Operator and both Computer Use and Mariner (which runs in Google’s Chrome browser on your own computer).
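
The safeguard Casey Chu describes, stopping to check with the user before any action with external side effects, is something OpenAI says it trained into the model. As a rough illustration of the idea only, the sketch below uses a hand-written gate instead. The action names, the `SIDE_EFFECT_ACTIONS` set, and the `confirm` callback are all assumptions made for this example.

```python
from typing import Callable

# Hypothetical list of action kinds treated as having external side effects.
SIDE_EFFECT_ACTIONS = {"submit_order", "send_email", "delete_account"}


def execute_with_confirmation(
    action: str,
    perform: Callable[[str], str],
    confirm: Callable[[str], bool],
) -> str:
    """Run the action, but ask the user first if it has external side effects."""
    if action in SIDE_EFFECT_ACTIONS and not confirm(action):
        return f"skipped {action}"
    return perform(action)


log = []


def perform(action: str) -> str:
    # Stand-in for actually carrying out the action in a browser.
    log.append(action)
    return f"did {action}"


# Harmless actions run without asking; risky ones need a "yes" from the user.
result_a = execute_with_confirmation("scroll_down", perform, confirm=lambda a: False)
result_b = execute_with_confirmation("submit_order", perform, confirm=lambda a: False)
result_c = execute_with_confirmation("submit_order", perform, confirm=lambda a: True)
print(result_a, result_b, result_c, sep="\n")
```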
