OpenAI may be about to release an AI tool that can take control of your PC and perform actions on your behalf. Tibor Blaho, a software engineer with a reputation for leaking upcoming AI products, claims to have found evidence of OpenAI's long-rumored Operator tool. Publications including Bloomberg have previously reported on Operator, a so-called "agentic" system that can autonomously handle tasks like writing code and booking travel. According to The Information, OpenAI is targeting January as Operator's release month, and code discovered by Blaho this weekend adds credence to that report.

OpenAI's ChatGPT client for macOS has gained options, hidden for now, to assign shortcuts to "Toggle Operator" and "Force Quit Operator," per Blaho. OpenAI has also added references to Operator on its website, Blaho said, although those references aren't yet publicly visible.

"Confirmed – macOS ChatGPT desktop app has hidden options to set shortcuts for desktop launcher to 'Toggle Operator' and 'Force Quit Operator' https://t.co/rSFobi4iPN" – Tibor Blaho (@btibor91), January 19, 2025

According to Blaho, OpenAI's site also hosts not-yet-published tables comparing the performance of Operator to other computer-using AI systems. The tables may well be placeholders. But if the numbers are accurate, they suggest that Operator isn't 100% reliable, depending on the task.

"The OpenAI website already has references to Operator / OpenAI CUA (Computer Use Agent) – 'Operator System Card Table', 'Operator Research Eval Table' and 'Operator Rejection Rate Table', including comparisons with Claude 3.5 Sonnet Computer use, Google Mariner, etc. (table preview…" – Tibor Blaho (@btibor91), January 20, 2025

On OSWorld, a benchmark that tries to simulate a real computer environment, "OpenAI Computer Use Agent (CUA)," possibly the AI model that powers Operator, scores 38.1%, ahead of Anthropic's computer-controlling model but well short of the 72.4% humans score. OpenAI CUA surpasses human performance on WebVoyager, which evaluates an AI's ability to navigate and interact with websites, but the model falls below human-level scores on another web-based benchmark, WebArena.

If the leak is to be believed, Operator also struggles with tasks a human would find straightforward. In a test that tasked Operator with signing up with a cloud provider and launching a virtual machine, Operator succeeded only 60% of the time. Tasked with creating a Bitcoin wallet, it succeeded only 10% of the time.

OpenAI's imminent entry into the AI agent space comes as rivals, including the aforementioned Anthropic, Google, and others, make a play for the nascent segment. AI agents may be risky and speculative, but tech giants are already touting them as the next big thing in AI. According to analytics firm Markets and Markets, the market for AI agents could be worth $47.1 billion by 2030.

Today's agents are rather primitive, but some experts have raised safety concerns as the technology rapidly improves. One of the leaked charts shows Operator performing well on selected safety evaluations, including tests designed to keep the system from performing "illegal activities" and searching for "sensitive personal data." Reportedly, safety testing is among the reasons for Operator's long development cycle. In a recent X post, OpenAI co-founder Wojciech Zaremba criticized Anthropic for releasing an agent that he claimed lacked safety mitigations. "I can only imagine the negative reaction if OpenAI makes a similar release," Zaremba wrote.
It's worth noting that OpenAI itself has been criticized by AI researchers, including former staff, for allegedly deprioritizing safety work in favor of shipping its technology quickly.