Anthropic’s AI Can Use Your Computer – Here Is All You Need To Know

Today, October 22, 2024, Anthropic launched two big updates to its AI lineup: Claude 3.5 Sonnet and Claude 3.5 Haiku. These updates were expected by many in the AI space and bring some impressive improvements. In fact, we touched on some of these trends in our recent article The Future of AI: When Are ChatGPT 4.5, Claude 3.5 Opus, andGemini 2.0 Coming?.

The new models offer improvements in coding and tool use, which were already strong points for Claude. But the real game-changer is the computer use feature. This experimental update lets AI interact with a computer much like a human—seeing the screen, moving the cursor, clicking, and typing.

Benchmark Comparisons for Claude 3.5 Sonnet (New) and Claude 3.5 Haiku vs. Competitors

Updated Claude 3.5 Sonnet

The new Claude 3.5 Sonnet is a step up from its predecessor, especially in coding and agentic tasks. It scored 49.0% on the SWE-bench Verified benchmark, outperforming OpenAI’s o1-preview and other specialized systems. It also showed big improvements in TAU-bench, a tool use benchmark, with gains in domains like retail and airlines.

What makes Claude 3.5 Sonnet stand out, though, is its ability to use computers. This feature, now available in public beta, allows developers to direct Claude to interact with computer interfaces. It can look at a screen, move the mouse, and click buttons, effectively automating tasks like software testing, QA, and research.

While this new functionality opens exciting possibilities, it’s still experimental. Anthropic is clear that the model struggles with simple actions like scrolling and dragging, and sometimes makes unexpected choices. For example, in one demo, Claude accidentally stopped a screen recording and at another point wandered off to view photos of Yellowstone National Park during a coding task.

Companies like ReplitCanva, y Asana are already testing these capabilities. Replit, for instance, is using Claude’s computer use for an autonomous verifier that evaluates apps as they’re being built.

Detailed Benchmark Performance for New Claude 3.5 Sonnet

Confusing Branding

It’s a good point to ask why Anthropic didn’t just call it Sonnet 4 or at least Sonnet 3.6 for clarity. Given the significant upgrades, especially the groundbreaking computer use feature, the jump from Claude 3.5 Sonnet might seem more fitting with a clearer version bump.

However, it seems Anthropic might be choosing to keep the 3.5 branding to reflect the model’s continuity with its predecessors while signaling that this is an incremental step rather than a full generational leap. While the computer usefunctionality is impressive, it’s still in the experimental phase, so perhaps they’re holding off on a full version jump until that feature matures or stabilizes. For now, they might want to emphasize that the model is still part of the Claude 3 lineup, just with major enhancements.

Claude 3.5 Haiku

Anthropic also launched Claude 3.5 Haiku, the latest in their fast, cost-effective model series. Despite its speed and affordability, Claude 3.5 Haiku delivers state-of-the-art performance in coding. It even surpasses the original Claude 3.5 Sonnet in certain coding tasks, scoring 40.6% on SWE-bench Verified.

Claude 3.5 Haiku is perfect for handling large-scale data tasks, like analyzing purchase history or inventory data, without sacrificing speed or cost-efficiency. It’s expected to roll out later this month as a text-only model, with image input functionality to follow.

Detailed Benchmark Performance for Claude 3.5 Haiku

When Is Claude 3.5 Opus Coming?

As Anthropic rolls out Claude 3.5 Sonnet and Haiku, attention is now shifting toward their next major release—Claude 3.5 Opus. Expected to arrive by the end of 2024, Claude 3.5 Opus is shaping up to be the company’s most advanced AI model yet, building on the strengths of Sonnet while introducing groundbreaking new features.

Opus is rumored to bring enhanced intelligence and versatility, particularly in complex tasks like coding, data analysis, and long-form content generation. Insiders suggest it will surpass its predecessors significantly, positioning it as an essential tool for developers and enterprises that require advanced AI capabilities. This model is expected to outperform even the already impressive Claude 3.5 Sonnet, making it a key player in the AI space.

One of the most anticipated features of Claude 3.5 Opus is its expanded context window. With speculation that it could hold up to 500,000 tokens, this would make it ideal for managing tasks that require a deep understanding of large amounts of information over long interactions. This improvement will allow Opus to excel in handling long-form projects or complex tasks that require continuity and detailed context.

Another exciting possibility is the introduction of more autonomous capabilities. There are rumors that Claude 3.5 Opus will be able to manage multi-step tasks with minimal supervision, a step toward agentic AI. If true, this could allow Opus to take on more independent roles, acting as a virtual assistant that can manage projects and workflows without constant user input.

While the exact release date of Claude 3.5 Opus hasn’t been confirmed, many expect it to launch by the end of 2024. Given the success of Sonnet and Haiku, Opus is poised to make a major impact, especially for those looking for a more intelligent, context-aware, and autonomous AI solution.

The Future of AI Computer Use

The introduction of computer use in Claude 3.5 Sonnet is a major leap forward for AI. Instead of relying on specialized tools designed for specific tasks, Anthropic is teaching Claude how to perform general computer skills. This opens up a wide range of possibilities, from automating tedious workflows to tackling more creative, open-ended research tasks. With this feature, Claude can see a computer screen, move the cursor, click buttons, and type—essentially interacting with computers like a human would.

This innovation could revolutionize how we use AI in real-world applications, especially in fields that rely heavily on digital interactions, like software testing, QA processes, and even business operations that require frequent data handling.

Early-Stage, But Promising

That said, computer use is still in its experimental phase, and it’s not without its flaws. On the OSWorld benchmark, which measures how well AI can navigate computer interfaces, Claude 3.5 Sonnet scored 14.9% on screenshot-only tasks—twice as high as the next-best system. While this is an impressive start, the technology still struggles with more nuanced tasks, like scrolling, handling notifications, or reacting to short-lived actions.

Anthropic has made it clear that while the potential of this feature is enormous, it’s still in development. As a result, they encourage developers to start with low-risk tasks as they continue to fine-tune the system. The beta version of computer use is already available through Anthropic’s APIAmazon Bedrock, y Google Cloud’s Vertex AI, giving developers a chance to test the waters and explore how this groundbreaking feature could transform their workflows.

Safety and Risks

With any powerful technology, there are safety concerns. Anthropic has put measures in place to reduce risks with Claude’s computer use functionality. The company has built classifiers to prevent harmful actions and has partnered with the US AI Safety Institute and the UK Safety Institute for safety evaluations.

Claude’s ability to access computers presents potential risks, such as misuse for harmful activities. Anthropic says it will continuously evaluate and improve its safety measures. Screenshots from Claude’s interactions are retained for 30 days, allowing the company to track misuse, and additional restrictions on features or websites can be applied if necessary.

When Is Claude 3.5 Sonnet Coming to Fello AI?

For those using Fello AI, the good news is that you won’t have to wait long. Within the next 24 hours, all interactions with Claude 3.5 Sonnet on Fello AI will be upgraded to the newest version.

This means users can soon experience the enhanced coding capabilities and enhanced answering skills of newly updated Claude 3.5 Sonnet. Be ready for a smoother and more powerful AI experience with the latest Claude 3.5 Sonnet update on Fello AI.

Conclusión

Anthropic’s latest releases, Claude 3.5 Sonnet and Claude 3.5 Haiku, represent major steps forward in AI capabilities. While Claude 3.5 Sonnet’s new computer use feature is still in its experimental phase, it holds huge potential for automating tasks, conducting research, and interacting with software in a human-like way. Both models offer exciting advancements, particularly in coding and tool use, positioning them as strong contenders for developers and businesses looking for cutting-edge AI tools.

As Anthropic prepares to launch Claude 3.5 Opus by the end of the year, the AI landscape is becoming increasingly competitive. Opus is expected to bring even more intelligence, autonomy, and an expanded context window, raising the bar in the industry.

The upcoming release will likely push boundaries, particularly in handling complex multi-step tasks and deep contextual understanding. It will be interesting to see how other major players, like OpenAIMeta, y Mistral, respond with their own flagship models, as they are all preparing their next-generation AI systems to compete in this rapidly evolving space.

As these technologies develop, the race to offer the most powerful and versatile AI model continues. Whether competitors will match Anthropic’s strides with Claude 3.5 Opus remains to be seen, but it’s clear that the competition will heat up as we move toward the end of 2024.

Get Exclusive AI Tips to Your Inbox!

Stay ahead with expert AI insights trusted by top tech professionals!

es_ESEspañol