Exploring Claude 3.7 Sonnet and Claude Code by Anthropic

We are thrilled to introduce Claude 3.7 Sonnet¹, our most sophisticated model yet and the first hybrid reasoning model of its kind available today. Claude 3.7 Sonnet is designed to generate quick, near-instantaneous answers, as well as comprehensive, step-by-step analyses that are clearly displayed for users. Those using the API can finely tune the duration of the model’s thought process.

Notably, Claude 3.7 Sonnet has made significant strides in coding and front-end web development. To enhance this capability, we’re also unveiling Claude Code, a command line tool for agentic coding. Currently available as a limited research preview, Claude Code allows developers to assign extensive engineering tasks to Claude right from their terminal.

Screen displaying Claude Code onboarding

Claude 3.7 Sonnet is now accessible through all Claude plans—including Free, Pro, Team, and Enterprise—as well as via the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. The extended thinking feature is available on all platforms except the free version of Claude.

In both standard and extended thinking modes, the pricing for Claude 3.7 Sonnet remains the same as for its earlier versions: $3 per million input tokens and $15 per million output tokens, which includes tokens used for thinking.

Claude 3.7 Sonnet: Practical Frontier Reasoning

Claude 3.7 Sonnet has been developed with a unique philosophy that sets it apart from other reasoning models currently available. Similar to how humans utilize a single brain for both rapid responses and careful contemplation, we believe reasoning should be an inherent part of advanced models, rather than being relegated to separate systems. This integrated philosophy enhances the overall user experience.

This concept is reflected in several features of Claude 3.7 Sonnet. Primarily, it functions as both a standard LLM and a reasoning model, allowing users to select whether to receive immediate answers or to have the model think longer before responding. In standard mode, it offers improvements over Claude 3.5 Sonnet, while in extended thinking mode, it engages in self-reflection before articulating responses, thus enhancing its efficiency in tasks like mathematics, science, instruction-following, coding, and beyond. We generally observe similar prompting effectiveness in both modes.

Secondly, API users have the ability to manage the thinking budget; you can instruct Claude to think for a set number of tokens, with a limit up to 128K tokens. This flexibility allows for a balance of speed (and associated costs) with answer quality.

Moreover, we have tailored our reasoning model development to focus less on competitive math and computer science problems, emphasizing instead real-world applications that mirror typical business utilization of LLMs.

Initial assessments have highlighted Claude’s exceptional coding abilities across various metrics. Cursor identified Claude as a top performer for real-world coding tasks, showcasing marked enhancements in areas such as managing intricate codebases and utilizing advanced tools. Cognition found Claude significantly outperformed other models in planning code alterations and executing full-stack updates. Vercel praised Claude’s remarkable accuracy in intricate agent workflows, while Replit successfully implemented Claude to create comprehensive web applications and dashboards from the ground up, succeeding where other models faltered. Canva’s evaluations indicated that Claude consistently delivers production-ready code with superior design aesthetics, dramatically minimizing errors.

Bar chart showing Claude 3.7 Sonnet as state-of-the-art for SWE-bench Verified — Claude 3.7 Sonnet achieves state-of-the-art results on SWE-bench Verified, which assesses AI models’ proficiency in resolving actual software challenges. For further details, refer to the appendix.

Bar chart showing Claude 3.7 Sonnet as state-of-the-art for TAU-bench — Claude 3.7 Sonnet achieves leading performance on TAU-bench, a framework that evaluates AI agents on intricate real-world tasks involving user interactions and tools. Refer to the appendix for additional information.

Benchmark table comparing frontier reasoning models — Claude 3.7 Sonnet excels across various areas including instruction-following, general reasoning, multimodal capabilities, and agentic coding, with extended thinking offering a significant enhancement in math and science. It has even outperformed earlier models in our Pokémon gameplay assessments.

Introducing Claude Code

Since its launch in June 2024, Sonnet has established itself as the top choice for developers globally. Today, we’re further enhancing the developer experience with the launch of Claude Code, our initial agentic coding tool, currently available as a limited research preview.

Claude Code acts as an active collaborator, capable of searching and interpreting code, modifying files, writing and executing tests, and pushing code to GitHub, while keeping you engaged throughout the process.

Although still in its early stages, Claude Code has become essential for our team, especially for tasks like test-driven development, resolving complex bugs, and large-scale code refactoring. In preliminary tests, Claude Code accomplished tasks in a single attempt that would typically require over 45 minutes of manual effort, significantly reducing development time.

In the upcoming weeks, we intend to continuously refine it based on usage feedback: enhancing tool reliability, supporting long-running commands, improving in-app rendering, and expanding Claude’s understanding of its own capabilities.

Our objective with Claude Code is to clarify how developers utilize Claude for coding purposes, guiding future enhancements to the model. By joining this preview, you will gain access to the same advanced tools that assist us in developing and enhancing Claude, while your feedback will play a crucial role in shaping its future.

Collaboration with Claude on Your Codebase

We have also enhanced the coding experience within Claude.ai. Our GitHub integration is now accessible across all Claude plans, allowing developers to link their code repositories directly to Claude.

Claude 3.7 Sonnet is our most proficient coding model to date. By gaining deeper insights into your personal, work-related, and open-source projects, it becomes a more effective ally for resolving bugs, developing features, and creating documentation across your critical GitHub initiatives.

Commitment to Responsible Development

We have conducted comprehensive testing and evaluations of Claude 3.7 Sonnet, collaborating with external experts to ensure it adheres to our security, safety, and reliability standards. Claude 3.7 Sonnet also demonstrates enhanced ability to distinguish between harmful and benign requests, leading to a 45% reduction in unnecessary refusals compared to its predecessor.

The system card for this release provides fresh safety metrics in multiple categories, offering a clear breakdown of our Responsible Scaling Policy evaluations that can be applied by other AI labs and researchers. The card also outlines the emerging risks associated with computer usage, especially prompt injection attacks, detailing our strategies to evaluate these vulnerabilities and train Claude to counteract and manage them. Additionally, it explores potential safety advantages offered by reasoning models: the capability to understand decision-making processes and assess the trustworthiness and dependability of model reasoning. Read the full system card for an in-depth understanding.

Future Perspectives

The introduction of Claude 3.7 Sonnet and Claude Code marks a significant advancement toward AI systems that can genuinely enhance human abilities. With their capacity for deep reasoning, autonomous functioning, and effective collaboration, these tools bring us closer to a future where AI amplifies and expands what humans can accomplish.

Milestone timeline illustrating Claude's journey from assistant to pioneer

We are eager for you to explore these innovative features and can’t wait to see what you will create with them. As always, we value your feedback as we continue to refine and enhance our models.

Source link