Elevata

Article

Should Claude Sonnet 5 be your default model on AWS Bedrock?

Paulo Frugis
View profilePublished June 30, 20269 min read

Anthropic published Claude Sonnet 5 on June 30, 2026, and AWS announced availability on Amazon Bedrock and Claude Platform on AWS. The practical question for AWS teams is not whether the new Sonnet model is interesting. It is whether Sonnet 5 should become the default model for production workloads, where Opus 4.8 still belongs, and what must be measured before changing a real architecture.

The short answer: start testing Sonnet 5 for high-volume, repeatable, reviewable workloads. Route to Opus 4.8 when one more correct answer materially changes the outcome or when failure is expensive to fix. Do not switch production from a public benchmark alone: use your prompts, tools, data, latency, review burden, and real cost.

Before rollout, check the current AWS model card and pricing page; model availability, routing, and pricing can change.

The practical decision

Start by testing Sonnet 5 when...Route to Opus 4.8 when...Do not switch yet when...
The workload is high-volume, repeatable, and reviewable through tests, citations, schemas, or human review.The task is ambiguous, high-stakes, or expensive to correct later.You do not yet have a baseline for quality, latency, retries, and cost.
Cost and latency matter inside a product or operational workflow.Accuracy matters more than turnaround time, or the workflow is asynchronous.Regional routing, logging, IAM, or data-handling requirements are not approved.
You need a sustainable default for many similar tasks.You have a narrow premium lane with business-owner approval.The output cannot be checked before it affects users, customers, or systems.

Claude Sonnet 5 at a glance

QuestionPractical answer
What is it?Anthropic's most capable Sonnet model so far, aimed at coding, agents, and professional work at scale.
Public announcementAnthropic's post is dated June 30, 2026. AWS's Bedrock model card lists June 25, 2026 as the model launch date.
AWS availabilityAmazon Bedrock and Claude Platform on AWS. On Bedrock, paths include bedrock-runtime and bedrock-mantle.
Main Bedrock IDsanthropic.claude-sonnet-5, us.anthropic.claude-sonnet-5, and global.anthropic.claude-sonnet-5, depending on endpoint and routing choice.
Context and outputAWS's model card lists a 1M-token context window and 128K-token maximum output. Anthropic's docs say the 1M window is both the default and maximum.
PricingAWS lists promotional pricing of $2 per million input tokens and $10 per million output tokens through August 31, 2026; then $3 and $15. Anthropic lists Opus 4.8 at $5 and $25 per million input and output tokens.
First workload to testHigh-volume coding agents, review, documentation, support, analysis, and automation where Sonnet 4.6 was close but Opus was too expensive.

Sources for fast-changing facts: the AWS Bedrock model card, Amazon Bedrock pricing, and Anthropic's Sonnet 5 model notes.

What the benchmarks should and should not change

Anthropic positions Sonnet 5 as a strong improvement over Sonnet 4.6 and close to Opus 4.8 in several agent and professional-work evaluations. That is a useful signal, but it is not a deployment plan. Benchmark curves compress different concerns into one score: success rate, latency, reasoning effort, token use, prompt stability, and error recovery.

Use the benchmarks this way:

  • For default model selection: Sonnet 5 is now a credible first model for many production Bedrock workloads that previously needed an Opus trial.
  • For Opus 4.8 justification: reserve Opus for the ambiguous, high-stakes, or failure-expensive tasks where a small accuracy gain changes the business result.
  • For effort controls: do not compare only top benchmark settings. Higher reasoning effort can improve quality but also changes latency and cost.
  • For migration: re-run your real eval set. A public benchmark will not reveal your tool-call failure modes, schema breakage, retrieval drift, or human-review burden.

This is the same reason our Opus 4.8 benchmark guide argues against treating public leaderboards as procurement decisions. The ranking tells you where to investigate. It does not tell you what to ship.

What Artificial Analysis found about cost per task

Artificial Analysis published a June 30, 2026 evaluation of Claude Sonnet 5 that makes the cost story more complicated than list pricing suggests. In its methodology, Sonnet 5 scored 53 on the Intelligence Index, but at standard pricing it cost $2.29 per Intelligence Index task: about 2x Sonnet 4.6 and about 15% more than Opus 4.8.

The reason was not the list price. Sonnet 5 keeps the same standard $3/$15 per million input/output token pricing as Sonnet 4.6, below Opus 4.8's $5/$25. The difference came from usage: at max effort, Artificial Analysis found Sonnet 5 used about 40% more output tokens than Sonnet 4.6 per Intelligence Index task and about 3x the agentic turns on AA-Briefcase and GDPval-AA. On GDPval-AA, max effort used about 6x more turns than low effort.

That does not make Sonnet 5 a poor choice. Artificial Analysis found it matched or outperformed Opus 4.8 on AA-Briefcase and GDPval-AA, while Opus 4.8 stayed stronger on heavier reasoning and knowledge benchmarks. The operational lesson is narrower: during promotional pricing, Sonnet 5 may look obviously cheaper. At standard pricing from September 1, list token price and cost per completed task can point in opposite directions.

Sonnet 5 or Opus 4.8 on Bedrock?

RequirementStart by testing Sonnet 5 when...Route to Opus 4.8 when...
High-volume production agentsCost, speed, and repeatable tool use matter more than squeezing out the last few points of reasoning accuracy.The agent makes consequential decisions where one more successful resolution is worth the premium.
Engineering workflowsThe work is reviewable: refactors, tests, documentation, triage, code search, and CI-assisted fixes.The work is ambiguous, cross-system, and expensive to correct after the fact.
Document and knowledge workThe output can be checked against source documents, citations, or structured acceptance criteria.The task requires deep judgment across conflicting evidence and weak source material.
Latency-sensitive workflowsUsers wait inside a product or operational workflow and latency is part of the user experience.Accuracy is more important than turnaround time, or the workflow is asynchronous.
FinOps postureYou need a sustainable default that can scale beyond a pilot.You have a narrow premium lane with owner approval and a measured cost-per-success advantage.

The best architecture is often not one model. Use Sonnet 5 as the default execution model, then route selected cases to Opus 4.8 when the confidence score, risk class, or reviewer signal says the premium path is justified.

How to calculate cost per accepted result

Introductory pricing makes Sonnet 5 unusually attractive for experimentation through August 31, 2026, with standard pricing applying from September 1. Do not let that window hide the durable economics. Benchmark both the promotional price and the standard price so teams know whether a workload remains viable after the promotion ends.

The useful calculation is not just input and output tokens. Measure cost per accepted result: tokens, latency, effort setting, agent turns, retries, review time, tool-call failures, Opus escalations, schema rejections, and human intervention. A cheaper model can become expensive if it creates more loops. A more expensive model can be cheaper if it completes work with less review.

This is why list token price and cost per completed task can point in opposite directions. For agentic workloads, the number of turns and amount of generated output can matter as much as the published input/output token price.

Do not project cost only from old token counts either. Model and tokenizer changes can alter how much token usage the same text produces. Measure actual usage on the prompts, documents, and tools that would run in production.

What changes for AWS teams

1. Regional routing needs explicit approval

The Bedrock model card currently shows Sonnet 5 in-region access in us-east-1, plus geo and global cross-region inference options for other source regions. For Canadian and Brazilian teams, this matters: a source region appearing in the table does not automatically mean the model runs only in that country or region. Document whether the pilot can use geo or global inference before putting sensitive data into prompts.

2. Endpoint choice affects controls

Bedrock-runtime and bedrock-mantle are not interchangeable operationally. The model card separates endpoint support, APIs, feature coverage, and routing IDs. If you rely on Bedrock invocation logging, IAM conditions, guardrails, PrivateLink, or an existing gateway pattern, verify the exact endpoint path before you standardize.

3. Model IDs should be allowlisted

For production, avoid broad foundation-model/* or inference-profile/* permissions as the final posture. Allowlist the approved Sonnet 5 and escalation model IDs, include the routed backing models required by your inference profile, and deny runtime use outside approved regions. Keep Marketplace subscription, model access, and logging configuration out of day-to-day engineer runtime roles.

A practical Sonnet 5 rollout on Bedrock

  1. Pick one workload. Choose a real workflow with enough volume to expose cost and enough reviewability to catch failures.
  2. Freeze the baseline. Capture current model, prompt, success rate, token use, latency, retries, review time, and failure classes.
  3. Run Sonnet 5 against the same eval set. Use the exact model ID, region, endpoint, and effort setting you would use in production.
  4. Compare against Opus 4.8 only where the decision matters. Do not run every request through Opus just because it scores higher publicly.
  5. Set routing rules. Decide what stays on Sonnet 5, what escalates to Opus, what requires human approval, and what should not run through an LLM yet.
  6. Lock the AWS controls. IAM allowlists, region restrictions, Bedrock invocation logging where supported, budgets, anomaly alerts, and owner review should be in place before expansion.
  7. Re-price after August 31. Re-run the cost model at standard pricing before converting pilot usage into a standing operating cost.

Where Sonnet 5 fits in agentic architecture

Sonnet 5 is not just a chat model upgrade. It changes the practical cost envelope for governed AI agent sandboxes on AWS, coding assistants on Bedrock, and internal automation agents that were too expensive or too brittle on prior defaults.

That does not remove the need for architecture. Agents still need scoped tools, constrained credentials, bounded memory, audit trails, fallback behavior, and cost controls. The model got stronger; the operating model still decides whether the workflow is safe to scale.

If the agent works inside Slack or shared channels, read the Claude Tag control-boundary guide as well: Claude Tag in Slack: how it works, what it can access, and a safe AWS rollout. The same principle applies: understand the identity, data, runtime, and cost boundaries before expanding access.

FAQ

Is Sonnet 5 actually cheaper than Opus 4.8?

Not automatically. Sonnet 5 has a lower standard per-token list price than Opus 4.8, and promotional pricing makes it especially attractive through August 31, 2026. But Artificial Analysis found Sonnet 5 cost more per Intelligence Index task than Opus 4.8 at standard pricing because it used more output tokens and turns. Treat Sonnet 5 as a strong model to test, not a guaranteed cost reduction.

Does Sonnet 5 replace Opus 4.8 for coding workloads?

No. It should replace many default Sonnet 4.6 and exploratory Opus tests, but not every Opus workload. Use Sonnet 5 as the first production candidate, then route to Opus 4.8 for hard, ambiguous, or failure-expensive tasks where testing proves the premium is justified.

What happens after August 31, 2026?

The AWS pricing page states that Sonnet 5 launch pricing runs through August 31, 2026. Standard pricing applies from September 1. Any pilot cost model should include both price points.

Is Sonnet 5 available in Canada or Brazil?

Check the Bedrock model card at deployment time. The current card distinguishes in-region, geo cross-region, and global cross-region routing. Do not treat source-region availability as a residency guarantee.

Do existing IAM policies automatically cover Sonnet 5?

Only if they allow the relevant model IDs and endpoint path. Production roles should be explicit: approved model IDs, approved regions, runtime-only actions, and no Marketplace or Bedrock admin permissions in engineer runtime roles.

Should we use Claude Platform on AWS or Amazon Bedrock?

Use Bedrock when you need AWS-native integration, IAM, billing, and established Bedrock operating controls. Use Claude Platform on AWS when the native Anthropic platform experience is the main requirement and the commercial relationship still needs to run through AWS. Treat them as related options, not the same architecture.

How Elevata can help

Bring one real workload, the current model path, and your AWS constraints. Elevata can help you benchmark Sonnet 5 against your existing workflow, compare it with Opus 4.8 where needed, model the post-promotion cost, and review the Bedrock controls required for production.

Useful next reads: Amazon Bedrock consulting, AWS cost optimization, and the Opus 4.8 benchmark guide.

Talk to Elevata about a Sonnet 5 workload review.

Related

Continue reading

Related reading on this topic.