July 04, 2026

Amazon Bedrock - Model Selection Guide

Bedrock model decision map

Bedrock guide says model selection must be based on its capabilities, cost, endpoint, region and throughput.

Step 1: What are you building?

Need to build a normal chatbot / assistant?

Use Converse API

Pick a model based on its capability, quality, cost, latency and context window.
This provides you a AWS-native, model-agnostic chat interface.

Good candidates are,

Amazon Nova Pro / Nova Lite / Nova 2 Lite
Anthropic Claude models
Meta Llama models
Mistral models
OpenAI gpt-oss models
Qwen models
DeepSeek models
Cohere Command models

Need raw model-specific control?

Use InvokeModel API

This is the lowest level API which can be used to interact directly with models.

Good for,

Embeddings
Image generation
Reranking
Custom payloads
Model specific parameters
Non-chat inference

Migrating OpenAI-style applications to Bedrock?

Use bedrock-mantle endpoint with Responses API or Chat Completions API

Good models are,

OpenAI GPT series / gpt-oss models
Qwen
Mistral
MiniMax
DeepSeek
NVIDIA Nemotron
Z.AI GLM
xAI Grok
Some Google Gemma models

Responses API is the newer and recommended for stateful / agentic / tools supported apps.

Chat Completions API is the older and simpler chat app where you maintain the histroy of the conversation.

Using Claude-native SDK or Anthropic Message Format?

Use Messaged API on bedrock-mantle and supported models are Claude.

Basically bedrock-mantle supportes Responses API, Chat Completions API and Messages API, while bedrock-runtime supports InvokeModel, Converse and BidirectionalStreams API.

Step 2: Choose model by workload

General enterprise assistant

For balanced quality and AWS-native integration use Amazon Nova Pro / Nova 2 Lite / Claude / Llama / Mistral with Converse API.

This is best when,

You want one common interface
You may switch models

Low-cost, fast text automation

Amazon Nova Micro / Nova Lite / Nova 2 Lite

This is useful for summarization, classification, extraction and simple Q&A type of workloads. Both Converse or InvokeModel API can be used.

Complex reasoning / coding

If you need strong reasoning or coding or software engineering then,

Claude
OpenAI GPT / gpt-oss
Mistral Devstral
Qwen Coder
DeepSeek
Kimi Thinking

Either Converse or Responses or Messages APIs can be used.

RAG Application

If you need to answer from the enterprise documents then use below three types of models.

Embedding
Optionla Reranker
Response generation

Both InvokeModel and Converse API can be used.

Image Understanding

If you need to understand the images or documents, then prefer using below model

Nova Lite / Nova Pro / Nova 2 Lite
Cluade vision-capable models
Llama vision model
Qwen VL
Pixtral
Palmyra Vision

Both InvokeModel and Converse API can be used.

Image Generation / Editing

For image generation / editing,

Amazon Nova Canvas
Titan Image Generator
Stability AI image models

Use InvoleModel API, useful for generating image, Inpaint, Outpaint, Remove background, Upscale, Style transfer, Search and replace and Erase object.

Video Generation

Amazon Nova Reel with StartAsyncInvoke.

Video generation is long-running. StartAsyncInvoke is for long-running requests where output is written to S3.

Video / Audio Embeddings

TwelveLabs Marengo Embed
Amazon Nova Multimodal Embeddings

Use:

StartAsyncInvoke for large media workloads
InvokeModel for smaller supported inputs

Real-time Voice Conversation

For speech-to-speech and live conversation, use Amazon Nova Sonic or Nova 2 Sonic with InvokeModelWithBidirectionalStream.

This keeps a full-duplex channel open so audio can flow both ways continuously.

Safety / Moderation

Bedrock Guardrails - for policy enforcement around generation
OpenAI GPT OSS Safeguard models - for moderation-style classification

Step 3: Endpoint Choice

Converse = AWS-native standard chat

InvokeModel = raw/direct model control

Responses = modern OpenAI-compatible, stateful/agentic

Chat Completions = simple OpenAI-compatible chat

Messages = Anthropic-compatible Claude style

StartAsyncInvoke = long-running media/batch jobs

BidirectionalStream = real-time voice

Search This Blog

Karthik Venkatesalu