Amazon Bedrock - Model Selection Guide
Bedrock model decision map
Bedrock guide says model selection must be based on its capabilities, cost, endpoint, region and throughput.
Step 1: What are you building?
Need to build a normal chatbot / assistant?
Use Converse API
- Pick a model based on its capability, quality, cost, latency and context window.
- This provides you a AWS-native, model-agnostic chat interface.
Good candidates are,
- Amazon Nova Pro / Nova Lite / Nova 2 Lite
- Anthropic Claude models
- Meta Llama models
- Mistral models
- OpenAI gpt-oss models
- Qwen models
- DeepSeek models
- Cohere Command models
Need raw model-specific control?
Use InvokeModel API
This is the lowest level API which can be used to interact directly with models.
Good for,
- Embeddings
- Image generation
- Reranking
- Custom payloads
- Model specific parameters
- Non-chat inference
Migrating OpenAI-style applications to Bedrock?
Use bedrock-mantle endpoint with Responses API or Chat Completions API
Good models are,
- OpenAI GPT series / gpt-oss models
- Qwen
- Mistral
- MiniMax
- DeepSeek
- NVIDIA Nemotron
- Z.AI GLM
- xAI Grok
- Some Google Gemma models
Responses API is the newer and recommended for stateful / agentic / tools supported apps.
Chat Completions API is the older and simpler chat app where you maintain the histroy of the conversation.
Using Claude-native SDK or Anthropic Message Format?
Use Messaged API on bedrock-mantle and supported models are Claude.
Basically bedrock-mantle supportes Responses API, Chat Completions API and Messages API, while bedrock-runtime supports InvokeModel, Converse and BidirectionalStreams API.
Step 2: Choose model by workload
General enterprise assistant
For balanced quality and AWS-native integration use Amazon Nova Pro / Nova 2 Lite / Claude / Llama / Mistral with Converse API.
This is best when,
- You want one common interface
- You may switch models
Low-cost, fast text automation
Amazon Nova Micro / Nova Lite / Nova 2 Lite
This is useful for summarization, classification, extraction and simple Q&A type of workloads. Both Converse or InvokeModel API can be used.
Complex reasoning / coding
If you need strong reasoning or coding or software engineering then,
- Claude
- OpenAI GPT / gpt-oss
- Mistral Devstral
- Qwen Coder
- DeepSeek
- Kimi Thinking
Either Converse or Responses or Messages APIs can be used.
RAG Application
If you need to answer from the enterprise documents then use below three types of models.
- Embedding
- Optionla Reranker
- Response generation
Both InvokeModel and Converse API can be used.
Image Understanding
If you need to understand the images or documents, then prefer using below model
- Nova Lite / Nova Pro / Nova 2 Lite
- Cluade vision-capable models
- Llama vision model
- Qwen VL
- Pixtral
- Palmyra Vision
Both InvokeModel and Converse API can be used.
Image Generation / Editing
For image generation / editing,
- Amazon Nova Canvas
- Titan Image Generator
- Stability AI image models
Use InvoleModel API, useful for generating image, Inpaint, Outpaint, Remove background, Upscale, Style transfer, Search and replace and Erase object.
Video Generation
Amazon Nova Reel with StartAsyncInvoke.
Video generation is long-running. StartAsyncInvoke is for long-running requests where output is written to S3.
Video / Audio Embeddings
- TwelveLabs Marengo Embed
- Amazon Nova Multimodal Embeddings
Use:
- StartAsyncInvoke for large media workloads
- InvokeModel for smaller supported inputs
Real-time Voice Conversation
For speech-to-speech and live conversation, use Amazon Nova Sonic or Nova 2 Sonic with InvokeModelWithBidirectionalStream.
This keeps a full-duplex channel open so audio can flow both ways continuously.
Safety / Moderation
- Bedrock Guardrails - for policy enforcement around generation
- OpenAI GPT OSS Safeguard models - for moderation-style classification
Step 3: Endpoint Choice
Converse = AWS-native standard chat
InvokeModel = raw/direct model control
Responses = modern OpenAI-compatible, stateful/agentic
Chat Completions = simple OpenAI-compatible chat
Messages = Anthropic-compatible Claude style
StartAsyncInvoke = long-running media/batch jobs
BidirectionalStream = real-time voice


Comments
Post a Comment