Amazon Bedrock - Model Selection Guide

Bedrock model decision map

Bedrock guide says model selection must be based on its capabilities, cost, endpoint, region and throughput.

Step 1: What are you building?

Need to build a normal chatbot / assistant?

Use Converse API

  • Pick a model based on its capability, quality, cost, latency and context window.
  • This provides you a AWS-native, model-agnostic chat interface.
Good candidates are,
  • Amazon Nova Pro / Nova Lite / Nova 2 Lite
  • Anthropic Claude models
  • Meta Llama models
  • Mistral models
  • OpenAI gpt-oss models
  • Qwen models
  • DeepSeek models
  • Cohere Command models

Need raw model-specific control?

Use InvokeModel API

This is the lowest level API which can be used to interact directly with models.

Good for,
  • Embeddings
  • Image generation
  • Reranking
  • Custom payloads
  • Model specific parameters
  • Non-chat inference

Migrating OpenAI-style applications to Bedrock?

Use bedrock-mantle endpoint with Responses API or Chat Completions API

Good models are,
  • OpenAI GPT series / gpt-oss models
  • Qwen
  • Mistral
  • MiniMax
  • DeepSeek
  • NVIDIA Nemotron
  • Z.AI GLM
  • xAI Grok
  • Some Google Gemma models
Responses API is the newer and recommended for stateful / agentic / tools supported apps.
Chat Completions API is the older and simpler chat app where you maintain the histroy of the conversation.

Using Claude-native SDK or Anthropic Message Format?

Use Messaged API on bedrock-mantle and supported models are Claude.

Basically bedrock-mantle supportes Responses API, Chat Completions API and Messages API, while bedrock-runtime supports InvokeModel, Converse and BidirectionalStreams API.

Step 2: Choose model by workload

General enterprise assistant

For balanced quality and AWS-native integration use Amazon Nova Pro / Nova 2 Lite / Claude / Llama / Mistral with Converse API.

This is best when,
  • You want one common interface
  • You may switch models

Low-cost, fast text automation

Amazon Nova Micro / Nova Lite / Nova 2 Lite

This is useful for summarization, classification, extraction and simple Q&A type of workloads. Both Converse or InvokeModel API can be used.

Complex reasoning / coding 

If you need strong reasoning or coding or software engineering then,
  • Claude 
  • OpenAI GPT / gpt-oss
  • Mistral Devstral
  • Qwen Coder
  • DeepSeek
  • Kimi Thinking
Either Converse or Responses or Messages APIs can be used.

RAG Application

If you need to answer from the enterprise documents then use below three types of models.
  • Embedding
  • Optionla Reranker
  • Response generation
Both InvokeModel and Converse API can be used.

Image Understanding

If you need to understand the images or documents, then prefer using below model
  • Nova Lite / Nova Pro / Nova 2 Lite
  • Cluade vision-capable models
  • Llama vision model
  • Qwen VL
  • Pixtral
  • Palmyra Vision
Both InvokeModel and Converse API can be used.

Image Generation / Editing

For image generation / editing,
  • Amazon Nova Canvas
  • Titan Image Generator
  • Stability AI image models
Use InvoleModel API, useful for generating image, Inpaint, Outpaint, Remove background, Upscale, Style transfer, Search and replace and Erase object.

Video Generation

Amazon Nova Reel with StartAsyncInvoke.

Video generation is long-running. StartAsyncInvoke is for long-running requests where output is written to S3.

Video / Audio Embeddings

  • TwelveLabs Marengo Embed
  • Amazon Nova Multimodal Embeddings
Use:
  • StartAsyncInvoke for large media workloads
  • InvokeModel for smaller supported inputs

Real-time Voice Conversation

For speech-to-speech and live conversation, use Amazon Nova Sonic or Nova 2 Sonic with InvokeModelWithBidirectionalStream.

This keeps a full-duplex channel open so audio can flow both ways continuously.

Safety / Moderation

  • Bedrock Guardrails - for policy enforcement around generation
  • OpenAI GPT OSS Safeguard models - for moderation-style classification

Step 3: Endpoint Choice

Converse = AWS-native standard chat
InvokeModel = raw/direct model control
Responses = modern OpenAI-compatible, stateful/agentic
Chat Completions = simple OpenAI-compatible chat
Messages = Anthropic-compatible Claude style
StartAsyncInvoke = long-running media/batch jobs
BidirectionalStream = real-time voice

Comments

Popular Posts