Model Selection Guide

Choose the best model for the task

Choosing the right model for your agent directly impacts quality, speed, cost, and reliability. Different models perform better on different task types, such as tool calling, structured outputs, long-context reasoning, or content generation. Selecting the right default model ensures predictable performance in production while keeping token costs in check. The goal is not to use the most powerful model by default, but to use the most suitable one for the job.


Core Principles

  1. Start with a cost-effective default and escalate only when needed
    Begin with smaller models for standard tasks. If evaluation shows insufficient quality or reasoning depth, move to a more advanced model. Scale model capability based on actual need, not by default.

  2. Match the model to the task type
    Different models vary in strengths across:

    • Tool calling and hub queries
    • Long-context analysis
    • Structured outputs (e.g. JSON, tables)
    • Narrative generation (summaries, memos)
    • Code or document generation

    Select the model that aligns with the dominant requirement of the agent’s workflow.

  3. Optimize for consistent performance, not theoretical perfection
    Every model will occasionally fail in edge cases. Prioritize stable, high-quality performance across realistic test scenarios instead of trying to prevent every possible error. In production, consistent behavior matters more than rare moments of exceptional output.

Task to Model Mapping

This table provides guidance on mapping common agent task types to the most suitable model, based on performance characteristics such as reasoning depth, structured output reliability, speed, and cost.

| Task Type | Recommended Model Family |
|---|---|
| Tool Calling / Hub Search | GPT-family |
| Narrative Writing (Memos) | Gemini-family |
| Long Context Document Analysis | Gemini Pro / GPT-5 |
| Structured JSON/Table Outputs | GPT-family |
| Code or HTML Generation | Anthropic-family |
| Classification / Tagging | Mini / Flash / Nano |
| Light Text Cleanup | Mini / Flash / Nano |
| Pre/Post-Processing | Mini / Flash / Nano |
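
If you keep this mapping alongside your agent configuration, it can be written as a simple lookup. The sketch below is illustrative only: the dictionary keys, the name recommend_family, and the choice to fall back to the lightest tier for unknown task types are assumptions made for the example, not part of Go.

```python
# Illustrative lookup mirroring the task-to-model table.
# All names here are hypothetical; nothing in this block is a Go API.
TASK_TO_MODEL_FAMILY = {
    "tool_calling": "GPT-family",
    "narrative_writing": "Gemini-family",
    "long_context_analysis": "Gemini Pro / GPT-5",
    "structured_output": "GPT-family",
    "code_or_html_generation": "Anthropic-family",
    "classification": "Mini / Flash / Nano",
    "light_text_cleanup": "Mini / Flash / Nano",
    "pre_post_processing": "Mini / Flash / Nano",
}


def recommend_family(task_type: str) -> str:
    """Return the suggested family for a task type.

    Unknown task types fall back to the lightest tier (an assumption that
    follows the "start cheap, escalate when needed" principle).
    """
    return TASK_TO_MODEL_FAMILY.get(task_type, "Mini / Flash / Nano")
```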


Recommended Default Strategy

Use the following escalation approach when selecting models in V7 Go:

  1. Start with a smaller or lower-cost model
    Begin with a smaller model for early development and straightforward tasks. These models are typically sufficient for prompt iteration, simple transformations, and initial workflow configuration.
    • Good for: initial scaffolding, prompt experimentation, and simple transforms.
    • Example types: “Mini”, “Nano”, “Flash” tiers (provider-specific).
  2. Move to a mid- or high-tier model if quality is insufficient
    If outputs are inconsistent or lack depth, upgrade to a more capable model. This is particularly important for:
    • Complex reasoning (e.g., multi-document analysis, nuanced investment judgment)
    • Long documents (e.g., full IMs, DDQs)
    • High-stakes outputs (IC memos, external write-ups)
  3. Use top-tier reasoning models selectively
    Reserve the most advanced models for scenarios where lower tiers consistently fail. Typical use cases include:
    • Hard cases in your test set that cheaper models consistently fail on
    • Code/HTML generation where you need more robustness
    • Very complex, multi-step analysis with lots of ambiguity

This tiered approach balances cost efficiency with production-grade reliability.
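
The escalation logic can be expressed as a small loop: try the cheapest tier first and move up only when it misses your quality bar. This is a minimal sketch under stated assumptions. The tier labels, the run_model and passes callables, and the 0.8 pass-rate threshold are all hypothetical placeholders; in Go you would make the same decision by switching the model on a property and reviewing the outputs.

```python
from typing import Callable, Sequence

# Hypothetical tier labels, ordered from cheapest to most capable.
TIERS = ["light", "mid", "top"]


def pick_tier(
    run_model: Callable[[str, str], str],   # (tier, example) -> output
    passes: Callable[[str, str], bool],     # (example, output) -> acceptable?
    examples: Sequence[str],
    min_pass_rate: float = 0.8,             # assumed quality bar
) -> str:
    """Return the cheapest tier whose pass rate meets the threshold."""
    if not examples:
        raise ValueError("at least one test example is required")
    for tier in TIERS:
        passed = sum(1 for ex in examples if passes(ex, run_model(tier, ex)))
        if passed / len(examples) >= min_pass_rate:
            return tier
    # No tier met the bar: return the strongest one and review the failures.
    return TIERS[-1]
```

Because the loop stops at the first tier that clears the threshold, the stronger tiers are only ever evaluated when the cheaper ones have already failed.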

Model Comparison Matrix

| Model Tier | Best At | Weaknesses | Cost | Use When… |
|---|---|---|---|---|
| Mini / Flash / Nano | Light transforms, classification, tagging | Weak reasoning, no tool use | Lowest | You need speed, low cost, or helper steps |
| GPT 5.x | Tool calling, hub search, structured outputs | Slightly verbose, “AI-ish” tone | Mid-range | You need structure, reliability, and function calling |
| Gemini Pro | Summarization, narrative writing, long-context reads. Has the highest token context window limits. | Not ideal for tool calling, not great at Structured Output → JSON consistency. | Mid-range | You want clean, human-like language or long document analysis |
| Claude 4.x | Code, HTML templates, complex prompt logic | Expensive, slower | Highest | You’re building a template or running a complex scoring/calculation workflow |

Model Roles and When to Use Them

GPT-family models

Best at:

  • Tool calling (skills, hub search, structured workflows)
  • Hub-based retrieval and reasoning
  • Structured outputs (JSON, tables, well-formed schemas)
  • Good balance of cost, reliability, and flexibility

Use GPT models when:

  • You’re building hub-heavy workflows (e.g., screening/IM agents using Knowledge Hubs).
  • You rely on skills/triggers and need robust function calling.
  • You want predictable adherence to JSON schemas and structured outputs.


Gemini-family models

Best at:

  • Narrative writing tasks (memos, summaries, narrative sections)
  • Long context analysis (very large IMs, multi-document reads)
  • A strong balance of reasoning quality and cost, especially in recent “Pro” versions.

Use Gemini when:

  • You’re generating written outputs that need to feel more natural, especially with additional prompt tuning.
  • You’re summarizing or analyzing long documents or collections.
  • Cost is a concern, but you still need good reasoning.


Claude / “Opus” level models

Best at:

  • Code and HTML generation, including:
    • Complex HTML report templates
    • More intricate code scaffolding
  • Handling more complex prompt logic when you need extra reasoning.

Caveats:

  • Expensive relative to other options.
  • Use sparingly - for example, to generate or refine a template once, not on every run in production.

Use “Opus-level” models when:

  • You’re generating complex HTML/Python once, which you’ll then keep as a static template.
  • You’ve proven cheaper models fail on specific, complex code-generation tasks.


Mini / Nano / Flash-style models

Best at:

  • Simple, high-volume tasks:
    • Classification (tags, flags, routing)
    • Light transformations (cleaning text, simple extraction)
    • Pre/post-processing around heavier steps
  • Where “good enough” is sufficient and correctness is easy to verify.

Use Mini/Nano/Flash when:

  • You’re iterating quickly on prompts and want cheap experimentation.
  • You’re adding small “helper” properties (e.g., classify risk as Low/Med/High).
  • Latency and cost matter more than deep reasoning.

Model Providers & Cross‑Provider Equivalents

Different providers use different naming schemes, but in practice most models fall into a few functional tiers:

  • Light / Fast
  • Mid‑Tier / Balanced
  • Strong Reasoning
  • Top‑Tier Reasoning
  • Code / Template Specialists

This table shows approximate functional equivalents across the three major providers used in Go. These are not exact matches - just practical comparisons based on cost, speed, and reasoning strength.

| Use Case Tier | OpenAI | Google | Anthropic |
|---|---|---|---|
| Light / Fast | GPT‑5.2 Mini / Nano | Gemini Flash / Lite | Claude Haiku |
| Mid‑Tier Balanced | GPT‑5.2 | Gemini 2.5 Pro | Claude Sonnet |
| Strong Reasoning | GPT‑5.x Reasoning Tier | Gemini 2.5 Pro (high reasoning) | Claude Sonnet (high reasoning) |
| Top‑Tier Reasoning | GPT‑5.x Advanced | Gemini 3 Pro | Claude Opus |
| Code / Templates | GPT‑5.x (Code bias) | Gemini 3 Pro | Claude Opus |
| Long Context Analysis | GPT‑5.x | Gemini 3 Pro | Claude Sonnet / Opus |
| Narrative Writing | GPT‑5.x | Gemini 3 Pro | Claude Sonnet |
| Structured JSON / Tools | GPT‑5.x | Gemini 3 Pro | Claude Sonnet |
| High‑Volume Classification | GPT Mini / Nano | Gemini Flash | Claude Haiku |

Reasoning Levels: What They Are & When to Use Them

Adjusting the Reasoning Level in Go controls how much analytical depth the model applies to a prompt, balancing response quality and nuance against speed and token cost.

What Are Reasoning Levels?

In Go, most reasoning-capable models (e.g. GPT-5.x, Gemini Pro, Claude Sonnet/Opus) allow you to set a Reasoning Level, which adjusts how much computational depth the model uses when processing a given property or prompt.

The available settings are:

| Setting | What It Does |
|---|---|
| Auto | Allows Go to decide based on model and prompt complexity |
| Off | Fastest, lowest effort, works for trivial queries |
| Low | Slight increase in coherence, still cheap and fast |
| Mid | Balanced depth and cost, good for most tasks |
| High | Maximum reasoning depth, highest cost, used for nuanced analysis |


Using a higher reasoning level is especially useful when:

  • The prompt requires multi-step reasoning
  • The data is ambiguous or incomplete
  • You need interpretation or judgment, not just extraction

How Reasoning Impacts Performance

| Reasoning Level | Speed | Cost | Depth of Thought | Best For |
|---|---|---|---|---|
| Min | Fastest | Cheapest | Shallow | Tagging, classification, routing |
| Low | Fast | Low | Basic logic | Entity name cleanup, labeling |
| Medium | Balanced | Moderate | Strong default | Summaries, light reasoning |
| High | Slower | More expensive | Deep / multi-step | Investment merit analysis, complex summaries |

Common use case examples:

| Use Case | Suggested Reasoning Level | Why |
|---|---|---|
| Classify doc type as NDA / Lease / PO | Min | Obvious keywords, pattern match |
| Clean up company name from metadata | Low | Some fuzzy matching needed |
| Summarize “Business Model” from CIM | Medium | Requires extraction and summarization |
| Extract investment merits | High | Multi-point reasoning and judgment |
| Match PO to MSA via LLM reference | Medium or High | Requires understanding nuance between entities |
| Extract risk factors from IM | High | Needs interpretive skill and document understanding |
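
One way to read these examples is as a priority rule: judgment calls push toward High, multi-step work toward Medium, fuzzy matching toward Low, and plain pattern matching stays at Min. The sketch below encodes that reading as a heuristic. The function name, its parameters, and the ordering of the checks are assumptions made for illustration, and the level labels follow the tables in this section, which may not match the exact labels shown in the Go interface.

```python
def suggest_reasoning_level(
    needs_judgment: bool = False,   # interpretation, not just extraction
    multi_step: bool = False,       # e.g. extract, then summarize
    fuzzy_matching: bool = False,   # e.g. noisy names or metadata
) -> str:
    """Heuristic starting point for a reasoning level (illustrative only)."""
    if needs_judgment:
        return "High"
    if multi_step:
        return "Medium"
    if fuzzy_matching:
        return "Low"
    return "Min"
```

Treat the result as a starting point to test against real examples, not as a fixed rule.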

Practical Patterns in Go

1: Mixed-model pipelines

In many agents, it’s optimal to combine models:

  • Cheaper model:
    • For early-stage extraction, tagging, or routing.
  • Stronger model:
    • For final narrative sections, investment merits, risks, or sensitive conclusions.

Example:

  • Use a Mini model to:
    • Extract key fields from an IM and classify sections.
  • Use GPT/Gemini to:
    • Write “Investment Merits”, “Risks”, and “Conclusion” blocks.
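
As a rough sketch, the same two-stage split looks like this in code. The call_model callable and the model labels "mini" and "pro" are hypothetical stand-ins for whichever models you assign to each property in Go; the prompts are placeholders.

```python
from typing import Callable, Dict

# Hypothetical signature: (model_name, prompt) -> generated text.
ModelCall = Callable[[str, str], str]


def run_pipeline(
    document: str,
    call_model: ModelCall,
    cheap_model: str = "mini",    # assumed label for the light tier
    strong_model: str = "pro",    # assumed label for the stronger tier
) -> Dict[str, str]:
    """Extract with a cheap model, then write sections with a stronger one."""
    # Stage 1: the cheap model handles extraction and classification.
    fields = call_model(cheap_model, f"Extract key fields:\n{document}")

    # Stage 2: the stronger model writes each narrative section
    # from the extracted fields rather than from the full document.
    sections: Dict[str, str] = {}
    for title in ("Investment Merits", "Risks", "Conclusion"):
        prompt = f"Write the '{title}' section using:\n{fields}"
        sections[title] = call_model(strong_model, prompt)
    return sections
```

Passing only the extracted fields to the second stage keeps the expensive model's input small, which is where most of the cost saving in this pattern comes from.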

2: Use GPT for tool-heavy workflows

Where your workflow:

  • Calls hubs extensively
  • Uses skills to integrate with DealCloud, Excel, CapIQ, or other tools
  • Needs structured JSON outputs

Default to GPT-family for more reliable tool calling and schema adherence.


3: Use Gemini for output-heavy memo workflows

Where your workflow:

  • Produces long narrative outputs (e.g., screeners, IC memos)
  • Needs clear, natural, and well-written text

Bias strongly toward Gemini Pro / equivalent, especially after light prompt tuning.



How to Decide What to Use for a New Agent

When designing a new agent:

  1. Define the output first
    • Is it: classification, JSON, or narrative prose?
    • Is this for internal, low-stakes, or high-stakes decision-making?
  2. Choose an initial model:
    • Tool-heavy and structured - use GPT
    • Long-form writing - use Gemini
    • Code/HTML scaffolding - use Opus-level (for design phase only)
    • Simple routing or classification - use Mini/Nano/Flash
  3. Test on 3–5 representative examples:
    • Include edge cases to find weaknesses early on.
    • Compare models side-by-side if needed by cloning a property and switching the model.
  4. Escalate only if necessary:
    • If you see systematic errors or style issues that matter for the use case, step up to a stronger model.
    • Keep an eye on token usage once you move beyond the prototype phase.
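
If you export outputs for offline review, the side-by-side comparison in steps 3 and 4 can be summarized with a small report like the one below. This is a sketch under assumptions: run_model is a hypothetical callable returning the output and the tokens it used, and passes stands in for whatever acceptance check you apply to each example.

```python
from typing import Callable, Dict, Sequence, Tuple


def compare_models(
    models: Sequence[str],
    examples: Sequence[str],
    run_model: Callable[[str, str], Tuple[str, int]],  # -> (output, tokens used)
    passes: Callable[[str, str], bool],                # (example, output) -> ok?
) -> Dict[str, Dict[str, float]]:
    """Return pass rate and total token usage per model on the same examples."""
    if not examples:
        raise ValueError("at least one test example is required")
    report: Dict[str, Dict[str, float]] = {}
    for model in models:
        passed, tokens = 0, 0
        for example in examples:
            output, used = run_model(model, example)
            passed += int(passes(example, output))
            tokens += used
        report[model] = {"pass_rate": passed / len(examples), "tokens": tokens}
    return report
```

Reading pass rate and token usage together makes it easier to see whether a stronger model's quality gain justifies its extra cost.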


Key Takeaways

  • Prioritize accuracy over cost for critical workflows and extractions
    • Keep cheaper models for ancillary tasks around those critical steps.
  • Use Mini/Flash for volume, GPT for precision, Gemini for narrative, Opus for templates
  • Clone and test properties with different models to compare performance quickly
  • When in doubt:
    • Start with a cheaper model.
    • Benchmark against your golden test set (3–5 well-understood deals/memos).
    • If output doesn’t meet your bar for investment decisions, move up a tier.

If you are still unsure or pressed for time, you can use the AI option to let Go decide which model tier and family to use for the task at hand.