
AI Pricing Models

How AI tool and provider pricing works — subscription quotas vs pay-per-token — and how to choose the right model for your workload.


Understanding how AI pricing works helps you make better decisions about which tool to use for a given task and avoid unexpected costs. There are two fundamentally different pricing models in use across the tools and providers at Aircury.


Subscription / Quota Model

You pay a fixed monthly fee and receive a usage quota that refreshes on a daily, weekly, or monthly cycle.

How it works: The provider converts your subscription fee into a pool of request capacity. Requests above your quota are either blocked (you wait for the quota to refresh) or charged as overages.
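The refresh-or-block behaviour described above can be sketched as a toy model. This is illustrative only — `QuotaPool` and its numbers are hypothetical, not any provider's actual billing logic:

```python
from dataclasses import dataclass

@dataclass
class QuotaPool:
    """Toy model of a subscription quota pool (hypothetical, illustrative numbers)."""
    monthly_credit: float      # e.g. a $20-equivalent pool of request capacity
    used: float = 0.0
    allow_overage: bool = False
    overage_charged: float = 0.0

    def spend(self, cost: float) -> bool:
        """Try to spend `cost` from the pool.

        Returns False (request blocked) when the pool is exhausted and
        overages are disabled; otherwise the request goes through.
        """
        if self.used + cost <= self.monthly_credit:
            self.used += cost
            return True
        if self.allow_overage:
            self.overage_charged += cost  # billed on top of the flat fee
            return True
        return False  # blocked until the quota refreshes

    def refresh(self) -> None:
        """Reset at the start of each billing cycle (daily/weekly/monthly)."""
        self.used = 0.0
        self.overage_charged = 0.0
```

With a $20 pool, a $15 spend succeeds, a further $10 spend is blocked until `refresh()` — unless overages are enabled, in which case it succeeds and accrues an extra charge.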

Examples at Aircury:

| Tool / Plan | Monthly cost | Quota style |
|---|---|---|
| Cursor Pro | ~$20/month | Monthly credit pool (~$20 equivalent); unlimited Tab completions |
| Claude Pro | $20/month | Daily usage limits; refreshes every 24 hours |
| Claude Max ($100) | $100/month | 5× Pro limits |
| ChatGPT Plus | $20/month | Daily message limits per model tier |
| OpenCode Go | $10/month | $12 per 5 hours, $30/week, $60/month cap |

Pros:

  • Predictable cost — you know exactly what you spend each month
  • Generally cheaper per request for heavy, sustained daily use
  • No surprise bills

Cons:

  • Quota can run out mid-session, forcing you to wait or switch models
  • You pay the same whether you use it or not
  • Less flexible for bursts of high-volume work (e.g. processing a large codebase)

Pay-per-Token Model

You pay for exactly the tokens you consume — every token sent and received costs money. There is no base fee and no monthly quota.

How it works: Input tokens (your prompt + context) and output tokens (the model’s response) are billed separately per million tokens. A typical coding interaction might use 2,000–8,000 tokens total.
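The billing rule above reduces to a one-line formula. A minimal sketch, using the Claude Sonnet 4.6 rates from the pricing table in this section ($3 in / $15 out per 1M tokens):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one request under per-token billing.

    input_per_m / output_per_m are the provider's rates per 1M tokens.
    """
    return (input_tokens * input_per_m / 1_000_000
            + output_tokens * output_per_m / 1_000_000)

# Claude Sonnet 4.6: $3.00 input / $15.00 output per 1M tokens
request_cost(2_000, 2_000, 3.00, 15.00)  # ≈ $0.036
```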

Examples at Aircury:

| Provider | Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|---|
| AWS Bedrock | Claude Sonnet 4.6 | $3.00 | $15.00 |
| AWS Bedrock | Claude Opus 4.6 | $5.00 | $25.00 |
| AWS Bedrock | Claude Haiku 4.5 | $1.00 | $5.00 |
| Anthropic API (direct) | Claude Sonnet 4.6 | $3.00 | $15.00 |
| OpenAI API | GPT-4o | $2.50 | $10.00 |

What does a request actually cost?

A 4,000-token interaction (2,000 input + 2,000 output) with Claude Sonnet 4.6 costs:

  • Input: 2,000 × ($3.00 / 1,000,000) = $0.006
  • Output: 2,000 × ($15.00 / 1,000,000) = $0.030
  • Total: ~$0.036 per request

At that rate, you would need ~555 requests to reach $20 — roughly the cost of a Claude Pro subscription. If you make fewer requests, pay-per-token is cheaper. If you make more, a subscription wins.
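The break-even point in the paragraph above can be computed directly (a sketch; $0.036 is the worked per-request cost from this section):

```python
def break_even_requests(subscription_fee: float, cost_per_request: float) -> float:
    """Monthly request count at which pay-per-token spend equals the flat fee.

    Below this count, pay-per-token is cheaper; above it, the subscription wins.
    """
    return subscription_fee / cost_per_request

break_even_requests(20.00, 0.036)  # ≈ 555.6 requests per month
```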

Pros:

  • No quota limits — you can run large-scale or burst workloads freely
  • Pay only for what you use — ideal for occasional or variable usage
  • Access to the full model without rate limiting

Cons:

  • Costs can be unpredictable — a session with a large codebase context can be expensive
  • Output-heavy tasks (long explanations, full file rewrites) are particularly costly
  • Requires active monitoring to avoid bill surprises

Context size multiplies cost

Pay-per-token pricing makes large context windows expensive. A 100KB codebase sent as context is roughly 25,000 tokens (at a typical ~4 characters per token), which with Claude Sonnet 4.6 costs about $0.075 per request in input tokens alone — before any output, and repeated on every message. Keep context lean when using token-billed providers.
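The recurring-context cost can be estimated as follows. The ~4-characters-per-token figure is a rough heuristic for English text and code, not an exact tokenizer:

```python
def context_input_cost(context_bytes: int, input_per_m: float,
                       chars_per_token: float = 4.0) -> float:
    """Estimated input-token cost of resending a context blob on each request.

    chars_per_token=4 is a rough heuristic; real token counts vary by
    tokenizer and content. input_per_m is the input rate per 1M tokens.
    """
    tokens = context_bytes / chars_per_token
    return tokens * input_per_m / 1_000_000

# 100KB of code ≈ 25,000 tokens at $3.00/1M input → ~$0.075 per request
context_input_cost(100_000, 3.00)
```

Because this cost is paid on every request that resends the context, a 100-message session over the same 100KB codebase would spend several dollars on input tokens alone.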


Hybrid Model

Some tools sit between the two extremes.

OpenCode Zen is pay-per-token but routes through OpenCode’s infrastructure with zero data retention and pre-negotiated model access — effectively a managed API layer. Pricing tracks the underlying provider rates. Some models are available free (with data training caveats; see the AI Tools page).

Cursor BYOK (Bring Your Own Key) lets you attach your own API key to Cursor, bypassing Cursor’s credit system for standard chat. You pay per token directly to the provider. Cursor’s own Tab completions and agent features still run on Cursor’s infrastructure regardless.


Choosing the Right Model

| Use case | Recommended pricing model | Why |
|---|---|---|
| Daily coding assistance, regular use | Subscription (Cursor Pro, Claude Pro) | Predictable cost, no quota anxiety for normal workloads |
| Client work requiring maximum privacy | Pay-per-token via AWS Bedrock | Structural data isolation; worth the higher cost |
| Occasional large tasks (refactors, audits) | Pay-per-token (direct API or Bedrock) | Subscription quota would drain quickly; pay only for what you use |
| Experimental or one-off requests | Free tiers (Antigravity preview, OpenCode free models) | No cost — but check privacy rules before using on client code |
| Sustained high-volume agent runs | Subscription with higher tier (Claude Max, Cursor Pro+) | Avoids per-token costs adding up; flat rate is cheaper at scale |

When in doubt, use a subscription tool

For most day-to-day work, subscription tools (Cursor, Claude Code with Pro plan) are the right default. Reserve AWS Bedrock and direct API access for situations where privacy requirements or workload volume specifically justify them.