AI Pricing Models

Understanding how AI pricing works helps you make better decisions about which tool to use for a given task and avoid unexpected costs. There are two fundamentally different pricing models in use across the tools and providers at Aircury.

Subscription / Quota Model

You pay a fixed monthly fee and receive a usage quota that refreshes on a daily, weekly, or monthly cycle.

How it works: The provider converts your subscription fee into a pool of request capacity. Requests above your quota are either blocked (you wait for the quota to refresh) or charged as overages.
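The block-or-overage behaviour can be sketched as a small model. This is only an illustration: `QuotaPool`, its fields, and the idea of counting quota in whole requests are assumptions made for the example, not how any particular provider meters usage.

```python
from dataclasses import dataclass


@dataclass
class QuotaPool:
    """Hypothetical model of a subscription quota; not a real provider API."""
    capacity: int                # requests allowed per refresh cycle (assumed unit)
    allow_overage: bool          # True: bill extra requests; False: block them
    overage_price: float = 0.0   # assumed price per extra request, in dollars
    used: int = 0
    overage_cost: float = 0.0

    def request(self) -> str:
        """Record one request and report how it was handled."""
        if self.used < self.capacity:
            self.used += 1
            return "served"
        if self.allow_overage:
            self.overage_cost += self.overage_price
            return "served_overage"
        return "blocked"

    def refresh(self) -> None:
        """Reset usage at the start of a new cycle (daily, weekly, or monthly)."""
        self.used = 0


pool = QuotaPool(capacity=2, allow_overage=False)
results = [pool.request() for _ in range(3)]
print(results)         # ['served', 'served', 'blocked']
pool.refresh()
print(pool.request())  # served
```

Real plans usually meter by tokens, credits, or model tier rather than a flat request count, so treat the counter here as a stand-in for whatever unit the plan uses.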

Examples at Aircury:

Tool / Plan	Monthly cost	Quota style
Cursor Pro	~$20/month	Monthly credit pool (~$20 equivalent); unlimited Tab completions
Claude Pro	$20/month	Daily usage limits; refreshes every 24 hours
Claude Max ($100)	$100/month	5× Pro limits
ChatGPT Plus	$20/month	Daily message limits per model tier
OpenCode Go	$10/month	$12 per 5 hours, $30/week, $60/month cap

Pros:

Predictable cost — you know exactly what you spend each month
Generally cheaper per request for heavy, sustained daily use
No surprise bills

Cons:

Quota can run out mid-session, forcing you to wait or switch models
You pay the same whether you use it or not
Less flexible for bursts of high-volume work (e.g. processing a large codebase)

Pay-per-Token Model

You pay for exactly the tokens you consume: every token sent to the model and every token it returns is billed. There is no base fee and no monthly quota.

How it works: Input tokens (your prompt + context) and output tokens (the model’s response) are billed at separate rates, each quoted per million tokens. A typical coding interaction might use 2,000–8,000 tokens total.

Examples at Aircury:

Provider	Model	Input / 1M tokens	Output / 1M tokens
AWS Bedrock	Claude Sonnet 4.6	$3.00	$15.00
AWS Bedrock	Claude Opus 4.6	$5.00	$25.00
AWS Bedrock	Claude Haiku 4.5	$1.00	$5.00
Anthropic API (direct)	Claude Sonnet 4.6	$3.00	$15.00
OpenAI API	GPT-4o	$2.50	$10.00

What does a request actually cost?

A 4,000-token interaction (2,000 input + 2,000 output) with Claude Sonnet 4.6 costs:

Input: 2,000 × ($3.00 / 1,000,000) = $0.006
Output: 2,000 × ($15.00 / 1,000,000) = $0.030
Total: ~$0.036 per request
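The same arithmetic as a small helper, useful for trying other token counts or the other rates in the table above. The function name `request_cost` and its parameters are illustrative, not part of any provider SDK.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one request, given prices per million tokens."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Claude Sonnet 4.6 rates from the table: $3.00 input, $15.00 output per 1M tokens
cost = request_cost(2_000, 2_000, 3.00, 15.00)
print(f"${cost:.3f}")  # $0.036
```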

At that rate, about 555 requests add up to $20, the monthly cost of a Claude Pro subscription. If you make fewer requests than that in a month, pay-per-token is cheaper. If you make more, a subscription wins, provided its quota covers your usage.
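A minimal sketch of the break-even comparison, assuming a flat average cost per request and ignoring subscription quota limits. The helper names are hypothetical.

```python
import math


def requests_within_fee(subscription_fee, cost_per_request):
    """How many pay-per-token requests fit inside the subscription fee."""
    return math.floor(subscription_fee / cost_per_request)


def cheaper_option(requests_per_month, subscription_fee, cost_per_request):
    """Compare a month of pay-per-token spend against the flat fee."""
    token_spend = requests_per_month * cost_per_request
    return "pay-per-token" if token_spend < subscription_fee else "subscription"


break_even = requests_within_fee(20.00, 0.036)
print(break_even)                          # 555
print(cheaper_option(300, 20.00, 0.036))   # pay-per-token
print(cheaper_option(1000, 20.00, 0.036))  # subscription
```

In practice the average cost per request varies with context size and output length, so it is worth rerunning the comparison with figures from your own usage.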

Pros:

No quota limits — you can run large-scale or burst workloads freely
Pay only for what you use — ideal for occasional or variable usage
Access to the full model without subscription-style throttling (provider API rate limits still apply)

Cons:

Costs can be unpredictable — a session with a large codebase context can be expensive
Output-heavy tasks (long explanations, full file rewrites) are particularly costly
Requires active monitoring to avoid bill surprises

Context size multiplies cost

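Pay-per-token pricing makes large context windows expensive. At a rough 3 to 4 characters per token, a 100 KB codebase is about 25,000 to 33,000 tokens, so sending it as context on every request with Claude Sonnet 4.6 costs roughly $0.08 to $0.10 per request in input tokens alone, before any output. Over a session of many requests that adds up quickly, so keep context lean when using token-billed providers.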
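To see how context size scales the bill, the sketch below computes the input-side cost of resending the same context on every request. It assumes the Claude Sonnet 4.6 input rate of $3.00 per million tokens from the table above and ignores any prompt-caching discounts a provider may offer. The function name and the 50-request session are illustrative.

```python
def context_input_cost(context_tokens, requests, input_price_per_m=3.00):
    """Input-side cost of resending the same context on every request."""
    return context_tokens * requests * input_price_per_m / 1_000_000


# Compare a lean, a medium, and a very large context over one and fifty requests
for tokens in (5_000, 25_000, 100_000):
    per_request = context_input_cost(tokens, 1)
    per_session = context_input_cost(tokens, 50)
    print(f"{tokens:>7} tokens: ${per_request:.3f}/request, "
          f"${per_session:.2f} per 50 requests")
```

Cost grows linearly with both context size and request count, so trimming context is the most direct lever on a token-billed provider.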

Hybrid Model

Some tools sit between the two extremes.

OpenCode Zen is pay-per-token but routes through OpenCode’s infrastructure with zero data retention and pre-negotiated model access — effectively a managed API layer. Pricing tracks the underlying provider rates. Some models are available free (with data training caveats; see the AI Tools page).

Cursor BYOK (Bring Your Own Key) lets you attach your own API key to Cursor, bypassing Cursor’s credit system for standard chat. You pay per token directly to the provider. Cursor’s own Tab completions and agent features still run on Cursor’s infrastructure regardless.

Choosing the Right Model

Use case	Recommended pricing model	Why
Daily coding assistance, regular use	Subscription (Cursor Pro, Claude Pro)	Predictable cost, no quota anxiety for normal workloads
Client work requiring maximum privacy	Pay-per-token via AWS Bedrock	Structural data isolation; worth the higher cost
Occasional large tasks (refactors, audits)	Pay-per-token (direct API or Bedrock)	Subscription quota would drain quickly; pay only for what you use
Experimental or one-off requests	Free tiers (Antigravity preview, OpenCode free models)	No cost — but check privacy rules before using on client code
Sustained high-volume agent runs	Subscription with higher tier (Claude Max, Cursor Pro+)	Avoids per-token costs adding up; flat rate is cheaper at scale

When in doubt, use a subscription tool

For most day-to-day work, subscription tools (Cursor, Claude Code with Pro plan) are the right default. Reserve AWS Bedrock and direct API access for situations where privacy requirements or workload volume specifically justify them.