AI Pricing Models
How AI tool and provider pricing works — subscription quotas vs pay-per-token — and how to choose the right model for your workload.
Understanding how AI pricing works helps you make better decisions about which tool to use for a given task and avoid unexpected costs. There are two fundamentally different pricing models in use across the tools and providers at Aircury.
Subscription / Quota Model
You pay a fixed monthly fee and receive a usage quota that refreshes on a daily, weekly, or monthly cycle.
How it works: The provider converts your subscription fee into a pool of request capacity. Requests above your quota are either blocked (you wait for the quota to refresh) or charged as overages.
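The two outcomes described above (blocked until refresh, or billed as overage) can be sketched as a toy quota pool. This is an illustration only: the class name, the request-count quota, and the overage price are all hypothetical, and real providers usually meter tokens or credits rather than raw request counts.

```python
from dataclasses import dataclass


@dataclass
class QuotaPool:
    """Toy model of a subscription quota (all names are illustrative)."""

    quota: int                   # requests included per refresh cycle
    allow_overage: bool = False  # True: bill extra requests; False: block them
    overage_price: float = 0.0   # hypothetical USD price per extra request
    used: int = 0
    overage_cost: float = 0.0

    def request(self) -> bool:
        """Return True if the request is served, False if it is blocked."""
        if self.used < self.quota:
            self.used += 1
            return True
        if self.allow_overage:
            # Past the included quota: serve the request but bill it.
            self.overage_cost += self.overage_price
            return True
        return False

    def refresh(self) -> None:
        """Start a new cycle: the included quota becomes available again."""
        self.used = 0


# Plan that blocks once the quota is used up.
blocking = QuotaPool(quota=3)
blocking_results = [blocking.request() for _ in range(4)]

# Plan that keeps serving requests and charges for the extras.
billed = QuotaPool(quota=3, allow_overage=True, overage_price=0.04)
billed_results = [billed.request() for _ in range(5)]
```

With the blocking plan the fourth request is refused until `refresh()` runs. With the overage plan every request is served and the two extra requests accumulate in `overage_cost`.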
Examples at Aircury:
| Tool / Plan | Monthly cost | Quota style |
|---|---|---|
| Cursor Pro | ~$20/month | Monthly credit pool (~$20 equivalent); unlimited Tab completions |
| Claude Pro | $20/month | Daily usage limits; refreshes every 24 hours |
| Claude Max ($100) | $100/month | 5× Pro limits |
| ChatGPT Plus | $20/month | Daily message limits per model tier |
| OpenCode Go | $10/month | Usage caps: $12 per 5 hours, $30 per week, $60 per month |
Pros:
- Predictable cost — you know exactly what you spend each month
- Generally cheaper per request for heavy, sustained daily use
- No surprise bills
Cons:
- Quota can run out mid-session, forcing you to wait or switch models
- You pay the same whether you use it or not
- Less flexible for bursts of high-volume work (e.g. processing a large codebase)
Pay-per-Token Model
You pay for exactly the tokens you consume: every token sent and received is billed. A token is a chunk of text, typically a few characters long, not a single character. There is no base fee and no monthly quota.
How it works: Input tokens (your prompt + context) and output tokens (the model’s response) are billed at separate rates, each quoted per million tokens. A typical coding interaction might use 2,000–8,000 tokens total.
Examples at Aircury:
| Provider | Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|---|
| AWS Bedrock | Claude Sonnet 4.6 | $3.00 | $15.00 |
| AWS Bedrock | Claude Opus 4.6 | $5.00 | $25.00 |
| AWS Bedrock | Claude Haiku 4.5 | $1.00 | $5.00 |
| Anthropic API (direct) | Claude Sonnet 4.6 | $3.00 | $15.00 |
| OpenAI API | GPT-4o | $2.50 | $10.00 |
What does a request actually cost?
A 4,000-token interaction (2,000 input + 2,000 output) with Claude Sonnet 4.6 costs:
- Input: 2,000 × ($3.00 / 1,000,000) = $0.006
- Output: 2,000 × ($15.00 / 1,000,000) = $0.030
- Total: ~$0.036 per request
At that rate, $20 (the cost of a Claude Pro subscription) covers about 555 requests per month. If you make fewer requests of this size, pay-per-token is cheaper. If you make more, a subscription wins, provided its quota actually covers that volume.
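The same arithmetic can be written as a small calculator, which makes it easy to rerun the comparison with other token counts or prices. This is a minimal sketch: the function names are illustrative, and the prices are the Claude Sonnet 4.6 figures from the table above.

```python
PER_MILLION = 1_000_000


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD of one request; prices are USD per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / PER_MILLION


def requests_covered_by_fee(monthly_fee: float, cost_per_request: float) -> int:
    """Whole requests that the same money would buy under token billing."""
    return int(monthly_fee // cost_per_request)


# Figures from the worked example: 2,000 input + 2,000 output tokens
# at $3.00 (input) and $15.00 (output) per million tokens.
cost = request_cost(2_000, 2_000, input_price=3.00, output_price=15.00)
covered = requests_covered_by_fee(monthly_fee=20.00, cost_per_request=cost)

print(f"cost per request: ${cost:.3f}")
print(f"requests covered by a $20 fee: {covered}")
```

The comparison only looks at price. It does not model whether a subscription's quota would actually allow that many requests, so treat the result as a rough threshold rather than a guarantee.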
Pros:
- No quota limits — you can run large-scale or burst workloads freely
- Pay only for what you use — ideal for occasional or variable usage
- Access to the full model without subscription-style usage caps (provider rate limits, such as requests or tokens per minute, still apply)
Cons:
- Costs can be unpredictable — a session with a large codebase context can be expensive
- Output-heavy tasks (long explanations, full file rewrites) are particularly costly
- Requires active monitoring to avoid bill surprises
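One way to do the monitoring mentioned in the last point is to keep a running total of spend and flag it when it approaches a budget. The sketch below is hypothetical: the class, the 80% warning threshold, and the $1.00 budget are assumptions chosen for illustration, and in real use the token counts would come from the usage figures the provider reports for each response.

```python
class BudgetTracker:
    """Accumulates token spend and reports status against a soft limit."""

    def __init__(self, monthly_budget: float, warn_fraction: float = 0.8):
        self.monthly_budget = monthly_budget  # USD
        self.warn_fraction = warn_fraction    # warn at this share of the budget
        self.spent = 0.0

    def record(self, input_tokens: int, output_tokens: int,
               input_price: float, output_price: float) -> str:
        """Add one request's cost (prices in USD per million tokens)."""
        self.spent += (input_tokens * input_price
                       + output_tokens * output_price) / 1_000_000
        if self.spent >= self.monthly_budget:
            return "over_budget"
        if self.spent >= self.warn_fraction * self.monthly_budget:
            return "warning"
        return "ok"


# Six requests of 50,000 input + 2,000 output tokens at Sonnet-style prices
# ($3.00 / $15.00 per million), i.e. $0.18 each, against a $1.00 budget.
tracker = BudgetTracker(monthly_budget=1.00)
statuses = [tracker.record(50_000, 2_000, 3.00, 15.00) for _ in range(6)]
```

Here the fifth request crosses the warning threshold and the sixth exceeds the budget, which is the point at which a team might pause a job or switch to a cheaper model.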
Context size multiplies cost
Pay-per-token pricing makes large context windows expensive. Sending 100,000 tokens of codebase context (roughly 400 KB of source, assuming about 4 characters per token) on every request with Claude Sonnet 4.6 costs about $0.30 per request in input tokens alone, before any output. Keep context lean when using token-billed providers.
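To see how quickly repeated context adds up, the estimate can be sketched in a few lines. The $0.30 figure corresponds to about 100,000 input tokens at $3.00 per million. The 4-characters-per-token ratio used below is a rough heuristic and an assumption here: real tokenizers vary, and source code often tokenizes less efficiently than prose.

```python
CHARS_PER_TOKEN = 4  # rough heuristic (assumption); actual ratios vary by tokenizer


def estimate_tokens(context_bytes: int,
                    chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Approximate token count for plain-text context of the given size."""
    return round(context_bytes / chars_per_token)


def context_cost(context_tokens: int, input_price: float,
                 requests: int = 1) -> float:
    """Input-token cost in USD of resending the same context on every request."""
    return context_tokens * input_price * requests / 1_000_000


# About 400 KB of source is roughly 100,000 tokens under the heuristic.
tokens = estimate_tokens(400_000)

# Claude Sonnet 4.6 input price from the table: $3.00 per million tokens.
per_request = context_cost(tokens, input_price=3.00)
per_session = context_cost(tokens, input_price=3.00, requests=40)

print(f"{tokens} tokens: ${per_request:.2f} per request, "
      f"${per_session:.2f} over 40 requests")
```

The 40-request session is an arbitrary illustration. The point is that context cost scales linearly with both context size and the number of requests that resend it.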
Hybrid Model
Some tools sit between the two extremes.
OpenCode Zen is pay-per-token but routes through OpenCode’s infrastructure with zero data retention and pre-negotiated model access — effectively a managed API layer. Pricing tracks the underlying provider rates. Some models are available free (with data training caveats; see the AI Tools page).
Cursor BYOK (Bring Your Own Key) lets you attach your own API key to Cursor, bypassing Cursor’s credit system for standard chat. You pay per token directly to the provider. Cursor’s own Tab completions and agent features still run on Cursor’s infrastructure regardless.
Choosing the Right Model
| Use case | Recommended pricing model | Why |
|---|---|---|
| Daily coding assistance, regular use | Subscription (Cursor Pro, Claude Pro) | Predictable cost, no quota anxiety for normal workloads |
| Client work requiring maximum privacy | Pay-per-token via AWS Bedrock | Structural data isolation; worth the higher cost |
| Occasional large tasks (refactors, audits) | Pay-per-token (direct API or Bedrock) | Subscription quota would drain quickly; pay only for what you use |
| Experimental or one-off requests | Free tiers (Antigravity preview, OpenCode free models) | No cost — but check privacy rules before using on client code |
| Sustained high-volume agent runs | Subscription with higher tier (Claude Max, Cursor Pro+) | Avoids per-token costs adding up; flat rate is cheaper at scale |
When in doubt, use a subscription tool
For most day-to-day work, subscription tools (Cursor, Claude Code with Pro plan) are the right default. Reserve AWS Bedrock and direct API access for situations where privacy requirements or workload volume specifically justify them.