What we track, what we can't, and how the dollar math works.
Claude Max, ChatGPT Pro, Cursor, Codex — every plan you pay flat-rate for is basically a buffet. They sell unlimited(-ish) at a fixed price, betting the average customer eats less than the menu would charge à la carte. The whole game is: do you eat enough to make the buffet worth it, or are you the table they make money on?
tokenusage sits at the door of the dining room with a notebook. Every time you send a prompt, it scribbles down what you ate and the menu price. At the end of the day it adds the bill back up and compares it to what you actually paid in subscriptions. If you ate $250 of menu price on a $200 plan, congrats — you walked out with $50 worth of free food and the chef's frown.
The dashboard is just that notebook, made readable. The period filter at the top sets the time window. KPI cards show tokens and money. The subscription panel shows your loadout and how close you are to breaking even. The recent sessions list is your meal log. Everything is local — we don't watch you eat, we just count.
| Source | Where | What we count |
|---|---|---|
| Claude Code CLI | `~/.claude/projects/**/*.jsonl` | Per-call `message.usage` from each assistant turn — input, output, cache_read, cache_creation tokens. Deduped by `message.id`. |
| Codex CLI | `~/.codex/state_5.sqlite` + `~/.codex/sessions/**.jsonl` | Cumulative `total_token_usage` from the last `token_count` event of each thread (input, output, cached, reasoning). |
| Hermes gateway | `~/.hermes/state.db` | One row per session — input, output, cache_read/write tokens. Uses `actual_cost_usd` when recorded, otherwise `estimated_cost_usd`. |
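As a concrete illustration of the Hermes read path, here is a minimal sketch of "prefer `actual_cost_usd`, fall back to `estimated_cost_usd`" as a single SQL query. The `sessions` table and column names are assumptions based on the description above; the real schema in `~/.hermes/state.db` may differ.

```python
import sqlite3

def session_costs(db_path):
    # Hypothetical schema: one row per session with both cost columns.
    # COALESCE picks the recorded actual cost when present, else the estimate.
    con = sqlite3.connect(db_path)
    try:
        return con.execute(
            """
            SELECT session_id,
                   COALESCE(actual_cost_usd, estimated_cost_usd) AS cost_usd
            FROM sessions
            """
        ).fetchall()
    finally:
        con.close()
```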
    cost = input × inputRate
         + output × outputRate
         + cacheRead × cacheReadRate     (default: 10% of the input rate)
         + cacheWrite × cacheWriteRate   (default: 125% of the input rate)
         + reasoning × reasoningRate     (default: same as the output rate)

Estimates, not invoices. Provider billing has rounding, promotions, free-tier credits, and the occasional model rename. Trust the provider invoice for final numbers.
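The formula above translates directly into code. This is a sketch, not tokenusage's actual implementation; the function name and the convention of rates in USD per million tokens are ours, and the rate values in the comments are placeholders, not real pricing.

```python
def estimate_cost_usd(input_tok, output_tok, cache_read_tok=0,
                      cache_write_tok=0, reasoning_tok=0, *,
                      input_rate, output_rate,
                      cache_read_rate=None,    # default: 10% of input rate
                      cache_write_rate=None,   # default: 125% of input rate
                      reasoning_rate=None):    # default: same as output rate
    """Estimate cost. Rates are USD per 1M tokens."""
    if cache_read_rate is None:
        cache_read_rate = input_rate * 0.10
    if cache_write_rate is None:
        cache_write_rate = input_rate * 1.25
    if reasoning_rate is None:
        reasoning_rate = output_rate
    total = (input_tok * input_rate
             + output_tok * output_rate
             + cache_read_tok * cache_read_rate
             + cache_write_tok * cache_write_rate
             + reasoning_tok * reasoning_rate)
    return total / 1_000_000
```

For example, at a hypothetical $3/M input and $15/M output, a million input tokens estimates to $3.00, and a million cache-write tokens to $3.75 (125% of the input rate).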
Claude Code serializes one API response as multiple JSONL lines (one per content block — text / tool-use / thinking), each carrying the same usage. We dedupe by message.id (fallback requestId) so each API call counts once. In a typical session this trims 30–50% of duplicate token counts.
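The dedup pass described above can be sketched like this. Field names (`message.id`, `requestId`, `message.usage`) follow the description; real transcript lines carry many more keys, and this toy version simply keeps the first usage record per key.

```python
import json

def dedupe_usage(jsonl_lines):
    """Keep one usage record per API call, keyed by message.id
    with requestId as the fallback."""
    seen = set()
    usages = []
    for line in jsonl_lines:
        rec = json.loads(line)
        msg = rec.get("message") or {}
        key = msg.get("id") or rec.get("requestId")
        if key is not None:
            if key in seen:
                continue  # another content block of the same API call
            seen.add(key)
        usage = msg.get("usage")
        if usage is not None:
            usages.append(usage)
    return usages
```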
Single-user mode is fully local — no network, no telemetry. Multi-user mode sends usage counts (tokens, timestamps, cost) to the server you control. Raw conversation content is never read or transmitted.