What we track, what we can't, and how the dollar math works.
Claude Max, ChatGPT Pro, Cursor, Codex — every plan you pay flat-rate for is basically a buffet. They sell unlimited(-ish) at a fixed price, betting the average customer eats less than the menu would charge à la carte. The whole game is: do you eat enough to make the buffet worth it, or are you the table they make money on?
tokenusage sits at the door of the dining room with a notebook. Every time you send a prompt, it scribbles down what you ate and the menu price. At the end of the day it adds the bill back up and compares it to what you actually paid in subscriptions. If you ate $250 of menu price on a $200 plan, congrats — you walked out with $50 worth of free food and the chef's frown.
The dashboard is just that notebook, made readable. The period filter at the top sets the time window. KPI cards show tokens and money. The subscription panel shows your loadout and how close you are to breaking even. The recent sessions list is your meal log. Everything is local — we don't watch you eat, we just count.
| Source | Where | What we count |
|---|---|---|
| Claude Code CLI | `~/.claude/projects/**/*.jsonl` | Per-call `message.usage` from each assistant turn — input, output, cache_read, cache_creation tokens. Deduped by `message.id`. |
| Codex CLI | `~/.codex/state_5.sqlite` + `~/.codex/sessions/**.jsonl` | Cumulative `total_token_usage` from the last `token_count` event of each thread (input, output, cached, reasoning). |
| Hermes gateway | `~/.hermes/state.db` | One row per session — input, output, cache_read/write tokens. Uses `actual_cost_usd` when recorded, otherwise `estimated_cost_usd`. |
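As a concrete illustration of the Hermes read path, here is a minimal sketch of "prefer `actual_cost_usd`, fall back to `estimated_cost_usd`" as a single SQL query. The `sessions` table and column names are assumptions based on the description above; the real schema in `~/.hermes/state.db` may differ.

```python
import sqlite3

def session_costs(db_path):
    # Hypothetical schema: one row per session with both cost columns.
    # COALESCE picks the recorded actual cost when present, else the estimate.
    con = sqlite3.connect(db_path)
    try:
        return con.execute(
            """
            SELECT session_id,
                   COALESCE(actual_cost_usd, estimated_cost_usd) AS cost_usd
            FROM sessions
            """
        ).fetchall()
    finally:
        con.close()
```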
    cost = input × inputRate
         + output × outputRate
         + cacheRead × cacheReadRate     (default: 10% of the input rate)
         + cacheWrite × cacheWriteRate   (default: 125% of the input rate)
         + reasoning × reasoningRate     (default: same as the output rate)

Estimates, not invoices. Provider billing has rounding, promotions, free-tier credits, and the occasional model rename. Trust the provider invoice for final numbers.
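The formula above translates directly into code. This is a sketch, not tokenusage's actual implementation; the function name and the convention of rates in USD per million tokens are ours, and the rate values in the comments are placeholders, not real pricing.

```python
def estimate_cost_usd(input_tok, output_tok, cache_read_tok=0,
                      cache_write_tok=0, reasoning_tok=0, *,
                      input_rate, output_rate,
                      cache_read_rate=None,    # default: 10% of input rate
                      cache_write_rate=None,   # default: 125% of input rate
                      reasoning_rate=None):    # default: same as output rate
    """Estimate cost. Rates are USD per 1M tokens."""
    if cache_read_rate is None:
        cache_read_rate = input_rate * 0.10
    if cache_write_rate is None:
        cache_write_rate = input_rate * 1.25
    if reasoning_rate is None:
        reasoning_rate = output_rate
    total = (input_tok * input_rate
             + output_tok * output_rate
             + cache_read_tok * cache_read_rate
             + cache_write_tok * cache_write_rate
             + reasoning_tok * reasoning_rate)
    return total / 1_000_000
```

For example, at a hypothetical $3/M input and $15/M output, a million input tokens estimates to $3.00, and a million cache-write tokens to $3.75 (125% of the input rate).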
Claude Code serializes one API response as multiple JSONL lines (one per content block — text / tool-use / thinking), each carrying the same usage. We dedupe by message.id (fallback requestId) so each API call counts once. In a typical session this trims 30–50% of duplicate token counts.
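The dedup pass described above can be sketched like this. Field names (`message.id`, `requestId`, `message.usage`) follow the description; real transcript lines carry many more keys, and this toy version simply keeps the first usage record per key.

```python
import json

def dedupe_usage(jsonl_lines):
    """Keep one usage record per API call, keyed by message.id
    with requestId as the fallback."""
    seen = set()
    usages = []
    for line in jsonl_lines:
        rec = json.loads(line)
        msg = rec.get("message") or {}
        key = msg.get("id") or rec.get("requestId")
        if key is not None:
            if key in seen:
                continue  # another content block of the same API call
            seen.add(key)
        usage = msg.get("usage")
        if usage is not None:
            usages.append(usage)
    return usages
```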
Single-user mode is fully local — no network, no telemetry. Multi-user mode sends usage counts (tokens, timestamps, cost) to the server you control. Raw conversation content is never read or transmitted.