The Three-Hour Wall
It was a Tuesday morning. I had six automated pipelines running against Claude—proposal generation, reply classification, semantic scoring, error triage, code review, and document enrichment. The system was humming along. Then, three hours after the weekly reset, every pipeline stopped returning responses.
Claude Max has a usage allowance. When you hit it, your requests get deprioritized or rejected entirely. There is no dashboard that shows you a ticking percentage in real time. There is no webhook that fires at 80%. You find out you have hit the wall when your responses start timing out.
The problem was not that I was using too much. The problem was that I had no visibility into how much I was using, no way to prioritize which workloads matter, and no mechanism to automatically scale back before hitting the limit.
The real cost is not the subscription fee. It is the silent downtime. When your automated pipelines stop, inbound replies go unanswered, error triage halts, and time-critical work stacks up. You don't get those three hours back.
Two Windows, Not One
Claude Max enforces two separate quota windows: a 5-hour rolling window and a 7-day weekly cap. Most developers only think about the weekly number. That is a mistake.
The 5-hour window is what catches you on burst workloads. You can be well within your weekly budget and still hit the 5-hour ceiling by running a large batch job. Conversely, you can pace yourself perfectly within each 5-hour block but accumulate enough usage over the week to hit the weekly cap by Thursday.
Tracking both windows simultaneously is the foundation of any quota management strategy. Here is what that looks like in code:
    import { checkQuota, shouldConserve } from 'claude-quota-manager';

    const quota = checkQuota();
    // {
    //   fiveHour: { percent: 72, limit: 85, resetsAt: '2026-05-20T14:00:00Z' },
    //   weekly:   { percent: 34, limit: 90, resetsAt: '2026-05-25T00:00:00Z' },
    //   isStale: false
    // }

    console.log(`5h window: ${quota.fiveHour.percent}%`);
    console.log(`Weekly: ${quota.weekly.percent}%`);
    console.log(`5h resets: ${quota.fiveHour.resetsAt}`);
The checkQuota() function reads from a local cache file that gets updated by polling the Anthropic usage endpoint. Both windows are returned as percentages along with their reset timestamps. The isStale flag tells you if the cache is outdated—you should not make routing decisions on stale data.
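To make the staleness idea concrete, here is a minimal sketch of how a freshness check could work. It assumes the cache records a last-updated timestamp; the function name `isCacheStale`, the `updatedAtIso` parameter, and the 10-minute cutoff are all hypothetical and may differ from what the package does internally.

```javascript
// Hypothetical freshness check: the cache counts as stale once it is older than maxAgeMs.
// The 10-minute default is an assumption, not a documented value.
function isCacheStale(updatedAtIso, nowMs = Date.now(), maxAgeMs = 10 * 60 * 1000) {
  const updatedMs = Date.parse(updatedAtIso);
  // An unparseable timestamp is treated as stale, since freshness cannot be proven.
  if (Number.isNaN(updatedMs)) return true;
  return nowMs - updatedMs > maxAgeMs;
}

const nowMs = Date.parse('2026-05-20T12:00:00Z');
console.log(isCacheStale('2026-05-20T11:57:00Z', nowMs)); // false (3 minutes old)
console.log(isCacheStale('2026-05-20T11:40:00Z', nowMs)); // true (20 minutes old)
console.log(isCacheStale('not-a-date', nowMs));           // true (cannot be parsed)
```

Treating an unreadable timestamp as stale errs on the side of caution: the caller falls back to conservative behavior instead of routing on data it cannot verify.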
Conservation Mode: Defer, Don't Stop
When quota is running low, the naive approach is to stop everything. That is wrong. Some workloads are cheap and critical. Others are expensive and can wait.
Conservation mode categorizes every task by effort level and criticality. When the 5-hour window crosses 85% or the weekly window crosses 90%, conservation mode activates. It defers expensive, non-critical work while keeping cheap and critical tasks running.
The effort levels map to real model costs:
- low — classification, extraction, health monitoring (Haiku-class work)
- medium — analysis, scoring, proofreading (Sonnet-class work)
- high — proposal generation, complex replies (Opus-class work)
- max — extended thinking tasks (Opus + extended context)
    import { checkQuota, shouldConserve, filterTasks } from 'claude-quota-manager';

    const quota = checkQuota();
    const { conserve, reason } = shouldConserve(quota);

    if (conserve) {
      console.log(`Conservation ON: ${reason}`);
      // "5h window at 87% (threshold: 85%)"
    }

    // Filter a task queue to only what should run now
    const pending = [
      { name: 'classify_replies', effort: 'medium', critical: false },
      { name: 'reply_responses', effort: 'high', critical: true },
      { name: 'proposals_email', effort: 'high', critical: false },
      { name: 'extract_names', effort: 'low', critical: false },
      { name: 'monitor_health', effort: 'low', critical: false },
    ];

    const allowed = filterTasks(pending, quota);
    // Result: reply_responses (critical), extract_names (low), monitor_health (low)
    // Deferred: classify_replies (medium), proposals_email (high)
Notice what happened: reply_responses runs despite being high-effort because it is marked critical—inbound replies cannot wait. The two low-effort tasks run because they are cheap. But the expensive proposal generation and medium-effort classification are deferred until the quota window resets.
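The rule behind that result can be approximated in a few lines. This is an illustrative re-implementation under the assumption that a task runs during conservation only if it is critical or low-effort; `filterTasksSketch` is a made-up name and is not the package's own code.

```javascript
// Illustrative sketch of the rule described above; not the package's implementation.
// While conserving, a task runs only if it is critical or low-effort.
function filterTasksSketch(tasks, conserve) {
  if (!conserve) return tasks.slice();
  return tasks.filter((task) => task.critical || task.effort === 'low');
}

const sampleQueue = [
  { name: 'classify_replies', effort: 'medium', critical: false },
  { name: 'reply_responses', effort: 'high', critical: true },
  { name: 'proposals_email', effort: 'high', critical: false },
  { name: 'extract_names', effort: 'low', critical: false },
];

const runNowNames = filterTasksSketch(sampleQueue, true).map((t) => t.name);
console.log(runNowNames); // [ 'reply_responses', 'extract_names' ]
```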
Conservation mode also includes auto-recovery with hysteresis. When usage drops below the threshold minus 10 percentage points, conservation mode deactivates. The hysteresis gap prevents rapid on-off cycling when usage hovers near the threshold.
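A small state function shows why the gap matters. The 85% threshold and the 10-point gap come from the text above; the function name `nextConserveState` and the way state is passed around are assumptions made for this sketch.

```javascript
// Sketch of threshold switching with hysteresis. Names and state shape are hypothetical.
function nextConserveState(isConserving, percent, threshold = 85, gap = 10) {
  if (!isConserving) {
    // Switch on only once usage exceeds the threshold.
    return percent > threshold;
  }
  // Stay on until usage falls below (threshold - gap).
  return percent >= threshold - gap;
}

// Readings that hover near the threshold do not cause rapid on-off cycling.
let conserving = false;
const readings = [80, 87, 84, 78, 74, 80];
const states = readings.map((p) => (conserving = nextConserveState(conserving, p)));
console.log(states); // [ false, true, true, true, false, false ]
```

Without the gap, the 84% reading would have switched conservation off and the next spike would have switched it back on.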
Per-Task Cost Attribution
Knowing your total usage is necessary. Knowing which tasks consume it is what lets you optimize. Per-task cost attribution wraps any workload in a start/end measurement that captures the quota delta.
    import { startTask, endTask, listTasks } from 'claude-quota-manager';

    // Begin tracking
    const { taskId } = startTask('Write proposals batch');

    // ... run your workload ...

    // End tracking; the pipeline ran for 0.5 hours in the background
    const result = endTask(taskId, { pipelineHours: 0.5 });
    // {
    //   grossCost: 12,        // 12% weekly quota consumed
    //   pipelineNoise: 1,     // 0.5h x 2%/h background noise
    //   weeklyNet: 11,        // actual cost of this task
    //   duration: '00:47:22'  // wall clock time
    // }

    // Review all tracked tasks
    const history = listTasks(10);
    history.forEach((t) =>
      console.log(`${t.name}: ${t.weeklyNet}% weekly quota`)
    );
The pipeline noise subtraction is important. If you have background orchestration running—health checks, enrichment, scoring—it consumes roughly 2% of weekly quota per hour. When you measure a specific task, you need to subtract that baseline to get the true cost of the task itself.
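The arithmetic is simple enough to show directly. This sketch uses the 2% per hour baseline quoted above; the function name `netTaskCost` and the clamp at zero are illustrative choices, not confirmed package behavior.

```javascript
// Worked arithmetic for the noise subtraction described above. Names are illustrative.
function netTaskCost(grossPercent, pipelineHours, noisePercentPerHour = 2) {
  const pipelineNoise = pipelineHours * noisePercentPerHour;
  // Clamp at zero so a noisy measurement never reports a negative cost (an assumption).
  const weeklyNet = Math.max(0, grossPercent - pipelineNoise);
  return { grossCost: grossPercent, pipelineNoise, weeklyNet };
}

const cost = netTaskCost(12, 0.5);
console.log(cost); // { grossCost: 12, pipelineNoise: 1, weeklyNet: 11 }
```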
After a week of tracking, you will know exactly which workloads are the quota hogs. In our case, proposal generation consumed 38% of weekly quota while reply classification consumed only 4%. That data drove the conservation mode priority rules—we knew which tasks to defer because we had measured them.
What This Looks Like in Practice
Here is the real workflow after implementing two-window tracking with conservation mode:
- Orchestrator polls usage every 5 minutes and writes the cache file
- Before dispatching each batch, checkQuota() reads current usage
- If either window exceeds its threshold, shouldConserve() returns true
- filterTasks() removes non-critical medium- and high-effort work from the queue
- Critical and low-effort tasks continue running uninterrupted
- When the 5-hour window resets and usage drops, conservation mode auto-recovers
- Deferred tasks are picked up in the next dispatch cycle
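The steps above can be sketched as a single dispatch cycle. To keep the example self-contained, the quota functions are passed in as stubs whose shapes follow the earlier examples; `dispatchCycle` and the stub logic are assumptions, not the production orchestrator.

```javascript
// Sketch of one dispatch cycle. Quota functions are injected so the example runs
// without the package; the stubs below are for illustration only.
function dispatchCycle(queue, { checkQuota, shouldConserve, filterTasks }) {
  const quota = checkQuota();
  const { conserve } = shouldConserve(quota);
  const runNow = conserve ? filterTasks(queue, quota) : queue.slice();
  // Anything not selected stays queued for the next cycle.
  const deferred = queue.filter((task) => !runNow.includes(task));
  return { runNow, deferred };
}

const stubs = {
  checkQuota: () => ({
    fiveHour: { percent: 87, limit: 85 },
    weekly: { percent: 34, limit: 90 },
  }),
  shouldConserve: (q) => ({
    conserve: q.fiveHour.percent > q.fiveHour.limit || q.weekly.percent > q.weekly.limit,
  }),
  filterTasks: (tasks) => tasks.filter((t) => t.critical || t.effort === 'low'),
};

const cycle = dispatchCycle(
  [
    { name: 'proposals_email', effort: 'high', critical: false },
    { name: 'monitor_health', effort: 'low', critical: false },
  ],
  stubs
);
console.log(cycle.runNow.map((t) => t.name));   // [ 'monitor_health' ]
console.log(cycle.deferred.map((t) => t.name)); // [ 'proposals_email' ]
```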
Result: six weeks of running this system with zero quota walls hit. The same workloads run; they are just sequenced intelligently instead of all firing at once. Weekly quota utilization averages 78%, which is high enough to get value from the subscription and low enough to never hit the ceiling.
The Module: 66 Tests, Zero Dependencies
The Claude Quota Manager package extracts all of this into a standalone Node.js module. No database required—it uses a JSON file adapter by default and includes an optional adapter pattern if you want Postgres-backed storage. It ships with 66 tests covering two-window parsing, conservation logic, hysteresis recovery, cost attribution, and edge cases like stale cache data and missing files.
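To illustrate what an adapter pattern of this kind can look like, here is a minimal sketch using an in-memory store. The `load()` and `save(state)` method names, `createMemoryAdapter`, and `recordTask` are all hypothetical; the package's actual adapter contract may differ.

```javascript
// Hypothetical storage contract: any object with load() and save(state) methods.
// These names are assumptions for illustration, not the package's API.
function createMemoryAdapter(initialState = {}) {
  // Keep a serialized copy so callers cannot mutate stored state by reference.
  let stored = JSON.stringify(initialState);
  return {
    load: () => JSON.parse(stored),
    save: (state) => {
      stored = JSON.stringify(state);
    },
  };
}

// Code written against the contract does not care which backend sits behind it.
function recordTask(adapter, task) {
  const state = adapter.load();
  const tasks = [...(state.tasks || []), task];
  adapter.save({ ...state, tasks });
  return tasks.length;
}

const memoryAdapter = createMemoryAdapter();
recordTask(memoryAdapter, { name: 'Write proposals batch', weeklyNet: 11 });
console.log(memoryAdapter.load().tasks.length); // 1
```

A file-backed or database-backed store would implement the same two methods, which is what lets the backend be swapped without touching the calling code.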
What you get:
- Two-window tracking: 5-hour rolling + 7-day weekly, with reset timestamps
- Conservation mode: configurable thresholds, effort-based filtering, auto-recovery
- Per-task cost attribution: start/end measurement, pipeline noise subtraction, full history
- CLI interface: claude-quota status and claude-quota tasks from the terminal
- Adapter pattern: JSON file default, swap in Postgres or SQLite for production
If you are running any kind of automated workload on a Claude Max subscription—agent pipelines, batch processing, orchestrated multi-step workflows—you need visibility into where your quota goes. The alternative is finding out at the worst possible moment that your allowance is gone and everything has stopped.
The code is extracted from a production system handling 300K+ daily LLM API calls across 7 providers. It has been running for months. It works.