Maui runs as a single high-density compute node serving every operator and client at once. We scale vertically until the node is full, then horizontally by cloning it — and a live P0–P3 priority system decides what the cluster works on first. This is the math, the dashboard, and the trigger points behind "how many businesses can it handle."
A single Mac Studio M4 Max (16 cores / 128 GB) absorbs dozens of concurrent clients before we add hardware. We scale the node, not the org chart.
When sustained utilization crosses 85%, we add an identical node behind a dispatcher. State is shared; the persona is one. No client notices the seam.
Every unit of work is tagged P0–P3 on arrival. Compute always flows to the highest-priority open item first — across all clients, all nodes.
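A minimal sketch of that ordering, assuming a single in-memory queue shared by all clients and nodes. The `WorkQueue` class and its method names are hypothetical, not Maui's actual implementation; ties within a tier are broken by arrival order, which is also an assumption.

```python
import heapq
import itertools

# Lower rank drains first: P0 before P1 before P2 before P3.
PRIORITY_RANK = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}


class WorkQueue:
    """Global queue: highest-priority item first, oldest first within a tier."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # monotonically increasing tie-breaker

    def push(self, tier, client, description):
        entry = (PRIORITY_RANK[tier], next(self._arrival), tier, client, description)
        heapq.heappush(self._heap, entry)

    def pop(self):
        """Remove and return the next item as (tier, client, description)."""
        _, _, tier, client, description = heapq.heappop(self._heap)
        return tier, client, description

    def __len__(self):
        return len(self._heap)


queue = WorkQueue()
queue.push("P2", "client-a", "research dossier")
queue.push("P0", "client-b", "live call failing")
queue.push("P3", "client-c", "ingestion")
queue.push("P0", "client-d", "security event")
drained = [queue.pop() for _ in range(len(queue))]
```

Because the client name never enters the sort key, a P0 item from any client is served before a P2 item from any other, which is what "across all clients" implies.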
Concurrency is bounded by compute pressure, not a seat count. The binding constraint is the number of simultaneously active heavy operations — live voice turns, deep-research fan-outs, document builds — not the number of clients on the books. Most clients are idle most of the time, so one node carries far more accounts than its peak-concurrency number suggests.
Per-node ceiling at today's usage mix before load_5m enters the elevated band (12–20).
Sustained utilization >85% for 15 min, OR load_5m >20, OR >8 live-voice sessions — any one trips an add-a-node recommendation.
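The three triggers can be sketched as one check. The thresholds come from the text above; the `NodeSample` shape and the function name are assumptions made for illustration. In keeping with the rule that hardware is never bought automatically, the function only returns reasons for an operator to review.

```python
from dataclasses import dataclass

# Thresholds from the text; the constant names are illustrative.
SUSTAINED_MINUTES = 15   # utilization must stay above 85% this long
LOAD_5M_LIMIT = 20.0     # upper edge of the elevated band (12–20)
LIVE_VOICE_LIMIT = 8     # concurrent live-voice sessions


@dataclass
class NodeSample:
    """One reading of a node's pressure (hypothetical shape)."""
    minutes_above_85_percent: float
    load_5m: float
    live_voice_sessions: int


def add_node_reasons(sample: NodeSample) -> list[str]:
    """Return every tripped trigger; a non-empty list means 'recommend a node'."""
    reasons = []
    if sample.minutes_above_85_percent >= SUSTAINED_MINUTES:
        reasons.append("utilization >85% sustained for 15 min")
    if sample.load_5m > LOAD_5M_LIMIT:
        reasons.append("load_5m > 20")
    if sample.live_voice_sessions > LIVE_VOICE_LIMIT:
        reasons.append("more than 8 live-voice sessions")
    return reasons


calm = add_node_reasons(NodeSample(0, 14.0, 3))
hot = add_node_reasons(NodeSample(20, 25.0, 9))
```

Returning the list of reasons rather than a bare boolean keeps the evidence attached to the recommendation.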
Each added Mac mini carries a further 25–40 businesses. Because state is shared, capacity grows additively with each node until the store itself needs sharding.
We push one node to ~85% before adding hardware, because a fuller node is cheaper and simpler than a second one. Adding capacity means cloning the node behind a priority-aware dispatcher — state stays shared, the persona stays singular.
Telephony DIDs, relay, channels. One identity ("Maui"), many lines — clients never talk to "node 3."
Priority-aware router + queue. Reads P0–P3 tags, routes each unit to the least-loaded capable node, enforces SLA.
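The routing step alone might look like the sketch below. It assumes each node reports a load figure and a set of capabilities; the `Node` fields, the capability labels, and the node name `mini-1` are hypothetical. SLA enforcement and priority ordering are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    load: float  # current load; lower means more headroom
    capabilities: set = field(default_factory=set)


def route(required_capability: str, nodes: list) -> Optional[Node]:
    """Pick the least-loaded node that can do the work, or None if none can."""
    capable = [n for n in nodes if required_capability in n.capabilities]
    if not capable:
        return None
    return min(capable, key=lambda n: n.load)


nodes = [
    Node("frank", load=0.70, capabilities={"voice", "research"}),
    Node("mini-1", load=0.30, capabilities={"research"}),
]
research_target = route("research", nodes)
voice_target = route("voice", nodes)
```

Here a research job goes to the lighter `mini-1`, while a voice job goes to `frank` despite its higher load, because it is the only node that lists that capability.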
1..N identical compute nodes: Frank (M4 Max) plus Mac minis. Stateless-ish workers; add one to add capacity.
Memory/RAG, entity graph, secrets bridge, work ledger. The single source of truth every node reads — keeps the cluster consistent.
The system never auto-buys hardware. It raises the call to an operator with the evidence — the heatmap cell turns red, the gauge crosses the line, and the cluster banner appears. Clustering is specifically a P0/P1-protection move: we add a node when queue depth threatens critical SLAs, not merely because the box is warm.
Every inbound signal is auto-classified on arrival. The tier sets the SLA, the escalation path, and the order compute drains the queue. The cluster works highest-priority-first, globally — across all clients and nodes.
| Tier | Definition | Response SLA | Resolve | Behaviour |
|---|---|---|---|---|
| P0 Critical | Revenue-blocking / client-down: live call failing, money or legal deadline, security event. | < 2 min | < 1 hr | Pages operator; auto-sheds lower load; pre-empts P1–P3. |
| P1 High | Time-sensitive client work: onboarding live now, deliverable due today, SLA at risk. | < 15 min | < 4 hr | Jumps ahead of routine work; operator notified. |
| P2 Normal | Standard delivery: a build, a research dossier, a scheduled follow-up. Default tier. | < 2 hr | < 24 hr | Queued; aging past SLA auto-promotes to P1 visibility. |
| P3 Background | Maintenance, enrichment, ingestion. Fills idle compute. | best-effort | < 7 days | First to pause under load; never blocks paid work. |
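The table can be encoded as data, with the P2 aging rule as a small function. This is a sketch under one stated assumption: "aging past SLA" is read here as the response SLA, which the table does not specify. The names `SLA` and `effective_tier` are illustrative.

```python
from datetime import timedelta

# Response and resolve targets per tier, copied from the table.
# P3 response is best-effort, represented here as None.
SLA = {
    "P0": {"response": timedelta(minutes=2), "resolve": timedelta(hours=1)},
    "P1": {"response": timedelta(minutes=15), "resolve": timedelta(hours=4)},
    "P2": {"response": timedelta(hours=2), "resolve": timedelta(hours=24)},
    "P3": {"response": None, "resolve": timedelta(days=7)},
}


def effective_tier(tier: str, age: timedelta) -> str:
    """Tier an item is shown at; a P2 item older than its response SLA
    surfaces at P1 (assumption: response SLA, not resolve)."""
    if tier == "P2" and age > SLA["P2"]["response"]:
        return "P1"
    return tier


fresh = effective_tier("P2", timedelta(hours=1))
stale = effective_tier("P2", timedelta(hours=3))
```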
Work clustering: inbound items are also grouped by client, theme, and root-cause before compute spends on them — so an operator sees "12 items, 3 real problems," and resolving the cluster head clears the tail. That collapse is why throughput-per-operator (214/day) far exceeds manual handling.
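One simple way to picture the grouping is an exact-match key on client, theme, and root cause. The field names and sample items below are invented for illustration, and the real classifier is presumably fuzzier than exact matching.

```python
from collections import defaultdict


def cluster_items(items):
    """Group item ids by (client, theme, root_cause)."""
    clusters = defaultdict(list)
    for item in items:
        key = (item["client"], item["theme"], item["root_cause"])
        clusters[key].append(item["id"])
    return dict(clusters)


def summarize(items):
    """Operator-facing one-liner: raw item count versus distinct clusters."""
    return f"{len(items)} items, {len(cluster_items(items))} real problems"


# Hypothetical inbound items.
inbound = [
    {"id": 1, "client": "acme", "theme": "billing", "root_cause": "expired card"},
    {"id": 2, "client": "acme", "theme": "billing", "root_cause": "expired card"},
    {"id": 3, "client": "acme", "theme": "billing", "root_cause": "expired card"},
    {"id": 4, "client": "bolt", "theme": "voice", "root_cause": "relay timeout"},
    {"id": 5, "client": "bolt", "theme": "voice", "root_cause": "relay timeout"},
]
summary = summarize(inbound)
```

Resolving the first item in each group (the cluster head) would then close the remaining ids that share its key.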
Every instrument is a surface, not a picture. Click anything to go a level deeper; selecting a client or priority cross-filters the whole board.