Maui runs as a single high-density compute node serving every operator and client at once. We scale vertically until the node is full, then horizontally by cloning it — and a live P0–P3 priority system decides what the cluster works on first. This is the real math, the operating picture, and the trigger points behind "how many businesses can it handle."
A single Mac Studio M4 Max (16 cores / 128 GB) absorbs dozens of concurrent clients before we add hardware. We scale the node, not the org chart.
When utilization stays above 85%, we add an identically configured node behind a dispatcher. State is shared; the persona is one. No client notices the seam.
Every unit of work is tagged P0–P3 on arrival. Compute always flows to the highest-priority open item first — across all clients, all nodes.
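Here is a minimal sketch of that global ordering. It assumes a single in-memory queue shared by every client, whereas a real multi-node deployment would keep it in the shared state layer. `WorkItem` and `GlobalQueue` are hypothetical names. Tiers are stored as 0–3 so Python's min-heap pops P0 first, and an arrival counter keeps work within a tier first-in, first-out.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class WorkItem:
    # Hypothetical record. Only (tier, seq) take part in ordering comparisons.
    tier: int                          # 0..3 for P0..P3; lower number = higher priority
    seq: int                           # arrival counter: FIFO tie-break within a tier
    client: str = field(compare=False)
    summary: str = field(compare=False)


class GlobalQueue:
    """One queue for all clients: pop() returns the highest-priority, oldest item."""

    def __init__(self) -> None:
        self._heap: List[WorkItem] = []
        self._arrivals = itertools.count()

    def push(self, tier: int, client: str, summary: str) -> None:
        if tier not in (0, 1, 2, 3):
            raise ValueError("tier must be 0..3 (P0..P3)")
        heapq.heappush(self._heap, WorkItem(tier, next(self._arrivals), client, summary))

    def pop(self) -> WorkItem:
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)


q = GlobalQueue()
q.push(2, "acme", "research dossier")
q.push(0, "bolt", "live call failing")
q.push(2, "cove", "scheduled follow-up")
q.push(1, "acme", "deliverable due today")
print([q.pop().summary for _ in range(len(q))])
# ['live call failing', 'deliverable due today', 'research dossier', 'scheduled follow-up']
```

The client that submitted an item never affects its position. Only tier and arrival order do.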
Concurrency is bounded by compute pressure, not a seat count. The binding constraint is the number of simultaneously active heavy operations — live voice turns, deep-research fan-outs, document builds — not the number of clients on the books. Most clients are idle most of the time, so one node carries far more accounts than its peak-concurrency number suggests. Today: 46 clients on one node at load_5m 4.07 — comfortably inside 16-core headroom.
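A simple binomial model shows why idle clients matter. This is an illustration under stated assumptions, not Maui's capacity model. Assume each client independently spends a fixed fraction of its time (`duty`, a hypothetical 5% here) in the middle of a heavy operation. The number of simultaneous heavy operations across `n` clients is then Binomial(n, duty). The expected concurrency is n × duty, and the tail probability shows how rarely many of those operations coincide.

```python
from math import comb


def p_more_than(k: int, n: int, duty: float) -> float:
    """P(more than k of n independent clients run a heavy operation at the same moment)."""
    at_most_k = sum(comb(n, i) * duty**i * (1 - duty) ** (n - i) for i in range(k + 1))
    return 1.0 - at_most_k


# Hypothetical inputs: today's 46 accounts, each busy 5% of the time.
n_clients, duty = 46, 0.05
print(f"expected concurrent heavy ops: {n_clients * duty:.1f}")  # 2.3
print(f"P(more than 8 at once): {p_more_than(8, n_clients, duty):.4f}")
```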
Per-node ceiling, measured at today's usage mix: how many businesses one node carries before load_5m enters the elevated band (12–20).
Sustained utilization >85% for 15 min, OR load_5m >20, OR >8 live-voice sessions — any one trips an add-a-node recommendation.
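The three triggers translate directly into a check that returns every tripped condition. An empty list means no recommendation. The thresholds come from the text. The `NodeMetrics` shape (per-minute utilization samples, current load_5m, live-voice count) is a hypothetical stand-in for whatever the real telemetry exposes. The function only recommends; it does not provision anything.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted in the text.
UTIL_LIMIT = 0.85        # fraction of node capacity
UTIL_WINDOW_MIN = 15     # minutes the utilization breach must persist
LOAD_5M_LIMIT = 20.0
LIVE_VOICE_LIMIT = 8


@dataclass
class NodeMetrics:
    # Hypothetical telemetry snapshot for one node.
    util_per_min: List[float]   # one utilization sample per minute, newest last
    load_5m: float
    live_voice_sessions: int


def add_node_reasons(m: NodeMetrics) -> List[str]:
    """Return every tripped trigger; any non-empty result means 'recommend a node'."""
    reasons = []
    window = m.util_per_min[-UTIL_WINDOW_MIN:]
    # Sustained means every sample in a full 15-minute window is above the limit.
    if len(window) == UTIL_WINDOW_MIN and all(u > UTIL_LIMIT for u in window):
        reasons.append("utilization >85% for 15 min")
    if m.load_5m > LOAD_5M_LIMIT:
        reasons.append("load_5m >20")
    if m.live_voice_sessions > LIVE_VOICE_LIMIT:
        reasons.append(">8 live-voice sessions")
    return reasons


# Today's load_5m with hypothetical utilization and voice figures: nothing trips.
print(add_node_reasons(NodeMetrics([0.25] * 15, 4.07, 2)))  # []
```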
Each added Mac-mini carries a further 25–40 businesses. Shared state makes nodes additive until the store itself needs sharding.
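Because nodes are additive, cluster capacity reduces to simple arithmetic. In this minimal sketch, `node01_ceiling` stands for the measured per-node ceiling. Its value is not given here, so the usage lines plug in a hypothetical 100. The 25–40 range per Mac-mini comes from the text. Sizing uses the low end, so it errs toward one node too many rather than one too few. Neither function models the point where the shared store needs sharding.

```python
from typing import Tuple

MINI_LOW, MINI_HIGH = 25, 40   # businesses per added Mac-mini, per the text


def capacity_range(node01_ceiling: int, minis: int) -> Tuple[int, int]:
    """(low, high) businesses carried by Node-01 plus `minis` Mac-minis."""
    return node01_ceiling + minis * MINI_LOW, node01_ceiling + minis * MINI_HIGH


def minis_needed(node01_ceiling: int, businesses: int) -> int:
    """Mac-minis to add, assuming each carries only the conservative low end."""
    overflow = max(0, businesses - node01_ceiling)
    return -(-overflow // MINI_LOW)   # ceiling division on integers


# Hypothetical Node-01 ceiling of 100 businesses.
print(capacity_range(100, 2))   # (150, 180)
print(minis_needed(100, 46))    # 0
print(minis_needed(100, 151))   # 3
```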
Every priority tier is a live surface, not a number. Open a tier to list the work it holds, then open any item to trace its full lifecycle — intake, triage/tag, cluster, in-progress, review, done — with timestamps and the owning client and cluster.
Select a tier (P0–P3) on the board. Each opens a panel listing that tier's representative work items with their status and age.
Each item expands to its six-stage lifecycle timeline with real timestamps and the owning client + work cluster.
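The six-stage lifecycle behind each item is essentially an ordered state machine with a timestamp per transition. This minimal sketch assumes stages only advance one step at a time. The real system may let an item bounce from review back to in-progress, which this sketch does not model. `Stage` and `TrackedItem` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional, Tuple


class Stage(Enum):
    INTAKE = 1
    TRIAGE_TAG = 2
    CLUSTER = 3
    IN_PROGRESS = 4
    REVIEW = 5
    DONE = 6


@dataclass
class TrackedItem:
    client: str                 # owning client
    work_cluster: str           # owning work cluster
    timeline: List[Tuple[Stage, datetime]] = field(default_factory=list)

    @property
    def stage(self) -> Optional[Stage]:
        return self.timeline[-1][0] if self.timeline else None

    def advance(self, to: Stage, at: datetime) -> None:
        # Assumption: strictly one step forward; no skipping and no going back.
        current = self.stage.value if self.stage is not None else 0
        if to.value != current + 1:
            raise ValueError(f"cannot move from step {current} to {to.name}")
        self.timeline.append((to, at))


item = TrackedItem(client="acme", work_cluster="sip-trunk outage")
item.advance(Stage.INTAKE, datetime(2025, 1, 6, 9, 0))
item.advance(Stage.TRIAGE_TAG, datetime(2025, 1, 6, 9, 1))
print(item.stage)   # Stage.TRIAGE_TAG
```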
Drive the real scaling model. Set the book of business, the heavy/active share, and concurrent live-voice load — the model projects per-node compute pressure, how many nodes you'd run, the headroom to the cluster trigger, and an estimated cost per client. Defaults reproduce today's snapshot (46 clients, load_5m 4.07, one node).
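The calculator's formulas are not published here, so what follows is a stand-in, not the real model. It is a linear projection calibrated on the one snapshot in the text: 46 clients at load_5m 4.07 on a 16-core node. It makes several assumptions:

- load_5m divided by cores serves as a utilization proxy.
- Per-client load scales with a hypothetical `heavy_multiplier`, where 1.0 is today's mix.
- Every node is a 16-core Node-01 equivalent.

Cost per client is left out because no cost figures are given.

```python
import math

# Calibration from today's snapshot (figures from the text).
SNAPSHOT_CLIENTS = 46
SNAPSHOT_LOAD_5M = 4.07
NODE_CORES = 16
TRIGGER_UTIL = 0.85   # cluster trigger; load_5m / cores used as a utilization proxy


def project(clients: int, heavy_multiplier: float = 1.0) -> dict:
    """Linear stand-in model: per-client load is assumed constant for a given mix."""
    if heavy_multiplier <= 0:
        raise ValueError("heavy_multiplier must be positive")
    per_client = SNAPSHOT_LOAD_5M / SNAPSHOT_CLIENTS * heavy_multiplier
    load = clients * per_client
    trigger_load = TRIGGER_UTIL * NODE_CORES                     # 13.6 per node
    return {
        "load_5m": round(load, 2),                               # if run on one node
        "utilization": round(load / NODE_CORES, 3),
        "nodes": max(1, math.ceil(load / trigger_load)),         # 16-core equivalents
        # Clients one node can still add before the trigger; negative means past it.
        "one_node_headroom_clients": math.floor(trigger_load / per_client) - clients,
    }


print(project(46))
# {'load_5m': 4.07, 'utilization': 0.254, 'nodes': 1, 'one_node_headroom_clients': 107}
```

With the defaults, the sketch reproduces the snapshot and puts one node's trigger at about 153 clients of today's mix. Treat that as an extrapolation of a single data point, not a measured ceiling.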
We push one node to ~85% before adding hardware, because a fuller node is cheaper and simpler than a second one. Adding capacity means cloning the node behind a priority-aware dispatcher — state stays shared, the persona stays singular.
Telephony DIDs, relay, channels. One identity ("Maui"), many lines — clients never talk to "node 3."
Priority-aware router + queue. Reads P0–P3 tags, routes each unit to the least-loaded capable node, and enforces SLAs.
1..N compute nodes running an identical stack: Node-01 (Mac Studio M4 Max) plus Mac-minis. Stateless-ish workers; add one to add capacity.
Memory/RAG, entity graph, secrets bridge, work ledger. The single source of truth every node reads — keeps the cluster consistent.
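The dispatcher layer's routing rule fits in a few lines: take work in priority order, then send each item to the least-loaded node that can handle it. This minimal sketch rests on stated assumptions. `Node`, its `capabilities` set, and the fixed `cost` each assignment adds to a node's load are all hypothetical, and SLA enforcement is left out.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Node:
    name: str
    capabilities: Set[str]   # hypothetical, e.g. {"voice", "research"}
    load: float              # current load as a fraction of capacity


def route(capability: str, nodes: List[Node]) -> Node:
    """Least-loaded node that can handle the capability."""
    capable = [n for n in nodes if capability in n.capabilities]
    if not capable:
        raise LookupError(f"no node can handle {capability!r}")
    return min(capable, key=lambda n: n.load)


def dispatch(items: List[Tuple[int, str, str]], nodes: List[Node],
             cost: float = 0.1) -> List[Tuple[str, str]]:
    """items are (tier, item_id, capability); returns (item_id, node) in dispatch order."""
    plan = []
    # sorted() is stable, so items keep arrival order within a tier.
    for _tier, item_id, capability in sorted(items, key=lambda it: it[0]):
        node = route(capability, nodes)
        node.load += cost    # hypothetical fixed load increment per assignment
        plan.append((item_id, node.name))
    return plan


nodes = [Node("node-01", {"voice", "research"}, 0.5), Node("mini-1", {"research"}, 0.2)]
print(dispatch([(2, "r1", "research"), (0, "v1", "voice"), (2, "r2", "research")], nodes))
# [('v1', 'node-01'), ('r1', 'mini-1'), ('r2', 'mini-1')]
```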
The system never auto-buys hardware. It raises the call to an operator with the evidence — the gauge crosses the line and the cluster banner appears. Clustering is specifically a P0/P1-protection move: we add a node when queue depth threatens critical SLAs, not merely because the box is warm.
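The queue-depth test can be approximated from two numbers: how many P0/P1 items are waiting, and how fast the cluster clears items. This rough, conservative sketch assumes uniform item size and a single hypothetical throughput figure (`items_per_min`). Because P0 pre-empts P1, the last P1 item also waits behind every queued P0 item. A tier is flagged when clearing it would take longer than its response SLA from the tier table. A non-empty result is the kind of evidence an operator would weigh; the function buys nothing.

```python
from typing import Dict, List

RESPONSE_SLA_MIN = {0: 2.0, 1: 15.0}   # P0 and P1 response SLAs from the tier table


def sla_threatened(queued: Dict[int, int], items_per_min: float) -> List[int]:
    """Critical tiers whose queue cannot be cleared within the response SLA.

    queued maps tier -> items waiting. The wait is the time to clear this tier plus
    every higher tier ahead of it: a conservative upper bound on the last item's wait.
    """
    if items_per_min <= 0:
        raise ValueError("items_per_min must be positive")
    threatened = []
    ahead = 0
    for tier in (0, 1):                  # P0 drains before P1
        waiting = queued.get(tier, 0)
        ahead += waiting
        if waiting and ahead / items_per_min > RESPONSE_SLA_MIN[tier]:
            threatened.append(tier)
    return threatened


print(sla_threatened({0: 1, 1: 10}, items_per_min=1.0))   # []
print(sla_threatened({0: 3, 1: 20}, items_per_min=1.0))   # [0, 1]
```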
Every inbound signal is auto-classified on arrival. The tier sets the SLA, the escalation path, and the order in which compute drains the queue. The cluster works highest-priority-first, globally: across all clients and nodes.
| Tier | Definition | Response SLA | Resolve SLA | Behavior |
|---|---|---|---|---|
| P0 Critical | Revenue-blocking / client-down: live call failing, money or legal deadline, security event. | < 2 min | < 1 hr | Pages operator; auto-sheds lower load; pre-empts P1–P3. |
| P1 High | Time-sensitive client work: onboarding live now, deliverable due today, SLA at risk. | < 15 min | < 4 hr | Jumps ahead of routine work; operator notified. |
| P2 Normal | Standard delivery: a build, a research dossier, a scheduled follow-up. Default tier. | < 2 hr | < 24 hr | Queued; aging past SLA auto-promotes to P1 visibility. |
| P3 Background | Maintenance, enrichment, ingestion. Fills idle compute. | best-effort | < 7 days | First to pause under load; never blocks paid work. |
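The table reads naturally as configuration. In this minimal sketch, `TierPolicy`, `TIERS`, and `visible_tier` are hypothetical names. The aging rule assumes that "past SLA" means past the P2 response target of two hours; the text does not say whether the response or resolve target is meant. The function returns only the tier an item is shown at, matching "P1 visibility."

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, Optional


@dataclass(frozen=True)
class TierPolicy:
    label: str
    response: Optional[timedelta]   # None = best-effort
    resolve: timedelta


TIERS: Dict[int, TierPolicy] = {
    0: TierPolicy("P0 Critical", timedelta(minutes=2), timedelta(hours=1)),
    1: TierPolicy("P1 High", timedelta(minutes=15), timedelta(hours=4)),
    2: TierPolicy("P2 Normal", timedelta(hours=2), timedelta(hours=24)),
    3: TierPolicy("P3 Background", None, timedelta(days=7)),
}


def visible_tier(tier: int, age: timedelta) -> int:
    """Tier an item is shown at: P2 past its response SLA surfaces with P1 visibility."""
    response = TIERS[tier].response
    if tier == 2 and response is not None and age > response:
        return 1
    return tier


print(visible_tier(2, timedelta(hours=3)))   # 1
print(visible_tier(3, timedelta(days=10)))   # 3
```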
Work clustering: inbound items are also grouped by client, theme, and root-cause before compute spends on them — so an operator sees "12 items, 3 real problems," and resolving the cluster head clears the tail. That collapse is why one node clears far more work than seat-by-seat handling would predict. Open any tier on the board to trace a real item from intake to done.
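Grouping on the three keys is the easy half. The hard half, tagging each item with a theme and a root cause, is assumed to have happened already. This minimal sketch uses hypothetical field names (`client`, `theme`, `root_cause`) and reproduces the "12 items, 3 real problems" picture:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]   # (client, theme, root_cause)


def cluster(items: List[dict]) -> Dict[Key, List[dict]]:
    """Group already-tagged items by client, theme, and root cause."""
    groups: Dict[Key, List[dict]] = defaultdict(list)
    for item in items:
        groups[(item["client"], item["theme"], item["root_cause"])].append(item)
    return dict(groups)


# Hypothetical intake: 12 tagged items from one client.
causes = ["sip-trunk"] * 7 + ["voicemail-config"] * 3 + ["dns"] * 2
items = [{"id": i, "client": "acme", "theme": "calls", "root_cause": c}
         for i, c in enumerate(causes)]
groups = cluster(items)
print(f"{len(items)} items, {len(groups)} real problems")   # 12 items, 3 real problems
```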