MF Automations · Platform Scale

One brain today.
A cluster on demand.

Maui runs as a single high-density compute node serving every operator and client at once. We scale vertically until the node is full, then horizontally by cloning it, and a live P0–P3 priority system decides what the cluster works on first. This page lays out the real math, the operating picture, and the trigger points behind "how many businesses can it handle?"

46 active clients
1 compute node
257/261 services online
2 live calls now
1,690 capabilities
01

Density before sprawl

A single Mac Studio M4 Max (16 cores / 128 GB) absorbs dozens of concurrent clients before we add hardware. We scale the node, not the org chart.

02

Clone, don't rewrite

When utilization crosses 85% sustained, we add an identical node behind a dispatcher. State is shared; the persona is one. No client notices the seam.

03

Priority is the scheduler

Every unit of work is tagged P0–P3 on arrival. Compute always flows to the highest-priority open item first — across all clients, all nodes.
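A minimal sketch of highest-priority-first draining, assuming a single in-memory heap shared by all clients. The `WorkItem` and `PriorityQueue` names and the FIFO tie-break within a tier are illustrative assumptions, not the platform's actual scheduler.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class WorkItem:
    tier: int                          # 0 = P0 (most urgent) ... 3 = P3
    seq: int                           # arrival order; breaks ties within a tier
    client: str = field(compare=False)
    task: str = field(compare=False)


class PriorityQueue:
    """Global queue: pops the lowest tier number first, FIFO within a tier."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, tier, client, task):
        heapq.heappush(self._heap, WorkItem(tier, next(self._counter), client, task))

    def pop(self):
        return heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)


queue = PriorityQueue()
queue.push(2, "MSP Client", "security brief")
queue.push(0, "Litigation Client", "case filing deadline")
queue.push(3, "internal", "nightly memory ingest")
queue.push(1, "Real-Estate Client", "co-living deck")

drained = []
while len(queue):
    drained.append(f"P{queue.pop().tier}")
print(drained)
```

Items arrive in mixed order but drain P0 first, regardless of which client they belong to.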

"How many businesses can Maui run at once, and when do we cluster?"

Concurrency is bounded by compute pressure, not a seat count. The binding constraint is the number of simultaneously active heavy operations (live voice turns, deep-research fan-outs, document builds), not the number of clients on the books. Most clients are idle most of the time, so one node carries far more accounts than its peak-concurrency number suggests. Today: 46 clients on one node at a load_5m (5-minute load average) of 4.07, comfortably inside the headroom of 16 cores.

40–70 clients / node

Measured per-node ceiling at today's usage mix before load_5m enters the elevated band (12–20).

> 85% = cluster

Sustained utilization >85% for 15 min, OR load_5m >20, OR >8 live-voice sessions — any one trips an add-a-node recommendation.

~linear headroom

Each added Mac mini carries a further 25–40 businesses. Shared state makes nodes additive until the store itself needs sharding.
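The per-node figures above combine into a simple range estimate. This is a sketch that assumes capacity is purely additive (40–70 clients on the first node, 25–40 on each added node) and ignores the point at which the shared store needs sharding. The function name is hypothetical.

```python
def cluster_capacity(nodes, first_node=(40, 70), added_node=(25, 40)):
    """Return the (low, high) client capacity for a cluster of `nodes` nodes."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    extra = nodes - 1
    low = first_node[0] + extra * added_node[0]
    high = first_node[1] + extra * added_node[1]
    return low, high


for n in (1, 2, 3):
    print(n, cluster_capacity(n))
# 1 (40, 70)
# 2 (65, 110)
# 3 (90, 150)
```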

Operations

Live operating picture

SNAPSHOT · 2026-06-29 · measured on the live node
Active clients
46
served from one node
Service health
98.5%
257 / 261 online
Voice calls handled
185
35 outbound
Live calls now
2
concurrent
load_5m · 16 cores
4.07
normal band 2–12
Capabilities
1,690
registered
Conversation events
1,836
logged
Compute
1
M4 Max · 128 GB

Priority board — click a tier to drill in

Tier → representative work items → full lifecycle timeline. The scheduler always drains highest-priority-first.

P0–P3 priority load

Work items opened per day · last 7 days · click a tier above to open its queue

Node utilization

Composite compute + memory pressure vs 85% cluster trigger
68%
utilized · trigger at 85%

Workflow funnel

Intake → delivered · trailing 7 days · modeled from the work ledger

Open priority queue

What compute is on right now
P0 · Litigation Client: case filing deadline · 52m
P1 · Real-Estate Client: co-living deck due today · 1h
P1 · Autonomous Account: research dossier · 2h
P2 · MSP Client: security brief · 3h
P2 · Music & Brand Client: launch follow-up build · 4h
P3 · Nightly memory ingest · done 04:15

Drill-down

Tier → item → lifecycle timeline

Every priority tier is a live surface, not a number. Open a tier to list the work it holds, then open any item to trace its full lifecycle — intake, triage/tag, cluster, in-progress, review, done — with timestamps and the owning client and cluster.

① Pick a tier

P0–P3 on the board above. Each opens a panel of that tier's representative work items with status and age.

② Open an item

Each item expands to its six-stage lifecycle timeline with real timestamps and the owning client + work cluster.

③ Step back

A back control returns you to the tier's item list; close returns to the board. The whole path is keyboard-dismissable.
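One way to picture the six-stage lifecycle is as an ordered list of timestamped events on each item. The sketch below assumes stages are always recorded in sequence. The class name, field names, and timestamps are illustrative, not taken from the work ledger.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

STAGES = ["intake", "triage/tag", "cluster", "in-progress", "review", "done"]


@dataclass
class WorkItemTimeline:
    client: str
    work_cluster: str
    tier: str
    events: list = field(default_factory=list)   # (stage, timestamp), in order

    def advance(self, when):
        """Record the next stage in sequence; stages cannot be skipped."""
        if len(self.events) == len(STAGES):
            raise ValueError("item is already done")
        if self.events and when < self.events[-1][1]:
            raise ValueError("timestamps must not go backwards")
        stage = STAGES[len(self.events)]
        self.events.append((stage, when))
        return stage

    @property
    def current_stage(self):
        return self.events[-1][0] if self.events else None

    def age(self, now):
        """Time elapsed since intake."""
        return now - self.events[0][1]


start = datetime(2026, 6, 29, 9, 0)              # hypothetical intake time
item = WorkItemTimeline("Litigation Client", "case filing", "P0")
for minutes in (0, 1, 3):                        # intake, triage/tag, cluster
    item.advance(start + timedelta(minutes=minutes))

print(item.current_stage)
print(item.age(start + timedelta(minutes=52)))
```

Storing the events as an append-only list keeps the drill-down view a direct read of the record: the item list shows `current_stage` and `age`, and the timeline shows `events`.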

Capacity planning

The clustering calculator

Drive the real scaling model. Set the book of business, the heavy/active share, and the concurrent live-voice load; the model projects per-node compute pressure, how many nodes you'd run, the headroom to the cluster trigger, and an estimated cost per client. Defaults closely reproduce today's snapshot (46 clients, load_5m 4.07, one node).

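Clients: 46 (slider range 10–300)
Heavy/active share: 20% (slider range 5%–100%)
Concurrent live-voice sessions: 2 (slider range 0–40)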
Projected load_5m / node
4.06
normal band 2–12
Nodes required
1
Node-01 only
Headroom to trigger
17%
util 68% of 85%
Est. cost / client
$12.73
per month · est.
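The headroom figure reads most naturally as percentage points: 85 minus 68 leaves 17. The short sketch below shows that arithmetic next to the share of the trigger budget already used. Both helper names are hypothetical, and this is not the calculator's internal model, which the page does not spell out.

```python
CLUSTER_TRIGGER_PCT = 85      # cluster trigger stated in the text


def headroom_points(utilization_pct, trigger_pct=CLUSTER_TRIGGER_PCT):
    """Percentage points left before the trigger (negative once past it)."""
    return trigger_pct - utilization_pct


def share_of_trigger_used(utilization_pct, trigger_pct=CLUSTER_TRIGGER_PCT):
    """Fraction of the trigger budget already consumed."""
    return round(utilization_pct / trigger_pct, 3)


print(headroom_points(68))          # 17
print(share_of_trigger_used(68))    # 0.8
```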

Clients vs nodes required

At the current active & voice mix · marker shows your selection

How it scales

Single node → governed cluster

We push one node to ~85% before adding hardware, because a fuller node is cheaper and simpler than a second one. Adding capacity means cloning the node behind a priority-aware dispatcher — state stays shared, the persona stays singular.

LAYER 1

Edge / persona

Telephony DIDs, relay, channels. One identity ("Maui"), many lines — clients never talk to "node 3."

LAYER 2

Dispatcher

Priority-aware router + queue. Reads P0–P3 tags, routes each unit to the least-loaded capable node, enforces SLA.

LAYER 3

Node fleet

1..N identical compute nodes: Node-01 (M4 Max) plus Mac minis. Stateless-ish workers; add one to add capacity.

LAYER 4

Shared state

Memory/RAG, entity graph, secrets bridge, work ledger. The single source of truth every node reads — keeps the cluster consistent.
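The dispatcher rule in Layer 2, "least-loaded capable node", can be sketched as a filter followed by a minimum. The `Node` fields, the capability names, and the second node are assumptions for illustration. SLA enforcement and the priority queue are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    load_5m: float
    capabilities: set = field(default_factory=set)


def route(required_capability, nodes):
    """Pick the least-loaded node that can do the work, or None if none can."""
    capable = [n for n in nodes if required_capability in n.capabilities]
    if not capable:
        return None
    return min(capable, key=lambda n: n.load_5m)


fleet = [
    Node("node-01", 4.07, {"voice", "research", "document-build"}),
    Node("node-02", 1.50, {"research", "document-build"}),   # hypothetical added node
]

print(route("research", fleet).name)   # node-02
print(route("voice", fleet).name)      # node-01
print(route("video", fleet))           # None
```

Because clients only ever address the persona in Layer 1, the node chosen here stays invisible to them.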

> 85%
Sustained node utilization for 15 min → CLUSTER recommendation surfaced to operator
load_5m > 12
Compute pressure enters the elevated band (normal band 2–12) → add-node candidate; above 20 it escalates to P1
> 8 live
Concurrent live-voice sessions exceed a single node's voice budget → split across nodes

The system never auto-buys hardware. It raises the call to an operator with the evidence — the gauge crosses the line and the cluster banner appears. Clustering is specifically a P0/P1-protection move: we add a node when queue depth threatens critical SLAs, not merely because the box is warm.
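A sketch of how the three triggers might be evaluated, assuming one monitoring sample per minute and using the any-one-trips thresholds (utilization above 85% for 15 minutes, load_5m above 20, more than 8 live-voice sessions). The `Sample` type and function name are hypothetical, and the softer "candidate" signal at load_5m above 12 is not modeled. The function only returns reasons for an operator to review; it buys nothing.

```python
from dataclasses import dataclass

UTIL_TRIGGER_PCT = 85     # sustained utilization threshold
SUSTAIN_MINUTES = 15      # how long utilization must stay above it
LOAD_TRIGGER = 20         # load_5m level that trips a recommendation
VOICE_BUDGET = 8          # live-voice sessions one node can carry


@dataclass
class Sample:
    utilization_pct: float
    load_5m: float
    live_voice: int


def cluster_reasons(samples):
    """Return reasons to recommend adding a node (empty list means none).

    Assumes one sample per minute, oldest first.
    """
    if not samples:
        return []
    latest = samples[-1]
    reasons = []
    window = samples[-SUSTAIN_MINUTES:]
    sustained = len(window) == SUSTAIN_MINUTES and all(
        s.utilization_pct > UTIL_TRIGGER_PCT for s in window
    )
    if sustained:
        reasons.append("utilization >85% sustained for 15 min")
    if latest.load_5m > LOAD_TRIGGER:
        reasons.append("load_5m >20")
    if latest.live_voice > VOICE_BUDGET:
        reasons.append(">8 live-voice sessions")
    return reasons


today = [Sample(68, 4.07, 2)] * 15
print(cluster_reasons(today))   # []
```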

Priority taxonomy

P0–P3 — the scheduler's rulebook

Every inbound signal is auto-classified on arrival. The tier sets the SLA, the escalation path, and the order compute drains the queue. The cluster works highest-priority-first, globally — across all clients and nodes.

Tier | Definition | Response SLA | Resolve | Behaviour
P0 Critical | Revenue-blocking / client-down: live call failing, money or legal deadline, security event. | < 2 min | < 1 hr | Pages operator; auto-sheds lower load; pre-empts P1–P3.
P1 High | Time-sensitive client work: onboarding live now, deliverable due today, SLA at risk. | < 15 min | < 4 hr | Jumps ahead of routine work; operator notified.
P2 Normal | Standard delivery: a build, a research dossier, a scheduled follow-up. Default tier. | < 2 hr | < 24 hr | Queued; aging past SLA auto-promotes to P1 visibility.
P3 Background | Maintenance, enrichment, ingestion. Fills idle compute. | best-effort | < 7 days | First to pause under load; never blocks paid work.
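The table above maps naturally onto a small lookup plus the one aging rule it states (a P2 item past its SLA is promoted to P1 visibility). In this sketch the dictionary layout and function name are assumptions. Which SLA clock drives the promotion is also an assumption: the resolve target is used here.

```python
from datetime import timedelta

# Response and resolve targets per tier, transcribed from the table.
# None means best-effort (no response target).
SLA = {
    "P0": {"respond": timedelta(minutes=2),  "resolve": timedelta(hours=1)},
    "P1": {"respond": timedelta(minutes=15), "resolve": timedelta(hours=4)},
    "P2": {"respond": timedelta(hours=2),    "resolve": timedelta(hours=24)},
    "P3": {"respond": None,                  "resolve": timedelta(days=7)},
}


def effective_tier(tier, age):
    """Tier an item is shown at, given its age since intake.

    Only the P2 rule comes from the table; other tiers are left unchanged.
    """
    if tier == "P2" and age > SLA["P2"]["resolve"]:
        return "P1"
    return tier


print(effective_tier("P2", timedelta(hours=3)))    # P2
print(effective_tier("P2", timedelta(hours=25)))   # P1
```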

Work clustering: inbound items are also grouped by client, theme, and root-cause before compute spends on them — so an operator sees "12 items, 3 real problems," and resolving the cluster head clears the tail. That collapse is why one node clears far more work than seat-by-seat handling would predict. Open any tier on the board to trace a real item from intake to done.
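The "12 items, 3 real problems" collapse can be sketched as grouping on a shared key. This example assumes each item already carries `client`, `theme`, and `root_cause` labels and groups on exact matches. How the platform actually infers a root cause is not described, and the sample items are made up.

```python
from collections import defaultdict


def cluster_items(items):
    """Group items sharing (client, theme, root_cause); the first item is the head."""
    clusters = defaultdict(list)
    for item in items:
        key = (item["client"], item["theme"], item["root_cause"])
        clusters[key].append(item)
    return dict(clusters)


def summarize(clusters):
    total = sum(len(group) for group in clusters.values())
    return f"{total} items, {len(clusters)} real problems"


# Hypothetical inbound items: 12 signals that trace back to 3 root causes.
items = (
    [{"id": i, "client": "MSP Client", "theme": "email", "root_cause": "expired cert"}
     for i in range(7)]
    + [{"id": 7 + i, "client": "MSP Client", "theme": "vpn", "root_cause": "bad config"}
       for i in range(3)]
    + [{"id": 10 + i, "client": "Real-Estate Client", "theme": "deck", "root_cause": "missing data"}
       for i in range(2)]
)

clusters = cluster_items(items)
print(summarize(clusters))   # 12 items, 3 real problems
```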