MF Automations — Platform Scale

"How many businesses can Maui run at once, and when do we cluster?"

Concurrency is bounded by compute pressure, not a seat count. The binding constraint is the number of simultaneously active heavy operations — live voice turns, deep-research fan-outs, document builds — not the number of clients on the books. Most clients are idle most of the time, so one node carries far more accounts than its peak-concurrency number suggests. Today: 46 clients on one node at load_5m 4.07 — comfortably inside 16-core headroom.

40–70 clients / node

Measured per-node ceiling at today's usage mix before load_5m enters the elevated band (12–20).

> 85% = cluster

Sustained utilization >85% for 15 min, OR load_5m >20, OR >8 live-voice sessions — any one trips an add-a-node recommendation.

~linear headroom

Each added Mac-mini carries a further 25–40 businesses. Shared state makes nodes additive until the store itself needs sharding.
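
The per-node figures in these three cards imply a simple capacity range for a fleet. The following is a minimal sketch, assuming the first node carries 40–70 clients and each added Mac-mini a further 25–40, as quoted above. The names `fleet_capacity` and `nodes_for` are hypothetical, and this is not the calculator's actual model.

```python
import math

# Per-node client ranges quoted on this page: (conservative, optimistic).
FIRST_NODE = (40, 70)   # Node-01 at today's usage mix
EXTRA_NODE = (25, 40)   # each added Mac-mini

def fleet_capacity(nodes: int) -> tuple[int, int]:
    """Return the (conservative, optimistic) client capacity of a fleet."""
    if nodes < 1:
        raise ValueError("a fleet needs at least one node")
    extra = nodes - 1
    return (FIRST_NODE[0] + extra * EXTRA_NODE[0],
            FIRST_NODE[1] + extra * EXTRA_NODE[1])

def nodes_for(clients: int, optimistic: bool = True) -> int:
    """Smallest fleet whose capacity covers `clients` at the chosen end of the range."""
    idx = 1 if optimistic else 0
    if clients <= FIRST_NODE[idx]:
        return 1
    return 1 + math.ceil((clients - FIRST_NODE[idx]) / EXTRA_NODE[idx])
```

At 46 clients the optimistic end needs one node, while the conservative end would already ask for a second. That spread is why the cluster decision is keyed to measured compute pressure rather than to a client count.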

Operations

Live operating picture

SNAPSHOT · 2026-06-29 · measured on the live node

Active clients

46

served from one node

Service health

99.2%

257 / 261 online

Voice calls handled

185

35 outbound

Live calls now

2

concurrent

load_5m · 16 cores

4.07

normal band 2–12

Capabilities

1,690

registered

Conversation events

1,836

logged

Compute

1

M4 Max · 128 GB

Priority board — click a tier to drill in

Tier → representative work items → full lifecycle timeline. The scheduler always drains highest-priority-first.

P0–P3 priority load

Work items opened per day · last 7 days · click a tier above to open its queue

Node utilization

Composite compute + memory pressure vs 85% cluster trigger

68%

utilized · trigger at 85%

Workflow funnel

Intake → delivered · trailing 7 days · modeled from the work ledger

Open priority queue

What compute is on right now

P0 Litigation Client — case filing deadline 52m

P1 Real-Estate Client — co-living deck due today 1h

P1 Autonomous Account — research dossier 2h

P2 MSP Client — security brief 3h

P2 Music & Brand Client — launch follow-up build 4h

P3 Nightly memory ingest done 04:15

Drill-down

Tier → item → lifecycle timeline

Every priority tier is a live surface, not a number. Open a tier to list the work it holds, then open any item to trace its full lifecycle — intake, triage/tag, cluster, in-progress, review, done — with timestamps and the owning client and cluster.

① Pick a tier

P0–P3 on the board above. Each opens a panel of that tier's representative work items with status and age.

② Open an item

Each item expands to its six-stage lifecycle timeline with real timestamps and the owning client + work cluster.

③ Step back

A back control returns you to the tier's item list; close returns to the board. The whole path is keyboard-dismissable.
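
The drill-down implies a per-item record that carries its owner and a timestamped trail through the six stages. The following is one possible shape for that record, offered as a sketch only. The class name, field names, and the "new" status for an item with no recorded stage are assumptions, not the platform's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime

# The six lifecycle stages named above, in order.
STAGES = ("intake", "triage/tag", "cluster", "in-progress", "review", "done")

@dataclass
class WorkItem:
    title: str
    client: str    # owning client
    cluster: str   # owning work cluster
    tier: str      # "P0" .. "P3"
    timeline: list[tuple[str, datetime]] = field(default_factory=list)

    def advance(self, at: datetime) -> str:
        """Record the next stage with its timestamp and return the stage name."""
        if len(self.timeline) >= len(STAGES):
            raise ValueError("item is already done")
        stage = STAGES[len(self.timeline)]
        self.timeline.append((stage, at))
        return stage

    @property
    def status(self) -> str:
        # Assumption: an item with no recorded stage is reported as "new".
        return self.timeline[-1][0] if self.timeline else "new"
```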

Capacity planning

The clustering calculator

Drive the real scaling model. Set the book of business, the heavy/active share, and the concurrent live-voice load; the model then projects per-node compute pressure, the number of nodes you'd run, the headroom to the cluster trigger, and an estimated cost per client. Defaults closely reproduce today's snapshot (46 clients, one node; projected load_5m 4.06 against a measured 4.07).

Clients on the book: 46 (slider range 10–300)

Heavy / active share: 20% (slider range 5%–100%)

Concurrent live-voice sessions: 2 (slider range 0–40)

Projected load_5m / node

4.06

normal band 2–12

Nodes required

1

Node-01 only

Headroom to trigger

17%

utilization 68% against the 85% trigger

Est. cost / client

$12.73

per month · est.

Clients vs nodes required

At the current active & voice mix · marker shows your selection
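
Two of the calculator's readouts follow directly from figures quoted on this page: headroom is the gap between utilization and the 85% trigger, and the band label comes from the load_5m thresholds (normal 2–12, elevated 12–20). The sketch below covers only those two; the function names are hypothetical, and the projection and cost models themselves are not reproduced because the page does not state them.

```python
CLUSTER_TRIGGER_PCT = 85.0

def headroom_points(utilization_pct: float) -> float:
    """Percentage points left before the 85% cluster trigger."""
    return CLUSTER_TRIGGER_PCT - utilization_pct

def load_band(load_5m: float) -> str:
    """Classify load_5m using the bands quoted on this page."""
    if load_5m > 20:
        return "above elevated"
    if load_5m > 12:
        return "elevated"
    # The page quotes 2-12 as normal; treating values under 2 as normal
    # too is an assumption of this sketch.
    return "normal"
```

With the default inputs, `headroom_points(68)` gives 17.0 and `load_band(4.06)` gives "normal", matching the readouts above.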

How it scales

Single node → governed cluster

We push one node to ~85% before adding hardware, because a fuller node is cheaper and simpler than a second one. Adding capacity means cloning the node behind a priority-aware dispatcher — state stays shared, the persona stays singular.

LAYER 1

Edge / persona

Telephony DIDs, relay, channels. One identity ("Maui"), many lines — clients never talk to "node 3."

LAYER 2

Dispatcher

Priority-aware router + queue. Reads P0–P3 tags, routes each unit to the least-loaded capable node, enforces SLA.

LAYER 3

Node fleet

1..N compute nodes cloned from the same build: Node-01 (M4 Max) plus Mac-minis. Stateless-ish workers; add one to add capacity.

LAYER 4

Shared state

Memory/RAG, entity graph, secrets bridge, work ledger. The single source of truth every node reads — keeps the cluster consistent.
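
The dispatcher's routing rule in Layer 2, "least-loaded capable node", can be sketched in a few lines. This is an illustration under assumptions: load is taken to be each node's load_5m, and the `Node` type, node names, and capability names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    load_5m: float
    capabilities: frozenset[str]

def route(required: str, nodes: list[Node]) -> Node:
    """Pick the least-loaded node that advertises the required capability."""
    capable = [n for n in nodes if required in n.capabilities]
    if not capable:
        raise LookupError(f"no node can handle {required!r}")
    return min(capable, key=lambda n: n.load_5m)

# Hypothetical fleet: a busy full-capability node and a lighter worker.
fleet = [
    Node("node-01", 9.5, frozenset({"voice", "research", "docs"})),
    Node("mini-02", 3.0, frozenset({"research", "docs"})),
]
chosen = route("research", fleet)   # mini-02: capable and least loaded
voice = route("voice", fleet)       # node-01: the only voice-capable node
```

Filtering by capability before comparing load keeps a lightly loaded node from receiving work it cannot run.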

> 85%

Sustained node utilization for 15 min → CLUSTER recommendation surfaced to operator

load_5m > 12

Compute pressure enters the elevated band (normal 2–12; P1 at >20) → add-node candidate

> 8 live

Concurrent live-voice sessions exceed a single node's voice budget → split across nodes

The system never auto-buys hardware. It raises the call to an operator with the evidence — the gauge crosses the line and the cluster banner appears. Clustering is specifically a P0/P1-protection move: we add a node when queue depth threatens critical SLAs, not merely because the box is warm.
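
The trigger logic can be expressed as a small check that returns evidence for the operator rather than acting by itself. The sketch below uses the three "any one trips" conditions from the summary cards (utilization >85% sustained for 15 min, load_5m >20, more than 8 live-voice sessions) and separately reports the load_5m >12 add-node candidate from this section. Per-minute utilization sampling and all names are assumptions.

```python
from typing import Sequence

def evaluate_triggers(util_pct_per_min: Sequence[float],
                      load_5m: float,
                      live_voice: int) -> dict:
    """Return a recommendation plus the reasons behind it; never acts on its own."""
    # Assumption: one utilization sample per minute, oldest first.
    window = list(util_pct_per_min)[-15:]
    sustained = len(window) == 15 and all(u > 85 for u in window)

    reasons = []
    if sustained:
        reasons.append("utilization >85% sustained for 15 min")
    if load_5m > 20:
        reasons.append("load_5m >20")
    if live_voice > 8:
        reasons.append(">8 live-voice sessions")

    return {
        "recommend_cluster": bool(reasons),
        "add_node_candidate": load_5m > 12,  # elevated band
        "reasons": reasons,
    }
```

Fed today's snapshot (68% utilization, load_5m 4.07, 2 live calls), the check returns no recommendation and an empty reason list.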

Priority taxonomy

P0–P3 — the scheduler's rulebook

Every inbound signal is auto-classified on arrival. The tier sets the SLA, the escalation path, and the order compute drains the queue. The cluster works highest-priority-first, globally — across all clients and nodes.

Tier	Definition	Response SLA	Resolve SLA	Behaviour
P0 Critical	Revenue-blocking / client-down: live call failing, money or legal deadline, security event.	< 2 min	< 1 hr	Pages operator; auto-sheds lower load; pre-empts P1–P3.
P1 High	Time-sensitive client work: onboarding live now, deliverable due today, SLA at risk.	< 15 min	< 4 hr	Jumps ahead of routine work; operator notified.
P2 Normal	Standard delivery: a build, a research dossier, a scheduled follow-up. Default tier.	< 2 hr	< 24 hr	Queued; aging past SLA auto-promotes to P1 visibility.
P3 Background	Maintenance, enrichment, ingestion. Fills idle compute.	best-effort	< 7 days	First to pause under load; never blocks paid work.
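
Highest-priority-first draining maps naturally onto a heap keyed by tier rank and then by arrival order. The following is a minimal sketch, not the scheduler itself. The class and function names are hypothetical, and reading "aging past SLA" as the response SLA is an assumption.

```python
import heapq
from datetime import timedelta
from itertools import count

# Response SLAs from the table above; P3 is best-effort, so it has none.
RESPONSE_SLA = {
    "P0": timedelta(minutes=2),
    "P1": timedelta(minutes=15),
    "P2": timedelta(hours=2),
    "P3": None,
}

class PriorityBoard:
    """Drains highest-priority-first; ties go to the oldest item."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._seq = count()

    def push(self, tier: str, item: str) -> None:
        rank = int(tier[1])  # "P0" -> 0; the lowest rank drains first
        heapq.heappush(self._heap, (rank, next(self._seq), item))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

def visible_tier(tier: str, age: timedelta) -> str:
    """P2 items aged past their response SLA surface at P1 visibility."""
    if tier == "P2" and age > RESPONSE_SLA["P2"]:
        return "P1"
    return tier
```

The arrival counter in the heap key keeps ordering stable within a tier, so two P1 items drain in the order they were opened.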

Work clustering: inbound items are also grouped by client, theme, and root-cause before compute spends on them — so an operator sees "12 items, 3 real problems," and resolving the cluster head clears the tail. That collapse is why one node clears far more work than seat-by-seat handling would predict. Open any tier on the board to trace a real item from intake to done.
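
The "12 items, 3 real problems" collapse can be illustrated with a plain grouping step. This sketch assumes each item already carries `client`, `theme`, and `root_cause` labels and groups on an exact match of all three. The field names are hypothetical, and how the platform actually derives those labels is not stated here.

```python
from collections import defaultdict

def cluster_items(items: list[dict]) -> dict[tuple[str, str, str], list[str]]:
    """Group item ids that share the same (client, theme, root_cause)."""
    clusters: dict[tuple[str, str, str], list[str]] = defaultdict(list)
    for item in items:
        key = (item["client"], item["theme"], item["root_cause"])
        clusters[key].append(item["id"])
    return dict(clusters)

# Twelve hypothetical inbound items that trace back to three root causes.
causes = ["dns-misconfig", "expired-cert", "bad-webhook"]
inbound = [
    {"id": f"item-{i}", "client": "MSP Client", "theme": "security",
     "root_cause": causes[i % 3]}
    for i in range(12)
]
clusters = cluster_items(inbound)
```

Here `clusters` holds three keys with four item ids each, so resolving the head of each cluster accounts for all twelve items.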

One brain today.
A cluster on demand.

Density before sprawl

Clone, don't rewrite

Priority is the scheduler