MF Automations — Platform Scale

"How many businesses can Maui run at once, and when do we cluster?"

Concurrency is bounded by compute pressure, not a seat count. The binding constraint is the number of simultaneously active heavy operations — live voice turns, deep-research fan-outs, document builds — not the number of clients on the books. Most clients are idle most of the time, so one node carries far more accounts than its peak-concurrency number suggests.

40–70 clients / node

Per-node ceiling at today's usage mix before load_5m enters the elevated band (12–20).

> 85% = cluster

Sustained utilization >85% for 15 min, OR load_5m >20, OR >8 live-voice sessions — any one trips an add-a-node recommendation.

~linear headroom

Each added Mac-mini carries a further 25–40 businesses. Shared state makes nodes additive until the store itself needs sharding.
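The per-node figures above can be turned into a rough fleet estimate. This is a minimal sketch, assuming the first node carries the 40–70 range and every added Mac-mini carries a further 25–40. The name `fleet_capacity` is hypothetical, and the sketch ignores the point at which the shared store itself needs sharding.

```python
# Hypothetical helper: rough client capacity for one node plus added Mac-minis.
FIRST_NODE_RANGE = (40, 70)   # clients on the first node at today's usage mix
ADDED_NODE_RANGE = (25, 40)   # further businesses per added Mac-mini


def fleet_capacity(added_nodes: int) -> tuple[int, int]:
    """Return a (low, high) client estimate for 1 + added_nodes nodes."""
    if added_nodes < 0:
        raise ValueError("added_nodes must be >= 0")
    low = FIRST_NODE_RANGE[0] + added_nodes * ADDED_NODE_RANGE[0]
    high = FIRST_NODE_RANGE[1] + added_nodes * ADDED_NODE_RANGE[1]
    return low, high


print(fleet_capacity(0))  # (40, 70)
print(fleet_capacity(2))  # (90, 150)
```

The estimate is linear by construction, which matches the "~linear headroom" claim only while shared state is not the bottleneck.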

Operations

Live operating picture

LIVE · auto-refresh 30s · sample data

Metric	Value	Change / note
Active clients	47	▲ +5 this week
Calls handled · 24h	1,284	▲ +12%
Avg response	1.9s	▼ -0.3s
Node utilization	68%	headroom 17%
Concurrent operators	6	▲ +1
SLA adherence	99.2%	▲ +0.4%
Open P0	2	▲ +1
Throughput / op	214	▲ +18

P0–P3 priority load

Work items opened per day · last 7 days · click a tier to highlight

Fleet capacity

Current utilization vs 85% cluster trigger

68%

utilized · trigger at 85%

Cluster heatmap

Per-node utilization across the day

Legend: idle · healthy · busy · hot · trigger

Workflow funnel

Inbound → delivered · 24h

Open priority queue

What compute is on right now

P0 Beth — live onboarding call failing 3m

P0 Qui tam — IRS filing deadline 52m

P1 Arthur — Reinbow deck due today 1h

P1 Johnny — research dossier 2h

P2 Billy — TMF follow-up build 4h

P3 Nightly memory ingest —

How it scales

Single node → governed cluster

We push one node to ~85% before adding hardware, because a fuller node is cheaper and simpler than a second one. Adding capacity means cloning the node behind a priority-aware dispatcher — state stays shared, the persona stays singular.

LAYER 1

Edge / persona

Telephony DIDs, relay, channels. One identity ("Maui"), many lines — clients never talk to "node 3."

LAYER 2

Dispatcher

Priority-aware router + queue. Reads P0–P3 tags, routes each unit to the least-loaded capable node, enforces SLA.

LAYER 3

Node fleet

1..N identical compute nodes — Frank (M4 Max) + Mac-minis. Stateless-ish workers; add one to add capacity.

LAYER 4

Shared state

Memory/RAG, entity graph, secrets bridge, work ledger. The single source of truth every node reads — keeps the cluster consistent.
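The Layer 2 behaviour (drain by priority, route to the least-loaded capable node) can be sketched as follows. This is an illustration under stated assumptions, not the platform's actual dispatcher: the `Node` and `Dispatcher` classes, the `capabilities` tags, and the unit load cost per item are all hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    load: float                                   # current load; lower is better
    capabilities: set[str] = field(default_factory=set)


class Dispatcher:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._queue = []                          # heap of (tier, seq, capability, item)
        self._seq = itertools.count()             # tie-breaker: FIFO within a tier

    def submit(self, tier: int, capability: str, item: str) -> None:
        """Queue an item; tier 0 is P0 (highest), tier 3 is P3 (lowest)."""
        heapq.heappush(self._queue, (tier, next(self._seq), capability, item))

    def dispatch_next(self):
        """Pop the highest-priority item and assign the least-loaded capable node."""
        if not self._queue:
            return None
        entry = heapq.heappop(self._queue)
        _, _, capability, item = entry
        capable = [n for n in self.nodes if capability in n.capabilities]
        if not capable:
            heapq.heappush(self._queue, entry)    # keep the item queued
            raise LookupError(f"no node can handle {capability!r}")
        target = min(capable, key=lambda n: n.load)
        target.load += 1.0                        # assumed unit cost per item
        return item, target.name


nodes = [Node("frank", 3.0, {"voice", "build"}), Node("mini-1", 1.0, {"build"})]
dispatcher = Dispatcher(nodes)
dispatcher.submit(2, "build", "follow-up build")
dispatcher.submit(0, "voice", "live onboarding call")
first = dispatcher.dispatch_next()    # the P0 item goes out first
second = dispatcher.dispatch_next()   # the P2 build goes to the lighter node
```

Because the queue is global rather than per node, a P0 submitted late still leaves before older P2 work, and clients never need to know which node served them.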

> 85%

Sustained node utilization for 15 min → CLUSTER recommendation surfaced to operator

load_5m > 20

Compute pressure past the P1 baseline (typical band 2–12) → add-node candidate

> 8 live

Concurrent live-voice sessions exceed a single node's voice budget → split across nodes

The system never auto-buys hardware. It raises the call to an operator with the evidence — the heatmap cell turns red, the gauge crosses the line, and the cluster banner appears. Clustering is specifically a P0/P1-protection move: we add a node when queue depth threatens critical SLAs, not merely because the box is warm.
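The three triggers above can be expressed as a small check that returns evidence for the operator rather than taking any action. This is a sketch under assumptions: the `NodeSample` shape and the one-reading-per-minute utilization history are hypothetical; only the thresholds come from the text.

```python
from dataclasses import dataclass

UTIL_THRESHOLD = 0.85      # sustained utilization trigger
SUSTAIN_MINUTES = 15       # how long utilization must stay above the threshold
LOAD_5M_THRESHOLD = 20.0   # load_5m trigger
VOICE_BUDGET = 8           # live-voice sessions one node can carry


@dataclass
class NodeSample:
    utilization_history: list[float]   # assumed: one reading per minute, newest last
    load_5m: float
    live_voice_sessions: int


def cluster_reasons(sample: NodeSample) -> list[str]:
    """Return the tripped triggers; an empty list means no recommendation."""
    reasons = []
    window = sample.utilization_history[-SUSTAIN_MINUTES:]
    if len(window) == SUSTAIN_MINUTES and all(u > UTIL_THRESHOLD for u in window):
        reasons.append("utilization > 85% for 15 min")
    if sample.load_5m > LOAD_5M_THRESHOLD:
        reasons.append("load_5m > 20")
    if sample.live_voice_sessions > VOICE_BUDGET:
        reasons.append("> 8 live-voice sessions")
    return reasons


calm = NodeSample([0.68] * 15, load_5m=9.5, live_voice_sessions=6)
hot = NodeSample([0.90] * 15, load_5m=21.0, live_voice_sessions=3)
print(cluster_reasons(calm))  # []
print(cluster_reasons(hot))   # ['utilization > 85% for 15 min', 'load_5m > 20']
```

Any single reason is enough to surface the recommendation; the function only reports, which keeps the hardware decision with the operator.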

Priority taxonomy

P0–P3 — the scheduler's rulebook

Every inbound signal is auto-classified on arrival. The tier sets the SLA, the escalation path, and the order compute drains the queue. The cluster works highest-priority-first, globally — across all clients and nodes.

Tier	Definition	Response SLA	Resolve	Behaviour
P0 Critical	Revenue-blocking / client-down: live call failing, money or legal deadline, security event.	< 2 min	< 1 hr	Pages operator; auto-sheds lower load; pre-empts P1–P3.
P1 High	Time-sensitive client work: onboarding live now, deliverable due today, SLA at risk.	< 15 min	< 4 hr	Jumps ahead of routine work; operator notified.
P2 Normal	Standard delivery: a build, a research dossier, a scheduled follow-up. Default tier.	< 2 hr	< 24 hr	Queued; aging past SLA auto-promotes to P1 visibility.
P3 Background	Maintenance, enrichment, ingestion. Fills idle compute.	best-effort	< 7 days	First to pause under load; never blocks paid work.
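The table's ordering and aging rules can be sketched as a sort key. Two assumptions are made here and are not confirmed by the text: a P2 item is promoted once it exceeds its response SLA (rather than its resolve target), and "P1 visibility" is modelled as ranking the item alongside P1 work. The function names are hypothetical.

```python
from datetime import timedelta

# Response SLA per tier, from the table; None means best-effort.
RESPONSE_SLA = {
    "P0": timedelta(minutes=2),
    "P1": timedelta(minutes=15),
    "P2": timedelta(hours=2),
    "P3": None,
}


def effective_tier(tier: str, age: timedelta) -> str:
    """Tier used for ordering; a P2 past its response SLA is ranked as P1 (assumed)."""
    sla = RESPONSE_SLA[tier]
    if tier == "P2" and sla is not None and age > sla:
        return "P1"
    return tier


def drain_order(items):
    """Sort (name, tier, age) tuples: highest tier first, oldest first within a tier."""
    return sorted(
        items,
        key=lambda it: (effective_tier(it[1], it[2]), -it[2].total_seconds()),
    )


queue = [
    ("build", "P2", timedelta(hours=3)),       # past the 2 hr response SLA
    ("dossier", "P1", timedelta(minutes=30)),
    ("call", "P0", timedelta(minutes=1)),
    ("ingest", "P3", timedelta(hours=10)),
]
order = [name for name, _, _ in drain_order(queue)]
print(order)  # ['call', 'build', 'dossier', 'ingest']
```

The aged P2 build lands ahead of the younger P1 dossier because both rank as P1 and the older item wins the tie; P3 work stays last regardless of age.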

Work clustering: inbound items are also grouped by client, theme, and root-cause before compute spends on them — so an operator sees "12 items, 3 real problems," and resolving the cluster head clears the tail. That collapse is why throughput-per-operator (214/day) far exceeds manual handling.
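The grouping step described above can be illustrated with a plain dictionary pass. This is a sketch only: the field names `client`, `theme`, and `root_cause` are hypothetical, items are assumed to arrive in time order, and grouping on the combined key is one reading of "by client, theme, and root-cause".

```python
from collections import defaultdict


def cluster_items(items):
    """Group items sharing (client, theme, root_cause); the first item is the head."""
    clusters = defaultdict(list)
    for item in items:
        key = (item["client"], item["theme"], item["root_cause"])
        clusters[key].append(item)
    return [
        {"head": group[0], "tail": group[1:], "size": len(group)}
        for group in clusters.values()
    ]


inbound = [
    {"id": 1, "client": "A", "theme": "voice", "root_cause": "relay down"},
    {"id": 2, "client": "A", "theme": "voice", "root_cause": "relay down"},
    {"id": 3, "client": "B", "theme": "billing", "root_cause": "bad invoice"},
    {"id": 4, "client": "A", "theme": "voice", "root_cause": "relay down"},
]
clusters = cluster_items(inbound)
print(len(inbound), "items,", len(clusters), "real problems")  # 4 items, 2 real problems
```

Resolving a cluster's head would then close every item in its tail, which is the collapse the operator view relies on.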

Interaction

Drill-down contract

Every instrument is a surface, not a picture. Click anything to go a level deeper; selecting a client or priority cross-filters the whole board.

KPI tile → 30-day sparkline + the underlying records feeding the number. Delta arrows open "what changed."

Stacked-bar segment → the filtered list of those priority items for that day (client, age, owner, status). Hover shows exact count + % of load.

Heatmap cell → node panel: live load_5m, RAM pressure, active sessions, top consumers, and the CLUSTER banner if tripped. The "should I add a box?" screen.

Capacity gauge → threshold history: when utilization crossed 85%, whether a node was added, and the resulting drop.

Funnel stage → the items stuck there, sorted by age, bottleneck owner highlighted.

Cross-filter → pick a client and every instrument filters to that account (per-account scaling view); pick a priority to filter funnel + heatmap.

One brain today.
A cluster on demand.

Density before sprawl

Clone, don't rewrite

Priority is the scheduler
