MF Automations · Platform Scale

One brain today.
A cluster on demand.

Maui runs as a single high-density compute node serving every operator and client at once. We scale vertically until the node is full, then horizontally by cloning it — and a live P0–P3 priority system decides what the cluster works on first. This is the math, the dashboard, and the trigger points behind "how many businesses can it handle."

47 active clients
6 concurrent operators
1,284 calls / 24h
1.9s avg response
99.2% SLA adherence
01

Density before sprawl

A single Mac Studio M4 Max (16 cores / 128 GB) absorbs dozens of concurrent clients before we add hardware. We scale the node, not the org chart.

02

Clone, don't rewrite

When utilization stays above 85% for a sustained 15 minutes, we add an identical node behind a dispatcher. State is shared; the persona stays singular. No client notices the seam.

03

Priority is the scheduler

Every unit of work is tagged P0–P3 on arrival. Compute always flows to the highest-priority open item first — across all clients, all nodes.

"How many businesses can Maui run at once, and when do we cluster?"

Concurrency is bounded by compute pressure, not a seat count. The binding constraint is the number of simultaneously active heavy operations — live voice turns, deep-research fan-outs, document builds — not the number of clients on the books. Most clients are idle most of the time, so one node carries far more accounts than its peak-concurrency number suggests.

40–70 clients / node

Per-node ceiling at today's usage mix before load_5m enters the elevated band (12–20).

> 85% = cluster

Sustained utilization >85% for 15 min, OR load_5m >20, OR >8 live-voice sessions — any one trips an add-a-node recommendation.

~linear headroom

Each added Mac mini carries a further 25–40 businesses. Shared state keeps capacity additive from node to node until the store itself needs sharding.
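The per-node figures above can be combined into a rough fleet estimate. This is a minimal sketch, assuming the ranges simply add (the "~linear" case, before the shared store needs sharding). The function and constant names are illustrative and not part of the platform.

```python
# Ranges quoted above; everything else here is a hypothetical helper.
FIRST_NODE_CLIENTS = (40, 70)   # primary node at today's usage mix
ADDED_NODE_CLIENTS = (25, 40)   # each additional Mac mini


def fleet_capacity(added_nodes: int) -> tuple[int, int]:
    """Return (low, high) client capacity for the primary node plus added nodes."""
    if added_nodes < 0:
        raise ValueError("added_nodes must be >= 0")
    low = FIRST_NODE_CLIENTS[0] + added_nodes * ADDED_NODE_CLIENTS[0]
    high = FIRST_NODE_CLIENTS[1] + added_nodes * ADDED_NODE_CLIENTS[1]
    return low, high
```

Under that additive assumption, `fleet_capacity(2)` gives a range of 90 to 150 clients for one primary node plus two added nodes.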

Operations

Live operating picture

LIVE · auto-refresh 30s · sample data
Active clients
47
▲ +5 this week
Calls handled · 24h
1,284
▲ +12%
Avg response
1.9s
▼ -0.3s
Node utilization
68%
— headroom 17%
Concurrent operators
6
▲ +1
SLA adherence
99.2%
▲ +0.4%
Open P0
2
▲ +1
Throughput / op
214
▲ +18

P0–P3 priority load

Work items opened per day · last 7 days · click a tier to highlight

Fleet capacity

Current utilization vs 85% cluster trigger
68%
utilized · trigger at 85%

Cluster heatmap

Per-node utilization across the day
Legend: idle · healthy · busy · hot · trigger

Workflow funnel

Inbound → delivered · 24h

Open priority queue

What compute is on right now
P0 · Beth · live onboarding call failing · 3m
P0 · Qui tam · IRS filing deadline · 52m
P1 · Arthur · Reinbow deck due today · 1h
P1 · Johnny · research dossier · 2h
P2 · Billy · TMF follow-up build · 4h
P3 · Nightly memory ingest
How it scales

Single node → governed cluster

We push one node to ~85% before adding hardware, because a fuller node is cheaper and simpler than a second one. Adding capacity means cloning the node behind a priority-aware dispatcher — state stays shared, the persona stays singular.

LAYER 1

Edge / persona

Telephony DIDs, relay, channels. One identity ("Maui"), many lines — clients never talk to "node 3."

LAYER 2

Dispatcher

Priority-aware router + queue. Reads P0–P3 tags, routes each unit to the least-loaded capable node, enforces SLA.

LAYER 3

Node fleet

1..N identical compute nodes: Frank (M4 Max) plus Mac minis. Stateless-ish workers; add one to add capacity.

LAYER 4

Shared state

Memory/RAG, entity graph, secrets bridge, work ledger. The single source of truth every node reads — keeps the cluster consistent.
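The dispatcher layer described above (read the tier, pick the least-loaded capable node) can be sketched with a priority heap. This is an illustration under stated assumptions, not the platform's actual router: the `Node` and `Dispatcher` classes, the `capabilities` set, and the use of a single `load` number are all hypothetical, and SLA enforcement is left out.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    load: float                                   # assumed: utilization, 0.0 to 1.0
    capabilities: set[str] = field(default_factory=set)


class Dispatcher:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._queue = []                          # heap of (tier, seq, kind, label)
        self._seq = itertools.count()             # arrival order breaks ties

    def submit(self, tier: int, kind: str, label: str) -> None:
        """tier is 0..3 for P0..P3; a lower tier number drains first."""
        heapq.heappush(self._queue, (tier, next(self._seq), kind, label))

    def dispatch_next(self):
        """Route the highest-priority item to the least-loaded capable node."""
        if not self._queue:
            return None
        tier, _, kind, label = self._queue[0]     # peek before committing
        capable = [n for n in self.nodes if kind in n.capabilities]
        if not capable:
            raise LookupError(f"no node can run {kind!r}")
        heapq.heappop(self._queue)
        node = min(capable, key=lambda n: n.load)
        return f"P{tier}", label, node.name
```

The sketch does not update `load` after routing; a real dispatcher would refresh it from node telemetry between decisions.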

> 85%
Sustained node utilization for 15 min → CLUSTER recommendation surfaced to operator
load_5m > 20
Compute pressure past the P1 baseline (typical band 2–12) → add-node candidate
> 8 live
Concurrent live-voice sessions exceed a single node's voice budget → split across nodes

The system never auto-buys hardware. It raises the call to an operator with the evidence — the heatmap cell turns red, the gauge crosses the line, and the cluster banner appears. Clustering is specifically a P0/P1-protection move: we add a node when queue depth threatens critical SLAs, not merely because the box is warm.
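The three trigger conditions can be expressed as a single check that returns the evidence rather than taking action, in keeping with the "recommend, never auto-buy" rule. This is a minimal sketch: the `NodeSnapshot` shape is hypothetical, and the 30-second sample interval is an assumption borrowed from the dashboard refresh rate.

```python
from dataclasses import dataclass

UTIL_TRIGGER = 0.85          # sustained utilization threshold
SUSTAIN_SECONDS = 15 * 60    # must hold for 15 minutes
LOAD_5M_TRIGGER = 20.0
VOICE_BUDGET = 8             # live-voice sessions a single node should carry


@dataclass
class NodeSnapshot:
    util_samples: list[float]    # most recent last, one per sample_interval_s
    load_5m: float
    live_voice_sessions: int
    sample_interval_s: int = 30  # assumed sampling cadence


def cluster_reasons(node: NodeSnapshot) -> list[str]:
    """Return every tripped condition; an empty list means no recommendation."""
    reasons = []
    needed = max(1, SUSTAIN_SECONDS // node.sample_interval_s)
    window = node.util_samples[-needed:]
    # Only count as sustained if the full 15-minute window is present.
    if len(window) == needed and all(u > UTIL_TRIGGER for u in window):
        reasons.append("utilization > 85% for 15 min")
    if node.load_5m > LOAD_5M_TRIGGER:
        reasons.append("load_5m > 20")
    if node.live_voice_sessions > VOICE_BUDGET:
        reasons.append("> 8 live-voice sessions")
    return reasons
```

Returning the reasons, rather than a bare boolean, gives the operator the evidence behind the cluster banner.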

Priority taxonomy

P0–P3 — the scheduler's rulebook

Every inbound signal is auto-classified on arrival. The tier sets the SLA, the escalation path, and the order compute drains the queue. The cluster works highest-priority-first, globally — across all clients and nodes.

Tier | Definition | Response SLA | Resolve | Behaviour
P0 Critical | Revenue-blocking / client-down: live call failing, money or legal deadline, security event. | < 2 min | < 1 hr | Pages operator; auto-sheds lower load; pre-empts P1–P3.
P1 High | Time-sensitive client work: onboarding live now, deliverable due today, SLA at risk. | < 15 min | < 4 hr | Jumps ahead of routine work; operator notified.
P2 Normal | Standard delivery: a build, a research dossier, a scheduled follow-up. Default tier. | < 2 hr | < 24 hr | Queued; aging past SLA auto-promotes to P1 visibility.
P3 Background | Maintenance, enrichment, ingestion. Fills idle compute. | best-effort | < 7 days | First to pause under load; never blocks paid work.
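The tier table maps naturally onto a small lookup plus an aging rule. The sketch below assumes that "aging past SLA" means the response SLA while an item is unanswered and the resolve SLA afterwards; the table does not say which one applies, so treat that as a guess. The `effective_tier` helper is hypothetical.

```python
from datetime import timedelta

# (response SLA, resolve SLA) per tier; None means best-effort.
SLA = {
    "P0": (timedelta(minutes=2), timedelta(hours=1)),
    "P1": (timedelta(minutes=15), timedelta(hours=4)),
    "P2": (timedelta(hours=2), timedelta(hours=24)),
    "P3": (None, timedelta(days=7)),
}


def effective_tier(tier: str, age: timedelta, responded: bool) -> str:
    """Tier used for visibility: a P2 item past its SLA surfaces as P1."""
    response_sla, resolve_sla = SLA[tier]
    if tier == "P2":
        # Assumption: unanswered items age against the response SLA,
        # answered ones against the resolve SLA.
        limit = resolve_sla if responded else response_sla
        if age > limit:
            return "P1"
    return tier
```

For example, an unanswered P2 item that is three hours old would surface as P1, while the same item with a response already sent stays P2 until it passes 24 hours.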

Work clustering: inbound items are also grouped by client, theme, and root cause before any compute is spent on them, so an operator sees "12 items, 3 real problems," and resolving the cluster head clears the tail. That collapse is why throughput per operator (214/day) far exceeds manual handling.
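Work clustering as described here amounts to a group-by on three keys. A minimal sketch, assuming each item is a dict with `client`, `theme`, `root_cause`, and `age_min` fields (all hypothetical names) and that the oldest item in a group acts as the cluster head:

```python
from collections import defaultdict


def cluster_items(items):
    """Group items by (client, theme, root_cause); the oldest item heads each cluster."""
    groups = defaultdict(list)
    for item in items:
        key = (item["client"], item["theme"], item["root_cause"])
        groups[key].append(item)

    clusters = []
    for key, members in groups.items():
        # Oldest first, so the head is the item that has waited longest.
        ordered = sorted(members, key=lambda i: i["age_min"], reverse=True)
        clusters.append({"key": key, "head": ordered[0], "tail": ordered[1:]})
    return clusters
```

The number of clusters returned is the "real problems" count; the sum of the tail lengths is the work that resolving the heads would clear.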

Interaction

Drill-down contract

Every instrument is a surface, not a picture. Click anything to go a level deeper; selecting a client or priority cross-filters the whole board.

KPI tile → 30-day sparkline + the underlying records feeding the number. Delta arrows open "what changed."
Stacked-bar segment → the filtered list of those priority items for that day (client, age, owner, status). Hover shows exact count + % of load.
Heatmap cell → node panel: live load_5m, RAM pressure, active sessions, top consumers, and the CLUSTER banner if tripped. The "should I add a box?" screen.
Capacity gauge → threshold history: when utilization crossed 85%, whether a node was added, and the resulting drop.
Funnel stage → the items stuck there, sorted by age, bottleneck owner highlighted.
Cross-filter → pick a client and every instrument filters to that account (per-account scaling view); pick a priority to filter funnel + heatmap.