Operation Daybreak · Patient Appointment Churn · Deep Dive
We built the churn tile yesterday. Today we stress-tested it.
The question
Is "Patient Appointment Churn" really the lever we thought?
Put two independent quant analysts (Quant A and Quant B) plus a GPT-5 Codex adversarial review on the data. Find out what's real, what's overclaim, and where the actual front-desk lever lives.
What's coming in the next 8 slides

Eight findings, each in plain English. Some confirm what we built. Some change it. One we have to retire.

Slide 2 = the headline reframe. Slides 3-4 = what the data actually says. Slide 5 = the dollar question. Slide 6 = the component split. Slide 7 = the mechanism. Slides 8-9 = what to do this week.

Finding 1 · Reframe
The "churn moves growth" story is real — but quieter than we said yesterday.
Yesterday vs today
~1 pp growth lift, not 3.4 pp
Cutting a worst-quartile practice's churn down to the network median lifts annual growth ~1 percentage point. The prior 3.4 pp estimate didn't survive controlling for practice size.
Raw correlation
−0.128
Pearson r, churn vs YoY growth (n=188). p=0.080 — borderline.
After size control
p = 0.55
Big practices grow more and have different churn — that's most of the signal.
Bootstrap 95% CI
[−0.31, +0.05]
Includes zero. We can't rule out that the true effect is nothing.
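A quick sanity check on the "borderline" p-value, as a minimal sketch. It assumes the reported p is a two-sided test of zero correlation and uses the Fisher z approximation, which may not be the exact test the analysts ran. The function name is illustrative only.

```python
import math

def pearson_p_value(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0, via the Fisher z approximation."""
    # atanh(r) is approximately normal with standard error 1 / sqrt(n - 3)
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal
    return math.erfc(abs(z) / math.sqrt(2))

# Values from this slide: r = -0.128 on n = 188 practices
p_raw = pearson_p_value(-0.128, 188)
print(f"p = {p_raw:.3f}")  # lands close to the reported 0.080
```

The approximation reproduces the raw-correlation p-value before any size control. It says nothing about the p = 0.55 figure, which depends on the regression controls.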
In plain English

Yesterday we said "if you cut a high-churn practice's churn down to median, you'll see +3.4pp more annual growth." That number came from a simple bivariate fit.

Today's tighter analysis controls for practice size — and most of yesterday's signal disappears. The honest number is closer to +1 pp. Still positive. Still real. Just much smaller.

Bottom line: the composite SAPS churn metric is fine as an "is this practice in chaos?" flag, but stop selling it as the network's biggest growth lever. It's not.

Finding 2 · The Real Front-Desk Lever
Confirmation rate is the one front-desk metric that actually predicts growth.
The winner among 5 churn components
+10 pp confirmation rate → +2.5 pp YoY growth
Only metric with a statistically clean signal after size control. Survives Bonferroni correction (just barely). The other components — no-show rate, cancel rate, patient-fault cancels — show nothing usable.
Confirmed_pct
+0.231
p=0.002. ΔR²=+3.4%. The signal.
No-show rate
−0.088
p=0.33. Conflicting Pearson/Spearman. Noise.
Cancel rate
+0.069
p=0.52. Wrong sign. Probably a PMS artifact.
Patient-fault %
+0.003
p=0.97. 77% of practices report literal zero. Unusable.
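To make the Bonferroni claim concrete, here is a small sketch. It assumes a family-wise alpha of 0.05 and a family of 5 tests (one per churn component). The analysts' actual correction family may be larger, which would explain the "just barely" wording. Only the four p-values shown on this slide are included.

```python
ALPHA = 0.05   # assumed family-wise error rate
N_TESTS = 5    # "5 churn components" per the slide; the real family may be larger

# p-values exactly as reported above (the fifth component's p-value is not shown)
p_values = {
    "confirmed_pct": 0.002,
    "no_show_rate": 0.33,
    "cancel_rate": 0.52,
    "patient_fault_pct": 0.97,
}

# Bonferroni: each individual test must clear alpha / number_of_tests
threshold = ALPHA / N_TESTS
survivors = [name for name, p in p_values.items() if p < threshold]

print(f"per-test threshold = {threshold:.3f}")
print("survives correction:", survivors)
```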
In plain English

We checked 5 different ways to measure churn. Only one — confirmation rate (% of booked appointments the patient confirmed before the day) — actually moves with growth.

A practice with 75% confirmation grows ~2.5 pp/year faster than one with 65%. The front desk can directly change this: it's the 3-day / 1-day / 2-hour reminder chain.

Codex caveat: this is "the strongest observed association," not yet "proven causation." A practice with great managers probably confirms more AND grows more, so this could be a manager-quality story. Run an out-of-sample test before promoting the tile to "active."

Finding 3 · Two Failure Modes, Not One
Risk-flagged patient non-completion is 3.5× stronger — but it's a separate problem.
The strongest growth predictor in the entire dataset
r = −0.443  ·  p < 0.001
The Dental Intel "risk_notcomp_rate" (risk-flagged appointments that didn't complete ÷ total risk-flagged) is the cleanest signal we've ever measured. But it's not the same as SAPS churn — and not the same lever.

How the two relate

Near-orthogonal — independent problems

What we measured | Pearson r
SAPS churn ↔ risk non-completion | +0.109
Shared variance (R²) | 1.2%
Risk non-completion → growth | −0.443
SAPS churn → growth (after controls) | −0.012
When you add risk non-completion to the growth model, the SAPS churn signal collapses to near-zero. Two different failure modes — not one cause and one symptom.
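The arithmetic behind "near-orthogonal" and "3.5× stronger" can be checked directly. This is a sketch using only the correlations quoted in Findings 1 and 3. The assumption that "3.5×" compares against the raw −0.128 churn correlation is ours.

```python
# Correlations as reported on the slides
r_churn_vs_risk = 0.109     # SAPS churn vs risk non-completion
r_risk_vs_growth = -0.443   # risk non-completion vs YoY growth
r_churn_vs_growth = -0.128  # raw SAPS churn vs YoY growth (Finding 1)

# Shared variance between two variables is the squared correlation
shared_variance_pct = r_churn_vs_risk ** 2 * 100

# Relative strength compares absolute correlations
strength_ratio = abs(r_risk_vs_growth) / abs(r_churn_vs_growth)

print(f"shared variance = {shared_variance_pct:.1f}%")
print(f"risk non-completion vs churn strength = {strength_ratio:.1f}x")
```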

What that means in practice

Different teams, different protocols

SAPS churn
Front-desk lever. Confirmation discipline, recall workflows, reminder chain. OM owns it.
Risk non-comp
Patient-mix marker. Reflects the practice's payor mix, demographics, and clinical access. Not directly fixable by the front desk.
Codex view
Risk non-completion is a diagnostic, not a recovery promise. Surface to ROD; don't put on OM huddle board.
Falsifiable?
Yes. Add payor-mix and zip-income controls plus lagged capacity. If r=−0.443 survives those, it really is a clinical/access lever and not just a mix marker.
Finding 4 · Rework Tax Bracket
"$73.7M annualized rework tax" is the ceiling. The defensible floor is $5.9M.
Floor-to-ceiling, with the reason
$5.9M  →  $73.7M
Quant B's top-line $73.7M is 70% one component ($51.7M of PBI's "risk_notcomp_dollars") with an unverified time window and possible overlap with the other two components. Codex pulls that out and recomputes the conservative floor.

Three components, three confidence levels

What's in the $73.7M and how much we trust each

Component | Annualized | Type
A. Front-office rework labor | $7.1M | rate-card
B. Chair idle (lost slot) | $14.9M | rate-card
C. Risk non-completion ($) | $51.7M | unverified window
Total | $73.7M | mixed
Codex defensible floor (drop C, stricter A/B)
Handler time: 10 min × 1 person × $20/hr
$2.0M
Chair idle: $125/hr × 25% unfilled
$3.9M
Defensible floor (annual)
$5.9M
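The bracket can be reproduced from the figures above. This sketch only re-adds the slide's own numbers (in $M). It does not rebuild the underlying rate-card models, and the variable names are illustrative.

```python
# Quant B's three annualized components, in $M, as shown in the table
ceiling_components = {
    "A_front_office_rework_labor": 7.1,
    "B_chair_idle": 14.9,
    "C_risk_non_completion": 51.7,
}
ceiling = sum(ceiling_components.values())
share_c_pct = ceiling_components["C_risk_non_completion"] / ceiling * 100

# Codex floor: drop component C, use the stricter A/B figures from the slide
floor_components = {"handler_time": 2.0, "chair_idle": 3.9}
floor = sum(floor_components.values())

print(f"ceiling = ${ceiling:.1f}M, component C share = {share_c_pct:.0f}%")
print(f"floor   = ${floor:.1f}M")
```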

What "moving worst quartile to median" actually recovers

The realistic recovery scenario

Quant B
$6.0M annual recovery · 1.1% of organic-GP production.
Codex floor
$0.66M annual recovery · 0.13% of production · before any risk-NC recovery.
Worst-quartile n
47 practices, churn ≥32.2%. Target = median 26.2%. Roughly 6 pp improvement per practice.
Our posture
Publish the floor. Show the ceiling as upper-bound in the drill rail. Don't quote $73.7M in any exec deck.
Open validation: confirm risk_notcomp_dollars time window with PBI dataset owner. If it's not 90-day, the ×4.06 annualization is wrong by definition.
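Why the window matters, as a sensitivity sketch. It assumes the ×4.06 factor is 365/90 and backs out the implied window total from the $51.7M figure. The alternative window lengths are hypothetical, chosen only to show how far the annualized number could swing.

```python
ANNUALIZED_C = 51.7       # $M, component C as annualized on the slide
FACTOR_90D = 365 / 90     # about 4.06, matching the quoted multiplier

# Implied raw dollars in the window, if the window really is 90 days
window_dollars = ANNUALIZED_C / FACTOR_90D

def annualize(window_total_m: float, window_days: int) -> float:
    """Scale a window total ($M) to a 365-day year."""
    return window_total_m * 365 / window_days

# Hypothetical alternatives: same raw dollars, different true window
for days in (30, 90, 365):
    print(f"{days:>3}-day window -> ${annualize(window_dollars, days):.1f}M annualized")
```

If the raw figure actually covers a full year, the annualized component drops to the window total itself, roughly a quarter of what the deck currently shows.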
Finding 5 · Component Split
73% of churned appointments are rescheduled, not no-shows or hard cancels.
The shape of the chaos
Reschedule 73% · Cancel 15% · No-show 13% · Patient-fault 0.3%
When we decompose churn into its components: most "broken" appointments aren't patients ghosting — they're appointments getting moved. That's a protocol problem (recall + confirmation cadence), not a "fire the no-show patient" problem.
Patient-fault cancels
0.3%
Nearly nothing. Churn is mostly practice-side.
In plain English

If you ask an OM "what's killing your schedule," you'll probably hear "no-shows" or "flaky patients." The data says no.

Three out of every four churned appointments are moves — same patient, different day. That's a workflow problem (recall, confirmation, reschedule cadence) the practice fully owns. Patient-fault cancels are literally 0.3% of the SAPS denominator — basically nothing.

Caveat: only 83 of 188 practices have complete component decomposition. The other 105 have PMS-coding gaps that make no-show / cancel counts unreliable.

Finding 6 · Mechanism
Adding front-office staff doesn't fix churn. High churn forces MORE staff.
Direction-of-causation finding
More staff ≠ less churn (it's the other way)
On the 43 practices where we have both labor and churn data, high-staffed practices have HIGHER churn, and front-office hours per delivered visit RISE with churn. That pattern fits reverse causation: the chaos appears to be creating the labor load, not the other way around. With n=43 and a cross-sectional cut, the direction is still unproven.

Front-office staffing density vs churn

More FO staff → MORE churn (not less)

Staffing tier (FO FTE / 1k active pts) | Mean churn 90d
T1 (low, 0.10 FTE/1k) | 18.1%
T2 (mid, 0.18 FTE/1k) | 23.7%
T3 (high, 0.29 FTE/1k) | 22.1%
"Add a front-desk hire to fix the churn" is the wrong move — the data points the opposite direction. Pearson r=+0.307 (n=43).

Hours per delivered visit, by churn quartile

FO hours scale with chaos (rework tax in person-hours)

Churn quartile | FO h / visit
Q1 (lowest churn, 11.7%) | 0.29
Q2 (18.2%) | 0.41
Q3 (24.3%) | 0.40
Q4 (highest churn, 30.4%) | 0.47
Total hours per visit stay flat across quartiles (1.55–1.67), so the chair side doesn't scale. Only front-office does. That's the rework tax made visible in person-hours.
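To put the quartile table in minutes, here is a short sketch using only the four FO h / visit values above. It compares the top and bottom churn quartiles and makes no claim about causal direction.

```python
# Front-office hours per delivered visit, by churn quartile (from the table)
fo_hours_per_visit = {"Q1": 0.29, "Q2": 0.41, "Q3": 0.40, "Q4": 0.47}

# Extra front-office time per visit in the highest-churn quartile vs the lowest
extra_hours = fo_hours_per_visit["Q4"] - fo_hours_per_visit["Q1"]
extra_minutes = extra_hours * 60
q4_to_q1_ratio = fo_hours_per_visit["Q4"] / fo_hours_per_visit["Q1"]

print(f"Q4 carries about {extra_minutes:.1f} more FO minutes per visit than Q1")
print(f"that is {q4_to_q1_ratio:.2f}x the Q1 load")
```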
Finding 7 · Decision
What we change in Daybreak this week.
From one tile to three — each with the right job
T13 (reframe)  ·  T14 (new)  ·  T15 (new)
One tile can't carry "rework flag," "growth lever," and "patient-mix marker" all at once. Split them so each lights up the right team with the right ask.

Moves

Move | Why | Confidence
Ship T13 Patient Appointment Churn as-is, but reframe | Operational rework-flag, not the growth lever. Update validation_status copy in catalog.yaml. | High
Add T14 Confirmation Discipline tile | The actual front-desk lever per Quant A. Bands: ≥90% great · 80-90% good · 70-80% warn · <70% critical. | Medium
Add T15 Risk-Patient Non-Completion (diagnostic only) | Strongest signal but a marker, not a lever. Surface to ROD for context; no OM action. | Medium
Drop the $280/visit rate-card in the threshold formula | Replace with Codex's defensible floor: excess × saps × $24/hr × 0.25hr × 2. Full model in drill rail as upper bound. | High
Front-office FTE NOT in any threshold | Direction-of-causation unresolved; labor coverage only 56 practices. | High
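The T14 bands could be encoded along these lines. This is a sketch, not the tile's actual implementation: the function name is hypothetical, and the slide does not say which band owns the exact boundaries (80% and 70%), so this version assumes each lower bound is inclusive.

```python
def confirmation_band(confirmed_pct: float) -> str:
    """Map a confirmation rate (0-100) to a T14 band.

    Assumption: lower bounds are inclusive, so exactly 80.0 is "good"
    and exactly 70.0 is "warn". The slide leaves this open.
    """
    if not 0.0 <= confirmed_pct <= 100.0:
        raise ValueError("confirmed_pct must be between 0 and 100")
    if confirmed_pct >= 90.0:
        return "great"
    if confirmed_pct >= 80.0:
        return "good"
    if confirmed_pct >= 70.0:
        return "warn"
    return "critical"

# The two practices from the Finding 2 example
print(confirmation_band(75.0))  # warn
print(confirmation_band(65.0))  # critical
```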
Finding 8 · What's Next
Five validations we need before "exploratory" becomes "validated."
Honest open work
5 validations, ranked by leverage
The big surviving claim, that confirmation rate predicts growth, is statistically the strongest front-desk signal we found. But it's cross-sectional and possibly brand-confounded. These five tests resolve the open questions before we promote any tile from exploratory to validated.
Open validation queue
  1. Confirm risk_notcomp_dollars time window with the PBI dataset owner. If it's not 90-day, the $51.7M annualization is wrong by definition. (Highest leverage — unlocks the dollar-magnitude story.)
  2. Within-practice longitudinal test: does a change in confirmed_pct in month N predict a change in growth in month N+12 inside the same practice? Resolves reverse-causation for the T14 lever.
  3. Brand / PMS / payor fixed-effects regression on confirmed_pct → growth and risk_notcomp_rate → growth. The cross-sectional signal could be brand-confounded; this rules it out.
  4. Out-of-sample replication of confirmed_pct → growth on a separate trailing window (e.g., prior 365d). Required before T14 promotes from exploratory to active.
  5. Labor coverage expansion beyond Accelerate brand — extend punches to Parks Pace / SGA East so the FO-staffing-direction claim has n>100. (Currently n=43, exploratory only.)
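Validation 2 could be set up roughly as below. This is a sketch under stated assumptions: each practice has an evenly spaced monthly series for confirmed_pct and for growth, month-over-month first differences stand in for "change", and the helper names are hypothetical. Differencing inside each practice removes fixed practice-level differences, which is the point of the within-practice design.

```python
import math

def lagged_change_pairs(confirmed, growth, lag=12):
    """Pair the change in confirmed_pct at month N with the growth change at N+lag.

    `confirmed` and `growth` are equal-length monthly series for ONE practice.
    """
    d_conf = [b - a for a, b in zip(confirmed, confirmed[1:])]
    d_growth = [b - a for a, b in zip(growth, growth[1:])]
    return [(d_conf[i], d_growth[i + lag]) for i in range(len(d_conf) - lag)]

def pearson(xs, ys):
    """Plain Pearson correlation; needs at least two points and non-constant inputs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pooled_lagged_correlation(series_by_practice, lag=12):
    """Pool within-practice pairs across practices, then correlate.

    `series_by_practice` maps practice id -> (confirmed_series, growth_series).
    """
    pairs = []
    for confirmed, growth in series_by_practice.values():
        pairs.extend(lagged_change_pairs(confirmed, growth, lag))
    xs, ys = zip(*pairs)
    return pearson(xs, ys)
```

A positive pooled correlation at lag 12 would support confirmation leading growth. A stronger correlation at negative lags would point the other way.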
The honest summary

We built a tile yesterday based on a real signal. The deep dive shows the signal is real but smaller, in a different place than we thought, and entangled with two other independent failure modes.

The fix is to ship three tiles, not one — and to publish a defensible $5.9M floor instead of a $73.7M ceiling we can't yet defend. Everything else queues into the validation list above.