Operation Daybreak · Patient Appointment Churn · Deep Dive
We built the churn tile yesterday. Today we stress-tested it.
The question
Is "Patient Appointment Churn" really the lever we thought?
Run two independent quant analysts + GPT-5 Codex adversarial review on the data. Find out what's real, what's overclaim, and where the actual front-desk lever lives.
What's coming in the next 10 slides

Ten findings, each in plain English. Some confirm what we built. Some change it. One we have to retire.

Slide 2 = the headline reframe. Slides 3-5 = the data + fixed-effects validation. Slide 6 = sizing. Slide 7 = component split. Slide 8 = mechanism. Slides 9-10 = decision + what's next.

Finding 1 · Reframe
The "churn moves growth" story is real — but quieter than we said yesterday.
Yesterday vs today
~1 pp growth lift, not 3.4 pp
Cutting a worst-quartile practice's churn down to the network median lifts annual growth ~1 percentage point. The prior 3.4 pp estimate didn't survive controlling for practice size.
Raw correlation
−0.128
Pearson r, churn vs YoY growth (n=188). p=0.080 — borderline.
After size control
p = 0.55
Big practices grow more and have different churn — that's most of the signal.
Bootstrap 95% CI
[−0.31, +0.05]
Includes zero. We can't rule out that the true effect is nothing.
In plain English

Yesterday we said "if you cut a high-churn practice's churn down to median, you'll see +3.4pp more annual growth." That number came from a simple bivariate fit.

Today's tighter analysis controls for practice size — and most of yesterday's signal disappears. The honest number is closer to +1 pp. Still positive. Still real. Just much smaller.

Bottom line: the composite SAPS churn metric is fine as a "is this practice in chaos?" flag — but stop selling it as the network's biggest growth lever. It's not.
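
The size-control and bootstrap steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the Daybreak dataset. The deck does not say which statistic was bootstrapped, so the sketch assumes a partial correlation (churn vs growth with practice size regressed out); all function and variable names are hypothetical.

```python
import numpy as np

def residualize(y, x):
    """Return the part of y left over after a straight-line fit on x."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

def partial_corr(a, b, control):
    """Pearson r between a and b after removing the linear effect of control."""
    ra = residualize(a, control)
    rb = residualize(b, control)
    return float(np.corrcoef(ra, rb)[0, 1])

def bootstrap_ci(a, b, control, n_boot=1000, seed=0):
    """Percentile 95% CI for the partial correlation, resampling practices."""
    rng = np.random.default_rng(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample practices with replacement
        stats.append(partial_corr(a[idx], b[idx], control[idx]))
    low, high = np.percentile(stats, [2.5, 97.5])
    return float(low), float(high)

# Synthetic stand-in: size drives both churn and growth (assumption for illustration).
rng = np.random.default_rng(42)
size = rng.normal(size=188)
churn = -0.4 * size + rng.normal(size=188)
growth = 0.5 * size + rng.normal(size=188)

raw_r = float(np.corrcoef(churn, growth)[0, 1])
adj_r = partial_corr(churn, growth, size)
ci_low, ci_high = bootstrap_ci(churn, growth, size)
print(f"raw r={raw_r:+.3f}  size-adjusted r={adj_r:+.3f}  95% CI=[{ci_low:+.2f}, {ci_high:+.2f}]")
```

If the interval straddles zero, as the slide's [−0.31, +0.05] does, the data cannot rule out a true effect of nothing.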

Finding 2 · The Real Front-Desk Lever
Confirmation rate is the one front-desk metric that actually predicts growth.
The winner among 5 churn components
+10 pp confirmation rate → +2.5 pp YoY growth
Only metric with a statistically clean signal after size control. Survives Bonferroni correction (just barely). The other components — no-show rate, cancel rate, patient-fault cancels — show nothing usable.
Confirmed_pct
+0.231
p=0.002. ΔR²=+3.4%. The signal.
No-show rate
−0.088
p=0.33. Conflicting Pearson/Spearman. Noise.
Cancel rate
+0.069
p=0.52. Wrong sign. Probably a PMS artifact.
Patient-fault %
+0.003
p=0.97. 77% of practices report literal zero. Unusable.
In plain English

We checked 5 different ways to measure churn. Only one — confirmation rate (% of booked appointments the patient confirmed before the day) — actually moves with growth.

A practice with 75% confirmation grows ~2.5 pp/year faster than one with 65%. The front desk can directly change this: it's the 3-day / 1-day / 2-hour reminder chain.

Codex caveat: this is "the strongest observed association," not yet "proven causation." A practice with great managers probably confirms more AND grows more, so this could be a manager-quality story. Run an out-of-sample test before promoting the tile to "active."
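
The Bonferroni check and the "+10 pp → +2.5 pp" translation can be written out as a short sketch. It assumes a family-wise alpha of 0.05 and five tests (one per churn component); the deck does not state the alpha, and only four p-values appear on the slide, so the fifth is left out. The linear slope is read off the headline and treated as an association, not a causal effect.

```python
ALPHA = 0.05   # assumed family-wise error rate; not stated in the deck
N_TESTS = 5    # five churn components were checked

# p-values as quoted on the slide (the fifth component's p-value is not shown)
p_values = {
    "confirmed_pct": 0.002,
    "no_show_rate": 0.33,
    "cancel_rate": 0.52,
    "patient_fault_pct": 0.97,
}

# Bonferroni: each test must clear alpha divided by the number of tests
threshold = ALPHA / N_TESTS
survivors = [name for name, p in p_values.items() if p < threshold]

# Headline association: +10 pp confirmation goes with +2.5 pp YoY growth
SLOPE_PP_PER_PP = 2.5 / 10

def growth_lift_pp(confirmation_from_pct, confirmation_to_pct):
    """Associated growth difference (pp) between two confirmation rates."""
    return SLOPE_PP_PER_PP * (confirmation_to_pct - confirmation_from_pct)

print(threshold, survivors, growth_lift_pp(65, 75))
```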

Finding 3 · Two Failure Modes, Not One
Risk-flagged patient non-completion is 3.5× stronger — but it's a separate problem.
The strongest growth predictor in the entire dataset
r = −0.443  ·  p < 0.001
The Dental Intel "risk_notcomp_rate" (risk-flagged appointments that didn't complete ÷ total risk-flagged) is the cleanest signal we've ever measured. But it's not the same as SAPS churn — and not the same lever.

How the two relate

Near-orthogonal — independent problems

What we measured | Pearson r
SAPS churn ↔ risk non-completion | +0.109
Shared variance (R²) | 1.2%
Risk non-completion → growth | −0.443
SAPS churn → growth (after controls) | −0.012
When you add risk non-completion to the growth model, the SAPS churn signal collapses to near-zero. Two different failure modes — not one cause and one symptom.
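
The "near-orthogonal" and "3.5× stronger" figures follow directly from the correlations quoted above. A small worked check, using only numbers from the slides:

```python
r_churn_vs_risk = 0.109     # SAPS churn vs risk non-completion
r_risk_vs_growth = -0.443   # risk non-completion vs growth
r_churn_vs_growth = -0.128  # raw SAPS churn vs growth (Finding 1)

# Shared variance between two variables is the squared correlation
shared_variance_pct = round(r_churn_vs_risk ** 2 * 100, 1)

# Relative strength of the two growth associations, by absolute correlation
strength_ratio = round(abs(r_risk_vs_growth) / abs(r_churn_vs_growth), 1)

print(shared_variance_pct, strength_ratio)
```

A shared variance near 1% means that knowing a practice's SAPS churn says almost nothing about its risk non-completion rate, which is why the two are treated as separate problems.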

What that means in practice

Two separate questions — keep them separate

SAPS churn
Front-desk confirmation discipline. Reminder chain, recall workflows. OM owns it via T14.
Risk non-comp
Protect-the-chair protocol. When PMS flags risk-NC patient on today's schedule: park-list standby, wave double-book, hyper-confirm chain (7d/3d/1d/90min/same-day call), pre-collect for cash plans.
The nuance
The statistic (r=−0.443 → growth) is patient-mix-confounded — can't promise growth uplift from cutting the rate. But the operational response is plainly OM-actionable. Different ask.
What recovers
The protocol doesn't recover the lost-show patient — it recovers the chair that would otherwise sit empty. Park-list patient or wave-booked patient fills the slot.
Finding 4 · Rework Tax Bracket
"$73.7M annualized rework tax" is the ceiling. The defensible floor is $5.9M.
Floor-to-ceiling, with the reason
$5.9M  →  $73.7M
Quant B's top-line $73.7M is 70% one component ($51.7M of PBI's "risk_notcomp_dollars") with an unverified time window and possible overlap with the other two components. Codex pulls that out and recomputes the conservative floor.

Three components, three confidence levels

What's in the $73.7M and how much we trust each

Component | Annualized | Type
A. Front-office rework labor | $7.1M | rate-card
B. Chair idle (lost slot) | $14.9M | rate-card
C. Risk non-completion ($) | $51.7M | unverified window
Total | $73.7M | mixed
Codex defensible floor (drop C, stricter A/B)
Handler time: 10 min × 1 person × $20/hr
$2.0M
Chair idle: $125/hr × 25% unfilled
$3.9M
Defensible floor (annual)
$5.9M
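
The floor-to-ceiling bracket is simple addition, sketched here with the figures quoted above. The underlying event volumes behind the $2.0M and $3.9M floor lines are not given in the deck, so they are taken as stated rather than re-derived.

```python
# Quant B ceiling components, in $M annualized (from the table above)
ceiling_components = {
    "A_rework_labor": 7.1,
    "B_chair_idle": 14.9,
    "C_risk_noncompletion": 51.7,
}
ceiling_musd = round(sum(ceiling_components.values()), 1)

# Share of the ceiling that rests on the unverified component C
share_c_pct = round(ceiling_components["C_risk_noncompletion"] / ceiling_musd * 100)

# Codex floor: drop C, use the stricter A and B figures as quoted
floor_components = {"handler_time": 2.0, "chair_idle": 3.9}
floor_musd = round(sum(floor_components.values()), 1)

print(f"floor ${floor_musd}M -> ceiling ${ceiling_musd}M ({share_c_pct}% from component C)")
```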

What "moving worst quartile to median" actually recovers

The realistic recovery scenario

Quant B
$6.0M annual recovery · 1.1% of organic-GP production.
Codex floor
$0.66M annual recovery · 0.13% of production · before any risk-NC recovery.
Worst-quartile n
47 practices, churn ≥32.2%. Target = median 26.2%. Roughly 6 pp improvement per practice.
Our posture
Publish the floor. Show the ceiling as upper-bound in the drill rail. Don't quote $73.7M in any exec deck.
Open validation: confirm risk_notcomp_dollars time window with PBI dataset owner. If it's not 90-day, the ×4.06 annualization is wrong by definition.
Finding 4c · Validation #3 Done
After controlling for regional director: only risk-NC survives. Confirmation rate is partly a manager-quality effect.
Brand/regional fixed-effects regression
Risk-NC survives. Confirmation narrowly does not.
Re-ran the headline regressions with dummies for each ROD (14 regional directors with n≥5). Risk_notcomp_rate keeps its β=−0.38, p<0.001 even with regional director controlled. Confirmation rate drops from p=0.006 → p=0.083 — meaning the cross-sectional confirmation signal is partly "good managers run good practices" not pure "confirmation chain → growth."

Fixed-effects regression results

Before vs after ROD controls (n=189 organic GP)

Metric → YoY growth | β no-FE | p no-FE | β +FE | p +FE
risk_notcomp_rate | −0.461 | <0.001 | −0.383 | <0.001
confirmed_pct | +0.265 | 0.006 | +0.186 | 0.083
churn_90d (composite) | −0.110 | 0.444 | +0.031 | 0.848
RODs add 7.9–8.4% R² across the three models. Regional/manager variance is a real chunk of what the cross-sectional analysis was attributing to operational metrics.
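
To make the before/after comparison concrete, here is a minimal sketch of a fixed-effects regression using one dummy per regional director. It runs on synthetic data where the group effect is built in on purpose, and it uses plain least squares via NumPy; the deck does not say which software, standardization, or covariates the analysts used, so every name and number below is an assumption for illustration.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X already includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(7)
n_groups, per_group = 14, 12                 # 14 directors, 12 practices each (synthetic)
group = np.repeat(np.arange(n_groups), per_group)
group_effect = np.linspace(-1.5, 1.5, n_groups)[group]

# The director effect pushes both the metric and growth: a classic confound
metric = group_effect + rng.normal(size=group.size)
growth = 0.2 * metric + 2.0 * group_effect + rng.normal(size=group.size)

intercept = np.ones(group.size)
X_plain = np.column_stack([intercept, metric])

# One dummy per director except the first, which acts as the reference level
dummies = (group[:, None] == np.arange(1, n_groups)[None, :]).astype(float)
X_fe = np.column_stack([intercept, metric, dummies])

beta_no_fe = float(ols(X_plain, growth)[1])  # metric coefficient without controls
beta_fe = float(ols(X_fe, growth)[1])        # metric coefficient with director dummies
print(f"beta without FE={beta_no_fe:+.2f}, with FE={beta_fe:+.2f}")
```

In this synthetic setup the coefficient shrinks toward the true within-director slope once the dummies are added, which is the same pattern the table shows for confirmed_pct.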

Bonus finding: "Good operator" effect

Three RODs persistently positive across ALL three models

ROD | β range | p
Libby Knopp | +0.15 to +0.21 | 0.01–0.04
Kim Miller | +0.14 to +0.16 | 0.04–0.05
Leah Grevious | +0.16 to +0.31 | <0.001–0.06
After controlling for confirmation, risk-NC, AND size, these three RODs still beat baseline. That's an operator-quality signal worth surfacing as a separate thread: what are they doing that the data doesn't capture? (Huddle cadence? OM coaching? Recall discipline?)
Finding 4b · The Move That Pays
Move the worst-quartile 47 practices from 32.2% churn → median 26.2%.
Annualized recovery, decomposed
$6.0M target · $0.66M defensible floor
47 practices, 6 percentage points of churn each. Full Quant B model puts annual recovery at $6.0M (~$128k/practice/yr, 1.1% of organic-GP production). Codex stricter floor (no risk-NC dollars, conservative rate-card) puts it at $0.66M (~$14k/practice/yr).

Decomposition (Quant B full model)

Where the $6.0M lives, by component

Component | 90-day | Annualized
Rework labor saved | $185k | $750k
Chair recovered | $428k | $1.74M
Risk-NC reduction | $878k | $3.56M
Total | $1.49M | $6.0M
Per practice / year | $31.7k | ~$128k
~1.1% of organic-GP network production. Risk-NC is the largest single line — also the most defensibility-fragile (Codex flagged unverified time window).
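
The annualized column can be reproduced from the 90-day column, assuming the ×4.06 factor is 365 ÷ 90 (the deck quotes ×4.06 but does not spell out the division). A short check:

```python
ANNUALIZE = 365 / 90   # assumed origin of the deck's x4.06 factor
N_PRACTICES = 47       # worst-quartile practices

# 90-day recovery by component, in $k (Quant B full model)
ninety_day_k = {
    "rework_labor": 185,
    "chair_recovered": 428,
    "risk_nc_reduction": 878,
}

annual_k = {name: value * ANNUALIZE for name, value in ninety_day_k.items()}
total_90d_k = sum(ninety_day_k.values())
total_annual_k = sum(annual_k.values())
per_practice_90d_k = total_90d_k / N_PRACTICES

for name, value in annual_k.items():
    print(f"{name}: ${value:,.0f}k / yr")
print(f"total: ${total_annual_k / 1000:.1f}M / yr, ${per_practice_90d_k:.1f}k per practice per 90 days")
```

Because every annual figure inherits the same factor, an error in the assumed 90-day window scales the whole column, which is why the time-window verification matters.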

Codex defensible floor

If you drop risk-NC and tighten the rate-card

Drop C
Exclude risk-NC dollars entirely until PBI time-window verified.
Tighter A
10 min × 1 handler × $20/hr (vs 15 min × 2 × $24/hr).
Tighter B
$125/hr chair margin × 25% unfilled (vs $200/hr × 60%).
Floor result
$0.66M annual recovery · ~$14k per practice · 0.13% of production.
Recommendation: cite $6M target with $0.66M floor, decompose components in drill rail, label risk-NC "pending PBI time-window verification."
Finding 5 · Component Split
73% of churned appointments are rescheduled, not no-shows or hard cancels.
The shape of the chaos
Reschedule 73% · Cancel 15% · No-show 13% · Patient-fault 0.3%
When we decompose churn into its components: most "broken" appointments aren't patients ghosting — they're appointments getting moved. That's a protocol problem (recall + confirmation cadence), not a "fire the no-show patient" problem.
Patient-fault cancels
0.3%
Nearly nothing. Churn is mostly practice-side.
In plain English

If you ask an OM "what's killing your schedule," you'll probably hear "no-shows" or "flaky patients." The data says no.

Three out of every four churned appointments are moves — same patient, different day. That's a workflow problem (recall, confirmation, reschedule cadence) the practice fully owns. Patient-fault cancels are literally 0.3% of the SAPS denominator — basically nothing.

Caveat: only 83 of 188 practices have complete component decomposition. The other 105 have PMS-coding gaps that make no-show / cancel counts unreliable.

Finding 6 · Mechanism
High churn forces MORE staff. Fix churn first — labor rightsizes naturally.
Direction-of-causation finding · sequencing constraint
Protocol fix → labor savings  (NOT the other way)
High-churn practices have to staff UP to handle the rework. If we cut staff before fixing the churn protocol, the same load lands on fewer people — burnout, turnover, culture damage. Sequencing matters: fix the protocol, let FO load naturally decline, THEN consider staffing model.

Front-office staffing density vs churn

More FO staff → MORE churn (not less)

Staffing tier (FO FTE / 1k active pts) | Mean churn 90d
T1: low (0.10 FTE/1k) | 18.1%
T2: mid (0.18 FTE/1k) | 23.7%
T3: high (0.29 FTE/1k) | 22.1%
"Add a front-desk hire to fix the churn" is the wrong move — the data points the opposite direction. Pearson r=+0.307 (n=43).

Hours per delivered visit, by churn quartile

FO hours scale with chaos (rework tax in person-hours)

Churn quartile | FO h / visit
Q1: lowest churn (11.7%) | 0.29
Q2 (18.2%) | 0.41
Q3 (24.3%) | 0.40
Q4: highest churn (30.4%) | 0.47
Total hours per visit stays flat across quartiles (1.55–1.67) — the chair side doesn't scale. Only front-office does. That's the rework tax made visible in person-hours.
Sequencing rule (Scott)

Do NOT cut FO headcount before the churn protocol is fixed.

The current FO load is REAL work — it's the rework tax being absorbed by people. Pulling headcount first lands the same load on fewer staff → burnout, turnover, culture damage. Sequence: fix protocol (T13 + T14 + T15) → FO load naturally declines → THEN revisit staffing model practice-by-practice.

Finding 6b · Validation #2 Done
Within-practice over time: confirmation lever is real. Same manager, different month — when confirmation rises, NP follows.
The test that resists the manager-quality confound
+$826/month NP gap · improving vs worsening confirmation
Computed the SLOPE of confirmation rate and NP within each practice over the last 5–12 months. Same OM, same ROD, different month. The within-practice signal eliminates the "good operator" confound that killed the cross-sectional confirmed_pct in the FE check. Both confirmation AND composite churn show real within-practice predictive power.

Confirmation slope → NP slope (n=179)

Big within-practice dose-response

Confirmation trend | NP slope ($/mo) | n
T1 worsening | −$15 | 59
T2 stable | +$265 | 61
T3 improving | +$811 | 59
Gap: $826/month NP between extremes
r=+0.150 over concurrent 5-month window. Manager-invariant because each practice is its own control.

Churn slope → NP slope (n=189, 12mo)

Composite churn IS predictive within-practice

Churn trend | NP slope ($/mo) | n
T1 churn going down | +$274 | 63
T2 stable | +$70 | 63
T3 churn going up | −$123 | 63
Gap: $397/month NP between extremes
Cross-sectional was r=−0.13 (weak, killed by FE). Within-practice r=−0.18. The signal lives in WITHIN-practice change, not cross-practice level.
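
The within-practice test described above can be sketched in two steps: fit a slope per practice over its monthly series, then compare the outcome slope between the top and bottom terciles of the driver slope. This is a hedged illustration; the helper names are hypothetical, and the deck does not specify how the tercile cut points were chosen, so percentile cuts are assumed here.

```python
import numpy as np

def monthly_slope(values):
    """Least-squares slope of a monthly series, in units per month."""
    months = np.arange(len(values))
    return float(np.polyfit(months, values, 1)[0])

def tercile_gap(driver_slopes, outcome_slopes):
    """Mean outcome slope in the top driver tercile minus the bottom tercile."""
    driver = np.asarray(driver_slopes, dtype=float)
    outcome = np.asarray(outcome_slopes, dtype=float)
    low_cut, high_cut = np.percentile(driver, [100 / 3, 200 / 3])
    bottom = outcome[driver <= low_cut].mean()
    top = outcome[driver >= high_cut].mean()
    return float(top - bottom)

# Toy example: six practices, driver slope vs outcome slope (made-up numbers)
example_gap = tercile_gap([1, 2, 3, 4, 5, 6], [10, 10, 20, 20, 30, 30])

# The slide gaps, recomputed from the tercile means quoted in the tables
confirmation_gap = 811 - (-15)   # T3 improving minus T1 worsening, $/month NP
churn_gap = 274 - (-123)         # churn going down minus churn going up, $/month NP
print(example_gap, confirmation_gap, churn_gap)
```

Each practice serves as its own control because the slope only measures change within that practice, so stable differences between managers drop out.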
Updated confidence — three tiles

T14 confirmation: Cross-section + FE killed it (p=0.083). Within-practice rescued it (+$826/mo gap). Back to High operational confidence.

T15 risk-NC: Already survived FE. Holds at High.

T13 composite churn: Cross-section weak, FE null, BUT within-practice r=−0.18 with $397/mo gap. Elevates from Low to Medium. Composite is fine as a within-practice trend tile, not as a cross-practice ranking tile.

Finding 7 · Decision
What we change in Daybreak this week.
From one tile to three — each with the right job
T13 (reframe)  ·  T14 (new)  ·  T15 (new)
One tile can't carry "rework flag," "growth lever," and "patient-mix marker" all at once. Split them so each lights up the right team with the right ask.

Moves

Move | Why | Confidence
Ship T13 Patient Appointment Churn as-is, but reframe | Operational rework-flag, not the growth lever. Update validation_status copy in catalog.yaml. | High
Add T14 Confirmation Discipline tile | The actual front-desk lever per Quant A. Bands: ≥90% great · 80-90% good · 70-80% warn · <70% critical. | Medium
Add T15 Risk-Patient Non-Completion as an active OM tile | Operational protocol when risk-NC patient is on today's schedule: park-list standby, wave double-book, hyper-confirm chain, pre-collect for cash. Don't quote dollar-recovery, just protect the chair. | High
Drop the $280/visit rate-card in the threshold formula | Replace with Codex's defensible floor: excess × saps × $20/hr × (10/60) hr × 1. Full model ($24/hr × 0.25 hr × 2) in drill rail as upper bound. | High
Front-office FTE NOT in any threshold | Direction-of-causation unresolved; labor coverage only 56 practices. | High
No FO headcount cuts until churn protocol stabilizes | Current FO load is real (rework absorbed by people). Cut first = burnout + turnover + culture damage. Sequence: protocol fix → load declines → revisit staffing model. | High
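
The T14 bands in the moves table can be expressed as a small classifier. This is a sketch only: the table does not say which side of each boundary is inclusive, so the lower bound of each band is assumed inclusive here, and the function name is hypothetical.

```python
def confirmation_band(confirmed_pct):
    """Map a confirmation rate (0-100) to a T14 band label.

    Assumption: each band includes its lower bound (e.g. exactly 90 is "great").
    """
    if not 0 <= confirmed_pct <= 100:
        raise ValueError("confirmed_pct must be between 0 and 100")
    if confirmed_pct >= 90:
        return "great"
    if confirmed_pct >= 80:
        return "good"
    if confirmed_pct >= 70:
        return "warn"
    return "critical"

for pct in (92, 85, 75, 65):
    print(pct, confirmation_band(pct))
```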
Finding 8 · What's Next
Five validations we need before "exploratory" becomes "validated."
Honest open work
5 validations, ranked by leverage
The big surviving claim, that confirmation rate predicts growth, is the strongest front-desk signal we found. But it's cross-sectional and brand-confounded. These five tests resolve the open questions before we promote any tile from exploratory to validated.
Open validation queue
  1. Confirm risk_notcomp_dollars time window with the PBI dataset owner. If it's not 90-day, the $51.7M annualization is wrong by definition. (Highest leverage — unlocks the dollar-magnitude story.)
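  2. Within-practice longitudinal test: does confirmation_pct change in month N predict growth change in month N+12 inside the same practice? Resolves reverse-causation for the T14 lever. A first within-practice pass is reported in Finding 6b, but it uses a concurrent window rather than the N+12 lag.
  3. Brand / PMS / payor fixed-effects regression on confirmed_pct → growth and risk_notcomp_rate → growth. The cross-sectional signal could be brand-confounded; this rules it out. A regional-director (ROD) fixed-effects pass is reported in Finding 4c; PMS and payor effects remain open.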
  4. Out-of-sample replication of confirmed_pct → growth on a separate trailing window (e.g., prior 365d). Required before T14 promotes from exploratory to active.
  5. Labor coverage expansion beyond Accelerate brand — extend punches to Parks Pace / SGA East so the FO-staffing-direction claim has n>100. (Currently n=43, exploratory only.)
The honest summary

We built a tile yesterday based on a real signal. The deep dive shows the signal is real but smaller, in a different place than we thought, and entangled with two other independent failure modes.

The fix is to ship three tiles, not one — and to publish a defensible $5.9M floor instead of a $73.7M ceiling we can't yet defend. Everything else queues into the validation list above.