Patient Appointment Churn — Deep Dive

The question

Is "Patient Appointment Churn" really the lever we thought?

Run two independent quant analysts + GPT-5 Codex adversarial review on the data. Find out what's real, what's overclaim, and where the actual front-desk lever lives.

Practices analyzed

214

Organic general practice (specialty, de novo, and partial-year practices excluded by rule).

Reviewers

3

Two Claude data scientists + one GPT-5 Codex adversarial pass.

Tests run

30+

Per-component correlations, dose-response, dollar models, labor regression.

What's coming in the next 10 slides

Ten findings, each in plain English. Some confirm what we built. Some change it. One we have to retire.

Slide 2 = the headline reframe. Slides 3-5 = the data + fixed-effects validation. Slide 6 = sizing. Slide 7 = component split. Slide 8 = mechanism. Slides 9-10 = decision + what's next.

Yesterday vs today

~1 pp growth lift, not 3.4 pp

Cutting a worst-quartile practice's churn down to the network median lifts annual growth ~1 percentage point. The prior 3.4 pp estimate didn't survive controlling for practice size.

Raw correlation

−0.128

Pearson r, churn vs YoY growth (n=188). p=0.080 — borderline.

After size control

p = 0.55

Big practices grow more and have different churn — that's most of the signal.

Bootstrap 95% CI

[−0.31, +0.05]

Includes zero. We can't rule out that the true effect is nothing.

In plain English

Yesterday we said "if you cut a high-churn practice's churn down to median, you'll see +3.4pp more annual growth." That number came from a simple bivariate fit.

Today's tighter analysis controls for practice size — and most of yesterday's signal disappears. The honest number is closer to +1 pp. Still positive. Still real. Just much smaller.

Bottom line: the composite SAPS churn metric is fine as an "is this practice in chaos?" flag, but stop selling it as the network's biggest growth lever. It's not.
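A minimal sketch of what "controlling for practice size" and "bootstrap 95% CI" can mean in code. It is not the analysts' actual pipeline. It assumes a partial correlation (residualize churn and growth on size, then correlate the residuals) and a percentile bootstrap over practices. The data is synthetic, and the function names are illustrative.

```python
import random
import statistics

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residualize(y, z):
    """Residuals of y after a one-variable least-squares fit on z."""
    mz, my = statistics.fmean(z), statistics.fmean(y)
    slope = sum((a - mz) * (b - my) for a, b in zip(z, y)) / sum((a - mz) ** 2 for a in z)
    return [b - my - slope * (a - mz) for a, b in zip(z, y)]

def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    return pearson_r(residualize(x, z), residualize(y, z))

def bootstrap_ci(x, y, z, n_boot=500, seed=1):
    """Percentile 95% interval for partial_r, resampling practices with replacement."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(partial_r([x[i] for i in idx], [y[i] for i in idx], [z[i] for i in idx]))
    draws.sort()
    return draws[int(0.025 * n_boot)], draws[int(0.975 * n_boot) - 1]

# Synthetic stand-in data (NOT the deck's data): size drives both churn and growth.
rng = random.Random(0)
size = [rng.gauss(0, 1) for _ in range(188)]
churn = [-s + rng.gauss(0, 1) for s in size]
growth = [s + rng.gauss(0, 1) for s in size]

raw = pearson_r(churn, growth)
controlled = partial_r(churn, growth, size)
ci_low, ci_high = bootstrap_ci(churn, growth, size)
print(f"raw r={raw:.3f}  size-controlled r={controlled:.3f}  95% CI=[{ci_low:.3f}, {ci_high:.3f}]")
```

In the synthetic setup the raw correlation is clearly negative only because size moves both variables, and the size-controlled value shrinks toward zero. That is the same pattern the slide describes, at toy scale.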

The winner among 5 churn components

+10 pp confirmation rate → +2.5 pp YoY growth

Only metric with a statistically clean signal after size control. Survives Bonferroni correction (just barely). The other components — no-show rate, cancel rate, patient-fault cancels — show nothing usable.

Confirmed_pct

+0.231

p=0.002. ΔR²=+3.4%. The signal.

No-show rate

−0.088

p=0.33. Conflicting Pearson/Spearman. Noise.

Cancel rate

+0.069

p=0.52. Wrong sign. Probably a PMS artifact.

Patient-fault %

+0.003

p=0.97. 77% of practices report literal zero. Unusable.

In plain English

We checked 5 different ways to measure churn. Only one — confirmation rate (% of booked appointments the patient confirmed before the day) — actually moves with growth.

A practice with 75% confirmation grows ~2.5 pp/year faster than one with 65%. The front desk can directly change this: it's the 3-day / 1-day / 2-hour reminder chain.

Codex caveat: this is "the strongest observed association," not yet "proven causation." A practice with great managers probably confirms more AND grows more, so this could be a manager-quality story. Run an out-of-sample test before promoting the tile to "active."
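For readers who want the Bonferroni check spelled out, here is a small sketch using the four p-values shown on this slide. The family size of 5 is an assumption taken from the "5 churn components" wording; the deck does not state which family the analysts actually corrected over.

```python
# p-values as printed on the slide (only four of the five components are shown).
p_values = {
    "confirmed_pct": 0.002,
    "no_show_rate": 0.33,
    "cancel_rate": 0.52,
    "patient_fault_pct": 0.97,
}

ALPHA = 0.05
M_TESTS = 5  # ASSUMPTION: family size = the 5 components mentioned in the heading

# Bonferroni: each test must clear alpha divided by the number of tests.
threshold = ALPHA / M_TESTS
survivors = [name for name, p in p_values.items() if p < threshold]

print(f"per-test threshold = {threshold:.3f}")
print("survives correction:", survivors)
```

With a family of 5 the per-test threshold is 0.01, so p=0.002 clears it with room to spare. It would stop clearing at around 25 comparisons (0.05 / 25 = 0.002), which may be what "just barely" refers to if a larger test family was used.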

The strongest growth predictor in the entire dataset

r = −0.443 · p < 0.001

The Dental Intel "risk_notcomp_rate" (risk-flagged appointments that didn't complete ÷ total risk-flagged) is the cleanest signal we've ever measured. But it's not the same as SAPS churn — and not the same lever.

How the two relate

Near-orthogonal — independent problems

What we measured	Pearson r
SAPS churn ↔ risk non-completion	+0.109
Shared variance (R²)	1.2%
Risk non-completion → growth	−0.443
SAPS churn → growth (after controls)	−0.012

When you add risk non-completion to the growth model, the SAPS churn signal collapses to near-zero. Two different failure modes — not one cause and one symptom.
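The shared-variance figure is just the squared correlation. A quick check using the r values from the table above (the helper name is illustrative):

```python
def shared_variance(r):
    """Share of variance two metrics have in common, given their Pearson r."""
    return r * r

churn_vs_risk_nc = shared_variance(0.109)    # SAPS churn vs risk non-completion
risk_nc_vs_growth = shared_variance(-0.443)  # risk non-completion vs growth

print(f"SAPS churn / risk-NC shared variance: {churn_vs_risk_nc:.1%}")
print(f"risk-NC / growth shared variance:     {risk_nc_vs_growth:.1%}")
```

About 1.2% overlap between the two metrics, against roughly 19.6% of growth variance associated with risk non-completion, is the arithmetic behind "near-orthogonal."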

What that means in practice

Two separate questions — keep them separate

SAPS churn

Front-desk confirmation discipline. Reminder chain, recall workflows. OM owns it via T14.

Risk non-comp

Protect-the-chair protocol. When the PMS flags a risk-NC patient on today's schedule: park-list standby, wave double-book, hyper-confirm chain (7d/3d/1d/90min/same-day call), pre-collect for cash plans.

The nuance

The statistic (r=−0.443 → growth) is patient-mix-confounded — can't promise growth uplift from cutting the rate. But the operational response is plainly OM-actionable. Different ask.

What recovers

The protocol doesn't recover the lost-show patient — it recovers the chair that would otherwise sit empty. Park-list patient or wave-booked patient fills the slot.

Floor-to-ceiling, with the reason

$5.9M → $73.7M

Quant B's top-line $73.7M is 70% one component ($51.7M from PBI's "risk_notcomp_dollars"), which has an unverified time window and may overlap with the other two components. Codex pulls that component out and recomputes a conservative floor.

Three components, three confidence levels

What's in the $73.7M and how much we trust each

Component	Annualized	Type
A. Front-office rework labor	$7.1M	rate-card
B. Chair idle (lost slot)	$14.9M	rate-card
C. Risk non-completion ($)	$51.7M	unverified window
Total	$73.7M	mixed

Codex defensible floor (drop C, stricter A/B)

Handler time: 10 min × 1 person × $20/hr

$2.0M

Chair idle: $125/hr × 25% unfilled

$3.9M

Defensible floor (annual)

$5.9M
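The ceiling, the 70% share, and the floor can be re-derived from the figures on this slide. A short check (dictionary keys are illustrative labels, values are $M as printed above):

```python
# Quant B components, annualized, in $M (from the table above).
ceiling_components = {"A_rework_labor": 7.1, "B_chair_idle": 14.9, "C_risk_noncompletion": 51.7}

# Codex floor: component C dropped, A and B recomputed on the stricter rate-card.
floor_components = {"A_rework_labor": 2.0, "B_chair_idle": 3.9}

ceiling = sum(ceiling_components.values())
share_c = ceiling_components["C_risk_noncompletion"] / ceiling
floor = sum(floor_components.values())

print(f"ceiling = ${ceiling:.1f}M, of which C is {share_c:.0%}")
print(f"floor   = ${floor:.1f}M")
```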

What "moving worst quartile to median" actually recovers

The realistic recovery scenario

Quant B

$6.0M annual recovery · 1.1% of organic-GP production.

Codex floor

$0.66M annual recovery · 0.13% of production · before any risk-NC recovery.

Worst-quartile n

47 practices, churn ≥32.2%. Target = median 26.2%. Roughly 6 pp improvement per practice.

Our posture

Publish the floor. Show the ceiling as upper-bound in the drill rail. Don't quote $73.7M in any exec deck.

Open validation: confirm risk_notcomp_dollars time window with PBI dataset owner. If it's not 90-day, the ×4.06 annualization is wrong by definition.
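To show why the time window matters, here is a sensitivity sketch. It assumes the ×4.06 factor is 365 / 90 and backs out the raw (un-annualized) figure from the $51.7M. The alternative windows are hypothetical scenarios, not anything PBI has confirmed.

```python
ANNUALIZED_C_M = 51.7        # $M, component C as reported
ASSUMED_WINDOW_DAYS = 90     # the window the current annualization assumes

factor = 365 / ASSUMED_WINDOW_DAYS     # about 4.06
raw_m = ANNUALIZED_C_M / factor        # implied raw dollars in the source window

def annualize(raw_dollars_m, window_days):
    """Scale a windowed dollar total to a 365-day year."""
    return raw_dollars_m * 365 / window_days

# Hypothetical windows: what component C would annualize to if the true window differed.
scenarios = {days: annualize(raw_m, days) for days in (30, 90, 180, 365)}

print(f"factor = {factor:.2f}, implied raw = ${raw_m:.2f}M")
for days, value in scenarios.items():
    print(f"if window is {days:>3} days -> ${value:.1f}M annualized")
```

If the field turns out to be a trailing-year total, component C would be about a quarter of the quoted figure. If it is a 30-day total, it would be about three times larger.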

Brand/regional fixed-effects regression

Risk-NC survives. Confirmation barely doesn't.

Re-ran the headline regressions with dummies for each ROD (14 regional directors with n≥5). risk_notcomp_rate holds at β=−0.38 (from −0.46), p<0.001, even with regional director controlled. Confirmation rate drops from p=0.006 → p=0.083, meaning the cross-sectional confirmation signal is partly "good managers run good practices," not purely "confirmation chain → growth."

Fixed-effects regression results

Before vs after ROD controls (n=189 organic GP)

Metric → YoY growth	β no-FE	p no-FE	β +FE	p +FE
risk_notcomp_rate	−0.461	<0.001	−0.383	<0.001
confirmed_pct	+0.265	0.006	+0.186	0.083
churn_90d (composite)	−0.110	0.444	+0.031	0.848

RODs add 7.9–8.4% R² across the three models. Regional/manager variance is a real chunk of what the cross-sectional analysis was attributing to operational metrics.
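A sketch of the fixed-effects idea on synthetic data, not the actual regression. For a single predictor, adding one dummy per ROD gives the same slope as subtracting each ROD's mean from both variables and refitting (the "within" estimator). The operator-quality variable and all numbers below are made up for illustration.

```python
import random
import statistics
from collections import defaultdict

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def demean_by_group(values, groups):
    """Subtract each group's mean, which absorbs any group-level (ROD) effect."""
    buckets = defaultdict(list)
    for v, g in zip(values, groups):
        buckets[g].append(v)
    means = {g: statistics.fmean(vs) for g, vs in buckets.items()}
    return [v - means[g] for v, g in zip(values, groups)]

def fixed_effects_slope(x, y, groups):
    """Slope of y on x using only within-group variation."""
    return ols_slope(demean_by_group(x, groups), demean_by_group(y, groups))

# Synthetic data: an unobserved per-ROD "operator quality" lifts both x and y.
rng = random.Random(0)
TRUE_SLOPE = 0.2
rods, x, y = [], [], []
for rod in range(14):
    operator_quality = rng.gauss(0, 1)
    for _ in range(14):
        xi = operator_quality + rng.gauss(0, 1)
        yi = TRUE_SLOPE * xi + operator_quality + rng.gauss(0, 0.5)
        rods.append(rod)
        x.append(xi)
        y.append(yi)

pooled = ols_slope(x, y)
within = fixed_effects_slope(x, y, rods)
print(f"pooled slope = {pooled:.2f}, ROD fixed-effects slope = {within:.2f} (true = {TRUE_SLOPE})")
```

The pooled slope is inflated by the shared operator effect. The within-ROD slope lands near the true value. That is the same direction of shrinkage seen for confirmed_pct in the table above.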

Bonus finding: "Good operator" effect

Three RODs persistently positive across ALL three models

ROD	β range	p
Libby Knopp	+0.15 to +0.21	0.01–0.04
Kim Miller	+0.14 to +0.16	0.04–0.05
Leah Grevious	+0.16 to +0.31	<0.001–0.06

After controlling for confirmation, risk-NC, AND size, these three RODs still beat baseline. That's an operator-quality signal worth surfacing as a separate thread: what are they doing that the data doesn't capture? (Huddle cadence? OM coaching? Recall discipline?)

Annualized recovery, decomposed

$6.0M target · $0.66M defensible floor

47 practices, 6 percentage points of churn each. Full Quant B model puts annual recovery at $6.0M (~$128k/practice/yr, 1.1% of organic-GP production). Codex stricter floor (no risk-NC dollars, conservative rate-card) puts it at $0.66M (~$14k/practice/yr).

Decomposition (Quant B full model)

Where the $6.0M lives, by component

Component	90-day	Annualized
Rework labor saved	$185k	$750k
Chair recovered	$428k	$1.74M
Risk-NC reduction	$878k	$3.56M
Total	$1.49M	$6.0M
Per practice / year	$31.7k	~$128k

~1.1% of organic-GP network production. Risk-NC is the largest single line — also the most defensibility-fragile (Codex flagged unverified time window).
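The decomposition table can be reproduced from its 90-day column, the ×4.06 annualization factor, and the 47 worst-quartile practices. A quick check (figures in $k; small differences from the slide are rounding):

```python
ANNUALIZATION = 4.06   # factor quoted in the deck
N_PRACTICES = 47       # worst-quartile practices

# 90-day recovery by component, in $k (from the table above).
ninety_day_k = {"rework_labor_saved": 185, "chair_recovered": 428, "risk_nc_reduction": 878}

annual_k = {name: value * ANNUALIZATION for name, value in ninety_day_k.items()}
total_90_k = sum(ninety_day_k.values())
total_annual_k = sum(annual_k.values())
per_practice_k = total_annual_k / N_PRACTICES

for name, value in annual_k.items():
    print(f"{name:<20} ${value:,.0f}k / yr")
print(f"total: ${total_90_k:,}k per 90 days -> ${total_annual_k / 1000:.2f}M / yr")
print(f"per practice: ${per_practice_k:.0f}k / yr")
```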

Codex defensible floor

If you drop risk-NC and tighten the rate-card

Drop C

Exclude risk-NC dollars entirely until PBI time-window verified.

Tighter A

10 min × 1 handler × $20/hr (vs 15 min × 2 × $24/hr).

Tighter B

$125/hr chair margin × 25% unfilled (vs $200/hr × 60%).

Floor result

$0.66M annual recovery · ~$14k per practice · 0.13% of production.
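A sketch of how the two rate-cards compare per unit, using only the parameters listed above. The function names are illustrative. Applying the resulting ratios to the $7.1M and $14.9M component totals is a consistency check, under the assumption that volumes are held fixed between the two models.

```python
def rework_cost_per_event(minutes, handlers, wage_per_hour):
    """Front-office labor cost of handling one churned appointment."""
    return minutes / 60 * handlers * wage_per_hour

def idle_cost_per_chair_hour(margin_per_hour, unfilled_share):
    """Expected chair margin lost per vacated hour, given the share left unfilled."""
    return margin_per_hour * unfilled_share

full_a = rework_cost_per_event(15, 2, 24)      # Quant B rate-card
floor_a = rework_cost_per_event(10, 1, 20)     # Codex stricter rate-card
full_b = idle_cost_per_chair_hour(200, 0.60)
floor_b = idle_cost_per_chair_hour(125, 0.25)

print(f"rework per event: ${full_a:.2f} -> ${floor_a:.2f}")
print(f"idle per chair-hour: ${full_b:.2f} -> ${floor_b:.2f}")

# Consistency check against the annualized component totals ($M), same volumes assumed.
print(f"A: 7.1 x {floor_a / full_a:.3f} = {7.1 * floor_a / full_a:.1f}")
print(f"B: 14.9 x {floor_b / full_b:.3f} = {14.9 * floor_b / full_b:.1f}")
```

Scaling the full-model totals by these ratios gives about $2.0M and $3.9M, which matches the floor components quoted earlier in the deck.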

Recommendation: cite $6M target with $0.66M floor, decompose components in drill rail, label risk-NC "pending PBI time-window verification."

The shape of the chaos

Reschedule 73% · Cancel 15% · No-show 13% · Patient-fault 0.3%

When we decompose churn into its components: most "broken" appointments aren't patients ghosting — they're appointments getting moved. That's a protocol problem (recall + confirmation cadence), not a "fire the no-show patient" problem.

Rescheduled

73.1%

The appointment moves. Workflow + recall.

Canceled (PMS)

14.5%

Hard cancel inside PMS. Real loss.

No-show

12.7%

Patient doesn't show. The classic case.

Patient-fault cancels

0.3%

Nearly nothing. Churn is mostly practice-side.

In plain English

If you ask an OM "what's killing your schedule," you'll probably hear "no-shows" or "flaky patients." The data says no.

Three out of every four churned appointments are moves — same patient, different day. That's a workflow problem (recall, confirmation, reschedule cadence) the practice fully owns. Patient-fault cancels are literally 0.3% of the SAPS denominator — basically nothing.

Caveat: only 83 of 188 practices have complete component decomposition. The other 105 have PMS-coding gaps that make no-show / cancel counts unreliable.

Direction-of-causation finding · sequencing constraint

Protocol fix → labor savings (NOT the other way)

High-churn practices have to staff UP to handle the rework. If we cut staff before fixing the churn protocol, the same load lands on fewer people — burnout, turnover, culture damage. Sequencing matters: fix the protocol, let FO load naturally decline, THEN consider staffing model.

Front-office staffing density vs churn

More FO staff → MORE churn (not less)

Staffing tier (FO FTE / 1k active pts)	Mean churn 90d
T1 — low (0.10 FTE/1k)	18.1%
T2 — mid (0.18 FTE/1k)	23.7%
T3 — high (0.29 FTE/1k)	22.1%

"Add a front-desk hire to fix the churn" is the wrong move — the data points the opposite direction. Pearson r=+0.307 (n=43).

Hours per delivered visit, by churn quartile

FO hours scale with chaos (rework tax in person-hours)

Churn quartile	FO h / visit
Q1 — lowest churn (11.7%)	0.29
Q2 (18.2%)	0.41
Q3 (24.3%)	0.40
Q4 — highest churn (30.4%)	0.47

Total hours per visit stays flat across quartiles (1.55–1.67) — the chair side doesn't scale. Only front-office does. That's the rework tax made visible in person-hours.

Sequencing rule (Scott)

Do NOT cut FO headcount before the churn protocol is fixed.

The current FO load is REAL work — it's the rework tax being absorbed by people. Pulling headcount first lands the same load on fewer staff → burnout, turnover, culture damage. Sequence: fix protocol (T13 + T14 + T15) → FO load naturally declines → THEN revisit staffing model practice-by-practice.

The test that resists the manager-quality confound

+$826/month NP gap · improving vs worsening confirmation

Computed the SLOPE of confirmation rate and NP within each practice over the last 5–12 months. Same OM, same ROD, different month. The within-practice signal controls for the "good operator" confound that pushed the cross-sectional confirmed_pct to p=0.083 in the FE check. Both confirmation AND composite churn show real within-practice predictive power.

Confirmation slope → NP slope (n=179)

Big within-practice dose-response

Confirmation trend	NP slope ($/mo)	n
T1 worsening	−$15	59
T2 stable	+$265	61
T3 improving	+$811	59

Gap: $826/month NP between extremes

r=+0.150 over concurrent 5-month window. Manager-invariant because each practice is its own control.
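A minimal sketch of the within-practice method described above: fit a trend slope per practice for each metric, sort practices into terciles by confirmation slope, and compare the mean NP slope per tercile. The monthly series here are synthetic and the coupling between the two trends is invented. Only the mechanics are meant to carry over.

```python
import random
import statistics

def trend_slope(series):
    """Least-squares slope of a monthly series against month index 0..n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    v_mean = statistics.fmean(series)
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def tercile_means(x_slopes, y_slopes):
    """Sort practices by x slope, split into thirds, return (n, mean y slope) per third."""
    order = sorted(range(len(x_slopes)), key=lambda i: x_slopes[i])
    k = len(order) // 3
    buckets = [order[:k], order[k:len(order) - k], order[len(order) - k:]]
    return [(len(b), statistics.fmean([y_slopes[i] for i in b])) for b in buckets]

# Synthetic practices (NOT the deck's data): 179 practices, 5 months each.
rng = random.Random(0)
conf_slopes, np_slopes = [], []
for _ in range(179):
    conf_trend = rng.gauss(0, 1)                      # pp per month, made up
    np_trend = 300 * conf_trend + rng.gauss(0, 500)   # $ per month, made-up coupling
    conf_series = [70 + conf_trend * m + rng.gauss(0, 1) for m in range(5)]
    np_series = [40000 + np_trend * m + rng.gauss(0, 1000) for m in range(5)]
    conf_slopes.append(trend_slope(conf_series))
    np_slopes.append(trend_slope(np_series))

terciles = tercile_means(conf_slopes, np_slopes)
for label, (n, mean_np) in zip(("T1 worsening", "T2 stable", "T3 improving"), terciles):
    print(f"{label:<13} n={n:<3} mean NP slope = {mean_np:+.0f} $/mo")
print(f"gap between extremes: {terciles[2][1] - terciles[0][1]:+.0f} $/mo")
```

With 179 practices this split yields groups of 59 / 61 / 59, the same sizes as in the table. Because each slope is computed inside one practice, anything constant for that practice over the window (OM, ROD, location) drops out of the comparison.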

Churn slope → NP slope (n=189, 12mo)

Composite churn IS predictive within-practice

Churn trend	NP slope ($/mo)	n
T1 churn going down	+$274	63
T2 stable	+$70	63
T3 churn going up	−$123	63

Gap: $397/month NP between extremes

Cross-sectional was r=−0.13 (weak, killed by FE). Within-practice r=−0.18. The signal lives in WITHIN-practice change, not cross-practice level.

Updated confidence — three tiles

T14 confirmation: Cross-section + FE killed it (p=0.083). Within-practice rescued it (+$826/mo gap). Back to High operational confidence.

T15 risk-NC: Already survived FE. Holds at High.

T13 composite churn: Cross-section weak, FE null, BUT within-practice r=−0.18 with $397/mo gap. Elevates from Low to Medium. Composite is fine as a within-practice trend tile, not as a cross-practice ranking tile.

From one tile to three — each with the right job

T13 (reframe) · T14 (new) · T15 (new)

One tile can't carry "rework flag," "growth lever," and "patient-mix marker" all at once. Split them so each lights up the right team with the right ask.

Moves

Move	Why	Confidence
Ship T13 Patient Appointment Churn as-is — but reframe	Operational rework-flag, not the growth lever. Update validation_status copy in catalog.yaml.	High
Add T14 Confirmation Discipline tile	The actual front-desk lever per Quant A. Bands: ≥90% great · 80-90% good · 70-80% warn · <70% critical.	Medium
Add T15 Risk-Patient Non-Completion — active OM tile	Operational protocol when risk-NC patient is on today's schedule: park-list standby, wave double-book, hyper-confirm chain, pre-collect for cash. Don't quote dollar-recovery, just protect the chair.	High
Drop the $280/visit rate-card in the threshold formula	Replace with Codex's defensible floor: `excess × saps × $20/hr × (10/60)hr × 1`. Full model (`$24/hr × 0.25hr × 2`) in drill rail as upper bound.	High
Front-office FTE NOT in any threshold	Direction-of-causation unresolved; labor coverage only 56 practices.	High
No FO headcount cuts until churn protocol stabilizes	Current FO load is real (rework absorbed by people). Cut first = burnout + turnover + culture damage. Sequence: protocol fix → load declines → revisit staffing model.	High
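The T14 bands in the table translate directly into a small classifier. This is a sketch with one assumption: each band's lower edge is inclusive (80.0% counts as "good"). The slide's "80-90%" notation leaves the edges ambiguous.

```python
def confirmation_band(confirmed_pct):
    """Map a confirmation rate (0-100) to the T14 band named in the table."""
    if confirmed_pct >= 90:
        return "great"
    if confirmed_pct >= 80:   # ASSUMPTION: lower edge inclusive
        return "good"
    if confirmed_pct >= 70:   # ASSUMPTION: lower edge inclusive
        return "warn"
    return "critical"

# The two example practices from the confirmation slide (75% vs 65%).
for pct in (75, 65):
    print(f"{pct}% -> {confirmation_band(pct)}")
```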

Honest open work

5 validations, ranked by leverage

The big surviving front-desk claim, that confirmation rate predicts growth, is the strongest front-desk signal we found. But its cross-sectional form is confounded by brand and operator effects. These five tests resolve the open questions before we promote any tile from exploratory to validated.

Open validation queue

Confirm risk_notcomp_dollars time window with the PBI dataset owner. If it's not 90-day, the $51.7M annualization is wrong by definition. (Highest leverage — unlocks the dollar-magnitude story.)
Within-practice longitudinal test: does confirmation_pct change in month N predict growth change in month N+12 inside the same practice? Resolves reverse-causation for the T14 lever.
Brand / PMS / payor fixed-effects regression on confirmed_pct → growth and risk_notcomp_rate → growth. The cross-sectional signal could be brand-confounded; this rules it out.
Out-of-sample replication of confirmed_pct → growth on a separate trailing window (e.g., prior 365d). Required before T14 promotes from exploratory to active.
Labor coverage expansion beyond Accelerate brand — extend punches to Parks Pace / SGA East so the FO-staffing-direction claim has n>100. (Currently n=43, exploratory only.)

The honest summary

We built a tile yesterday based on a real signal. The deep dive shows the signal is real but smaller, in a different place than we thought, and entangled with two other independent failure modes.

The fix is to ship three tiles, not one — and to publish a defensible $5.9M floor instead of a $73.7M ceiling we can't yet defend. Everything else queues into the validation list above.