Patient Appointment Churn — Deep Dive

The question

Is "Patient Appointment Churn" really the lever we thought?

Run two independent quant analysts + GPT-5 Codex adversarial review on the data. Find out what's real, what's overclaim, and where the actual front-desk lever lives.

Practices analyzed

214

Organic general practice (specialty, de novo, and partial-year practices excluded by rule).

Reviewers

3

Two Claude data scientists + one GPT-5 Codex adversarial pass.

Tests run

30+

Per-component correlations, dose-response, dollar models, labor regression.

What's coming in the next 8 slides

Eight findings, each in plain English. Some confirm what we built. Some change it. One we have to retire.

Slide 2 = the headline reframe. Slides 3-5 = what the data actually says. Slide 6 = the dollar question. Slide 7 = the mechanism. Slides 8-9 = what to do this week.

Yesterday vs today

~1 pp growth lift, not 3.4 pp

Cutting a worst-quartile practice's churn down to the network median lifts annual growth ~1 percentage point. The prior 3.4 pp estimate didn't survive controlling for practice size.

Raw correlation

−0.128

Pearson r, churn vs YoY growth (n=188). p=0.080 — borderline.

After size control

p = 0.55

Big practices grow more and have different churn — that's most of the signal.

Bootstrap 95% CI

[−0.31, +0.05]

Includes zero. We can't rule out that the true effect is nothing.
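As a sanity check on the raw-correlation card, the reported r and n can be turned into a t statistic and an approximate p-value. This is a minimal sketch, not the analysts' code: the function names are hypothetical, and the p-value uses a normal approximation (an assumption that is reasonable at n=188) rather than the exact t distribution.

```python
import math
from statistics import NormalDist


def r_to_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def approx_two_sided_p(t: float) -> float:
    """Two-sided p-value via the normal approximation (large-n shortcut)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Values from the raw-correlation card: r = -0.128, n = 188.
t_raw = r_to_t(-0.128, 188)
p_raw = approx_two_sided_p(t_raw)
print(f"t = {t_raw:.2f}, approximate p = {p_raw:.3f}")
```

The approximate p-value lands close to the reported 0.080, which is what "borderline" means here: just short of the usual 0.05 bar before any size control.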

In plain English

Yesterday we said "if you cut a high-churn practice's churn down to median, you'll see +3.4pp more annual growth." That number came from a simple bivariate fit.

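Today's tighter analysis controls for practice size, and most of yesterday's signal disappears. The honest number is closer to +1 pp. Still positive in direction, but much smaller, and with p = 0.55 after the size control and a confidence interval that includes zero, we can't yet call it statistically real.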

Bottom line: the composite SAPS churn metric is fine as a "is this practice in chaos?" flag — but stop selling it as the network's biggest growth lever. It's not.
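To make "controlling for practice size" concrete, here is a minimal sketch using a partial correlation on synthetic data. It is an illustration of the idea only: the deck does not say which method the analysts used, the data below is randomly generated (not network data), and all names are hypothetical. In the synthetic setup, size drives both churn and growth, so the raw correlation is strong and the size-controlled one is near zero.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Synthetic example: size is the only real driver of both variables.
rng = random.Random(0)
size = [rng.gauss(0, 1) for _ in range(500)]
churn = [-s + rng.gauss(0, 0.5) for s in size]
growth = [s + rng.gauss(0, 0.5) for s in size]

raw = pearson(churn, growth)
controlled = partial_corr(churn, growth, size)
print(f"raw r = {raw:.2f}, size-controlled r = {controlled:.2f}")
```

The same mechanism, in much milder form, is the reading offered above for why the 3.4 pp estimate shrank once size entered the model.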

The winner among 5 churn components

+10 pp confirmation rate → +2.5 pp YoY growth

Only metric with a statistically clean signal after size control. Survives Bonferroni correction (just barely). The other components — no-show rate, cancel rate, patient-fault cancels — show nothing usable.

Confirmed_pct

+0.231

p=0.002. ΔR²=+3.4%. The signal.

No-show rate

−0.088

p=0.33. Conflicting Pearson/Spearman. Noise.

Cancel rate

+0.069

p=0.52. Wrong sign. Probably a PMS artifact.

Patient-fault %

+0.003

p=0.97. 77% of practices report literal zero. Unusable.
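The Bonferroni check on the component p-values can be sketched as follows. The p-values are the ones on the cards above; the family size is an assumption, since the deck does not state how many tests were counted (5 components is used here, and the helper name is hypothetical).

```python
def bonferroni_survivors(p_values, n_tests, alpha=0.05):
    """Return the names whose p-value beats the Bonferroni threshold alpha / n_tests."""
    threshold = alpha / n_tests
    return [name for name, p in p_values.items() if p < threshold]


# p-values as reported on the component cards.
component_p = {
    "confirmed_pct": 0.002,
    "no_show_rate": 0.33,
    "cancel_rate": 0.52,
    "patient_fault_pct": 0.97,
}

# Assumption: the correction family is the 5 churn components.
survivors_5 = bonferroni_survivors(component_p, n_tests=5)
# For comparison: a family of 30 tests gives a threshold of about 0.0017.
survivors_30 = bonferroni_survivors(component_p, n_tests=30)
print(survivors_5, survivors_30)
```

The outcome is sensitive to the family size: with 5 tests only confirmed_pct clears the bar, and with 30 tests nothing does. That is worth pinning down before the result is quoted.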

In plain English

We checked 5 different ways to measure churn. Only one — confirmation rate (% of booked appointments the patient confirmed before the day) — actually moves with growth.

Practice with 75% confirmation grows ~2.5 pp/year faster than one with 65%. Front desk can directly change this — it's the 3-day / 1-day / 2-hour reminder chain.

Codex caveat: this is "the strongest observed association," not yet "proven causation." A practice with great managers probably confirms more AND grows more, so this could be a manager-quality story. Run an out-of-sample test before promoting the tile to "active."

The strongest growth predictor in the entire dataset

r = −0.443 · p < 0.001

The Dental Intel "risk_notcomp_rate" (risk-flagged appointments that didn't complete ÷ total risk-flagged) is the cleanest signal we've ever measured. But it's not the same as SAPS churn — and not the same lever.

How the two relate

Near-orthogonal — independent problems

What we measured	Pearson r
SAPS churn ↔ risk non-completion	+0.109
Shared variance (R²)	1.2%
Risk non-completion → growth	−0.443
SAPS churn → growth (after controls)	−0.012

When you add risk non-completion to the growth model, the SAPS churn signal collapses to near-zero. Two different failure modes — not one cause and one symptom.
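The shared-variance figure is just the squared correlation. A small sketch of that arithmetic, using the r values from the table (the helper name is hypothetical):

```python
def shared_variance_pct(r: float) -> float:
    """Percent of variance two variables share, given their Pearson r."""
    return 100 * r * r


# SAPS churn vs risk non-completion (r = +0.109).
overlap = shared_variance_pct(0.109)
# Risk non-completion vs growth (r = -0.443), for comparison.
risk_vs_growth = shared_variance_pct(-0.443)
print(f"{overlap:.1f}% shared, {risk_vs_growth:.1f}% of growth variance")
```

Roughly 1% overlap between the two metrics, against roughly a fifth of growth variance for risk non-completion, is what "near-orthogonal" means on this slide.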

What that means in practice

Different teams, different protocols

SAPS churn

Front-desk lever. Confirmation discipline, recall workflows, reminder chain. OM owns it.

Risk non-comp

Patient-mix marker. Reflects the practice's payor mix, demographics, and clinical access. Not directly fixable by the front desk.

Codex view

Risk non-completion is a diagnostic, not a recovery promise. Surface to ROD; don't put on OM huddle board.

Falsifiable?

Yes — payor mix + zip-income controls + lagged capacity. If r=−0.443 survives those, it really is a clinical/access lever and not just a mix marker.

Floor-to-ceiling, with the reason

$5.9M → $73.7M

Quant B's top-line $73.7M is 70% one component ($51.7M of PBI's "risk_notcomp_dollars") with an unverified time window and possible overlap with the other two components. Codex pulls that out and recomputes the conservative floor.

Three components, three confidence levels

What's in the $73.7M and how much we trust each

Component	Annualized	Type
A. Front-office rework labor	$7.1M	rate-card
B. Chair idle (lost slot)	$14.9M	rate-card
C. Risk non-completion ($)	$51.7M	unverified window
Total	$73.7M	mixed

Codex defensible floor (drop C, stricter A/B)

Handler time: 10 min × 1 person × $20/hr

$2.0M

Chair idle: $125/hr × 25% unfilled

$3.9M

Defensible floor (annual)

$5.9M
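The ceiling and floor arithmetic above can be reproduced in a few lines. This sketch only re-adds the published component totals (in $M); it does not rebuild the underlying rate-card models, and the variable names are hypothetical.

```python
# Quant B components, annualized, in $M (from the table above).
ceiling_components = {
    "A_front_office_rework": 7.1,
    "B_chair_idle": 14.9,
    "C_risk_noncompletion": 51.7,
}
ceiling_total = sum(ceiling_components.values())
share_c = ceiling_components["C_risk_noncompletion"] / ceiling_total

# Codex floor: drop C, use the stricter A and B figures ($M).
floor_components = {"handler_time": 2.0, "chair_idle": 3.9}
floor_total = sum(floor_components.values())

print(f"ceiling ${ceiling_total:.1f}M, C share {share_c:.0%}, floor ${floor_total:.1f}M")
```

Component C alone is about 70% of the ceiling, which is why its unverified time window dominates the uncertainty in the top-line number.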

What "moving worst quartile to median" actually recovers

The realistic recovery scenario

Quant B

$6.0M annual recovery · 1.1% of organic-GP production.

Codex floor

$0.66M annual recovery · 0.13% of production · before any risk-NC recovery.

Worst-quartile n

47 practices, churn ≥32.2%. Target = median 26.2%. That is at least a 6 pp improvement per practice: exactly 6 pp at the cutoff, more for practices further out.

Our posture

Publish the floor. Show the ceiling as upper-bound in the drill rail. Don't quote $73.7M in any exec deck.

Open validation: confirm risk_notcomp_dollars time window with PBI dataset owner. If it's not 90-day, the ×4.06 annualization is wrong by definition.
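Why the time window matters can be shown with the annualization factor itself. This is a sketch under stated assumptions: a 365-day year, and a window-level dollar figure back-computed from the published $51.7M and the ×4.06 factor (the deck does not give the raw window amount). The 365-day alternative is purely hypothetical.

```python
def annualization_factor(window_days: float, days_per_year: float = 365) -> float:
    """Multiplier that scales a window total up to a full year."""
    return days_per_year / window_days


factor_90 = annualization_factor(90)

# Back out the implied window-level amount ($M) from the published annual figure.
implied_window_amount = 51.7 / factor_90

# Hypothetical alternative: if the window were already a full year,
# the annual figure would be the window amount itself.
annual_if_365d = implied_window_amount * annualization_factor(365)

print(f"factor {factor_90:.2f}, window ${implied_window_amount:.1f}M, "
      f"annual if 365d ${annual_if_365d:.1f}M")
```

Under that hypothetical, component C would shrink to roughly a quarter of its quoted size, which is the reason this validation sits at the top of the queue.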

The shape of the chaos

Reschedule 73% · Cancel 15% · No-show 13% · Patient-fault 0.3%

When we decompose churn into its components: most "broken" appointments aren't patients ghosting — they're appointments getting moved. That's a protocol problem (recall + confirmation cadence), not a "fire the no-show patient" problem.

Rescheduled

73.1%

The appointment moves. Workflow + recall.

Canceled (PMS)

14.5%

Hard cancel inside PMS. Real loss.

No-show

12.7%

Patient doesn't show. The classic case.

Patient-fault cancels

0.3%

Nearly nothing. Churn is mostly practice-side.

In plain English

If you ask an OM "what's killing your schedule," you'll probably hear "no-shows" or "flaky patients." The data says no.

Three out of every four churned appointments are moves — same patient, different day. That's a workflow problem (recall, confirmation, reschedule cadence) the practice fully owns. Patient-fault cancels are literally 0.3% of the SAPS denominator — basically nothing.

Caveat: only 83 of 188 practices have complete component decomposition. The other 105 have PMS-coding gaps that make no-show / cancel counts unreliable.

Direction-of-causation finding

More staff ≠ less churn (it's the other way)

On the 43 practices where we have both labor + churn data: high-staffed practices have HIGHER churn, and front-office hours per delivered visit RISE with churn. That pattern is consistent with reverse causation: the chaos creating the labor load, not the other way around. At n=43 it is exploratory, not settled.

Front-office staffing density vs churn

More FO staff → MORE churn (not less)

Staffing tier (FO FTE / 1k active pts)	Mean churn 90d
T1 — low (0.10 FTE/1k)	18.1%
T2 — mid (0.18 FTE/1k)	23.7%
T3 — high (0.29 FTE/1k)	22.1%

"Add a front-desk hire to fix the churn" is the wrong move — the data points the opposite direction. Pearson r=+0.307 (n=43).

Hours per delivered visit, by churn quartile

FO hours scale with chaos (rework tax in person-hours)

Churn quartile	FO h / visit
Q1 — lowest churn (11.7%)	0.29
Q2 (18.2%)	0.41
Q3 (24.3%)	0.40
Q4 — highest churn (30.4%)	0.47

Total hours per visit stays flat across quartiles (1.55–1.67) — the chair side doesn't scale. Only front-office does. That's the rework tax made visible in person-hours.
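To show how much uncertainty sits behind the staffing correlation (Pearson r=+0.307, n=43), here is a minimal sketch of a Fisher z confidence interval. This is a swapped-in textbook method, not something the deck says the analysts ran, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_ci(r: float, n: int, level: float = 0.95):
    """Approximate confidence interval for a Pearson r via the Fisher z transform."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


lo, hi = fisher_ci(0.307, 43)
print(f"95% CI for r: [{lo:.2f}, {hi:.2f}]")
```

The interval is wide and its lower end sits close to zero, which fits treating the staffing-direction claim as exploratory until labor coverage expands.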

From one tile to three — each with the right job

T13 (reframe) · T14 (new) · T15 (new)

One tile can't carry "rework flag," "growth lever," and "patient-mix marker" all at once. Split them so each lights up the right team with the right ask.

Moves

Move	Why	Confidence
Ship T13 Patient Appointment Churn as-is — but reframe	Operational rework-flag, not the growth lever. Update validation_status copy in catalog.yaml.	High
Add T14 Confirmation Discipline tile	The actual front-desk lever per Quant A. Bands: ≥90% great · 80-90% good · 70-80% warn · <70% critical.	Medium
Add T15 Risk-Patient Non-Completion (diagnostic only)	Strongest signal but a marker, not a lever. Surface to ROD for context; no OM action.	Medium
Drop the $280/visit rate-card in the threshold formula	Replace with Codex's defensible floor: `excess × saps × $24/hr × 0.25hr × 2`. Full model in drill rail as upper bound.	High
Front-office FTE NOT in any threshold	Direction-of-causation unresolved; labor coverage only 56 practices.	High
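The T14 bands in the table above could be encoded as a simple lookup. This is a sketch, not the catalog implementation: the function name is hypothetical, and the handling of the shared edges (80 and 70 go to the better band) is an assumption, since the table leaves those boundaries ambiguous.

```python
def confirmation_band(confirmed_pct: float) -> str:
    """Map a confirmation rate (0-100) to the proposed T14 band.

    Assumption: each lower bound is inclusive, so 80.0 is "good" and 70.0 is "warn".
    """
    if confirmed_pct >= 90:
        return "great"
    if confirmed_pct >= 80:
        return "good"
    if confirmed_pct >= 70:
        return "warn"
    return "critical"


# The two practices from the confirmation-rate example: 75% vs 65%.
print(confirmation_band(75), confirmation_band(65))
```

Whichever edge convention is chosen, it is worth stating it in the tile definition so practices sitting exactly on a boundary are banded consistently.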

Honest open work

5 validations, ranked by leverage

The big surviving claim, that confirmation rate predicts growth, is statistically the strongest thing we found. But it's cross-sectional and may be brand-confounded. These five tests resolve the open questions before we promote any tile from exploratory to validated.

Open validation queue

Confirm risk_notcomp_dollars time window with the PBI dataset owner. If it's not 90-day, the $51.7M annualization is wrong by definition. (Highest leverage — unlocks the dollar-magnitude story.)
Within-practice longitudinal test: does confirmation_pct change in month N predict growth change in month N+12 inside the same practice? Resolves reverse-causation for the T14 lever.
Brand / PMS / payor fixed-effects regression on confirmed_pct → growth and risk_notcomp_rate → growth. The cross-sectional signal could be brand-confounded; this rules it out.
Out-of-sample replication of confirmed_pct → growth on a separate trailing window (e.g., prior 365d). Required before T14 promotes from exploratory to active.
Labor coverage expansion beyond Accelerate brand — extend punches to Parks Pace / SGA East so the FO-staffing-direction claim has n>100. (Currently n=43, exploratory only.)

The honest summary

We built a tile yesterday based on a real signal. The deep dive shows the signal is real but smaller, in a different place than we thought, and entangled with two other independent failure modes.

The fix is to ship three tiles, not one — and to publish a defensible $5.9M floor instead of a $73.7M ceiling we can't yet defend. Everything else queues into the validation list above.