Methodology

How the NHL player value model works

This is the hockey analytics layer behind every prediction. For each player, the model finds up to five comparable NHL contracts, weights them by similarity, and returns the weighted-average AAV (average annual value). No black-box regression and no opaque coefficients: every number traces back to real signed deals.

01 · Pipeline

Four steps

Every step is deterministic. Drop a player in, get up to five real comparables and a number out.

Step 01
K-means clustering
Seven unsupervised clusters group every active skater by deployment (TOI, PP points, plus/minus, faceoff%, position). Centroids decide the labels: Elite, Top-Line F, Middle-Six F, Bottom-Six F, Top-Four D, Bottom-Pair D, Two-Way / Shutdown.
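
The clustering step can be sketched with plain Lloyd's algorithm. This is a minimal illustration, not the model's actual code: the function name `kmeans`, the random initialization, and the two-feature toy data are assumptions, and the page does not say how position is encoded or how features are scaled before clustering. The toy run uses k=2 for readability; the model described above uses seven clusters.

```python
import random
from math import dist

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)           # fixed seed keeps the run repeatable
    centroids = rng.sample(points, k)   # assumption: random initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:             # empty cluster: leave centroid in place
                new_centroids.append(centroids[c])
                continue
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members)))
        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    return centroids, labels

# Toy data: (standardized TOI, standardized PP points) for six made-up skaters.
skaters = [(1.2, 1.1), (1.0, 0.9), (1.1, 1.3),
           (-1.0, -0.8), (-1.2, -1.1), (-0.9, -1.0)]
centroids, labels = kmeans(skaters, k=2)
print(labels)  # the first three skaters share one label, the last three the other
```

Cluster names such as Elite or Bottom-Pair D would then be attached by inspecting the centroids, as the step describes.
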
Step 02
Performance score
Production is z-scored inside each cluster and scaled to roughly ±100. Forwards score on G/60, P/60, PP pts, shots, sh%, and plus/minus; defensemen score on TOI/GP, PP pts, plus/minus, and sh%. The score is the tier signal the comp engine actually uses.
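
A within-cluster z-score might look like the sketch below. It scores a single metric (P/60) for brevity, whereas the model combines several. The field names, the `SCALE` constant, and the made-up skaters are assumptions; the page only says scores land at roughly ±100.

```python
from collections import defaultdict
from statistics import mean, pstdev

SCALE = 50.0  # assumption: two standard deviations map to about 100

def performance_scores(players):
    """players: list of dicts with 'name', 'cluster', 'p60' (hypothetical fields)."""
    by_cluster = defaultdict(list)
    for p in players:
        by_cluster[p["cluster"]].append(p["p60"])
    scores = {}
    for p in players:
        values = by_cluster[p["cluster"]]
        mu, sigma = mean(values), pstdev(values)
        # Guard against a cluster where every player has the same value.
        z = 0.0 if sigma == 0 else (p["p60"] - mu) / sigma
        scores[p["name"]] = SCALE * z
    return scores

players = [
    {"name": "Skater A", "cluster": "Elite", "p60": 3.0},
    {"name": "Skater B", "cluster": "Elite", "p60": 2.0},
    {"name": "Skater C", "cluster": "Bottom-Six F", "p60": 1.5},
    {"name": "Skater D", "cluster": "Bottom-Six F", "p60": 0.5},
]
scores = performance_scores(players)
```

Skater A and Skater C come out with the same score even though their raw P/60 differs, because each is measured against its own cluster. That is why the score works as a tier signal inside a cluster rather than as a league-wide ranking.
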
Step 03
Find five closest contracts
Same-cluster contracts are searched first, inside progressively wider score bands (±30, ±60, ±100, ±200). The search stops widening once two quality peers are found, rather than padding the set out to five with diluted matches. Cross-cluster fill only kicks in when the same cluster is genuinely empty.
Step 04
Weighted-average AAV
Each comp's cap hit is weighted by 1/(1 + distance), and the weight of a UFA contract is multiplied by 1.5 because those deals are the cleanest free-market signal. That weighted average is the player's predicted value. Delta is the predicted value minus the actual cap hit, so a positive delta means the player is paid less than the model predicts.
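
The weighting in Step 04 can be sketched as follows. The function name and tuple layout are illustrative assumptions, and the code reads the 1.5x UFA multiplier as applying to the comp's weight rather than to its cap hit.

```python
UFA_MULTIPLIER = 1.5  # from the text: UFA deals count 1.5x

def predicted_aav(comps):
    """comps: list of (cap_hit, distance, is_ufa) tuples (hypothetical layout)."""
    weighted_sum = 0.0
    total_weight = 0.0
    for cap_hit, distance, is_ufa in comps:
        weight = 1.0 / (1.0 + distance)   # closer comps weigh more
        if is_ufa:
            weight *= UFA_MULTIPLIER
        weighted_sum += weight * cap_hit
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None

# Two made-up comps: a UFA deal at distance 0 and a non-UFA deal at distance 1.
comps = [(8_000_000, 0.0, True), (6_000_000, 1.0, False)]
predicted = predicted_aav(comps)      # weights 1.5 and 0.5 -> 7,500,000.0
delta = predicted - 7_000_000         # actual cap hit of $7.0M -> +500,000.0
```
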

02 · Distance

How comps are ranked

Distance is a weighted combination of four normalized features. Weights favor production rate over biographical signals so a 34-year-old star still comps against 28-year-old stars instead of 34-year-old depth players.

Distance metric (weights sum to 1.00)

Points / 60 · 35%

Performance score · 30%

Power-play points · 25%

Age · 10%

Points / 60. Tightest signal of scoring rate. A 0.3 P/60 gap is worth ~$2-3M of AAV.

Performance score. Within-cluster z-score. Anchors comps to the player's actual production tier.

Power-play points. Separates PP drivers from even-strength role players who share gross P/60.

Age. Kept low so a 34-year-old star still comps against 28-year-old stars instead of 34-year-old depth.
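
A sketch of how the four weights could combine into one distance. The weights come from the table above; everything else is an assumption, including the field names, the use of absolute differences, and features already normalized to a 0 to 1 range. The page does not spell out the exact functional form.

```python
WEIGHTS = {"p60": 0.35, "score": 0.30, "pp_points": 0.25, "age": 0.10}

def comp_distance(a, b):
    """Weighted sum of absolute gaps between two players' normalized features."""
    return sum(w * abs(a[k] - b[k]) for k, w in WEIGHTS.items())

# Made-up normalized profiles (0 = league minimum, 1 = league maximum).
older_star = {"p60": 0.90, "score": 0.90, "pp_points": 0.80, "age": 1.0}
younger_star = {"p60": 0.90, "score": 0.85, "pp_points": 0.80, "age": 0.4}
older_depth = {"p60": 0.30, "score": 0.20, "pp_points": 0.10, "age": 1.0}

d_star = comp_distance(older_star, younger_star)   # roughly 0.075
d_depth = comp_distance(older_star, older_depth)   # roughly 0.595
```

Even with a large age gap, the younger star is far closer than the same-age depth player, which is the behavior the low age weight is meant to produce.
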

03 · Roles

Cluster distribution

Players are bucketed into the seven clusters. The comp engine searches the same cluster first; it only expands cross-cluster when in-cluster comps are genuinely empty.

Elite · mean score -11.1 · mean cap $7.5M

Top-Line F · 100 players · mean score -3.4 · mean cap $5.5M

Middle-Six F · 176 players · mean score -21.8 · mean cap $3.0M

Bottom-Six F · 195 players · mean score 3.4 · mean cap $1.7M

Top-Four D · mean score 15.0 · mean cap $7.1M

Bottom-Pair D · 143 players · mean score 25.2 · mean cap $1.8M

Two-Way / Shutdown · 109 players · mean score 20.4 · mean cap $4.6M

04 · Worked example

Leon Draisaitl

Click through to see the five contracts the model picked, their cap hits, and the weighted average that became the predicted value.

EDM · C · age 31 · Elite

05 · Edge cases

What the model handles deliberately

UFA bias (1.5x weight)

UFA contracts are freely negotiated with no arbitration ceiling or RFA leverage. They're the cleanest market signal in the dataset, so they get a 1.5x multiplier in the weighted-average AAV when they appear among a player's five closest comps.

Quality over quantity

The search stops once two quality same-cluster comps are found inside a score band. A tight set of two true peers produces a better estimate than five comps padded out with dissimilar players. McDavid does not get compared to a 30-score winger just because they share a cluster label.
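
The band-widening search can be sketched as below. The band widths, the two-peer stopping rule, and the five-comp cap come from the text; the field names, the ranking by score gap (the real engine ranks by its distance metric), and the exact cross-cluster fallback are assumptions.

```python
BANDS = (30, 60, 100, 200)  # score-band half-widths, narrowest first
MIN_PEERS = 2               # stop widening once this many peers are found
MAX_COMPS = 5

def find_comps(target, contracts):
    """target and contracts are dicts with 'cluster' and 'score' (hypothetical)."""
    def gap(c):
        return abs(c["score"] - target["score"])

    def widen(pool):
        found = []
        for band in BANDS:
            found = sorted((c for c in pool if gap(c) <= band), key=gap)[:MAX_COMPS]
            if len(found) >= MIN_PEERS:
                break               # enough true peers, do not dilute further
        return found                # may be shorter than MIN_PEERS, or empty

    same = [c for c in contracts if c["cluster"] == target["cluster"]]
    others = [c for c in contracts if c["cluster"] != target["cluster"]]
    # Cross-cluster fill only when the same-cluster search returns nothing.
    return widen(same) or widen(others)

target = {"name": "Star", "cluster": "Elite", "score": 95}
contracts = [
    {"name": "Peer 1", "cluster": "Elite", "score": 90},
    {"name": "Peer 2", "cluster": "Elite", "score": 80},
    {"name": "Winger", "cluster": "Elite", "score": 30},
    {"name": "Other", "cluster": "Top-Line F", "score": 94},
]
comps = find_comps(target, contracts)
print([c["name"] for c in comps])  # ['Peer 1', 'Peer 2']
```

The 30-score winger shares the cluster label but never enters the result, because the ±30 band already holds two peers and the search stops there.
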

ELC suppression

Entry-level contracts cap out near $925k regardless of production. Players on ELCs almost always show a large positive delta against the model. That's structural, not error. The CBA sets that price, not the comp engine.

Retained salary aggregation

When a player is traded with salary retention, both teams' cap sheets carry a slice of the AAV. The scraper sums those slices back into the player's full contract AAV so value-vs-market math uses the real number, not the post-retention prorated cap.
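
A minimal sketch of that aggregation, assuming the scraped data arrives as one row per player per team. The row layout and function name are hypothetical; the scraper's real schema is not shown on this page.

```python
from collections import defaultdict

def full_aav(cap_rows):
    """cap_rows: iterable of (player, team, cap_slice) tuples (hypothetical layout)."""
    totals = defaultdict(int)
    for player, _team, cap_slice in cap_rows:
        totals[player] += cap_slice   # add each team's share back together
    return dict(totals)

# Made-up example: an $8.0M contract traded with 50% retained.
rows = [
    ("Player X", "Team A", 4_000_000),  # retained by the trading team
    ("Player X", "Team B", 4_000_000),  # carried by the acquiring team
    ("Player Y", "Team B", 2_500_000),  # no retention, single row
]
aav = full_aav(rows)
print(aav["Player X"])  # 8000000
```
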

Pre-signed extensions

A player whose current deal expires this season but who has already signed their next contract gets labeled Extension Signed, not flagged as a re-sign candidate. The signal ignores years_left when there's already a successor contract on the books.
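
The labeling rule might be expressed like this. Only the "Extension Signed" label comes from the text; the other two labels, the function name, and the assumption that `years_left <= 1` means the deal expires this season are hypothetical.

```python
def contract_status(years_left, has_successor_contract):
    """Return a status label; a signed successor contract takes priority."""
    if has_successor_contract:
        return "Extension Signed"    # years_left is ignored in this case
    if years_left <= 1:              # assumption: deal expires this season
        return "Re-sign Candidate"   # hypothetical label
    return "Under Contract"          # hypothetical label

status_extended = contract_status(years_left=1, has_successor_contract=True)
status_expiring = contract_status(years_left=1, has_successor_contract=False)
```
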

Sub-20-GP suppression

Projecting 82-game rate stats from a small sample produces nonsense. A player with 5 GP at 1 P/G projects to 82 points, which the model would happily compare against an $8M cap hit. Players under 20 GP get their predicted value nulled out.
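
The arithmetic and the gate are simple enough to show directly. The function names below are illustrative, not the model's own.

```python
MIN_GP = 20  # from the text: players under 20 GP get no prediction

def pace_82(points, games_played):
    """Naive 82-game projection of a points total."""
    return points / games_played * 82

def gated_prediction(games_played, predicted_aav):
    """Null out the predicted value when the sample is too small."""
    return None if games_played < MIN_GP else predicted_aav

pace = pace_82(points=5, games_played=5)    # 82.0, the misleading projection
small_sample = gated_prediction(5, 8_000_000)    # None
full_sample = gated_prediction(60, 8_000_000)    # 8000000
```
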

06 · Coverage

What's in the model right now

Total players · 785

With predictions · 682

With contract · 785

UFA / unsigned

Predictions reflect what the market currently pays for similar production.