How the NHL player value model works
This is the hockey analytics layer behind every prediction. For each player the model finds up to five closely comparable NHL contracts, weights them by similarity, and returns the weighted-average AAV (average annual value). No black-box regression, no opaque coefficients, just comparisons against real signed deals.
Four steps
Every step is deterministic. Drop a player in, get up to five real comparables and a number out.
- Step 01
K-means clustering
Seven unsupervised clusters group every active skater by deployment (TOI, PP points, plus/minus, faceoff%, position). Centroids decide the labels: Elite, Top-Line F, Middle-Six F, Bottom-Six F, Top-Four D, Bottom-Pair D, Two-Way / Shutdown.
- Step 02
Performance score
Z-scored production inside each cluster, scaled to roughly ±100. Forwards score on G/60, P/60, PP pts, shots, sh%, plus/minus. Defensemen on TOI/GP, PP pts, plus/minus, sh%. The score is the tier signal the comp engine actually uses.
- Step 03
Find five closest contracts
Same-cluster contracts are searched first, inside progressively wider score bands (±30, ±60, ±100, ±200). The search stops at the first band that holds at least two quality peers instead of padding the set with diluted matches. Cross-cluster fill only kicks in when the same cluster is genuinely empty.
- Step 04
Weighted-average AAV
Each comp's cap hit is weighted by 1/(1 + distance), with UFA contracts multiplied 1.5x because those are the cleanest free-market signal. That weighted average is the player's predicted value. Delta = predicted minus actual cap hit.
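The weighting in Step 04 can be sketched in a few lines. This is an illustration, not the production code: the Comp fields and the sample numbers are hypothetical, and it assumes the 1.5x UFA multiplier is applied to the comp's weight rather than to its cap hit.

```python
from dataclasses import dataclass

UFA_MULTIPLIER = 1.5  # from the text: UFA comps count 1.5x


@dataclass
class Comp:
    # Hypothetical field names, chosen for this sketch only.
    name: str
    cap_hit: float    # AAV in dollars
    distance: float   # similarity distance, 0 = identical profile
    is_ufa: bool


def predicted_aav(comps):
    """Weighted-average cap hit with weight = 1 / (1 + distance), x1.5 for UFA deals."""
    if not comps:
        return None
    weights = [
        (UFA_MULTIPLIER if c.is_ufa else 1.0) / (1.0 + c.distance)
        for c in comps
    ]
    return sum(w * c.cap_hit for w, c in zip(weights, comps)) / sum(weights)


# Made-up comps, not real contracts.
comps = [
    Comp("Comp A", 8_000_000, 0.0, True),   # weight 1.5 / 1.0 = 1.5
    Comp("Comp B", 6_000_000, 1.0, False),  # weight 1.0 / 2.0 = 0.5
]
predicted = predicted_aav(comps)   # (1.5 * 8M + 0.5 * 6M) / 2.0 = 7.5M
delta = predicted - 7_000_000      # predicted minus actual cap hit = 500k
```

A closer comp (smaller distance) pulls the estimate harder, and a UFA deal pulls half again as hard as an otherwise identical non-UFA deal.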
How comps are ranked
Distance is a weighted combination of four normalized features. Weights favor production rate over biographical signals so a 34-year-old star still comps against 28-year-old stars instead of 34-year-old depth players.
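A rough sketch of such a distance follows. The four feature names and their weights are placeholders invented for illustration; the only properties taken from the text are that there are four normalized features and that production outweighs biography.

```python
# Hypothetical features and weights. The page does not name the four features;
# these stand-ins only respect "production rate over biographical signals".
WEIGHTS = {"points_per_60": 0.4, "perf_score": 0.3, "toi_per_gp": 0.2, "age": 0.1}


def comp_distance(player, candidate, weights=WEIGHTS):
    """Weighted sum of absolute differences over features pre-scaled to 0..1."""
    return sum(w * abs(player[f] - candidate[f]) for f, w in weights.items())


# Made-up normalized profiles.
star_34 = {"points_per_60": 0.9, "perf_score": 0.9, "toi_per_gp": 0.8, "age": 0.9}
star_28 = {"points_per_60": 0.9, "perf_score": 0.85, "toi_per_gp": 0.8, "age": 0.5}
depth_34 = {"points_per_60": 0.3, "perf_score": 0.2, "toi_per_gp": 0.4, "age": 0.9}

d_star = comp_distance(star_34, star_28)    # small: only age and score differ
d_depth = comp_distance(star_34, depth_34)  # large: same age, far lower production
```

With age weighted lightly, the 34-year-old star lands much closer to the 28-year-old star than to the depth player of the same age.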
Cluster distribution
Players bucketed into the seven clusters. The comp engine searches the same cluster first; it only expands cross-cluster when in-cluster comps are genuinely empty.
Leon Draisaitl
Click through to see the five contracts the model picked, their cap hits, and the weighted average that became the predicted value.
What the model handles deliberately
UFA bias (1.5x weight)
UFA contracts are freely negotiated with no arbitration ceiling or RFA leverage. They're the cleanest market signal in the dataset, so they get a 1.5x multiplier in the weighted-average AAV when they appear among a player's five closest comps.
Quality over quantity
The search stops once two quality same-cluster comps are found inside a score band. A tight set of two true peers produces a better estimate than five comps padded out with dissimilar players. McDavid does not get compared to a 30-score winger just because they share a cluster label.
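The band-widening search might look like the sketch below. The band widths, the two-peer stop, and the five-comp cap come from the text; the dictionary keys, ranking by score gap (a stand-in for the real distance), and returning whatever the widest band holds when no band reaches two peers are assumptions.

```python
SCORE_BANDS = (30, 60, 100, 200)  # band half-widths from the text
MIN_QUALITY_COMPS = 2
MAX_COMPS = 5


def find_comps(player, pool):
    """Return up to MAX_COMPS peers, widening the score band only as needed."""
    same_cluster = [p for p in pool if p["cluster"] == player["cluster"]]
    # Assumption: cross-cluster fill uses the whole pool, and only when
    # the player's own cluster has no contracts at all.
    candidates = same_cluster or pool

    def gap(p):
        return abs(p["score"] - player["score"])

    in_band = []
    for band in SCORE_BANDS:
        in_band = sorted((p for p in candidates if gap(p) <= band), key=gap)
        if len(in_band) >= MIN_QUALITY_COMPS:
            break  # stop at the first band with enough true peers
    return in_band[:MAX_COMPS]


# Made-up players and scores.
star = {"name": "Star", "cluster": "Elite", "score": 95}
pool = [
    {"name": "A", "cluster": "Elite", "score": 90},
    {"name": "B", "cluster": "Elite", "score": 70},
    {"name": "C", "cluster": "Elite", "score": 30},       # same label, far lower score
    {"name": "D", "cluster": "Top-Line F", "score": 94},  # close score, other cluster
]
names = [p["name"] for p in find_comps(star, pool)]
```

Here the ±30 band already holds A and B, so the search stops there and the 30-score clustermate C never enters the comp set.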
ELC suppression
Entry-level contracts cap out near $925k regardless of production. Players on ELCs almost always show a large positive delta against the model. That's structural, not error. The CBA sets that price, not the comp engine.
Retained salary aggregation
When a player is traded with salary retention, both teams' cap sheets carry a slice of the AAV. The scraper sums those slices back into the player's full contract AAV so value-vs-market math uses the real number, not the reduced post-retention cap hit.
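As a minimal sketch of that aggregation, assuming each scraped row carries a player identifier and that team's slice of the cap hit (the row layout and key names are hypothetical, and a real pipeline would likely key on a contract ID rather than a name):

```python
from collections import defaultdict


def full_aav_by_player(cap_rows):
    """Sum each player's cap slices across teams back into one contract AAV."""
    totals = defaultdict(int)
    for row in cap_rows:
        totals[row["player"]] += row["cap_slice"]
    return dict(totals)


# Made-up rows: Player X was traded with 50% of an $8M AAV retained.
rows = [
    {"player": "Player X", "team": "Team A", "cap_slice": 4_000_000},
    {"player": "Player X", "team": "Team B", "cap_slice": 4_000_000},
    {"player": "Player Y", "team": "Team A", "cap_slice": 2_500_000},
]
aav = full_aav_by_player(rows)
```

Player X is then valued against the market at the full $8M rather than the $4M that appears on either team's cap sheet.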
Pre-signed extensions
A player whose current deal expires this season but who has already signed their next contract gets labeled Extension Signed, not flagged as a re-sign candidate. The signal ignores years_left when there's already a successor contract on the books.
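The labeling rule can be sketched as a small function. Only the Extension Signed label and the years_left field come from the text; the other label strings, the has_successor_contract flag, and the threshold for "expires this season" are assumptions.

```python
def contract_status(years_left, has_successor_contract):
    """Label a player's contract situation; a signed successor deal wins outright."""
    if has_successor_contract:
        # years_left is ignored once the next contract is already on the books.
        return "Extension Signed"
    if years_left <= 1:  # assumed encoding of "expires this season"
        return "Re-sign Candidate"
    return "Under Contract"


status_extended = contract_status(years_left=1, has_successor_contract=True)
status_expiring = contract_status(years_left=1, has_successor_contract=False)
```

Checking the successor contract first is what keeps an already-extended player out of the re-sign list.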
Sub-20-GP suppression
Projecting 82-game rate stats from a small sample produces nonsense. Five games at one point per game prorate to an 82-point pace, which the model would happily compare against an $8M cap hit. Players under 20 GP get their predicted value nulled out.
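The arithmetic and the guard are simple enough to show directly. The function names below are illustrative only; the 20-GP threshold and the 82-game projection come from the text.

```python
MIN_GP = 20        # threshold from the text
SEASON_GAMES = 82


def pace_82(points, gp):
    """Naive 82-game projection of a points total."""
    return points / gp * SEASON_GAMES if gp else 0.0


def suppress_small_sample(gp, predicted_value):
    """Null out the prediction for players under the games-played threshold."""
    return predicted_value if gp >= MIN_GP else None


pace = pace_82(points=5, gp=5)               # 82.0, a star-level pace from 5 games
shown = suppress_small_sample(5, 8_000_000)  # None: too few games to trust
```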