Opening the House: Datasets for Mixed Doubles Curling


We introduce the most comprehensive publicly available datasets for mixed doubles curling, constructed from eleven top-level tournaments from the CurlIT (https://curlit.com/results) Results Booklets spanning 53 countries, 1,112 games, and nearly 70,000 recorded shots. While curling analytics has grown in recent years, mixed doubles remains under-served due to limited access to data. Using a combined text-scraping and image-processing pipeline, we extract and standardize detailed game- and shot-level information, including player statistics, hammer possession, Power Play usage, stone coordinates, and post-shot scoring states. We describe the data engineering workflow, highlight challenges in parsing historical records, and derive additional contextual features that enable rigorous strategic analysis. Using these datasets, we present initial insights into shot selection and success rates, scoring distributions, and team efficiencies, illustrating key differences between mixed doubles and traditional four-player curling. To demonstrate the data's versatility, we show how it can be analyzed at the shot, end, game, or team level. The resulting resources provide a foundation for advanced performance modeling, strategic evaluation, and future research in mixed doubles curling analytics, supporting broader analytical engagement with this rapidly growing discipline.

💡 Research Summary

The paper presents the most extensive publicly available datasets for mixed doubles curling, compiled from eleven elite international tournaments held between 2016 and 2025 (World Mixed Doubles Curling Championships and Olympic Winter Games). The authors harvested data from the CurlIT Results Booklets, which are PDF reports containing detailed game-level and shot-level information. Using a hybrid pipeline that combines Python-based text scraping with R-based image processing, they extracted structured data on teams, players, tournament metadata, line scores, Last Stone Draw outcomes, timeouts, Power Play usage, and shot-by-shot stone positions.

The resulting resources consist of two complementary tables: a game‑level dataset with one row per match (1,112 rows) and 167 variables describing team statistics, player demographics, hammer possession, Power Play flags, and aggregated shooting percentages; and a shot‑level dataset with 66,632 rows and 103 variables detailing each stone’s coordinates, shot type (draw, take‑out, etc.), rotation direction, shooter identity, zone classification (in‑house, guard, out‑of‑play), and post‑shot scoring state. The authors also derived additional contextual features such as end‑by‑end hammer status, remaining Power Play opportunities, and counts of stones in predefined zones (4‑ft, 8‑ft, 12‑ft circles and guard zone).
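The zone counts described above can be sketched as a distance test against the house rings. This is a minimal illustration, not the authors' implementation: it assumes stone coordinates are given in metres relative to the centre of the house, that a stone counts as being in a ring when any part of it touches that ring, and that a stone radius of about 0.145 m is adequate. The names `classify_zone` and `count_zones` and the label `outside-house` are hypothetical. Separating guard-zone stones from out-of-play stones would additionally need the hog line and sheet boundaries, which are not modelled here.

```python
import math

FOOT = 0.3048  # metres per foot

# Ring radii in metres, inner to outer. The "4-ft" circle has a 4 ft
# diameter (2 ft radius), and likewise for the 8-ft and 12-ft circles.
RING_RADII = {"4-ft": 2 * FOOT, "8-ft": 4 * FOOT, "12-ft": 6 * FOOT}

STONE_RADIUS = 0.145  # approximate stone radius in metres (assumption)


def classify_zone(x, y, touch=True):
    """Return the innermost ring reached by a stone centred at (x, y).

    Coordinates are metres from the centre of the house (assumption).
    With touch=True a stone counts if its edge reaches the ring;
    with touch=False only the stone's centre is tested.
    """
    distance = math.hypot(x, y)
    margin = STONE_RADIUS if touch else 0.0
    for name, radius in RING_RADII.items():  # dicts keep insertion order
        if distance <= radius + margin:
            return name
    return "outside-house"


def count_zones(stones):
    """Count stones per zone for an iterable of (x, y) centres."""
    counts = {name: 0 for name in RING_RADII}
    counts["outside-house"] = 0
    for x, y in stones:
        counts[classify_zone(x, y)] += 1
    return counts
```

Calling `count_zones` once per shot on the stones remaining in play would give one set of zone counts per row of a shot-level table.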

A substantial portion of the paper is devoted to describing the data engineering workflow. Text scraping involved pattern matching for team names, player names, scores, and percentages, with special handling for irregular formatting and missing fields. Image processing used color segmentation to differentiate team stones, Hough circle detection to locate stone centers, and geometric transformation to map pixel coordinates onto the official ice dimensions (45.72 m × 4.75 m). The authors performed extensive cleaning, standardizing country codes, reconciling name variations, and imputing missing values where possible. Validation steps included cross‑checking shot counts against line scores and confirming that derived hammer sequences matched the official rules.
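The last image-processing step, mapping detected pixel positions onto ice dimensions, can be illustrated with a simple linear rescaling. The sketch below assumes the diagram is axis-aligned and that the pixel bounding box of the depicted ice area is already known; the authors' actual transformation may differ (for example, it may correct for rotation or cover only part of the sheet). `PixelBox` and `pixel_to_ice` are hypothetical names, and the default extents are the sheet dimensions quoted above. Colour segmentation and Hough circle detection are not reproduced here.

```python
from dataclasses import dataclass

SHEET_LENGTH_M = 45.72  # sheet length quoted in the summary
SHEET_WIDTH_M = 4.75    # sheet width quoted in the summary


@dataclass(frozen=True)
class PixelBox:
    """Pixel bounds of the ice area inside the image (hypothetical input)."""
    left: float
    top: float
    right: float
    bottom: float


def pixel_to_ice(px, py, box, width_m=SHEET_WIDTH_M, length_m=SHEET_LENGTH_M):
    """Linearly map pixel (px, py) to metres from the box's top-left corner.

    width_m and length_m are the physical extents covered by `box`;
    pass smaller values if the image shows only part of the sheet.
    """
    if box.right == box.left or box.bottom == box.top:
        raise ValueError("pixel box must have non-zero width and height")
    x_m = (px - box.left) / (box.right - box.left) * width_m
    y_m = (py - box.top) / (box.bottom - box.top) * length_m
    return x_m, y_m
```

A stone centre detected at the middle of the box maps to the middle of the covered ice area, which is a quick sanity check before applying the mapping to every detected circle.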

The paper provides several initial analytical insights that illustrate the dataset's potential. First, teams with the hammer averaged 1.42 points per end, lower than the ~1.68 points typical in four-person curling, reflecting the higher stone density and the "no-remove-until-fourth-stone" rule in mixed doubles. Second, Power Plays markedly increased scoring: ends with a Power Play yielded an average of 2.07 points versus 1.31 points without. Third, shot success rates differed by type: draws succeeded 68% of the time, while take-outs succeeded 74%, suggesting that the restriction on removing stones early in each end shifts strategic emphasis toward take-outs once the fourth stone is thrown. Fourth, blank ends were rare (≈4% of ends) because in mixed doubles a blank end transfers the hammer to the other team, which removes the incentive, familiar from traditional curling, to blank an end in order to keep the hammer.
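Comparisons such as the Power Play averages above reduce to a grouped mean over end-level records. The sketch below shows one way to compute it in plain Python. The field names `power_play` and `points` are hypothetical, and the five toy records are invented purely to show the mechanics; they are not drawn from the datasets and do not reproduce the reported figures.

```python
from collections import defaultdict


def mean_points_by_flag(ends, flag="power_play", value="points"):
    """Return {flag_value: mean of `value`} over a list of end records."""
    totals = defaultdict(lambda: [0, 0])  # flag value -> [sum, count]
    for end in ends:
        bucket = totals[bool(end[flag])]
        bucket[0] += end[value]
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Invented toy records for illustration only (not real tournament data).
toy_ends = [
    {"power_play": True, "points": 3},
    {"power_play": True, "points": 1},
    {"power_play": False, "points": 1},
    {"power_play": False, "points": 0},
    {"power_play": False, "points": 2},
]

averages = mean_points_by_flag(toy_ends)
print(averages)  # {True: 2.0, False: 1.0}
```

The same function applies to any other boolean end-level feature, such as hammer possession, by changing the `flag` argument.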

These findings indicate that mixed doubles curling has a distinct strategic landscape compared to the traditional four-person game. The dataset enables researchers to quantify the impact of hammer possession, Power Play timing, and early-shot restrictions on scoring efficiency. Moreover, the rich shot-level coordinates open avenues for spatial analyses, clustering of shot patterns, and the development of predictive models (e.g., reinforcement-learning agents that learn optimal shot selection).

The authors conclude by emphasizing the dataset's role as a foundational resource for the emerging field of mixed doubles curling analytics. They plan to maintain and expand the collection with future tournaments, add environmental variables (ice temperature, humidity), and release open-source tooling for reproducible analysis. By making high-resolution, shot-by-shot data openly accessible, the work aims to catalyze independent research, support coaching strategies, and ultimately advance the scientific understanding of this rapidly growing Olympic discipline.
