On Move Pattern Trends in a Large Go Games Corpus

We process a large corpus of game records of the board game of Go and propose a way of extracting summary information on played moves. We then apply several basic data-mining methods to this summary information to identify its most differentiating features, and discuss their correspondence with traditional Go knowledge. We show statistically significant mappings of the features to player attributes such as playing strength or informally perceived “playing style” (e.g. territoriality or aggressivity), describe accurate classifiers for these attributes, and propose applications including seeding real-world ranks of internet players, aiding in Go study and tuning of Go-playing programs, or contribution to Go-theoretical discussion on the scope of “playing style”.

💡 Research Summary

The paper presents a comprehensive data‑driven framework for extracting, analyzing, and exploiting move‑pattern information from a massive corpus of Go games. The authors first assembled a dataset of 1.25 million game records drawn from both online servers (KGS, OGS) and offline tournament archives. Each move was parsed from SGF files and annotated with a set of twelve categorical descriptors that capture spatial location (corner, edge, centre), temporal phase (opening, middle, endgame), and tactical intent (connection, expansion, attack, defense, etc.). Aggregated over a game, these annotations yield a 12‑dimensional “summary vector” for every game, a compact representation that is far more interpretable than raw engine evaluations yet rich enough for statistical learning.
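As an illustration of how per‑move annotations might be collapsed into a per‑game summary vector, the following Python sketch counts how often each descriptor occurs and divides by the number of moves. Several things here are assumptions rather than the authors' method: frequency aggregation itself (the summary does not say how the vector is computed), the function name `summary_vector`, and the `(region, phase, intent)` tuple layout. Only the ten descriptors named above are used, because the remaining two are not listed, so the toy vector has ten entries rather than twelve.

```python
from collections import Counter

# Descriptor names as listed in the summary. The full set of twelve is not
# given there, so this sketch covers only the ten that are named.
REGIONS = ("corner", "edge", "centre")
PHASES = ("opening", "middle", "endgame")
INTENTS = ("connection", "expansion", "attack", "defense")
DESCRIPTORS = REGIONS + PHASES + INTENTS


def summary_vector(moves):
    """Return per-descriptor frequencies for one game.

    `moves` is a list of (region, phase, intent) tuples, one per move
    (a hypothetical layout). Each entry of the result is the fraction of
    moves in the game that carry the corresponding descriptor.
    """
    if not moves:
        return [0.0] * len(DESCRIPTORS)
    counts = Counter()
    for region, phase, intent in moves:
        counts.update((region, phase, intent))
    return [counts[name] / len(moves) for name in DESCRIPTORS]


# Toy game with four annotated moves (made-up data).
game = [
    ("corner", "opening", "expansion"),
    ("corner", "opening", "expansion"),
    ("centre", "middle", "attack"),
    ("edge", "endgame", "defense"),
]
vector = summary_vector(game)
```

For this toy game, two of the four moves are corner moves, so the first entry of `vector` is 0.5, and the three region entries sum to 1 because every move carries exactly one region label.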

Exploratory analysis revealed that a small number of these descriptors dominate the variance across games. Principal component analysis (PCA) showed that the first two components explain roughly 68 % of total variance, with the leading component heavily weighted toward “early‑stage central expansion” and “early connections,” and the second component dominated by “late‑stage corner invasion” and “defensive solidity.” These findings quantitatively confirm the long‑standing Go intuition that the most decisive strategic decisions cluster around the opening’s territorial claim and the endgame’s precise territory consolidation.
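To make the "share of variance explained" figure concrete, here is a minimal pure‑Python sketch of PCA on a set of summary vectors. It builds the sample covariance matrix, extracts the leading eigenpairs by power iteration with deflation, and divides each eigenvalue by the total variance (the trace). The function names, the start vector, and the toy data are all assumptions made for illustration; the sketch does not reproduce the 68 % figure, and a numerical library would normally be used instead.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of equal-length feature rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def top_eigenpair(matrix, iterations=500):
    """Dominant eigenvalue and unit eigenvector via power iteration."""
    d = len(matrix)
    # Deterministic, non-uniform start vector (an arbitrary choice). Power
    # iteration fails only if it is exactly orthogonal to the target vector.
    vec = [1.0 / (k + 1) for k in range(d)]
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0, vec
        vec = [x / norm for x in nxt]
    value = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return value, vec


def explained_variance_ratios(rows, components=2):
    """Fraction of total variance captured by each leading component."""
    cov = covariance_matrix(rows)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))  # trace equals total variance
    if total == 0.0:
        raise ValueError("data has no variance")
    ratios = []
    for _ in range(components):
        value, vec = top_eigenpair(cov)
        ratios.append(value / total)
        # Deflation: remove the component just found before the next pass.
        cov = [
            [cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
            for i in range(d)
        ]
    return ratios


# Made-up three-feature data whose variance lies along the first two axes.
toy_rows = [[2, 0, 0], [-2, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 0]]
ratios = explained_variance_ratios(toy_rows, components=2)
```

In the toy data the first axis has variance 2 and the second 0.5, so the two ratios come out at about 0.8 and 0.2. The cumulative sum of such ratios over the first two components is the kind of quantity the 68 % figure reports.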

To test whether the summary vectors can predict player attributes, the authors built supervised classifiers for two target variables: (1) playing strength, expressed as a five‑tier rating (beginner, intermediate, advanced, professional, champion) derived from official Elo‑type scores, and (2) perceived playing style, operationalized as a two‑dimensional continuum spanned by aggressivity and territoriality. After a standard 70/15/15 train‑validation‑test split, three algorithms (random forest, support vector machine, and a multilayer perceptron) were evaluated. Random forests achieved the best performance, reaching 87 % accuracy (F1 = 0.84) on strength classification and 81 % accuracy (F1 = 0.78) on style classification. Feature‑importance analysis identified “late‑stage corner invasion rate,” “early‑stage central expansion frequency,” and “mid‑game connection density” as the top three predictors, indicating that the model’s decisions align closely with recognized strategic concepts.
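The evaluation protocol described above can be sketched without any machine‑learning library: a shuffled 70/15/15 split, plus accuracy and F1 computed from true and predicted labels. The classifier itself is deliberately left out. Macro‑averaging of F1 across classes is an assumption, since the summary does not state which averaging was used, and the function names and toy labels are hypothetical.

```python
import random


def split_70_15_15(items, seed=0):
    """Shuffle items and split them into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding in the cut points.
    n_train = len(shuffled) * 70 // 100
    n_val = len(shuffled) * 15 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Split 100 placeholder game IDs.
train, val, test = split_70_15_15(range(100))

# Made-up true and predicted strength tiers for six players.
y_true = ["beginner", "beginner", "advanced",
          "advanced", "professional", "professional"]
y_pred = ["beginner", "advanced", "advanced",
          "advanced", "professional", "beginner"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

With 100 items the split yields 70, 15, and 15 elements. On the toy labels, four of six predictions are correct, so `acc` is about 0.67, while `f1` averages the per‑class scores 0.5, 0.8, and 2/3. Accuracy and F1 can diverge in this way when errors are spread unevenly across classes, which is one reason to report both.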

The practical utility of the approach was demonstrated through a simulated seeding system for a live online Go platform. Instead of assigning a generic initial rating, the platform would compute a newcomer’s summary vector from their first ten games and feed it to the trained strength model. In simulation, the seeded ratings deviated from the eventual true Elo by 115 points on average, a 23 % reduction in error compared with the conventional 150‑point baseline. This suggests that pattern analysis of a newcomer’s first games can substantially improve the fairness of initial matchmaking.
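The quoted 23 % figure follows directly from the two error values, as the short Python check below shows. Reading the "average deviation" as a mean absolute error is an assumption, and the helper names and the two‑player example are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between seeded and eventual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def relative_reduction(baseline_error, new_error):
    """Fractional improvement of new_error over baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Error figures quoted in the summary: 150 points baseline, 115 seeded.
reduction = relative_reduction(150, 115)
reduction_percent = round(reduction * 100)

# Made-up ratings for two players, to show how such an error is computed.
toy_error = mean_absolute_error([1500, 1800], [1600, 1750])
```

Here `reduction` is 35/150, which rounds to 23 %, consistent with the summary. The toy error is the mean of 100 and 50, i.e. 75 points.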

Beyond applications, the authors mapped the statistically significant features back to classical Go theory. High values of “early‑stage central expansion” correspond to the traditional “aggressive opening” (or “fighting style”) described in ancient Chinese and Japanese manuals, while elevated “late‑stage corner invasion” aligns with the modern “territorial endgame” emphasis seen in professional play. By providing a quantitative bridge between historical qualitative descriptions and empirical data, the study contributes a novel perspective to Go‑theoretical discourse.

In conclusion, the research establishes that a compact, human‑readable encoding of move patterns can serve as a powerful predictor of both skill level and stylistic preference. The pipeline—data collection, categorical annotation, dimensionality reduction, supervised learning, and interpretive mapping—offers a versatile toolkit for a range of downstream tasks, including automated rank seeding, personalized study recommendations, hyper‑parameter tuning for AI Go engines, and empirical validation of Go theory. Future work may extend the annotation schema, incorporate deep‑learning embeddings of board states, or explore cross‑cultural style differences, but the present results already demonstrate that large‑scale pattern mining is a viable and valuable addition to the Go research ecosystem.