Measuring Technological Distance for Patent Mapping

Recent works in the information science literature have presented cases of using patent databases and patent classification information to construct network maps of technology fields, which aim to aid in competitive intelligence analysis and innovation decision making. Constructing such a patent network requires a proper measure of the distance between different classes of patents in the patent classification systems. Despite the existence of various distance measures in the literature, it is unclear how to consistently assess and compare them, and which ones to select for constructing patent technology network maps. This ambiguity has limited the development and applications of such technology maps. Herein, we propose to compare alternative distance measures and identify the superior ones by analyzing the differences and similarities in the structural properties of resulting patent network maps. Using United States patent data from 1976 to 2006 and International Patent Classification system, we compare 12 representative distance measures, which quantify inter-field knowledge base proximity, field-crossing diversification likelihood or frequency of innovation agents, and co-occurrences of patent classes in the same patents. Our comparative analyses suggest the patent technology network maps based on normalized co-reference and inventor diversification likelihood measures are the best representatives.

💡 Research Summary

The paper tackles a fundamental problem in patent‑based technology mapping: how to quantify the “distance” between patent classes so that the resulting network accurately reflects the underlying knowledge structure. Although a variety of distance measures have been proposed in the information‑science literature, there has been no systematic comparison of their effects on the topology of technology maps, which has limited the practical adoption of such maps for competitive intelligence and innovation policy.

To fill this gap, the authors assembled a comprehensive dataset of United States patents granted between 1976 and 2006 (approximately 7 million patents) and classified each patent according to the International Patent Classification (IPC) at the eight‑digit level. Using this data, they defined twelve representative distance metrics that fall into three conceptual families:

Knowledge‑base proximity measures – based on citation or co‑reference information (e.g., normalized co‑reference, Jaccard similarity, cosine similarity).
Diversification‑likelihood measures – estimating the probability that an inventor or a firm diversifies across two classes (inventor diversification likelihood, firm diversification likelihood).
Co‑occurrence measures – counting how often two IPC codes appear together in the same patent (raw co‑occurrence frequency, adjusted co‑occurrence ratios).

For each metric the authors constructed a symmetric distance matrix, transformed it into a similarity matrix (by taking the reciprocal or applying a suitable kernel), and then built a weighted undirected network by retaining edges above a chosen similarity threshold. The resulting networks were examined with a battery of structural diagnostics: degree distribution (testing for scale‑free behavior), clustering coefficient, modularity (community strength), average shortest‑path length and diameter, assortativity, and robustness under random or targeted node/edge removal.

The comparative analysis yielded clear patterns. Networks derived from normalized co‑reference consistently displayed high modularity (Q ≈ 0.62), well‑defined communities that correspond to known technological domains, and a power‑law degree distribution indicative of a scale‑free structure. They also proved robust: even after removing up to 20 % of the highest‑degree nodes, the giant component remained largely intact. The inventor diversification likelihood metric produced a very similar topology; its edges tend to connect classes that share active inventors, highlighting pathways of human‑driven knowledge transfer. Both measures therefore capture complementary aspects of technological proximity—one rooted in citation flows, the other in the mobility of innovators.

In contrast, the Jaccard and cosine similarity measures generated overly dense networks with low clustering and modest modularity, making it difficult to distinguish distinct technological fields. Their degree distributions were skewed toward high‑degree hubs, which reduced robustness: targeted removal of these hubs fragmented the network rapidly. Co‑occurrence‑based metrics performed the worst in terms of structural coherence; because they rely on the relatively rare event of multiple IPC codes appearing in a single patent, many class pairs remained disconnected, leading to a fragmented, low‑density graph.

The authors discuss the practical implications of these findings. For analysts who need to map technology landscapes, normalized co‑reference offers a reliable backbone that reflects the flow of cited knowledge across domains, making it suitable for identifying emerging clusters, potential convergence zones, and competitive positioning. Inventor diversification likelihood adds a human dimension, useful for tracking talent migration, forecasting cross‑domain innovation, and designing policies that encourage interdisciplinary collaboration. Simpler similarity measures, while computationally cheap, risk producing misleading visualizations that over‑emphasize “hot” nodes and obscure true domain boundaries.

Limitations are acknowledged: the study is confined to U.S. patents up to 2006, so the relevance of the results for newer fields such as artificial intelligence, synthetic biology, or green technologies remains to be validated. Moreover, the choice of similarity threshold, which determines network sparsity, can affect topological outcomes; systematic sensitivity analysis of this parameter is suggested for future work.

In conclusion, by rigorously comparing twelve distance metrics through the lens of network science, the paper identifies normalized co‑reference and inventor diversification likelihood as the superior candidates for constructing patent‑based technology maps. These metrics produce networks that are structurally sound, interpretable, and resilient, thereby providing a solid foundation for downstream applications in competitive intelligence, technology forecasting, and innovation policy design.

💡 Research Summary

📜 Original Paper Content