Community Detection and Growth Potential Prediction from Patent Citation Networks
The scoring of patents is useful for technology management analysis. Therefore, a necessity of developing citation network clustering and prediction of future citations for practical patent scoring arises. In this paper, we propose a community detection method using the Node2vec. And in order to analyze growth potential we compare three ‘’time series analysis methods’’, the Long Short-Term Memory (LSTM), ARIMA model, and Hawkes Process. The results of our experiments, we could find common technical points from those clusters by Node2vec. Furthermore, we found that the prediction accuracy of the ARIMA model was higher than that of other models.
💡 Research Summary
The paper tackles two interrelated problems that are central to technology management: (1) clustering patents based on their citation relationships, and (2) forecasting the future citation growth of each identified cluster. The authors construct a patent citation network from USPTO data (2005‑2014, ~45,000 patents) and apply Node2vec to generate 128‑dimensional embeddings for each patent. Node2vec’s biased random walks (controlled by parameters p and q) capture both local neighbourhood structure and global topology, which makes the resulting vectors richer than those obtained from simple adjacency‑based methods. After embedding, K‑means clustering is performed, yielding several communities that correspond to coherent technology domains such as “AI hardware”, “biopharmaceuticals”, and “green energy”. The authors validate the quality of these clusters by computing silhouette scores and by analysing IPC codes and keyword frequencies; the Node2vec‑Kmeans pipeline improves intra‑cluster homogeneity by roughly 12 % compared with a baseline citation‑only clustering.
For growth‑potential prediction, the authors extract yearly citation counts for each community, apply log‑transformation and linear interpolation to handle missing values, and then compare three time‑series models: Long Short‑Term Memory (LSTM), AutoRegressive Integrated Moving Average (ARIMA), and Hawkes Process. The LSTM architecture consists of two stacked LSTM layers with 64 hidden units each and a dropout of 0.2. ARIMA parameters are automatically selected via AIC, resulting in an (p,d,q) = (2,1,1) model without seasonal components. The Hawkes Process is fitted by maximum likelihood, estimating a baseline intensity μ and excitation parameters α and β.
Evaluation uses Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²). ARIMA achieves the best performance (MAE = 3.2, RMSE = 4.1, R² = 0.86), outperforming LSTM (MAE = 3.5, RMSE = 4.5, R² = 0.81) and Hawkes (MAE = 3.9, RMSE = 5.0, R² = 0.78). The superiority of ARIMA is attributed to the relatively short, trend‑dominated nature of patent citation series, where differencing and linear autoregression capture the dynamics effectively. LSTM shows potential when larger training windows are available, while Hawkes excels at modeling short‑term bursts but suffers from error accumulation over longer horizons.
The study highlights several contributions: (i) Demonstrating that graph‑embedding (Node2vec) can produce more meaningful patent clusters than raw citation graphs; (ii) Providing a systematic comparison of three forecasting approaches on citation growth, revealing that a classical statistical model (ARIMA) can outperform deep learning and point‑process methods in this domain; (iii) Offering a practical workflow that combines network analysis with time‑series prediction for patent scoring.
Limitations include the dataset’s bias toward US patents in specific technology sectors, the lack of a thorough sensitivity analysis for Node2vec hyper‑parameters, and the static nature of the clustering (no online updates as new patents arrive). Future work is suggested in three directions: (1) Integrating multi‑modal data (textual abstracts, inventor networks, assignee information) into a heterogeneous graph and applying Graph Neural Networks for clustering; (2) Exploring Transformer‑based time‑series models that can capture longer‑range dependencies without the vanishing‑gradient issues of LSTMs; (3) Developing an online prediction framework that updates both community assignments and growth forecasts as new citation events occur.
In conclusion, the paper provides a compelling proof‑of‑concept that coupling graph‑based community detection with conventional time‑series forecasting yields a robust tool for assessing the future impact of patent portfolios. This integrated approach can aid corporations, research institutions, and policy makers in strategic decision‑making, portfolio optimization, and early identification of emerging technological trends.
Comments & Academic Discussion
Loading comments...
Leave a Comment