Robust Generalizable Heterogeneous Legal Link Prediction
Recent work has applied link prediction to large heterogeneous legal citation networks with rich meta-features. We find that this approach can be improved by including edge dropout and feature concatenation for the learning of more robust representations, which reduces error rates by up to 45%. We also propose an approach based on multilingual node features with an improved asymmetric decoder for compatibility, which allows us to generalize and extend the prediction to additional, geographically and linguistically disjoint data from New Zealand. Our adaptations also improve inductive transferability between these disjoint legal systems.
💡 Research Summary
The paper addresses the problem of link prediction in large heterogeneous legal citation networks, extending prior work that relied on a heterogeneous graph enrichment (HGE) model with rich meta‑features. The authors identify two major weaknesses of the previous approach: susceptibility to noisy or missing meta‑information and a tendency to over‑fit due to the high dimensionality of heterogeneous representations. To mitigate these issues, they introduce two complementary techniques. First, they apply Edge Dropout (DropEdge) during training, randomly removing 50 % of edges for each batch. This simulates the frequent incompleteness of legal citation data and forces the model to learn robust node embeddings that do not depend on any single edge. Second, they adopt Feature Concatenation, exposing the outputs of all GNN layers directly to the decoder. By concatenating multi‑hop representations, the model gains a richer view of the graph topology and reduces over‑fitting, especially when combined with residual connections.
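The two techniques above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `drop_edges`, `mean_layer`, and `encode` are hypothetical, the mean-aggregation layer is a toy stand-in for the real heterogeneous GNN layers, and sampling one edge subset per forward pass is one plausible reading of "for each batch".

```python
import random


def drop_edges(edges, drop_rate=0.5, rng=None):
    """Return a random subset of `edges` with about `drop_rate` of them removed."""
    rng = rng or random.Random()
    n_keep = len(edges) - round(drop_rate * len(edges))
    return rng.sample(edges, n_keep)


def mean_layer(h, edges):
    """Toy message-passing layer (assumption): each node averages its own
    vector with the vectors of its in-neighbours."""
    dim = len(h[0])
    sums = [list(v) for v in h]
    counts = [1] * len(h)
    for src, dst in edges:
        for k in range(dim):
            sums[dst][k] += h[src][k]
        counts[dst] += 1
    return [[s / c for s in row] for row, c in zip(sums, counts)]


def encode(x, edges, num_layers=2, drop_rate=0.5, rng=None):
    """Run `num_layers` toy layers on a thinned graph and concatenate the
    output of every layer per node, so a decoder sees all hop depths."""
    kept = drop_edges(edges, drop_rate, rng)
    outputs, h = [], x
    for _ in range(num_layers):
        h = mean_layer(h, kept)
        outputs.append(h)
    # Feature concatenation: node i gets [layer1_i | layer2_i | ...].
    return [sum((layer[i] for layer in outputs), []) for i in range(len(x))]


# Usage: 6 nodes with 2 features each, 10 directed edges, 3 layers.
edges = [(i, (i + 1) % 6) for i in range(6)] + [(i, (i + 2) % 6) for i in range(4)]
x = [[float(i), 1.0] for i in range(6)]
z = encode(x, edges, num_layers=3, drop_rate=0.5, rng=random.Random(0))
```

With three layers and two input features, each node ends up with a six-dimensional vector. Edge dropout would normally be applied only during training, with the full graph used at evaluation time.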
The decoder is also redesigned. Instead of the block‑wise inner‑product used in the original HGE, the authors interleave source and target node embeddings before computing the asymmetric inner product. This change improves compatibility with multilingual textual embeddings (Jina‑v2 for German data, Jina‑v3 for multilingual transfer) and ensures that semantic dimensions are aligned with the concatenated structural features.
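The summary does not spell out the decoder, so the following is only one plausible reading, written as a hedged sketch. It assumes each node embedding is split into a "source" part and a "target" part, and that the score of a directed link u → v is the inner product of u's source part with v's target part. The function names are hypothetical. A block-wise split takes the first and second half of the vector; an interleaved split takes alternating dimensions.

```python
def split_blockwise(z):
    """Source role = first half of the vector, target role = second half."""
    half = len(z) // 2
    return z[:half], z[half:]


def split_interleaved(z):
    """Source role = even-indexed dimensions, target role = odd-indexed ones."""
    return z[0::2], z[1::2]


def asymmetric_score(z_u, z_v, split=split_interleaved):
    """Score the directed link u -> v: <source(u), target(v)>."""
    src_u, _ = split(z_u)
    _, tgt_v = split(z_v)
    return sum(a * b for a, b in zip(src_u, tgt_v))


# Usage: the score depends on direction, as a citation link should.
z_u = [1.0, 2.0, 3.0, 4.0]
z_v = [5.0, 6.0, 7.0, 8.0]
forward = asymmetric_score(z_u, z_v)    # u cites v
backward = asymmetric_score(z_v, z_u)   # v cites u
blockwise = asymmetric_score(z_u, z_v, split=split_blockwise)
```

Under this reading, the motivation for interleaving is that a block-wise split of a concatenated vector would hand whole segments (for example, all text-embedding dimensions or all early-layer outputs) to a single role, whereas alternating dimensions gives both roles a share of every segment.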
Experiments are conducted on two disjoint legal citation graphs: OLD201k, a German civil‑law‑oriented dataset, and LiO338k, a New Zealand case‑law‑oriented dataset. Both contain multiple edge types (law‑law, law‑case, case‑case, etc.) and full textual content for each node, allowing a realistic semi‑inductive temporal split. The authors evaluate a broad set of baselines, including homogeneous GCN, relational GCN (R‑GCN), heterogeneous graph transformer (HGT), simple heterogeneous graph network (Simple HGN), and GraphSAGE, alongside the original HGE.
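A temporal split of this kind can be sketched as follows. This is an illustration only: the summary does not give the exact protocol, so the function name `temporal_split`, the use of the citing node's year as the edge timestamp, and the toy data are all assumptions.

```python
def temporal_split(edges, node_year, cutoff):
    """Split directed citation edges (citing, cited) by the citing node's year.

    Edges whose citing document predates `cutoff` go to training; the rest go
    to testing. Nodes that appear only in test edges are returned as `unseen`,
    which is what makes the setting partly inductive.
    """
    train, test = [], []
    for citing, cited in edges:
        (train if node_year[citing] < cutoff else test).append((citing, cited))
    seen = {n for edge in train for n in edge}
    unseen = {n for edge in test for n in edge} - seen
    return train, test, unseen


# Usage with hypothetical documents "a".."d".
node_year = {"a": 2010, "b": 2012, "c": 2018, "d": 2020}
edges = [("b", "a"), ("c", "a"), ("c", "b"), ("d", "c")]
train, test, unseen = temporal_split(edges, node_year, cutoff=2015)
```

In this toy example the later documents "c" and "d" never appear in training, yet most test edges still attach them to documents the model has seen.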
Results show that the proposed R‑HGE model consistently outperforms all baselines. On the LiO338k dataset, R‑HGE achieves an average precision (AP) of 97.5 % and an AUC‑ROC of 97.4 %, reducing error rates by more than a factor of two compared with the original HGE. On the German dataset, it reaches AP = 91.0 % and AUC‑ROC = 90.3 %, again improving by 2–3 percentage points. An ablation study isolates the contributions of each component: edge dropout yields the largest gain, feature concatenation provides a moderate boost, and the interleaved decoder has a small but useful compatibility effect.
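For readers unfamiliar with the metrics, the sketch below computes AUC‑ROC and average precision on toy data, and shows how a relative error reduction follows from two metric values. The function names are hypothetical and all numbers in the usage lines are made up for illustration; none are results from the paper.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def relative_error_reduction(old_metric, new_metric):
    """Treat (1 - metric) as the error and return its relative decrease."""
    old_err, new_err = 1.0 - old_metric, 1.0 - new_metric
    return (old_err - new_err) / old_err


# Usage on toy link scores (1 = true citation, 0 = negative sample).
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
# Hypothetical metric values: going from 0.90 to 0.945 cuts the error by about 45 %.
reduction = relative_error_reduction(0.90, 0.945)
```

The last helper makes explicit why small gains near the ceiling matter: a few percentage points of AP can correspond to a large share of the remaining error.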
The paper also explores fully inductive transfer between the two legal systems. When training on one dataset and testing on the other without any fine‑tuning, GraphSAGE shows the most stable performance, while R‑HGE excels when the source dataset is larger (German → New Zealand). Transfer in the opposite direction (New Zealand → German) is more challenging due to the sparser training data, and R‑HGE’s performance drops slightly, though it remains competitive.
Computationally, R‑HGE incurs roughly a 10 % overhead relative to HGE because of the larger concatenated representations, but it remains far more efficient than the full HGT model. All code, hyper‑parameters, and experimental scripts are released publicly, facilitating reproducibility.
In conclusion, the study demonstrates that simple regularisation (edge dropout) combined with richer multi‑scale feature exposure (concatenation) can substantially improve robustness and generalisation of heterogeneous graph neural networks for legal link prediction. Moreover, the multilingual node embeddings and asymmetric decoder enable the model to bridge distinct legal traditions, opening avenues for cross‑jurisdictional legal analytics. Future work may focus on scaling to additional jurisdictions, refining meta‑feature extraction, and integrating online learning for continuously evolving legal corpora.