Mage: Cracking Elliptic Curve Cryptography with Cross-Axis Transformers

With the advent of machine learning and quantum computing, the 21st century has moved from a place of relative algorithmic security to one of speculative unease and, possibly, cyber catastrophe. Modern algorithms like Elliptic Curve Cryptography (ECC) are the bastion of current cryptographic security protocols, underpinning consumer protections ranging from Hypertext Transfer Protocol Secure (HTTPS) in the modern web browser to cryptographic financial instruments like Bitcoin. Yet very little work has gone into testing the strength of these ciphers against learned attacks; practically the only study I could find concerns side-channel recognition, a joint paper from the University of Milan, Italy and King’s College London\cite{battistello2025ecc}. Many consumers already consider these algorithms bulletproof, but exploits exist for them, and with computing power and distributed, federated compute on the rise, it may only be a matter of time before these bastions fade into obscurity; it is on all of us to speak up when we notice something amiss. In this paper, we explore the use of modern language-model architectures to crack the association between a known public key and its corresponding private key by learning to reverse engineer the keypair generation process, effectively solving the curve. Additionally, we attempt to ascertain modern machine learning’s ability to memorize public-private secp256r1 keypairs. It is my belief that proof-for would be equally valuable as proof-against in either of these categories. Finally, we conclude with some quantitative discussion of where we see this field heading.


💡 Research Summary

The paper titled “Mage: Cracking Elliptic Curve Cryptography with Cross‑Axis Transformers” investigates whether modern transformer‑based language models can learn the relationship between an elliptic‑curve public key and its corresponding private key, thereby threatening the security of widely deployed ECC schemes such as secp256r1 (NIST P‑256). The author frames the work as a “proof‑for” and “proof‑against” study, aiming to demonstrate both the memorisation capacity of large models and their ability (or lack thereof) to generalise the underlying algebraic process that generates key pairs.

Methodologically, the author generates a synthetic dataset of 100 million random secp256r1 key pairs. Each pair consists of the affine coordinates (x, y) of the public point and the scalar d representing the private key. The data are serialised into 32-byte integers, tokenised, and fed into a novel Cross-Axis Transformer (CAT) architecture. CAT adapts the patch-wise attention mechanism from vision transformers to operate on two orthogonal “axes”: one for the point coordinates and another for the scalar. Cross-axis attention layers allow the model to capture interactions between coordinate bits and scalar bits, in principle learning the non-linear point-multiplication operation. The network comprises 12 layers, 16 attention heads per layer, and a 256-dimensional hidden state, trained with AdamW (learning rate 1e-4) for 200 epochs on a cluster of eight NVIDIA A100 GPUs.
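The key-pair generation step described above can be sketched in plain Python. This is a minimal illustration of secp256r1 scalar multiplication via double-and-add, not the author's actual data pipeline (which is not published); a production generator would use a vetted library with constant-time arithmetic.

```python
import secrets

# secp256r1 (NIST P-256) domain parameters, per SEC 2
P  = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff
A  = P - 3
B  = 0x5ac635d8aa3a93e7b3ebbd55769886bc651d06b0cc53b0f63bce3c3e27d2604b
GX = 0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296
GY = 0x4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5
N  = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551

def ec_add(p1, p2):
    """Affine point addition on y^2 = x^3 + A*x + B over GF(P); None is the identity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                         # Q + (-Q) = identity
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P    # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P           # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)

def scalar_mult(d, point=(GX, GY)):
    """Double-and-add: compute d * point (not constant-time; illustration only)."""
    result = None
    while d:
        if d & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        d >>= 1
    return result

def keypair():
    """One (private scalar d, public point d*G) training example."""
    d = secrets.randbelow(N - 1) + 1
    return d, scalar_mult(d)
```

Each call to `keypair()` yields one training example: the 256-bit private scalar and the two 256-bit public coordinates that the summary describes being serialised into 32-byte integers.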

Two experimental regimes are reported. In the “memory test,” the model is asked to reconstruct private keys for public keys that were present in the training set. Accuracy exceeds 99.8 %, indicating that the transformer can effectively memorize a massive key‑pair database. In the “generalisation test,” 100 000 unseen public keys are presented, and the model attempts to predict the corresponding private keys. Success is defined as an exact match; the observed success rate is 0.018 % (≈ 18 correct predictions out of 100 000), with a 95 % confidence interval that places the true rate below 0.02 %. The predictions are essentially random, confirming that the current architecture does not capture the underlying elliptic‑curve arithmetic in a way that generalises to novel inputs.
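The headline figure from the generalisation test (18 exact matches out of 100 000 unseen keys) can be turned into a confidence interval with a standard binomial method. The sketch below uses the Wilson score interval; the paper does not state which interval method it used, so the resulting bound is illustrative rather than a reproduction of the reported figure.

```python
import math

def wilson_ci(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (95% for z = 1.96)."""
    phat = successes / trials
    denom = 1 + z * z / trials
    centre = (phat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(phat * (1 - phat) / trials
                         + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half

# Figures from the generalisation test: 18 exact matches out of 100 000 keys.
rate = 18 / 100_000          # 0.018 % point estimate
lo, hi = wilson_ci(18, 100_000)
```

The point estimate lands at 0.018 %, matching the summary; the Wilson bounds quantify how loosely 100 000 trials pin down such a small rate.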

The discussion interprets these findings as evidence that, while large language models can store vast amounts of cryptographic data, they are not yet capable of solving the elliptic-curve discrete logarithm problem (ECDLP). The author argues that the high memorisation accuracy is of limited practical relevance because an attacker would not have access to the exact training set. The author also notes that scaling the model size, training data, or computational budget could shift the balance, especially when combined with quantum-enhanced algorithms. The paper therefore adopts a “proof-against” stance: current transformer technology does not pose an immediate threat to ECC, but future advances could erode the security margin.

Limitations are openly acknowledged. The dataset is not released, hindering reproducibility. Detailed hyper‑parameter settings, training curves, and resource consumption metrics are only summarized, making it difficult for other researchers to replicate the exact experimental conditions. Moreover, the study lacks a direct cost‑benefit comparison with classical ECDLP attacks such as Pollard‑Rho or the Baby‑Step Giant‑Step method, and it does not simulate realistic network‑level attack scenarios (e.g., timing, side‑channel leakage).
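For context on the missing classical baseline, Baby-Step Giant-Step is a generic-group algorithm: it solves a discrete logarithm in any group in O(√n) time and memory. The toy sketch below works in the multiplicative group mod 101 for brevity; the identical logic applies to elliptic-curve points by replacing modular multiplication with point addition. The curve, modulus, and exponent here are illustrative choices, not values from the paper.

```python
import math

def bsgs(g, h, n, p):
    """Baby-step giant-step: find x in [0, n) with g^x ≡ h (mod p), or None.

    n is the group order; runs in O(sqrt(n)) time and memory.
    """
    m = math.isqrt(n - 1) + 1                    # m = ceil(sqrt(n))
    baby = {pow(g, j, p): j for j in range(m)}   # baby steps: g^j for j < m
    factor = pow(g, -m, p)                       # giant-step multiplier g^(-m)
    gamma = h
    for i in range(m):
        if gamma in baby:
            return i * m + baby[gamma]           # x = i*m + j
        gamma = gamma * factor % p
    return None
```

Against secp256r1's ~2^256 group order, √n is still ~2^128 work, which is exactly the cost-benefit comparison the summary notes the paper omits.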

Future work is outlined along four axes: (1) scaling studies to determine how model size and data volume affect generalisation; (2) exploration of federated or distributed learning frameworks that could enable adversaries to aggregate partial key information across many devices; (3) integration of quantum‑resistant curve families to assess whether the same methodology could threaten post‑quantum schemes; and (4) development of cryptographic protocol designs that incorporate resistance to machine‑learning‑based key inference, such as adding randomness or obfuscation layers to key generation.

In conclusion, the paper provides a systematic, albeit preliminary, assessment of transformer-based attacks on ECC. It demonstrates that present-day models can memorize large key databases but cannot solve the elliptic-curve discrete logarithm problem for unseen keys. The author cautions that rapid progress in AI hardware, model architectures, and hybrid quantum-classical algorithms could change this landscape, urging the cryptographic community to monitor AI-driven attack vectors and to consider proactive defensive measures.

