MemeChain: A Multimodal Cross-Chain Dataset for Meme Coin Forensics and Risk Analysis


The meme coin ecosystem has grown into one of the most active yet least observable segments of the cryptocurrency market, characterized by extreme churn, minimal project commitment, and widespread fraudulent behavior. While countless meme coins are deployed across multiple blockchains, they rely heavily on off-chain web and social infrastructure to signal legitimacy. These very signals are largely absent from existing datasets, which are often limited to single-chain data or lack the multimodal artifacts required for comprehensive risk modeling. To address this gap, we introduce MemeChain, a large-scale, open-source, cross-chain dataset comprising 34,988 meme coins across Ethereum, BNB Smart Chain, Solana, and Base. MemeChain integrates on-chain data with off-chain artifacts, including website HTML source code, token logos, and linked social media accounts, enabling multimodal and forensic study of meme coin projects. Analysis of the dataset shows that visual branding is frequently omitted in low-effort deployments, and many projects lack a functional website. Moreover, we quantify the ecosystem’s extreme volatility, identifying 1,801 tokens (5.15%) that cease all trading activity within just 24 hours of launch. By providing unified cross-chain coverage and rich off-chain context, MemeChain serves as a foundational resource for research in financial forensics, multimodal anomaly detection, and automated scam prevention in the meme coin ecosystem.


💡 Research Summary

MemeChain addresses a critical gap in cryptocurrency research by providing the first large‑scale, multimodal, cross‑chain dataset specifically focused on meme coins. The authors collected 34,988 meme tokens deployed across Ethereum, BNB Smart Chain, Solana, and Base, integrating on‑chain transaction metadata (addresses, deployment timestamps, liquidity pool information, and granular trade events) with off‑chain artifacts such as website HTML source code, token logo images, and links to associated social media accounts (Twitter, Telegram, Discord). The dataset occupies 1.46 GB and is openly released for reproducible research.

The data acquisition pipeline consists of three stages. First, verified meme coins were harvested from established aggregators (CoinMarketCap, CoinGecko, and GeckoTerminal), yielding 8,852 confirmed tokens. Second, the authors scraped the decentralized-exchange indexers DexScreener and CoinSniper, which together listed 65,021 newly created tokens. To separate meme coins from the broader token universe, they devised a name-based classifier: TF-IDF analysis of the verified set identified 126 high-impact meme-related keywords (e.g., “dog”, “cat”, “inu”, “AI”), and tokens whose names contain any of these keywords were flagged as meme candidates, capturing 26,557 tokens. Third, 4,140 tokens were added directly from the pump.fun platform, identified by addresses ending in “.pump”, to cover emerging meme trends that the keyword list may not reflect.
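The candidate-flagging step can be illustrated with a minimal Python sketch. It is not the authors' code: the keyword tuple is a hypothetical four-item subset (the only keywords the summary names), the function name `is_meme_candidate` is invented for illustration, and case-insensitive substring matching is an assumed reading of "contain". The TF-IDF derivation of the keyword list is not reproduced here.

```python
# Hypothetical subset: the summary names only these four of the 126 keywords.
MEME_KEYWORDS = ("dog", "cat", "inu", "ai")

# Suffix as quoted in the summary; treat the exact string as an assumption.
PUMP_SUFFIX = ".pump"


def is_meme_candidate(name: str, address: str = "") -> bool:
    """Return True if the token looks like a meme coin candidate."""
    # pump.fun tokens are included directly, regardless of their name.
    if address.endswith(PUMP_SUFFIX):
        return True
    # Assumed matching rule: case-insensitive substring search.
    lowered = name.lower()
    return any(keyword in lowered for keyword in MEME_KEYWORDS)


print(is_meme_candidate("Shiba Inu"))          # True ("inu")
print(is_meme_candidate("Wrapped Ether"))      # False
print(is_meme_candidate("Mountain Protocol"))  # True ("ai" inside "mountain")
```

The last call shows why a purely name-based rule over-matches on short keywords, which is one reason a later precision-oriented cleanup is useful.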

A three‑step refinement process was then applied to maximize precision. Stablecoins were removed using CoinMarketCap’s stablecoin list (4 tokens). Tokens with unusually high price (> $0.80) or market cap (> 10⁷ USD) were inspected via CoinGecko categories, eliminating 46 false positives such as staking or bridged assets. Finally, a conservative string‑matching blacklist filtered out remaining non‑meme terms. The resulting curated collection comprises 34,988 high‑confidence meme coins.
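A rough Python sketch of this refinement follows. The thresholds (0.80 USD and 10⁷ USD) come from the summary; everything else is an assumption made for illustration, including the `Token` fields, the `refine` function, matching stablecoins by lowercase name, and the sample blacklist term. Tokens above either threshold are queued for review rather than dropped, since the summary says they were inspected via CoinGecko categories.

```python
from dataclasses import dataclass

PRICE_THRESHOLD_USD = 0.80        # from the summary
MARKET_CAP_THRESHOLD_USD = 10**7  # from the summary


@dataclass
class Token:  # hypothetical record layout, not the dataset schema
    name: str
    price_usd: float
    market_cap_usd: float


def refine(tokens, stablecoin_names, blacklist_terms):
    """Split tokens into (kept, needs_review); filtered tokens are dropped."""
    kept, needs_review = [], []
    for token in tokens:
        lowered = token.name.lower()
        # Step 1: drop known stablecoins.
        if lowered in stablecoin_names:
            continue
        # Step 3: drop names containing a blacklisted non-meme term.
        if any(term in lowered for term in blacklist_terms):
            continue
        # Step 2: unusually high price or market cap triggers manual review.
        if (token.price_usd > PRICE_THRESHOLD_USD
                or token.market_cap_usd > MARKET_CAP_THRESHOLD_USD):
            needs_review.append(token)
        else:
            kept.append(token)
    return kept, needs_review


sample = [
    Token("Doge Inu", 0.0001, 5e5),
    Token("Tether USD", 1.0, 1e11),
    Token("Staked Cat", 2.5, 3e7),
    Token("Bridged Dog", 0.01, 1e5),
]
kept, review = refine(sample, {"tether usd"}, ("bridged",))
print([t.name for t in kept])    # ['Doge Inu']
print([t.name for t in review])  # ['Staked Cat']
```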

Analytical findings reveal two striking phenomena. First, “One-Day Meme Coins” (tokens that cease all trading activity within 24 hours of launch) account for 1,801 entries, or 5.15 % of the dataset. This highlights an ultra-fast rug-pull dynamic that prior studies, which typically focus on longer-lived assets, have missed. Second, web infrastructure serves as a strong proxy for project viability: roughly 30 % of tokens lack any website, and among those with a site, a substantial fraction become unreachable within 48 hours of the initial liquidity event. Visual branding is also often absent or of low quality, underscoring the minimal commitment of many creators.
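The “One-Day” criterion and the reported share can be expressed as a short Python sketch. The function names and arguments are hypothetical, and two rules are assumptions not stated in the summary: tokens with no recorded trades are not counted, and a token must have been observed for more than 24 hours before it can be labeled.

```python
from datetime import datetime, timedelta

ONE_DAY = timedelta(hours=24)


def is_one_day_coin(launch, trade_times, observed_until):
    """True if every recorded trade falls within 24 hours of launch."""
    # Assumption: a token with no trades is not a One-Day coin.
    if not trade_times:
        return False
    # Assumption: a token younger than 24 hours cannot be judged yet.
    if observed_until - launch <= ONE_DAY:
        return False
    return max(trade_times) - launch <= ONE_DAY


def share_percent(count, total):
    """Percentage of `count` in `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


launch = datetime(2025, 1, 1)
cutoff = launch + timedelta(days=7)
short_lived = [launch + timedelta(hours=h) for h in (1, 2, 3)]
longer_lived = short_lived + [launch + timedelta(hours=30)]

print(is_one_day_coin(launch, short_lived, cutoff))   # True
print(is_one_day_coin(launch, longer_lived, cutoff))  # False
print(share_percent(1801, 34988))                     # 5.15
```

The final line reproduces the figure quoted above from the two counts given in the text.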

The paper’s contributions are fourfold: (1) release of the first open‑source multimodal meme‑coin dataset, (2) quantitative lifecycle analysis introducing the “One‑Day” concept, (3) the inaugural forensic study of meme‑coin web presence, and (4) a roadmap for future work, including survival‑analysis modeling, early‑warning scam detectors, and automated risk‑assessment pipelines.

Limitations are acknowledged: off‑chain snapshots may decay over time as links break or content changes, and name‑based classification may miss novel meme vocabularies. The authors propose continuous crawling, machine‑learning‑enhanced text and image classifiers, and cross‑chain migration analysis to mitigate these issues. By furnishing a rich, cross‑chain, multimodal resource, MemeChain enables researchers to develop more robust anomaly‑detection models, deepen understanding of meme‑coin fraud mechanisms, and ultimately support better investor protection and regulatory oversight.

