When Blockchain Meets Crawlers: Real-time Market Analytics in Solana NFT Markets
In this paper, we design and implement a web crawler system based on the Solana blockchain for the automated collection and analysis of market data for popular non-fungible tokens (NFTs) on the chain. First, the basic information and transaction data of popular NFTs on the Solana chain are collected using the Selenium tool. Second, transaction records from the Magic Eden marketplace are gathered with the Scrapy framework and analyzed to examine the price fluctuations and market trends of NFTs. For data analysis, this paper employs time series analysis to examine the dynamics of the NFT market and to identify potential price patterns. In addition, the risk and return of different NFTs are evaluated using the mean-variance optimization model, taking into account characteristics such as illiquidity and market volatility, to provide investors with data-driven portfolio recommendations. The experimental results show that combining crawler technology with financial analytics can effectively analyze NFT data on the Solana blockchain and provide timely market insights and investment strategies. This study offers a reference for further exploration in the field of digital assets.
💡 Research Summary
The paper presents an end‑to‑end system that automatically harvests transaction data for popular Solana‑based non‑fungible tokens (NFTs) and applies quantitative financial analysis to generate real‑time market insights and portfolio recommendations. The authors begin by motivating the need for timely data in the highly volatile NFT market, noting that traditional manual collection methods are too slow and error‑prone for effective investment decision‑making.
To address data acquisition, the study combines two widely used web‑crawling frameworks: Selenium and Scrapy. Selenium is employed to interact with dynamic, JavaScript‑heavy pages such as solscan.io and the Magic Eden marketplace, allowing the crawler to render client‑side content, click through pagination, and extract detailed token metadata, ownership changes, and transaction prices. Scrapy handles the bulk of static HTML pages and any available REST endpoints, leveraging its asynchronous Twisted‑based engine for high throughput. By orchestrating Selenium for dynamic content and Scrapy for static content, the authors achieve both completeness and speed in data collection.
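The split between the two engines can be sketched as a simple URL router: JavaScript-heavy marketplace domains go to Selenium for full rendering, everything else to Scrapy for fast plain-HTTP fetching. This is a minimal, hypothetical sketch; the domain list and the `choose_engine` helper are illustrative assumptions, not code from the paper.

```python
from urllib.parse import urlparse

# Assumed list of JS-heavy domains that need a rendered browser.
# The paper names solscan.io and Magic Eden as dynamic targets.
DYNAMIC_DOMAINS = {"solscan.io", "magiceden.io"}

def choose_engine(url: str) -> str:
    """Route a URL to 'selenium' (rendered) or 'scrapy' (plain HTTP)."""
    host = urlparse(url).netloc.lower()
    # Strip a leading "www." so both host forms match the same rule.
    if host.startswith("www."):
        host = host[4:]
    return "selenium" if host in DYNAMIC_DOMAINS else "scrapy"
```

In a real deployment this router would sit in a Scrapy downloader middleware, handing dynamic requests off to a Selenium-driven browser and letting Scrapy's Twisted engine process the rest concurrently.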
Recognizing that many NFT sites deploy anti-crawler defenses (e.g., the navigator.webdriver flag, FingerprintJS, CAPTCHAs), the authors implement a series of “anti-anti-crawler” countermeasures. They suppress the WebDriver flag, replace the ChromeDriver-internal marker $cdc_asdjflasutopfhvcZLmcfl_ with a random string of identical length, and rotate IP addresses through a proxy pool to evade rate limiting and IP bans. Session cookies are preserved to maintain continuity across requests. These practical measures significantly improve crawl success rates and reduce detection.
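The marker replacement can be sketched as a byte-level patch of the ChromeDriver binary. This is a hedged illustration: the exact marker bytes vary by ChromeDriver version (the default below is the string quoted in the paper and may not match a given build), and the function name is my own.

```python
import random
import string

def patch_cdc_marker(binary: bytes,
                     marker: bytes = b"$cdc_asdjflasutopfhvcZLmcfl_") -> bytes:
    """Replace the well-known ChromeDriver fingerprint marker with a
    random string of identical length, so scripts probing for it find
    nothing. Same length is essential: changing the file size would
    corrupt offsets elsewhere in the executable."""
    if marker not in binary:
        return binary  # marker not present in this build; nothing to patch
    body = "".join(random.choices(string.ascii_letters, k=len(marker) - 2))
    replacement = b"$" + body.encode("ascii") + b"_"
    assert len(replacement) == len(marker)
    return binary.replace(marker, replacement)
```

In practice one would read the chromedriver executable, apply the patch, and write it back; suppressing navigator.webdriver itself is typically done separately, e.g. via Chrome DevTools Protocol script injection before page load.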
The raw transaction logs are timestamped at the second level. The authors compute a simple return R = (P_{t+1} − P_t)/P_t for each adjacent price point, then adjust it for the exact time interval Δt (in seconds) using the formula R_adjusted = (1 + R)^{1/Δt} − 1. By compounding the adjusted returns across all intervals, they derive a total weighted return R_total that accounts for irregular trading frequencies typical of NFTs. This approach replaces conventional daily or weekly returns with a more granular, time‑normalized metric.
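The three formulas above translate directly into a short routine. A minimal sketch, assuming prices and second-level Unix timestamps arrive as parallel lists (the function name and signature are my own, not the paper's):

```python
def weighted_total_return(prices, timestamps):
    """Compound per-interval returns after normalizing each to a
    per-second rate, following the paper's formulas:
        R       = (P_{t+1} - P_t) / P_t
        R_adj   = (1 + R) ** (1 / dt) - 1     # dt in seconds
        R_total = prod(1 + R_adj) - 1
    """
    total = 1.0
    for i in range(len(prices) - 1):
        r = (prices[i + 1] - prices[i]) / prices[i]       # simple return
        dt = timestamps[i + 1] - timestamps[i]            # interval length (s)
        r_adj = (1.0 + r) ** (1.0 / dt) - 1.0             # per-second rate
        total *= 1.0 + r_adj
    return total - 1.0
```

For example, a 21% gain over a 2-second interval normalizes to a 10% per-second rate, the same adjusted return as a 10% gain over 1 second, which is exactly the time normalization the irregular NFT trading frequency requires.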
For portfolio construction, the paper adopts the classic mean‑variance optimization (MVO) framework introduced by Markowitz. Expected returns are estimated from the weighted return series, while the covariance matrix is calculated from the same series across the selected NFT assets. The optimization objective is to maximize the Sharpe ratio, (E[R_p] − r_f)/σ_p, where E[R_p] is the expected portfolio return, r_f the risk‑free rate, and σ_p the portfolio standard deviation.
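For the unconstrained case, the maximum-Sharpe (tangency) portfolio has the closed form w ∝ Σ⁻¹(μ − r_f·1), with weights normalized to sum to one. A pure-Python sketch for two assets (the paper does not publish its solver; a real implementation would use numpy or a constrained optimizer and handle short-sale limits):

```python
def tangency_weights(mu, cov, rf=0.0):
    """Max-Sharpe (tangency) weights for TWO assets via the closed
    form w ∝ Σ^{-1}(μ − r_f·1), normalized to sum to 1."""
    a, b = mu[0] - rf, mu[1] - rf            # excess returns
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21              # 2x2 determinant
    # Apply the explicit 2x2 inverse of Σ to the excess-return vector.
    w0 = (s22 * a - s12 * b) / det
    w1 = (-s21 * a + s11 * b) / det
    total = w0 + w1
    return [w0 / total, w1 / total]
```

With equal variances and zero correlation, the weights are simply proportional to each asset's excess return, which matches the intuition that MVO tilts toward higher risk-adjusted payoffs.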