A Techno-Economic Framework for Cost Modeling and Revenue Opportunities in Open and Programmable AI-RAN


Authors: Gabriele Gemmi, Michele Polese, Tommaso Melodia

Institute for Intelligent Networked Systems, Northeastern University
{g.gemmi, m.polese, melodia}@northeastern.edu

Abstract: The large-scale deployment of 5G networks has not delivered the expected return on investment for mobile network operators, raising concerns about the economic viability of future 6G rollouts. At the same time, surging demand for Artificial Intelligence (AI) inference and training workloads is straining global compute capacity. AI-RAN architectures, in which Radio Access Network (RAN) platforms accelerated on Graphics Processing Units (GPUs) share idle capacity with AI workloads during off-peak periods, offer a potential path to improved capital efficiency. However, the economic case for such systems remains unsubstantiated. In this paper, we present a techno-economic analysis of AI-RAN deployments by combining publicly available benchmarks of 5G Layer-1 processing on heterogeneous platforms (from x86 servers with accelerators for channel coding to modern GPUs) with realistic traffic models and AI service demand profiles for Large Language Model (LLM) inference. We construct a joint cost and revenue model that quantifies the surplus compute capacity available in GPU-based RAN deployments and evaluates the returns from leasing it to AI tenants. Our results show that, across a range of scenarios encompassing token depreciation, varying demand dynamics, and diverse GPU serving densities, the additional capital and operational expenditures of GPU-heavy deployments are offset by AI-on-RAN revenue, yielding a return on investment of up to 8×. These findings strengthen the long-term economic case for accelerator-based RAN architectures and future 6G deployments.

I. INTRODUCTION

Over the past decade, mobile network operators have completed large-scale fifth generation (5G) deployments, upgrading spectrum holdings, transport networks, and radio access infrastructure. Despite these investments, the expected revenue uplift has not materialized. Most operators have experienced flat or declining returns, and no transformative application has emerged to justify the capital expenditure profile of the 5G rollout [1]. As the community begins to define sixth generation (6G) system requirements, this economic misalignment highlights a structural challenge: without a new source of value creation, the incentive to undertake a further hardware refresh remains weak.

In parallel, Artificial Intelligence (AI) has become a dominant driver of global compute demand. Training and inference workloads increasingly rely on high-performance accelerators, creating sustained pressure on datacenter capacity [2]. These workloads also exhibit significant spatial and temporal variability, suggesting potential complementarity with the utilization profile of the Radio Access Network (RAN) [3]. In virtualized and O-RAN architectures, where Layer-1 processing increasingly relies on general-purpose accelerators such as Graphics Processing Units (GPUs), the same hardware supporting the RAN may be repurposed for different compute tasks when traffic intensity is low, with utilization gains driven by statistical multiplexing.
This concept, often referred to as AI-and-RAN by the AI-RAN Alliance, envisions the RAN not only as a consumer of AI algorithms but also as a provider of compute resources to external tenants. Under this model, idle accelerator capacity during off-peak periods can be allocated to AI tasks (also known as AI-on-RAN workloads), thereby introducing a new revenue mechanism while preserving the real-time performance constraints of the radio workload. If realized at scale, such architectures may help address the economic feasibility of future 6G deployments by improving the capital efficiency of accelerator-rich RAN platforms. However, despite research and technical developments in this space [4]–[8], several questions around the economics of such systems remain open. Among others: will the revenue generated by AI-on-RAN offset the increased Capital Expenditure (CapEx) associated with powerful general-purpose accelerators? Are system dynamics (e.g., utilization patterns over time) amenable to generating enough revenue through sharing?

In this paper, we answer these questions by investigating the techno-economic implications of AI-RAN systems. We combine publicly available benchmarks of 5G Layer-1 processing on heterogeneous platforms, ranging from x86 servers with Forward Error Correction (FEC) accelerators to modern GPUs, with user-driven traffic models to quantify the amount of surplus compute capacity expected in realistic deployments. We also model demand for AI services, namely Large Language Models (LLMs), and combine the models to construct a cost and revenue model that characterizes the value of renting RAN excess capacity to AI tenants. Our results indicate that in most scenarios modeled in this paper (including token depreciation, varying demand dynamics, and diverse serving densities for LLMs on GPUs) the additional CapEx and Operational Expenditure (OpEx) introduced by GPU-heavy deployments are offset through revenue generated by AI-on-RAN services running on the shared AI-RAN infrastructure, with a return on investment of up to 8 times. This strengthens the long-term economic case for accelerator-based RAN architectures and future 6G rollouts. We also release the software used for the techno-economic model as open source, packaged as a webapp that can be used to reproduce the results of this paper (repo: www.github.com/wineslab/AIRAN-revenue-model; webapp: www.open6g.us/#/ai-ran-economics).

The remainder of this article is organized as follows. Section II surveys related work. Section III introduces the RAN architecture landscape and compares GPU-based and FEC-accelerated platforms. Section IV sizes the server cluster for a target deployment and characterizes the residual capacity available for non-RAN workloads. Section V formulates the LLM inference demand model consuming this surplus. Section VI combines deployment cost and inference revenue into a unified economic framework. Section VII presents quantitative results for a dense urban deployment in Milan, Italy. Finally, Section VIII summarizes findings and discusses future directions.

II. RELATED WORK

The economic case for virtualized and open RAN has been the subject of extensive industry analysis. Analysys Mason concluded that Open RAN can deliver TCO savings of up to 30% under favorable conditions [9], and a follow-up study reported growing operator confidence, with Open RAN projected to reach 20–30% of total RAN revenues by 2028 [10], [11].
These analyses, however, evaluate the RAN compute substrate as single-purpose infrastructure and do not consider the additional revenue that idle accelerator capacity could generate.

The AI-RAN Alliance, formed in 2024 [5], introduced a framework for repurposing that idle capacity by organizing research into three pillars (AI-for-RAN, AI-on-RAN, and AI-and-RAN), with the latter targeting the coexistence of RAN and AI workloads on shared GPU hardware. Kundu et al. [6] proposed a reference architecture with a proof-of-concept on GH200 servers, establishing feasibility without addressing economics. On the orchestration side, Shah et al. [7], [8] developed the CAORA framework, which uses reinforcement learning to dynamically partition MIG resources while maintaining near-99% RAN demand fulfillment, and Polese et al. [4] extended the O-RAN Service Management and Orchestration (SMO) to support unified management of heterogeneous workloads. These contributions address the "how" of AI-RAN coexistence but leave the "whether it pays off" question unanswered.

On the demand side, Erdil [12] developed a roofline-style cost model for LLM serving that relates token throughput to hardware constraints, while Gundlach et al. [13] empirically documented quality-adjusted inference price declines of 5–10× per year. Demirer et al. [14] corroborated these trends using API transaction data from OpenRouter and Microsoft Azure, reporting approximately 1000× price declines for models at the 2023 frontier and estimating short-run price elasticities just above unity. From a model efficiency standpoint, Xiao et al. [15] introduced the concept of capability density and showed that the maximum capability density of open-source LLMs doubles roughly every 3.5 months, implying that inference cost per unit of capability decreases exponentially, a trend directly relevant to our ρ_tok parameter. The BurstGPT dataset [16], which we use to derive weekly demand profiles, provides timestamped traces of real-world LLM inference traffic.

Although edge LLM deployment has received attention from a systems efficiency perspective [17], [18], no prior work treats the RAN as a source of opportunistic inference compute or models the resulting revenue against the GPU cost premium. This paper fills that gap with a unified techno-economic framework combining platform benchmarking, demand-driven deployment sizing, and a dual-use revenue analysis.

Fig. 1. AI-RAN system architecture and the roles of AI-for-RAN (increasing RAN efficiency and performance with AI), AI-on-RAN (value-added edge services co-deployed with the RAN, with access to RAN data and telemetry), and AI-and-RAN (orchestration and management to support the coexistence of RAN, AI-for-RAN, and AI-on-RAN).

III. RAN ARCHITECTURES AND COST MODEL

Traditional RANs rely on integrated, vendor-specific base stations in which radio, baseband processing, and control functions are tightly coupled within proprietary hardware.
The 3GPP and O-RAN architectures in 5G and beyond disaggregate these functions into distinct logical components: the Radio Unit (RU) handles radio frequency operations, the Distributed Unit (DU) processes Layer-1 and Layer-2 protocol functions, and the Central Unit (CU) manages higher-layer control and user plane processing (Fig. 1). While the RU is usually based on Field Programmable Gate Arrays (FPGAs) or dedicated hardware, the DU and CU open opportunities for software-based implementations deployed on general-purpose compute [19], [20].

AI-RAN extends this architecture by explicitly designing the compute substrate to support both real-time Layer-1 processing and batch or inference AI-on-RAN workloads from external AI tenants. The key enabler is the use of high-performance accelerators, such as GPUs, whose resources can be dynamically allocated to RAN functions (e.g., during peak traffic periods) and to AI tasks (e.g., during idle intervals). The economic viability of this dual-use model depends critically on the baseband capacity, cost, and power efficiency of the underlying hardware platforms, which we characterize below.

TABLE I
SUMMARY OF SERVER CHARACTERISTICS, BASEBAND CAPACITY, AND ECONOMIC EFFICIENCY

Platform           Accel.  Layer 1  Cost ($)  Power (W)  L_DL/UL  N_C  BW (MHz)  B (MHz)  η_C (MHz/$)  η_O (MHz/W)  Ref.
ARS-111GL          GH200   Aerial   45000     1200       4/4      40   100       16000    0.36         13.33        [21]
ARS-111GL          GH200   Aerial   45000     1200       16/8     6    100       9600     0.21         8.00         [22]
mixed 3:1 w. avg.  GH200   Aerial   45000     1200       –        –    –         14400    0.32         12.00
EGX74I             VRB1    FlexRAN  6000      300        16/8     6    100       9600     1.60         32.00        [23]
EGX74I             VRB1    FlexRAN  6000      300        4/4      36   10        1440     0.24         4.80         [23]
DL110              VRB1    FlexRAN  7200      300        4/4      18   20        1440     0.20         4.80         [24]
DL110              VRB1    FlexRAN  7200      300        16/8     6    100       9600     1.33         32.00        [24]
mixed 3:1 w. avg.  VRB1    FlexRAN  6600      300        –        –    –         3480     0.70         11.60

A. Baseband Capacity Metric

To compare platforms with heterogeneous Multiple Input, Multiple Output (MIMO) layers, cell count, and bandwidth configurations under a single metric, we define the baseband capacity B [MHz] as the aggregate downlink baseband processing capacity:

    B = L_DL · N_C · BW,   (1)

where L_DL is the number of downlink MIMO layers, N_C is the number of simultaneously served cells, and BW is the channel bandwidth per cell in MHz. This product captures the total spatial-frequency processing load sustained by a baseband processor (e.g., a server) and provides the basis for two economic efficiency indicators. Capital efficiency relates baseband capacity to the server acquisition cost,

    η_C = B / Cost,

while power efficiency relates it to the platform power consumption,

    η_O = B / Power.

The former therefore represents a component of CapEx, while the latter contributes to OpEx.

B. Platform Comparison

Table I summarizes the baseband processing capacity and economic efficiency of representative server platforms from publicly available benchmarks and vendor specifications. The platforms are grouped by Layer-1 software stack: NVIDIA Aerial [25], which targets GPU-based architectures using the GH200 Grace Hopper Superchip, and Intel FlexRAN [26], which leverages the VRB1 FEC accelerator [27] on x86 server platforms. The literature and technical specifications for both platforms [25]–[27] identify two representative cell configurations as benchmarks: a macro-cell with up to 16 MIMO layers transmitted through a mMIMO frontend, and a micro-cell with 4 MIMO layers transmitted through a low-order antenna panel.
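The arithmetic behind Table I follows directly from Eq. (1). As a sanity check, the short Python sketch below re-derives B, η_C, and η_O for four of the benchmark rows; all inputs are transcribed from Table I, and nothing in the snippet is new data.

```python
# Re-derivation of the Table I metrics from Eq. (1). Rows transcribed from
# Table I: (platform/config, cost [$], power [W], L_DL, N_C, BW [MHz]).
rows = [
    ("ARS-111GL micro", 45000, 1200, 4, 40, 100),
    ("ARS-111GL macro", 45000, 1200, 16, 6, 100),
    ("EGX74I macro",     6000,  300, 16, 6, 100),
    ("EGX74I micro",     6000,  300,  4, 36, 10),
]

for name, cost, power, l_dl, n_c, bw in rows:
    b = l_dl * n_c * bw    # Eq. (1): baseband capacity [MHz]
    eta_c = b / cost       # capital efficiency [MHz/$]
    eta_o = b / power      # power efficiency [MHz/W]
    print(f"{name}: B={b} MHz, eta_C={eta_c:.2f} MHz/$, eta_O={eta_o:.2f} MHz/W")
```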
The GPU-based ARS-111GL achieves 9600 MHz in the macro-cell configuration and up to 16 GHz in micro-cell mode, both at a platform cost of $45000 and 1200 W power draw. The VRB1-based FlexRAN platforms (EGX74I at $6000 and DL110 at $7200, both at 300 W) reach 9600 MHz in the macro-cell configuration, matching the GPU in raw baseband throughput, at a fraction of the cost and power. As discussed in [26], [27], for the micro-cell mode reference solution, capacity drops to 1440 MHz with correspondingly lower efficiency.

In a realistic dense urban deployment, macro and micro cells coexist. Following ITU-R deployment guidelines [28], we adopt a 1:3 macro-to-micro cell ratio for the remainder of the analysis. The deployment-weighted average baseband capacity per server is:

    B̄ = (B_macro + 3 · B_micro) / 4.   (2)

For Aerial, B̄_Aerial = 14400 MHz, while for FlexRAN, B̄_FlexRAN = 3480 MHz. Despite Aerial's higher absolute capacity, FlexRAN achieves superior CapEx efficiency: η_C^FlexRAN ≈ 0.70 MHz/$ versus η_C^Aerial ≈ 0.32 MHz/$. This is because Aerial's sevenfold cost premium outweighs its fourfold capacity advantage in the mixed deployment, making FlexRAN the more capital-efficient baseband platform, absent the dual-use revenue considered in the following sections.

To translate these per-server metrics into deployment costs, we dimension each stack to deliver an aggregate peak throughput of 10 Gbps (assuming SE = 9 bit/s/Hz and 20% L1/L2 overhead) and compute the 10-year Total Cost of Ownership (TCO), reflecting a typical mobile network investment cycle. Figure 2 reports the resulting CapEx, OpEx, and TCO broken down by cell type and platform stack.

Fig. 2. TCO for 10 Gbps aggregate peak throughput over 10 years (CapEx and 10-year OpEx for Aerial and FlexRAN, by cell type).

In the macro-cell configuration, FlexRAN achieves the lowest TCO thanks to its high per-server capacity at low cost and power. In the micro-cell and mixed 3:1 deployments, the gap narrows as the higher cell count reduces FlexRAN's efficiency advantage. However, the economic case for AI-RAN rests not on baseband cost alone but on the dual-use capability of GPU hardware: during off-peak hours, the GPU can be repurposed for AI inference workloads, an option unavailable to dedicated FEC accelerators. The higher acquisition cost of GPU platforms must therefore be weighed against the additional revenue they can generate from surplus compute, which we quantify in the following sections.
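For readers who want to reproduce the Fig. 2 dimensioning, the minimal sketch below computes CapEx and 10-year OpEx per 10 Gbps for the mixed 3:1 configurations. Two assumptions are ours and not stated explicitly in the text: capacity is pooled so fractional server shares are allowed, and energy is priced with the Table II parameters (PUE = 1.5, $0.20/kWh).

```python
# Hypothetical re-derivation of the Fig. 2 dimensioning exercise. Assumed (not
# stated in the text): fractional server counts are allowed, and OpEx uses the
# Table II energy parameters over 10 years of continuous operation.
SE = 9.0           # spectral efficiency [bit/s/Hz]
ETA_OH = 0.20      # L1/L2 overhead
TARGET_GBPS = 10.0
YEARS, PUE, C_ELEC = 10, 1.5, 0.20   # horizon, PUE, electricity [$/kWh]

platforms = {
    # name: (mixed 3:1 baseband capacity B [MHz], cost [$], power [W])
    "Aerial":  (14400, 45000, 1200),
    "FlexRAN": (3480,   6600,  300),
}

for name, (b_bar, cost, power) in platforms.items():
    c_net_gbps = b_bar * SE * (1 - ETA_OH) / 1e3    # Eq. (5): MHz * b/s/Hz -> Gbps
    servers = TARGET_GBPS / c_net_gbps              # fractional share of a server
    capex = servers * cost
    kwh = servers * power * PUE / 1e3 * 24 * 365 * YEARS
    opex = kwh * C_ELEC
    print(f"{name}: C_net={c_net_gbps:.1f} Gbps/server, "
          f"CapEx=${capex:,.0f}, OpEx(10y)=${opex:,.0f}, TCO=${capex + opex:,.0f}")
```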
IV. RAN DEMAND AND DEPLOYMENT MODEL

We develop a general model for sizing an AI-RAN deployment given a target geographic area and population density, deriving the number of servers required, their associated capital and operational costs, and the residual compute capacity available for AI workloads. The model is anchored to the International Mobile Telecommunication (IMT)-2030 (6G) capability targets defined in International Telecommunication Union (ITU)-R Recommendation M.2160 [29], which sets a per-user experienced downlink rate of 300–500 Mbps for the immersive communication usage scenario, a 3× improvement over the IMT-2020 requirement of 100 Mbps [30]. Since no outdoor dense urban area traffic capacity target has been finalized for IMT-2030 at the time of writing, we adopt a user-demand-driven approach to derive the required area capacity.

We model the per-user downlink rate R_user(w,h) as the product of a long-term growth trend and a normalized hourly demand profile λ_RAN(h) ∈ [0,1], with max_h λ_RAN(h) = 1 at the busy hour:

    R_user(w,h) = R_user(0) · ρ_R^{w/52} · λ_RAN(h)   [Mbps],   (3)

where w is the week index (with w = 0 at deployment), h ∈ {0, ..., 23} is the hour of day, ρ_R is the annual RAN demand growth factor, and R_user(0) is the baseline busy-hour rate from IMT-2030 (e.g., 300 Mbps). Given a deployment area with population density ρ_pop, the downlink demand per unit area at week w, hour h is:

    D_area(w,h) = ρ_pop · η_pen · α_BH · R_user(w,h)   [Mbps/km²],   (4)

where η_pen is the smartphone penetration and α_BH is the busy-hour concurrency factor.

To translate this demand into a server count, each platform provides an average baseband capacity B̄ (defined in Eq. (1)), computed as the mean across the benchmark configurations of the selected Layer-1 stack (Table I). After accounting for average spectral efficiency SE and L1/L2 signaling overhead η_OH (DeModulation Reference Signal (DMRS), Channel State Information Reference Signal (CSI-RS), Synchronization Signal Block (SSB), Physical Random Access Channel (PRACH), guard bands), the net deliverable downlink throughput per server is:

    C_net = B̄ · SE · (1 − η_OH)   [Mbps].   (5)

For a target deployment area A [km²], we denote by G_RAN(w,h) the number of GPUs (servers) required for RAN at week w, hour h:

    G_RAN(w,h) = ⌈A · D_area(w,h) / C_net⌉.   (6)

The physical deployment is fixed at the dimensioning week w_dim and busy hour h_peak (e.g., w_dim = 0 to size for launch, or w_dim = W to over-provision for end-of-horizon demand). Let G_RAN(w_dim, h_peak) denote the dimensioned cluster size. The CapEx is:

    C = G_RAN(w_dim, h_peak) · Cost_server   [$].   (7)

The weekly profile λ_RAN(h), derived from empirical urban macro-cell measurements [31], is shown in Fig. 3. Because G_RAN(w,h) already encodes both demand growth and hourly variation, we define the weekly OpEx as:

    O(w) = Σ_{h=0}^{23} G_RAN(w,h) · (P_server · PUE / 1000) · c_elec,   (8)

where P_server is the server TDP in watts and the factor 1/1000 converts to kilowatts. Because the cluster is dimensioned to a fixed peak, servers are partially idle at all other times and their spare capacity is available for AI workloads. The number of GPUs G_free(w,h) available for AI at week w, hour h is:

    G_free(w,h) = G_total − G_RAN(w,h);   (9)

when ρ_R = 1 or w = w_dim, this reduces to G_free(h) = G_total − G_RAN(w_dim, h).

Fig. 3. Weekly usage patterns for RAN and LLM workloads (normalized demand λ_RAN(h) and λ_LLM(h)), showing slightly complementary demand cycles.
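A compact sketch of this sizing chain (Eqs. (3)–(9)) is given below with the Milan parameters of Table II. The 10 km² target area and the synthetic hourly profile lam_ran are illustrative placeholders of ours; the actual model uses the measured Milan trace [31].

```python
# Sketch of the Section IV sizing chain (Eqs. (3)-(9)) with Table II values.
# lam_ran is a synthetic stand-in for the empirical hourly profile of [31],
# and AREA is an illustrative target area, not a parameter from the paper.
import math

RHO_POP, ETA_PEN, ALPHA_BH = 7500, 0.80, 0.10   # /km^2, penetration, busy-hour share
R_USER0, RHO_R = 300.0, 1.2                     # Mbps at w = 0, annual growth
B_BAR, SE, ETA_OH = 14400, 9.0, 0.20            # Aerial mixed capacity [MHz]
AREA = 10.0                                     # km^2 (illustrative)

C_NET = B_BAR * SE * (1 - ETA_OH)               # Eq. (5), Mbps per server

def g_ran(w: int, h: int, lam_ran) -> int:
    """Servers needed for RAN at week w, hour h (Eqs. (3)-(6))."""
    r_user = R_USER0 * RHO_R ** (w / 52) * lam_ran[h]   # Eq. (3)
    d_area = RHO_POP * ETA_PEN * ALPHA_BH * r_user      # Eq. (4)
    return math.ceil(AREA * d_area / C_NET)             # Eq. (6)

# Synthetic placeholder profile peaking at the busy hour (h = 12).
lam_ran = [0.3 + 0.7 * math.exp(-((h - 12) ** 2) / 18) for h in range(24)]
g_total = g_ran(0, 12, lam_ran)                                 # dimension at w_dim = 0
g_free = [g_total - g_ran(0, h, lam_ran) for h in range(24)]    # Eq. (9)
print(g_total, g_free)
```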
V. LLM DEMAND MODEL

We model the LLM inference demand that can be served by the AI-RAN cluster during off-peak RAN hours. The workload is characterized by three parameters: the market token price p_tok [$/token], the per-GPU output throughput T_GPU [tok/s], and the maximum sustainable request concurrency per GPU ρ_max [req/GPU]. All three depend on the specific LLM and hardware platform and are instantiated for a concrete model in Section VII. Following Xiao et al. [15], we introduce a density scaling parameter ρ_dens (annual capability density growth factor) that captures the growth of LLM capability density over time: the same hardware delivers more capable output as models improve. The effective per-GPU throughput at week w is then

    T_GPU(w) = T_GPU(0) · ρ_dens^{w/52}   [tok/s],   (10)

where T_GPU(0) is the baseline throughput at deployment (w = 0).

To characterize temporal demand patterns and per-request workload, we use the BurstGPT dataset [32], a real-world trace of LLM inference requests containing timestamped records with input token counts. From this trace we extract two quantities: (i) a raw hourly request count λ_LLM(h) that captures the weekly shape of inference demand, and (ii) the average number of tokens per request T̄_req. Because the trace does not identify individual users, we cannot directly estimate per-user request rates from it. Instead, we derive the baseline per-user daily request rate q̄(0) from public ChatGPT usage statistics: with approximately 2.5–3 billion prompts processed daily across 190.6 million daily active users [33], the ratio yields q̄(0) ≈ 14.4 req/user/day. We model the per-user daily request rate as a time-varying quantity q̄(w) that compounds at the annual growth factor ρ_LLM:

    q̄(w) = q̄(0) · ρ_LLM^{w/52}   [req/user/day],   (11)

where w is the week index. We rescale the BurstGPT hourly profile so that its daily per-user average matches q̄(w). Normalizing the raw counts gives the weekly shape λ̂_LLM(h) = λ_LLM(h) / Σ_{h=0}^{23} λ_LLM(h), representing the fraction of daily requests that fall in hour h. The number of LLM-active users in the deployment area is A · ρ_pop · η_pen · η_AI. The hourly request arrival rate at week w, hour h is then

    Λ(w,h) = (λ̂_LLM(h) / 3600) · q̄(w) · A · ρ_pop · η_pen · η_AI   [req/s].   (12)

The profile λ_LLM(h) is shown alongside λ_RAN(h) in Fig. 3. The two curves are partially anti-correlated: the morning RAN peak coincides with the LLM demand trough, and the evening LLM peak coincides with declining RAN load, with the primary conflict window between 14:00 and 18:00. This complementarity is favorable for dual-use server scheduling (without considering de-synchronized curves based on different markets or geographical areas).

Given that each request requires on average T̄_req tokens to process, the mean service time per request at week w is s̄(w) = T̄_req / T_GPU(w). Applying Little's Law to the inference queue, the mean number of concurrently active requests at week w, hour h is L(w,h) = Λ(w,h) · s̄(w), and the number of GPUs required to sustain the full demand is:

    G_LLM_req(w,h) = ⌈L(w,h) / ρ_max⌉ = ⌈Λ(w,h) · T̄_req / (T_GPU(w) · ρ_max)⌉.   (13)

Specific parameter values and the user scenarios evaluated in this paper are detailed in Section VII.
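The demand chain of Eqs. (10)–(13) can be sketched in a few lines. The flat hourly shape lam_llm_hat and the 10 km² user base are placeholders of ours standing in for the normalized BurstGPT profile and the actual deployment area; all other constants are the Table II values.

```python
# Sketch of the LLM demand model (Eqs. (10)-(13)) with Table II values.
# lam_llm_hat is a flat stand-in for the normalized BurstGPT weekly shape.
import math

T_GPU0, RHO_DENS = 37.0, 12.87      # tok/s per GPU at w = 0, annual densification
RHO_MAX = 23.5                      # max concurrent requests per GPU
T_REQ = 969.17                      # average tokens per request (BurstGPT)
Q0, RHO_LLM = 14.4, 16.0            # req/user/day at w = 0, annual demand growth
USERS = 10.0 * 7500 * 0.80 * 0.50   # A * rho_pop * eta_pen * eta_AI (A = 10 km^2)

def g_llm_req(w: int, h: int, lam_llm_hat) -> int:
    """GPUs needed to serve the full LLM demand at week w, hour h."""
    t_gpu = T_GPU0 * RHO_DENS ** (w / 52)       # Eq. (10)
    q = Q0 * RHO_LLM ** (w / 52)                # Eq. (11)
    lam = lam_llm_hat[h] / 3600 * q * USERS     # Eq. (12), req/s
    active = lam * T_REQ / t_gpu                # Little's Law: L = Lambda * s_bar
    return math.ceil(active / RHO_MAX)          # Eq. (13)

lam_llm_hat = [1 / 24] * 24   # flat placeholder shape summing to 1
print([g_llm_req(0, h, lam_llm_hat) for h in range(0, 24, 6)])
```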
VI. REVENUE MODEL FOR AI-ON-RAN

The AI-RAN revenue model combines the RAN deployment cost (Section IV) with the incremental income generated by renting surplus compute to LLM inference consumers. At each week w, hour h, the number of GPUs available for AI workloads is G_free(w,h) (Eq. (9)). The LLM allocation G_LLM_alloc(w,h) is capped by the instantaneous LLM demand from Section V:

    G_LLM_alloc(w,h) = min(G_LLM_req(w,h), G_free(w,h)).   (14)

The resulting token throughput delivered to inference consumers at week w, hour h is:

    T(w,h) = G_LLM_alloc(w,h) · ρ_max · T_GPU(w)   [tok/s].   (15)

Tokens are priced at a market rate p_tok [$/token]. Motivated by the empirical price trends documented by Demirer et al. [14], we model token price erosion through a depreciation factor ρ_tok ∈ (0,1]:

    p_tok(w) = p_tok · ρ_tok^{w/52}.

The weekly LLM revenue is thus:

    R_LLM(w) = 7 · p_tok(w) · Σ_{h=0}^{23} T(w,h) · 3600,   (16)

and the total LLM revenue over a deployment lifetime of W weeks is:

    R_LLM = Σ_{w=0}^{W−1} R_LLM(w).   (17)

The net economic gain of the AI-RAN investment over a conventional RAN-only deployment is then the difference between R_LLM and the additional CapEx premium of GPU platforms over dedicated FEC-accelerator alternatives, amortized over the deployment lifetime. Scenario-specific parameter choices and the resulting revenue projections are presented in Section VII.
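Composing the two demand sketches gives the revenue chain of Eqs. (14)–(17). The snippet below assumes the functions and constants defined in the previous two blocks (run those first) and, as a guard of ours, clamps G_free at zero for weeks in which RAN demand outgrows a launch-sized cluster.

```python
# Sketch of the revenue chain (Eqs. (14)-(17)). Depends on g_ran, g_llm_req,
# lam_ran, lam_llm_hat, g_total, T_GPU0, RHO_DENS, RHO_MAX from the previous
# sketches. Token price from Table II ($0.88/Mtok at w = 0).
P_TOK0, RHO_TOK = 0.88 / 1e6, 0.5   # $/token at w = 0, annual depreciation
W_HORIZON = 520                     # ~10 years in weeks

def weekly_revenue(w: int, g_total: int) -> float:
    """Gross LLM revenue for week w, Eq. (16)."""
    p_tok = P_TOK0 * RHO_TOK ** (w / 52)                     # price erosion
    t_gpu = T_GPU0 * RHO_DENS ** (w / 52)                    # Eq. (10)
    tok_per_s = 0.0
    for h in range(24):
        g_free = max(0, g_total - g_ran(w, h, lam_ran))      # Eq. (9), clamped
        g_alloc = min(g_llm_req(w, h, lam_llm_hat), g_free)  # Eq. (14)
        tok_per_s += g_alloc * RHO_MAX * t_gpu               # Eq. (15)
    return 7 * p_tok * tok_per_s * 3600                      # Eq. (16)

r_llm_total = sum(weekly_revenue(w, g_total) for w in range(W_HORIZON))  # Eq. (17)
print(f"lifetime gross LLM revenue: ${r_llm_total:,.0f}")
```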
VII. SCENARIO-BASED EVALUATION

We instantiate the models of Sections IV–VI for a dense urban deployment in Milan, Italy. Table II summarizes the RAN and LLM parameters used across all scenarios. Throughout the evaluation, we consider two dimensioning strategies for the AI-RAN cluster:

• Scenario 1: the cluster is sized for launch demand, w_dim = 0, with RAN demand held constant over the 10-year lifetime (ρ_R = 1).
• Scenario 2: the cluster is sized for end-of-horizon demand, w_dim = W, with RAN demand growing at ρ_R = 1.2.

TABLE II
SCENARIO PARAMETERS – MILAN DENSE URBAN DEPLOYMENT

Parameter                      Symbol     Value         Ref.
RAN demand
  Population density           ρ_pop      7500 /km²     [34]
  Smartphone penetration       η_pen      80%           [35]
  Busy-hour concurrency        α_BH       10%           [36]
  Per-user downlink rate       R_user(0)  300 Mbps      [29]
  Avg. spectral efficiency     SE         9 bit/s/Hz    [37]
  L1/L2 overhead               η_OH       20%           [38]
  Power Usage Effectiveness    PUE        1.5           [39]
  Electricity cost             c_elec     0.20 $/kWh    [40]
  RAN demand growth factor     ρ_R        1.2           [41]
LLM inference (Llama 3.1 70B, 8-bit Floating Point (FP8))
  Per-GPU output throughput    T_GPU(0)   37 tok/s      [42]
  Max concurrency per GPU      ρ_max      23.5 req/GPU  [43]
  Avg. tokens per request      T̄_req      969.17        [32]
  Per-user daily requests      q̄(0)       14.4 req/day  [33]
  Token price                  p_tok      0.88 $/Mtok   [44]
  Token depreciation           ρ_tok      0.5           [14]
  LLM adoption ratio           η_AI       50%           [45]
  LLM demand growth factor     ρ_LLM      16            [14]

A. GPU Allocation

Fig. 4. Hourly GPU allocation at deployment (w = 0) for clusters sized for Scenario 1 (top) and Scenario 2 (bottom). The total deployed capacity G_total is split at each hour between RAN processing, LLM inference, and idle capacity.

Fig. 5. Weekly-averaged GPU allocation (RAN plus LLM) over the 10-year horizon. Top: Scenario 1. Bottom: Scenario 2.

Figure 4 illustrates the hourly GPU allocation at deployment (w = 0) under both dimensioning strategies for the NVIDIA Aerial deployment. In Scenario 1 (top panel), the cluster is sized to cover the weekly busy-hour peak, so little surplus remains for the LLM tenant; inference revenue is thus limited to hours when the RAN load is low. In Scenario 2 (bottom panel), the cluster is provisioned for end-of-horizon demand. At w = 0, RAN traffic is well below the dimensioning point, so the workload occupies only a fraction of the deployed capacity at any hour, and the surplus G_free(0,h) is large, particularly during off-peak hours. As w increases toward w_dim, RAN demand absorbs a growing share of this surplus, progressively reducing the headroom available for AI tenants.

To evaluate the weekly-averaged GPU allocation, we define Ĝ(w) = (1/168) · Σ_h G(w,h), where 168 is the number of hours in a week. Ĝ(w) is traced for both LLM and RAN workloads in Fig. 5 across the full 10-year horizon. In Scenario 1, RAN demand is constant and consumes on average 15 GPUs throughout; LLM demand grows until it saturates the excess capacity after roughly four years, then plateaus at approximately 33 GPUs. In Scenario 2, the dynamics are richer. At deployment, the RAN workload occupies about 15 GPUs; as traffic grows at 20% per year, its allocation rises to roughly 100 GPUs by the end of the horizon.

For the LLM metric, different values of ρ_dens are explored, as there is still no consensus on what this ratio is or will be. The first (ρ_dens = 3) has been estimated by Gundlach et al. [13] based on observed price reductions, while the second (ρ_dens = 12.87) has been estimated in [15] by fitting an exponential model to the performance of different LLMs. The ρ_dens parameter significantly modulates both the peak allocation and its timing. Under ρ_dens = 3, slower performance densification translates into a steeper demand curve: in Scenario 2, the LLM allocation reaches approximately 190 GPUs as early as year two and a half, before declining as the RAN workload progressively absorbs the shared capacity. With ρ_dens = 12.87, efficiency gains more effectively offset demand growth, keeping the peak near 122 GPUs around year five and producing a gentler subsequent decline. In Scenario 1, the contrast manifests in the rate of saturation rather than the ceiling: the low-densification case fills the available excess capacity in roughly two years, while the high-densification case approaches the same ceiling only gradually over the full ten-year horizon.

B. LLM Revenues

To characterize the interplay between token price deflation and performance densification, we define the ratio k = ρ_tok/ρ_dens, which captures the relative speed at which token prices erode with respect to the efficiency gains achieved through model densification. In practice, these two quantities are structurally coupled: the market-observed token price deflation is largely driven by the very efficiency improvements that ρ_dens tracks, making their ratio the natural quantity to parametrize. We evaluated the revenue metric under both ρ_dens values considered above (ρ_dens = 3 and ρ_dens = 12.87) and found only marginal differences in the resulting revenue trajectories; accordingly, the remainder of the analysis fixes ρ_dens = 12.87 and sweeps k directly.

Figure 6 illustrates how LLM revenue evolves over the deployment horizon across different values of k. The case k = 1 represents the neutral point at which token price deflation is exactly offset by efficiency gains, so the effective per-token price remains constant over time. For k > 1, deflation outpaces efficiency improvement and the effective price erodes monotonically. In Scenario 1 (top panel), all curves start at approximately $6000/week.
The k = 1 case grows steadily as the LLM workload fills the available excess capacity, reaching a plateau of roughly $8750/week after about 92 weeks; beyond this inflection, the curve remains flat for the rest of the horizon since neither the effective token price nor the GPU allocation changes further. For k > 1, however, revenue declines from the very first week: token price deflation erodes earnings faster than the growing GPU allocation can compensate. The decline accelerates once the capacity headroom is exhausted at the same inflection point, since the partial offsetting effect of additional GPU hours disappears. By the end of the ten-year horizon, the k = 2 curve is reduced to a negligible fraction of its initial value, with k = 1.5 and k = 1.25 following the same qualitative trend at progressively slower rates.

Fig. 6. Weekly LLM gross revenue over the deployment lifetime under different values of k = ρ_tok/ρ_dens (token depreciation relative to efficiency improvement). Top: Scenario 1. Bottom: Scenario 2.

In Scenario 2 (bottom panel), starting from approximately $31000/week, the k = 1 curve grows continuously for roughly five years, peaking near $80000/week around week 257, before declining as the expanding RAN workload progressively reclaims shared GPU hours. For k > 1, revenue falls from the outset; beyond week 257 the loss of GPU capacity further compounds the token price erosion, steepening the decline for all curves. Crucially, the k = 2 curve approaches zero by the end of the horizon, revealing a structural threshold: when token price deflation outpaces efficiency gains by a factor of two, LLM operation becomes economically non-viable regardless of the available infrastructure.

C. ROI Analysis

To determine whether the LLM co-tenant justifies the AI-RAN cost premium, we frame the problem as a marginal investment analysis. Let C_Aerial and C_FlexRAN denote the CapEx of each platform (Eq. (7)). The marginal investment is the additional CapEx required to deploy Aerial over the FEC-accelerated alternative:

    I = C_Aerial − C_FlexRAN.   (18)

The cumulative net return is the gross LLM revenue minus the Aerial operational cost over the deployment horizon:

    R(w) = R_LLM(w) − O_Aerial(w).   (19)

Break-even is reached when cumulative R equals I; the ratio R/I quantifies the return multiple over the full horizon.
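A minimal break-even sketch for this analysis follows. It uses the Scenario 1 CapEx figures quoted in the next paragraph and, as a deliberate simplification of ours, a flat weekly net return equal to the k = 1 plateau of Fig. 6; because it ignores the revenue ramp-up, its break-even week lands earlier than the week-105 figure reported below.

```python
# Break-even sketch for the marginal-investment analysis (Eqs. (18)-(19)).
# CapEx figures are the Scenario 1 values from Section VII; the flat
# $8,750/week net return is an illustrative stand-in for R_LLM(w) - O_Aerial(w).
I_MARGINAL = 1.58e6 - 0.95e6   # Eq. (18): Aerial CapEx minus FlexRAN CapEx
W_HORIZON = 520                # ~10 years in weeks

def break_even_week(weekly_net):
    """First week at which cumulative net return R covers I, else None."""
    cumulative = 0.0
    for w in range(W_HORIZON):
        cumulative += weekly_net(w)
        if cumulative >= I_MARGINAL:
            return w
    return None

w_be = break_even_week(lambda w: 8750.0)
roi = 8750.0 * W_HORIZON / I_MARGINAL
print(f"break-even week: {w_be}, horizon R/I: {roi:.1f}x")
```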
Fig. 7. CapEx for the Milan network for Aerial and FlexRAN under Scenario 1 and Scenario 2.

Fig. 8. Cumulative LLM revenue R (Eq. (19)) over the deployment horizon under different values of k = ρ_tok/ρ_dens. Top: Scenario 1. Bottom: Scenario 2. The horizontal dashed line indicates the marginal investment I (Eq. (18)).

Figure 7 shows the CapEx breakdown for Aerial and FlexRAN under both dimensioning strategies. We focus on CapEx here because the OpEx depends on the actual LLM demand: under RAN-only operation, servers are power-gated during low-traffic periods (e.g., overnight), so the incremental energy expenditure is directly tied to the LLM workload and is therefore accounted for in the net return R (Eq. (19)). In Scenario 1, Aerial requires a CapEx of $1.58M against $0.95M for FlexRAN, placing the marginal investment at I ≈ $0.62M. In Scenario 2, the gap widens substantially: $9.68M versus $5.87M, yielding I ≈ $3.80M. This premium is a direct consequence of the GPU-based server architecture, whose per-unit acquisition cost significantly exceeds that of the FPGA-accelerated FlexRAN hardware. Transitioning from Scenario 1 to Scenario 2 increases CapEx by 514% for Aerial and 518% for FlexRAN, since the cluster must be sized for end-of-horizon demand from day one. The trade-off is that this over-provisioning creates a large pool of idle GPU hours from the outset, substantially enlarging the window for LLM revenue generation throughout the deployment lifetime.

Figure 8 plots cumulative revenue alongside the investment threshold for each scenario. The qualitative pattern mirrors the weekly revenue analysis: curves with lower k accumulate faster and recover the investment sooner, while higher k leads to slower accumulation or outright failure to break even. In Scenario 1 (top panel), the k = 1 curve crosses the threshold at approximately week 105 (two years into deployment), followed by k = 1.25 around week 139. The k = 1.5 curve requires nearly the full horizon to recover the investment, crossing only around week 235. The k = 2 curve never reaches break-even: the cumulative revenue at week 520 ($0.35M) falls far short of the $0.62M threshold, confirming that token deflation at twice the rate of efficiency improvement makes the investment irrecoverable on this timescale. In Scenario 2 (bottom panel), the investment threshold is substantially higher at $3.80M, yet the larger volume of idle GPU capacity available from the start allows all curves except k = 2 to cross it within the horizon. The k = 1 curve breaks even around week 123, k = 1.25 around week 166, and k = 1.5 recovers the investment near week 280. As in Scenario 1, the k = 2 case fails to break even: despite generating $1.94M in cumulative revenue, it does not reach the $3.80M threshold, as the compounding effect of token deflation and shrinking GPU availability forecloses recovery.

Fig. 9. Return on investment (R/I) of AI-RAN by scenario and depreciation ratio k = ρ_tok/ρ_dens. Investment I and return R are defined in Eqs. (18) and (19).

Figure 9 summarizes the final return multiple R/I across all depreciation scenarios. The central finding is that Scenario 2 consistently outperforms Scenario 1 in all cases where the investment is profitable: at k = 1, the return multiple reaches 8.2× in Scenario 2 versus 6.7× in Scenario 1, and at k = 1.25 the figures are 2.7× and 2.4×, respectively. This advantage stems directly from the deliberate over-provisioning of Scenario 2: by dimensioning the cluster for end-of-horizon demand, the operator creates an infrastructure surplus that can be leased for LLM inference from day one, generating revenue that far exceeds the additional CapEx. At k = 1.5, both scenarios remain marginally profitable (R/I ≈ 1.31× and 1.25×), confirming that moderate token deflation can still be offset by the efficiency gains embedded in k. At k = 2, both scenarios yield R/I < 1 (0.56× and 0.51×), establishing a hard threshold: when token price erosion outpaces efficiency improvement by a factor of two, the LLM co-tenancy fails to recover the infrastructure premium regardless of the dimensioning strategy.
Notably, the k = 2 return is slightly lower in Scenario 2 than in Scenario 1, because the larger investment denominator more than offsets the additional revenue generated by the excess capacity.

VIII. CONCLUSION

This paper presented a techno-economic framework for evaluating AI-RAN deployments in which GPU-accelerated baseband platforms share idle capacity with LLM inference workloads. By combining publicly available hardware benchmarks, IMT-2030-anchored demand models, and empirical LLM traffic traces, we constructed a unified cost and revenue model and instantiated it for a dense urban deployment in Milan, Italy. The analysis compared two dimensioning strategies (sizing the cluster for launch demand in Scenario 1 and for end-of-horizon demand in Scenario 2) across a range of token depreciation-to-densification ratios k.

Three principal findings emerge. For all k ≤ 1.5, the cumulative inference revenue exceeds the marginal cost premium of the GPU-based platform, with return multiples up to 8.2×. Deliberate over-provisioning (Scenario 2) consistently outperforms conservative dimensioning, as the early-year surplus generates substantial revenue before RAN demand absorbs the excess capacity. The ratio k = ρ_tok/ρ_dens emerges as the decisive parameter: at k ≥ 2, the investment becomes irrecoverable regardless of dimensioning strategy, establishing a clear viability boundary for AI-RAN economics.

Several limitations of the current analysis suggest directions for future work. The revenue model prices inference tokens at prevailing cloud market rates and does not capture the additional value that edge-local inference may command: lower latency for interactive applications, reduced backhaul and transport costs for both the RAN operator and the LLM tenant, and data locality advantages for privacy-sensitive workloads. Incorporating an edge premium into the token price would likely strengthen the economic case further. The demand model assumes a single AI workload type (LLM inference); incorporating heterogeneous AI workloads such as image generation, embedding computation, or fine-tuning jobs would broaden the revenue base and potentially smooth the demand profile. Finally, the analysis assumes a single geographic market with synchronized RAN and LLM diurnal patterns; in practice, operators serving multiple time zones or offering compute to geographically dispersed AI tenants could exploit additional temporal diversity. Despite these simplifications, the framework provides a principled and extensible basis for operators and policymakers to assess the economic viability of accelerator-based RAN architectures in the transition to 6G.

REFERENCES

[1] GSMA Intelligence, "Global mobile trends 2023," 2023. [Online]. Available: https://www.gsma.com/
[2] McKinsey Global Institute, "The economic potential of generative AI: The next productivity frontier," McKinsey & Company, Tech. Rep., Jun. 2023. [Online]. Available: https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier
[3] Y. Wang, Y. Chen et al., "BurstGPT: A real-world workload dataset to optimize LLM serving systems," in Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '25), Toronto, ON, Canada: ACM, 2025. [Online]. Available: https://doi.org/10.1145/3711896.3737413
[4] M. Polese, N. Mohamadi et al., "Beyond connectivity: An open architecture for AI-RAN convergence in 6G," IEEE Communications Magazine (to appear), 2025. [Online]. Available: https://arxiv.org/abs/2507.06911
[5] AI-RAN Alliance, "Industry leaders form AI-RAN alliance," MWC Barcelona, 2024. [Online]. Available: https://ai-ran.org/news/industry-leaders-in-ai-and-wireless-form-ai-ran-alliance/
[6] L. Kundu, X. Lin et al., "AI-RAN: Transforming RAN with AI-driven computing infrastructure," arXiv preprint arXiv:2501.09007, 2025. [Online]. Available: https://arxiv.org/abs/2501.09007
[7] S. D. A. Shah, Z. Nezami et al., "The interplay of AI-and-RAN: Dynamic resource allocation for converged 6G platform," arXiv preprint arXiv:2503.07420, 2025. [Online]. Available: https://arxiv.org/abs/2503.07420
[8] S. D. A. Shah, M. Hafeez et al., "Proactive AI-and-RAN workload orchestration in O-RAN architectures," arXiv preprint arXiv:2507.09124, 2025. [Online]. Available: https://arxiv.org/abs/2507.09124
[9] Analysys Mason, "Open RAN TCO analysis," Tech. Rep., 2022, commissioned by Wind River. [Online]. Available: https://www.analysysmason.com/contentassets/b3260036a0d449718117eeaf5ac83472/analysys_mason_open_ran_tco_feb2022_rma16_rma18.pdf
[10] ——, "Open RAN progress drives confidence," Tech. Rep., 2024, commissioned by Wind River. [Online]. Available: https://www.analysysmason.com/contentassets/a99b7d01b9e64a2cafc375459c99de99/analysys_mason_open_ran_confidence_apr2024_rma18.pdf
[11] S. Pongratz, "Open RAN and vRAN revenue trends," Dell'Oro Group, 2024. [Online]. Available: https://www.fierce-network.com/modernization/open-ran-and-vran-tanks-2023
[12] E. Erdil, "Inference economics of language models," arXiv preprint arXiv:2506.04645, 2025. [Online]. Available: https://arxiv.org/abs/2506.04645
[13] H. Gundlach, J. Lynch et al., "The price of progress: Algorithmic efficiency and the falling cost of AI inference," arXiv preprint arXiv:2511.23455, 2025. [Online]. Available: https://arxiv.org/abs/2511.23455
[14] M. Demirer, A. Fradkin et al., "The emerging market for intelligence: Pricing, supply, and demand for LLMs," working paper, 2025. [Online]. Available: https://andreyfradkin.com/assets/LLM_Demand_12_12_2025.pdf
[15] C. Xiao, J. Cai et al., "Densing law of LLMs," Nature Machine Intelligence, vol. 7, pp. 1823–1833, 2025.
[16] Y. Wang, Y. Chen et al., "BurstGPT: A real-world workload dataset to optimize LLM serving systems," arXiv preprint arXiv:2401.17644, 2024. [Online]. Available: https://arxiv.org/abs/2401.17644
[17] G. Cai, R. Tian et al., "Efficient inference for edge large language models: A survey," Tsinghua Science and Technology, vol. 31, no. 3, pp. 1365–1380, 2026.
[18] D. Ding, A. Mallick et al., "Hybrid LLM: Cost-efficient and quality-aware query routing," in Proc. ICLR, 2024. [Online]. Available: https://openreview.net/forum?id=02f3mUtqnM
[19] A. Kelkar and C. Dick, "NVIDIA Aerial GPU hosted AI-on-5G," in Proc. IEEE 4th 5G World Forum (5GWF), 2021, pp. 64–69.
[20] A. Garcia-Saavedra and X. Costa-Perez, "O-RAN: Disrupting the virtualized RAN ecosystem," IEEE Communications Standards Magazine, vol. 5, no. 4, pp. 96–103, 2021.
[21] NVIDIA, "Enabling the world's first GPU-accelerated 5G Open RAN for NTT DOCOMO with NVIDIA Aerial," 2023. [Online]. Available: https://developer.nvidia.com/blog/enabling-the-worlds-first-gpu-accelerated-5g-open-ran-for-ntt-docomo-with-nvidia-aerial/
[22] ——, "Aerial CUDA-accelerated RAN release 25-2," 2025. [Online]. Available: https://docs.nvidia.com/aerial/cuda-accelerated-ran/25-2/aerial-cuda-accelerated-ran.pdf
[23] Quanta Cloud Technology, "QCT VRB100 vRAN server specifications," vendor specifications, 2024. [Online]. Available: https://www.qct.io/
[24] Intel and Hewlett Packard Enterprise, "Verified reference configuration for virtualized RAN on the HPE ProLiant DL110," 2022. [Online]. Available: https://builders.intel.com/docs/networkbuilders/intel-hpe-verified-reference-configuration-for-virtualized-radio-access-networks-on-the-hpe-proliant-dl110-1653673153.pdf
[25] A. Kelkar and C. Dick, "NVIDIA Aerial GPU hosted AI-on-5G," in Proc. IEEE 4th 5G World Forum (5GWF), 2021, pp. 64–69.
[26] C. Wang, H. Nie et al., "Understanding 5G performance on heterogeneous computing architectures," IEEE Communications Magazine, vol. 63, no. 3, pp. 107–113, Mar. 2025.
[27] Intel Corporation, "Intel accelerates 5G leadership with new products," 2023. [Online]. Available: https://www.intc.com/news-events/press-releases/detail/1606/intel-accelerates-5g-leadership-with-new-products
[28] ITU-R, "Report ITU-R M.2412-0: Guidelines for evaluation of radio interface technologies for IMT-2020," International Telecommunication Union, Tech. Rep., Oct. 2017, dense urban eMBB. [Online]. Available: https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-M.2412-2017-PDF-E.pdf
[29] ——, "Recommendation ITU-R M.2160-0: Framework and overall objectives of the future development of IMT for 2030 and beyond," International Telecommunication Union, Tech. Rep., Nov. 2023. [Online]. Available: https://www.itu.int/dms_pubrec/itu-r/rec/m/R-REC-M.2160-0-202311-I!!PDF-E.pdf
[30] ——, "Report ITU-R M.2410-0: Minimum requirements related to technical performance for IMT-2020 radio interface(s)," International Telecommunication Union, Tech. Rep., Nov. 2017. [Online]. Available: https://www.itu.int/pub/R-REP-M.2410
[31] G. Barlacchi, M. De Nadai et al., "A multi-source dataset of urban life in the city of Milan and the Province of Trentino," Scientific Data, vol. 2, p. 150055, 2015. [Online]. Available: https://doi.org/10.1038/sdata.2015.55
[32] Y. Wang, Y. Chen et al., "BurstGPT: A real-world workload dataset to optimize LLM serving systems," arXiv preprint arXiv:2401.17644, 2024. [Online]. Available: https://arxiv.org/abs/2401.17644
[33] R. A. Lee, "Claude vs. ChatGPT statistics 2026: Head-to-head numbers behind the AI battle," 2025, 2.5–3B prompts/day, 190.6M DAU. [Online]. Available: https://sqmagazine.co.uk/claude-vs-chatgpt-statistics/
[34] ISTAT, "Resident population and population density – municipality of Milan," 2024, population density ≈ 7500/km². [Online]. Available: https://www.istat.it/en/
[35] GSMA Intelligence, "The state of mobile internet connectivity 2024," 2024. [Online]. Available: https://data.gsmaintelligence.com/research/research/research-2024/the-state-of-mobile-internet-connectivity-2024
[36] J. Lorincz and Z. Klarin, "A comprehensive analysis of the impact of an increase in user devices on the long-term energy efficiency of 5G networks," Smart Cities, vol. 7, no. 6, pp. 3616–3657, 2024. [Online]. Available: https://www.mdpi.com/2624-6511/7/6/140
[37] ITU-R WP 5D, "Preliminary draft new report ITU-R M.[IMT-2030.TECH PERF REQ]: Minimum requirements related to technical performance for IMT-2030 radio interface(s)," International Telecommunication Union, working document, 2024.
[38] 3GPP, "TS 38.214: NR; Physical layer procedures for data (Release 18)," 3rd Generation Partnership Project, Tech. Rep., 2024. [Online]. Available: https://www.3gpp.org/dynareport/38214.htm
[39] Uptime Institute, "2024 global data center survey: Report," 2024. [Online]. Available: https://datacenter.uptimeinstitute.com/rs/711-RIA-145/images/2024.GlobalDataCenterSurvey.Report.pdf
[40] Eurostat, "Electricity price statistics," 2024, household and industrial prices; 0.3291 EUR/kWh. [Online]. Available: https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Electricity_price_statistics
[41] Ericsson, "Ericsson mobility report," Tech. Rep., Nov. 2025. [Online]. Available: https://www.ericsson.com/4aca6f/assets/local/reports-papers/mobility-report/documents/2025/ericsson-mobility-report-november-2025.pdf
[42] Artificial Analysis, "LLM performance leaderboard: Model and API provider benchmarks," 2024. [Online]. Available: https://artificialanalysis.ai/leaderboards/models
[43] J. Pichlmeier, P. Ross, and A. Luckow, "Performance characterization of expert router for scalable LLM inference," arXiv preprint arXiv:2404.15153, 2024. [Online]. Available: https://arxiv.org/abs/2404.15153
[44] Together AI, "Together AI – inference pricing," 2024, Llama 3.3 70B: $0.88/Mtok. [Online]. Available: https://www.together.ai/pricing
[45] B. Savoldi, G. Attanasio et al., "Generative AI practices, literacy, and divides: An empirical analysis in the Italian context," 2025. [Online]. Available: https://arxiv.org/abs/2512.03671
