How Much of the United States Can Still Host New Hyperscale Data Centers? A Constraint-Based Feasibility Analysis
The rapid expansion of hyperscale data centers, driven primarily by cloud computing and generative AI, is placing growing pressure on electricity systems, land, and climate-sensitive infrastructure. While existing maps document where data centers are currently located, a major question remains unanswered: where can hyperscale data centers still be built under present-day physical, infrastructural, and environmental constraints? Here we address this question for the United States using a national-scale, constraint-first geospatial framework that infers feasibility from revealed hyperscale siting patterns rather than from demand forecasts or optimization assumptions. By combining power-grid adjacency, environmental limits, land-use constraints, and climatic constraints within a uniform hexagonal spatial system, we estimate the feasible hyperscale hosting capacity. The approaches presented here converge on a limited feasible land envelope, implying a substantial contraction relative to naive land-availability assumptions. Based on observed build-out patterns, we estimate that total physically feasible U.S. hyperscale capacity lies in the tens of gigawatts rather than the hundreds. These results are intended to support national-scale reasoning about infrastructure feasibility under modern constraints.
💡 Research Summary
The paper tackles a pressing question for the United States: given today’s physical, infrastructural, and environmental constraints, how much additional hyperscale data‑center capacity can still be sited? Rather than forecasting demand or running cost‑optimization models, the authors adopt a “constraint‑first” approach that infers feasibility directly from the observed spatial patterns of existing hyperscale facilities.
Data and preprocessing – The authors assemble a comprehensive, publicly‑available geospatial data stack covering the contiguous United States. The stack includes high‑voltage transmission lines (≥115 kV), substations, large (>50 MW) power plants, EMM market regions, climate normals (mean July temperature, max August temperature) from PRISM, surface water, wetlands, FEMA flood zones, elevation and slope from USGS, land‑cover from NLCD, population and built‑up intensity from GHSL, protected areas from PAD‑US, and metropolitan boundaries (CBSA). Existing data‑center sites are taken from a Business‑Insider dataset of roughly 1,200 facilities. All layers are projected to a common metric CRS and aggregated onto Uber’s H3 hierarchical hexagonal grid.
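As a sketch of the aggregation step: each point feature is indexed to a grid cell and summarized per cell. In the paper’s pipeline the cell id would come from the H3 library (e.g. `h3.latlng_to_cell` at a fixed resolution); the 1‑degree lat/lng binning below is a hypothetical stand-in so the sketch stays self-contained.

```python
import math
from collections import defaultdict

def aggregate_points(features, cell_of):
    """Aggregate point features (lat, lng, value) onto grid cells.

    `cell_of` maps (lat, lng) to a cell id; in the actual pipeline this
    would be an H3 index at the chosen resolution, but any deterministic
    spatial partition works for illustration."""
    per_cell = defaultdict(list)
    for lat, lng, value in features:
        per_cell[cell_of(lat, lng)].append(value)
    return {cell: {"count": len(vals), "total": sum(vals)}
            for cell, vals in per_cell.items()}

# Hypothetical stand-in for an H3 index: 1-degree lat/lng bins.
def degree_bin(lat, lng):
    return (int(math.floor(lat)), int(math.floor(lng)))
```

The same pattern extends to the other layers (substations, plants, flood zones), swapping the summary statistics per feature type.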
Spatial resolution – Using DBSCAN clustering on known data‑center points, the authors find stable cluster radii of 23–27 km, which correspond to H3 resolution 4 (hexagons roughly 25 km in radius, ≈1,770 km² each). This resolution is adopted for the entire analysis, yielding 4,348 hexagons nationwide.
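The cluster-radius step can be illustrated with a minimal, self-contained DBSCAN. The paper presumably uses a standard implementation (e.g. scikit-learn) on geographic coordinates; the planar coordinates and parameter values below are illustrative assumptions only.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN on planar (x, y) coordinates (units of km here).

    Returns one label per point: -1 for noise, otherwise a cluster id.
    O(n^2) neighbor search; fine for ~1,200 facility points."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        xi, yi = points[i]
        return [j for j in range(n)
                if math.hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:           # too sparse: mark as noise for now
            labels[i] = -1
            continue
        cluster_id += 1                   # start a new cluster at this core point
        labels[i] = cluster_id
        seeds = [j for j in nbrs if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:           # noise reached from a core: border point
                labels[j] = cluster_id
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:    # expand only through core points
                seeds.extend(j_nbrs)
    return labels
```

Sweeping `eps` over a range of distances and watching where the cluster assignments stop changing is one way to arrive at a stable radius of the kind the authors report.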
Labeling of reference regions – For each hexagon the authors compute lower‑ and upper‑bound sustained power demand (MW) from reported annual energy use. Hexagons with an upper‑bound demand ≥ 20 MW are labeled “hyperscale”; those with a lower‑bound ≤ 5 MW are labeled “non‑hyperscale”. The remaining cells are uncertain. To resolve this uncertainty without circularity, a binary classifier is trained using only high‑confidence hyperscale and non‑hyperscale hexagons, with features limited to power‑infrastructure metrics (distances to transmission lines, substations, plants, and regional generation capacity). This yields 40 hyperscale hexagons (0.9 % of the grid) and 151 non‑hyperscale hexagons; the remaining cells contribute negligibly to total load. Summing the upper‑bound demand of the hyperscale hexagons gives a current hyperscale load of roughly 4.5–5 GW (≈40–45 TWh/yr), about 1 % of total U.S. electricity consumption.
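The thresholding rule can be sketched directly from the quantities above. Converting annual energy (GWh/yr) to sustained power divides by the 8,760 hours in a year; the precedence when a hexagon satisfies both conditions is an assumption of this sketch, not stated in the summary.

```python
HOURS_PER_YEAR = 8760.0
HYPERSCALE_MW = 20.0      # upper-bound threshold from the summary
NON_HYPERSCALE_MW = 5.0   # lower-bound threshold from the summary

def sustained_mw(annual_gwh):
    """Average sustained power (MW) implied by annual energy use (GWh/yr)."""
    return annual_gwh * 1000.0 / HOURS_PER_YEAR

def label_hexagon(lower_annual_gwh, upper_annual_gwh):
    """Label a hexagon from its lower/upper-bound annual energy use.

    Hyperscale is checked first (an assumption); anything matching
    neither rule stays uncertain and goes to the classifier."""
    if sustained_mw(upper_annual_gwh) >= HYPERSCALE_MW:
        return "hyperscale"
    if sustained_mw(lower_annual_gwh) <= NON_HYPERSCALE_MW:
        return "non-hyperscale"
    return "uncertain"
```

For example, an upper bound of 200 GWh/yr works out to roughly 23 MW sustained, which clears the 20 MW hyperscale threshold.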
Feasibility modeling – similarity‑based approach – All numeric features are rank‑scaled to a uniform [0, 1] scale.
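Rank scaling maps each feature onto a uniform scale regardless of its original distribution, which keeps heavy‑tailed variables (e.g. distance to transmission) from dominating similarity scores. A minimal sketch, assuming average ranks for ties:

```python
def rank_scale(values):
    """Rank-transform a feature so its values map onto a uniform [0, 1] scale.

    Tied values receive the mean of their rank positions (average ranking),
    then ranks are divided by n - 1 so the extremes land on 0 and 1."""
    n = len(values)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1                       # extend the run of tied values
        avg = (i + j) / 2.0              # average rank position for the tie
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return [r / (n - 1) for r in ranks]
```

Applied per feature across all hexagons, this produces the uniform inputs a similarity‑based comparison against known hyperscale cells would use.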