Arxiv 2601.00912

Something strange is happening with AI-powered search. Ask ChatGPT "What is Notion?" and you'll get a detailed, accurate response. But ask "What are the best notetaking apps?" and Notion might not even appear. This gap between recognition and recommendation is what I set out to study.

The shift matters because how people discover products is changing. If this trend continues, and there’s every reason to think it will, then understanding how products appear in AI responses becomes a business necessity, not just an academic curiosity.

Traditional search engine optimization has been studied exhaustively since Brin and Page published their PageRank paper in 1998 [2]. We know how Google ranks pages. We know what makes a site authoritative. But LLMs work differently. They don’t return ranked lists of links; they generate synthesized responses. The rules are different and we don’t fully understand them yet.

New startups sit at a particular disadvantage here. Consider the mechanics: a product launched after a model’s training cutoff simply is not in that model’s knowledge base, so no amount of content polish will make a knowledge-cutoff LLM recommend it.

This study examines six research questions, beginning with RQ1: how large is the gap between direct queries (asking about a product by name) and discovery queries (category searches where products might organically appear)?

Aggarwal and colleagues introduced the term “Generative Engine Optimization” in their 2024 paper [1], which tested nine content optimization strategies across 10,000 queries.

Their findings suggested that certain signals, like citations, statistics, technical terminology and authoritative language, could improve visibility in search-augmented generation systems by 30-40%. But there was a catch that the authors acknowledged only briefly: their study focused on informational content that already appeared in search results. They tested optimization strategies for content that was already discoverable. Whether these strategies help products that aren’t discoverable in the first place remained an open question, and that is exactly the situation most new startups face.

Product Hunt offers an unusual natural experiment for studying visibility. The platform is estimated to receive monthly traffic of 2.93M to 3.3M visitors. Every product launched there receives identical initial exposure and competes for attention through community voting and the daily leaderboard. The result is a population of products with varying success metrics (upvotes, comments, ranking) but similar launch contexts.

The distinction between direct and discovery queries is central to this research. Direct queries explicitly name the product: “What is [ProductName]?” or “Tell me about [ProductName].” These test whether the LLM recognizes the product exists.

Discovery queries are category-based: “What are the best AI coding assistants launched in 2025?” or “Recommend some new productivity tools.” These test whether the product appears organically when users aren’t specifically looking for it.

I used ten queries per product: three direct queries and seven discovery queries. The complete query templates are provided in Appendix C.
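
To make the split concrete, here is a minimal sketch of how the ten queries per product could be assembled. The wording is illustrative only; the study’s actual templates are in Appendix C and are not reproduced here.

```python
# Illustrative query builder: three direct queries plus seven discovery
# queries per product. The phrasing below is a paraphrase, not the study's
# actual Appendix C templates.

def build_queries(product_name: str, category: str, year: int = 2025) -> dict:
    direct = [
        f"What is {product_name}?",
        f"Tell me about {product_name}.",
        f"What does {product_name} do?",
    ]
    discovery = [
        f"What are the best {category} launched in {year}?",
        f"Recommend some new {category}.",
        f"Which {category} should I try right now?",
        f"What are some up-and-coming {category}?",
        f"List notable {category} released in {year}.",
        f"I'm looking for a good {category}. Any suggestions?",
        f"What {category} are people talking about in {year}?",
    ]
    return {"direct": direct, "discovery": discovery}

queries = build_queries("Notion", "note-taking apps")
assert len(queries["direct"]) == 3 and len(queries["discovery"]) == 7
```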

Following the framework from Aggarwal et al. [1], I built a composite GEO score measuring six dimensions: presence of statistics on the website, citation density, technical terminology usage, authoritative language patterns, structured data implementation and content depth. Each dimension was scored 0-100, and the composite was the average of all six.

The scoring was primarily regex-based, looking for patterns like numbers followed by percentage signs, citation markers, and technical jargon. This approach has obvious limitations; it can’t capture the nuance of truly well-optimized content. But it provides a reasonable proxy for the kinds of signals the GEO literature identifies as important. All queries were executed via API between December 15 and 20, 2025, with consistent parameters across all calls: temperature 0.7, no system prompt modifications.
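
As an illustration of the scoring approach described above, here is a minimal regex-based sketch. The specific patterns, the saturation cap and the equal weighting of the six dimensions are assumptions for illustration, not the study’s actual scoring rules.

```python
import re

def score_dimension(hits: int, cap: int = 20) -> float:
    """Map a raw pattern count onto a 0-100 scale, saturating at `cap` hits."""
    return min(hits, cap) / cap * 100

def geo_score(page_text: str) -> float:
    """Composite GEO score: the average of six 0-100 dimension scores."""
    hits = [
        len(re.findall(r"\b\d+(?:\.\d+)?\s*%", page_text)),                # statistics
        len(re.findall(r"\[\d+\]|\(\d{4}\)", page_text)),                  # citation markers
        len(re.findall(r"\b(?:API|SDK|LLM|latency|throughput)\b",
                       page_text, flags=re.I)),                            # technical terminology
        len(re.findall(r"\b(?:research shows|according to|studies find)\b",
                       page_text, flags=re.I)),                            # authoritative language
        len(re.findall(r"application/ld\+json|itemscope", page_text)),     # structured data
        len(page_text.split()) // 100,                                     # content depth (per 100 words)
    ]
    return sum(score_dimension(h) for h in hits) / len(hits)

print(geo_score("According to [1], 42% of users prefer an API with low latency."))
```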

The numbers tell the story better than I expected.

Product Hunt metrics showed a clear pattern: they matter for Perplexity but not for ChatGPT.

For Perplexity discovery, Product of the Day (POTD) rank showed a significant negative correlation (r = -0.286, p = 0.002), meaning better-ranked products achieved higher visibility. Upvotes correlated positively (r = +0.225, p = 0.017), as did the product rating (r = +0.187, p = 0.048).

For ChatGPT, none of these correlations reached significance. The model appears to treat products essentially at random, which makes sense given its training data limitations. This directly contradicts the implications of the Aggarwal et al. study [1].

Traditional SEO signals made a comeback in the Perplexity data. Referring domains emerged as the strongest predictor of discovery visibility (r = +0.319, p < 0.001).

Dofollow ratio also showed significance (r = +0.238, p = 0.012).

This makes intuitive sense. Perplexity searches the web in real-time, so the factors that make a page rank well in traditional search also make it more likely to be found and cited by Perplexity.
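
For anyone wanting to replicate this kind of analysis: the figures above are plain Pearson correlations between product-level metrics and discovery visibility. A minimal sketch, assuming a hypothetical one-row-per-product table whose column names are placeholders:

```python
import pandas as pd
from scipy.stats import pearsonr

def correlate(products: pd.DataFrame, outcome: str, predictors: list[str]) -> pd.DataFrame:
    """Pearson r and p-value for each predictor against the outcome column."""
    rows = []
    for col in predictors:
        r, p = pearsonr(products[col], products[outcome])
        rows.append({"predictor": col, "r": round(float(r), 3), "p": round(float(p), 3)})
    return pd.DataFrame(rows)

# Hypothetical usage; column names are placeholders, not the study's schema:
# correlate(products, "perplexity_discovery",
#           ["referring_domains", "dofollow_ratio", "upvotes", "potd_rank"])
```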

Perplexity’s web search provided a 2.5x advantage in discovery rate over ChatGPT (8.29% vs 3.32%). But perhaps more importantly, the architectures showed dramatically different predictability.

For ChatGPT: zero significant correlations. Discovery appears essentially random.

For Perplexity: seven significant correlations. Discovery follows identifiable patterns.

This “architecture divide” has real implications. If you’re optimizing for LLM discovery, targeting web-search models gives you levers to pull. Targeting knowledge-cutoff models gives you… hope, I suppose.
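
For concreteness, the headline rates are simply the share of queries in which a product gets mentioned, split by model and by query type. A sketch of that aggregation, assuming a hypothetical per-query results table with one row per (product, model, query):

```python
import pandas as pd

def visibility_rates(results: pd.DataFrame) -> pd.DataFrame:
    """Recognition (direct) and discovery rates per model, as percentages.

    Expects columns: model, query_type ("direct" or "discovery"),
    mentioned (bool). These names are placeholders for illustration.
    """
    rates = (results.groupby(["model", "query_type"])["mentioned"]
                    .mean()
                    .mul(100)
                    .unstack("query_type"))
    return rates.rename(columns={"direct": "recognition_%", "discovery": "discovery_%"})

# With the reported figures, Perplexity's discovery advantage over ChatGPT
# works out to 8.29 / 3.32 ≈ 2.5x.
```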

The community signal story required some detective work. Initial analysis showed Reddit mentions had no correlation with visibility (r = +0.052, p = 0.586). But something felt off.

Looking at the data, I noticed that products with generic names ("Cursor," "DROP," "Solar") had enormous Reddit mention counts but no discovery success. The Reddit search was picking up unrelated posts: people discussing mouse cursors, music drops, and solar panels.

After identifying and removing 52 products with generic names prone to false positives (46% of the sample), the picture changed dramatically:

Reddit Mentions (cleaned): r = +0.395, p = 0.002
Unique Subreddits (cleaned): r = +0.405, p = 0.001

Community presence does matter, but only when measured accurately; the name-filtering step behind these cleaned figures is sketched below. GitHub stars and Hacker News showed no significant effects in either analysis. Query architecture also plays a part: direct queries activate fact-retrieval mechanisms (“What is X?”), while discovery queries activate recommendation systems with different biases.
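
A minimal version of that filter, with an illustrative name list and hypothetical column names:

```python
import pandas as pd
from scipy.stats import pearsonr

# Illustrative generic-name filter; in the study, 52 of 112 products were
# flagged as prone to false-positive Reddit matches.
GENERIC_NAMES = {"cursor", "drop", "solar"}

def cleaned_reddit_correlation(products: pd.DataFrame) -> tuple[float, float]:
    """Pearson r and p for Reddit mentions vs. discovery, after name filtering."""
    kept = products[~products["name"].str.lower().isin(GENERIC_NAMES)]
    r, p = pearsonr(kept["reddit_mentions"], kept["perplexity_discovery"])
    return float(r), float(p)
```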

The null GEO finding deserves careful interpretation. I’m not claiming GEO optimization doesn’t work. The Aggarwal study [1] provides compelling evidence that it does, in certain contexts.

What I’m claiming is narrower: for new products trying to break into AI visibility, GEO optimization doesn’t help because the fundamental discoverability barrier remains. It’s like polishing a car that’s stuck in a garage. The polish might be excellent, but it won’t get the car on the road.

This suggests a staged approach: build discoverability first (through SEO, community presence and time), then optimize content for AI visibility once you’re actually being found.

The stark difference between ChatGPT (0 significant predictors) and Perplexity (7 significant predictors) points toward a practical strategy: focus resources on web-search LLMs.

For Perplexity and similar models, the playbook looks familiar: build referring domains, earn Product Hunt success, cultivate genuine Reddit discussion. These are the same activities that drive traditional marketing success, now with the added benefit of AI visibility.

For knowledge-cutoff models, there’s not much to do except wait. As these models update their training data, products will gradually enter their knowledge base. But that timeline isn’t something founders can directly influence.

Several caveats apply to these findings:

The sample comes entirely from Product Hunt, which skews toward certain product categories (developer tools, productivity, AI). Results might differ for consumer products, enterprise software, or products launched through other channels.

I tested two LLMs. Claude, Gemini and others might show different patterns.

The Reddit cleaning removed 46% of the sample. While necessary for accuracy, this reduces statistical power for the community signal analysis.

GEO scoring was regex-based. More sophisticated content analysis might capture signals my approach missed.

This research documented a substantial visibility gap in how LLMs discover products.

Testing 112 Product Hunt startups, randomly selected from the top 500 of 1,825 launches, across 2,240 queries revealed that while LLMs recognize products when asked directly (99.4%), they rarely recommend them organically (3.32% for ChatGPT, 8.29% for Perplexity).

The 30:1 gap for ChatGPT represents a structural barrier that new startups cannot directly overcome. GEO optimization, the set of techniques proposed for improving AI visibility, showed no correlation with discovery success, suggesting it functions as a multiplier that requires an existing base of visibility to work.

Web-search augmented models like Perplexity offered both better discovery rates and identifiable optimization paths. Referring domains, Product Hunt success and cleaned Reddit presence all predicted visibility. The architecture divide between knowledge-cutoff and web-search LLMs emerged as the central finding with practical implications.

For startup founders, the strategic implication is counterintuitive: don’t optimize for AI discovery directly. Build the traditional foundations of SEO authority, community presence and platform success, and AI visibility will follow. The cart cannot pull the horse.
