Private Information Disclosure from Web Searches. (The case of Google Web History)

Reading time: 6 minutes

📝 Original Info

  • Title: Private Information Disclosure from Web Searches. (The case of Google Web History)
  • ArXiv ID: 1003.3242
  • Date: 2015-03-13
  • Authors: Claude Castelluccia, Emiliano De Cristofaro, Daniele Perito

📝 Abstract

As the amount of personal information stored at remote service providers increases, so does the danger of data theft. When connections to remote services are made in the clear and authenticated sessions are kept using HTTP cookies, data theft becomes extremely easy to achieve. In this paper, we study the architecture of the world's largest service provider, i.e., Google. First, with the exception of a few services that can only be accessed over HTTPS (e.g., Gmail), we find that many Google services are still vulnerable to simple session hijacking. Next, we present the Historiographer, a novel attack that reconstructs the web search history of Google users, i.e., Google's Web History, even though such a service is supposedly protected from session hijacking by a stricter access control policy. The Historiographer uses a reconstruction technique inferring search history from the personalized suggestions fed by the Google search engine. We validate our technique through experiments conducted over real network traffic and discuss possible countermeasures. Our attacks are general and not only specific to Google, and highlight privacy concerns of mixed architectures using both secure and insecure connections.

📄 Full Content

Private Information Disclosure from Web Searches (The case of Google Web History)

Claude Castelluccia (1), Emiliano De Cristofaro (2), Daniele Perito (1)

(1) INRIA Rhone Alpes, Montbonnot, France, {claude.castelluccia, daniele.perito}@inrialpes.fr
(2) Information and Computer Science, University of California, Irvine, edecrist@uci.edu

arXiv:1003.3242v3 [cs.CR] 23 Mar 2010

Abstract. As the amount of personal information stored at remote service providers increases, so does the danger of data theft. When connections to remote services are made in the clear and authenticated sessions are kept using HTTP cookies, data theft becomes extremely easy to achieve. In this paper, we study the architecture of the world’s largest service provider, i.e., Google. First, with the exception of a few services that can only be accessed over HTTPS (e.g., Gmail), we find that many Google services are still vulnerable to simple session hijacking. Next, we present the Historiographer, a novel attack that reconstructs the web search history of Google users, i.e., Google’s Web History, even though such a service is supposedly protected from session hijacking by a stricter access control policy. The Historiographer uses a reconstruction technique inferring search history from the personalized suggestions fed by the Google search engine. We validate our technique through experiments conducted over real network traffic and discuss possible countermeasures. Our attacks are general and not only specific to Google, and highlight privacy concerns of mixed architectures using both secure and insecure connections.

Update: Our report was sent to Google on February 23rd, 2010. Google is investigating the problem and has decided to temporarily suspend search suggestions from Search History. Furthermore, the Google Web History page is now offered over HTTPS only. Updated information about this project is available at: http://planete.inrialpes.fr/projects/private-information-disclosure-from-web-searches

1 Introduction

With the emergence of cloud-based computing, users store an increasing amount of information at remote service providers. User profiling techniques can complement such information automatically. Cloud-based services often come at no cost for the users, while service providers leverage considerable amounts of user profiling information to deliver targeted advertisement. Such a business model appears to benefit all parties. However, storing large amounts of personal information with external providers raises privacy concerns. Privacy advocates have highlighted the conceptual and practical dangers of personal data exposure over the Internet [12,13,14,15]. In this paper, we analyze private information potentially leaked from web searches to third parties, rather than focusing on data disclosed to service providers.

The case of Google Web History. Because Google is the world’s largest service provider (according to alexa.com), we focus on the case of Google. In particular, we analyze one Google service: Web History. It provides users with personalized search results based on the history of their searches and navigation. Such a history is accessible at http://google.com/history. For more details, we refer to Section 2.

Web searches have been shown to be often sensitive [15]. Any information leaked from search histories could endanger user privacy. For example, the spread of influenza and the number of related search queries divided by region have been successfully correlated [17]: this suggests that search histories contain health-related data and possibly other personal information, including, but not restricted to, political or religious views and sexual orientation. Furthermore, AOL’s release in 2006 of 20 million nominally anonymized searches underlined that search queries contain private information [11].

The privacy of personal data stored by service providers has long been threatened by the well-known attacks that hijack users’ HTTP cookies. These attacks have been addressed by Google in several ways. For instance, “sensitive” services such as Gmail now enforce secure HTTPS communication by default and transmit authentication cookies only over encrypted connections. As for the privacy of the Google Web History, its login page states: “To help protect your privacy, we’ll sometimes ask you to verify your password even though you’re already signed in. This may happen more frequently for services like Web History which involves your personal information”. Frequently requesting users to re-enter their credentials can thwart the session hijacking attack; however, as illustrated in this paper, such an attack can still be effective if a user has just signed in. Moreover, we show that search histories can still be reconstructed even though the Web History page is inaccessible by hijacking cookies.

The Historiographer. To this end, we successfully design the Historiographer, an attack that reconstructs the history of web searches conducted by users on Google.
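As an aside on the cookie-hijacking exposure discussed in this introduction, the difference between a cookie that may travel in the clear and one transmitted only over encrypted connections can be illustrated with a small sketch. It is not taken from the paper: the cookie names and values are hypothetical, the parser is deliberately simplified (no quoting, domain, or path matching), and it only models the standard `Secure` attribute of the `Set-Cookie` header.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cookie:
    name: str
    value: str
    secure: bool  # True if the Set-Cookie header carried the Secure attribute


def parse_set_cookie(header: str) -> Cookie:
    """Parse one Set-Cookie header value (simplified: ignores quoting rules)."""
    parts = [p.strip() for p in header.split(";")]
    name, _, value = parts[0].partition("=")
    # Attribute names are case-insensitive; flags such as Secure have no value.
    attributes = {p.split("=", 1)[0].strip().lower() for p in parts[1:] if p}
    return Cookie(name.strip(), value.strip(), "secure" in attributes)


def cookies_sent(jar: list[Cookie], scheme: str) -> list[str]:
    """Names of cookies a browser would attach to a request using `scheme`."""
    return [c.name for c in jar if scheme == "https" or not c.secure]


# Hypothetical cookies, for illustration only.
jar = [
    parse_set_cookie("SESSION_PLAIN=abc123; Path=/"),
    parse_set_cookie("SESSION_SECURE=def456; Path=/; Secure; HttpOnly"),
]

exposed = cookies_sent(jar, "http")     # readable by a passive eavesdropper
protected = cookies_sent(jar, "https")  # travels inside the encrypted channel
print("sent over http: ", exposed)
print("sent over https:", protected)
```

In this model, any cookie listed for the `http` scheme is what an eavesdropper on an open network could capture and replay, which is the situation the mixed secure/insecure architecture leaves open.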

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
