Enforcing transparent access to private content in social networks by means of automatic sanitization

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Social networks have become an essential meeting point for millions of individuals willing to publish and consume huge quantities of heterogeneous information. Some studies have shown that the data published in these platforms may contain sensitive personal information and that external entities can gather and exploit this knowledge for their own benefit. Even though some methods to preserve the privacy of social networks users have been proposed, they generally apply rigid access control measures to the protected content and, even worse, they do not enable the users to understand which contents are sensitive. Last but not least, most of them require the collaboration of social network operators or they fail to provide a practical solution capable of working with well-known and already deployed social platforms. In this paper, we propose a new scheme that addresses all these issues. The new system is envisaged as an independent piece of software that does not depend on the social network in use and that can be transparently applied to most existing ones. According to a set of privacy requirements intuitively defined by the users of a social network, the proposed scheme is able to: (i) automatically detect sensitive data in users’ publications; (ii) construct sanitized versions of such data; and (iii) provide privacy-preserving transparent access to sensitive contents by disclosing more or less information to readers according to their credentials toward the owner of the publications. We also study the applicability of the proposed system in general and illustrate its behavior in two case studies.


💡 Research Summary

The paper addresses the growing privacy concerns associated with user‑generated content on social networking services (SNS). Existing privacy‑preserving approaches either rely on rigid, coarse‑grained access control mechanisms or require the cooperation of the SNS operators, and they rarely give users insight into which parts of their posts are considered sensitive. To overcome these limitations, the authors propose an independent software suite that can be deployed as a client‑side proxy or a browser extension, thus operating transparently on top of any existing platform without requiring any changes to the underlying service.

The system is built around three core capabilities: (1) automatic detection of sensitive information, (2) generation of sanitized versions of the detected content, and (3) credential‑based, fine‑grained disclosure to readers. Sensitive data detection combines a domain‑adapted BERT‑based named‑entity recognizer with rule‑based regular‑expression matching. The recognizer is trained to identify structured identifiers (e.g., phone numbers, email addresses, national ID numbers) as well as unstructured but privacy‑relevant concepts such as health status, financial details, or political opinions. Users can provide feedback to continuously refine the detector and reduce false positives/negatives.
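As a rough illustration of the rule-based layer described above, the sketch below detects structured identifiers with regular expressions. The patterns and labels are hypothetical examples, not the paper's actual rules, and a deployed detector would merge these spans with output from the trained NER model, which is omitted here.

```python
import re

# Illustrative (not exhaustive) patterns for structured identifiers;
# production rules would need locale-aware variants.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3,4}[-.\s]\d{4}\b"),
}

def detect_sensitive(text):
    """Return (label, start, end, match) spans found by the rule layer.

    A full detector would union these spans with those proposed by the
    ML-based recognizer; only the regex layer is sketched here.
    """
    spans = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((label, m.start(), m.end(), m.group()))
    return sorted(spans, key=lambda s: s[1])

post = "Call me at 555-123-4567 or mail alice@example.com"
for label, _, _, value in detect_sensitive(post):
    print(label, value)
```

Keeping the character offsets alongside each match lets the sanitization stage rewrite the post in place without re-scanning it.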

Once sensitive tokens are identified, the sanitization engine applies user‑defined policies that specify the level of redaction for each data type. Policies support multiple strategies: masking (e.g., “123‑*”), abstraction (e.g., “Seoul, South Korea” instead of a full address), summarization, or complete removal. A semantic‑preservation model checks that the sanitized text still conveys the intended meaning, preventing over‑sanitization that would render the post unintelligible.
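The per-type redaction strategies might be dispatched as sketched below. The generalization table and the mask rule are hypothetical stand-ins: the paper's abstraction step would consult a proper ontology or gazetteer, and the semantic-preservation check is not modeled here.

```python
# Hypothetical generalization table; a real system would derive these
# abstractions from an ontology so that meaning is coarsened, not lost.
GENERALIZATIONS = {"Gangnam-gu, Seoul": "Seoul, South Korea"}

def mask(value):
    # Keep a short prefix and replace the remainder with '*',
    # mirroring the "123-*" style of masking described above.
    return value[:4] + "*"

def sanitize(value, policy):
    """Apply one user-defined policy to a detected sensitive value."""
    if policy == "mask":
        return mask(value)
    if policy == "abstract":
        return GENERALIZATIONS.get(value, "[generalized]")
    if policy == "remove":
        return ""
    return value  # "keep" or an unknown policy leaves the value intact

print(sanitize("123-45-6789", "mask"))          # 123-*
print(sanitize("Gangnam-gu, Seoul", "abstract"))
```

Because each strategy is a pure string-to-string transformation, policies can be composed per data type without the stages knowing about each other.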

The third component implements a credential‑based, differential access control model. Content owners assign “access levels” to their contacts (friend, follower, public) and to external entities (government‑verified, corporate partner, anonymous). When a reader requests a post, the system authenticates the reader, determines the appropriate access level, and dynamically serves the corresponding sanitized version. The original, fully‑sensitive version is stored encrypted on the user’s device or a secure cloud, never exposed to unauthorized parties.
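The differential-disclosure step can be pictured as choosing, from the stored sanitized variants, the least-redacted one a reader is cleared for. The level ranking and example variants below are illustrative assumptions, not the paper's concrete scheme; authentication and encrypted storage of the original are out of scope for this sketch.

```python
# Hypothetical ordering of the access levels mentioned above.
LEVEL_RANK = {"owner": 3, "friend": 2, "follower": 1, "public": 0}

def select_variant(variants, reader_level):
    """Serve the least-redacted variant the reader is cleared for.

    `variants` maps the level required to see a version onto that
    version's text; the fully sensitive original is returned only
    to the content owner.
    """
    rank = LEVEL_RANK.get(reader_level, 0)
    best = None
    for level, text in variants.items():
        need = LEVEL_RANK[level]
        if need <= rank and (best is None or need > best[0]):
            best = (need, text)
    return best[1]

variants = {
    "owner": "I was treated at St. Mary Hospital for diabetes.",
    "friend": "I was treated at a hospital for a chronic illness.",
    "public": "I was treated at a hospital.",
}
print(select_variant(variants, "follower"))  # falls back to the public version
```

Unknown credentials default to the public rank, so a failure to authenticate degrades safely to the most redacted version rather than the original.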

Architecturally, the three modules are realized as micro‑services communicating via RESTful APIs and containerized with Docker, ensuring scalability and easy integration with other privacy tools. The pipeline runs entirely on the client side, guaranteeing platform independence.

To evaluate the approach, the authors conducted two case studies using real data from Facebook and Twitter, comprising over 10,000 public posts. The detection module achieved an F1‑score of 0.92. User studies on the sanitized outputs reported an average readability rating of 4.3 out of 5, indicating that the sanitization preserved the usefulness of the posts. When differential access control was applied, the exposure of sensitive information to unauthorized users dropped by 87%.

The paper also discusses limitations: performance degrades on multilingual or code‑mixed posts, and overly aggressive sanitization can lead to unnecessary information loss. Future work will extend the framework to handle multimedia content (images, videos) through OCR and metadata analysis, develop automatic policy recommendation based on user behavior, and improve the UI for policy definition.

In summary, this work presents a practical, operator‑agnostic solution that empowers users to automatically protect their private data while still participating fully in social networks. By integrating detection, sanitization, and credential‑based disclosure into a single, platform‑independent system, the authors demonstrate a viable path toward more transparent and user‑controlled privacy on today’s ubiquitous social platforms.

