Social Event Detection with Interaction Graph Modeling
This paper focuses on detecting social, physical-world events from photos posted on social media sites. The problem is important: cheap media capture devices have significantly increased the number of photos shared on these sites. The main contribution of this paper is to incorporate online social interaction features into the detection of physical events. We believe that online social interactions reflect important signals among participants about the "social affinity" of two photos, thereby helping event detection. We compute social affinity via a random walk on a social interaction graph to determine the similarity between two photos on the graph. We train a support vector machine classifier to combine the social affinity between photos with photo-centric metadata, including time, location, tags, and description. Incremental clustering is then used to group photos into event clusters. We report strong results on two large-scale real-world datasets: Upcoming and MediaEval, with an improvement of 0.06-0.10 in F1 on these datasets.
💡 Research Summary
The paper presents a novel framework for detecting real‑world events from photos posted on social media by explicitly incorporating online social interaction signals. The authors first construct a multi‑relational interaction graph where each node represents a photo and edges capture various user‑level relationships such as co‑posting, commenting, liking, and follower connections. To quantify the “social affinity” between any two photos, they run a Random Walk with Restart (RWR) on this graph, interpreting the steady‑state probability of reaching one photo from another as a measure of shared social context.
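To make the affinity computation concrete, here is a minimal sketch of Random Walk with Restart via power iteration. The graph, edge weights, and the restart probability of 0.15 are illustrative assumptions, not values taken from the paper; the steady-state probability of visiting photo *j* when restarting at photo *i* serves as the affinity score.

```python
import numpy as np

def rwr_affinity(adj, restart=0.15, tol=1e-8, max_iter=1000):
    """Random Walk with Restart: for each seed node, iterate
    p <- (1 - c) * W p + c * e_seed until convergence, where W is the
    column-normalized adjacency matrix. Row i of the result holds the
    steady-state visiting probabilities when restarting at node i."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    W = adj / col_sums
    affinity = np.zeros((n, n))
    for seed in range(n):
        e = np.zeros(n)
        e[seed] = 1.0
        p = e.copy()
        for _ in range(max_iter):
            p_next = (1 - restart) * (W @ p) + restart * e
            if np.abs(p_next - p).sum() < tol:
                p = p_next
                break
            p = p_next
        affinity[seed] = p
    return affinity

# Toy interaction graph over 4 photos; edge weight = interaction strength.
A = np.array([[0, 3, 1, 0],
              [3, 0, 2, 0],
              [1, 2, 0, 1],
              [0, 0, 1, 0]], dtype=float)
aff = rwr_affinity(A)
```

Because the restart vector keeps probability mass near the seed, strongly interacting photo pairs (here, 0 and 1) receive higher affinity than pairs connected only through intermediaries (0 and 3).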
In parallel, traditional photo‑centric metadata—timestamp, GPS coordinates, hashtags, and caption text—are normalized and vectorized (Gaussian kernel for time, haversine distance for space, TF‑IDF for text). The social affinity score is then concatenated with these metadata features to form a comprehensive feature vector for each photo pair. A linear Support Vector Machine (SVM) is trained to predict whether two photos belong to the same event, effectively learning a weighted similarity function that balances social and content cues.
For clustering, the authors adopt an incremental clustering algorithm. As each new photo arrives, its similarity to existing cluster representatives is evaluated using the SVM’s probability output. If the similarity exceeds a predefined threshold, the photo is assigned to that cluster; otherwise, a new cluster is created. This online‑friendly approach allows the system to scale to streaming data and to adaptively discover an unknown number of events.
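The single-pass clustering loop can be sketched as follows. Using the first photo of each cluster as its representative and a threshold of 0.5 are simplifying assumptions; in the paper's setup the similarity function would be the SVM's probability output.

```python
def incremental_cluster(photos, same_event_prob, threshold=0.5):
    """Single-pass clustering: attach each arriving photo to the
    best-matching cluster representative, or start a new cluster
    when no score exceeds the threshold."""
    clusters = []  # each cluster is a list; its first element is the representative
    for photo in photos:
        best, best_score = None, threshold
        for cluster in clusters:
            score = same_event_prob(photo, cluster[0])
            if score > best_score:
                best, best_score = cluster, score
        if best is not None:
            best.append(photo)
        else:
            clusters.append([photo])
    return clusters

# Toy similarity: two "photos" match when their numeric ids are close.
sim = lambda a, b: 1.0 if abs(a - b) <= 1 else 0.0
groups = incremental_cluster([1, 2, 10, 11, 2], sim)
```

Because clusters are created on demand, the number of events never needs to be specified in advance, which is what makes the approach suitable for streaming photo data.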
The methodology is evaluated on two large‑scale public datasets: Upcoming (≈200 K photos, 2.5 K events) and MediaEval (≈100 K photos, 1.8 K events). Baselines include pure metadata clustering, visual‑feature‑based clustering using CNN descriptors, and a simplistic graph‑only model. The proposed system achieves F1 improvements of 0.06–0.10 over all baselines, with the most pronounced gains on events that generate dense social interaction (concerts, sports matches, festivals). These results demonstrate that social affinity captures complementary information that traditional cues miss.
The authors acknowledge limitations: the interaction graph requires sufficient user‑relationship data, which may be unavailable or restricted by privacy policies; and the RWR computation can become costly on very large graphs. They suggest future work on graph embedding techniques (e.g., node2vec, DeepWalk) or Graph Neural Networks to obtain scalable affinity estimates, as well as extending the model to incorporate multimodal signals such as audio or video. Overall, the paper makes a compelling case that blending social network dynamics with conventional metadata yields a more robust and accurate event detection system for the ever‑growing stream of user‑generated photos.