Towards Automatic Personality Prediction Using Facebook Like Categories

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We demonstrate that easily accessible digital records of behavior, such as Facebook Likes, can be obtained and used to automatically distinguish a wide range of highly sensitive personal traits, including life satisfaction, cultural ethnicity, political views, age, gender, and personality traits. The analysis is based on a dataset of over 738,000 users who shared their Facebook Likes, social network activities, egocentric network, demographic characteristics, and the results of various psychometric tests for our extended personality analysis. The proposed model uses a unique mapping technique from each Facebook Like object to the corresponding Facebook page category/sub-category object; these categories are then evaluated as features for a set of machine learning algorithms that predict individual psycho-demographic profiles from Likes. The model distinguishes between a religious and a non-religious individual in 83% of cases, between Asian and European in 87% of cases, and between emotionally stable and emotionally unstable in 81% of cases. We provide examples of correlations between attributes and Likes and present suggestions for future directions.

💡 Research Summary

The paper investigates whether Facebook “Likes” can be leveraged to predict users’ psychological and demographic traits, focusing on the Big‑Five personality model. Using the publicly available myPersonality datasets, the authors combined the “big5” questionnaire scores with the “user likes” records for over 738,000 users. Each Like is mapped, via the Facebook Graph API, to its page category and sub‑category (e.g., Politics, Sports, Arts). These categorical metadata become the feature set: for every user, the proportion of Likes belonging to each category is computed, normalising raw counts to mitigate biases caused by differing total numbers of Likes across users.
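The per-user feature construction described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes each Like has already been resolved to a category string (the Graph API lookup is not shown), and the function name `category_proportions` and the toy category list are hypothetical.

```python
from collections import Counter


def category_proportions(like_categories, all_categories):
    """Return the share of a user's Likes that falls in each known category.

    like_categories: one category string per Like (assumed already resolved).
    all_categories:  the fixed feature vocabulary (hypothetical here).
    """
    counts = Counter(like_categories)
    total = len(like_categories)
    if total == 0:
        # A user with no Likes gets an all-zero feature vector.
        return {category: 0.0 for category in all_categories}
    # Dividing by the user's total removes the effect of differing Like counts.
    return {category: counts.get(category, 0) / total for category in all_categories}


# Toy data, invented for illustration only.
categories = ["Politics", "Sports", "Arts"]
user_likes = ["Sports", "Sports", "Arts", "Politics"]
features = category_proportions(user_likes, categories)
print(features)  # {'Politics': 0.25, 'Sports': 0.5, 'Arts': 0.25}
```

Because each value is a share of the user's own total, a user with 40 Likes and a user with 4,000 Likes end up on the same scale.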

Two sampling strategies are employed. Random sampling provides an unbiased split of the data, while stratified sampling groups users by their Big‑Five scores and draws equal numbers from each group, ensuring that minority personality profiles are represented in the test set.
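A rough sketch of the stratified strategy, under stated assumptions: the summary does not say how users are grouped by score, so the equal-width binning below, along with the names `stratified_sample`, `score_of`, `n_bins`, and `per_bin`, is hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(users, score_of, n_bins, per_bin, seed=0):
    """Draw up to `per_bin` users from each equal-width score bin.

    The binning scheme is an assumption made for illustration.
    """
    rng = random.Random(seed)
    scores = [score_of(user) for user in users]
    lo, hi = min(scores), max(scores)
    # Fall back to a width of 1.0 when every score is identical.
    width = (hi - lo) / n_bins or 1.0
    bins = defaultdict(list)
    for user, score in zip(users, scores):
        # Clamp so the maximum score lands in the last bin.
        index = min(int((score - lo) / width), n_bins - 1)
        bins[index].append(user)
    sample = []
    for index in sorted(bins):
        members = bins[index]
        sample.extend(rng.sample(members, min(per_bin, len(members))))
    return sample


# Toy data: eight low scorers and two high scorers (invented values).
users = [("u%d" % i, 1.0 + 0.1 * i) for i in range(8)] + [("u8", 4.5), ("u9", 5.0)]
sample = stratified_sample(users, score_of=lambda u: u[1], n_bins=2, per_bin=2)
high_scorers = [u for u in sample if u[1] >= 3.0]
print(len(sample), len(high_scorers))
```

A plain random draw of four users from this toy set would often contain no high scorers at all; drawing per bin guarantees the rare group is represented.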

Four machine‑learning algorithms are evaluated: (1) gradient‑boosted decision trees (XGBoost), (2) linear regression, (3) k‑Nearest Neighbours (k tuned between 10 and 15, with a custom penalty for missing categories), and (4) a multilayer perceptron neural network. Regression performance is measured with Root Mean Squared Error (RMSE); classification metrics (precision, recall) are reported only for a Random Forest classifier, which proved less suitable for the continuous‑valued personality scores.
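To make the k‑NN variant and the RMSE metric concrete, here is a small sketch. The summary does not specify the penalty scheme, so the one below (a fixed cost for each category that only one of the two users has Likes in) is an assumption, as are the names `penalised_distance` and `knn_predict` and the default `penalty=0.1`.

```python
import math


def penalised_distance(a, b, penalty=0.1):
    """Euclidean distance over category shares plus a mismatch penalty.

    Assumed scheme: add `penalty` for every category in which exactly one
    of the two users has any Likes. The paper's actual penalty may differ.
    """
    squared = 0.0
    mismatches = 0
    for category in set(a) | set(b):
        x, y = a.get(category, 0.0), b.get(category, 0.0)
        squared += (x - y) ** 2
        if (x == 0.0) != (y == 0.0):
            mismatches += 1
    return math.sqrt(squared) + penalty * mismatches


def knn_predict(query, training, k=10, penalty=0.1):
    """Predict a trait score as the mean score of the k nearest users."""
    if not training:
        raise ValueError("training set must not be empty")
    ranked = sorted(training, key=lambda item: penalised_distance(query, item[0], penalty))
    nearest = ranked[:k]
    return sum(score for _, score in nearest) / len(nearest)


def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length score lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy training set of (category shares, trait score) pairs; values invented.
training = [
    ({"Sports": 1.0}, 2.0),
    ({"Sports": 0.5, "Arts": 0.5}, 3.0),
    ({"Arts": 1.0}, 4.0),
]
prediction = knn_predict({"Arts": 1.0}, training, k=2)
error = rmse([2.0, 4.0], [3.0, 5.0])
print(prediction, error)
```

The toy call uses k=2 only because the example set has three users; the study reports tuning k between 10 and 15.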

Results show that boosted trees achieve the lowest RMSE across all five traits, with “Openness” predicted most accurately (≈8% average error). Linear regression follows closely, while k‑NN performs comparably when the penalty scheme is applied. The neural network does not surpass the simpler models, likely due to over‑fitting and the high dimensionality of sparse categorical features. Feature‑importance analysis reveals that the total number of Likes dominates predictive power, and that specific categories (politics, sports, arts, etc.) correlate strongly with particular traits. Filtering users to those with at least 250 Likes reduces the dataset by about 75% but improves per‑user prediction accuracy, indicating that richer individual profiles outweigh sheer sample size.
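The Like-count filter mentioned above amounts to a simple threshold on each user's number of Likes. A minimal sketch, with a hypothetical function name and invented toy data (the 50% reduction it prints is a property of the toy data, not the study's figure):

```python
def filter_by_min_likes(likes_by_user, min_likes=250):
    """Keep only users who have at least `min_likes` Likes."""
    return {
        user: likes
        for user, likes in likes_by_user.items()
        if len(likes) >= min_likes
    }


# Toy data: each user maps to a list of placeholder Like IDs.
likes_by_user = {
    "a": list(range(300)),
    "b": list(range(10)),
    "c": list(range(250)),
    "d": list(range(40)),
}
kept = filter_by_min_likes(likes_by_user)
removed_fraction = 1 - len(kept) / len(likes_by_user)
print(sorted(kept), removed_fraction)
```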

The authors discuss limitations and future work. They propose enriching the categorical metadata with external knowledge bases such as DBpedia to capture semantic relationships between pages, which could reduce feature sparsity and improve distance‑based models. Ethical considerations are highlighted: large‑scale extraction of personal data raises privacy concerns, and the authors call for clear legal and societal guidelines governing such predictive systems.

A web‑based prototype is built, allowing individuals to log in with their Facebook account, retrieve their Likes, and receive an instant estimate of their Big‑Five scores. The study demonstrates that Facebook page categories, a relatively lightweight form of metadata, are sufficient to infer personality traits with reasonable accuracy, opening avenues for personalized marketing, political campaigning, and user‑centric services, while also underscoring the need for responsible handling of digital behavioural data.
