Cross-Language Domain Adaptation for Classifying Crisis-Related Short Messages

Reading time: 5 minutes

📝 Abstract

Rapid crisis response requires real-time analysis of messages. After a disaster happens, volunteers attempt to classify tweets to determine needs, e.g., supplies, infrastructure damage, etc. Given labeled data, supervised machine learning can help classify these messages. Scarcity of labeled data causes poor classifier performance. Can we reuse old tweets to train classifiers? How can we choose labeled tweets for training? Specifically, we study the usefulness of labeled data from past events. Do labeled tweets in a different language help? We observe the performance of our classifiers trained using different combinations of training sets obtained from past disasters. We perform extensive experimentation on real crisis datasets and show that the past labels are useful when both source and target events are of the same type (e.g., both earthquakes). For similar languages (e.g., Italian and Spanish), cross-language domain adaptation was useful; however, for different languages (e.g., Italian and English), the performance decreased.

📄 Content

Imran et al. Domain Adaptation for Classifying Crisis-Related Messages

Long Paper – Social Media Studies
Proceedings of the ISCRAM 2016 Conference – Rio de Janeiro, Brazil, May 2016
Tapia, Antunes, Bañuls, Moore and Porto, eds.

Cross-Language Domain Adaptation for Classifying Crisis-Related Short Messages

Muhammad Imran
Qatar Computing Research Institute, HBKU
Doha, Qatar
mimran@qf.org.qa

Prasenjit Mitra
Qatar Computing Research Institute, HBKU
Doha, Qatar
pmitra@qf.org.qa

Jaideep Srivastava
Qatar Computing Research Institute, HBKU
Doha, Qatar
jsrivastava@qf.org.qa

ABSTRACT

Rapid crisis response requires real-time analysis of messages. After a disaster happens, volunteers attempt to classify tweets to determine needs, e.g., supplies, infrastructure damage, etc. Given labeled data, supervised machine learning can help classify these messages. Scarcity of labeled data causes poor classifier performance. Can we reuse old tweets to train classifiers? How can we choose labeled tweets for training? Specifically, we study the usefulness of labeled data from past events. Do labeled tweets in a different language help? We observe the performance of our classifiers trained using different combinations of training sets obtained from past disasters. We perform extensive experimentation on real crisis datasets and show that the past labels are useful when both source and target events are of the same type (e.g., both earthquakes). For similar languages (e.g., Italian and Spanish), cross-language domain adaptation was useful; however, for different languages (e.g., Italian and English), the performance decreased.

Keywords

Social media, tweets classification, domain adaptation

INTRODUCTION

Microblogging platforms such as Twitter provide active communication channels during the onset of mass convergence events such as natural disasters (Palen et al., 2009; Hughes et al., 2009; Starbird et al., 2010; Vieweg et al., 2010). In recent years, Twitter has been used to spread news about casualties and damage, donation offers and requests, and alerts, including multimedia information such as videos and photos (Cameron et al., 2012; Imran et al., 2013a; Qu et al., 2011). Many studies show the significance of this online information (Vieweg et al., 2014; Sakaki et al., 2010; Neubig et al., 2011). Moreover, it has been observed that these messages are usually communicated more quickly than disaster information shared via traditional channels such as news websites. For instance, the first tweet to report on the 2013 Westgate Mall attack was posted within a minute of the initial onslaught.1 Given the importance of crisis-related messages for time-critical situational awareness, disaster-affected communities and professional responders may benefit from using an automatic system to extract relevant information from social media.
1 http://www.ihub.co.ke/blog/2013/10/how-useful-is-a-tweet-a-review-of-the-first-tweets-of-the-westgate-attack

For rapid crisis response, real-time insights are important for emergency responders. To identify actionable and tactical informative pieces from a growing stack of social media information and to inform decision-making processes as early as possible, messages need to be processed as they arrive. Given the volume of the messages, we need to triage them. That is, we need to put them in different actionable bins such as food, supplies, financial, logistics, etc. so that disaster-response professionals can quickly look into each bin and identify the needs. Different approaches can be employed to filter and classify these messages. For instance, many humanitarian organizations use the Digital Humanitarian Network (DHN)2 of volunteers to analyze messages one by one to find actionable information. However, given the amount of information that needs to be triaged, and the scarcity of volunteers, we would ideally like the messages to be categorized automatically and volunteers to use their time to perform higher-order tasks. Despite advances in natural language processing, full automation is not feasible. Most classifiers that achieve high accuracies in solving different classification tasks are based on supervised machine learning, where humans provide a set of training samples consisting of positive and negative examples for each classification category. A semi-automated system having similar characteristics to DHN is AIDR (Artificial Intelligence for Disaster Response) (Imran et al., 2014). AIDR can be trained to then automatically process and classify messages at high speed using a supervised classification technique. The AIDR platform c
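The supervised setup described above, where labeled messages from a past (source) event train a classifier that is then applied to a new (target) event, can be illustrated with a minimal sketch. This is not the authors' pipeline: this excerpt does not name their classifier or features, so a plain bag-of-words multinomial Naive Bayes is used here as a stand-in, and the messages, labels, and all names (`NaiveBayes`, `SOURCE_EVENT`, `TARGET_EVENT`) are invented for illustration only.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, messages, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(messages, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in the source event carry no evidence; skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in sorted(self.label_counts.items()):
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, messages, labels):
    """Fraction of messages whose predicted label matches the given label."""
    hits = sum(model.predict(m) == y for m, y in zip(messages, labels))
    return hits / len(labels)


# Invented toy data: labeled messages from a hypothetical past event ...
SOURCE_EVENT = [
    ("we need water and food at the shelter", "supplies"),
    ("please send blankets and medicine", "supplies"),
    ("bridge collapsed after the earthquake", "infrastructure"),
    ("roads are blocked and buildings damaged", "infrastructure"),
]
# ... and from a hypothetical new event of the same type and language.
TARGET_EVENT = [
    ("families need food and medicine urgently", "supplies"),
    ("the earthquake damaged the main bridge", "infrastructure"),
]

src_x, src_y = zip(*SOURCE_EVENT)
tgt_x, tgt_y = zip(*TARGET_EVENT)
model = NaiveBayes().fit(src_x, src_y)
target_accuracy = accuracy(model, tgt_x, tgt_y)
print(f"accuracy on target event: {target_accuracy:.2f}")
```

Because the model relies on word overlap between source and target messages, a target event in a language that shares little vocabulary with the source would leave most tokens unseen. That is one intuition, under this sketch's assumptions, for why transfer between distant languages can degrade.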

This content is AI-processed based on ArXiv data.
