Enhancing Flood Impact Analysis using Interactive Retrieval of Social Media Images


Authors: Björn Barz, Kai Schröter, Moritz Münch, Bin Yang, Andrea Unger, Doris Dransch, Joachim Denzler

Abstract

The analysis of natural disasters such as floods in a timely manner often suffers from limited data due to a coarse distribution of sensors or sensor failures. This limitation could be alleviated by leveraging information contained in images of the event posted on social media platforms, so-called "Volunteered Geographic Information (VGI)". To save the analyst from the need to inspect all images posted online manually, we propose to use content-based image retrieval with the possibility of relevance feedback for retrieving only relevant images of the event to be analyzed. To evaluate this approach, we introduce a new dataset of 3,710 flood images, annotated by domain experts regarding their relevance with respect to three tasks (determining the flooded area, inundation depth, water pollution). We compare several image features and relevance feedback methods on that dataset, mixed with 97,085 distractor images, and are able to improve the precision among the top 100 retrieval results from 55% with the baseline retrieval to 87% after 5 rounds of feedback.

Björn Barz, Joachim Denzler: Friedrich Schiller University Jena, Computer Vision Group, Ernst-Abbe-Platz 2, 07743 Jena, Germany, {bjoern.barz, joachim.denzler}@uni-jena.de
Kai Schröter, Moritz Münch: Deutsches GeoForschungsZentrum, Sec. Hydrology, Telegrafenberg, 14473 Potsdam, Germany, {kai.schroeter, moritz.muench}@gfz-potsdam.de
Bin Yang, Andrea Unger, Doris Dransch: Deutsches GeoForschungsZentrum, Sec. Geoinformatics, Telegrafenberg, 14473 Potsdam, Germany, {bin.yang, andrea.unger, doris.dransch}@gfz-potsdam.de

Pre-print

1 Introduction

The rapid analysis of recent or current natural disasters such as floods is crucial to provide information about impacts as a basis for efficient disaster response and recovery. With an increasing availability of data and information channels, the identification and exchange of core information and the best possible up-to-date information is essential for disaster response (Turoff, 2002; Comfort et al, 2004). For recovery and improved disaster risk reduction, comprehensive image-based documentations of disaster dynamics help to gain insights and to improve our understanding of system behavior during extreme events. This knowledge is important to review and adapt flood prevention and protection concepts, which are the basis to mitigate adverse consequences from flooding.

However, event analyses often suffer from limited or insufficient data (Poser and Dransch, 2010; Thieken et al, 2016). For the case of flood mapping, traditional measuring devices such as water level gauges are expensive and hence only coarsely distributed. Malfunction and uncertainties of recordings during extreme events are known issues. On the other hand, there is usually a large amount of complementary information that could be derived from images posted by volunteers on social media platforms (Assumpção et al, 2018). Since most modern consumer devices are GPS-enabled and store the geographical location where an image has been taken in its metadata, these images could be used to derive information about the flood at locations where the sensor coverage is insufficient, to substitute failures of measurements, or to complement other pieces of information (Schnebele and Cervone, 2013; Fohringer et al, 2015).
The users of social media platforms posting images about natural disasters can thus be considered as human sensors providing so-called volunteered geographic information (VGI) (Goodchild, 2007), from which different types of information can be extracted and combined with the data obtained from traditional sensors. These data are potentially useful during all stages of disaster management (Poser and Dransch, 2010): To prepare for the case of a natural disaster, sufficient data about past events are necessary. During an event, the rapid availability of social media images would facilitate monitoring the extent and intensity of the disaster and the current status of response activities. For post-disaster recovery, on the other hand, up-to-date damage estimates are required for financial compensation, insurance payouts, and reconstruction planning.

However, the sheer amount of almost 6,000 tweets posted every second on Twitter alone (Krikorian, 2013) renders inspecting all of them intractable, even when the set of images is restricted to those within a certain region and timeframe. Therefore, an automated filter retrieving only those images that are relevant for the analysis is highly desirable.

The notion of relevance is usually not fixed but depends on the objective currently pursued by the analysts. In the case of flood impact analysis, hydrologists might sometimes be interested in determining whether a certain area is flooded or not, which might be difficult to detect based on just a few water level measurements or due to mobile flood-protection walls that may alter the flooding process and expected inundation areas. However, while VGI images can be of great benefit for this task, just retrieving all images of the flooding is not sufficient in general.
At another point of time, the information objective of the analysts might be to determine inundation depth as a key indicator of flooding intensity. In this regard, a different set of images would be relevant, showing visual clues for inundation depth such as partially flooded traffic signs or people walking through the water. Another example is the task of determining the type and degree of water pollution from images, which changes the notion of relevant image features drastically.

These image characteristics are sometimes difficult to verbalize in natural language. An example image, however, can often capture the search objective much more easily. Moreover, text-based search always runs the risk of missing relevant images with insufficient textual descriptions. Thus, we propose an approach based purely on the image content.

Since all the information objectives an analyst might have in mind constitute an open set, it is not possible to train a fixed set of classifiers for distinguishing between relevant and irrelevant samples. Instead, we propose an interactive image retrieval approach to assist the analyst in finding those images that are relevant with respect to the current task. This procedure is illustrated in Fig. 1: The user first provides a so-called query image that should capture the search objective reasonably well. The system then extracts image features for this query and compares it with all other images in the database or social media image stream using the Euclidean distance between images in the feature space. The result is a list of retrieved images, ranked by their proximity to the query. This procedure is known as content-based image retrieval (Smeulders et al, 2000) and has been an active topic of research since 1993 (Niblack et al). However, the results of this baseline retrieval will be suboptimal in most cases, since it is based on just a single query image. Thus, the system enables the user to flag some of the retrieved images as relevant or irrelevant. This information will then be incorporated to obtain a refined list of retrieval results, which should match the search interest pursued by the user more precisely. This step can be repeated until the user is satisfied with the result.

Fig. 1: Schematic illustration of our interactive image retrieval process: a query image provided by the user, representative for the task at hand; baseline retrieval of similar images, sorted by similarity; refinement based on feedback from the user/analyst pursuing a certain, complex objective of analysis.

In this work, we investigate how to construct an image retrieval pipeline that is suitable for retrieving flood images by comparing several types of features extracted from deep neural networks for the baseline retrieval and various approaches to incorporate relevance feedback. To enable a quantitative evaluation, we introduce a novel dataset comprising 3,435 images of the European Flood 2013 from Wikimedia Commons plus 275 images showing water pollution from various sources. All images have been annotated by domain experts with respect to their relevance regarding three pre-defined information objectives.

The remainder of this paper is organized as follows: We will first briefly review related work on using VGI images for disaster management in Section 2. Our novel flood dataset is introduced in Section 3, and various baseline retrieval and relevance feedback methods are described and evaluated in Section 4. Section 5 concludes this work and discusses directions for future research.
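The baseline retrieval step of this pipeline can be sketched in a few lines; the following is a minimal illustration (function names ours), assuming descriptors are compared with the Euclidean distance on L2-normalized feature vectors as described above:

```python
import numpy as np

def retrieve(query_feat, db_feats, k=100):
    """Rank database images by Euclidean distance to the query.

    query_feat: (D,) feature vector of the query image.
    db_feats:   (N, D) feature vectors of all database images.
    Returns the indices of the k nearest images, closest first.
    """
    # L2-normalize, so that the Euclidean distance depends only on the
    # direction of the descriptors, not on their magnitude.
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    dists = np.linalg.norm(db - q, axis=1)
    return np.argsort(dists)[:k]
```

For a real deployment, an exhaustive linear scan like this would be replaced by an approximate nearest-neighbor index once the database grows large.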
2 Related Work

Many approaches for leveraging VGI from social media focus on linguistic patterns (e.g., Ireson, 2009), text-based classification (e.g., Sakaki et al, 2010; Yin et al, 2012), and keyword-based filtering (e.g., Vieweg et al, 2010; Fohringer et al, 2015).

Similar to our motivation, Schnebele and Cervone (2013) used volunteered data that were retrieved using the photo, video, and news Google search engines for a flood in Memphis (US) in May 2011. This information was combined with remote sensing, digital elevation, and other data to produce flood extent and flood hazard maps. Twitter messages have been used by Brouwer et al (2017) to estimate flooding extents. While this approach uses only Twitter text message contents, it applies a set of keywords to filter relevant tweets. Geolocation information is derived from location references contained in the tweet.

Fohringer et al (2015) proposed to derive information about flood events from images posted on Twitter or Flickr and found them to contain "additional and potentially even exclusive information that is useful for inundation depth mapping". Likewise, Rosser et al (2017) retrieve geotagged imagery from Flickr using a defined study area and time window in combination with the keyword "flood". In this approach, only the image location is used to delineate flooded areas in combination with other data sources. However, both works do not employ any automatic image-based filtering but collect all tweets containing some predefined keywords and then analyze the relevance of all images included in these tweets manually. On the one hand, this tedious process is prohibitive for rapid flood impact estimation due to the time needed for inspecting all images.
On the other hand, the initial keyword-based filtering involves the risk of missing a large portion of relevant images due to the lack of matching keywords in the text. We show how these issues can be overcome using computer vision techniques for filtering based on the image content only.

3 A Dataset for Flood Image Retrieval

A quantitative evaluation of our interactive image retrieval approach demands a sufficient number of both flood and non-flood images. In addition, we need to know for each image whether it is relevant for a certain task or not. While obtaining non-flood-related images is rather easy, since any existing dataset such as, for example, the Flickr100k dataset (Philbin et al, 2007) comprising 100,031 images from Flickr could be used for that, finding a sufficient number of images relating to a certain flood event is more difficult. We used Wikimedia Commons as a source for flood images, since it already provides dedicated categories for major flood events, the images are released under a permissive Creative Commons license, and many of them contain geotags.

In the following subsections, we describe how we collected flood images and annotated them with respect to their relevance regarding three exemplary information objectives. The dataset, including metadata and annotations, can be obtained at https://github.com/cvjena/eu-flood-dataset.

3.1 Collecting Flood Images

The Wikidata project strives towards creating machine-readable representations of all structured information present in Wikipedia. This information can be queried fully automatically using the SPARQL query language, which allows retrieving a list of all flood events recorded on Wikipedia with an associated Wikimedia Commons category. We can then use the Wikimedia Commons API to fetch all images and their metadata from those categories automatically.
By a large margin, the highest number of images is available for the Central Europe floods of 2013 [1], whose category comprises 3,855 images in total (as of July 2017). We hence decided to use this event as a basis for our flood dataset. After excluding sub-categories that relate exclusively to public transportation during the flood but do not show actual flooding, a total of 3,435 images remain. However, these images do not show any water pollution, in which we are also interested. Thus, we added another set of 275 images to the dataset, which we have collected manually from the web by querying image search engines for the names of recent major oil spill events. To this end, we have again used a list of oil spills provided on Wikipedia [2].

[1] https://commons.wikimedia.org/wiki/?curid=26466898 (accessed: July 21st, 2017)
[2] https://en.wikipedia.org/w/index.php?title=List_of_oil_spills (accessed: May 30th, 2018)

3.2 Relevance Annotations

For a quantitative evaluation and comparison of several image retrieval methods, we need to simulate the behavior of a user of our proposed interactive image retrieval system. To enable such a simulation, we have defined a set of three common tasks, which could be pursued by a hydrologist using the system:

Flooded vs. dry: Does the image help to determine whether a certain area is flooded or not? Usually, one would assume flooding of a certain area based on the intersection of the water level height and the elevation of the terrain. However, the area might actually be dry due to a flood-protection wall, for example. An image considered as relevant would show the boundary between flooded and dry areas. Images that do not show any inundation at all are considered not relevant.
While such images could be used to track the spread of the flood at a certain location over time, we only consider the individual relevance of images in this work, ignoring aspects that might become relevant when compared with other images in the dataset.

Inundation depth: Is it possible to derive an estimate of the inundation depth from the image due to visual cues such as, for example, traffic signs or other structures with known height? If there is no flooding at all, the image is considered as not relevant for inundation depth.

Water pollution: Does the image show any pollution substances? The focus is on heavy contamination by chemical substances such as oil, for example.

Fig. 2: Venn diagram of the sets of images per task in our novel dataset.

Fig. 3: Examples for annotations of important image regions: (a) flooded vs. dry (© Matěj Baťha, CC BY-SA 3.0), (b) inundation depth (© Dr. Bernd Gross, CC BY-SA 3.0), (c) water pollution (© Kallol Mustafa, CC BY-SA 4.0).

Each image in the dataset has been assigned to one of several domain experts for annotation. Any image could be relevant for one, multiple, or none of the tasks described above. Figure 2 shows the number of images marked as relevant for each task, the overlap of the categories, and an example image for each task. 9.5% of the images were found not to show any flooding situation despite being associated with the flood in Wikimedia Commons. We assigned the special label "irrelevant" to these images and treat them in the same way as the distractor images from the Flickr100k dataset (see Section 4). Due to limited resources, we only obtained a single annotation for each image.
However, an additional domain expert was asked to assure the quality of random samples from the set of annotations and to select between 100 and 250 ideal query images for each task that reflect the search objective well and could be used as initial query images for our image retrieval approach.

3.3 Important Image Regions

Besides relevance annotations for images as a whole, we have also asked one domain expert to highlight important regions on some of the images selected as queries for each task. This aims to account for the fact that the relevance of a certain image is often due to a particular small part of the image, without which it would not be relevant at all, e.g., partially flooded traffic signs in the case of the inundation depth task. We also allowed the expert to mark multiple relevant regions per image and to create groups of regions that have to be present together in a single image for being relevant. Example annotations are shown in Fig. 3. We do not make use of this region-level information in our image retrieval system at the moment, but plan to do so in the future.

4 Interactive Image Retrieval

In the following, we describe and compare several methods for the two components of our interactive image retrieval pipeline depicted in Fig. 1: constructing a feature space for the baseline retrieval of similar images and incorporating relevance feedback provided by the user.

All methods are evaluated on a combination of our novel flood dataset introduced in Section 3 and images from Flickr100k (Philbin et al, 2007) as distractors. While Flickr100k comprises a total of 100,031 images, we excluded those tagged with "river" or "water", since some of them show flooding situations. After this, 97,085 distractor images remain, which we do not expect to show flooding given their tags.
The set of flood-related images from our novel dataset hence accounts for as few as 4% of the combined dataset.

We employ the normalized discounted cumulative gain (Järvelin and Kekäläinen, 2002) among the top 100 results (NDCG@100) as performance metric, which does not only measure the fraction of relevant images among the top 100 results but considers their order as well, assigning higher weights to earlier positions in the ranking. For a query q and a ranked list of n ≥ k retrieved images with relevance labels y_i ∈ {0, 1}, i = 1, ..., n, the NDCG@k is defined as:

    NDCG@k(y_1, ..., y_n | q) = [ Σ_{i=1}^{k} y_i / log2(i + 1) ] / [ Σ_{i=1}^{min{k, |R(q)|}} 1 / log2(i + 1) ],    (1)

where R(q) denotes the set of all images relevant for the query q. The best NDCG hence is 1.0 and the worst is 0.0. We cap the ranking at k = 100, since the advantage of our image retrieval system for finding relevant flood images vanishes if the user has to inspect more than 100 results.

In the following, we always report the average NDCG@100 over all 611 query images from our dataset identified as suitable by the domain experts, which are issued as individual queries to the system. Images from the dataset are considered as relevant with respect to a certain query if they are assigned to the label for which the query image has been selected as "ideal example".

4.1 Baseline Retrieval

The main challenge of content-based image retrieval (CBIR) is constructing a feature space where similar images lie close together, so that retrieval can be performed by searching for the nearest neighbors of the query in that space. The notion of similarity is often fuzzy and depends on the application. This relation is most often defined as two images either showing the same object, objects of the same class, or being "visually similar", which is difficult to formalize.
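The metric in Eq. (1) translates directly into code; a minimal sketch (function name ours):

```python
import math

def ndcg_at_k(relevance, num_relevant, k=100):
    """NDCG@k as in Eq. (1): discounted gain of the ranking,
    normalized by the gain of an ideal ranking.

    relevance:    0/1 labels y_i in ranked order (best match first).
    num_relevant: |R(q)|, total number of images relevant to the query.
    """
    # Positions are 1-based in Eq. (1); enumerate() is 0-based, hence i + 2.
    dcg = sum(y / math.log2(i + 2) for i, y in enumerate(relevance[:k]))
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, num_relevant)))
    return dcg / ideal if ideal > 0 else 0.0
```

A ranking that places all relevant images first scores 1.0; pushing a relevant image to a later position lowers the score due to the logarithmic discount.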
Traditional CBIR approaches usually consist in detecting invariant keypoints in an image, extracting handcrafted local descriptors from the neighborhood of these keypoints, embedding them in a high-dimensional space, and finally aggregating them into a single global image descriptor. A good summary of these approaches has been given by Babenko and Lempitsky (2015).

In the past few years, however, such approaches have been outperformed by deep-learning-based image features extracted from a convolutional neural network (CNN) (LeCun et al, 1989) pre-trained on a classification task. A CNN typically consists of a sequence of convolution operations with learned filters and non-linear activation functions in-between. After certain layers, the feature map is sub-sampled using local pooling operations. The result of the last convolutional layer is hence a low-resolution map of features for different regions of the image. These local feature vectors are then aggregated by averaging with either uniform or learned weights and fed through a sequence of so-called fully-connected layers, which essentially realize a multiplication of the features with a learned matrix followed by a non-linear activation function. In classification scenarios, the output of the final layer is interpreted as the logits of a probability distribution over the classes. The entire network is trained end-to-end using backpropagation (LeCun, 1985), so that all the intermediate feature representations are learned from data and optimized for the task at hand, where the degree of abstraction from visual to semantic features increases with the depth in the network (Zeiler and Fergus, 2014).

Surprisingly, Babenko et al (2014) found these features, which they extracted from the fully-connected layers of a pre-trained network, to also perform competitively for the task of content-based image retrieval.
The traditional CBIR approaches with handcrafted features were finally outperformed using local CNN features extracted from the last convolutional layer (Babenko and Lempitsky, 2015). These need to be aggregated into a global image descriptor first, which provides additional leeway for adapting the pre-trained features to the retrieval scenario. Besides simple average and maximum pooling, a variety of sophisticated pooling functions has been proposed in the past few years. In this work, we evaluate the following ones:

Average Pooling (avg): Uniform average (Babenko and Lempitsky, 2015).

Partial Mean Pooling (PMP): Averaging over the top 10% highest activations per channel (Zhi et al, 2016). This combines average and maximum pooling.

Generalized-Mean Pooling (GeM): Using the Lp-norm of each spatial feature map (Radenović et al, 2018), which generalizes between average (for p = 1) and maximum (for p → ∞) pooling. We have empirically found p = 2 to work well on our dataset.

Adaptive Co-Weighting (adacow): Combination of spatial and channel-wise weighting, where the spatial weights are based on the sum of activations at each position and channel weights are determined in a way such that frequently occurring bursty features get a low weight (Wang et al, 2018).

In all cases, we extract the local features to be aggregated from the last convolutional layer of the so-called VGG16 CNN architecture (Simonyan and Zisserman, 2014), pre-trained [3] for classification on millions of images from the ImageNet dataset (Russakovsky et al, 2015). This is the network architecture that was initially used by Babenko et al (2014) and Babenko and Lempitsky (2015) for the first CBIR approaches using neural features and has remained popular until today.
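The first three pooling functions are simple enough to sketch directly. The following is an illustration under the assumption of a (H, W, C) feature map with non-negative (post-ReLU) activations; function names are ours:

```python
import numpy as np

def avg_pool(fmap):
    """Uniform average over all spatial positions. fmap: (H, W, C)."""
    return fmap.reshape(-1, fmap.shape[-1]).mean(axis=0)

def pmp_pool(fmap, ratio=0.1):
    """Partial mean pooling: average the top `ratio` fraction of
    activations per channel (10% as in the paper)."""
    flat = np.sort(fmap.reshape(-1, fmap.shape[-1]), axis=0)[::-1]
    k = max(1, int(round(ratio * flat.shape[0])))
    return flat[:k].mean(axis=0)

def gem_pool(fmap, p=2.0):
    """Generalized-mean pooling: p = 1 recovers average pooling,
    p -> infinity approaches maximum pooling."""
    flat = fmap.reshape(-1, fmap.shape[-1])
    return np.power(np.power(flat, p).mean(axis=0), 1.0 / p)

def global_descriptor(fmap, pool=pmp_pool):
    """Aggregate a conv feature map into an L2-normalized descriptor."""
    d = pool(fmap)
    return d / np.linalg.norm(d)
```

Each function maps a (H, W, C) feature map to a C-dimensional vector, which is then L2-normalized as described below.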
We also evaluate global image features extracted from the first fully-connected (FC) layer of the same CNN, as done by Babenko et al (2014). This corresponds to a complex aggregation function with learned weights for both feature dimensions and spatial positions. Regardless of the aggregation function being used, we always L2-normalize the final global image descriptors, which has proven to be beneficial for image retrieval, because the direction of high-dimensional feature vectors often carries more information than their magnitude (Jégou and Zisserman, 2014; Horiguchi et al, 2019).

Besides the use of pre-trained CNNs for feature extraction, neural networks trained end-to-end specifically for image retrieval have shown superior performance recently. In this regard, we evaluate the approach of Gordo et al (2017), who extended a ResNet-101 architecture (He et al, 2016) with R-MAC pooling (sum-pooling over maximum-pooled features from several regions of interest; Tolias et al, 2016), followed by PCA and L2-normalization. This network has been trained for image retrieval on a landmarks dataset using a triplet loss, which enforces similar images to be closer together in the feature space than dissimilar ones. We denote this approach as "Deep R-MAC" [4].

[3] The pre-trained VGG16 model can be obtained at http://www.robots.ox.ac.uk/~vgg/research/very_deep/ (accessed July 10th, 2019).
[4] The pre-trained Deep R-MAC model can be obtained at https://github.com/figitaki/deep-retrieval (accessed July 10th, 2019).

Fig. 4: Baseline retrieval performance (NDCG@100): (a) single scale, (b) multi-resolution, (c) individual tasks.

The performance comparison in Fig. 4a shows that aggregated convolutional features perform significantly better than features extracted from fully-connected layers, which are presumably already too class-specific. The choice of the pooling method makes only a slight difference, with PMP performing best. Features from the Deep R-MAC network fine-tuned for object retrieval provided even better performance than VGG16, resulting in an NDCG@100 of 51.8% for the simple baseline retrieval.

For these experiments, all images have been resized so that their larger side is 512 pixels wide, except for fully-connected (FC) pooling, which only works with rather small images of size 224 × 224 due to the fixed number of learned weights. Following Gordo et al (2017), we have also evaluated averaging image descriptors extracted from 3 differently scaled versions of the same image, where we resized the larger side to 550, 800, and 1050 pixels. The results in Fig. 4b show that the use of multiple resolutions leads to an absolute improvement of NDCG@100 by about 3%, regardless of the features.

Since the number of images and queries per task in our flood dataset is not balanced (cf. Fig. 2), we also report the per-task performance of the two best-performing types of features in Fig. 4c. Obviously, finding images relevant for pollution is much more difficult than the other two tasks, which we do not solely attribute to the small number of relevant images, but also to the fact that images of oil films are easily confused with photos of abstract art from Flickr.
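The multi-resolution scheme described above amounts to averaging per-scale descriptors. A minimal sketch, assuming `extract_fn` is a placeholder for whatever resize-plus-CNN-plus-pooling pipeline is used, and assuming (a common choice, not stated explicitly in the text) that the averaged descriptor is re-normalized:

```python
import numpy as np

def multi_resolution_descriptor(image, extract_fn, scales=(550, 800, 1050)):
    """Average L2-normalized descriptors over several image scales.

    extract_fn(image, size) is assumed to resize `image` so that its
    larger side equals `size`, run the feature extractor, pool, and
    return a 1-D descriptor.
    """
    descs = []
    for s in scales:
        d = extract_fn(image, s)
        descs.append(d / np.linalg.norm(d))
    d = np.mean(descs, axis=0)
    return d / np.linalg.norm(d)  # re-normalize the averaged descriptor
```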
4.2 Relevance Feedback

Approaches for incorporating relevance feedback into image retrieval can usually be divided into four categories:

Query Point Movement: The query vector is modified based on the feedback, e.g., by averaging over the features of all images marked as relevant (Rocchio, 1971). These approaches belong to the oldest ones in information retrieval, but since their use is very limited, we will not address them in this work.

Probabilistic: The distribution of the probability that a particular image is relevant given the feedback is estimated. Here, we investigate the simple kernel density estimation (KDE) method proposed by Deselaers et al (2008).

Classification: A classifier is trained for distinguishing between relevant and irrelevant images. In this work, we investigate two approaches for this: an Exemplar-LDA classifier (Hariharan et al, 2012) and a support vector machine (SVM, Cortes and Vapnik, 1995), falling back to a One-Class SVM (Schölkopf et al, 2001) if only positive feedback is given.

Metric Learning: A new metric d : R^D × R^D → R is applied to the D-dimensional feature space R^D, minimizing the distance between relevant images and maximizing the distance between relevant and irrelevant ones. Many approaches use a Mahalanobis metric of the form d_M(x_1, x_2) = (x_1 − x_2)^T M (x_1 − x_2) and learn a positive semi-definite matrix M ∈ R^{D×D}. This is equivalent to a linear transformation of the data into a new space where the Euclidean distance corresponds to d_M in the original space. In this work, we investigate the feature weighting approach of Deselaers et al (2008), the diagonal variant of MMC (Xing et al, 2003), and information-theoretic metric learning (ITML, Davis et al, 2007).

The first two approaches learn a diagonal matrix M, which corresponds to a weighting of individual features, but use different objectives and optimization algorithms: Deselaers et al (2008) minimize the ratio of distances between similar and dissimilar samples using gradient descent, while Xing et al (2003) employ a convex optimization objective minimizing the distance of similar samples while keeping dissimilar ones away by at least a fixed radius. ITML (Davis et al, 2007), in contrast, learns a full matrix M, so that similar pairs are closer than a certain threshold and dissimilar ones are apart by at least another threshold. This is possible despite the high dimensionality of the feature space and limited annotations thanks to regularization towards the Euclidean distance as a prior metric. We choose the two thresholds needed for ITML on a per-query basis as follows: All pairs of relevant images should be closer to each other than half the distance between the query and the first irrelevant retrieval result. The distance of all images tagged as irrelevant to any relevant image should be greater than the 95th percentile of the distances between the query and all images in the dataset. Additionally, we investigate combinations of the three metric learning methods with the KDE-based approach of Deselaers et al (2008).

Fig. 5: Comparison of methods for incorporating relevance feedback (NDCG@100 over 10 feedback rounds for VGG16 with PMP and Deep R-MAC features, both multi-resolution).
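To make the diagonal case concrete: a learned diagonal matrix M = diag(w) reduces the Mahalanobis form above to a per-feature weighting of squared differences, under which the database can simply be re-ranked. A minimal sketch (the learning of w, which differs between the methods, is left out; function names ours):

```python
import numpy as np

def mahalanobis_sq(x1, x2, w):
    """Squared Mahalanobis distance with diagonal M = diag(w),
    i.e. (x1 - x2)^T M (x1 - x2) with a per-feature weighting w."""
    d = x1 - x2
    return np.sum(w * d * d)

def rerank(query, feats, w):
    """Re-rank all database descriptors under the learned weighting,
    closest first. `w` would come from feature weighting or diagonal
    MMC; here it is simply given as input."""
    dists = np.array([mahalanobis_sq(query, f, w) for f in feats])
    return np.argsort(dists)
```

With w = (1, ..., 1), this reduces to the squared Euclidean distance of the baseline retrieval; the feedback methods shift weight towards features that separate relevant from irrelevant images.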
The annotations of our flood dataset allow us to completely simulate the feedback process for a quantitative evaluation: For all the 611 images denoted as ideal queries, we first perform the baseline retrieval and then mark 10 random images out of the top 100 results either as relevant or irrelevant according to their labels. This is repeated for a total of 10 feedback rounds, and the retrieval quality is evaluated after each round in terms of the NDCG@100. To get an impression of the variance of the results, we repeat the entire experiment 10 times with different random sub-samples of 75% of the dataset. Based on the findings from the previous section, we use the two best-performing types of features (Deep R-MAC and PMP on the last convolutional layer of VGG16), averaged over multiple image scales. The results averaged over the 10 repetitions are shown in Fig. 5, where round 0 denotes the performance of the baseline retrieval. It can be seen that the simple KDE method already performs quite well, in particular better than the two feature weighting techniques. The two classification-based approaches (SVM and Exemplar-LDA) apparently suffer from the limited amount of annotations and behave extremely unstably during the first rounds. ITML, on the other hand, provides superior performance from the beginning and leads to an NDCG@100 of 86.9% after 5 rounds of feedback and 92.9% after 10 rounds. Though KDE can be added on top of any other method, the benefit when combined with ITML is too marginal to justify the computational overhead. After 5 rounds of feedback, the SVM-based approach starts to outperform ITML slightly when using Deep R-MAC features. However, the performance during the first few rounds is of greater importance, since we do not expect most users to regularly spend more than five rounds of feedback on refining the results.
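The NDCG@k metric used throughout this evaluation (Järvelin and Kekäläinen, 2002) can be sketched as below. This is a simplified variant that normalizes by the ideal ordering of the given ranked list only, which suffices to illustrate the discounting scheme:

```python
import numpy as np

def ndcg_at_k(relevances, k=100):
    """Normalized discounted cumulative gain over the top-k results.

    `relevances` holds the (binary or graded) relevance labels of the
    ranked retrieval list, best-ranked first.
    """
    rel = np.asarray(relevances, dtype=float)[:k]
    # Logarithmic position discount: 1 / log2(rank + 1) for ranks 1, 2, ...
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = np.sum(rel * discounts)
    # Ideal DCG: the same labels sorted in descending order of relevance.
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = np.sum(ideal * discounts[:ideal.size])
    return dcg / idcg if idcg > 0 else 0.0
```

A perfectly ordered list yields an NDCG of 1.0; relevant images pushed down the ranking are penalized logarithmically with their position.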
During the early iterations, however, SVM performs worst among all methods. It might hence be an interesting direction for future work to investigate how ITML and SVM can be combined to improve performance at all stages of the process. The maximum standard deviation of all methods and all feedback rounds over the 10 repetitions was 1.2%. We conducted a paired Student's t-test to assess the significance of the differences between the methods in Fig. 5 at a significance level of 5%. At the final feedback round, all differences are significant except that between Exemplar-LDA and Diagonal MMC + KDE when using Deep R-MAC features. At the first round, all differences are significant for VGG16 features, and all besides that between ITML and Feature Weighting + KDE for Deep R-MAC features. For VGG16 features, ITML and ITML + KDE performed significantly better than all other methods across all rounds. With the Deep R-MAC features, ITML coincided with Feature Weighting + KDE at round 1 and with SVM at round 4, but was otherwise significantly different from the rest. SVM started to perform significantly better than the rest from round 7 on. Besides that, Feature Weighting, Feature Weighting + KDE, and Diagonal MMC performed significantly differently from the rest in at least 9 of 10 rounds.

5 Conclusions and Future Work

We have proposed an interactive image retrieval approach with relevance feedback for finding flood images on online image platforms that are relevant for a particular information interest. To evaluate our approach, we have presented a novel dataset comprising 3,710 flood images annotated with relevance labels regarding three exemplary search objectives and important image regions. For the baseline retrieval, Deep R-MAC features (Gordo et al, 2017) averaged over multiple image scales perform best.
Convolutional features extracted from other networks not fine-tuned for object retrieval can also perform well when aggregated using partial mean pooling (Zhi et al, 2016). Regarding the incorporation of relevance feedback, an SVM-based approach provides the best performance in the long run but needs a substantial amount of feedback to become useful. Information-theoretic metric learning (Davis et al, 2007), on the other hand, provides superior performance during the early feedback rounds and remains competitive with SVM later on. Finally, the simple KDE method of Deselaers et al (2008) has turned out to be a quick and decent baseline as well, which is particularly easy to implement and combine with existing frameworks. Using relevance feedback, the average NDCG@100 can be improved from the 55% yielded by the baseline retrieval to 87% after five rounds and 93% after ten rounds of feedback, which we expect to be useful for hydrologists to find relevant images quickly. In the future, we would like to investigate how the selection of important image regions can be integrated as an additional component of the system to improve the relevance of the retrieved images even further, as proposed by Freytag et al (2015), for example. It also seems appealing to combine ITML with the SVM-based approach to improve the performance at all stages of the feedback process. Moreover, it seems promising to apply active learning methods that ask the user for feedback regarding actively selected images from which the system expects the most benefit. Finally, the interactive image retrieval system should be integrated into a visual analytics interface providing data from other sensors as well, enabling a case study on a more recent flood event with real users.
Acknowledgements This work was supported by the German Research Foundation as part of the priority programme "Volunteered Geographic Information: Interpretation, Visualisation and Social Computing" (SPP 1894, contract number DE 735/11-1).

References

Assumpção TH, Popescu I, Jonoski A, Solomatine DP (2018) Citizen observations contributing to flood modelling: opportunities and challenges. Hydrology and Earth System Sciences 22(2):1473–1489, DOI 10.5194/hess-22-1473-2018

Babenko A, Lempitsky V (2015) Aggregating local deep features for image retrieval. In: IEEE International Conference on Computer Vision (ICCV), pp 1269–1277, DOI 10.1109/ICCV.2015.150

Babenko A, Slesarev A, Chigorin A, Lempitsky V (2014) Neural codes for image retrieval. In: European Conference on Computer Vision (ECCV), Springer, pp 584–599, DOI 10.1007/978-3-319-10590-1_38

Brouwer T, Eilander D, Van Loenen A, Booij MJ, Wijnberg KM, Verkade JS, Wagemaker J (2017) Probabilistic flood extent estimates from social media flood observations. Natural Hazards and Earth System Sciences 17(5):735–747, DOI 10.5194/nhess-17-735-2017

Comfort LK, Ko K, Zagorecki A (2004) Coordination in rapidly evolving disaster response systems: The role of information. American Behavioral Scientist 48(3):295–313, DOI 10.1177/0002764204268987

Cortes C, Vapnik V (1995) Support-vector networks. Machine Learning 20(3):273–297, DOI 10.1007/BF00994018

Davis JV, Kulis B, Jain P, Sra S, Dhillon IS (2007) Information-theoretic metric learning. In: Proceedings of the 24th International Conference on Machine Learning, ACM, pp 209–216, DOI 10.1145/1273496.1273523

Deselaers T, Paredes R, Vidal E, Ney H (2008) Learning weighted distances for relevance feedback in image retrieval.
In: International Conference on Pattern Recognition (ICPR), IEEE, pp 1–4, DOI 10.1109/ICPR.2008.4761730

Fohringer J, Dransch D, Kreibich H, Schröter K (2015) Social media as an information source for rapid flood inundation mapping. Natural Hazards and Earth System Sciences 15(12):2725–2738, DOI 10.5194/nhess-15-2725-2015

Freytag A, Schadt A, Denzler J (2015) Interactive image retrieval for biodiversity research. In: Gall J, Gehler P, Leibe B (eds) Pattern Recognition, Springer International Publishing, Cham, pp 129–141, DOI 10.1007/978-3-319-24947-6_11

Goodchild MF (2007) Citizens as sensors: the world of volunteered geography. GeoJournal 69(4):211–221, DOI 10.1007/s10708-007-9111-y

Gordo A, Almazán J, Revaud J, Larlus D (2017) End-to-end learning of deep visual representations for image retrieval. International Journal of Computer Vision 124(2):237–254, DOI 10.1007/s11263-017-1016-8

Hariharan B, Malik J, Ramanan D (2012) Discriminative decorrelation for clustering and classification. In: European Conference on Computer Vision, Springer, pp 459–472, DOI 10.1007/978-3-642-33765-9_33

He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp 770–778, DOI 10.1109/CVPR.2016.90

Horiguchi S, Ikami D, Aizawa K (2019) Significance of softmax-based features in comparison to distance metric learning-based features. IEEE Transactions on Pattern Analysis and Machine Intelligence, DOI 10.1109/TPAMI.2019.2911075

Ireson N (2009) Local community situational awareness during an emergency. In: 2009 3rd IEEE International Conference on Digital Ecosystems and Technologies, pp 49–54, DOI 10.1109/DEST.2009.5276763

Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation of IR techniques.
ACM Transactions on Information Systems (TOIS) 20(4):422–446, DOI 10.1145/582415.582418

Jégou H, Zisserman A (2014) Triangulation embedding and democratic aggregation for image search. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, pp 3310–3317, DOI 10.1109/CVPR.2014.417

Krikorian R (2013) New tweets per second record, and how! URL https://blog.twitter.com/engineering/en_us/a/2013/new-tweets-per-second-record-and-how.html

LeCun Y (1985) Une procédure d'apprentissage pour réseau à seuil asymétrique (a learning scheme for asymmetric threshold networks). In: Proceedings of Cognitiva 85

LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Backpropagation applied to handwritten zip code recognition. Neural Computation 1(4):541–551, DOI 10.1162/neco.1989.1.4.541

Niblack CW, Barber R, Equitz W, Flickner MD, Glasman EH, Petkovic D, Yanker P, Faloutsos C, Taubin G (1993) QBIC project: querying images by content, using color, texture, and shape. In: Storage and Retrieval for Image and Video Databases, International Society for Optics and Photonics, vol 1908, pp 173–188, DOI 10.1117/12.143648

Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2007) Object retrieval with large vocabularies and fast spatial matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, DOI 10.1109/CVPR.2007.383172, dataset available online: http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/flickr100k.html

Poser K, Dransch D (2010) Volunteered geographic information for disaster management with application to rapid flood damage estimation. Geomatica 64(1):89–98

Radenović F, Tolias G, Chum O (2018) Fine-tuning CNN image retrieval with no human annotation.
IEEE Transactions on Pattern Analysis and Machine Intelligence pp 1–1, DOI 10.1109/TPAMI.2018.2846566

Rocchio JJ (1971) Relevance feedback in information retrieval. The SMART Retrieval System: Experiments in Automatic Document Processing, pp 313–323

Rosser JF, Leibovici D, Jackson M (2017) Rapid flood inundation mapping using social media, remote sensing and topographic data. Natural Hazards 87(1):103–120, DOI 10.1007/s11069-017-2755-0

Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115(3):211–252, DOI 10.1007/s11263-015-0816-y

Sakaki T, Okazaki M, Matsuo Y (2010) Earthquake shakes Twitter users: Real-time event detection by social sensors. In: Proceedings of the 19th International Conference on World Wide Web, ACM, New York, NY, USA, WWW '10, pp 851–860, DOI 10.1145/1772690.1772777

Schnebele E, Cervone G (2013) Improving remote sensing flood assessment using volunteered geographical data. Natural Hazards and Earth System Sciences 13(3):669–677, DOI 10.5194/nhess-13-669-2013

Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Computation 13(7):1443–1471, DOI 10.1162/089976601750264965

Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556

Smeulders AW, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 22(12):1349–1380, DOI 10.1109/34.895972

Thieken AH, Bessel T, Kienzler S, Kreibich H, Müller M, Pisi S, Schröter K (2016) The flood of June 2013 in Germany: how much do we know about its impacts?
Natural Hazards and Earth System Sciences 16(6):1519–1540, DOI 10.5194/nhess-16-1519-2016

Tolias G, Sicre R, Jégou H (2016) Particular object retrieval with integral max-pooling of CNN activations. In: International Conference on Learning Representations (ICLR)

Turoff M (2002) Past and future emergency response information systems. Communications of the ACM 45(4):29–32, DOI 10.1145/505248.505265

Vieweg S, Hughes AL, Starbird K, Palen L (2010) Microblogging during two natural hazards events: What Twitter may contribute to situational awareness. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, New York, NY, USA, CHI '10, pp 1079–1088, DOI 10.1145/1753326.1753486

Wang J, Zhu J, Pang S, Li Z, Li Y, Qian X (2018) Adaptive co-weighting deep convolutional features for object retrieval. arXiv preprint arXiv:1803.07360

Xing EP, Jordan MI, Russell SJ, Ng AY (2003) Distance metric learning with application to clustering with side-information. In: Advances in Neural Information Processing Systems, pp 521–528

Yin J, Lampert A, Cameron M, Robinson B, Power R (2012) Using social media to enhance emergency situation awareness. IEEE Intelligent Systems 27(6):52–59, DOI 10.1109/MIS.2012.6

Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European Conference on Computer Vision (ECCV), Springer, pp 818–833

Zhi T, Duan LY, Wang Y, Huang T (2016) Two-stage pooling of deep convolutional features for image retrieval. In: Image Processing (ICIP), 2016 IEEE International Conference on, IEEE, pp 2465–2469, DOI 10.1109/ICIP.2016.7532802
