Towards a Forensic Event Ontology to Assist Video Surveillance-based Vandalism Detection


Authors: Faranak Sobhani, Umberto Straccia

Faranak Sobhani (Queen Mary University of London, UK) and Umberto Straccia (ISTI-CNR, Italy)

March 22, 2019

Abstract

The detection and representation of events is a critical element in automated surveillance systems. We present here an ontology for representing complex semantic events to assist video surveillance-based vandalism detection. The ontology contains the definition of a rich and articulated event vocabulary that is aimed at aiding forensic analysis to objectively identify and represent complex events. Our ontology has then been applied in the context of the London riots, which took place in 2011. We also report on the experiments conducted to support the classification of complex criminal events from video data.

1 Introduction

In the context of vandalism and terrorist activities, video surveillance forms an integral part of any incident investigation and, thus, there is a critical need for an "automated video surveillance system" capable of detecting complex events to aid forensic investigators in solving criminal cases. As an example, in the aftermath of the London riots in August 2011, police had to scour through more than 200,000 hours of CCTV video to identify suspects. Around 5,000 offenders were found by trawling through the footage, a process that took more than five months. With the aim of developing an open and expandable video analysis framework equipped with tools for analysing, recognising, extracting and classifying events in video, which can be used for searching during investigations with unpredictable characteristics or for exploring normative (or abnormal) behaviours, several efforts have been made to standardise event representation from surveillance footage [9, 10, 11, 22, 23, 28, 30, 37].
While various approaches have relied on offering foundational support for domain ontology extension, to the best of our knowledge a systematic ontology standardising the event vocabulary for forensic analysis, together with an application of it, has not yet been presented in the literature. In this paper, we present an OWL 2 [25] ontology for the semantic retrieval of complex events to aid video surveillance-based vandalism detection. Specifically, the ontology is a derivative of the DOLCE foundational ontology [7], aimed at representing the events that forensic analysts commonly encounter when investigating criminal activities. The systematic categorisation of a large number of events, aligned with philosophical and linguistic theories, enables the ontology to support interoperability between surveillance systems. We also report on the experiments we conducted with the developed ontology to support the (semi-)automatic classification of complex criminal events from semantically annotated video data. Our work significantly extends the preliminary works [12, 31]. The work [12] is an embryonal investigation into the use of an ontology for automated visual surveillance systems, which has then been further developed in [31]. While our work shares with [31] some basic principles in the development of the ontology, the level of detail is now higher (e.g., the Endurant class (see Section 4.2.2) and its sub-classes have not been addressed in [31]) and various ontological errors have been revised. Additionally, and somewhat more importantly, in our work experiments have been conducted for criminal event classification based on videos of the London 2011 riots. Furthermore, but less related, is [32], which addresses the technical challenges facing researchers in developing computer vision techniques to process street-scene videos.
That work focusses on standard image processing methods and does not deal with ontologies in any way.

The remainder of the paper is organised as follows. Related work is addressed in Section 2. For the sake of completeness, Section 3 presents complementary, theoretical material to ease the understanding of the ontology expressions used in our work. Section 4 presents a detailed description of the forensic ontology about complex criminal events. In Section 5 we discuss how to use the ontology to assist video surveillance-based vandalism detection. In Section 6 we conduct some experiments with our ontology based on CCTV footage of the London riots of 2011, and finally, Section 7 concludes.

2 Related Work

In [23], the Event Recognition Language (ERL) is presented, which can describe a hierarchical representation of complex spatiotemporal and logical events. The proposed event structure consists of primitive, single-thread, and multi-thread events. Another event representation ontology, called CASE^E, is based on natural language representation and is proposed in [11] and then extended in [10]. Subsequently, in [9, 22] a Video Event Representation Language (VERL) was proposed for describing an ontology of events, together with a companion language called Video Event Markup Language (VEML), a representation language for describing events in video sequences based on OWL [21]. In [30], event detection is performed using a set of rules in the SWRL language [24]. The Event Model E [37] has been developed based on an analysis and abstraction of events in various domains such as research publications, personal media [1], meetings [13], enterprise collaboration [14] and sports [26]. The framework provides a generic structure for the definition of events and is extensible to the requirements of event ontologies in the most diverse concrete applications and domains.
In [28] a formal model of events is presented, called Event-Model-F. The model is based on the foundational ontology DOLCE+DnS Ultralite (DUL) and provides comprehensive support to represent time and space, objects and persons, as well as mereological, causal, and correlative relationships between events. In addition, Event-Model-F provides a flexible means for event composition, modelling event causality and event correlation, and representing different interpretations of the same event. Event-Model-F is developed following the pattern-oriented approach of DUL, is modularised in different ontologies, and can easily be extended by domain-specific ontologies.

While the above-mentioned approaches essentially provide frameworks for the representation of events, none of them addresses the problem of formalising forensic events in terms of a standard representation language such as OWL 2 and, importantly, none has been applied and tested so far in a real use case, which are the topics of the following sections.

3 Description Logics Basics

For the sake of completeness, we succinctly recap here some basic notions related to the Description Logics (DLs) family of languages, which is the logical counterpart of the Web Ontology Language (OWL 2) [25]. We will in fact use DL expressions to explain the ingredients of the forensic event ontology, which we will use in the following sections. We refer the reader to e.g. [2, 3] for further insights on DLs. Roughly, DLs allow us to describe classes (also called concepts), properties of classes and relationships among classes and properties. Formally, we start with the basic DL called ALC (Attributive Language with Complement) [29]. Elementary descriptions are atomic concepts, also called concept names (denoted A), and atomic roles (denoted r).
Complex concepts (denoted C) can be built from them inductively with concept constructors. Specifically, concepts in ALC are formed according to the following syntax rule:

C, D → A        (atomic concept)
     | ⊤        (universal concept)
     | ⊥        (bottom concept)
     | C ⊓ D    (concept conjunction)
     | C ⊔ D    (concept disjunction)
     | ¬C       (concept negation)
     | ∀r.C     (universal restriction)
     | ∃r.C     (qualified existential restriction).

Sometimes, the abbreviation ∃r is used in place of the unqualified existential restriction ∃r.⊤. We further extend ALC by allowing both (i) complex roles (or, simply, roles), defined inductively from atomic roles as follows (r, s are roles):

• an atomic role is a role;
• r⁻ is a role and denotes the inverse of role r;
• r ∘ s is a role and denotes the composition of roles r and s;

and (ii) existential value restrictions, which are concepts of the form ∃r.{a}, where a is an individual. We recall that, informally, a concept denotes a set of objects, while a role denotes a binary relation over objects. So, for instance, the concept Person ⊓ ∃hasChild.Female denotes people having a female child, while Person ⊓ ∃hasFriend.{mary} denotes people having mary as a friend.

A knowledge base K consists of a finite set of axioms. An axiom may be a

• General Concept Inclusion axiom (GCI) C ⊑ D, where C and D are concepts (read it as "all instances of C are instances of D"). An example of GCI is Male ⊑ ¬Female. (We recall that the relationship to our previous work [12, 31, 32] has been addressed in the introductory section.)
• concept and role assertion axiom C(a) and r(a, b), respectively, where a and b are individuals. Examples of assertion axioms are Person(tim) ("tim is a person") and hasChild(tim, pat) ("tim has pat as a child").
• Role Inclusion axiom (RI) r ⊑ s, where r and s are roles (read as "all instances of role r are instances of role s").
An example of RI is hasPart ∘ hasPart ⊑ hasPart, which declares the role hasPart to be transitive.

From a semantics point of view, an interpretation I is a pair I = (Δ^I, ·^I) consisting of a non-empty set Δ^I (called the domain) and an interpretation function ·^I that assigns to each atomic concept a subset of Δ^I, to each role a subset of Δ^I × Δ^I, and to each individual a an element of Δ^I. The mapping ·^I is extended to complex roles as follows:

(r⁻)^I = {⟨x, y⟩ | ⟨y, x⟩ ∈ r^I}
(r ∘ s)^I = {⟨x, y⟩ | ∃z s.t. ⟨x, z⟩ ∈ r^I and ⟨z, y⟩ ∈ s^I}.

The mapping ·^I is extended to complex concepts as follows:

⊤^I = Δ^I
⊥^I = ∅
(C ⊓ D)^I = C^I ∩ D^I
(C ⊔ D)^I = C^I ∪ D^I
(¬C)^I = Δ^I \ C^I
(∀r.C)^I = {x ∈ Δ^I | r^I(x) ⊆ C^I}
(∃r.C)^I = {x ∈ Δ^I | r^I(x) ∩ C^I ≠ ∅}
(∃r.{a})^I = {x ∈ Δ^I | r^I(x) ∩ {a^I} ≠ ∅},

where r^I(x) = {y : ⟨x, y⟩ ∈ r^I}. The satisfiability of an axiom E in an interpretation I = (Δ^I, ·^I), denoted I ⊨ E, is defined as follows: I ⊨ C ⊑ D iff C^I ⊆ D^I; I ⊨ r ⊑ s iff r^I ⊆ s^I; I ⊨ C(a) iff a^I ∈ C^I; and I ⊨ r(a, b) iff ⟨a^I, b^I⟩ ∈ r^I. Given a knowledge base K, we say that I satisfies K iff I satisfies each element in K. If I ⊨ K, we say that I is a model of K. An axiom E is a logical consequence of a knowledge base K, denoted K ⊨ E, iff every model of K satisfies E. Determining whether K ⊨ C(a) is called the instance checking problem, while determining whether K ⊨ C ⊑ D is called the subsumption problem.

4 A Forensic Event Ontology

In the following, we present an OWL 2 ontology to support, to some extent, the semantic retrieval of complex events to aid automatic or semi-automatic video surveillance-based vandalism detection.
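Before turning to the ontology itself, the model-theoretic semantics recalled in Section 3 can be made concrete with a small sketch. The following Python fragment is our own illustration (the domain, concept and role extensions are invented; it is not part of the ontology or its tooling): it evaluates the extension C^I of an ALC concept over a finite interpretation.

```python
# Minimal sketch of ALC semantics over a finite interpretation.
# Domain and extensions are toy data, invented for illustration only.

DOMAIN = {"tim", "pat", "mary"}

CONCEPTS = {                      # atomic concept extensions A^I
    "Person": {"tim", "pat", "mary"},
    "Female": {"pat", "mary"},
}
ROLES = {                         # atomic role extensions r^I
    "hasChild": {("tim", "pat")},
    "hasFriend": {("tim", "mary")},
}

def successors(role, x):
    """r^I(x) = { y : <x, y> in r^I }."""
    return {y for (a, y) in ROLES[role] if a == x}

def interpret(concept):
    """Return the extension C^I of a concept given as a nested tuple."""
    op = concept[0]
    if op == "atom":
        return CONCEPTS[concept[1]]
    if op == "not":                       # (not C)^I = Domain \ C^I
        return DOMAIN - interpret(concept[1])
    if op == "and":
        return interpret(concept[1]) & interpret(concept[2])
    if op == "or":
        return interpret(concept[1]) | interpret(concept[2])
    if op == "exists":                    # (exists r.C)^I
        _, role, filler = concept
        ext = interpret(filler)
        return {x for x in DOMAIN if successors(role, x) & ext}
    if op == "forall":                    # (forall r.C)^I
        _, role, filler = concept
        ext = interpret(filler)
        return {x for x in DOMAIN if successors(role, x) <= ext}
    raise ValueError(op)

# Person ⊓ ∃hasChild.Female: people having a female child.
c = ("and", ("atom", "Person"), ("exists", "hasChild", ("atom", "Female")))
print(interpret(c))   # → {'tim'}
```

Under this toy interpretation only tim has a hasChild-successor in Female, so the query returns exactly {'tim'}, matching the set-theoretic definition of ∃r.C given above.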
The idea is to develop an ontology that not only conveys a shared vocabulary, but whose inferences may assist a human being in the video analysis by hinting at videos that may be more relevant than others for the detection of criminal events.

Figure 1: The foundational DOLCE ontology. (Labels in the figure: Particular, SpatioTemporalParticular, AbstractRegion, Proposition, Set, Thing, Endurant, Perdurant, Quality.)

4.1 The Role of a Foundational Ontology

To facilitate the elimination of terminological ambiguity and to aid understanding and interoperability among people and machines [19], it is common practice to build on a so-called foundational ontology. Let us note that several efforts have been made by researchers to define foundational ontologies, such as BFO, SUMO, UFO and DOLCE, to name a few. As the DOLCE ontology offers a cognitive bias, with ontological categories underlying natural language and human common sense, it has been selected as the basis for our proposed extension. We recall that the DOLCE foundational ontology (see Figure 1) encompasses Endurant and Perdurant entities. Endurant entities are wholly present at any time they are present, as opposed to Perdurant entities, which extend in time by accumulating different temporal parts. A more thorough explanation of the DOLCE event conceptualisation can be found e.g. in [7].

4.2 A Forensic Complex Event Ontology

Our complex event classes extend DOLCE's Perdurant class. To assign the action classes to their respective categories, we follow a four-way classification of action verbs, namely into State, Process, Achievement and Accomplishment, using event properties such as telic, stage and cumulative (see [27, 35, 36]). The distinctions between these concepts are derived from the event properties as illustrated in Table 1, which we summarise below.
(Foundational ontology references: BFO, http://ifomis.uni-saarland.de/bfo/; SUMO, http://www.adampease.org/OP/; UFO, https://oxygen.informatik.tu-cottbus.de/drupal7/ufo/; DOLCE, http://www.loa.istc.cnr.it/old/Papers/DOLCE2.1-FOL.pdf.)

Table 1: Classification of Event Types.

                 telic    stage    cumulative
State            -telic   -stage   cumulative
Process          -telic   +stage   -
Achievement      +telic   -stage   not cumulative
Accomplishment   +telic   +stage   not cumulative

• State [-telic, -stage]: This action category represents a long, non-dynamic event in which every instance is the same: no distinction can be made between its stages. States are cumulative and homogeneous in nature.

• Process [-telic, +stage]: This action category, like State, is atelic but, unlike State, the actions undertaken are dynamic. The actions unfold progressively and can thus be split into a set of stages for analysis.

• Accomplishment [+telic, +stage]: Accomplishments are telic and cumulative activities, and thus behave differently from both State and Process. The performed action can be analysed in stages and, in this way, Accomplishments are similar to Process. Intuitively, an accomplishment is an activity which moves toward a finishing point, as it has variously been called in the literature.

• Achievement [+telic, -stage]: Achievements are similar to Accomplishments in their telicity. They are also not cumulative with respect to contiguous events. Achievements do not go on or progress because they are nearly instantaneous, and are over as soon as they have begun.

4.2.1 Forensic Perdurant Entities

Perdurant entities extend in time by accumulating different temporal parts, and some of their proper temporal parts may not be present. To this end, Perdurant entities are divided into the classes Event and Stative, classified according to their temporal characteristics. Axiom sets (1)-(5) below provide a subset of our formal extension of the Perdurant vocabulary.
The forensic extension of the ontology structure is shown in Figure 2.

Perdurant ⊑ SpatioTemporalParticular
Perdurant ⊑ ∃participant.Endurant
Fighting ⊑ ∃participant.GroupOfPeople
Perdurant ⊑ ¬Endurant
Kicking ⊑ ¬Vehicle     (1)

State ⊑ Stative
MetaLevelEvent ⊑ State
Accusing ⊑ MetaLevelEvent
Believing ⊑ MetaLevelEvent
PsychologicalAggression ⊑ State
Blaming ⊑ PsychologicalAggression
Bullying ⊑ PsychologicalAggression     (2)

Process ⊑ Stative
Action ⊑ Process
Gesture ⊑ Process
PhysicalAggression ⊑ Process
ActivePhysicalAggression ⊑ PhysicalAggression     (3)

Accomplishment ⊑ Event
CriminalEvent ⊑ Accomplishment
EventCategory ⊑ Accomplishment
CrimeCategory ⊑ Stative     (4)

Achievement ⊑ Event
Saying ⊑ Achievement
Seeing ⊑ Achievement.     (5)

An excerpt of the forensic ontology is shown in Figure 3; it strictly adheres to the above terminological determination of action categories and extends the classes with suitable event concepts. The concept State offers a representation for MetaLevelEvent, which encompasses abstract human events such as Accusing, Believing and Liking, among others. As previously stated, the concept State represents a collection of events exhibited by a human that are time-consuming, non-dynamic, cumulative and homogeneous. The other sub-class of State is PsychologicalAggression, which characterises human actions such as Blaming, Decrying, Harassing and so forth. The concept Process includes several human action categories that represent dynamic events which can be split into several intermediate stages for analysis. For the purposes of clarity, the concept Process offers three sub-concepts, namely Action, Gesture and PhysicalAggression. The Action class incorporates different events such as Dancing, Greeting and Hugging, among other concepts defined. The concept Gesture formalises the different interest points related to human gestures.
In order to eliminate the ambiguity traditionally present in human gestures across cultures, the action performed during a gesture is captured and represented in the ontology, thus removing subjectivity from the concept definition. The final sub-class of the Process class is the concept PhysicalAggression, which formalises conflicting human actions. By and large, the human actions categorised into State and Process represent the microscopic movements of humans. From the automatic surveillance viewpoint, these microscopic events may be extracted from media items. In contrast, the event representation formalised by means of the concepts Achievement and Accomplishment offers a rich combination of human events that allows for the construction of complex events, with or without the combination of microscopic features. For instance, the concept hierarchy for Vandalism is illustrated in Figure 4, while the concept hierarchy for CyberCrime is shown in Figure 5.

Figure 2: The forensic extension of the Perdurant class.

Figure 3: The Perdurant class hierarchy for forensic event descriptions. (Labels in the figure: Perdurant, Event, Stative, Achievement, Accomplishment, Saying, Seeing, Process, State, Action, Gesture, PhysicalAggression, MetaLevelEvent, PsychologicalAggression.)

Figure 4: The concept hierarchy of Vandalism, a direct sub-class of CrimeAgainstProperty; the latter is a sub-class of the class Accomplishment. (Labels in the figure: Vandalism, EnteringProperty, DamageVehicle, GunShot, AttemptedForcibleEntry, ForcibleEntry, UnlawfulEntry, GraffitiMaking, DamageStructure, MolotovThrowing, DamageApartment.)

4.2.2 Forensic Endurant Entities

DOLCE is based on a fundamental distinction between Endurant and Perdurant entities. The difference between Endurant and Perdurant entities is related to their behaviour in time. Endurants are wholly present at any time they are present; philosophers regard endurants as entities that are in time while lacking, however, temporal parts [19]. Therefore, the proposed vocabulary structure of all possible forensic entities also extends the Endurant entities. Axiom set (6) describes a subset formalisation of the Endurant vocabulary, and an excerpt of the forensic extension of the ontology structure is shown in Figure 6.

Figure 5: The concept hierarchy of CyberCrime. (Labels in the figure: CyberCrime, TheftOfInformation, CyberBullying, CyberThreat, Cybermobbing, Cyberstalking, Blackmail, Botnet, Malware, Phishing, Hacking, TheftOfIdentity, TheftOfPassword.)

Endurant ⊑ SpatioTemporalParticular
Endurant ⊑ ∃participantIn.Perdurant
participantIn = participant⁻
NonPhysicalEndurant ⊑ Endurant
PhysicalEndurant ⊑ Endurant
ArbitrarySum ⊑ Endurant.     (6)

5 Assisting Video Surveillance-based Vandalism Detection

We next show how the ontology developed so far is expected to be used to assist video surveillance-based vandalism detection.

5.1 Annotating Media Objects, viz. Surveillance Videos

Given surveillance videos, and any media in general, we need a method to annotate them using the terminology provided by our ontology. This gives rise to a set of facts that, together with the inferred facts, may support a more effective automatic or, more likely, semi-automatic retrieval of relevant information, such as e.g. vandalic acts. Specifically, the inferred information may suggest that a user look at some video sequences or video still images before others. The general model we draw inspiration from is based on [20]. Conceptually, according to [20], a media object o (e.g., an image region, a video sequence, a piece of text, etc.) is annotated with one (or more) entities t of the ontology (see e.g. Figure 7).
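The retrieval payoff of such annotations is that a query for a class should also return media objects annotated with any of its sub-classes. The following sketch illustrates this with invented media-object names and a simplified, assumed excerpt of the Figure 4 hierarchy (it is not the paper's implementation, which relies on an OWL 2 reasoner):

```python
# Sketch: retrieval via ontology annotations. Media objects are tagged
# with ontology classes; querying a class also returns objects tagged
# with its subclasses. File names and the (single-parent) subclass
# edges below are hypothetical/simplified, for illustration only.

SUBCLASS = {                      # direct subclass edges (assumed excerpt)
    "DamageVehicle": "Vandalism",
    "GraffitiMaking": "Vandalism",
    "EnteringProperty": "Vandalism",
    "ForcibleEntry": "EnteringProperty",
}

ANNOTATIONS = {                   # media object -> annotated class
    "frame17.jpg": "DamageVehicle",
    "clip03.mp4": "ForcibleEntry",
    "frame08.jpg": "Crowding",
}

def ancestors(cls):
    """All superclasses of cls, following subclass edges upward."""
    out = set()
    while cls in SUBCLASS:
        cls = SUBCLASS[cls]
        out.add(cls)
    return out

def retrieve(query_class):
    """Media objects annotated with query_class or any subclass of it."""
    return {o for o, c in ANNOTATIONS.items()
            if c == query_class or query_class in ancestors(c)}

print(sorted(retrieve("Vandalism")))   # → ['clip03.mp4', 'frame17.jpg']
```

A query for Vandalism thus returns the DamageVehicle frame and the ForcibleEntry clip, while the Crowding frame is (correctly) excluded; an OWL 2 reasoner generalises this to the full class hierarchy and to inferred, not just asserted, types.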
Figure 6: Excerpt of the Endurant concept hierarchy in the forensic ontology. (Labels in the figure: NonPhysicalEndurant, PhysicalEndurant, ArbitrarySum, PhysicalObject, NonAgentivePhysicalObject, AgentivePhysicalObject, SocialObject, MaterialArtifact.)

For instance, stating that an image object o is about a DamageVehicle can be represented conceptually via the DL expression

(∃isAbout.DamageVehicle)(o).

As specified in [20], such an annotation may come manually from a user or, if available, from an image classifier. In the latter case, it may annotate the image automatically or semi-automatically, by suggesting to a human annotator the most relevant entities of the ontology that may be used for a specific media object o. Note, however, that the above methodology just illustrates the concept. In our case, for the sake of easing the annotation, we may not enforce the use of the object property isAbout (see Example 2 later on). Generally, we will annotate a Resource with Perdurants and Endurants: thus, if an image is annotated with, e.g., a perdurant that is a damaged vehicle, then this means that the image is about a damaged vehicle. We recall that Resources (and Sources) are modelled as follows (see Figure 8):

Source ⊑ Endurant ⊓ ∃has.Resource ⊓ ∃hasCameraId.string ⊓ ∃hasLatitude.string ⊓ ∃hasLongitude.string ⊓ ∃hasLocationName.string
Resource ⊑ Endurant ⊓ ∃has.Perdurant
has = isFrom⁻
has ∘ has ⊑ has.

Figure 7: Examples of still-image annotations of events from the London riots 2011, as per Table 2.

Note that in the last RI axiom, ∘ is role composition and, thus, has ∘ has ⊑ has dictates that the property has is transitive, while with has = isFrom⁻ we say that isFrom is the inverse of has. Therefore, isFrom is transitive as well. The following example illustrates the mechanism of image annotation together with a meaningful inference.

Example 1.
Consider the following DL axioms resulting from annotating images of a video (video6) registered by a camera (cameraC004):

participateIn(personA, throwing5)
Throwing(throwing5)
NaturalPerson(personA)
Throwing ⊑ ActivePhysicalAggression
ActivePhysicalAggression ⊑ PhysicalAggression
PhysicalAggression ⊑ Process
isFrom(throwing5, endurant6)
Resource(endurant6)
hasVideoId(endurant6, video6)
Source(endurant7)
hasCameraId(endurant7, cameraC004)
has(endurant7, endurant6).

Now, as isFrom is the inverse of has (so isFrom(endurant6, endurant7) holds) and isFrom is transitive, we may infer:

isFrom(throwing5, endurant7).

Figure 8: Modelling the transitive property axiom in our defined ontology.

Figure 9: Examples of DamageVehicle and DamageStructure scenes in CCTV.

Then, it is not difficult to see that we finally infer

(∃participateIn.(PhysicalAggression ⊓ ∃isFrom.(Source ⊓ ∃hasCameraId.{cameraC004})))(personA),

which can be read as: "A person (personA) participated in a physical aggression that has been registered by camera C004".

5.2 Modelling GCIs for Vandalism Event Detection

As we focus on the forensic domain and deal with a variety of concepts aimed at aiding forensic analysis to objectively identify and represent complex events, we next show that a (manually built) General Concept Inclusion (GCI) axiom may help to classify high-level events in terms of a composition of lower-level events. The following are such GCI examples:

DamageVehicle:

Perdurant ⊓ ∃participant.(Vehicle ⊓ ∃participantIn.(BreakingDoor ⊔ BreakingWindows)) ⊑ DamageVehicle.

"If an event involves a vehicle that is subject of a breaking door or breaking windows, then the event is about a damaged vehicle" (see Figure 9).

DamageStructure:

Perdurant ⊓ ∃participant.(Structure ⊓ ∃participantIn.Kicking) ⊑ DamageStructure.

"If an event involves a structure that is subject of kicking, then the event is about a damaged structure" (see Figure 9).

Figure 10: Example of Vandalism scenes in CCTV videos.

The following example illustrates the use of such GCIs.

Example 2. Suppose we have an image classifier that is able to provide us with the following facts. Specifically, assume it is able to identify vehicles and breaking windows:

participant(Perdurant2, Endurant1)
Vehicle(Endurant1)
BreakingWindows(Perdurant2).

From these facts and the GCI about DamageVehicle, we may infer that the image is about a damaged vehicle, i.e. we may infer DamageVehicle(Perdurant2).

The following set of GCIs illustrates instead how one may have multiple GCIs to classify a single event, such as those for Vandalism (see, e.g., Figure 10). (Recall that all these GCIs provide sufficient conditions for being an instance of Vandalism, but no necessary condition.)

Perdurant ⊓ ∃part.(Crowding ⊓ DamageStructure) ⊑ Vandalism
Perdurant ⊓ ∃part.(Crowding ⊓ DamageVehicle) ⊑ Vandalism
Perdurant ⊓ ∃part.(Explosion ⊓ Throwing) ⊑ Vandalism.

Note that in the example above, we assume that events (perdurants) may be complex in the sense that they may be composed of multiple sub-events (parts). So, e.g., in the last GCI we roughly state: "If a (complex) event involves both throwing and an explosion (two sub-events), then the event is about vandalism".

Figure 11: Examples of events that happen in the same location (locatedSameAs) from CCTV.

Following our previous examples, we next formulate another kind of background knowledge. Our main focus in this example is on recognising high-level events which occur in the same location (the same street in our modelling). In order to model this scenario, we may use the Semantic Web Rule Language (SWRL) to model the locatedSameAs role and then use it in GCIs.
The SWRL rule, "two perdurants that occur in the same street occur in the same place", is:

Perdurant(?p1), Perdurant(?p2), hasLocationName(?p1, ?l1), hasLocationName(?p2, ?l2), SameAs(?l1, ?l2) → locatedSameAs(?p1, ?p2).

The following axioms illustrate how to use the previously defined relation (a few examples captured from our data set by these rules are illustrated in Figure 11):

Perdurant ⊓ ∃part.(Crowding ⊓ ∃locatedSameAs.Explosion) ⊑ Vandalism
Perdurant ⊓ ∃part.(Crowding ⊓ ∃locatedSameAs.DamageStructure) ⊑ Vandalism
Perdurant ⊓ ∃part.(Crowding ⊓ ∃locatedSameAs.Throwing) ⊑ Vandalism
Perdurant ⊓ ∃part.(DamageStructure ⊓ ∃locatedSameAs.Throwing) ⊑ Vandalism.

6 Experiments

We conducted two experiments with our ontology, which we describe in the following. In the first, we evaluated the classification effectiveness of manually built GCIs in identifying crime events; in the second, we drop the manually built GCIs and instead try to learn such GCIs automatically from examples, comparing their effectiveness with respect to the manually built ones. (The ontologies used in the experiments and the experimental results can be found at http://www.umbertostraccia.it/cs/ftp/ForensicOntology.zip.)

Table 2: Criminal event classes considered (number of GCIs, number of instances).

Vandalism (13, 57)
Riot (4, 21)
AbnormalBehavior (2, 80)
Crowding (1, 64)
DamageStructure (3, 9)
DamageVehicle (3, 16)
Throwing (1, 30)

Table 3: Ontology Metrics.
Axioms: 9889
Logical axiom count: 7176
Class count: 483
Object property count: 148
Data property count: 51
Individual count: 1800
DL expressivity: SHIQ(D)
SubClassOf axioms count: 532
EquivalentClasses axioms count: 5
DisjointClasses axioms count: 11
GCI count: 38
Hidden GCI count: 5
SubObjectPropertyOf axioms count: 93
InverseObjectProperties axioms count: 20
TransitiveObjectProperty axioms count: 5
SymmetricObjectProperty axioms count: 2
ObjectPropertyDomain axioms count: 19
ObjectPropertyRange axioms count: 18
SubDataProperty axioms count: 11
DataPropertyDomain axioms count: 1
DataPropertyRange axioms count: 5
ClassAssertion axioms count: 1793
ObjectPropertyAssertion axioms count: 2964
DataPropertyAssertion axioms count: 1706
AnnotationAssertion axioms count: 195

6.1 Classification via Manually Built GCIs

Roughly, we considered a number of crime videos, annotated them manually and then checked whether the manually built GCIs, as described in Section 5.2, were able to determine crime events correctly.

Setup. Specifically, we considered our ontology and around 3.07 TB of video data about the London riots of 2011, of which 929 GB is in a non-proprietary format. We considered 140 videos (however, the videos cannot be made publicly available). Within these videos, all the available CCTV cameras (35 CCTV cameras), along with features such as latitude, longitude, start time, end time and street name, have been annotated manually according to the methodology described in Section 5 and included in our ontology. We have also calculated all the geographic distances between the cameras. The resulting ontology contains 1800 created individuals, of which, e.g., 106 are of type Event. Then, we considered criminal events occurring in the videos (specifically, we focused on vandalic events). For each class of events, we manually built one or more GCIs, as illustrated in Section 5.2.
The list of crime events considered is reported in Table 2. (The videos are part of the EU-funded project LASIE, "Large Scale Information Exploitation of Forensic Data", http://www.lasie-project.eu.) In Table 2, the first number in parenthesis reports the number of GCIs we built for each event class, while the second number indicates the number of event instances (individuals) we created during the manual video annotation process. So, for instance, for the event DamageStructure we built 3 classification GCIs and created 9 instances of DamageStructure during the manual video annotation process. For further clarification, the 3 GCIs for DamageStructure are

Perdurant ⊓ ∃participant.(Structure ⊓ ∃participantIn.Kicking) ⊑ DamageStructure
Perdurant ⊓ ∃participant.(Structure ⊓ ∃participantIn.Beating) ⊑ DamageStructure
Perdurant ⊓ ∃participant.(Structure ⊓ ∃participantIn.BreakingWindows) ⊑ DamageStructure,

while, e.g., an instance of DamageStructure is the individual Kicking1, an excerpt of whose related information is:

Kicking(Kicking1), isFrom(Kicking1, 2bdf), Resource(2bdf), isFrom(2bdf, C004), has(2bdf, pr11), part(pr11, Kicking1), part(pr11, BreakingWindows3), BreakingWindows(BreakingWindows3), ...

As a matter of general information, the global metric statistics of the ontology so built are reported in Table 3.

Evaluation. Let O be the built ontology from which we drop the axioms stating explicitly that an individual is an instance of a crime event listed in Table 2. Please note that without the GCIs, none of the crime event instances in O can be inferred to be instances of the crime events in Table 2. Now, on O we run an OWL 2 reasoner that determines the instances of all crime event classes in the ontology.
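To give a flavour of what such reasoning contributes for a single axiom, the following sketch applies the first DamageStructure GCI above as a hand-coded forward rule over a tiny fact base. The individuals (pr11, door3, kicking1) are hypothetical, and this rule-based shortcut merely approximates what the OWL 2 reasoner used in the experiments derives:

```python
# Sketch: the GCI
#   Perdurant ⊓ ∃participant.(Structure ⊓ ∃participantIn.Kicking) ⊑ DamageStructure
# applied as a forward rule over assertion facts. Individuals are
# hypothetical; a production system would use an OWL 2 reasoner instead.

types = {
    "pr11": {"Perdurant"},        # the complex event to classify
    "door3": {"Structure"},       # an endurant participating in it
    "kicking1": {"Kicking"},      # a sub-event the structure takes part in
}
participant = {("pr11", "door3")}          # participant(event, object)
participant_in = {("door3", "kicking1")}   # participantIn(object, event)

def classify_damage_structure(types, participant, participant_in):
    """Add DamageStructure to every event matching the GCI's left-hand side."""
    for (ev, obj) in participant:
        if "Perdurant" in types.get(ev, ()) and "Structure" in types.get(obj, ()):
            for (o2, sub) in participant_in:
                if o2 == obj and "Kicking" in types.get(sub, ()):
                    types[ev].add("DamageStructure")
    return types

classify_damage_structure(types, participant, participant_in)
print("DamageStructure" in types["pr11"])   # → True
```

The event pr11 is classified as DamageStructure because it has a Structure participant that itself participates in a Kicking event, exactly the pattern on the left-hand side of the GCI.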
To determine the classification effectiveness of the GCIs, we compute the so-called micro/macro averages of precision, recall and F1-score w.r.t. the inferred data, which for clarity and completeness we report below. Specifically, we measure the outcomes of binary classification using a 2 × 2 contingency table, or confusion matrix, as follows: for each class C in Table 2, consider the following contingency table

                          True condition
                        in C        not in C
  Prediction   in C     TP_C        FP_C
           not in C     FN_C        TN_C

where

• the true positives TP_C is the number of positive predictions for the class C for which the actual value is positive;
• the false positives FP_C is the number of positive predictions for the class C for which the actual value is negative;
• the false negatives FN_C is the number of negative predictions for the class C for which the actual value is positive;
• the true negatives TN_C is the number of negative predictions for the class C for which the actual value is negative.

Formally, for each crime event class C, let true(C) be the set of manually determined instances of it, i.e. e ∈ true(C) denotes the fact that e has to be an instance of C. Then^10

  TP_C = { e | O ⊨ C(e) and e ∈ true(C) }
  FP_C = { e | O ⊨ C(e) but e ∉ true(C) }
  FN_C = { e | O ⊭ C(e) but e ∈ true(C) }
  TN_C = { e | O ⊭ C(e) and e ∉ true(C) }.

Thereafter, with |C| = |TP_C| + |FP_C| and |true(C)| = |TP_C| + |FN_C|, we do the following:

1. for each class C in Table 2, we determine

     Precision_C = |TP_C| / |C|
     Recall_C = |TP_C| / |true(C)|
     F1_C = (2 · Precision_C · Recall_C) / (Precision_C + Recall_C);

2.

^9 Roughly, crime events are subclasses of the Event class, while crime event instances are instances of the class Stative (see Figure 5).
for each of these measures we also compute the micro- and macro-average (where N is the number of classes in Table 2):

     Precision_micro = Σ_C |TP_C| / Σ_C |C|
     Recall_micro = Σ_C |TP_C| / Σ_C |true(C)|
     F1_micro = (2 · Precision_micro · Recall_micro) / (Precision_micro + Recall_micro)

     Precision_macro = (Σ_C Precision_C) / N
     Recall_macro = (Σ_C Recall_C) / N
     F1_macro = (2 · Precision_macro · Recall_macro) / (Precision_macro + Recall_macro)

The evaluation result of the first test is shown in Table 4.

^10 Recall that O ⊨ C(e) dictates that e has been automatically classified/inferred as being an instance of class C.

Table 4: Results for the experiment on classification via manually built GCIs.

  Event              TP   FP   FN   TN   |C|  |true(C)|  Precision_C  Recall_C  F1_C
  Vandalism          42    0   15  168    42     57         1.00        0.74    0.85
  DamageVehicle      11    0    5  209    11     16         1.00        0.69    0.81
  DamageStructure     9    0    0  216     9      9         1.00        1.00    1.00
  Crowding           60    1    4  160    61     64         0.98        0.94    0.96
  Throwing           30    0    0  195    30     30         1.00        1.00    1.00
  Riot                5    0   16  204     5     21         1.00        0.24    0.38
  AbnormalBehaviour  70   22   10  123    92     80         0.76        0.88    0.81

  Precision_micro 0.91   Recall_micro 0.82   F1_micro 0.86
  Precision_macro 0.96   Recall_macro 0.78   F1_macro 0.86

6.2 Classification via Automatically Learned GCIs

In the second experiment, we apply a concept learning approach to replace the manually built GCIs describing the crime events listed in Table 2. To this end, the DL-Learner^11 system was used to find descriptions of the criminal events in Table 2, based on existing instances of these classes. We recall roughly that DL-Learner is, among others, a framework for supervised machine learning in OWL 2, which is capable of making class expression suggestions for a specified class C by relying on the instances of C.

Setup. Let now O be the ontology as in Section 6.1, but from which we additionally drop the manually created GCIs for the crime events listed in Table 2.
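The per-class and micro/macro measures defined above can be recomputed from the (TP, FP, FN) counts reported in Table 4; the short script below does so and reproduces the aggregate figures.

```python
# Recompute the Section 6.1 measures from the (TP, FP, FN) counts of Table 4.
counts = {
    "Vandalism":         (42, 0, 15),
    "DamageVehicle":     (11, 0, 5),
    "DamageStructure":   (9, 0, 0),
    "Crowding":          (60, 1, 4),
    "Throwing":          (30, 0, 0),
    "Riot":              (5, 0, 16),
    "AbnormalBehaviour": (70, 22, 10),
}

def prf(tp, fp, fn):
    p = tp / (tp + fp)          # |TP_C| / |C|
    r = tp / (tp + fn)          # |TP_C| / |true(C)|
    return p, r, 2 * p * r / (p + r)

per_class = {c: prf(*v) for c, v in counts.items()}

# Micro-averages pool the raw counts; macro-averages average per-class scores.
TP = sum(tp for tp, _, _ in counts.values())
P_micro = TP / sum(tp + fp for tp, fp, _ in counts.values())
R_micro = TP / sum(tp + fn for tp, _, fn in counts.values())
F_micro = 2 * P_micro * R_micro / (P_micro + R_micro)
P_macro = sum(p for p, _, _ in per_class.values()) / len(counts)
R_macro = sum(r for _, r, _ in per_class.values()) / len(counts)
F_macro = 2 * P_macro * R_macro / (P_macro + R_macro)

print(round(P_micro, 2), round(R_micro, 2), round(F_micro, 2))  # 0.91 0.82 0.86
print(round(P_macro, 2), round(R_macro, 2), round(F_macro, 2))  # 0.96 0.78 0.86
```

Note that, following the formulas above, F1_macro is the harmonic mean of the macro-averaged precision and recall, not the average of the per-class F1 scores.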
On it we used the CELOE algorithm [6, 15] with its default settings to generate suggested definitions (inclusion axioms) for each target class C by using the manually identified crime event instances. Specifically, we used a K-fold cross-validation style method [8], which divides the available crime event instances into K disjoint subsets. That is, we split each target class C into K disjoint subsets C_1, ..., C_K of equal size. In our experiment, K is the number of instances of C and, thus, each C_i has size one. For each C_i, the training set is (⋃_{j=1}^K C_j) \ C_i and is denoted as Trainset_i. Then, for each C_i we run CELOE on the training set Trainset_i and generated at most 10 class expressions of the form D_j ⊑ C, out of which we have chosen the best solution (denoted D_{C_i} ⊑ C, or GCI_i). If the best solution is not unique, we select the first listed one. The best-selected GCIs found by CELOE for each of the target classes in Table 2 are:

  PhysicalAggression ⊓ ∃immediateRelation.Structure ⊑ DamageStructure
  ∃immediateRelation.Vehicle ⊑ DamageVehicle
  ∃immediateRelation.Vandalism ⊑ AbnormalBehavior
  ∃immediateRelation.Arm ⊑ Throwing
  ∃immediateRelation.Group ⊑ Crowding.

With the help of a reasoner, we then infer all instances in O that are not in Trainset_i and that are instances of the selected D_{C_i}, and consider them as our result set (denoted Resultset_i).

Evaluation. To determine the classification effectiveness of the learned GCIs, i.e. of GCI_i, precision

^11 http://dl-learner.org/

Table 5: Results for the experiment on classification using the DL-Learner CELOE algorithm.
  Event              Precision_C  Recall_C  F1_C
  DamageVehicle         0.69        0.98    0.81
  DamageStructure       1.00        1.00    1.00
  Crowding              0.96        1.00    0.98
  Throwing              0.86        0.99    0.92
  AbnormalBehavior      0.69        0.99    0.81

  Precision_micro 0.753   Recall_micro 0.964   F1_micro 0.845
  Precision_macro 0.599   Recall_macro 0.709   F1_macro 0.649

(Pr_i) and recall (Re_i) across the folds are computed:

• the false positives FP_i are calculated as the difference between the instances in Resultset_i and those in true(C);
• the true positives TP_i are calculated as the difference between Resultset_i and the false positives;
• the false negatives FN_i are calculated as the difference between C_i and Resultset_i.

Using FP_i, TP_i and FN_i, we have that

  Pr_i = |TP_i| / (|TP_i| + |FP_i|),   Re_i = |TP_i| / (|TP_i| + |FN_i|).

We then determine both the average precision (Precision_C) and recall (Recall_C) across the folds, i.e.

  Precision_C = (1/K) · Σ_{i=1}^K Pr_i,   Recall_C = (1/K) · Σ_{i=1}^K Re_i.

Using these measures we then compute the micro- and macro-averages of precision, recall and F1 as in Section 6.1. The evaluation results of the second test are shown in Table 5.

Discussion. The results are generally promising. In the manually built GCI case, precision and F1 are reasonably good, though in one case (Riot) the recall and, thus, the F1 is not satisfactory. For the learned GCI case, the individual measures are generally comparable to the manual ones. Given that the learned GCIs are completely different from the manually built ones, it is surprising that both sets perform more or less the same. However, please note that DL-Learner was not able to learn a GCI for either Vandalism or Riot. This fact is reflected in the generally worse micro/macro precision, recall and F1 measures. Finally, we also merged the manually built GCIs and the learned ones together and tested them as in Section 6.1.
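The per-fold computation described above can be sketched as follows. The instance names and result sets are made-up stand-ins for the reasoner's output (the actual video data is not public), and each fold holds out a single instance since K equals the number of instances of C.

```python
def fold_scores(instances, true_c, resultsets):
    r"""Per-fold precision/recall as defined in the text:
    FP_i = Resultset_i \ true(C);  TP_i = Resultset_i \ FP_i;
    FN_i = C_i \ Resultset_i, with C_i = {instances[i]} (leave-one-out)."""
    prs, res = [], []
    for i, e in enumerate(instances):
        rs = resultsets[i]
        fp = rs - true_c
        tp = rs - fp
        fn = {e} - rs
        prs.append(len(tp) / (len(tp) + len(fp)) if rs else 0.0)
        res.append(len(tp) / (len(tp) + len(fn)) if (tp or fn) else 0.0)
    k = len(instances)
    return sum(prs) / k, sum(res) / k   # Precision_C, Recall_C

# Toy class with three instances; in fold 2 the learned GCI also picks up a
# spurious individual, costing precision on that fold.
insts = ["e1", "e2", "e3"]
true_c = {"e1", "e2", "e3"}
results = [{"e1"}, {"e2", "spurious"}, {"e3"}]
p, r = fold_scores(insts, true_c, results)
print(round(p, 2), round(r, 2))  # 0.83 1.0
```

With single-instance folds, Re_i is either 0 or 1 per fold, so the averaged Recall_C is effectively the fraction of held-out instances that the learned GCI recovers.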
The results in Table 6 show, however, that globally their effectiveness is as in the manual case (and does not improve).

Table 6: Results of merging manual and learned GCIs.

  Event              Precision_C  Recall_C  F1_C
  Vandalism             1.00        0.74    0.85
  DamageVehicle         1.00        0.69    0.81
  DamageStructure       0.89        0.89    0.89
  Crowding              0.98        0.94    0.96
  Throwing              1.00        1.00    1.00
  Riot                  1.00        0.24    0.38
  AbnormalBehavior      0.76        0.89    0.82

  Precision_micro 0.90   Recall_micro 0.82   F1_micro 0.86
  Precision_macro 0.95   Recall_macro 0.77   F1_macro 0.85

7 Conclusions

In this work, we have proposed an extensive ontology for representing complex criminal events. The proposed ontology focuses on events that are often required by forensic analysts. In this context, the Perdurant, defined in the DOLCE ontology as an occurrence in time, and the Endurant, defined in the DOLCE ontology as an entity wholly present in time, have both been extended to represent all forensic entities together with meaningful entities for video surveillance-based vandalism detection. The aim of the built ontology is to support the interoperability of the automated surveillance system. To classify high-level events in terms of the composition of lower-level events, we focused on both manually built and automatically learned GCIs and have compared the evaluation results of both experiments. The results are generally promising, and the effectiveness of machine-derived definitions for high-level crime events is encouraging, though it needs further development. In the future, we intend to deal with vague or imprecise knowledge, and we would like to work on the problem of automatically learning fuzzy concept descriptions [4, 5, 16, 17, 18, 33, 34], as most of the involved entities are fuzzy.

Acknowledgements

This work is partially funded by the European Union's Seventh Framework Programme, grant agreement number 607480 (LASIE IP project).

References

[1] Preetha Appan and Hari Sundaram.
Networked multimedia event exploration. In Proceedings of the 12th Annual ACM International Conference on Multimedia, pages 40–47. ACM, 2004.

[2] Franz Baader, Diego Calvanese, Deborah McGuinness, Daniele Nardi, and Peter F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2003.

[3] Franz Baader, Ian Horrocks, and Ulrike Sattler. Description logics. In Steffen Staab and Rudi Studer, editors, Handbook on Ontologies, International Handbooks on Information Systems, pages 21–43. Springer Verlag, 2009.

[4] Fernando Bobillo and Umberto Straccia. Fuzzy ontology representation using OWL 2. International Journal of Approximate Reasoning, 52:1073–1094, 2011.

[5] Fernando Bobillo and Umberto Straccia. The fuzzy ontology reasoner fuzzyDL. Knowledge-Based Systems, 95:12–34, 2016.

[6] Lorenz Bühmann, Jens Lehmann, and Patrick Westphal. DL-Learner framework for inductive learning on the semantic web. Web Semantics: Science, Services and Agents on the World Wide Web, 39:15–24, 2016.

[7] Roberto Casati and Achille Varzi. Events. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Winter 2015 edition, 2015. http://plato.stanford.edu/archives/win2015/entries/events/.

[8] George Forman and Martin Scholz. Apples-to-apples in cross-validation studies: pitfalls in classifier performance measurement. ACM SIGKDD Explorations Newsletter, 12(1):49–57, 2010.

[9] Alexandre R. J. François, Ram Nevatia, Jerry R. Hobbs, and Robert C. Bolles. VERL: an ontology framework for representing and annotating video events. IEEE MultiMedia, 12(4):76–86, 2005.

[10] Asaad Hakeem, Khurram Shafique, and Mubarak Shah. An object-based video coding framework for video sequences obtained from static cameras. In Proceedings of the 13th Annual ACM International Conference on Multimedia, pages 608–617. ACM, 2005.
[11] Asaad Hakeem, Yaser Sheikh, and Mubarak Shah. CASE^E: A hierarchical event representation for the analysis of videos. In Proceedings of the 19th National Conference on Artificial Intelligence, pages 263–268. AAAI Press, 2004.

[12] Craig Henderson, Saverio G. Blasi, Faranak Sobhani, and Ebroul Izquierdo. On the impurity of street-scene video footage. In 6th International Conference on Imaging for Crime Prevention and Detection (ICDP-15). The Institution of Engineering and Technology (IET), 2015.

[13] Ramesh Jain, Pilho Kim, and Zhao Li. Experiential meeting system. In Proceedings of the 2003 ACM SIGMM Workshop on Experiential Telepresence, pages 1–12. ACM, 2003.

[14] Pilho Kim, Mark Podlaseck, and Gopal Pingali. Personal chronicling tools for enhancing information archival and collaboration in enterprises. In Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experiences, pages 56–65. ACM, 2004.

[15] Jens Lehmann, Sören Auer, Lorenz Bühmann, and Sebastian Tramp. Class expression learning for ontology engineering. Web Semantics: Science, Services and Agents on the World Wide Web, 9(1):71–81, 2011.

[16] Francesca A. Lisi and Umberto Straccia. A logic-based computational method for the automated induction of fuzzy ontology axioms. Fundamenta Informaticae, 124(4):503–519, 2013.

[17] Francesca Alessandra Lisi and Umberto Straccia. A FOIL-like method for learning under incompleteness and vagueness. In 23rd International Conference on Inductive Logic Programming, volume 8812 of Lecture Notes in Artificial Intelligence, pages 123–139, Berlin, 2014. Springer Verlag. Revised Selected Papers.

[18] Thomas Lukasiewicz and Umberto Straccia. Managing uncertainty and vagueness in description logics for the semantic web. Journal of Web Semantics, 6:291–308, 2008.
[19] Claudio Masolo, Stefano Borgo, Aldo Gangemi, Nicola Guarino, and Alessandro Oltramari. WonderWeb Deliverable D18, Ontology Library (final). ICT project, 33052, 2003.

[20] Carlo Meghini, Fabrizio Sebastiani, and Umberto Straccia. A model of multimedia information retrieval. Journal of the ACM, 48(5):909–970, 2001.

[21] Boris Motik, Bernardo Cuenca Grau, Ian Horrocks, Zhe Wu, Achille Fokoue, Carsten Lutz, et al. OWL 2 Web Ontology Language Profiles. W3C Recommendation, 27:61, 2009.

[22] Ram Nevatia, Jerry R. Hobbs, and Bob Bolles. An ontology for video event representation. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, page 119. IEEE Computer Society, 2004.

[23] Ram Nevatia, Tao Zhao, and Somboon Hongeng. Hierarchical language-based representation of events in video streams. In IEEE Conference on Computer Vision and Pattern Recognition, page 39. IEEE Computer Society, 2003.

[24] SWRL: A Semantic Web Rule Language Combining OWL and RuleML. https://www.w3.org/Submission/SWRL/. W3C, 2004.

[25] OWL 2 Web Ontology Language Document Overview. http://www.w3.org/TR/2009/REC-owl2-overview-20091027/. W3C, 2009.

[26] Gopal Sarma Pingali, Agata Opalach, Yves D. Jean, and Ingrid B. Carlbom. Instantly indexed multimedia databases of real world events. IEEE Transactions on Multimedia, 4(2):269–282, 2002.

[27] Susan Rothstein. Chapter 1: Verb classes and aspectual classification. In Structuring Events: A Study in the Semantics of Lexical Aspect, pages 1–35. Wiley Online Library, 2004.

[28] Ansgar Scherp, Thomas Franz, Carsten Saathoff, and Steffen Staab. F–A model of events based on the foundational ontology DOLCE+DnS Ultralight. In Proceedings of the Fifth International Conference on Knowledge Capture, pages 137–144. ACM, 2009.

[29] Manfred Schmidt-Schauß and Gert Smolka. Attributive concept descriptions with complements.
Artificial Intelligence, 48:1–26, 1991.

[30] Lauro Snidaro, Massimo Belluz, and Gian Luca Foresti. Representing and recognizing complex events in surveillance applications. In Fourth IEEE International Conference on Advanced Video and Signal Based Surveillance, pages 493–498. IEEE Computer Society, 2007.

[31] Faranak Sobhani, Krishna Chandramouli, Qianni Zhang, and Ebroul Izquierdo. Formal representation of events in a surveillance domain ontology. In 2016 IEEE International Conference on Image Processing, pages 913–917. IEEE Computer Society, 2016.

[32] Faranak Sobhani, Nur Farhan Kahar, and Qianni Zhang. An ontology framework for automated visual surveillance system. In 13th International Workshop on Content-Based Multimedia Indexing, pages 1–7. IEEE Computer Society, 2015.

[33] Umberto Straccia. Foundations of Fuzzy Logic and Semantic Web Languages. CRC Studies in Informatics Series. Chapman & Hall, 2013.

[34] Umberto Straccia and Matteo Mucci. pFOIL-DL: Learning (fuzzy) EL concept descriptions from crisp OWL data using a probabilistic ensemble estimation. In Proceedings of the 30th Annual ACM Symposium on Applied Computing (SAC-15), pages 345–352, Salamanca, Spain, 2015. ACM.

[35] Zeno Vendler. Verbs and times. The Philosophical Review, 62(2):143–160, 1957.

[36] Zeno Vendler, editor. Linguistics in Philosophy. G - Reference, Information and Interdisciplinary Subjects Series. Cornell University Press, 1967.

[37] Utz Westermann and Ramesh Jain. Toward a common event model for multimedia applications. IEEE MultiMedia, 14(1), 2007.
