Learning to Generate Posters of Scientific Papers
Yuting Qiang¹, Yanwei Fu², Yanwen Guo¹†, Zhi-Hua Zhou¹ and Leonid Sigal²
¹National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
²Disney Research Pittsburgh, 4720 Forbes Avenue, Lower Level, 15213, USA
{qiangyuting.new, ywguo.nju}@gmail.com, zhouzh@nju.edu.cn, {yanwei.fu, lsigal}@disneyresearch.com

Abstract

Researchers often summarize their work in the form of posters. Posters provide a coherent and efficient way to convey core ideas from scientific papers. Generating a good scientific poster, however, is a complex and time-consuming cognitive task, since such posters need to be readable, informative, and visually aesthetic. In this paper, for the first time, we study the challenging problem of learning to generate posters from scientific papers. To this end, a data-driven framework that utilizes graphical models is proposed. Specifically, given content to display, the key elements of a good poster, including panel layout and attributes of each panel, are learned and inferred from data. Then, given the inferred layout and attributes, the composition of graphical elements within each panel is synthesized. To learn and validate our model, we collect and make public a Poster-Paper dataset, which consists of scientific papers and corresponding posters with exhaustively labelled panels and attributes. Qualitative and quantitative results indicate the effectiveness of our approach.

Introduction

The emergence of a large number of scientific papers in various academic fields and venues (conferences and journals) is noteworthy. For example, the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) accepted over 600 papers in 2016 alone.
It is time-consuming for researchers to read all of these papers, particularly for those interested in holistically assessing the state of the art or in understanding the core scientific ideas explored in the past year. Converting a conference paper into a poster provides an important means of efficiently and coherently conveying the core ideas and findings of the original paper. To achieve this goal, it is essential to keep the posters readable, informative and visually aesthetic. It is challenging, however, to design a high-quality scientific poster that meets all of the above design constraints, particularly for researchers who may not be proficient at design tasks or familiar with design packages (e.g., Adobe Illustrator).

∗This work is supported by NSFC (61333014, 61373059, and 61321491) and JiangsuSF (BK20150016).
†Corresponding author.
Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

In general, poster design is a complicated and time-consuming task; it requires both an understanding of the paper content and experience in design. Automatic tools for scientific poster generation would help researchers by providing an easier way to effectively share their research. Further, given the vast number of scientific papers on arXiv and other on-line repositories, such tools may also provide a way for other researchers to consume content more easily. Rather than browsing raw papers, they could browse automatically generated poster previews (potentially constructed with their specific preferences in mind).

However, in order to generate a scientific poster in accordance with, and representative of, the original paper, several problems need to be solved: 1) Content extraction. Both important textual and graphical content needs to be extracted from the original paper. 2) Panel layout.
Content should fit each panel, and the shape and position of panels should be optimized for readability and design appeal. 3) Graphical element (figures and tables) arrangement. Within each panel, textual content can typically be sequentially itemized, but for graphical elements, size and placement should be carefully considered. Due to these challenges, there are few automatic tools for scientific poster generation.

In this paper, we propose a data-driven method for automatic scientific poster generation (given a corresponding paper). Content extraction and layout generation are the two key components of this process. For content extraction, we use TextRank (Mihalcea and Tarau 2004) to extract textual content, and provide an interface for extraction of graphical content (e.g., figures, tables). Our approach focuses primarily on poster layout generation. We address the layout in three steps. First, we propose a simple probabilistic graphical model to infer panel attributes. Second, we introduce a tree structure to represent the panel layout, based on which we design a recursive algorithm to generate new layouts. Third, in order to synthesize the layout within each panel, we train another probabilistic graphical model to infer the attributes of the graphical elements.

Compared with posters designed by the authors, our approach can generate different results, adapting to different paper sizes/aspect ratios or styles by training our model with different datasets, and thus provides more expressiveness in poster layout. To the best of our knowledge, this paper presents the first framework for poster generation from an original scientific paper. Our paper makes the following contributions:

• Probabilistic graphical models are proposed to learn scientific poster design patterns, including panel attributes and graphical element attributes, from existing posters.
• A new algorithm, which considers both the information conveyed and aesthetics, is developed to generate the poster layout.

• We collect and make available a Poster-Paper dataset with labelled poster panels and attributes.

Related Work

General Graphical Design. Graphical design has been studied extensively in the computer graphics community. This involves several related, yet different, topics, including text-based layout generation (Jacobs et al. 2003; Damera-Venkata, Bento, and O'Brien-Strain 2011; Hurst, Li, and Marriott 2009), single-page graphical design (O'Donovan, Agarwala, and Hertzmann 2014; Harrington et al. 2004), photo album layout (Geigel and Loui 2003), furniture layout (Merrell et al. 2011; Yu et al. 2011), and even interface design (Gajos and Weld 2005). Among them, text-based layout pays more attention to informativeness, while the other topics take aesthetics as the highest priority. In poster generation, attractiveness also needs to be considered, yet some design principles (such as alignment and read-order) must still be followed. In summary, poster generation needs to consider readability, informativeness and aesthetics of the generated posters simultaneously.

Manga Layout Generation. Several techniques have been studied to facilitate layout generation for western comics or manga: for example, scene frame extraction (Arai and Herman 2010; Pang et al. 2014), automatic stylistic manga layout generation (Cao, Chan, and Lau 2012; Jing et al. 2015), and graphical element composition (Cao, Lau, and Chan 2014). For preview generation of comic episodes (Hoashi et al. 2011), both frame extraction and layout generation are considered. Other research areas, such as manga retargeting (Matsui, Yamasaki, and Aizawa 2011) and manga-like rendering (Qu et al. 2008), have also drawn considerable attention.
However, none of these methods can be directly used to generate scientific posters, which is the focus of this paper.

Our panel layout generation is inspired by recent work on manga layout (Cao, Chan, and Lau 2012). We use a binary tree to represent the panel layout. By contrast, the manga layout work trains a Dirichlet distribution to sample a splitting configuration, and a different Dirichlet distribution needs to be trained for each kind of instance. Instead, we propose a recursive algorithm that searches for the best splitting configuration along a tree.

Overview

Problem Formulation. Assume that we have a set of posters M and their corresponding scientific papers. Each poster m ∈ M includes a set of panels P_m, and each panel p ∈ P_m has a set of graphical elements (figures and tables) G_p. Each panel p is characterized by five attributes:

text length (l_p): the text length within a panel;
text ratio (t_p): the text length within a panel relative to the text length of the whole poster, t_p = l_p / Σ_{q∈P_m} l_q;
graphical element ratio (g_p)¹: the size of the graphical elements within a panel relative to the total size of graphical elements in the poster;
panel size (s_p) and aspect ratio (r_p): s_p = w_p × h_p and r_p = w_p / h_p, where w_p and h_p denote the width and height of a panel with respect to the poster, respectively.

Each graphical element g ∈ G_p has four attributes:

graphical element size (s_g) and aspect ratio (r_g): s_g = w_g × h_g and r_g = w_g / h_g, where w_g and h_g denote the width and height of a graphical element relative to the whole paper, respectively;
horizontal position (h_g): we assume that panel content is arranged sequentially from top to bottom²; hence only the relative horizontal position needs to be considered, defined by a discrete variable h_g ∈ {left, center, right};
graphical element size in poster (u_g): the ratio of the width of the graphical element to the width of the panel.
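For concreteness, the attribute sets above can be held in small data structures. The following sketch (class and field names are ours, not part of the released dataset) derives the ratio attributes from raw widths, heights and text lengths:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GraphicalElement:
    width: float            # w_g: width relative to the whole paper
    height: float           # h_g: height relative to the whole paper
    hpos: str = "center"    # horizontal position: "left" | "center" | "right"
    u_g: float = 0.0        # width ratio inside the panel (inferred later)

    @property
    def s_g(self) -> float:         # graphical element size
        return self.width * self.height

    @property
    def r_g(self) -> float:         # graphical element aspect ratio
        return self.width / self.height

@dataclass
class Panel:
    l_p: float                      # text length within the panel
    width: float                    # w_p: width relative to the poster
    height: float                   # h_p: height relative to the poster
    elements: List[GraphicalElement] = field(default_factory=list)

    @property
    def s_p(self) -> float:         # panel size
        return self.width * self.height

    @property
    def r_p(self) -> float:         # panel aspect ratio
        return self.width / self.height

def text_ratio(panel: Panel, poster: List[Panel]) -> float:
    """t_p: the panel's text length relative to the whole poster."""
    return panel.l_p / sum(q.l_p for q in poster)
```

Note that width and height carry the per-attribute normalization stated above: panel dimensions are relative to the poster, element dimensions relative to the paper.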
To learn how to generate the poster, our goal is to determine the above attributes of each panel p and each graphical element g ∈ G_p, as well as to infer the arrangement of all panels.

Intuitively, a trivial solution is to use a learning model (e.g., SVR) to regress these attributes, including s_p, r_p, u_g, and h_g, while regarding t_p, g_p, l_p, r_g, and s_g as features. However, such a solution lacks a mechanism for exploring the relationships between the panel attributes (e.g., s_p) and the graphical element attributes (e.g., u_g), and it may fail to meet the requirements of readability, informativeness, and aesthetics. We thus propose a novel framework to solve our problem.

Overview. To generate a readable, informative and aesthetic poster, we simulate the rule of thumb by which people design posters in practice: we first generate the panel layout, then arrange the textual and graphical elements within each panel. Our framework overall has four steps (as shown in Figure 1); its core consists of three specific algorithms designed to facilitate poster generation. We first extract textual content from the paper using TextRank (Mihalcea and Tarau 2004)³, as detailed in the Experimental Results section. Non-textual content (figures and tables) is extracted through user interaction. All extracted content is sequentially arranged and represented by the first blob in Figure 1.

¹Note that this variable differs slightly from the text ratio t_p. We do not use the figure size in the poster; instead, we use the corresponding figure from the original paper.
²This holds true when using LaTeX beamer to make posters.
³We use TextRank for text content extraction; however, TextRank can be replaced with other state-of-the-art textual summarization algorithms.

Figure 1: Overview of the proposed approach.

Inference of the initial panel key
attributes (such as panel size s_p and aspect ratio r_p) is then conducted by learning a probabilistic graphical model from the training data. Next, the panel layout is synthesized by a recursive algorithm that further updates these key attributes (i.e., s_p and r_p) and generates an informative and aesthetic panel layout. Finally, we compose the panels by using the graphical model to synthesize the visual properties of each panel (such as the size and position of its graphical elements).

Methodology

Panel Attribute Inference. Our approach divides a scientific poster into several rectangular panel blocks. Each panel should not only be of an appropriate size, to contain the corresponding textual and graphical content, but also be of a suitable shape (aspect ratio) to maximize aesthetic appeal. Our approach learns a probabilistic graphical model to infer initial values for the size and aspect ratio of each panel. As each panel is composed of both textual description and graphical elements, we assume that panel size (s_p) and aspect ratio (r_p) are conditionally dependent on the text ratio t_p and the graphical element ratio g_p. Therefore, the likelihood of a set of panels P can be defined as:

Pr(s_p, r_p | t_p, g_p) = ∏_{p∈P} Pr(s_p | t_p, g_p) Pr(r_p | t_p, g_p)    (1)

where Pr(s_p | t_p, g_p) and Pr(r_p | t_p, g_p) are the conditional probability distributions (CPDs) of s_p and r_p given t_p and g_p. We define them as two conditional linear Gaussian distributions:

Pr(s_p | t_p, g_p) = N(s_p; w_s · [t_p, g_p, 1]^T, σ_s)    (2)
Pr(r_p | t_p, g_p) = N(r_p; w_r · [t_p, g_p, 1]^T, σ_r)    (3)

where t_p and g_p are produced by the content extraction step shown in Figure 1; w_s and w_r are the parameters that weigh the influence of the various factors; and σ_s and σ_r are the variances. The parameters (w_s, w_r, σ_s and σ_r) are estimated from the training data using maximum likelihood.
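Because Eqs. (2)–(3) are conditional linear Gaussians, maximum-likelihood estimation reduces to ordinary least squares for the weights plus the residual standard deviation. A minimal NumPy sketch (variable names are ours; the paper itself uses the BNT toolbox for this step):

```python
import numpy as np

def fit_clg(features: np.ndarray, target: np.ndarray):
    """Fit a conditional linear Gaussian P(y|x) = N(y; w . [x, 1]^T, sigma).

    features: (n, d) array of conditioning variables (e.g. columns t_p, g_p)
    target:   (n,) array of the attribute to model (e.g. s_p or r_p)
    Returns (w, sigma), the maximum-likelihood weights and std deviation.
    """
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # append bias term
    w, *_ = np.linalg.lstsq(X, target, rcond=None)              # ML weight estimate
    residuals = target - X @ w
    sigma = np.sqrt(np.mean(residuals ** 2))                    # ML (1/n) variance estimate
    return w, sigma

def predict_mean(w: np.ndarray, x: np.ndarray) -> float:
    """Most likely attribute value given conditioning variables x."""
    return float(np.append(x, 1.0) @ w)
```

The same routine fits Eq. (2) with feature columns (t_p, g_p) against s_p and Eq. (3) against r_p; the inferred initial attribute is then the conditional mean w · [x, 1]^T.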
Using the learned parameters, the initial attributes of each panel can be inferred.

Note that, in order to learn from limited data, this step employs two assumptions: (1) s_p and r_p are conditionally independent; (2) the attribute sets of different panels are independent. We need the panels to be neither too small in size (s_p) nor too distorted in aspect ratio (r_p) to ensure a readable, informative and aesthetic poster, and the two assumptions are sufficient for this task. Furthermore, the attribute values estimated in this step serve only as good initial values for each panel; the next two steps relax these assumptions and account for the relationship between s_p and r_p, as well as the relationships among different panels (Algorithm 1). To ease exposition, we denote the set of panels as L = {(s_{p_1}, r_{p_1}), (s_{p_2}, r_{p_2}), ..., (s_{p_k}, r_{p_k})}, where s_{p_i} and r_{p_i} are the size and aspect ratio of the i-th panel p_i, respectively, and |L| = k.

Panel Layout Generation. One conventional way to design posters is to simply arrange the panels in a two- or three-column style. This scheme, although simple, makes all posters look similar and unattractive. Inspired by manga layout generation (Cao, Chan, and Lau 2012), we propose a more vivid panel layout generation method. Specifically, we use a binary tree structure to represent the panel layout, and formulate panel layout generation as a process of recursively splitting a page, as illustrated and explained in Figure 2.

Conveying information is the most important goal of a scientific poster; we therefore attempt to maintain the relative size of each panel during panel layout generation. This motivates the following loss function for panel shape variation:

l(p_i) = |r_{p_i} − r'_{p_i}|    (4)

where r'_{p_i} is the aspect ratio of the panel after optimization.
This leads to a combined aesthetic loss for the poster:

Loss(L, L') = Σ_{i=1}^{k} l(p_i)    (5)

where L' is the poster panel set after optimization. In each splitting step, the combinatorial choices of splitting positions can be recursively computed and compared with respect to the loss function above. We choose the panel attributes with the lowest loss (Eq. 5). The whole algorithm is summarized in Algorithm 1.

Figure 2: Panel layout and the corresponding tree structure. The tree structure of this poster layout contains five panels. The first split is vertical with splitting ratio (0.5, 0.5); the poster is further divided into three panels on the left and two on the right, making the whole page two equal columns. For the left column, we use a horizontal split with splitting ratio (0.4, 0.6); the larger part is further horizontally divided into two panels with splitting ratio (0.33, 0.67). We split the right column only once, with splitting ratio (0.5, 0.5).

Composition within a Panel. Having inferred the layout of the panels, we turn our attention to the composition of graphical elements within the panels. We model and infer the attributes of graphical elements using another probabilistic graphical model. In particular, the key attributes we need to estimate are the graphical element size u_g and the horizontal position h_g. In our model, u_g relies on s_p, l_p and s_g, while h_g relies on r_p, s_g and r_g, so the likelihood is

Pr(h_g, u_g | s_p, r_p, l_p, s_g, r_g) = ∏_{p∈P} ∏_{g∈p} Pr(u_g | s_p, l_p, s_g) Pr(h_g | r_p, s_g, r_g)    (6)

where Pr(u_g | s_p, l_p, s_g) and Pr(h_g | r_p, s_g, r_g) are the conditional probability distributions (CPDs) of u_g and h_g given s_p, l_p, r_p, s_g and r_g.
The conditional linear Gaussian distribution is also used here:

Pr(u_g | s_p, l_p, s_g) = N(u_g; w_u · [s_p, l_p, s_g, 1]^T, σ_u)    (7)

where w_u is the parameter that balances the influence of the different factors. Since we take the horizontal position h_g to be an enumerated variable, a natural way to estimate it is to treat it as a classification problem using the softmax function:

Pr(h_g = i | r_p, s_g, r_g) = exp(w_{h_i} · [r_p, s_g, r_g, 1]^T) / Σ_{j=1}^{H} exp(w_{h_j} · [r_p, s_g, r_g, 1]^T)    (8)

where H is the cardinality of the value set of h_g (i.e., H = 3) and w_{h_i} is the i-th row of w_h. The maximum likelihood method is used to estimate the parameters, including w_u, w_h and σ_u.

Algorithm 1 Panel layout generation
Input: panels learned from the graphical model, L = {(s_{p_1}, r_{p_1}), (s_{p_2}, r_{p_2}), ..., (s_{p_k}, r_{p_k})}; rectangular page area x, y, w, h.
Output: aesthetic loss Loss and the recorded arrangement.
1: if k == 1 then
2:   adjust panel p_1 to fill the whole rectangular page area; return the aesthetic loss |r_{p_1} − w/h|;
3: else
4:   for each i ∈ [1, k−1] do
5:     t = Σ_{j=1}^{i} s_{p_j} / Σ_{j=1}^{k} s_{p_j};
6:     Loss_1 = PanelArrangement((s_{p_1}, r_{p_1}), ..., (s_{p_i}, r_{p_i}), x, y, w, h × t);
7:     Loss_2 = PanelArrangement((s_{p_{i+1}}, r_{p_{i+1}}), ..., (s_{p_k}, r_{p_k}), x, y + h × t, w, h × (1 − t));
8:     if Loss > Loss_1 + Loss_2 then
9:       Loss = Loss_1 + Loss_2;
10:      record this arrangement;
11:    end if
12:    Loss_1 = PanelArrangement((s_{p_1}, r_{p_1}), ..., (s_{p_i}, r_{p_i}), x, y, w × t, h);
13:    Loss_2 = PanelArrangement((s_{p_{i+1}}, r_{p_{i+1}}), ..., (s_{p_k}, r_{p_k}), x + w × t, y, w × (1 − t), h);
14:    if Loss > Loss_1 + Loss_2 then
15:      Loss = Loss_1 + Loss_2;
16:      record this arrangement;
17:    end if
18:  end for
19: end if
20: return Loss and the arrangement.

Different from Eq.
1, directly inferring h_g and u_g is not advisable, since the panel content may exceed the panel bounding box and affect the aesthetic measure of the poster. To avoid this problem, we employ the likelihood-weighted sampling method (Fung and Chang 1990) to generate samples from the model, maximizing the likelihood function (Eq. 6) under the strict constraint

Σ_{g∈p} h_p × u_g + α × β × l_p / w_p < h_p    (9)

where α and β denote the width and height of a single character, respectively. The first term of the constraint is the height of the graphical elements, while the second term is the height of the textual content.

Experimental Results

Experimental Setup. We collect and make available to the community the first Poster-Paper dataset. Specifically, we selected 25 well-designed pairs of scientific papers and their corresponding posters from the 600 publicly available pairs we collected. These papers all cover scientific topics, and their posters have relatively similar design styles. We further annotated panel attributes, such as panel width, panel height, and so on. We split the data into 20 pairs for training and five for testing; of the 173 panels in our dataset, 143 are for training and 30 for testing.

stage                      | average time
Text extraction            | 28.81s
Panel attribute inference  | learn: 0.85s; infer: 0.013s
Panel layout generation    | 0.13s
Composition within panel   | learn: 2.17s; infer: 0.03s + 19.09s*

Table 1: Running time of each step. *: 0.03s for the inference computation and 19.09s for LaTeX file generation.

We use TextRank to extract textual content from the original paper. In order to give different importance to different sections, we can set a different extraction ratio for each section; important sections then generate more content and hence occupy bigger panels. For simplicity, this paper uses equal weights for all sections.
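As a concrete companion to Algorithm 1, the recursive split search can be sketched in a few lines. The rendering below is our own simplification: it returns only the minimal aesthetic loss of Eq. (5) and omits the bookkeeping that records the winning arrangement:

```python
def layout_loss(panels, w, h):
    """Minimal aesthetic loss (Eq. 5) for arranging `panels` in a w-by-h page.

    panels: list of (s_p, r_p) pairs -- desired panel size and aspect ratio.
    A single panel is stretched to fill the page; otherwise every split
    index is tried, both horizontally and vertically, with the page divided
    in proportion to the total panel size on each side of the split.
    """
    if len(panels) == 1:
        _, r = panels[0]
        return abs(r - w / h)       # Eq. 4: deviation from the forced aspect ratio
    total = sum(s for s, _ in panels)
    best = float("inf")
    for i in range(1, len(panels)):
        t = sum(s for s, _ in panels[:i]) / total       # splitting ratio
        # horizontal split: top strip of height h*t, bottom strip of h*(1-t)
        horiz = (layout_loss(panels[:i], w, h * t)
                 + layout_loss(panels[i:], w, h * (1 - t)))
        # vertical split: left strip of width w*t, right strip of w*(1-t)
        vert = (layout_loss(panels[:i], w * t, h)
                + layout_loss(panels[i:], w * (1 - t), h))
        best = min(best, horiz, vert)
    return best
```

Each call tries every split index i with both orientations, sizing the two sub-pages in proportion to the total panel size on each side, exactly as the splitting ratio t does in Algorithm 1.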
User interaction is also required to highlight and select important figures and tables from the original paper. We use the Bayesian Network Toolbox (BNT) (Murphy 2002) to estimate the key parameters. For graphical element attribute inference, we generate 1000 samples via likelihood-weighted sampling (Fung and Chang 1990) for Eq. 6 under the constraint of Eq. 9. With the inferred metadata, the final poster is generated in LaTeX beamerposter format with the Lankton theme.

For a baseline comparison, we invited three second-year PhD students, who were not familiar with our project, to hand-design posters for the test set. These three students work in computer vision and machine learning but have not yet published any papers on these topics; hence they are novices at poster design. Given the test set papers, we asked the students to work together and design a poster for each paper.

Running time. Our framework is very efficient. Our experiments were done on a PC with an Intel Xeon 2.0 GHz CPU and 144GB RAM. Table 1 shows the average time needed for each step. Strictly speaking, we cannot compare with "previous methods", since ours is the first work on poster generation and there is no directly comparable existing work. Nevertheless, the total running time is significantly less than the time people require to design a good poster; it is also less than the time the three novices spent making the posters described in the Quantitative Evaluation section.

Quantitative Evaluation. We quantitatively evaluate the effectiveness of our approach.

⁴t_p and g_p are used as features for the SVR. The parameters are chosen using cross-validation. Nonlinear kernels (such as RBF) perform worse due to over-fitting on the training data.

(1) Effectiveness of panel inference. For this step, we compare the inferred size and aspect ratio of panels with the trivial solution, an SVR that trains a linear regressor⁴
to predict the panel size and panel aspect ratio from training data. We use the panel attributes from the original posters⁵ as ground truth and compute the mean-square error (MSE) of the inferred values versus the ground-truth values. Our method achieves an MSE of 3650.4 for panel size and 0.67 for aspect ratio; the corresponding values for the SVR method are 3831.3 and 0.76. This shows that our algorithm estimates the panel attributes better than SVR.

Figure 4: Qualitative comparison of our result (b) and the novices' result (a). Please refer to the supplementary material for larger figures.

(2) User study. A user study is employed to compare our results with the original posters and the posters made by novices. We invited 10 researchers (experts on the evaluated topics who were unaware of our project) to evaluate these results on readability, informativeness and aesthetics. Each researcher is sequentially shown the three results (in randomized order) and asked to score them from 0 to 10, where 0, 5 and 10 indicate the lowest, middle and highest scores on the corresponding metric. The final results are averaged for each metric.

As shown in Table 2, our method is comparable to the original posters on readability and informativeness, and significantly better than the posters made by novices. This validates the effectiveness of our method, since the inferred panel attributes and generated panel layout preserve the most valuable and important information. By contrast, our method scores lower than the original posters on the aesthetics metric (yet still higher than the novices' posters). This is reasonable, since aesthetics is a relatively subjective metric and generally requires a "human touch". Generating more aesthetic posters from papers remains an open problem.

Qualitative Evaluation of Three Methods.
We qualitatively compare our result (Figure 3(b)) with the novices' poster in Figure 3(a) and the original poster in Figure 3(c). All of them are for the same paper.

⁵Note that, though the panels of the original posters may not be the best ones, they are the best candidates to serve as ground truth here.

[Figure 3: posters for the same paper: (a) designed by novices, (b) generated by our method, (c) the original poster.]
Figure 3: Results generated in different ways: (a) designed by a novice; (b) our result; (c) the original poster.

Metric               Readability  Informativeness  Aesthetics  Avg.
Our method              6.94          7.06            6.86     6.95
Posters by novices      6.69          6.83            6.12     6.54
Original posters        7.08          7.03            7.43     7.18

Table 2: User study of the different posters generated.

It is interesting that, compared with the panel layout of the original poster, our panel layout looks more similar to the original than the one made by novices. This is because, first, the Poster-Paper dataset has relatively similar, high-quality graphical designs, and second, our split and panel-layout algorithms work well in simulating the way people design posters. In contrast, the poster designed by novices in Figure 3(a) has two columns, which appears less attractive to our 10 researchers; it takes the novices around 2 hours to finish all the posters.

Further Qualitative Evaluation. We further qualitatively evaluate our results (Figure 4) against general graphic design principles (O'Donovan, Agarwala, and Hertzmann 2014), i.e., flow, alignment, and overlap and boundaries.

Flow. It is essential for a scientific poster to present information in a clear read order, i.e., readability. People always read a scientific poster from left to right and from top to bottom. Since Algorithm 1 recursively splits the poster page into left/right or top/bottom regions, the panel layout we generate ensures that the read order matches the section order of the original paper. Within each panel, our algorithm also sequentially organizes contents following the section order of the original paper, which further improves readability.

Alignment. Compared with the complex alignment constraints in (O'Donovan, Agarwala, and Hertzmann 2014), our formulation is much simpler and uses an enumeration variable h_g to indicate the horizontal position of graphical elements.
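The recursive left/right, top/bottom splitting described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the section weights passed to `split_page` are a hypothetical stand-in for the inferred panel attributes, and `overlaps` is a plain axis-aligned rectangle test standing in for the boundary-overlap penalty discussed below.

```python
# Sketch (assumed, not the paper's code): recursively split a page into
# panels so that the left-to-right / top-to-bottom read order follows
# the section order, in the spirit of Algorithm 1.

def split_page(x, y, w, h, sizes, vertical=True):
    """Recursively partition the rect (x, y, w, h) among `sizes`.

    Panels are returned in the same order as `sizes`, so the read
    order matches the section order of the paper.
    """
    if len(sizes) == 1:
        return [(x, y, w, h)]
    k = len(sizes) // 2
    left, right = sizes[:k], sizes[k:]
    frac = sum(left) / sum(sizes)        # area share of the first half
    if vertical:                         # split into left | right
        lw = w * frac
        return (split_page(x, y, lw, h, left, not vertical)
                + split_page(x + lw, y, w - lw, h, right, not vertical))
    else:                                # split into top / bottom
        th = h * frac
        return (split_page(x, y, w, th, left, not vertical)
                + split_page(x, y + th, w, h - th, right, not vertical))

def overlaps(a, b):
    """Axis-aligned rectangle overlap test; a layout is penalized when
    a graphical element crosses a panel boundary."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

# Hypothetical example: a 100 x 70 page split among 5 weighted sections.
panels = split_page(0, 0, 100, 70, [3, 2, 2, 1, 2])
```

Because each recursion partitions its rectangle exactly, the resulting panels tile the page without overlap, which is what gives the generated layouts their clear read order.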
This simplification does not spoil our results, which still show reasonable alignment, as illustrated in Figure 4 and quantitatively evaluated by the three metrics in Tab. 2.

Overlap and boundaries. Overlapping panels make a poster less readable and less aesthetic. To avoid this, our approach (1) recursively splits the page for the panel layout; (2) sequentially arranges the panels; and (3) enforces the constraint of Eq. 9 to penalize overlap between graphical elements and panel boundaries. As a result, our algorithm achieves reasonable results without significant overlapping and/or boundary crossing. Like the manually created poster in Figure 3(c), our result (Figure 3(b)) does not have significantly overlapping panels and/or boundaries.

Conclusion and Future Work

Automatic tools for scientific poster generation are important for poster designers, who can save a lot of time with such tools. Design is hard work, especially for scientific posters, which require careful consideration of both utility and aesthetics. Abstract principles of scientific poster design cannot help designers directly. By contrast, we propose an approach that learns design patterns from existing examples, which will hopefully lead to an automatic tool for scientific poster generation that aids designers.

Beyond scientific poster design, our approach also provides a framework for learning other kinds of design patterns, for example web-page design and single-page graphic design. By providing different sets of training data, our approach could generate different layout styles. Our work has several limitations: we do not consider font types in our current implementation, and we adopt only a simple yet effective aesthetic metric. We plan to address these problems in future work.

Acknowledgements

We would like to thank the anonymous reviewers for their insightful suggestions for improving this paper.
References

[Arai and Herman 2010] Arai, K., and Herman, T. 2010. Method for automatic e-comic scene frame extraction for reading comic on mobile devices. In Information Technology: New Generations (ITNG), 2010 Seventh International Conference on, 370-375. IEEE.
[Cao, Chan, and Lau 2012] Cao, Y.; Chan, A. B.; and Lau, R. W. H. 2012. Automatic stylistic manga layout. ACM Trans. Graph. 31(6):141:1-141:10.
[Cao, Lau, and Chan 2014] Cao, Y.; Lau, R. W.; and Chan, A. B. 2014. Look over here: Attention-directing composition of manga elements. ACM Transactions on Graphics (TOG) 33(4):94.
[Damera-Venkata, Bento, and O'Brien-Strain 2011] Damera-Venkata, N.; Bento, J.; and O'Brien-Strain, E. 2011. Probabilistic document model for automated document composition. In Proceedings of the 11th ACM Symposium on Document Engineering, 3-12. ACM.
[Fung and Chang 1990] Fung, R. M., and Chang, K.-C. 1990. Weighing and integrating evidence for stochastic simulation in Bayesian networks. 209-220.
[Gajos and Weld 2005] Gajos, K., and Weld, D. S. 2005. Preference elicitation for interface optimization. In Proceedings of the 18th Annual ACM Symposium on User Interface Software and Technology, 173-182. ACM.
[Geigel and Loui 2003] Geigel, J., and Loui, A. 2003. Using genetic algorithms for album page layouts. IEEE Multimedia (4):16-27.
[Harrington et al. 2004] Harrington, S. J.; Naveda, J. F.; Jones, R. P.; Roetling, P.; and Thakkar, N. 2004. Aesthetic measures for automated document layout. In Proceedings of the 2004 ACM Symposium on Document Engineering, 109-111. ACM.
[Hoashi et al. 2011] Hoashi, K.; Ono, C.; Ishii, D.; and Watanabe, H. 2011. Automatic preview generation of comic episodes for digitized comic search. In Proceedings of the 19th ACM International Conference on Multimedia, 1489-1492. ACM.
[Hurst, Li, and Marriott 2009] Hurst, N.; Li, W.; and Marriott, K. 2009. Review of automatic document formatting. In Proceedings of the 9th ACM Symposium on Document Engineering, 99-108. ACM.
[Jacobs et al. 2003] Jacobs, C.; Li, W.; Schrier, E.; Bargeron, D.; and Salesin, D. 2003. Adaptive grid-based document layout. 22(3):838-847.
[Jing et al. 2015] Jing, G.; Hu, Y.; Guo, Y.; Yu, Y.; and Wang, W. 2015. Content-aware video2comics with manga-style layout. Multimedia, IEEE Transactions on 17(12):2122-2133.
[Matsui, Yamasaki, and Aizawa 2011] Matsui, Y.; Yamasaki, T.; and Aizawa, K. 2011. Interactive manga retargeting. In ACM SIGGRAPH 2011 Posters, 35. ACM.
[Merrell et al. 2011] Merrell, P.; Schkufza, E.; Li, Z.; Agrawala, M.; and Koltun, V. 2011. Interactive furniture layout using interior design guidelines. ACM Transactions on Graphics (TOG) 30(4):87.
[Mihalcea and Tarau 2004] Mihalcea, R., and Tarau, P. 2004. TextRank: Bringing order into texts. Association for Computational Linguistics.
[Murphy 2002] Murphy, K. 2002. Bayes Net Toolbox for Matlab.
[O'Donovan, Agarwala, and Hertzmann 2014] O'Donovan, P.; Agarwala, A.; and Hertzmann, A. 2014. Learning layouts for single-page graphic designs. Visualization and Computer Graphics, IEEE Transactions on 20(8):1200-1213.
[Pang et al. 2014] Pang, X.; Cao, Y.; Lau, R. W.; and Chan, A. B. 2014. A robust panel extraction method for manga. In Proceedings of the ACM International Conference on Multimedia, ACM MM.
[Qu et al. 2008] Qu, Y.; Pang, W.-M.; Wong, T.-T.; and Heng, P.-A. 2008. Richness-preserving manga screening. 27(5):155.
[Yu et al. 2011] Yu, L.-F.; Yeung, S.-K.; Tang, C.-K.; Terzopoulos, D.; Chan, T. F.; and Osher, S. J. 2011. Make it home: automatic optimization of furniture arrangement. ACM Transactions on Graphics (TOG) 30(4), article 86.