A Conditional Adversarial Network for Scene Flow Estimation


Authors: Ravi Kumar Thakur, Snehasis Mukherjee

Abstract—The problem of scene flow estimation in depth videos has been attracting the attention of researchers in robot vision, due to its potential application in various areas of robotics. Conventional scene flow methods are difficult to use in real-life applications due to their long computational overhead. We propose a conditional adversarial network, SceneFlowGAN, for scene flow estimation. The proposed SceneFlowGAN uses a loss function at two ends: both the generator and the discriminator. The proposed network is the first attempt to estimate scene flow using generative adversarial networks, and is able to estimate both the optical flow and the disparity from the input stereo images simultaneously. The proposed method is evaluated on a large RGB-D benchmark scene flow dataset.

I. INTRODUCTION

Scene flow is a three-dimensional (3D) motion field representation of points moving in 3D space. Scene flow gives complete information about the motion and geometry, in a stereo pair of frames, of all the visible scene points. Thus, the estimation of the flow field is an important task in 3D computer vision and robot vision. Earlier work on motion estimation addressed rigid scenes. However, the problem of scene flow estimation started gaining attention when scene flow was first introduced for dynamic scenes [23]. This complete understanding of a dynamic scene can be used in many application areas of computer vision, including activity recognition, 3D reconstruction, autonomous navigation, free-viewpoint video, motion capture systems, augmented reality, and structure from motion. The problem of scene flow estimation can be considered the 3D counterpart of optical flow estimation. Despite several efforts, the estimation of scene flow is still an under-determined problem.
Several approaches for scene flow estimation have been proposed since its introduction, most of which rely on conventional procedures of computer vision [7]. Some methods extend popular optical flow estimation techniques to 3D by introducing a disparity map [26], while other approaches are based on the optimization of energy functions and variational methods [19]. Most scene flow estimation methods rely on the calculation of optical flow along with depth [11]. The fundamental assumptions behind the state-of-the-art algorithms are brightness and gradient constancy of the stereo images. As a result, most of the methods work very well on scenes with small displacements, but cannot perform well on samples with large displacements. In realistic scenarios, these assumptions are often violated. Figure 1 shows such an example from our prediction results.

Fig. 1. An example of a stereo pair along with the ground-truth and predicted disparities and optical flow. The first row shows a sample stereo pair, the second row shows the ground-truth scene flow, and the third row shows the reconstructed scene flow along the x-, y- and z-directions.

The problem of scene flow estimation using deep networks has recently attracted the attention of the computer vision research community with the availability of large-scale datasets [16]. Nearly all the classical methods take several minutes to process a frame; hence, their computational time does not permit real-time application. Recently, learning-based methods [18, 16] have been proposed, enabled by the availability of large-scale datasets with ground truth. These methods take more time for training, but are able to reduce the run time to less than a second, though in terms of accuracy these learning-based approaches are currently not on par with the classical methods.
Moreover, with training restricted to synthetic datasets, the estimated scene flow may not work well on naturalistic scenes.

Estimating scene flow is a challenging problem because of the dependency of the estimation algorithms on the assumptions of brightness and gradient constancy across subsequent stereo frames. Even an occlusion in the image can affect the stereo correspondence between the frames. Varying illumination and lack of texture information can also give erroneous information about the brightness pattern. Similarly, large displacements can cause errors in scene flow computation. Deep networks are good at understanding highly abstract features. The automatic feature extraction capability of deep networks can be used to develop models that are more robust in the cases where these assumptions are violated.

There are only a few learning-based methods for scene flow estimation. A generative approach can be useful in semi-supervised learning scenarios as well, since acquiring data will remain a challenge. We propose a conditional adversarial network for estimating scene flow from stereo images obtained at different time instances. To our knowledge, this is the first attempt at scene flow estimation using GANs. The presence of an additional discriminator network acting as a critic can direct the training process for scene flow estimation; thus, scene flow estimation benefits from this adversarial model. This generative modeling approach can also be used for unsupervised learning of scene flow. At present, no large-scale dataset with naturalistic scenes is available; however, the proposed SceneFlowGAN can be used to generate such datasets.

II. RELATED WORKS AND BACKGROUND

Scene flow estimation using deep networks is an active area of research [8]. We discuss recent advances in scene flow estimation, generative adversarial networks, and their applications in structure prediction problems in separate subsections.

A. Scene Flow Estimation

Classical scene flow estimation methods are generally based on data extracted from sequences of images obtained from multiple cameras, stereo images, and depth data. Scene flow was first proposed by Vedula et al. [23] using multi-view images. They obtained multi-view scene flow from optical flow and surface geometry. Usually such methods use some 3D reconstruction procedure. Scene flow based on stereo images from a binocular setting often involves joint estimation of optical flow and disparity [7, 26], though some scene flow estimation methods decouple the problem into stereo and motion estimation [25]. Basha et al. [2] formulated the structure and scene flow in a point cloud representation. Scene flow enforcing depth discontinuities using image segmentation information was introduced by Zhang and Kambhamettu [28]; it computes both the 3D motion and the 3D structure. Most of these methods use a variational framework. Schuster et al. [21], however, proposed a scene flow estimation method based on dense interpolation of sparse matches from stereo images, with variational optimization applied at a later stage for refinement.

With the advent of depth cameras, scene flow estimation methods using RGB-D data were explored [19, 6, 11]. However, depth cameras are not sufficiently accurate in outdoor environments; they pose limitations due to changes in illumination, frame rate, and limited field of view. The classical methods of estimating motion rigidly follow the brightness and gradient constancy assumptions. However, most of these assumptions do not hold true in dynamic environments.

B. Structure and Motion Estimation from Deep Networks

For motion estimation based on deep networks, the availability of large datasets was a challenge. Since acquiring motion data for naturalistic scenes was tedious, Mayer et al. [16] introduced the synthetic FlyingThings3D dataset.
Recently, motion estimation based on deep networks has shown the promise of such methods. The introduction of the FlyingThings3D dataset gave a boost to CNN-based methods for motion estimation. Mayer et al. [16] were also the first to apply a CNN to scene flow estimation by proposing SceneFlowNet, which used a combined architecture of FlowNet [4] and DispNet [16] for estimating scene flow. FlowNet was subsequently revised in FlowNet2 [9], which addressed the problem of large displacements by stacking different FlowNet architectures; small displacements were addressed using small strides in the convolution layers. SpyNet [20] used a spatial pyramid of the input data to reduce the number of training parameters. Motivated by the success in estimating optical flow through CNNs, a few deep networks for scene flow estimation were also proposed. Ilg et al. [8] introduced a stacked architecture based on FlowNet 2.0 to estimate disparity and scene flow in occluded stereo images. Behl et al. [3] combined recognition with geometry information to estimate scene flow in dynamic scenes with large displacements. SF-Net [18] introduced end-to-end training for scene flow estimation from RGB-D images. A CNN for direct estimation of scene flow was proposed by Thakur and Mukherjee [22]. Their model, SceneEDNet [22], estimates three-dimensional motion from a temporal sequence of stereo images, without being given geometry information. Vijayanarasimhan et al. [24] solved for 3D motion and 3D geometry simultaneously by using two different networks for structure and motion. SceneEDNet is a deep network for end-to-end learning of scene flow using only stereo images, which can readily be fed into a GAN. Hence, we use the SceneEDNet architecture in the generator part of the proposed SceneFlowGAN.

C. Generative Adversarial Networks

The Generative Adversarial Network was proposed by Goodfellow et al. [5].
The GAN architecture consists of two networks training in an adversarial mode against each other. The generator is tasked with generating realistic images given a latent noise sample, while the discriminator network is trained on both real and generated images so as to be able to distinguish between the two. The generator and the discriminator are involved in a min-max game, which can be represented by the following equation:

    min_G max_D  E_{x~P(x)}[log D(x)] + E_{z~P(z)}[log(1 − D(G(z)))]    (1)

The generator is denoted by G and the discriminator by D; P(z) is the distribution of the noise. Training a GAN has been difficult due to problems such as vanishing gradients and mode collapse; the training can also be highly unstable. Wasserstein GAN [1] overcame some of these challenges by using the EM (earth mover's) distance as the loss function. Also, in some cases the discriminator and generator loss values are not good indicators of the progress of GAN training. Mirza and Osindero [17] proposed conditioning both the generator and the discriminator on additional information available with the data, which allows directing the training of the GAN for data generation.

Fig. 2. The proposed architecture for SceneFlowGAN. The generator follows an encoder-decoder architecture. The networks are composed of units of the form convolution-batch norm-leakyReLU; these units are also part of the discriminator network.

These advances in GANs were followed by their application in various areas of computer vision. Kupyn et al. [13] demonstrated a conditional adversarial network for restoring a blurred image; they used residual blocks for the generator with a perceptual loss. Zhang et al. [27] synthesized images from text descriptions with a stack of two GANs: the first GAN generates rough images based on the text description, and the second GAN is conditioned on the first to perform refinement. Super-resolution from a single image was achieved by Ledig et al. [15] using a perceptual loss.
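In practice, the value function of Eq. (1) is optimized through two binary cross-entropy terms, one per network. The sketch below is illustrative only (the proposed SceneFlowGAN instead uses the Wasserstein loss discussed above), and `gan_losses` is a hypothetical helper:

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-8):
    """Cross-entropy form of Eq. (1).

    d_real: discriminator outputs D(x) on real samples, in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples.
    Returns (discriminator_loss, generator_loss). The discriminator
    maximizes the value function, so its loss is the negation.
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    d_loss = -(np.log(d_real + eps).mean() + np.log(1.0 - d_fake + eps).mean())
    # Non-saturating generator loss, the variant commonly used in practice
    # instead of minimizing log(1 - D(G(z))) directly.
    g_loss = -np.log(d_fake + eps).mean()
    return d_loss, g_loss
```

A well-performing discriminator (D(x) near 1 on real data, D(G(z)) near 0 on fakes) yields a small discriminator loss and a large generator loss; at D = 0.5 everywhere the discriminator loss equals 2 log 2.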
A model for image-to-image translation [10] was proposed by conditioning both adversarial networks on the input image. A semi-supervised optical flow estimation method using a conditional GAN was proposed in [14], using both labeled and unlabeled data.

III. PROPOSED METHOD

We propose a conditional adversarial network for estimating scene flow from pairs of stereo images. The weights of the generator and the discriminator are updated together during the training phase. The learning of optical flow and disparity is coupled in SceneFlowGAN.

A. Scene Flow Estimation

Given stereo image pairs at consecutive time instances, the scene flow can be constructed from the optical flow (u, v) and the disparities (d_t, d_{t+1}). The dense scene flow provides the 3D position and the constituent 3D motion vector for all the points. Our proposed method takes a set of stereo images defined by I = (I_L^t, I_L^{t+1}, I_R^t, I_R^{t+1}) to generate the scene flow S. Thus, the scene flow can be considered a 4D vector S = (u, v, d_t, d_{t+1}). The horizontal and vertical components of the optical flow are represented by u and v respectively, and the disparities of the stereo pairs at times t and t+1 are denoted by d_t and d_{t+1}. In a point cloud, the scene flow can be computed using the camera parameters and the pinhole projection model; when projected onto the image plane, the scene flow gives the corresponding optical flow.

B. Adversarial Training

For training SceneFlowGAN, the loss function is computed twice: once at the end of the discriminator and once at the end of the generator. The discriminator's loss makes the network learn to distinguish ground-truth from generated scene flow. The discriminator is not conditioned like the one proposed in [17]. The loss at the end of the generator G ensures that the network is optimized for the scene flow estimation task from a pair of stereo images. At the same time, the generator is also trained to pass the critic test posed by the discriminator.
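As a sketch of the pinhole back-projection mentioned in Sec. III-A, the 4D vector S = (u, v, d_t, d_{t+1}) can be lifted to a per-pixel 3D motion vector given the camera parameters. The function names and parameter values below (focal length f, baseline B, principal point (cx, cy)) are hypothetical, for illustration only:

```python
import numpy as np

def backproject(px, py, disp, f, B, cx, cy):
    """Back-project pixel (px, py) with disparity disp to a 3D point
    under the pinhole stereo model: Z = f * B / disp."""
    Z = f * B / disp
    X = (px - cx) * Z / f
    Y = (py - cy) * Z / f
    return np.stack([X, Y, Z], axis=-1)

def scene_flow_3d(x, y, u, v, d_t, d_t1, f, B, cx, cy):
    """3D motion vector of a pixel given its scene flow components
    S = (u, v, d_t, d_t1): back-project at t and t+1 and subtract."""
    p_t = backproject(x, y, d_t, f, B, cx, cy)
    p_t1 = backproject(x + u, y + v, d_t1, f, B, cx, cy)
    return p_t1 - p_t
```

A pixel at the principal point with disparity d back-projects to (0, 0, fB/d); a pixel with zero optical flow and unchanged disparity has zero 3D motion, consistent with the projection relation stated above.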
This one-to-many mapping directs the training of the generator for the scene flow estimation task:

    L = L_GAN + L_JointLoss

For training the generator we define a joint loss function, composed of the average end-point error for the optical flow and an L1 loss for the error between the two disparity values. We use the Wasserstein metric as the GAN loss function for stable training using gradient descent [1]. The joint loss function is given below:

    L_JointLoss = Σ_p sqrt((u − u')² + (v − v')²) + Σ_p |d_t − d'_t| + Σ_p |d_{t+1} − d'_{t+1}|    (2)

where the primed quantities denote the ground-truth values and the sums run over all pixels p. The GAN loss drives the decision on whether an input scene flow is real or generated. Conditioning the generator on the input stereo images makes the generator learn to estimate scene flow and also to fool the discriminator. Thus, SceneFlowGAN is trained to optimize the following objective function:

    min_G max_D  L_GAN(G, D) + L_JointLoss(G)    (3)

The discriminator tries to maximize the objective function while the generator tries to minimize it.

C. Architecture of the Proposed Model

The architecture of SceneFlowGAN is shown in Fig. 2. The model consists of a generator and a discriminator network, both convolutional. For the generator we have used SceneEDNet [22]; unlike [22], we have used batch-normalization layers for regularization. The network needs to learn correspondences between the stereo pairs; however, much information is lost while propagating from the encoding to the decoding stage. Thus, we have used skip connections between layers of the same dimension in the encoder and decoder parts. The composition units of the generator and discriminator networks are of the form convolution-batch normalization-leakyReLU. The discriminator network has three convolution layers, each followed by batch normalization and leakyReLU.
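The joint loss of Eq. (2) can be sketched in NumPy as follows. `joint_loss` is a hypothetical helper operating on (H, W, 4) arrays holding (u, v, d_t, d_{t+1}); whether the per-pixel errors are summed or averaged is an implementation choice:

```python
import numpy as np

def joint_loss(pred, gt):
    """Joint loss of Eq. (2): per-pixel end-point error of the optical
    flow plus L1 error on both disparity channels.

    pred, gt: arrays of shape (H, W, 4) holding (u, v, d_t, d_{t+1}).
    """
    du = pred[..., 0] - gt[..., 0]
    dv = pred[..., 1] - gt[..., 1]
    epe = np.sqrt(du ** 2 + dv ** 2).sum()          # end-point error term
    l1 = np.abs(pred[..., 2] - gt[..., 2]).sum() \
       + np.abs(pred[..., 3] - gt[..., 3]).sum()    # two disparity terms
    return epe + l1
```

For a pixel with flow error (3, 4) and disparity errors (1, 2), the contribution is 5 + 1 + 2 = 8, matching the three terms of Eq. (2).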
The final convolution layer is flattened and connected to a set of three dense layers, with a dropout layer of rate 0.4. The last dense layer gives the probability of the scene flow being generated or real.

IV. EXPERIMENTS AND RESULTS

We describe the datasets used for training SceneFlowGAN, followed by implementation details and results.

A. Dataset

For training SceneFlowGAN we have used the large scene flow dataset of Mayer et al. [16]. The dataset is divided into three sections: FlyingThings3D, Monkaa, and Driving. All the datasets provide 3D scene points. The 3D models were used to create frames artificially using Blender. The scenes are rendered in a way that provides variation in orientation and position for all the visible scene points. The datasets come with bi-directional optical flow and bi-directional disparity ground truths. The stereo images are available in two formats: a clean pass, with no noise or external effects, and a final pass, which comes with motion blur, illumination effects, and image degradations. For training we have used the FlyingThings3D dataset with final-pass images.

Fig. 3. Training procedure for SceneFlowGAN. The discriminator and generator are trained in an alternating manner.

B. Implementation Details

The estimated scene flow is conditioned on the input stereo pairs at consecutive time instances. For the generator G architecture we have used SceneEDNet [22] with skip connections. The discriminator D is unconditioned and is trained to distinguish between generated and ground-truth scene flow. During training, both networks are trained in an adversarial manner. For training SceneFlowGAN we follow the procedure of the original work [5], as shown in Fig. 3. We alternate between training the discriminator and the generator. The discriminator network is trained on both the ground-truth and the generated scene flow.
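The alternating schedule of Fig. 3 can be sketched framework-independently. Here `d_step` and `g_step` stand in for the actual (hypothetical here) update routines, with the discriminator weights assumed to be frozen inside `g_step`:

```python
def train_alternating(d_step, g_step, batches, d_iters=1):
    """Alternating GAN training schedule as in Fig. 3.

    For each batch, update the discriminator d_iters times on real and
    generated scene flow, then update the generator once while the
    discriminator is held fixed. d_step and g_step are caller-supplied
    callables that perform one update and return a scalar loss.
    """
    history = []
    for batch in batches:
        d_loss = None
        for _ in range(d_iters):
            d_loss = d_step(batch)   # discriminator sees real + fake flow
        g_loss = g_step(batch)       # discriminator weights frozen here
        history.append((d_loss, g_loss))
    return history
```

With d_iters = 1 this reproduces the schedule of [5]; Wasserstein-style training [1] typically uses several critic updates per generator update.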
The generator is trained via the GAN with the weights of the discriminator frozen. Both networks were trained with the Adam [12] optimizer with a learning rate of 1e-5. The loss is computed at two places: one at each of the generator's and the discriminator's ends. All training was performed on NVIDIA 1080 GPUs.

C. Results

SceneFlowGAN was trained on the FlyingThings3D [16] dataset. The learning of optical flow and disparity is coupled: for an input set of pairs I = (I_L^t, I_L^{t+1}, I_R^t, I_R^{t+1}) we obtain the corresponding optical flow and the disparities at consecutive time instances. The model was trained on final-pass stereo images with added image degradations. The results on a stereo pair are shown in Fig. 4.

D. Ablation Studies

The choice of dataset for training was based on the training performance of SceneEDNet [22] on the three sets of FlyingThings3D. The training loss curve for SceneEDNet is given in Fig. 5. The drop in the average end-point error was larger for set B and set C than for set A. Moreover, we also observed a drop in the loss value due to the additional batch-normalization layers. We trained our model on set A and set C of the FlyingThings3D scene flow data. This was done to see the effect of the data distribution on learning the generator. Table I shows the flow and disparity errors obtained by SceneFlowGAN on all the test sets of FlyingThings3D.

Fig. 4. The predicted scene flow from SceneFlowGAN trained on sets A and C of FlyingThings3D for a pair of stereo images. From left to right: left and right stereo pair overlaid, ground-truth disparity and optical flow. The predictions (from left to right) are for SceneFlowGAN trained on set A for 70 epochs and 50 epochs, and trained on set C for 70 and 50 epochs, respectively.

Fig. 5. Training loss of SceneEDNet on the three sets of FlyingThings3D scene flow. The original SceneEDNet does not have batch-normalization layers; we found a decrease in training loss with their introduction.
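A single Adam [12] parameter update with the learning rate of 1e-5 used above can be sketched in NumPy as follows (standard Adam defaults assumed; this is a generic illustration, not the authors' code):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-5, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected first (m) and second (v) moment
    estimates scale the raw gradient; t is the 1-based step count.
    Returns the updated weights and moment estimates."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

Because the update is normalized by the second moment, the effective step size per parameter is close to lr in magnitude, which is why the small learning rate of 1e-5 keeps the adversarial training comparatively stable.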
The test was done for both the models trained on A and C.

V. CONCLUSIONS

In this paper we have presented a conditional generative adversarial network to estimate scene flow from stereo images. The training of SceneFlowGAN remains a challenge given the complexity of the problem. The choice of generator was based on the training loss obtained when training the generator separately. In future, the proposed GAN-based scene flow estimation method can be extended to naturalistic images after creating a sufficiently large dataset, which may lead to a new direction of research on flow field estimation.

REFERENCES

[1] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv preprint, 2017.
[2] Tali Basha, Yael Moses, and Nahum Kiryati. Multi-view scene flow estimation: A view centered variational approach. International Journal of Computer Vision, 101(1):6–21, 2013.
[3] Aseem Behl, Omid Hosseini Jafari, Siva Karthik Mustikovela, Hassan Abu Alhaija, Carsten Rother, and Andreas Geiger. Bounding Boxes, Segmentations and Object Coordinates: How Important Is Recognition for 3D Scene Flow Estimation in Autonomous Driving Scenarios? In The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
[4] Alexey Dosovitskiy, Philipp Fischer, Eddy Ilg, Philip Hausser, Caner Hazirbas, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, and Thomas Brox. FlowNet: Learning Optical Flow With Convolutional Networks. In The IEEE International Conference on Computer Vision (ICCV), December 2015.
[5] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[6] Evan Herbst, Xiaofeng Ren, and Dieter Fox. RGB-D flow: Dense 3-D motion estimation using color and depth.
In Robotics and Automation (ICRA), 2013 IEEE International Conference on, pages 2276–2282. IEEE, 2013.
[7] Frédéric Huguet and Frédéric Devernay. A variational method for scene flow estimation from stereo sequences. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–7. IEEE, 2007.

TABLE I
Flow and disparity error obtained for SceneFlowGAN. The error values are obtained after testing both the trained models (A, C) on the test sets (A, B, C). The value in brackets after the model name shows the number of epochs.

              SceneFlowGAN-A(70)      SceneFlowGAN-C(70)      SceneFlowGAN-A(50)      SceneFlowGAN-C(50)
Test set      Flow    d1     d2       Flow    d1     d2       Flow    d1     d2       Flow    d1     d2
A             72.33   33.68  32.82    72.11   37.37  39.12    71.50   35.61  35.33    72.27   36.29  37.89
B             33.89   31.15  29.73    28.99   34.09  34.91    31.13   32.67  32.07    29.18   33.16  34.12
C             25.18   32.72  30.89    19.06   35.40  35.66    22.08   34.28  32.68    19.86   34.54  34.86

[8] E. Ilg, T. Saikia, M. Keuper, and T. Brox. Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity, Optical Flow or Scene Flow Estimation. In European Conference on Computer Vision (ECCV), 2018.
[9] Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, and Thomas Brox. FlowNet 2.0: Evolution of optical flow estimation with deep networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1647–1655. IEEE, 2017.
[10] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks. 2017.
[11] Mariano Jaimez, Mohamed Souiai, Javier Gonzalez-Jimenez, and Daniel Cremers. A primal-dual framework for real-time dense RGB-D scene flow. In Robotics and Automation (ICRA), 2015 IEEE International Conference on, pages 98–104. IEEE, 2015.
[12] Diederik P. Kingma and Jimmy Lei Ba.
Adam: A method for stochastic optimization. 2014.
[13] Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Jiří Matas. DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8183–8192, 2018.
[14] Wei-Sheng Lai, Jia-Bin Huang, and Ming-Hsuan Yang. Semi-supervised learning for optical flow with generative adversarial networks. In Advances in Neural Information Processing Systems, pages 354–364, 2017.
[15] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew P. Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In CVPR, volume 2, page 4, 2017.
[16] Nikolaus Mayer, Eddy Ilg, Philip Hausser, Philipp Fischer, Daniel Cremers, Alexey Dosovitskiy, and Thomas Brox. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4040–4048, 2016.
[17] Mehdi Mirza and Simon Osindero. Conditional Generative Adversarial Nets. arXiv preprint arXiv:1411.1784, 2014.
[18] Yi-Ling Qiao, Lin Gao, Yu-Kun Lai, Fang-Lue Zhang, Mingzhe Yuan, and Shihong Xia. SF-Net: Learning Scene Flow from RGB-D Images with CNNs. In British Machine Vision Conference 2018, BMVC 2018, Northumbria University, Newcastle, UK, September 3-6, 2018, page 281, 2018.
[19] Julian Quiroga, Thomas Brox, Frédéric Devernay, and James Crowley. Dense semi-rigid scene flow estimation from RGB-D images. In European Conference on Computer Vision, pages 567–582. Springer, 2014.
[20] Anurag Ranjan and Michael J. Black. Optical Flow Estimation using a Spatial Pyramid Network. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2720–2729.
IEEE, 2017.
[21] René Schuster, Oliver Wasenmuller, Georg Kuschk, Christian Bailer, and Didier Stricker. SceneFlowFields: Dense interpolation of sparse scene flow correspondences. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1056–1065. IEEE, 2018.
[22] Ravi Kumar Thakur and Snehasis Mukherjee. SceneEDNet: A Deep Learning Approach for Scene Flow Estimation. In 2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV), pages 394–399. IEEE, 2018.
[23] Sundar Vedula, Simon Baker, Peter Rander, Robert Collins, and Takeo Kanade. Three-dimensional scene flow. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, volume 2, pages 722–729. IEEE, 1999.
[24] Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, and Katerina Fragkiadaki. SfM-Net: Learning of structure and motion from video. arXiv preprint arXiv:1704.07804, 2017.
[25] Andreas Wedel, Thomas Brox, Tobi Vaudrey, Clemens Rabe, Uwe Franke, and Daniel Cremers. Stereoscopic scene flow computation for 3D motion understanding. International Journal of Computer Vision, 95(1):29–51, 2011.
[26] Koichiro Yamaguchi, David McAllester, and Raquel Urtasun. Efficient joint segmentation, occlusion labeling, stereo and flow estimation. In European Conference on Computer Vision, pages 756–771. Springer, 2014.
[27] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N. Metaxas. StackGAN: Text to Photo-Realistic Image Synthesis With Stacked Generative Adversarial Networks. In ICCV, Oct 2017.
[28] Ye Zhang and Chandra Kambhamettu. On 3D scene flow and structure estimation. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 2, pages II–II. IEEE, 2001.
