FastV2C-HandNet: Fast Voxel to Coordinate Hand Pose Estimation with 3D Convolutional Neural Networks

A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
S. Yuan, G. Garcia-Hernando, B. Stenger, T.-K. Kim, et al. Depth-based 3D hand pose estimation: From current achievements to future goals. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.
H. Guo, G. Wang, X. Chen, and C. Zhang. Towards good practices for deep 3D hand pose estimation. arXiv preprint arXiv:1707.07248, 2017.
H. Guo, G. Wang, X. Chen, C. Zhang, F. Qiao, and H. Yang. Region ensemble network: Improving convolutional network for hand pose estimation. In IEEE International Conference on Image Processing, 2017.
X. Chen, G. Wang, H. Guo, and C. Zhang. Pose guided structured region ensemble network for cascaded hand pose estimation. arXiv preprint arXiv:1708.03416, 2017.
G. Moon, J. Y. Chang, and K. M. Lee. V2V-PoseNet: Voxel-to-voxel prediction network for accurate 3D hand and human pose estimation from a single depth map. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.
C. Keskin, F. Kıraç, Y. E. Kara, and L. Akarun. Hand pose estimation and hand shape classification using multi-layered randomized decision forests. In European Conference on Computer Vision, pages 852–863. Springer, 2012.
I. Oikonomidis, N. Kyriazis, and A. A. Argyros. Efficient model-based 3D tracking of hand articulations using Kinect. In British Machine Vision Conference, volume 1, number 2, 2011.
J. S. Supancic III, G. Rogez, Y. Yang, J. Shotton, and D. Ramanan. Depth-based hand pose estimation: Methods, data, and challenges. In IEEE International Conference on Computer Vision, 2015.
X. Deng, S. Yang, Y. Zhang, P. Tan, L. Chang, and H. Wang. Hand3D: Hand pose estimation using 3D neural network. arXiv preprint arXiv:1704.02224, 2017.
L. Ge, H. Liang, J. Yuan, and D. Thalmann. Robust 3D hand pose estimation in single depth images: From single-view CNN to multi-view CNNs. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3593–3601, 2016.
L. Ge, H. Liang, J. Yuan, and D. Thalmann. 3D convolutional neural networks for efficient and robust hand pose estimation from single depth images. In IEEE Conference on Computer Vision and Pattern Recognition, 2017.
J. Tompson, M. Stein, Y. LeCun, and K. Perlin. Real-time continuous pose recovery of human hands using convolutional networks. ACM Transactions on Graphics (ToG), 33(5):169, 2014.
D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
T. Sharp, C. Keskin, D. Robertson, J. Taylor, J. Shotton, D. Kim, C. Rhemann, I. Leichter, A. Vinnikov, Y. Wei, et al. Accurate, robust, and flexible real-time hand tracking. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 3633–3642. ACM, 2015.
M. Madadi, S. Escalera, X. Baro, and J. Gonzalez. End-to-end global to local CNN learning for hand pose recovery in depth data. arXiv preprint arXiv:1705.09606, 2017.
H. Liang, J. Yuan, and D. Thalmann. Parsing the hand in depth images. IEEE Transactions on Multimedia, 16(5):1241–1253, 2014.
S. Baek, K. I. Kim, and T.-K. Kim. Augmented skeleton space transfer for depth-based hand pose estimation. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.
X. Sun, Y. Wei, S. Liang, X. Tang, and J. Sun. Cascaded hand pose regression. In IEEE Conference on Computer Vision and Pattern Recognition, pages 824–832, 2015.
A. Tagliasacchi, M. Schröder, A. Tkach, S. Bouaziz, M. Botsch, and M. Pauly. Robust articulated-ICP for real-time hand tracking. In Computer Graphics Forum, volume 34, pages 101–114. Wiley Online Library, 2015.
C. Qian, X. Sun, Y. Wei, X. Tang, and J. Sun. Realtime and robust hand tracking from depth. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1106–1113, 2014.
N. C. Camgoz, S. Hadfield, O. Koller, and R. Bowden. SubUNets: End-to-end hand shape and continuous sign language recognition. In IEEE International Conference on Computer Vision (ICCV), pages 3075–3084. IEEE, 2017.
D. Tang, T.-H. Yu, and T.-K. Kim. Real-time articulated hand pose estimation using semi-supervised transductive regression forests. In IEEE International Conference on Computer Vision, pages 3224–3231, 2013.
D. Tang, H. J. Chang, A. Tejani, and T.-K. Kim. Latent regression forest: Structured estimation of 3D articulated hand posture. In IEEE Conference on Computer Vision and Pattern Recognition, pages 3786–3793, 2014.
D. Tang, J. Taylor, P. Kohli, C. Keskin, T.-K. Kim, and J. Shotton. Opening the black box: Hierarchical sampling optimization for estimating human hand pose. In IEEE International Conference on Computer Vision, pages 3325–3333, 2015.
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
M. Oberweger, P. Wohlhart, and V. Lepetit. Hands deep in deep learning for hand pose estimation. In Computer Vision Winter Workshop, pages 21–30, 2015.
M. Oberweger and V. Lepetit. DeepPrior++: Improving fast and accurate 3D hand pose estimation. In IEEE International Conference on Computer Vision Workshop, Oct 2017.
F. Yin, X. Chai, and X. Chen. Iterative reference driven metric learning for signer independent isolated sign language recognition. In European Conference on Computer Vision, pages 434–450. Springer, 2016.
A. Markussen, M. R. Jakobsen, and K. Hornbæk. Vulture: A mid-air word-gesture keyboard. In Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems, pages 1073–1082. ACM, 2014.
C. Wan, A. Yao, and L. Van Gool. Hand pose estimation from local surface normals. In European Conference on Computer Vision, pages 554–569. Springer, 2016.
S. Melax, L. Keselman, and S. Orsten. Dynamics based 3D skeletal hand tracking. In Proceedings of Graphics Interface 2013, pages 63–70. Canadian Information Processing Society, 2013.
Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3D ShapeNets: A deep representation for volumetric shapes. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1912–1920, 2015.
J. Taylor, J. Shotton, T. Sharp, and A. Fitzgibbon. The Vitruvian Manifold: Inferring dense correspondences for one-shot human pose estimation. In IEEE Conference on Computer Vision and Pattern Recognition, 2012.
H. J. Chang, G. Garcia-Hernando, D. Tang, and T.-K. Kim. Spatio-temporal Hough forest for efficient detection–localisation–recognition of fingerwriting in egocentric camera. Computer Vision and Image Understanding, 148:87–96, 2016.
D. Maturana and S. Scherer. VoxNet: A 3D convolutional neural network for real-time object recognition. In IEEE International Conference on Intelligent Robots and Systems, pages 922–928. IEEE, 2015.
E. Ohn-Bar and M. M. Trivedi. Hand gesture recognition in real time for automotive interfaces: A multimodal vision-based approach and evaluations. IEEE Transactions on Intelligent Transportation Systems, 15(6):2368–2377, Dec 2014.
A. Sinha, C. Choi, and K. Ramani. DeepHand: Robust hand pose estimation by completing a matrix imputed with deep features. In IEEE Conference on Computer Vision and Pattern Recognition, pages 4150–4158, 2016.
P. Molchanov, X. Yang, S. Gupta, K. Kim, S. Tyree, and J. Kautz. Online detection and classification of dynamic hand gestures with recurrent 3D convolutional neural network. In IEEE Conference on Computer Vision and Pattern Recognition, pages 4207–4215, 2016.
H. Yang and J. Zhang. Hand pose regression via a classification-guided approach. In Asian Conference on Computer Vision, pages 452–466. Springer, 2016.
C. Xu, L. N. Govindarajan, Y. Zhang, and L. Cheng. Lie-X: Depth image based articulated object pose estimation, tracking, and action recognition on Lie groups. International Journal of Computer Vision, pages 1–25, 2017.
N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15:1929–1958, 2014.
M. Oberweger, P. Wohlhart, and V. Lepetit. Training a feedback loop for hand pose estimation. In IEEE International Conference on Computer Vision, pages 3316–3324, 2015.
C. Wan, T. Probst, L. Van Gool, and A. Yao. Crossing nets: Combining GANs and VAEs with a shared latent space for hand pose estimation. In IEEE Conference on Computer Vision and Pattern Recognition, July 2017.
L. Ge, Z. Ren, and J. Yuan. Point-to-point regression PointNet for 3D hand pose estimation. In V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, editors, Computer Vision – ECCV 2018, Lecture Notes in Computer Science, volume 11217. Springer, Cham, 2018.
L. Ge et al. Hand PointNet: 3D hand pose estimation using point sets. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.
Y. Dou et al. Cascaded point network for 3D hand pose estimation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, pages 1982–1986, 2019.
U. Iqbal, P. Molchanov, T. Breuel, J. Gall, and J. Kautz. Hand pose estimation via latent 2.5D heatmap regression. In V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, editors, Computer Vision – ECCV 2018, Lecture Notes in Computer Science, volume 11215. Springer, Cham, 2018.
S. Baek, K. I. Kim, and T.-K. Kim. Pushing the envelope for RGB-based dense 3D hand pose estimation via neural rendering. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1067–1076, 2019.
Y. Zhou, J. Lu, K. Du, X. Lin, Y. Sun, and X. Ma. HBE: Hand branch ensemble network for real-time 3D hand pose estimation. In European Conference on Computer Vision (ECCV), pages 501–516, 2018.
M. Oberweger, P. Wohlhart, and V. Lepetit. Generalized feedback loop for joint hand-object pose estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
D. Kulon, H. Wang, R. A. Güler, M. Bronstein, and S. Zafeiriou. Single image 3D hand reconstruction with mesh convolutions. arXiv preprint arXiv:1905.01326, 2019.
L. Chen, S.-Y. Lin, Y. Xie, H. Tang, Y. Xue, X. Xie, Y.-Y. Lin, and W. Fan. TAGAN: Tonality aligned generative adversarial networks for realistic hand pose synthesis. In British Machine Vision Conference (BMVC), 2019.