This paper presents a technology for watching videos at very high speed. Subtitles are widely used in DVD movies and provide useful supplemental information for understanding video content. We propose a "two-level fast-forwarding" scheme for videos with subtitles, which controls the playback speed depending on the context: very fast during segments without language (subtitles or speech), and "understandably fast" during segments with language. This makes it possible to watch videos at a higher speed than usual while preserving the entertainment value of the content. We also propose "centering" and "fading" features for subtitle display to reduce fatigue when watching high-speed video. We implemented a versatile video encoder that enables movie viewing with two-level fast-forwarding on any mobile device by specifying the playback speed, the reading rate, or the overall viewing time. The effectiveness of the proposed method was demonstrated in an evaluation study.
The amount of information on the Internet continues to increase. In addition to text-based information, there is an increasing diversity of other media types, including audio-based (e.g., podcasts) and video-based information. For example, as many as 600 movies (with a total length of ~25 hours) are submitted to YouTube each minute [6]. In this context, there are two possible approaches to help users efficiently find and review the information they need or want: reduce the amount of information to a manageable level by appropriate filtering, or improve users' ability to consume information quickly. The former approach has been thoroughly studied in the fields of search, recommendation, and summarization, and innovations in this area have been widely applied. However, these technologies do not change how fast users can consume information, which remains the bottleneck in the process. Hence, it is necessary to develop approaches that address this issue.
Here, we propose one such system. Many methods have been proposed for the quick review of text-based information [11]. Similarly, the review of audio and video information can be sped up by controlling the playback speed, a feature offered by many standard media players on commercial video recorders, portable devices such as iPods, and PCs. However, these players offer only up to about twice normal speed, because there is an upper limit on the playback speed at which users can still process and understand linguistic content.
As the first step to enable video viewing at high speed, we propose the “two-level fast-forwarding” method for watching videos with subtitles, which are widely used in commercial DVD movies. This method exploits the difference between the maximum acceptable speeds of video with and without language. That is, it controls the speed of video playback depending on the context: very fast during segments without language, and understandably fast during segments with language. This makes it possible to watch video at a higher speed than when simply fast-forwarding, in a way that also enables a clearer linguistic understanding of the contents (Figure 1). The rest of this paper is organized as follows. The next section reviews related work. Then the different methods for fast-forwarding video are introduced. After that, we describe our prototype system implementation in detail. Finally, we report the results of a study that we conducted to evaluate our method.
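The two-level idea described above can be illustrated with a minimal Python sketch. This is not the authors' encoder: the function names (`merge_intervals`, `speed_schedule`, `viewing_time`), the representation of language segments as `(start, end)` intervals in seconds, and the example speeds of 2x and 8x are all assumptions made for illustration. The sketch only shows how a timeline can be split into segments with and without language, and how the resulting viewing time follows from the two speeds.

```python
from typing import List, Tuple

Interval = Tuple[float, float]        # (start, end) in seconds of source video
Segment = Tuple[float, float, float]  # (start, end, playback speed)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching language intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def speed_schedule(duration: float, language: List[Interval],
                   lang_speed: float, gap_speed: float) -> List[Segment]:
    """Split [0, duration] into segments played at one of two speeds.

    Segments with language (hypothetically taken from subtitle timings)
    play at lang_speed; all remaining segments play at gap_speed.
    """
    segments: List[Segment] = []
    cursor = 0.0
    for start, end in merge_intervals(language):
        # Clip each interval to the video's duration.
        start, end = max(start, 0.0), min(end, duration)
        if start >= end:
            continue
        if start > cursor:
            segments.append((cursor, start, gap_speed))
        segments.append((start, end, lang_speed))
        cursor = end
    if cursor < duration:
        segments.append((cursor, duration, gap_speed))
    return segments


def viewing_time(segments: List[Segment]) -> float:
    """Total time needed to watch all segments at their assigned speeds."""
    return sum((end - start) / speed for start, end, speed in segments)


if __name__ == "__main__":
    # Illustrative values only: a 60 s clip with three subtitle intervals.
    subtitles = [(10.0, 20.0), (18.0, 30.0), (45.0, 50.0)]
    schedule = speed_schedule(60.0, subtitles, lang_speed=2.0, gap_speed=8.0)
    for seg in schedule:
        print(seg)
    print("viewing time (s):", viewing_time(schedule))
```

With these assumed values, 25 s of the clip contain language and 35 s do not, so the viewing time is 25/2 + 35/8 = 16.875 s, compared with 30 s for uniform 2x playback.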
Many media players allow users to change the playback speed. Foulke et al. reported the SOLAFS algorithm for changing the speed of speech without pitch shifting, and concluded that it was effective for rapid understanding of content [3]. Vermi et al. proposed a method for improving listener comprehension of fast-forwarded speech by displaying text obtained via speech recognition [19]. Aoki et al. proposed a fast-forwarding interface for reviewing music, using only auditory information [5]. Here, we consider the two modalities of audio and video simultaneously. In this context, the differences in the maximum acceptable speed between these modalities must be addressed.
Many studies have investigated issues related to changing the playback speed of videos. Kiyoyama et al. proposed a hardware device that generates low-speed speech output in combination with normal-speed video output from real-time video streams on television [13], which is effective mainly for elderly people who cannot keep up with the original speech streams. Cheng et al. [7] proposed “adaptive fast-forwarding,” which adjusts the current playback speed based on the complexity of the present scene and predefined semantic events, but they muted the audio and thus did not simultaneously manage that modality. Peker et al. [14][15] developed a method that accelerates playback according to visual and audio analyses of the video to maintain a “constant pace.” We adopt a similar approach. However, we focus more on the “language” modality, especially subtitles, which we believe are an important cue for allowing users to watch videos much faster.
Displaying multiple information streams simultaneously is another approach to achieve quick or deep understanding of contents. There have been many reports about the use of three-dimensional localization of sound to enable searching and selection among multiple audio streams. Vazquez-Alvarez et al. discussed the psychological burden of this approach [18]. Forlines proposed a content-aware video presentation on high-resolution displays [1], which renders multiple parts of a video based on image-based shot/scene detection technologies to enhance the viewing experience. Fabro et al. [12] reported a tool for fast nonsequential hierarchical video browsing, which proposed parallel-style views of the content. In addition, many types of software are available on the web that enable simultaneous viewing of multiple videos to reduce the total time required