AppStreamer: Reducing Storage Requirements of Mobile Games through Predictive Streaming

Design Overview

The goal of AppStreamer is to reduce the storage requirements of large mobile applications. This is done by storing on the mobile device the file blocks that the application needs, and speculatively fetching more blocks from a cloud storage server as the application runs. We focus on files that are part of the application package. This includes required files that some applications (especially games) download on the first run. The key characteristic of these files is that they are read-only and are needed across all users and across different executions. Files written by the application are small and contain user-specific contents which cannot be shared across users, so we always store these files on the device.

We would like the system to work for all applications without any modifications or access to the source code. Thus, AppStreamer is implemented at the operating system level. As long as the blocks the application reads are already on the device, the application will operate normally without any user-perceptible stall, even when the entirety of the application blocks is not present on the device. Intuitively, past file reads should be a good predictor of future file reads, so this is what we use for our prediction model. It is possible to include other sources of information about the application’s behavior from the perspective of the operating system, such as CPU usage, memory usage, network usage, and past file writes, but we have found that past file reads alone already provide good prediction accuracy. We also want to minimize the monitoring overhead, which can slow down the application.

AppStreamer’s operation can be divided into two phases: an offline phase and an online phase, as shown in Figure [fig:overview]. The figure shows the cloud storage server, which is a component introduced by AppStreamer. The server is expected to have all of the content needed by the application, prior to the execution of the application on the mobile device. This can also be an edge server or a hybrid as proposed in prior work. In the offline phase, the file access prediction model (a Continuous-time Markov Chain in this case) is trained using a collection of file access traces from multiple users using a specific application. Note that this does not need to include the specific user who will use the device in the online phase. As long as the runtime path is a combination of one or more paths from training, the model will be able to combine the patterns so that it has knowledge about all paths. In the online phase, as the user opens and uses the application, the model predicts blocks that are needed in the near future, and fetches them from the cloud storage server in real time.

While AppStreamer is agnostic to the type of application whose storage requirement is to be reduced, most of the large mobile applications today are games, due to their rich media content and large number of control paths. Therefore, from this point onward, we will focus on mobile games running on smartphones and tablets as our target application domain.

Experimental evaluation

We evaluate AppStreamer in two ways: with user studies and with microbenchmarks. The goal of the user studies is to evaluate AppStreamer in a realistic setting, and to measure the effect of the delays it introduces by having users rate the user experience. The goal of the microbenchmarks is to evaluate how different parameters of the model affect the overall performance of AppStreamer. Due to the large number of parameter values explored, the microbenchmarks are done using trace-based simulation.

Games Used for Evaluation

We use two Android games in our evaluation: Dead Effect 2 and Fire Emblem Heroes. Dead Effect 2 is a 3D single-player first-person shooter game. Gameplay is divided into multiple levels, where the blocks needed for each level are loaded at the beginning of the level. The levels are linear (level 1, 2, 3, and so on), but the collected traces show that the blocks accessed during gameplay differ across users. Its APK is 22.91 MB and all of its resources are stored in a single OBB (opaque binary blob) file which is 1.09 GB.

Fire Emblem Heroes is a 2D strategy role-playing game. Gameplay is divided into multiple small levels in several game modes. At first, only the main story mode can be played. As the player completes a few levels, paralogue mode, training mode, special maps mode, and player-versus-player mode are unlocked. The player can switch between these modes at any time and in any order. The player's choice of levels affects which blocks are needed, which makes prediction in Fire Emblem Heroes nontrivial. The game has roughly 5,200 data files, whose sizes sum up to 577 MB, including the 41.01 MB APK. We chose these two games as they represent two dominant classes of games on mobile devices, with differing characteristics in terms of themes, player interaction, and control flow. Both share the characteristics of heavy graphics and low-latency interactions.

Training Data

For Dead Effect 2, the trace data for the user study consists of 6 runs collected from two players. For the microbenchmarks, the trace data consists of 12 runs collected from four players. Each run goes from the start of the game to the end of level 2, which takes roughly 30 minutes.

The file read pattern of Dead Effect 2 is shown in Figure [fig:dead_effect_2_runs]. As soon as the game is launched, the entire APK is read and a small part of the OBB file is read. When each level starts, the resources needed for that level are loaded at the beginning, resulting in a jump in the cumulative file read. During each level, there are several small jumps dispersed without any easily perceptible pattern.

For Fire Emblem Heroes, the trace data for the user study consists of 7 runs collected from one player. For the microbenchmarks, the trace data consists of 14 runs collected from four players. For this game, each run consists of 20 minutes of gameplay from the beginning where the player is free to choose which level to play.

User Study - Dead Effect 2

Performance of AppStreamer depends on the network speed. For the user study, we set up the storage server on a local machine with network speed limited to 17.4 Mbps, which is the average worldwide LTE speed reported by Open Signal as of November 2016. The phones used are Nexus 6P running Android 6.0.1. Each participant plays the first two levels of the game, once on a phone with AppStreamer, and once on an unmodified phone. Each user then fills out a questionnaire with four questions: 1) user’s skill level (in that category of games, such as FPS), 2) quality of user experience, 3) delays during menu and loading screens, and 4) delays during gameplay inside a level.

The first user study is done with Dead Effect 2, with 23 users participating in the study. The amount of storage used by Dead Effect 2 is shown in Figure 1. Baseline refers to the current state of practice, which is storing the whole game on the phone. The amount shown for AppStreamer corresponds to the permanent storage used by files that are always stored on the phone. As the user plays the game, more blocks are downloaded on the fly. These blocks are stored in the temporary space and not shown in the figure. Overall, AppStreamer uses 146.22 MB of permanent storage, compared to 1,139.07 MB for baseline. This represents an 87% storage saving.

The summarized responses to each question on the questionnaire are shown in Figure 2. 70% of the participants rate the overall user experience of playing with AppStreamer the same as playing on an unmodified phone. The remaining 30% rate the run with AppStreamer as marginally worse than baseline. There were no disruptions other than pauses (such as crashes, hangs, or visual glitches) during gameplay with our technique in place.

On average, there were 336.2 KB of “cache misses”, blocks that the game tries to read but that are not present on the phone, during each run with AppStreamer. This translates to 0.15 second of delay for 28 minutes of gameplay, giving a 0.009% delay. The cache hit rate is 99.87%. The run with the highest amount of cache misses is affected by 1.52 seconds of delay. Compared to each level’s loading time of roughly 20 seconds, this extra delay is barely noticeable. This shows that AppStreamer is able to predict and cache most of the necessary blocks before they are accessed for Dead Effect 2.

Figure 1: Storage requirements for Dead Effect 2

Figure 2: User study results for Dead Effect 2 with 23 participants

User Study - Fire Emblem Heroes

The second user study is done using Fire Emblem Heroes, with 26 users participating in the study. The study’s setup is the same as the first user study with Dead Effect 2, except that the user is free to play any level for 20 minutes on each phone. The game has several different modes which are unlocked as the first few levels are completed, and some of these modes have multiple levels which the player can choose to play. Before playing, the participants are informed of these modes, and are instructed that they can switch to any mode and play any level as desired.

The amount of storage used by Fire Emblem Heroes is shown in Figure 3. Baseline refers to the current state of practice, which is storing the whole game on the phone. The amount shown for AppStreamer corresponds to the permanent storage used by files that are always stored on the phone. As the user plays the game, more blocks are downloaded on the fly. These blocks are stored in the temporary space and not shown in the figure. Overall, AppStreamer uses 79.69 MB of permanent storage, compared to 577 MB for baseline. This represents an 86% saving of storage space.
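The savings quoted for both games follow directly from the sizes reported in the text. The short check below is only an illustration: the helper name `storage_saving` is ours, and the OBB size is converted assuming 1 GB = 1024 MB, which reproduces the 1,139.07 MB baseline for Dead Effect 2.

```python
def storage_saving(permanent_mb, baseline_mb):
    """Fraction of baseline storage that no longer has to live on the phone."""
    return 1.0 - permanent_mb / baseline_mb

# Dead Effect 2 baseline: 22.91 MB APK plus a 1.09 GB OBB file (1 GB = 1024 MB assumed)
dead_effect_baseline_mb = 22.91 + 1.09 * 1024

dead_effect_saving = storage_saving(146.22, dead_effect_baseline_mb)  # about 0.87
fire_emblem_saving = storage_saving(79.69, 577.0)                     # about 0.86

print(f"Dead Effect 2: {dead_effect_saving:.1%}")
print(f"Fire Emblem Heroes: {fire_emblem_saving:.1%}")
```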

The summarized responses to each question on the questionnaire are shown in Figure 4. 88% of the participants rate the overall user experience of playing with AppStreamer the same as playing on an unmodified phone. Again, there were no disruptions other than longer loading time and delays during gameplay. Interestingly, two users comment that the run with AppStreamer was actually smoother. This might be explained by the fact that the delays before and after each level are dominated by network communication with the game server, rather than file reads, and the delay may depend on the current load of the game server.

On average, there were 1.63 MB of cache misses during each run with AppStreamer, which translates to 0.75 second of delay in 20 minutes of gameplay, giving a 0.0625% delay. The cache hit rate is 97.65%. The run with the highest amount of cache misses is affected by 5.12 seconds of delay. Nevertheless, that user still rated the overall user experience as no different from the unmodified version. One user rates the run with AppStreamer as significantly worse due to significant delays. However, the communication log on the cloud storage server indicates that only 0.97 MB of blocks were missed and needed to be fetched urgently. This translates to 0.45 second of delay, which should be barely noticeable. Overall, the results of this user study show that AppStreamer is able to predict and cache most of the necessary blocks before they are accessed, even when there are different branches for different users in the gameplay.
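The delay figures reported in both user studies are consistent with transferring the missed bytes over the 17.4 Mbps link. The sketch below reproduces them under two assumptions of ours: decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes) and no per-request round-trip latency. The function names are illustrative.

```python
LINK_SPEED_BPS = 17.4e6  # 17.4 Mbps, the link speed used in the user studies

def miss_delay_seconds(missed_bytes, speed_bps=LINK_SPEED_BPS):
    """Time needed to transfer the missed bytes at the given link speed."""
    return missed_bytes * 8 / speed_bps

def delay_percent(delay_seconds, play_minutes):
    """Delay expressed as a percentage of total play time."""
    return 100.0 * delay_seconds / (play_minutes * 60)

dead_effect_delay = miss_delay_seconds(336.2e3)    # about 0.15 s
fire_emblem_delay = miss_delay_seconds(1.63e6)     # about 0.75 s
single_user_delay = miss_delay_seconds(0.97e6)     # about 0.45 s

print(dead_effect_delay, delay_percent(dead_effect_delay, 28))
print(fire_emblem_delay, delay_percent(fire_emblem_delay, 20))
```

Small differences in the last digit against the text come from rounding the delay before dividing by the play time.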

Figure 3: Storage requirements for Fire Emblem Heroes

Figure 4: User study results for Fire Emblem Heroes with 26 participants

Comparison with Prior Work

Here we compare the bandwidth consumption and latency of AppStreamer to state-of-the-art cloud gaming systems. One challenge in thin client gaming is that users are disturbed by latencies higher than 60 ms. Lee et al. address this problem by using speculative execution, which can mask up to 128 ms of latency, at the cost of using between 1.51 and 4.54 times as much bandwidth as standard cloud gaming. Using speculative execution requires access to and modification of the source code of the game, so we could not directly compare AppStreamer to speculative execution. However, we tested the performance of a thin client using GamingAnywhere, an open source cloud gaming system.

In order to determine the bandwidth usage of a thin client model, we ran Nox, an Android emulator, and GamingAnywhere on a server and the GamingAnywhere client on a smartphone. We tested both Dead Effect 2 and Fire Emblem Heroes, and recorded the download bandwidth usage of the GamingAnywhere application on the smartphone. Data uploaded from the smartphone consists of encoded input events (such as swipes and taps), and data downloaded consists of audio and video of the game being streamed. The bandwidth usage of cloud gaming and AppStreamer is shown in Figure 5. We found that for Dead Effect 2, cloud gaming uses 3.01 Mb/s while AppStreamer uses only 706 Kb/s on average. For Fire Emblem Heroes, cloud gaming uses 3.20 Mb/s while AppStreamer uses only 745 Kb/s on average. This shows that traditional cloud gaming is considerably more bandwidth intensive than our file block streaming approach, with a 4.3X higher bandwidth requirement for our two target games.

Compared to the baseline where the entire application is downloaded before it can be used, AppStreamer likely uses more bandwidth through the more costly cellular connection. This can be alleviated in two ways. First, as long as the necessary blocks are never evicted after they are downloaded, the total bandwidth usage cannot exceed the total size of the application. Even when space runs out, the LRU eviction policy helps prioritize keeping blocks that are more likely to be accessed. Second, AppStreamer can be extended so that block prefetching is done more aggressively when the device is on a Wi-Fi connection, so that less fetching is needed when the device is on a cellular connection.

Figure 5: Comparison of bandwidth consumption between cloud gaming (left) and AppStreamer (right) for both games

As mentioned earlier, latency is a key measure of usability of cloud gaming tools. Latency can be very visible and annoying to users, as the time between every user input (e.g., a screen tap) and the frame that shows the effect of that input is the network round-trip time, plus video/audio encoding and decoding delay. The network round-trip time depends largely on the distance between the cloud gaming server and the client. Based on typical placement of gaming servers, a typical round-trip time is 100 ms. On the other hand, AppStreamer is not affected by latency as heavily as cloud gaming approaches, since speculative block fetches are mostly in batches and can be easily pipelined. The urgent block fetches are affected by the latency, but the amount of urgent block fetches is typically small. As a back-of-the-envelope calculation, for Dead Effect 2, on average 84.05 out of 64,858 blocks accessed are fetched urgently. Fetching each block requires 100 ms + 4 KB / 17.4 Mbps = 101.8 ms. Thus, the overall delay per block access is $`\frac{84.05}{64858}\times 101.8 = 0.13`$ ms, which is much smaller than the constant 100 ms in the cloud gaming approach. For Fire Emblem Heroes, AppStreamer’s overall delay is $`\frac{416.96}{17731}\times 101.8 = 2.39`$ ms.
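The back-of-the-envelope calculation above can be written out as a few lines of Python. This is a sketch under our own assumptions: the 4 KB block is taken as 4,000 bytes, and the function names are illustrative rather than part of the system.

```python
def per_block_fetch_ms(rtt_ms=100.0, block_bytes=4000, speed_bps=17.4e6):
    """Round-trip time plus transfer time for one urgently fetched block."""
    transfer_ms = block_bytes * 8 / speed_bps * 1000.0
    return rtt_ms + transfer_ms

def amortized_delay_ms(urgent_blocks, total_blocks, fetch_ms):
    """Expected delay per block access when only a fraction is fetched urgently."""
    return urgent_blocks / total_blocks * fetch_ms

fetch_ms = per_block_fetch_ms()                                  # about 101.8 ms
dead_effect_ms = amortized_delay_ms(84.05, 64858, fetch_ms)      # about 0.13 ms
fire_emblem_ms = amortized_delay_ms(416.96, 17731, fetch_ms)     # about 2.39 ms

print(fetch_ms, dead_effect_ms, fire_emblem_ms)
```

The amortized figure is an average over all block accesses; an individual urgent fetch still pays the full round trip.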

In addition to the cloud gaming approach, we also compare to a simple file access prediction algorithm that operates at the block granularity, which we call BlockPairLookup. In the training phase, it stores all pairs of blocks $`(B_i, B_j)`$ such that $`B_j`$ is read within the lookahead time $`L`$ after $`B_i`$ is read. In the online phase, when a block $`B_i`$ is accessed, it predicts all $`B_j`$’s where $`(B_i, B_j)`$ is in its memory.
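A minimal sketch of BlockPairLookup as described above, assuming each trace is a time-sorted list of `(timestamp_seconds, block_id)` reads. The function names and the choice to skip self-pairs are ours.

```python
from collections import defaultdict

def train_block_pairs(traces, lookahead):
    """Record every pair (B_i, B_j) where B_j is read within `lookahead` seconds after B_i."""
    successors = defaultdict(set)
    for trace in traces:
        for i, (t_i, b_i) in enumerate(trace):
            for t_j, b_j in trace[i + 1:]:
                if t_j - t_i > lookahead:
                    break  # trace is time-sorted, so later reads are even further away
                if b_j != b_i:
                    successors[b_i].add(b_j)
    return successors

def predict(successors, block):
    """Online phase: return all blocks stored as successors of the accessed block."""
    return successors.get(block, set())

example_trace = [(0.0, 1), (1.0, 2), (5.0, 3), (40.0, 4)]
pairs = train_block_pairs([example_trace], lookahead=30)
print(predict(pairs, 1))
```

Because every block can be paired with every other block read inside the lookahead window, the table grows with the number of reads per window, which is one way to see why the memory footprint becomes large.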

We run the BlockPairLookup algorithm for both games and compute the delay and amount of unnecessary blocks downloaded using our simulator. We find that it has excessive memory utilization: 20.5 GB with a 30-second lookahead for Dead Effect 2 and 4.6 GB with a 60-second lookahead for Fire Emblem Heroes. Both would be infeasible on today’s mobile devices. For Dead Effect 2, with BlockPairLookup, the average delay per run is 6.32 seconds (8.4X that of AppStreamer), and 74.39 MB of unnecessary blocks are downloaded (1.1X that of AppStreamer). For Fire Emblem Heroes, the average delay per run is 7.32 seconds (18.8X), and 64.10 MB of unnecessary blocks are downloaded (1.1X). Because BlockPairLookup’s predictions are always a superset of AppStreamer’s predictions, the higher delay is likely due to the unnecessary blocks that are put in the download queue delaying the download of necessary blocks, and to the inefficiency of requesting a single block at a time. This shows that models that operate at single-block granularity incur too much memory and delay and are thus impractical.

Microbenchmarks

In this section, we evaluate how different parameters affect the results. The parameters studied are $`\delta`$, $`\tau`$, $`p_{stop}`$, $`L`$, $`minSuperblockSize`$, $`p_{download}`$, and $`B_{initial}`$, described in Section 5, as well as buffer size and network connection speed. The results are generated using trace-based simulation. In the simulation, training data is first used to train a Markov model. Then, file reads from the test data are replayed and given as input to the Markov model. Blocks predicted by the model that are not already present on the phone are logically fetched from the storage server, with network speed fixed to a certain value to simulate real network conditions. In the case where buffer size is limited, we employ the LRU policy to evict blocks from the limited storage available.
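The replay loop can be sketched as follows. This is a simplified illustration, not the simulator used for the results: `predict` stands in for the trained model, prefetched blocks are assumed to arrive before the next read, a 4,096-byte block size is assumed, and false positives are counted over the whole replay rather than within a time window.

```python
from collections import OrderedDict

BLOCK_BYTES = 4096  # assumed block size

def simulate(reads, predict, resident, speed_bps, buffer_blocks=None):
    """Replay block reads; return (total stall in seconds, set of false-positive blocks)."""
    cache = OrderedDict()  # temporary storage, ordered from least to most recently used
    fetched, used = set(), set()
    delay = 0.0

    def insert(block):
        cache[block] = True
        cache.move_to_end(block)
        if buffer_blocks is not None and len(cache) > buffer_blocks:
            cache.popitem(last=False)  # evict the least recently used block

    for block in reads:
        if block in cache:
            cache.move_to_end(block)   # refresh recency on a hit
        elif block not in resident:
            delay += BLOCK_BYTES * 8 / speed_bps  # miss: urgent fetch stalls the read
            insert(block)
        used.add(block)
        for nxt in predict(block):     # speculative fetches triggered by this read
            if nxt not in resident and nxt not in cache:
                fetched.add(nxt)
                insert(nxt)
    return delay, fetched - used

toy_model = {1: [2], 2: [3, 7]}
toy_predict = lambda b: toy_model.get(b, [])
unlimited = simulate([1, 2, 3, 9], toy_predict, resident={1}, speed_bps=17.4e6)
tiny_buffer = simulate([1, 2, 3, 9], toy_predict, resident={1}, speed_bps=17.4e6, buffer_blocks=1)
print(unlimited, tiny_buffer)
```

In the toy run, block 9 is never predicted and therefore stalls once; with a one-block buffer, block 3 is evicted before it is read and stalls as well.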

Since there are many parameters, we conduct the microbenchmarks by varying one parameter at a time, and fixing the rest of the parameters to the optimal value. Optimal values are chosen by carefully weighing the tradeoff between delay and false positives, with higher weight given to delay, as it has a direct impact on the user experience. The values are $`\delta`$ = 0.1 second, $`\tau`$ = 0.9, $`minSuperblockSize`$ = 17, $`B_{initial}`$ = 122 MB (excluding APK), $`p_{stop}`$ = 0.01, $`L`$ = 60 seconds, and connection speed = 17.4 Mbps. By default, we do not set a limit on temporary storage used to store fetched blocks. The average length of each run is 1,653 seconds. Due to limited space and the fact that the results show the same trends, we omit the microbenchmark results for Fire Emblem Heroes, and show only results for Dead Effect 2. The output metrics are delay and false positives, defined as predicted blocks that are not read by the game within 8 minutes of being predicted. Delays that are long or frequent enough can ruin the user experience, while false positives incur extra network bandwidth and energy cost.

Figure 6: Microbenchmarks for Dead Effect 2

The results are shown in Figure 6. First, with the optimal parameter values, the amount of false positives is 66 MB. However, if the playing session were longer, the amount of false positives would not necessarily increase proportionately, because there is a limit to how much data is downloaded, namely the total size of the application.

Now, we look at how each parameter affects the results. In addition to the average worldwide LTE speed of 17.4 Mbps, we also include the average U.S. LTE speed of 13.95 Mbps and the average worldwide WiFi speed of 10.8 Mbps. As expected, higher connection speed leads to lower delay. Even the lower WiFi speed of 10.8 Mbps is enough to keep the delay small, but speeds lower than that will result in large delay. Connection speed has a negligible effect on the false positives. Next, we look at the amount of initial files cached on the phone, denoted $`B_{initial}`$. A higher value gives lower delay and fewer false positives, at the cost of higher storage requirements. Delays are virtually eliminated at $`B_{initial} \geq`$ 175 MB. This represents a storage savings of 84%. At the higher end of 200 MB, the amount of false positives is also reduced.

Recall that when making predictions, our Markov model relies on two stopping criteria to keep the computation tractable: lookahead time, denoted by $`L`$, and probability stop threshold, denoted by $`p_{stop}`$. From the results, as long as the lookahead time is at least 30 seconds, the delay remains consistently low and the amount of false positives is largely constant. When the lookahead time is too low, delay increases significantly. The probability stop threshold behaves similarly. As long as the value is 0.02 or lower, delay remains relatively constant. A higher value leads to higher delay. The amount of false positives is lower when $`p_{stop}`$ is higher, as stopping early means fewer blocks get predicted. The block fetching threshold, denoted by $`p_{download}`$, affects the final decision of whether or not to download blocks in the predicted merged cluster, based on predicted probability. It directly influences the amount of false positives, with a higher threshold resulting in fewer false positives. However, the delays are kept at an acceptable level only when $`p_{download}`$ is 0.02 or lower.
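To make the role of the three thresholds concrete, here is a toy traversal over a hand-written transition table. It is only a sketch under our own assumptions: states carry a fixed mean transition time, probabilities are summed over paths (exact only when no state is revisited within the lookahead), and every transition time is positive so the search terminates. The actual model is the one described in Section 5 and may differ.

```python
def predict_states(transitions, start, lookahead, p_stop, p_download):
    """Return states reachable within `lookahead` seconds with probability >= p_download.

    `transitions` maps a state to a list of (next_state, probability, mean_seconds).
    Paths are abandoned once their probability drops below p_stop or their
    accumulated time exceeds the lookahead.
    """
    reach = {}
    stack = [(start, 1.0, 0.0)]
    while stack:
        state, prob, elapsed = stack.pop()
        for nxt, p, dt in transitions.get(state, []):
            new_prob, new_elapsed = prob * p, elapsed + dt
            if new_prob < p_stop or new_elapsed > lookahead:
                continue  # stopping criteria: too unlikely or too far in the future
            reach[nxt] = reach.get(nxt, 0.0) + new_prob
            stack.append((nxt, new_prob, new_elapsed))
    return {s for s, p in reach.items() if p >= p_download}

toy_chain = {
    "A": [("B", 0.9, 10.0), ("C", 0.1, 10.0)],
    "B": [("D", 0.5, 30.0), ("E", 0.005, 5.0)],
    "D": [("F", 1.0, 40.0)],
}
permissive = predict_states(toy_chain, "A", lookahead=60, p_stop=0.01, p_download=0.02)
strict = predict_states(toy_chain, "A", lookahead=60, p_stop=0.01, p_download=0.2)
print(permissive, strict)
```

In this toy chain, state E is cut by `p_stop`, state F is cut by the lookahead, and raising `p_download` additionally drops the unlikely state C, mirroring the trade-off between false positives and delay discussed above.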

The time between partitions threshold, denoted $`\delta`$, controls how consecutive blocks are merged into the same partition. A lower value leads to more partitions that are smaller. The results clearly show that 0.12 is the optimal value with respect to delay. This amount represents the upper limit of the amount of computation (e.g., image decoding) the application does between chunks of data in the same batch of reads. The partition similarity threshold, denoted $`\tau`$, controls merging of two similar partitions within the same trace. A value of 1 means the two clusters need to contain the exact same blocks in order to be merged. The results show that values between 0.8 and 0.9 produce similarly low delay, while higher values result in higher delay.
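The two thresholds can be illustrated with a short sketch. Several details here are assumptions of ours rather than the definitions from Section 5: the gap is measured between consecutive read timestamps, similarity is taken to be Jaccard similarity (which equals 1 only for identical block sets, matching the description above), and merging takes the union of the two partitions.

```python
def partition_reads(reads, delta):
    """Split time-sorted (timestamp, block) reads wherever the gap exceeds delta seconds."""
    partitions, current, last_t = [], set(), None
    for t, block in reads:
        if last_t is not None and t - last_t > delta:
            partitions.append(current)
            current = set()
        current.add(block)
        last_t = t
    if current:
        partitions.append(current)
    return partitions

def jaccard(a, b):
    """Size of the intersection divided by size of the union of two block sets."""
    return len(a & b) / len(a | b)

def merge_similar(partitions, tau):
    """Merge each partition into the first earlier one whose similarity is at least tau."""
    merged = []
    for part in partitions:
        for existing in merged:
            if jaccard(existing, part) >= tau:
                existing |= part
                break
        else:
            merged.append(set(part))
    return merged

toy_reads = [(0.00, 1), (0.05, 2), (0.50, 3), (0.55, 4), (2.00, 1), (2.04, 2)]
toy_partitions = partition_reads(toy_reads, delta=0.1)
toy_merged = merge_similar(toy_partitions, tau=0.9)
print(toy_partitions, toy_merged)
```

A smaller `delta` splits the same reads into more, smaller partitions, and a `tau` closer to 1 merges fewer of them.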

The temporary storage limit sets a hard storage limit for storing blocks fetched speculatively. This does not include the APK and files that are always stored on the phone. In reality, this buffer can be shared by all applications as long as they do not run at the same time. The results show that a small 75 MB buffer is already as good as an infinitely large buffer. Thus, the amount of temporary space required by AppStreamer is very small.

The minimum superblock size serves as the stopping criterion of the first step of the process of generating superblocks. A lower value leads to a more precise model and predictions, but incurs longer training time. The results confirm that lower values are always better than higher values in terms of delay. However, we could not complete the benchmark using values lower than 17, as the training time suddenly jumps from a few minutes to several hours.