Human-in-the-Loop Wireless Communications: Machine Learning and Brain-Aware Resource Management

Human-centric applications such as virtual reality and immersive gaming will be central to the future wireless networks. Common features of such services include: a) their dependence on the human user's behavior and state, and b) their need for more …

Authors: Ali Taleb Zadeh Kasgari, Walid Saad, Merouane Debbah

Ali Taleb Zadeh Kasgari¹, Walid Saad¹, and Mérouane Debbah²

¹ Wireless@VT, Electrical and Computer Engineering Department, Virginia Tech, VA, USA. Emails: {alitk, walids}@vt.edu.
² Mathematical and Algorithmic Sciences Lab, Huawei France R&D, Paris, France, and CentraleSupélec, Université Paris-Saclay, Gif-sur-Yvette, France. Email: merouane.debbah@huawei.com.

Abstract

Human-centric applications such as virtual reality and immersive gaming are central to future wireless networks. Common features of such services include: a) their dependence on the human user's behavior and state, and b) their need for more network resources compared to conventional applications. To successfully deploy such applications over wireless networks, the network must be made cognizant not only of the quality-of-service (QoS) needs of the applications, but also of how human users perceive this QoS. In this paper, by explicitly modeling the limitations of the human brain, a concrete measure for the delay perception of human users is introduced. Then, a learning method, called probability distribution identification, is developed to find a probabilistic model for this delay perception based on the brain features of a human user. Given the learned model for the delay perception of the human brain, a brain-aware resource management algorithm based on Lyapunov optimization is proposed for allocating radio resources to human users while minimizing the transmit power and taking into account the reliability of both machine-type devices and human users. Then, a closed-form relationship between the reliability measure and wireless physical layer metrics of the network is derived. Simulation results show that a brain-aware approach can yield savings of up to 78% in power compared to a system that only considers QoS metrics.
The results also show that, compared with QoS-aware, brain-unaware systems, the brain-aware approach can save substantially more power in low-latency systems.

A preliminary version of this work appeared in the proceedings of the 51st Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA [1]. This research was supported by the U.S. National Science Foundation under Grants CNS-1460316 and IIS-1633363.

I. INTRODUCTION

The next generation of wireless services is expected to be highly human-centric. Examples include virtual reality and interactive/immersive gaming [2]–[4]. To cope with the quality-of-service (QoS) needs of such human-centric applications, in terms of data rate and ultra-low latency, wireless networks must exploit substantially more radio resources by leveraging heterogeneous spectrum bands [5]. Although allocating heterogeneous spectrum resources can potentially increase the raw QoS, the users of these human-centric applications may not be able to perceive the improved QoS, due to human factors such as the cognitive limitations of the brain [6]. Indeed, many empirical studies (anecdotal and otherwise) have shown that the limitations of the human brain constrain how wireless users translate QoS into actual quality-of-experience (QoE) [7]–[9]. For example, the human brain may not be able to perceive any difference between videos transmitted with different QoS (e.g., rates or delays) [9], [10]. Hence, to deploy these services over wireless networks, such as 5G cellular systems, the system must be made strongly cognizant of the human user in the loop. In particular, to deliver immersive, human-centric services, the network must tailor the usage and optimization of wireless resources to the intrinsic features of its human users, such as their behavior and brain processing limitations.
By doing so, the network can potentially save resources, accommodate more users, and provide a more realistic QoE to its users. Moreover, the saved resources can be used to accommodate emerging applications in wireless networks such as drone communications [11], [12] and autonomous driving [13]–[16].

Resource management mechanisms that cater to the intrinsic needs of wireless users and their context (e.g., device features or social metrics) have recently been studied in [5], [17]–[24]. In [17], a context-aware scheduling algorithm for 5G systems is proposed. This algorithm exploits the context information of user equipments (UEs), such as battery level, to save energy in the system while satisfying the QoS requirements of users. The authors in [18] proposed a user-centric resource allocation framework for ultra-dense heterogeneous networks. Context-aware resource allocation for heterogeneous cellular networks is also studied in [5], [19], and [20]. In [5], a novel approach to context-aware resource allocation in small cell networks is introduced. Both wireless physical layer metrics and the social ties of human users are exploited in [5] to allocate wireless resource blocks. Proactive caching using context information from social networks is studied in [21]. The results in [21] show that such a socially-aware caching technique reduces the peak traffic in 5G networks. Other context-aware resource allocation algorithms are also studied in [22], [24], and [25]. However, despite this surge in literature on context-aware networking [5], [17]–[22], [24], [25], this prior art still relies on device-level features. Moreover, the works in [5], [17]–[22], [24], and [25] are agnostic to the human users and their features (e.g., brain limitations or behavior).
Hence, adopting these existing approaches can waste network resources, because more resources may be allocated to human users who cannot perceive the associated QoS gains due to cognitive brain limitations. A general framework for modeling the intelligence of communication systems that serve humans is proposed in [26]. The author defines intelligence in terms of predicting and serving human demands in advance. However, the work in [26] does not account for the cognitive limitations of a human brain. Moreover, demand prediction, as done in [26], is not sufficient to capture the full spectrum of human user limitations and behavior. By being aware of the brain limitations of each user, the network can provide a unique experience for each user and optimize its performance. For example, an increase in the delay of a wireless system may have different effects on the QoE perceived by different human users. In particular, such different delay perceptions can potentially be exploited by the cellular network to minimize power consumption and reduce the amount of wasted resources. To the best of our knowledge, no existing work has studied the impact of such disparate brain delay perceptions on wireless resource allocation. Furthermore, none of the prior studies on systems with humans-in-the-loop in other fields [27], [28] have analyzed the limitations of the human brain.

The main contribution of this paper is a novel brain-aware learning and resource management framework that explicitly factors in the brain state of human users during resource allocation in a cellular network. In particular, we formulate the brain-aware resource allocation problem using a joint learning and optimization framework. First, we propose a learning algorithm to identify the delay perceptions of a human brain.
This learning algorithm employs both supervised and unsupervised learning to identify the brain limitations and also creates a statistical model for these limitations based on Gaussian mixture models. Then, using Lyapunov optimization, we address the resource allocation problem with time-varying QoS requirements that capture the learned delay perception. Using this approach, the network can allocate radio resources to human users while considering the reliability of both machine-type devices and human users. We then identify a closed-form relationship between system reliability and wireless physical layer metrics and derive a closed-form expression for the reliability as a function of the human brain's delay perception. Simulation results using real data show that the proposed brain-aware approach can substantially save power in the network while preserving the reliability of the users, particularly in low-latency applications. In particular, the results show that the proposed brain-aware approach can yield power savings of up to 78% compared to a conventional, brain-unaware system.

The rest of the paper is organized as follows. Section II introduces the system model. Sections III and IV present the proposed learning algorithm and resource allocation framework, respectively. Section V presents the simulation results and conclusions are drawn in Section VI.

II. SYSTEM MODEL AND PROBLEM FORMULATION

Consider the downlink of a cellular network with humans-in-the-loop, in which a single base station (BS) serves a set $\mathcal{H}$ of $N$ human users with their UEs and a set $\mathcal{M}$ of $M$ machine-type devices (MTDs). We assume that each human user uses one UE. Each UE or MTD can run a different application with different QoS requirements, such as sending a command to an actuator (for an MTD) or playing a 3D interactive game (for a UE). We consider a time-slotted system, and define $\mathcal{K}$ as the set of $K$ resource blocks (RBs).
In our model, the packets associated with user $i \in \mathcal{H} \cup \mathcal{M}$ arrive at the BS according to independent Poisson processes with rate $a_i(t)$. The packet lengths $l_i$, $\forall i \in \mathcal{H} \cup \mathcal{M}$, follow an exponential distribution. Hence, each user's buffer at the BS follows an M/M/1 queuing model. The total queuing and transmission delay of each user $i$ is $D_i(t) = q_i(t) + \frac{l_i}{r_i(t)}$, where $q_i(t)$ is the queuing delay. The data rate of each user is given by:

$$ r_i(t) = B \sum_{j=1}^{K} \rho_{ij}(t) \log_2\left(1 + \frac{p_{ij}(t)\, h_{ij}(t)}{\sigma^2}\right), \tag{1} $$

where $p_{ij}(t)$ is the transmit power between the BS and user $i$ over RB $j$ at time $t$ and $h_{ij}(t)$ is the time-varying Rayleigh fading channel gain. In (1), $\rho_{ij}(t) = 1$ if RB $j$ is allocated to user $i$ at time slot $t$, and $\rho_{ij}(t) = 0$ otherwise. $B$ is the bandwidth of each RB, and $\sigma^2$ is the noise power, defined as the power spectral density of the noise multiplied by the bandwidth $B$.

We define $\beta_i(t)$ as the delay perception threshold of user $i \in \mathcal{H} \cup \mathcal{M}$ at time $t$. If the delay decreases below the threshold $\beta_i(t)$, the user will not be able to discern the change in service quality. We use the concept of delay perception $\beta_i(t)$ to capture the time-varying delay requirements of UEs and MTDs. Since the delay perception of MTDs is constant, hereinafter, for simplicity, we use $\beta_i(t)$ to exclusively denote the delay perception of human users, i.e., $\beta_i(t)$, $\forall i \in \mathcal{H}$, unless stated otherwise. This delay perception can be affected by multiple factors pertaining to the human brain, such as context, attention, fatigue, or cognitive abilities, and is determined by measuring the capabilities of the human brain at each time slot using machine learning methods. By explicitly accounting for the cognitive limitations of the human brain, the BS can better allocate resources to the users that need them, when they can actually use them.
This is in contrast to conventional brain-agnostic networks [5], [26], in which resources may be wasted, as they are allocated based only on application QoS, without knowing whether the human user can indeed process the application's QoS target. We pose this resource allocation problem as a power minimization problem that is subject to a brain-aware QoS constraint on the latency:

$$ \min_{\boldsymbol{\rho}(t),\, \mathbf{P}(t)} \; \sum_{j \in \mathcal{K}} \Big[ \sum_{i \in \mathcal{H}} \bar{P}_i^j + \sum_{i \in \mathcal{M}} \bar{P}_i^j \Big], \tag{2a} $$
$$ \text{s.t.} \;\; \Pr\big\{ D_i(t) \ge D_i^{\max}\big(\beta_i(t)\big) \big\} \le \epsilon_i\big(\beta_i(t)\big), \quad \forall i \in \mathcal{H} \cup \mathcal{M}, \tag{2b} $$
$$ p_{ij}(t) \ge 0, \;\; \rho_{ij}(t) \in \{0, 1\}, \quad \forall i \in \mathcal{H} \cup \mathcal{M},\; j \in \mathcal{K}, \tag{2c} $$
$$ \sum_{i \in \mathcal{H} \cup \mathcal{M}} \rho_{ij}(t) = 1, \quad \forall j \in \mathcal{K}, \tag{2d} $$

where $\boldsymbol{\rho}(t)$ is an $(M+N) \times K$ matrix with elements $\rho_{ij}(t)$, and $\mathbf{P}(t)$ is an $(M+N) \times K$ matrix whose element $p_{ij}(t)$ represents the instantaneous power allocated to user $i$ on RB $j$. The term $\bar{P}_i^j = \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} \rho_{ij}(\tau)\, p_{ij}(\tau)$ is the time average of the power allocated to user $i$ on RB $j$. $D_i(t)$ incorporates both transmission and queuing delays, and $D_i^{\max}(\beta_i(t))$ is the maximum tolerable delay. This delay depends on $\beta_i(t)$ because changes in human delay perception change the maximum tolerable delay of human users. $\epsilon_i(\beta_i(t))$ in (2b) denotes the maximum probability of the packet delay exceeding $D_i^{\max}(\beta_i(t))$. Hence, we define $1 - \epsilon_i(\beta_i(t))$ as the reliability of user $i$, i.e., the proportion of time during which the delay of a given user does not exceed the threshold. For notational convenience, hereinafter, we use $D_i^{\max}(\beta_i(t))$ and $D_i^{\max}$ interchangeably. In addition to the maximum tolerable delay and reliability, constraint (2b) implicitly accounts for the packet size and the rate of the application. Hence, from a resource allocation perspective, any application can be captured by (2b).
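To make the quantities in (1) and (2) concrete, the following Python sketch computes the rate of (1) for one user in one slot, a finite-horizon estimate of the time-averaged power $\bar{P}_i^j$ in (2a), and the empirical reliability that (2b) constrains. It is an illustration under simplifying assumptions, not the authors' implementation: the function names are hypothetical, the limit in $\bar{P}_i^j$ is truncated to the observed slots, and the per-slot thresholds stand in for $D_i^{\max}(\beta_i(t))$, which the paper obtains from the PDI model.

```python
import math


def rate_eq1(bandwidth, alloc, power, gain, noise_power):
    """r_i(t) from (1): alloc[j] = rho_ij(t), power[j] = p_ij(t), gain[j] = h_ij(t)."""
    return bandwidth * sum(
        rho * math.log2(1.0 + p * h / noise_power)
        for rho, p, h in zip(alloc, power, gain)
    )


def time_average_power(alloc_hist, power_hist):
    """Finite-horizon estimate of bar{P}_i^j: (1/t) * sum_tau rho_ij(tau) * p_ij(tau)."""
    t = len(alloc_hist)
    return sum(rho * p for rho, p in zip(alloc_hist, power_hist)) / t


def empirical_reliability(delays, d_max):
    """Fraction of slots with D_i(t) <= D_i^max(beta_i(t)); estimates 1 - eps_i.

    d_max holds one threshold per slot because D_i^max tracks the time-varying beta_i(t).
    """
    ok = sum(1 for delay, thr in zip(delays, d_max) if delay <= thr)
    return ok / len(delays)


# Hypothetical slot: 3 RBs, user i holds RBs 0 and 2, B = 180 kHz
r = rate_eq1(180e3, [1, 0, 1], [0.1, 0.0, 0.2], [2.0, 1.0, 0.5], 1e-3)
print(f"r_i(t) = {r / 1e6:.3f} Mbit/s")
```

Checking constraint (2b) for a user then amounts to verifying that `1 - empirical_reliability(...)` stays below the target $\epsilon_i$ over a long enough window.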
The key difference between our problem formulation and conventional RB allocation problems [29] lies in the QoS delay requirement (2b). Constraint (2b) involves two random processes, $D_i(t)$ and $\beta_i(t)$. Through (2b), the network explicitly accounts for the delay needs of the human brain (and of the MTDs). By taking into account the brain features of the human users, the network can avoid wasting resources. Such waste can stem from allocating more power to a UE solely based on the application QoS, while ignoring how the brain of the human carrying the UE perceives this QoS. Clearly, ignoring this human perception can lead to inefficient resource management.

We propose a machine learning algorithm to identify the human brain delay perception $\beta_i(t)$. Each human user has $d$ features (e.g., age, occupation, location) that are assumed to be known to the BS. This time-varying feature vector is denoted by $\mathbf{x}_i(t) \in \mathbb{R}^d$. We develop a learning algorithm to build a model that maps these features to $\beta_i(t)$ for each user. We then show that being aware of $\beta_i(t)$ can help the resource allocation algorithm save a significant amount of resources in low-latency systems. In practice, the BS can collect the user features $\mathbf{x}_i(t)$ whenever a given user registers in the network or by using the sensors of the user's mobile device. The system model is shown in Fig. 1, and Table I lists our main parameters and notations.

[Figure 1: Illustration of the system model (MTD 1, …, MTD M, UE 1, …, UE N, downlink channel, feedback channel).]

[Figure 2: Graphical representation of building a PDI model (unlabeled training data, unsupervised learning (EM), GMM model, labels, supervised learning, predictor; PDI model input $\mathbf{x}_i$ and output; symbols $c(\mathbf{w}_i)$, $f$, $p(\mathbf{w}_i)$).]
To find the mapping $\beta_i(t) = f(\mathbf{x}_i(t))$ between the human features $\mathbf{x}_i(t)$ and the delay perception of the brain, we introduce a novel supervised learning mechanism called the probability distribution identification (PDI) method. Here, the function $f(\cdot)$ represents this mapping. Since reliability is a key factor in a communication system, we need a supervised learning algorithm that not only predicts $\beta_i(t)$ as a function of $\mathbf{x}_i(t)$, but also gives a measure of reliability for this prediction.

Table I: List of notations.
$p_{ij}$: Power of user $i$ on RB $j$
$\rho_{ij}$: RB $j$ allocation indicator to user $i$
$D_i$: Packet delay of user $i$
$L$: Total number of clusters (brain modes)
$D_i^{\max}(\cdot)$: Target delay of user $i$
$\epsilon_i(\cdot)$: Target reliability of user $i$
$\beta_i(t)$: Delay perception of user $i$ at time $t$
$V$: Balancing parameter for the reliability constraint
$h_{ij}(t)$: Time-varying Rayleigh fading channel gain
$\lambda$: Lagrange multiplier
$B$: RB bandwidth
$\sigma^2$: Noise power
$N$: Number of UEs
$k(j)$: Optimal allocation for RB $j$
$K$: Number of available RBs
$\Pi$: Projection operator
$a_i(t)$: Arrival rate of user $i$ at time $t$
$\zeta_i$: Subgradient of element $i$ of $\lambda$
$\hat{F}_i(t)$: Virtual queue of user $i$ at time $t$
$\boldsymbol{\Sigma}_k$: Covariance matrix of mode $k$ of the human brain
$n$: Number of training users
$\psi(\cdot)$: PDF of the multivariate normal distribution
$D_i^{\max}$: Target end-to-end latency of user $i$
$\boldsymbol{\mu}_k$: Mean vector of mode $k$ of the human brain
$\mathcal{M}$: Set of MTDs
$\mathcal{H}$: Set of UEs
$\mathcal{K}$: Set of RBs
$\chi^2_{d+1}(\cdot)$: Chi-square distribution function
$\pi_k$: Mixture weight of mode $k$
$\mathbf{z}$: Mode indicator vector
$\mathcal{L}$: Likelihood function
$d$: Number of features of each user
$f$: Supervised learning model
$c(\mathbf{w}_i)$, $c_i$: Cluster (mode) number of $\mathbf{w}_i$
$\mathbf{y}$: Set of labels for supervised learning
$\xi(\cdot)$: 0–1 loss function
$D_i^{\min}(\epsilon_0)$: Effective delay of human user $i$
$Q_d(\gamma)$: Quantile function of the chi-square distribution with $d$ degrees of freedom
$S$: Reliability event of the system
$\chi_d(\cdot)$: Chi-square function with $d$ degrees of freedom
$L_a$: Lagrange dual function
$\mathbf{e}_j$: Unit vector whose $j$th element is 1
This measure of reliability is one of the key advantages of PDI learning over other supervised learning methods [30], [31]. Although conventional methods such as neural networks can be used to approximate the continuous function $f(\cdot)$ [31], these methods cannot quantify the reliability of their predictions. The reliability of a prediction is defined as the probability that the predicted $\beta_i(t)$ lies within a certain range of the true value of $\beta_i(t)$. The PDI method can find the distribution of the predicted values. Although many existing supervised learning methods can build a model that predicts an output from a given input, they do not provide a statistical model for this prediction. In addition, by using the statistical model for the predictions resulting from our proposed PDI, the system designer can better design the system for a desired reliability. As discussed in [32], the delay perception of a human brain typically follows a multi-modal distribution. Hence, we design the proposed PDI approach to capture such a model and find the different modes of a human brain. Then, using the distribution of the brain delay, the PDI approach can find the effective delay of the human brain. This effective delay determines the relationship between $\beta_i(t)$ and $\mathbf{x}_i(t)$ along with its reliability.

A. Building a PDI model

Consider a dataset $\{\mathbf{x}_1(t), \cdots, \mathbf{x}_n(t)\}$, where $\mathbf{x}_i(t) \in \mathbb{R}^d$ is one sample data vector. The elements of $\mathbf{x}_i(t)$ are user features, which can be categorical (such as gender) or numerical (such as age). For each input vector $\mathbf{x}_i(t)$, we have a corresponding output value of delay perception $\beta_i(t)$. These data can be collected using experiments or surveys such as those in [33]. Since the time dependency of the data can be removed using time-series techniques such as those in [34], hereinafter, we use $\mathbf{x}$ instead of $\mathbf{x}(t)$.
Although we omit the time dependency of $\mathbf{x}_i(t)$ during training, it is still implicitly a function of time. This dataset can be represented by a matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$, where $\mathbf{x}_i^T$ is row $i$ of $\mathbf{X}$. Using PDI, we first create an $n \times (d+1)$ dataset matrix:

$$ \mathbf{W} = [\mathbf{X} \,\|\, \boldsymbol{\beta}] = \begin{bmatrix} \mathbf{w}_1^T \\ \vdots \\ \mathbf{w}_n^T \end{bmatrix} = \begin{bmatrix} \mathbf{x}_1^T & \beta_1(t) \\ \vdots & \vdots \\ \mathbf{x}_n^T & \beta_n(t) \end{bmatrix}, \tag{3} $$

where $\mathbf{w}_i \in \mathbb{R}^{d+1}$ is a vector of the delay perception $\beta_i(t)$ and $d$ other correlated features of the human brain. First, in the unsupervised learning step, we fit a Gaussian mixture model (GMM) to our dataset using the expectation-maximization (EM) algorithm [35] to obtain $p(\mathbf{x}_i, \beta_i(t))$. After finding $p(\mathbf{x}_i, \beta_i(t))$, we can cluster the data samples and find $L$ brain modes in the data. Then, each data vector $\mathbf{x}_i$ is labeled with its cluster number $c_i$, so that each $\mathbf{x}_i$, $i = 1, \cdots, n$, has a label in the cluster set $c_i \in \mathcal{C} = \{1, \cdots, L\}$. Using this method, we obtain a labeled dataset that can be used for supervised learning. These cluster numbers correspond to the modes of the human brain that determine its effective delay perception.

Next, we describe the Gaussian mixture model that we use as the underlying model of the human brain. We use a GMM for clustering (unsupervised learning) and for statistical modeling of the human brain because of its multimodal structure, which resembles human brain activities [36], its scalability, and its robustness and stability under high noise levels compared to nonparametric methods. A multi-modal stochastic model is assumed for the brain features $\mathbf{w}_i$ of user $i$. The proposed distribution for $\mathbf{w}_i$ is given by [30]:

$$ p(\mathbf{w}_i) = \sum_{\mathbf{z}} p(\mathbf{z})\, p(\mathbf{w}_i \mid \mathbf{z}) = \sum_{k=1}^{L} \pi_k\, \psi(\mathbf{w}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k), \tag{4} $$

where $\psi(\mathbf{w}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$ is the probability density function of a multivariate normal distribution with mean vector $\boldsymbol{\mu}_k$ and covariance matrix $\boldsymbol{\Sigma}_k$.
$\boldsymbol{\Sigma}_k$ and $\boldsymbol{\mu}_k$ represent the covariance matrix and mean vector of mode $k$ of the human brain, respectively. $\mathbf{z}$ is a binary random vector in which one particular element $z_k$ is equal to 1 and all other elements are 0. $\mathbf{z}$ thus indicates which mode is activated in the GMM, and $\pi_k$ is defined as $p(z_k = 1) = \pi_k$. $L$ is the total number of modes in the GMM. The human brain will be in mode $k$ with probability $\pi_k$, and its features are then generated by a multivariate normal distribution with mean $\boldsymbol{\mu}_k$ and covariance $\boldsymbol{\Sigma}_k$. The posterior probability, i.e., the responsibility, of mode $k$ is:

$$ r_i(z_k) = \frac{\pi_k\, \psi(\mathbf{w}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{j=1}^{L} \pi_j\, \psi(\mathbf{w}_i \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}. \tag{5} $$

This responsibility can also be used to cluster the data: after fitting the GMM to the dataset, we find the mode with the highest responsibility for each data point and assign the point to this mode. The EM algorithm is used to find $\boldsymbol{\mu}_k$, $\boldsymbol{\Sigma}_k$, and $\pi_k$ for all $k = 1, \cdots, L$, based on the real-time human brain behavior [35]. The log-likelihood function of our dataset can be written as:

$$ \ln \mathcal{L}(\boldsymbol{\Sigma}, \boldsymbol{\mu}, \boldsymbol{\pi} \mid \mathbf{w}) = \ln \prod_{i} p(\mathbf{w}_i \mid \boldsymbol{\Sigma}, \boldsymbol{\mu}, \boldsymbol{\pi}) = \sum_{i} \ln \sum_{k=1}^{L} \pi_k\, \psi(\mathbf{w}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k). \tag{6} $$

The likelihood function in (6) has singularities and, hence, it cannot be maximized directly in closed form with respect to $\pi_k$, $\boldsymbol{\Sigma}_k$, and $\boldsymbol{\mu}_k$. Instead, the EM algorithm [37] is used to maximize the likelihood of the Gaussian mixture model. In the EM algorithm, we first initialize $\boldsymbol{\Sigma}_k$, $\boldsymbol{\mu}_k$, and $\pi_k$ randomly. Next, we compute the responsibility of each mode using (5). Then, we re-estimate the parameters using the current responsibilities. These two steps are repeated until the log-likelihood in (6) converges to a (local) maximum with respect to $\boldsymbol{\Sigma}_k$, $\boldsymbol{\mu}_k$, and $\pi_k$. As a result of the EM algorithm (unsupervised learning), we obtain a GMM of our dataset matrix $\mathbf{W}$, which is then used to label (cluster) the data.
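The E-step (5), the parameter re-estimation, and the log-likelihood (6) can be sketched in Python with NumPy as below. This is a minimal illustration rather than the authors' code: the farthest-point initialization of the means, the isotropic initial covariances, the fixed iteration count, and the small ridge term `reg` added to each $\boldsymbol{\Sigma}_k$ (to stay away from the singularities of (6)) are all our own assumptions.

```python
import numpy as np


def mvn_pdf(W, mean, cov):
    """Multivariate normal density psi(w | mean, cov), evaluated for every row of W."""
    d = W.shape[1]
    diff = W - mean
    # Solve cov @ v = diff^T instead of inverting cov explicitly
    quad = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm


def fit_gmm_em(W, L, n_iter=100, reg=1e-6, seed=0):
    """Fit an L-mode GMM, eq. (4), to the rows of W with EM.

    Returns weights pi_k, means mu_k, covariances Sigma_k, the responsibilities
    of eq. (5) from the last E-step, and the log-likelihood (6) per iteration.
    """
    rng = np.random.default_rng(seed)
    n, d = W.shape
    # Assumed initialization: one random point, then farthest points as extra means
    means = [W[rng.integers(n)]]
    for _ in range(1, L):
        dist2 = np.min([np.sum((W - m) ** 2, axis=1) for m in means], axis=0)
        means.append(W[np.argmax(dist2)])
    means = np.array(means, dtype=float)
    covs = np.array([np.eye(d) * W.var(axis=0).mean() for _ in range(L)])
    weights = np.full(L, 1.0 / L)
    log_lik = []
    for _ in range(n_iter):
        # E-step: responsibilities r_i(z_k), eq. (5)
        dens = np.column_stack(
            [weights[k] * mvn_pdf(W, means[k], covs[k]) for k in range(L)]
        )
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        log_lik.append(float(np.log(total).sum()))  # eq. (6)
        # M-step: re-estimate pi_k, mu_k, Sigma_k from the responsibilities
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ W) / nk[:, None]
        for k in range(L):
            diff = W - means[k]
            # reg * I keeps Sigma_k away from the singular covariances that make (6) unbounded
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(d)
    return weights, means, covs, resp, log_lik


# Hypothetical toy data: two brain modes in (age, delay perception) space
demo_rng = np.random.default_rng(0)
W_demo = np.vstack([
    demo_rng.normal([25.0, 0.3], [4.0, 0.05], size=(150, 2)),
    demo_rng.normal([55.0, 0.8], [6.0, 0.10], size=(150, 2)),
])
demo_weights, demo_means, demo_covs, demo_resp, demo_ll = fit_gmm_em(W_demo, L=2)
demo_labels = demo_resp.argmax(axis=1)  # hard labels, 0-based
```

Taking `resp.argmax(axis=1)` yields the hard labels of (7) with 0-based mode indices. In practice, a tested library implementation such as scikit-learn's `GaussianMixture` would normally be preferred over a hand-written loop.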
For each data point $\mathbf{w}_i$, the most probable mode is assigned as its label, i.e.,

$$ c_i = c(\mathbf{w}_i) = \arg\max_{k}\, p(z_k = 1 \mid \mathbf{w}_i) = \arg\max_{k}\, r_i(z_k). \tag{7} $$

In (7), we assign the most likely cluster to each data point $\mathbf{w}_i$. After building a GMM using unsupervised learning on the dataset $\mathbf{W}$ to obtain the target vector $\mathbf{y} = [c(\mathbf{w}_1) \cdots c(\mathbf{w}_n)]^T$, we use the pair $\{\mathbf{X}, \mathbf{y}\}$ to train a supervised learning model. Thus, the output $\mathbf{y}$ of the unsupervised learning step is used for training the supervised learning model. During the supervised learning step, we train a classifier that finds the mode $c_i$ using the human features $\mathbf{x}_i$ as input. Given the data matrix $\mathbf{X}$ and the output vector $\mathbf{y}$, this supervised learning step builds a model $f$ such that $c_i = f(\mathbf{x}_i)$, where

$$ f = \arg\min_{\hat{f}} \sum_{i=1}^{n} \xi\big(c(\mathbf{w}_i), \hat{f}(\mathbf{x}_i)\big), \tag{8} $$

and $\xi(\cdot)$ is a 0-1 loss function [38, Equation 7.5]. $f$ is a function that is approximated using the set of points $(\mathbf{x}_i, c_i)$ and determines the relationship between the features of a user and its cluster. To avoid overfitting, we use the elbow method to find the optimal number of clusters in PDI, as discussed in Section IV. After approximating $f$, given each human user's feature vector $\mathbf{x}_i$, we find the mode $c_i$ using the model $f$. Algorithm 1 summarizes how a PDI learning model is built, and a graphical representation of this process is shown in Fig. 2.

Algorithm 1: Building the PDI model
Input: $\mathbf{w}_i = [\mathbf{x}_i, \beta_i]$, $i = 1, \cdots, n$
Output: $f$, $\pi_k$, $\boldsymbol{\mu}_k$, $\boldsymbol{\Sigma}_k$, $k = 1, \cdots, L$
Unsupervised learning:
1: Apply the EM algorithm to $\mathbf{w}_i$; find $\pi_k$, $\boldsymbol{\mu}_k$, $\boldsymbol{\Sigma}_k$, $r_i(z_k)$, $k = 1, \cdots, L$, $i = 1, \cdots, n$.
2: for $i = 1, \cdots, n$ do
3:     find $c(\mathbf{w}_i)$ using (7).
4: end for
5: Pass $\mathbf{y} = [c(\mathbf{w}_1) \cdots c(\mathbf{w}_n)]^T$ to the supervised learning algorithm.
Supervised learning:
6: Find $f$ using (8).
7: return $f$, $\pi_k$, $\boldsymbol{\mu}_k$, $\boldsymbol{\Sigma}_k$, $k = 1, \cdots, L$

B. Deployment of the PDI learning model

In this subsection, we bound $D_i^{\max}(\beta_i(t))$ based on the features $\mathbf{x}_i$ of user $i$. Note that the deployment and the training of the PDI learning method are separate: the deployment part only uses the model generated by the training part. Now that the system can identify the human users' modes, we need to find a relationship between a human user's mode and the probabilistic model of its delay perception. To this end, we define the concept of effective delay.

Definition 1. Given the statistical model for the human delay perception $\beta_i(t)$, the effective delay $D_i^{\min}(\epsilon_0)$ of human user $i$ is a delay that satisfies:

$$ \Pr\big(\beta_i(t) < D_i^{\min}(\epsilon_0)\big) < \epsilon_0. \tag{9} $$

To find the effective delay of human user $i$, we first find the probability that the delay perception of human user $i$ is less than a threshold $D_i^{\min}(\epsilon_0)$. In other words, we want to find the relation between $\epsilon_0$ and $D_i^{\min}(\epsilon_0)$ in (9). The concept of effective delay builds on the fact that, with probability at least $1 - \epsilon_0$, delays less than $D_i^{\min}(\epsilon_0)$ cannot be sensed by the human user.

Algorithm 2: Deploying the PDI model to find the brain delay perception
Input: $\mathbf{x}_i$, $i = 1, \cdots, N$
Output: $D_i^{\max}(\beta_i(t))$ and $\epsilon(\beta_i(t))$, $i = 1, \cdots, N$
1: for $i = 1, \cdots, N$ do
2:     find the cluster of $\mathbf{x}_i$, $k_i = f(\mathbf{x}_i)$, and hence $\boldsymbol{\mu}_{k_i}$ and $\boldsymbol{\Sigma}_{k_i}$.
3:     find the function $D_i^{\min}(\epsilon_0)$ using (13).
4:     using the target reliability, find $\epsilon_0$ and then, using the function $D_i^{\min}(\epsilon_0)$, find $D_i^{\min}$.
5:     set $D_i^{\max}(\beta_i(t)) = D_i^{\min}$, and use (16) and $\epsilon_0$ to find $\epsilon(\beta_i(t))$.
6: end for
7: return $D_i^{\max}(\beta_i(t))$ and $\epsilon(\beta_i(t))$ for $i = 1, \cdots, N$

The relation between $\epsilon_0$ and $D_i^{\min}(\epsilon_0)$ in (9) is given in Theorem 1. For notational simplicity, hereinafter, we use $D_i^{\min}$ instead of $D_i^{\min}(\epsilon_0)$.

Theorem 1.
If brain mode $k$ is identified for user $i$, then its delay perception is bounded as follows:

$$ \Pr\Big\{ \big|\beta_i(t) - \mu_k(d+1)\big| < \sqrt{Q_{d+1}(\gamma)\, \mathbf{e}_{d+1}^T \boldsymbol{\Sigma}_k \mathbf{e}_{d+1}} \Big\} > \gamma, \tag{10} $$

where $\boldsymbol{\Sigma}_k$ and $\mu_k(d+1)$ represent, respectively, the covariance matrix and the $(d+1)$th element of the mean vector of the identified brain mode $k$. $Q_d(\gamma)$ denotes the quantile function of the chi-square distribution with $d$ degrees of freedom; for $d+1$ degrees of freedom, it is defined as

$$ Q_{d+1}(\gamma) = \inf\Big\{ x \in \mathbb{R} \;\Big|\; \gamma \le \int_0^x \chi^2_{d+1}(u)\, du \Big\}, \tag{11} $$

and $\mathbf{e}_j$ is the unit vector in $\mathbb{R}^{d+1}$ whose $j$th element is 1 and all other elements are zero. $d$ is the number of features used for learning, and $\chi^2_{d+1}(x)$ is the probability density function of a chi-square random variable with $d+1$ degrees of freedom.

Proof: See Appendix A.

Note that the bound in (10) differs from a bound obtained from marginal distributions, as it is based on high-probability-density regions. A marginal distribution cannot be used here, since the Gaussian assumption is only valid locally around the mean. Also, in case of a data classification error, which mostly happens when the data lies in the overlapping area between clusters, the bound in (10) will either be more conservative than the actual bound on the delay perception or will not change significantly compared to the actual bound. As seen from Theorem 1, in addition to the delay perception element $\mu_k(d+1)$, the only other parameter that affects the delay is $\mathbf{e}_{d+1}^T \boldsymbol{\Sigma}_k \mathbf{e}_{d+1}$, which is the $(d+1)$th diagonal element of $\boldsymbol{\Sigma}_k$ (note that $\boldsymbol{\Sigma}_k$ is not assumed to be diagonal). Fig. 3 shows the relationship between $D_i^{\min}$ and random data generated from a Gaussian mixture model. It shows that, after finding the GMM of the dataset, one can find the predictive coverage of each Gaussian component and, then, determine the probability with which $\beta_i(t)$ of a user $i$ will be higher than a threshold $D_i^{\min}$.
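Theorem 1 can be checked numerically in the simplest setting of one feature ($d = 1$, e.g., age as in Fig. 3), where $\mathbf{w}_i = [x_i, \beta_i]$ is bivariate. For two degrees of freedom the chi-square CDF is $1 - e^{-x/2}$, so the quantile in (11) has the closed form $Q_2(\gamma) = -2\ln(1-\gamma)$. The Python sketch below draws samples from a single hypothetical Gaussian mode with a non-diagonal $\boldsymbol{\Sigma}_k$ and measures how often $\beta$ falls inside the interval of (10); the function names and all parameter values are illustrative assumptions.

```python
import math
import random


def chi2_quantile_2dof(gamma):
    """Q_2(gamma) in (11): the 2-DOF chi-square CDF is 1 - exp(-x/2)."""
    return -2.0 * math.log(1.0 - gamma)


def theorem1_halfwidth(gamma, sigma_beta):
    """Half-width sqrt(Q_{d+1}(gamma) * e^T Sigma_k e) of (10) for d = 1.

    sigma_beta is the (2, 2) entry of Sigma_k, i.e., the variance of beta.
    """
    return math.sqrt(chi2_quantile_2dof(gamma) * sigma_beta)


def empirical_coverage(mean, cov, gamma, n=20000, seed=1):
    """Fraction of joint draws (x, beta) ~ N(mean, cov) with |beta - mean[1]| inside (10)."""
    # Lower-triangular Cholesky factor of the 2x2 covariance (cov need not be diagonal)
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    half_width = theorem1_halfwidth(gamma, cov[1][1])
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Only the beta component of the joint draw is needed for the check;
        # the feature component would be mean[0] + l11 * z1.
        beta = mean[1] + l21 * z1 + l22 * z2
        if abs(beta - mean[1]) < half_width:
            hits += 1
    return hits / n


# Hypothetical mode: mean age 30 years, mean delay perception 5 ms, correlated features
mode_mean = [30.0, 5.0]
mode_cov = [[100.0, 3.0], [3.0, 1.0]]
print(empirical_coverage(mode_mean, mode_cov, gamma=0.9))
```

For a single Gaussian mode the observed coverage should exceed $\gamma$ (roughly 0.97 for $\gamma = 0.9$), since (10) uses $d+1$ degrees of freedom and is therefore conservative for the one-dimensional $\beta$ marginal.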
In order to find the effective delay of human user $i$, we first find the probability with which the delay perception of human user $i$ will be less than a threshold $D_i^{\min}$. In other words, we find the relationship between $\epsilon$ and $D_i^{\min}$ in (9) using the following corollary, which follows directly from Theorem 1.

Corollary 1. As a direct result of Theorem 1, (10) reduces to

$$ \Pr\Big\{ \beta_i(t) < \mu_k(d+1) - \sqrt{Q_{d+1}(\gamma)\, \mathbf{e}_{d+1}^T \boldsymbol{\Sigma}_k \mathbf{e}_{d+1}} \Big\} < \frac{1-\gamma}{2}. \tag{12} $$

Therefore, setting $\epsilon = (1-\gamma)/2$, we find $D_i^{\min}(\epsilon)$ in (9) as

$$ D_i^{\min}(\epsilon) = \mu_k(d+1) - \sqrt{Q_{d+1}(1-2\epsilon)\, \mathbf{e}_{d+1}^T \boldsymbol{\Sigma}_k \mathbf{e}_{d+1}}. \tag{13} $$

Since $Q_{d+1}(\gamma)$ must, in general, be computed numerically, a closed-form relationship between $D_i^{\min}(\epsilon)$ and $\epsilon$ cannot be found in general. However, we can analyze this relationship numerically, as shown in Fig. 4, which is obtained from a set of points generated with (13) for different values of $\mu_k(d+1)$ and $\boldsymbol{\Sigma}_k$. From Fig. 4, we first observe that $D_i^{\min}(\epsilon)$ is an increasing function of $\epsilon$. This means that the probability of the human brain noticing QoS differences at low delays is much smaller than at higher delays, which is intuitive. Furthermore, if the delay perception of a group of human users within a cluster is diverse, then the system's confidence in the delay perception of this group will decrease, i.e., the estimate of the delay perception of this group of human users will be less reliable.

Next, we determine constraint (2b) using $D_i^{\min}(\epsilon)$. As stated before, some delays are not perceptible to human users. To capture this feature, we find $D_i^{\max}(\beta_i(t))$ and $\epsilon(\beta_i(t))$ in problem (2) using $D_i^{\min}(\epsilon)$. Recall that $D_i^{\max}(\beta_i(t))$ is the parameter used by the resource allocation system to represent the maximum tolerable delay for the reliable communication of user $i$, with $1 - \epsilon(\beta_i(t))$ being the reliability of user $i$.
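Under the same one-feature assumption ($d = 1$), (13) becomes $D_i^{\min}(\epsilon) = \mu_k(2) - \sqrt{-2\ln(2\epsilon)\, \mathbf{e}_2^T \boldsymbol{\Sigma}_k \mathbf{e}_2}$, which can be tabulated directly. The Python sketch below does this for hypothetical modes loosely mirroring the legend of Fig. 4 ($\mu_k(d+1) \in \{5, 6\}$ ms, $\boldsymbol{\Sigma}_k \in \{0.1\mathbf{I}, \mathbf{I}\}$); the function name is ours, and for $d > 1$ a numerical chi-square quantile would be needed instead of the closed form.

```python
import math


def effective_delay(mu_beta, sigma_beta, eps):
    """D_i^min(eps) from (13) for d = 1, where Q_2(1 - 2*eps) = -2 ln(2*eps).

    mu_beta = mu_k(d+1) in ms; sigma_beta = e^T Sigma_k e in ms^2.
    """
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must lie in (0, 0.5) so that 1 - 2*eps is a valid probability")
    q = -2.0 * math.log(2.0 * eps)
    return mu_beta - math.sqrt(q * sigma_beta)


# Hypothetical modes loosely mirroring the Fig. 4 legend
eps_grid = [0.05, 0.15, 0.25, 0.35, 0.45]
for mu, var in [(5.0, 0.1), (5.0, 1.0), (6.0, 0.1), (6.0, 1.0)]:
    row = [round(effective_delay(mu, var, e), 3) for e in eps_grid]
    print(f"mu_k(d+1) = {mu} ms, Sigma_k = {var} I: D_min = {row} ms")
```

Each printed row increases with $\epsilon$, and for the same mean the rows with $\boldsymbol{\Sigma}_k = \mathbf{I}$ lie below those with $0.1\mathbf{I}$, matching the two observations drawn from Fig. 4.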
There are three possible cases for $D_i^{\max}$, depending on the effective delay $D_i^{\min}$ of human user $i$:

1) $D_i^{\max} > D_i^{\min}$: In this case, the system will not be reliable even if it satisfies $\Pr(D > D_i^{\max}) < \epsilon$. The reason is that the delay perception of the human user may fall below the maximum delay $D_i^{\max}$, so delays that the system deems tolerable can still be perceived by the user.
2) $D_i^{\max} < D_i^{\min}$: In this case, if the system is able to satisfy $\Pr(D > D_i^{\max}) < \epsilon$, then the system will be reliable, because user $i$ cannot sense delays less than $D_i^{\min}$ and its service delay will not exceed $D_i^{\max}$.
3) $D_i^{\max} = D_i^{\min}$: If this equality holds, the system will be reliable and will also avoid wasting resources: if a given user cannot perceive delays less than $D_i^{\min}$, then it is not effective to allocate more resources to this user.

We define $S$ as the event resulting from cases 2 and 3, and let $E_1$ and $E_2$ denote the events $D < D_i^{\max}$ and $\beta_i(t) > D_i^{\min}$, respectively. For case 1, event $E_1 \cap E_2$ is a subset of event $S$, whereas in case 2, event $S$ is a subset of event $E_1 \cap E_2$. In case 3, event $E_1 \cap E_2$ is the same as event $S$.

[Figure 3: Finding $D_i^{\min}$ using a GMM for two different clusters. Axes: delay perception (ms) and age of human user (years); the plot shows Cluster 1, Cluster 2, their predictive coverages, and $D_i^{\min}$.]

[Figure 4: Relationship between $\epsilon$ and $D_i^{\min}(\epsilon)$ for different values of $\mu_k(d+1)$ and $\boldsymbol{\Sigma}_k$: $\mu_k(d+1) \in \{5, 6\}$ ms with $\boldsymbol{\Sigma}_k = 0.1\mathbf{I}$ or $\boldsymbol{\Sigma}_k = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix. Axes: minimum perceptible delay (ms) and $\epsilon$.]
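The event bookkeeping above, combined with the independence of $D$ and $\beta_i(t)$ on which the derivation of (14)–(15) relies, can be sanity-checked by simulation in case 3 ($D_i^{\max} = D_i^{\min}$). In this hypothetical Python sketch, the delay is exponential (as is the sojourn time of an M/M/1 queue) and $\beta_i(t)$ is Gaussian; these distributions, the function names, and all numbers are assumptions chosen only for illustration. The Monte Carlo estimate of $\Pr(E_1 \cap E_2)$ is compared with the inclusion-exclusion expression $1 - [p_1 + p_2 - p_1 p_2]$.

```python
import math
import random


def joint_prob_closed_form(p_delay, p_perception):
    """Pr(E1 and E2) = 1 - [p1 + p2 - p1 * p2] for independent D and beta_i(t)."""
    return 1.0 - (p_delay + p_perception - p_delay * p_perception)


def joint_prob_monte_carlo(delay_rate, threshold, beta_mean, beta_std, n=100000, seed=0):
    """Estimate Pr(D < D^max, beta > D^min) in case 3, where D^max = D^min = threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        delay = rng.expovariate(delay_rate)      # hypothetical exponential delay
        beta = rng.gauss(beta_mean, beta_std)    # hypothetical Gaussian delay perception
        if delay < threshold and beta > threshold:
            hits += 1
    return hits / n


# Hypothetical numbers (ms): delay ~ Exp(rate 2 per ms), beta ~ N(5, 1), threshold 3 ms
delay_rate, threshold, beta_mean, beta_std = 2.0, 3.0, 5.0, 1.0
p1 = math.exp(-delay_rate * threshold)  # Pr(D > D^max)
p2 = 0.5 * (1.0 + math.erf((threshold - beta_mean) / (beta_std * math.sqrt(2.0))))  # Pr(beta < D^min)
exact = joint_prob_closed_form(p1, p2)
estimate = joint_prob_monte_carlo(delay_rate, threshold, beta_mean, beta_std)
print(exact, estimate, 1.0 - (p1 + p2))
```

The last printed value is the lower bound obtained by dropping the product term, which is exactly the step used to reach (16).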
Since the probability of $E_1 \cap E_2$ can be computed, setting $D_i^{\min}$ equal to $D_i^{\max}$ (case 3) lets us find $\Pr(S)$ as follows:
$$\Pr(E_1 \cap E_2) = 1 - \Pr\big((D > D_i^{\max}) \cup (\beta_i(t) < D_i^{\min})\big) \tag{14}$$
$$= 1 - \Big[\Pr(D > D_i^{\max}) + \Pr(\beta_i(t) < D_i^{\min}) - \Pr(D > D_i^{\max})\Pr(\beta_i(t) < D_i^{\min})\Big]. \tag{15}$$
Equation (14) follows from De Morgan's law. Equation (15) holds since $D$ and $\beta_i(t)$ are independent random variables at each iteration. Suppose $D_i^{\min} = D_i^{\max}$ for user $i$, $\Pr(D > D_i^{\max}) < \epsilon$, and $\Pr(\beta_i(t) < D_i^{\min}) < \epsilon'$ with $\epsilon'$ small. Then
$$1 - \Big[\Pr(D > D_i^{\max}) + \Pr(\beta_i(t) < D_i^{\min})\Big] \geq 1 - (\epsilon + \epsilon'). \tag{16}$$
Since the product term in (15) is nonnegative, it follows that $\Pr(S) = \Pr(E_1 \cap E_2) \geq 1 - (\epsilon + \epsilon')$, where $\Pr(S)$ is the reliability of the system defined in (2).

When designing the system, we therefore treat the reliability as a predetermined target design parameter. Using this parameter, we can set $\epsilon$ and $\epsilon'$. Given $\epsilon'$ and the numerical function $D_i^{\min}(\epsilon')$ derived in (13), $D_i^{\min}$ can be determined. Given $\epsilon(\beta_i(t))$ and $D_i^{\max}(\beta_i(t))$, we can then fully characterize problem (2). As the delay perception of a human user changes throughout the day, $D_i^{\max}(\beta_i(t))$ changes accordingly. Therefore, our solution must account for changes in $D_i^{\max}(\beta_i(t))$ as well as changes in the channel gains $h_{ij}(t)$. In conventional cellular systems, $D_i^{\max}$ is a fixed threshold and $1 - \epsilon$ denotes the reliability. However, in a wireless system that explicitly includes human users in its loop, $D_i^{\max}$ becomes a function of $\beta_i(t)$. It can be determined using the concept of effective delay $D_i^{\min}(\epsilon')$ defined in (9).

III. BRAIN-AWARE RESOURCE MANAGEMENT

The notion of human-in-the-loop implies that human factors (such as brain limitations) will be part of the resource allocation framework, i.e., in the loop of resource allocation.
For such a system, resource management can dynamically adapt to the human user in its loop, as opposed to just the device. Therefore, our approach treats human brain limitations as functions of time and adapts the system to dynamic changes in the brain and its cognitive limitations. Since the state of the human brain usually changes rapidly, our work differs from context-aware or QoS-aware works. The fast fluctuations in the cognitive activities of the human brain have been validated in many works, such as [39]–[41]. These fluctuations require the resource allocation framework to be aware of the time-varying brain-aware delay constraint in each time slot. To solve problem (2), we propose a novel brain-aware resource management framework that accounts for both the time-varying wireless channel and the time-varying brain-aware delay constraint (2b). We first transform this constraint into a mathematically tractable form. Corollary 2 below gives the relation between the packet length distribution and the service time distribution of a packet. Here, the packet service time is defined as the transmission time of a packet from the BS to the UE or MTD.

Corollary 2. If a fixed rate $r_i$ is allocated to a user and the packet lengths follow an exponential distribution with parameter $\chi$, then the service time $s$ is also exponentially distributed, with parameter $\chi r_i$.

Proof: The CDF of the exponential packet length $l$ is $F_l(\psi) = \Pr(l < \psi) = 1 - e^{-\chi\psi}$. Hence,
$$F_s(S) = \Pr(s < S) = \Pr\Big(\frac{l}{r_i} < S\Big) = \Pr(l < r_i S) = F_l(r_i S) = 1 - e^{-\chi r_i S}. \tag{17}$$
This means that the PDF of the service time is $f_s(s) = \chi r_i e^{-\chi r_i s}$. Since the packet length distribution parameter $\chi$ is constant in our analysis, we assume without loss of generality that the service time of each packet is an exponential random variable with parameter $r_i$. This parameter is the same as the rate allocated to the user.
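Corollary 2 can be checked by simulation. The sketch below draws exponential packet lengths, divides them by a fixed rate, and compares the empirical CDF of the resulting service times with $1 - e^{-\chi r_i S}$ from (17). The numbers are illustrative assumptions: a mean packet length of 10 kbits, as in Section IV, so $\chi = 0.1$ per kbit, and $r_i = 1000$ kbit/s. The function names are hypothetical.

```python
import math
import random
import statistics


def sample_service_times(chi: float, rate: float, n: int, seed: int = 0) -> list:
    """Service times s = l / r_i with packet length l ~ Exp(chi) (Corollary 2)."""
    rng = random.Random(seed)
    # random.expovariate takes the rate parameter, i.e., 1 / mean
    return [rng.expovariate(chi) / rate for _ in range(n)]


def service_time_cdf(s: float, chi: float, rate: float) -> float:
    """Analytical CDF from (17): F_s(S) = 1 - exp(-chi * r_i * S)."""
    return 1.0 - math.exp(-chi * rate * s)


def empirical_cdf(samples: list, x: float) -> float:
    """Fraction of samples strictly below x."""
    return sum(1 for v in samples if v < x) / len(samples)


chi, rate = 0.1, 1000.0  # 1/kbit and kbit/s (assumed values)
samples = sample_service_times(chi, rate, 200_000)
print("mean service time:", statistics.mean(samples), "expected:", 1.0 / (chi * rate))
for x in (0.005, 0.01, 0.02):
    print(x, empirical_cdf(samples, x), service_time_cdf(x, chi, rate))
```

The empirical mean should be close to $1/(\chi r_i) = 10$ ms, and the empirical and analytical CDFs should agree to within sampling noise.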
We assume that, for any given user, packets arrive according to a Poisson process with rate $a_i(\tau)$. The service time is exponentially distributed with parameter $r_i(\tau)$ in slot $\tau = 1, \dots, t$. Next, we derive the probability that the delay of a given user $i$ exceeds a threshold $D_i^{\max}$.

Theorem 2. Assume that user $i$ has a time-varying rate $r_i(\tau)$ at time slot $\tau$. Suppose the duration $\delta\tau$ of each time slot is long enough for the queue to reach its steady state, i.e.,
$$\frac{1}{r_i(\tau) - a_i(\tau)} \ll \delta\tau. \tag{18}$$
Then the probability that the delay exceeds a threshold is
$$\Pr(D > D_i^{\max}) = \lim_{t\to\infty} \frac{1}{t}\sum_{\tau=1}^{t} e^{-(r_i(\tau) - a_i(\tau))D_i^{\max}}, \tag{19}$$
under the condition that $r_i(\tau) > a_i(\tau)$ for all $\tau > 0$.

Proof: See Appendix B.

Theorem 2 shows that constraint (2b) is satisfied if the network can satisfy the following condition:
$$\lim_{t\to\infty} \frac{1}{t}\sum_{\tau=1}^{t} e^{-(r_i(\tau) - a_i(\tau))D_i^{\max}} < \epsilon. \tag{20}$$
Fig. 5 compares the theoretical result of Theorem 2 with simulation results. The two match nearly perfectly, with a maximum error of only 0.0146.

Figure 5: Comparison between simulation results and the result of Theorem 2 ($\Pr(D_i > D_i^{\max})$ versus $D_i^{\max}$ (ms)).

A. Optimal Resource Allocation with Guaranteed Reliability

Constraint (20) is a time-average constraint. This makes it amenable to the drift-plus-penalty method of the Lyapunov optimization framework [42], which we use to solve (2). The problem is time-varying, since the conditions and needs of the human brain change over time. The users' processing state $\beta(t)$ is also a function of time, so the latency needs in (2b) will be time-varying as well. Therefore, we must solve optimization problem (2) efficiently in each time slot. We propose a low-complexity algorithm for solving problem (2).
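The quantity constrained in (20) is easy to evaluate for a given rate sequence. The sketch below computes a finite-horizon version of (19). For a single slot, it also cross-checks the exponential tail against a Monte Carlo M/M/1 simulation. The simulation uses Lindley's recursion for the waiting time under first-come-first-served service. This is our assumption for illustration, not a description of the simulator behind Fig. 5. Function names are hypothetical, and rates are in packets per unit time.

```python
import math
import random


def violation_prob_theorem2(rates, arrivals, d_max):
    """Finite-horizon version of (19): mean of exp(-(r_i(tau) - a_i(tau)) D_i^max) over slots."""
    if any(r <= a for r, a in zip(rates, arrivals)):
        raise ValueError("Theorem 2 requires r_i(tau) > a_i(tau) in every slot")
    terms = [math.exp(-(r - a) * d_max) for r, a in zip(rates, arrivals)]
    return sum(terms) / len(terms)


def simulate_mm1_delay_tail(rate, arrival, d_max, n_packets=200_000, seed=1):
    """Monte Carlo estimate of Pr(D > D_max) in one stationary slot (M/M/1, FCFS).

    Lindley's recursion tracks the waiting time W of successive packets; the delay
    of a packet is its waiting time plus its exponential service time.
    """
    rng = random.Random(seed)
    wait, exceed = 0.0, 0
    for _ in range(n_packets):
        service = rng.expovariate(rate)
        if wait + service > d_max:
            exceed += 1
        # W_{n+1} = max(0, W_n + S_n - A_{n+1})
        wait = max(0.0, wait + service - rng.expovariate(arrival))
    return exceed / n_packets


eps = 0.05
rates, arrivals, d_max = [5.0, 4.5, 6.0], [1.0, 1.0, 1.0], 1.0  # illustrative values
p_viol = violation_prob_theorem2(rates, arrivals, d_max)
print("time-averaged violation probability:", p_viol, "below eps:", p_viol < eps)
print("single slot, analysis:", math.exp(-1.0),
      "simulation:", simulate_mm1_delay_tail(2.0, 1.0, 1.0))
```

The finite average only approximates the limit in (19). It is meaningful only when each slot is long enough for (18) to hold.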
The drift-plus-penalty approach stabilizes a queueing network while minimizing the time average of a penalty function. To satisfy constraint (2b) in all time slots, (19) must be smaller than $\epsilon$. For this reason, we use virtual queues to model the time-average constraint (20) in the optimization problem. We define the virtual queue
$$F_i(t+1) = \max\big\{F_i(t) + e^{-(r_i(t) - a_i(t))D_i^{\max}} - \epsilon,\ 0\big\}. \tag{21}$$
From (21), $e^{-(r_i(t) - a_i(t))D_i^{\max}} - \epsilon \leq F_i(t+1) - F_i(t)$. Summing over the time slots, we obtain
$$\sum_{\tau=1}^{t} e^{-(r_i(\tau) - a_i(\tau))D_i^{\max}} - t\epsilon \leq F_i(t) - F_i(0). \tag{22}$$
If $F_i(0)$ is bounded, we have
$$\lim_{t\to\infty} \frac{1}{t}\sum_{\tau=1}^{t} e^{-(r_i(\tau) - a_i(\tau))D_i^{\max}} - \epsilon \leq \lim_{t\to\infty} \frac{F_i(t)}{t}. \tag{23}$$
If the queue $F_i(t)$ is mean-rate stable, that is, $\lim_{t\to\infty} F_i(t)/t = 0$, then
$$\lim_{t\to\infty} \frac{1}{t}\sum_{\tau=1}^{t} e^{-(r_i(\tau) - a_i(\tau))D_i^{\max}} \leq \epsilon. \tag{24}$$
The Lyapunov function is defined over all the virtual queues at the base station as $Y(t) = \frac{1}{2}\sum_{i\in\mathcal{M}\cup\mathcal{H}} F_i(t)^2$. The drift $\Delta_t = Y(t+1) - Y(t)$ can then be bounded by noting that
$$Y(t+1) = \frac{1}{2}\sum_{i\in\mathcal{M}\cup\mathcal{H}} F_i(t+1)^2 \leq \frac{1}{2}\sum_{i\in\mathcal{M}\cup\mathcal{H}} F_i(t)^2 + \frac{1}{2}\sum_{i\in\mathcal{M}\cup\mathcal{H}} y_i^c(t)^2 + \sum_{i\in\mathcal{M}\cup\mathcal{H}} y_i^c(t)F_i(t), \tag{25}$$
where
$$y_i^c(t) = e^{-(r_i(t) - a_i(t))D_i^{\max}} - \epsilon. \tag{26}$$
Thus,
$$\Delta_t \leq \frac{1}{2}\sum_{i\in\mathcal{M}\cup\mathcal{H}} y_i^c(t)^2 + \sum_{i\in\mathcal{M}\cup\mathcal{H}} y_i^c(t)F_i(t). \tag{27}$$
We form the drift-plus-penalty expression by adding $V\sum_{i,j} p_{ij}(t)$ to both sides of inequality (27). Here, $\sum_{i,j} p_{ij}$ is the total power of the BS, which we want to minimize. The parameter $V$ determines how important minimizing the objective function (2a) is compared with satisfying (2b), and thus balances the tradeoff between power and delay. The drift-plus-penalty inequality is
$$\Delta_t + V\sum_{i,j} p_{ij}(t) \leq \frac{1}{2}\sum_{i\in\mathcal{M}\cup\mathcal{H}} y_i^c(t)^2 + V\sum_{i,j} p_{ij}(t) + \sum_{i\in\mathcal{M}\cup\mathcal{H}} y_i^c(t)F_i(t). \tag{28}$$
Since we assumed $r_i(t) > a_i(t)$ for all $t$, we know that $|y_i^c(t)| < 1$ for all $t$ and $i \in \mathcal{M}\cup\mathcal{H}$. Hence, we can rewrite (28) as
$$\Delta_t + V\sum_{i,j} p_{ij}(t) \leq U_B + V\sum_{i,j} p_{ij}(t) + \sum_{i\in\mathcal{M}\cup\mathcal{H}} y_i^c(t)F_i(t), \tag{29}$$
where $U_B = \frac{|\mathcal{H}| + |\mathcal{M}|}{2}$ is an upper bound on $\frac{1}{2}\sum_{i\in\mathcal{M}\cup\mathcal{H}} y_i^c(t)^2$, and $|\mathcal{H}|$ denotes the cardinality of set $\mathcal{H}$. By the drift-plus-penalty algorithm [43], minimizing the right-hand side of (29) makes queue $F_i(t)$ mean-rate stable. The time average of $y_i^c(t)$ is then nonpositive, i.e., (24) holds, so constraint (2b) is also satisfied. Furthermore, minimizing the right-hand side of (29) also minimizes cost function (2a), since (2a) is defined as a penalty function. By minimizing the right-hand side of (29), we convert our optimization problem into the following time-varying problem:
$$\min_{\rho(t),\,P(t)}\ V\sum_{i,j} p_{ij}(t) + \sum_{i\in\mathcal{M}\cup\mathcal{H}} y_i^c(t)F_i(t), \tag{30a}$$
$$\text{s.t.}\quad r_i(t) > a_i(t), \quad \forall i \in \mathcal{H}\cup\mathcal{M}, \tag{30b}$$
$$p_{ij}(t) \geq 0,\ \rho_{ij}(t) \in \{0,1\}, \quad \forall i \in \mathcal{H}\cup\mathcal{M},\ j \in \mathcal{K}, \tag{30c}$$
$$\sum_{i\in\mathcal{H}\cup\mathcal{M}} \rho_{ij}(t) = 1, \quad \forall j \in \mathcal{K}. \tag{30d}$$
The cost function in (30a) is equivalent to (2a) and (2b) in the original optimization problem. Learning the effective delay of each human user with our proposed PDI method determines the parameters $y_i^c(t)$ and $F_i(t)$ in (30a). However, to satisfy (2b), we also need to satisfy (30b), because the queue length grows without bound if this constraint is violated in any time slot. Constraints (30c) and (30d) are feasibility conditions and remain the same as in (2). Hence, solving (30) in each time slot solves the original problem (2). Nonetheless, problem (30) is not a convex optimization problem. It is a mixed-integer problem whose complexity increases exponentially with the number of users.
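The virtual-queue mechanism of (21)–(30a) can be sketched as follows. The sketch uses hypothetical function names and illustrative parameters. It updates $F_i(t)$ from the per-slot violation term $y_i^c(t)$ in (26), and it evaluates both the per-slot drift-plus-penalty objective (30a) and the constant $U_B$ in (29). It does not solve (30), which is the subject of the rest of this section.

```python
import math


def violation_term(rate: float, arrival: float, d_max: float, eps: float) -> float:
    """y_i^c(t) in (26): exp(-(r_i - a_i) * D_i^max) - eps."""
    return math.exp(-(rate - arrival) * d_max) - eps


def update_virtual_queue(f: float, rate: float, arrival: float, d_max: float, eps: float) -> float:
    """Virtual queue update (21): F_i(t+1) = max{F_i(t) + y_i^c(t), 0}."""
    return max(f + violation_term(rate, arrival, d_max, eps), 0.0)


def dpp_objective(powers, rates, arrivals, queues, d_max, eps, v):
    """Per-slot objective (30a): V * sum_ij p_ij(t) + sum_i y_i^c(t) F_i(t).

    `powers` is a flat list of all p_ij(t); the other lists are indexed by user i.
    """
    penalty = v * sum(powers)
    weighted = sum(
        violation_term(r, a, d, eps) * f
        for r, a, d, f in zip(rates, arrivals, d_max, queues)
    )
    return penalty + weighted


def drift_constant(num_users: int) -> float:
    """U_B in (29): |y_i^c| < 1 for each user, so (1/2) sum_i (y_i^c)^2 < (|H| + |M|) / 2."""
    return num_users / 2.0


# A user served too slowly (y > 0) accumulates backlog; a fast one does not.
slow, fast = 0.0, 0.0
for _ in range(100):
    slow = update_virtual_queue(slow, 2.0, 1.0, 1.0, 0.05)  # y = e^-1 - 0.05 > 0
    fast = update_virtual_queue(fast, 5.0, 1.0, 1.0, 0.05)  # y = e^-4 - 0.05 < 0
print(slow, fast)
```

The slow user's backlog grows linearly, so $F_i(t)/t$ does not vanish and its queue is not mean-rate stable. In (30a), this growing $F_i(t)$ raises the weight of the user's violation term, which pushes later allocations toward a higher rate for that user.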
Since (30) must be solved in each time slot, this exponential complexity makes a direct implementation infeasible. Consequently, we use a dual decomposition method to break optimization problem (30) into smaller subproblems and find its optimal solution with low complexity. However, solving (30) directly by dual decomposition is challenging, because the structure of $y_i^c(t)$ prevents decomposing the objective function over RBs. To overcome this challenge, we convert (30) into a decomposable form and then show that the converted problem is equivalent to (30). For this purpose, we write the Lagrangian of problem (30) as
$$V\sum_{i,j} p_{ij}(t) + \sum_{i\in\mathcal{M}\cup\mathcal{H}} y_i^c(t)F_i(t) + \sum_{i\in\mathcal{M}\cup\mathcal{H}} \lambda_i\big(a_i(t) - r_i(t)\big), \tag{31}$$
where $\lambda_i$ is the Lagrange multiplier. Recall that $y_i^c(t) = e^{-(r_i(t) - a_i(t))D_i^{\max}} - \epsilon$. The only decision variables are the allocation of resource blocks to the users and the power allocated to each RB. Although $F_i(t)$ is a function of $y_i^c(t)$, it is not a decision variable in slot $t$ and is treated as a constant. Hence, after dropping the constant term $-\epsilon F_i(t)$, we can rewrite (31) as
$$V\sum_{i,j} p_{ij}(t) + \sum_{i\in\mathcal{M}\cup\mathcal{H}} e^{-(r_i(t) - a_i(t))D_i^{\max}}F_i(t) + \sum_{i\in\mathcal{M}\cup\mathcal{H}} \lambda_i\big(a_i(t) - r_i(t)\big). \tag{32}$$
The main optimization problem consists of two components. The first is minimizing the total power of the BS with weight $V$. The second is minimizing the summation $\sum_{i\in\mathcal{M}\cup\mathcal{H}} e^{-r_i(t)D_i^{\max}}$, in which the term of each user $i$ has weight $F_i(t)e^{a_i(t)D_i^{\max}}$. As we can see, (32) is not decomposable over RBs. We therefore approximate (30) and then propose an algorithm that solves this approximation efficiently. In this C-additive approximation, we replace the term $\sum_{i\in\mathcal{M}\cup\mathcal{H}} e^{-(r_i(t) - a_i(t))D_i^{\max}}F_i(t)$ in (32) with the linear approximation $e^{-x} \approx 1 - x$ of the exponential around $x = 0$. The constant part does not depend on the decision variables and is dropped, which leaves
$$\sum_{i\in\mathcal{M}\cup\mathcal{H}} -\big(r_i(t) - a_i(t)\big)D_i^{\max}F_i(t). \tag{33}$$
In the original problem, if $y_i^c(t)$ starts to become greater than zero for user $i$, then $F_i(t)$ increases and gives more weight to the term $e^{-(r_i(t) - a_i(t))D_i^{\max}}$. The algorithm then allocates more resources to user $i$ so as to minimize $e^{-(r_i(t) - a_i(t))D_i^{\max}}$ for that user, and $y_i^c(t)$ decreases accordingly. Hence, $F_i(t)e^{-(r_i(t) - a_i(t))D_i^{\max}}$ acts as feedback in the system. As we can see from (33), the approximation preserves this feedback mechanism. Therefore, we can write
$$\min_{P,\rho}\Big(V\sum_{i,j} p_{ij}(t) + \sum_{i\in\mathcal{M}\cup\mathcal{H}} -\big(r_i(t) - a_i(t)\big)D_i^{\max}F_i(t)\Big) < C + \min_{P,\rho}\Big(V\sum_{i,j} p_{ij}(t) + \sum_{i\in\mathcal{M}\cup\mathcal{H}} e^{-(r_i(t) - a_i(t))D_i^{\max}}F_i(t)\Big). \tag{34}$$
Using this C-additive approximation, it can be shown that all virtual queues are mean-rate stable. Hence, (2b) in the original problem is satisfied [42]. Finally, problem (2) can be presented as
$$\min_{\rho(t),\,P(t)}\ V\sum_{i,j} p_{ij}(t) - \sum_{i\in\mathcal{M}\cup\mathcal{H}} \big(r_i(t) - a_i(t)\big)D_i^{\max}F_i(t),$$
$$\text{s.t.}\quad r_i(t) > a_i(t), \tag{35a}$$
$$p_{ij}(t) \geq 0, \quad \forall i \in \mathcal{H}\cup\mathcal{M},\ j \in \mathcal{K}, \tag{35b}$$
$$\rho_{ij}(t) \in \{0,1\}, \quad \forall i \in \mathcal{H}\cup\mathcal{M},\ j \in \mathcal{K}, \tag{35c}$$
$$\sum_{i\in\mathcal{H}\cup\mathcal{M}} \rho_{ij}(t) = 1, \quad \forall j \in \mathcal{K}. \tag{35d}$$
To solve this problem, we decompose it into $K$ subproblems, one per RB. Since these subproblems are coupled through the rate constraint (35a), we use the dual decomposition method to solve (35) [44]. First, we write the Lagrangian of problem (35) and decompose it over RBs. Next, we find the resource block allocation and the power of each RB in terms of the Lagrange multiplier vector $\lambda$. Finally, we compute $\lambda$ using an ellipsoid method.
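To see why the linearization in (33) preserves the feedback role of $F_i(t)$, compare how the allocated rate affects each user's term in (32) and in (33). At the operating point $r_i(t) = a_i(t)$ where the expansion is taken:

$$
\frac{\partial}{\partial r_i}\Big[F_i(t)\,e^{-(r_i - a_i)D_i^{\max}}\Big]\bigg|_{r_i = a_i}
= -D_i^{\max}F_i(t)
= \frac{\partial}{\partial r_i}\Big[-\big(r_i - a_i\big)D_i^{\max}F_i(t)\Big].
$$

In both forms, the reward for increasing $r_i(t)$ scales linearly with $F_i(t)$, so a user whose virtual queue grows receives more weight in the next slot. Away from this point, the two slopes differ. The exponential slope is $-D_i^{\max}F_i(t)e^{-(r_i - a_i)D_i^{\max}}$, while the linear one stays at $-D_i^{\max}F_i(t)$.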
The Lagrangian of problem (35) is
$$L_a(P,\rho,\lambda) = V\sum_{i,j} p_{i,j}(t) + \sum_{i\in\mathcal{M}\cup\mathcal{H}} \Big[-\big(r_i(t) - a_i(t)\big)D_i^{\max}F_i(t) - \lambda_i\big(r_i(t) - a_i(t)\big)\Big] = V\sum_{i,j} p_{i,j}(t) - \sum_{i\in\mathcal{M}\cup\mathcal{H}} \big(\lambda_i + D_i^{\max}F_i(t)\big)\big(r_i(t) - a_i(t)\big). \tag{36}$$
One major difference from conventional power minimization problems is that an additional term $D_i^{\max}F_i(t)$ is added to the Lagrange multiplier (the shadow price). In this problem, $D_i^{\max}F_i(t)$ acts as a bias term. We therefore define a new hypothetical Lagrange multiplier $\lambda_i' = \lambda_i + D_i^{\max}F_i(t)$. This means that adding constraint (2b) to the problem, rather than only constraint (35a), increases the shadow price by $D_i^{\max}F_i(t)$. Increasing the shadow price of a constraint makes it looser. As a result, in many time slots, constraint (35a) will not be tight, and the Lagrange multiplier will be set to $\lambda_i = \big(\lambda_i' - D_i^{\max}F_i(t)\big)^+$.

Algorithm 3 Resource allocation algorithm
1: Obtain $D_i^{\max}(t)$ and $\epsilon_i(t)$, $\forall i \in \mathcal{H}\cup\mathcal{M}$, using the PDI algorithm (Algorithm 2).
2: Find $F_i(t)$, $\forall i \in \mathcal{H}\cup\mathcal{M}$, using (21).
3: Initialize $\lambda$.
4: while the convergence condition is not satisfied do
5:   Find $p_{ij}$, $\forall i \in \mathcal{H}\cup\mathcal{M}$, $j \in \mathcal{K}$, using the updated $\lambda$ in (40).
6:   For each RB $j \in \mathcal{K}$, find $k(j)$ by searching over all users $i \in \mathcal{H}\cup\mathcal{M}$ using (41), and then assign $\rho_{ij}^*$ and $p_{ij}^*$ for all $i \in \mathcal{H}\cup\mathcal{M}$, $j \in \mathcal{K}$.
7:   Use the ellipsoid method to find $\lambda$.
8: end while

The Lagrange dual function is
$$g(\lambda) = \min_{\rho(t),\,P(t)} L_a(P,\rho,\lambda). \tag{37}$$
The minimization problem (37) can be decomposed into $K$ subproblems. The subproblem of RB $j$ is
$$g_j'(\lambda) = \min_{P(t)\in\mathcal{D}} \sum_{i} \Big[V p_{i,j} - \big(\lambda_i + D_i^{\max}F_i(t)\big) W\log_2\Big(1 + \frac{K h_{i,j}p_{i,j}}{\sigma^2}\Big)\Big], \tag{38}$$
where $\mathcal{D}$ is the set of feasible $p_{ij}$'s in which only one user $i$ has $p_{ij} \neq 0$ on RB $j$.
Hence, $g(\lambda)$ is
$$g(\lambda) = \sum_{j} g_j'(\lambda) + \sum_{i\in\mathcal{M}\cup\mathcal{H}} \big(\lambda_i + D_i^{\max}F_i(t)\big)a_i(t). \tag{39}$$
For fixed $\lambda$, the objective in (38) is a convex function of $P$. Therefore, we find $P$ by setting the derivative with respect to $p_{ij}$ to zero, which gives
$$p_{ij} = \Big[\frac{\big(\lambda_i + D_i^{\max}F_i(t)\big)W}{V\ln 2} - \frac{\sigma^2}{K h_{ij}}\Big]^+. \tag{40}$$
RB $j$ is optimally allocated to user $k(j)$, given by
$$k(j) = \arg\min_{i}\Big[V p_{i,j} - \big(\lambda_i + D_i^{\max}F_i(t)\big) W\log_2\Big(1 + \frac{K h_{i,j}p_{i,j}}{\sigma^2}\Big)\Big], \tag{41}$$
$$g_j'(\lambda) = \min_{i}\Big[V p_{i,j} - \big(\lambda_i + D_i^{\max}F_i(t)\big) W\log_2\Big(1 + \frac{K h_{i,j}p_{i,j}}{\sigma^2}\Big)\Big]. \tag{42}$$
Thus, $\rho_{ij}^*$ and $p_{ij}^*$ are given by
$$\rho_{ij}^* = \begin{cases} 1, & i = k(j), \\ 0, & \text{otherwise}, \end{cases} \qquad p_{ij}^* = \begin{cases} p_{ij}, & i = k(j), \\ 0, & \text{otherwise}. \end{cases} \tag{43}$$
Hence, the optimal rate becomes $r_i^* = \sum_j W\log_2\big(1 + \frac{K h_{i,j}p_{i,j}^*}{\sigma^2}\big)$. The only parameter that affects this joint RB and power allocation is $\lambda$. As the number of RBs increases, the duality gap of this problem approaches zero [44]. The optimal value is found by maximizing $g(\lambda)$ with respect to $\lambda$. To find $\lambda$, we use the ellipsoid method [45], which requires a subgradient of the dual objective $g(\lambda)$. The following theorem shows that this subgradient is a vector with elements $\zeta_i = a_i - r_i^*$ when $a_i \geq r_i^*$.

Theorem 3. The subgradient of the dual optimization problem with the dual objective defined in (39) is the vector $d$ whose elements $\zeta_i$, $\forall i \in \mathcal{H}\cup\mathcal{M}$, are given by
$$\zeta_i = \begin{cases} a_i - r_i^*, & a_i \geq r_i^*, \\ 0, & a_i < r_i^*. \end{cases} \tag{44}$$

Proof: Since
$$g(\lambda) = \min_{P,\rho} L_a(P,\rho,\lambda) = L_a(P^*,\rho^*,\lambda), \tag{45}$$
we have, for any $\delta$,
$$g(\delta) \leq L_a(P^*,\rho^*,\delta) = V\sum_{i,j} p_{i,j}^*(t) - \sum_{i\in\mathcal{M}\cup\mathcal{H}} \big(\delta_i + D_i^{\max}F_i(t)\big)\big(r_i^*(t) - a_i(t)\big)$$
$$= V\sum_{i,j} p_{i,j}^*(t) - \sum_{i\in\mathcal{M}\cup\mathcal{H}} \Big[\big(\lambda_i + D_i^{\max}F_i(t)\big)\big(r_i^*(t) - a_i(t)\big) - (\lambda_i - \delta_i)\big(r_i^*(t) - a_i(t)\big)\Big]$$
$$= g(\lambda) + (\lambda - \delta)^T\zeta', \tag{46}$$
where $\zeta' = [r_1^* - a_1, \dots, r_{N+M}^* - a_{N+M}]^T$. Equivalently, $g(\delta) \leq g(\lambda) + (\delta - \lambda)^T(-\zeta')$, so $-\zeta'$ is a subgradient of the concave dual function $g$ at $\lambda$. However, because of the term $D_i^{\max}F_i(t)$, a step along this direction leaves the feasible set when $\lambda_i = 0$ and $a_i < r_i^*$. Using the projected subgradient method [46], we can transform this infeasible direction into a feasible one. The projected subgradient update rule is $\lambda^{(k+1)} = \Pi\big(\lambda^{(k)} - \alpha_k\zeta_k'\big)$, where $\alpha_k$ is the step size and $\Pi$ is the Euclidean projection onto the feasible set. The feasible set is $\lambda_i \geq 0$. As discussed above, $\lambda_i^{(k)} = 0$ for every user with $a_i < r_i^*$. We can then see that
$$\Pi\big(\lambda^{(k)} - \alpha_k\zeta_k'\big) = \lambda^{(k)} + \alpha_k\zeta_k, \tag{47}$$
where
$$\zeta_i = \begin{cases} -\zeta_i', & \zeta_i' \leq 0, \\ 0, & \zeta_i' > 0, \end{cases} = \begin{cases} a_i - r_i^*, & a_i \geq r_i^*, \\ 0, & a_i < r_i^*. \end{cases} \tag{48}$$
Algorithm 3 summarizes our proposed resource allocation algorithm.

B. Complexity Analysis

Next, we find the complexity of our algorithm, which must be run in each iteration. There are $K$ RBs in our problem, and for each of them, (41) must be evaluated for $M + N$ users. Hence, solving the primal problem takes $O\big((M+N)K\big)$ operations. The dual problem is then solved, which gives the optimal value of $\lambda$ in an $(M+N)$-dimensional space with a complexity of $O\big((M+N)^2\big)$. Therefore, the overall complexity would be $O\big((M+N)^3K\big)$. However, as mentioned before, adding $D_i^{\max}F_i(t)$ to the Lagrange multiplier sets most of its components to zero. As a result, the order of complexity decreases to $O\big((M+N)K\big)$.
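Steps 5–7 of Algorithm 3 can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function names are hypothetical, and all channel gains are assumed strictly positive. The bandwidth $W$, the gain constant $K$, the noise power $\sigma^2$, and $V$ are plain floats in consistent units. The ellipsoid method of step 7 is replaced by the simpler projected subgradient update $\lambda \leftarrow \max\{\lambda - \alpha(r^* - a), 0\}$ from the proof of Theorem 3, run for a fixed number of iterations.

```python
import math


def allocate_slot(h, lam, d_max, queues, v, w, k_gain, noise):
    """Primal step for fixed multipliers: power (40), RB choice (41), assignment (43).

    h[i][j] is the (strictly positive) gain of user i on RB j.
    Returns (rho, p, rates).
    """
    n_users, n_rbs = len(h), len(h[0])
    lam_eff = [lam[i] + d_max[i] * queues[i] for i in range(n_users)]  # lambda'_i
    rho = [[0] * n_rbs for _ in range(n_users)]
    p = [[0.0] * n_rbs for _ in range(n_users)]
    for j in range(n_rbs):
        best_i, best_val, best_p = 0, math.inf, 0.0
        for i in range(n_users):
            # Eq. (40): water-filling-type power of user i on RB j
            pij = max(lam_eff[i] * w / (v * math.log(2)) - noise / (k_gain * h[i][j]), 0.0)
            # Per-RB Lagrangian term compared in (41)
            val = v * pij - lam_eff[i] * w * math.log2(1.0 + k_gain * h[i][j] * pij / noise)
            if val < best_val:
                best_i, best_val, best_p = i, val, pij
        rho[best_i][j] = 1  # Eq. (43): exactly one user per RB, satisfying (35d)
        p[best_i][j] = best_p
    rates = [
        sum(w * math.log2(1.0 + k_gain * h[i][j] * p[i][j] / noise) for j in range(n_rbs))
        for i in range(n_users)
    ]
    return rho, p, rates


def update_multipliers(lam, arrivals, rates, step):
    """Projected subgradient step: lambda_i <- max(lambda_i - step * (r_i* - a_i), 0)."""
    return [max(l - step * (r - a), 0.0) for l, r, a in zip(lam, rates, arrivals)]


def run_dual_loop(h, arrivals, d_max, queues, v, w, k_gain, noise, step=0.05, iters=200):
    """Alternate primal allocation and dual updates for a fixed number of iterations."""
    lam = [0.0] * len(h)
    for _ in range(iters):
        rho, p, rates = allocate_slot(h, lam, d_max, queues, v, w, k_gain, noise)
        lam = update_multipliers(lam, arrivals, rates, step)
    return rho, p, rates, lam


# Two users, two RBs, each user strong on a different RB (illustrative numbers)
h = [[1.0, 0.1], [0.1, 1.0]]
rho, p, rates, lam = run_dual_loop(h, arrivals=[0.5, 0.5], d_max=[1.0, 1.0],
                                   queues=[1.0, 1.0], v=1.0, w=1.0, k_gain=1.0, noise=0.1)
print(rho, rates, lam)
```

In this toy case, both users receive rates above their arrival rates, so the multipliers stay at zero. The bias $D_i^{\max}F_i(t)$ alone drives the allocation, which mirrors the observation above that most components of $\lambda$ are zero. Each primal step scans $K$ RBs for $M+N$ users, matching the $O((M+N)K)$ per-iteration cost.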
Given its low complexity, the proposed algorithm can in practice be run periodically by the network at each time slot $t$. This allows it to adapt to dynamic, time-varying changes in both the delay perceptions of the human users and the wireless channel.

IV. SIMULATION RESULTS AND ANALYSIS

For our simulations, we use the dataset in [33] to model the delay perception of a human user. In [33], the authors conducted human subject studies with 30 human users. Each subject was asked to rate the quality of 5 movies while the delay and packet loss in the system were increased. We used the average score of each human user to estimate their delay perception. In [33], the delay perception of a test subject is taken to be the highest delay that the subject cannot sense, which matches our definition of delay perception. We also used a variation of the bootstrap method [38] to increase the number of data points to 1000. Fig. 7 shows the histogram of the delay perception for these 1000 data points. To the best of our knowledge, no existing dataset includes both per-user features and human delay perception. Hence, we attribute three continuous features to each user. The process of adding features starts by clustering the delay perceptions $\beta_i(t)$ of the 1000 users. Then, for each cluster, we choose a random mean vector and a random positive semidefinite covariance matrix. We use them to generate multivariate Gaussian features for each data point in the cluster. Consequently, the random features have 1) a GMM structure and 2) predictive ability for $\beta_i(t)$. Each user is thus associated with a vector $w \in \mathbb{R}^4$ (three features plus the delay perception). We consider a network with a bandwidth of 10 MHz, $a_i(t) = 1$ Mbps, $\sigma^2 = -173.9$ dBm, and $\epsilon = 0.05$. We use a circular cell with a radius of 1.5 km.
We set the path loss exponent to 3 (urban area) and the carrier frequency to 900 MHz. The packet length is an exponential random variable with an average size of 10 kbits. Unless otherwise mentioned, we use 5 MTDs and 5 UEs in the system and set $D_i^{\max}$ to 20 ms for them. For the brain-aware users, we arbitrarily select 5 UEs from all data points. The brain-unaware case is QoS-aware and power-aware. In this case, $D_i^{\max}$ and $\epsilon$ are not functions of $\beta_i(t)$.

Figure 6: Within-cluster point scatter for the EM clustering method on the dataset (within-cluster point scatter, scale $10^8$, versus number of clusters from 1 to 10).

Figure 7: Distribution of $\beta_i(t)$ for the 1000 users in the dataset (number of users versus $\beta_i$).

Fig. 6 shows the within-cluster point scatter of the EM algorithm on our dataset. The within-cluster point scatter of a clustering $C$ is defined as [38]
$$W(C) = \frac{1}{2}\sum_{k=1}^{n}\sum_{c(i)=k}\sum_{c(i')=k} d(x_i, x_{i'}),$$
where $d$ is an arbitrary distance metric. In essence, the within-cluster point scatter is a loss function that allows the hyperparameters of the clustering algorithm to be determined. The hyperparameter we seek here is the number of clusters in the dataset. As Fig. 6 shows, once the number of clusters reaches 5, adding more clusters does not substantially decrease the within-cluster point scatter. Hence, the optimal number of clusters is 5. This model selection method, known as the elbow method, allows the algorithm to avoid overfitting.

Fig. 8 shows the total BS power of the proposed brain-aware case and of a brain-unaware case. In the brain-unaware case, UEs have a fixed constraint (2b) with $D_i^{\max}$ between 10 ms and 60 ms. Here, the total power is the objective of the main optimization problem (2). As Fig. 8 shows, the total power decreases as the latency increases, because constraint (2b) is easier to satisfy at higher latencies. Also, at higher delays, being brain-aware no longer yields substantial gains. This is because $\beta_i(t)$ and $D_i^{\max}$ become close to each other, so learning $\beta_i(t)$ cannot save resources for the system. In contrast, for stringent low-latency requirements, the proposed brain-aware approach yields significant power savings. In particular, for a 10 ms delay in (2b), the BS in the brain-unaware approach uses 44% more power than in the brain-aware case.

Figure 8: Average power usage of the system as a function of different latency requirements of the users (average power (W) versus maximum tolerable delay $D_i^{\max}$ (ms)).

Figure 9: Average power usage of the system for different numbers of MTDs and 5 UEs with $D_i^{\max} = 20$ ms.

These results stem from the fact that a brain-aware approach can minimize the waste of resources and serve users more precisely, based on their actual brain processing power. Fig. 9 shows the average BS power for different numbers of MTDs. As Fig. 9 shows, the brain-aware approach always outperforms the brain-unaware approach as the number of MTDs increases. With 30 MTDs, the BS in the brain-unaware approach uses 16% more power than in the brain-aware case. This is because the brain-aware approach allocates resources more efficiently when resources are scarce. In Fig. 10, we show the average power usage of the system as the number of UEs increases from 2 to 30, with $D_i^{\max}$ set to 20 ms. As the number of users increases, the average power consumption of the system also increases.
This is because increasing the number of users decreases the bandwidth per user. Since the delay and rate requirements of each user remain unchanged, the system must use more power to compensate for the bandwidth deficiency. From Fig. 10, in the case of 30 users, the brain-aware system saves 6.7 dB (78%) of BS power on average. The brain-aware system allocates resources based on each user's actual requirement instead of predefined metrics, which leads to this significant saving in the power consumption of the BS. In Fig. 11, we show the average power consumed in the system for different numbers of virtual reality (VR) users. For the VR simulations, we assume an arrival rate of 25.31 Mbps for each user [47] and use bandwidths of 20 MHz and 40 MHz. The system saves up to 40% and 15% of power compared with the brain-unaware scenario for 20 MHz and 40 MHz bandwidth, respectively.

Figure 10: Average power usage of the system for different numbers of UEs with $D_i^{\max} = 20$ ms (average power in dBm).

Figure 11: The effect of the number of VR users, each with a rate of 25.31 Mbps and $D_i^{\max} = 20$ ms, on the power usage of the system. The plot shows average power (dBm) versus number of VR users for brain-aware and brain-unaware allocation with BW = 20 MHz and 40 MHz.

Figure 12: Transmit power for 4 different users, two of whose delay perceptions are learned. The low and high delay perception users have delay perceptions of 26.8 ms and 133.73 ms, respectively.

Figure 13: Transmission rate for four different users, two of whose delay perceptions are learned. The low and high delay perception users have delay perceptions of 26.8 ms and 133.73 ms, respectively.

Fig. 11 also shows that the proposed approach allocates resources more efficiently when resources are scarce, i.e., in the 20 MHz case. Moreover, increasing the bandwidth decreases the total power usage in the system, which is an inherent feature of communication systems. In Fig. 12, Fig. 13, and Fig. 14, we consider the case of 7 UEs and 5 MTDs. Two UEs are chosen as brain-aware users, and their delay perception is learned by the PDI method. One of these UEs has a delay perception of $\beta_i(t) = 133.73$ ms, and the other has $\beta_i(t) = 26.8$ ms. The system does not learn the delay perception of the 5 remaining UEs. Instead, it allocates resources to them using a predefined delay requirement (brain-unaware users). As Fig. 12 shows, the two brain-aware users consume less power than the brain-unaware users. Moreover, a user with a higher delay perception consumes less power than a user with a lower delay perception. This shows that the system successfully allocates resources according to the delay perception of the users. Furthermore, the power consumption differs among users with predetermined delay requirements, because their channel gains differ. However, as we will see later, the system is robust to such differences and can guarantee the reliability and rate requirements of users with different channel gains. In Fig. 13, we show the transmission rate for four different users. The rate of the brain-unaware users with predetermined delay converges to 2.5 Mbps, which ensures the reliability of these users. However, the rate of the users with learned delay perception converges to a smaller value.
This is because these users' actual requirements are known to the system, which uses this knowledge to avoid wasting resources unnecessarily. However, as we will see next, this rate reduction does not change the reliability of these users. Fig. 14 shows the reliability of the four aforementioned users. The reliability of all users converges to 95%, which is their target reliability value. Hence, the system ensures reliability both for users with identified delay perceptions and for users with predefined delay requirements. Moreover, as Fig. 12 shows, the system uses 45% less power for the users whose delay perception is learned. Finally, Fig. 15 investigates the effect of parameter $V$ for the system with 5 MTDs and 5 UEs. As $V$ increases from 1 to 1.9, the convergence time decreases from 40 iterations to 15 iterations. Nevertheless, increasing $V$ further makes the algorithm unstable: at $V = 2.2$, the overshoot is 11% higher than the final value. Hence, when adjusted correctly, parameter $V$ balances the stability and convergence rate of our algorithm.

Figure 14: Reliability of 4 different users, two of whose delay perceptions are learned (reliability versus time slot).

Figure 15: Effect of the balancing parameter $V$ in (30a) on the convergence of the resource allocation algorithm (total power (W) versus time slot for $V = 1, 1.3, 1.6, 1.9, 2.2$).

V. CONCLUSION

In this paper, we have introduced and formulated the notion of the delay perception of the human brain in wireless networks with humans-in-the-loop. Using this notion, we have defined the concept of the effective delay of the human brain.
To quantify this effective delay, we have developed a learning method, named PDI, which consists of an unsupervised and a supervised learning part. We have then shown that PDI can predict the effective delay of the human users and find the reliability of this prediction. Then, we have derived a closed-form relationship between the reliability measure and wireless physical layer metrics. Using this relationship and the PDI method, we have proposed a novel Lyapunov optimization approach for allocating radio resources to human users. This approach accounts for the reliability of both machine type devices and human users. Our results have shown that the proposed brain-aware approach can save a significant amount of power in the system, particularly for low-latency applications and congested networks. To our best knowledge, this is the first study on the effect of human brain limitations in wireless network design. This paper only scratches the surface of an emerging research area that admits several future extensions. On the one hand, the studied framework can be extended to accommodate brain-related features beyond the mode of the brain, such as perceptual memory and consistency constraints. On the other hand, recurrent neural network models can be developed to capture how the sequence of brain modes changes dynamically. Finally, another important direction is to conduct real-world experiments with actual users, gathering empirical data on brain behavior to refine the developed solution.

APPENDIX A
PROOF OF THEOREM 1

We assume that a single brain mode is dominant for each user at each time, and we index this mode as $k$. For each user $i$ with this dominant mode, $w_i = [w_1, \dots, w_{d+1}]$ has the following probability density function:
$$p(w_i) = |2\pi\Sigma_k|^{-\frac{1}{2}}\exp\Big[-\frac{1}{2}(w_i - \mu_k)^T\Sigma_k^{-1}(w_i - \mu_k)\Big]. \tag{49}$$
We want to find the smallest region $\mathcal{D}$ in $\mathbb{R}^{d+1}$ in which the delay perception lies with probability $\gamma$, i.e.,
$$\int\cdots\int_{\mathcal{D}} p(w_1, w_2, \dots, w_{d+1})\,dw_1\cdots dw_{d+1} = \gamma. \tag{50}$$
$\mathcal{D}$ is not a unique region, but the objective is to find the smallest such region. To this end, we need the region where $p(w_1, \dots, w_{d+1})$ takes its greatest values. Specifically, suppose
$$\int\cdots\int_{\mathcal{D}_1} p(w_1, \dots, w_{d+1})\,dw_1\cdots dw_{d+1} = \int\cdots\int_{\mathcal{D}_2} p(y_1, \dots, y_{d+1})\,dy_1\cdots dy_{d+1}, \tag{51}$$
and also
$$p(y_1, \dots, y_{d+1}) \leq p(w_1, \dots, w_{d+1}), \quad \forall y \in \mathcal{D}_2,\ \forall w \in \mathcal{D}_1. \tag{52}$$
Then
$$\int\cdots\int_{\mathcal{D}_1} dw_1\cdots dw_{d+1} \leq \int\cdots\int_{\mathcal{D}_2} dy_1\cdots dy_{d+1}, \tag{53}$$
which implies that the volume of region $\mathcal{D}_1$ is no larger than the volume of $\mathcal{D}_2$. Hence, suppose we find a region $\mathcal{D}$ for which (50) holds and, using (52), show that every other region satisfying (50) has a greater volume. Then $\mathcal{D}$ is the smallest region in which the human behavior stays with probability $\gamma$. Since $w_i$ follows a multivariate Gaussian distribution, we can find the region where it has the highest probability density, i.e., $\{w_i \mid p(w_i) > C_1\}$. This region can be written as
$$\Big\{w_i \ \Big|\ |2\pi\Sigma_k|^{-\frac{1}{2}}\exp\Big[-\frac{1}{2}(w_i - \mu_k)^T\Sigma_k^{-1}(w_i - \mu_k)\Big] > C_1\Big\}, \tag{54}$$
which is equivalent to
$$\mathcal{D} = \big\{w_i \mid (w_i - \mu_k)^T\Sigma_k^{-1}(w_i - \mu_k) < C_2\big\}, \tag{55}$$
where $C_2$ is a positive constant equal to $-2\ln\big(|2\pi\Sigma_k|^{\frac{1}{2}}C_1\big)$. Since $\Sigma_k$ is a positive definite matrix, (55) is the interior of an ellipsoid in a $(d+1)$-dimensional space. We now claim that this ellipsoid $\mathcal{D}$ is the smallest region in which the delay perception lies with probability $\gamma$, i.e., in which $w_i$ lies with probability $\gamma$. We prove this by contradiction. Suppose that there exists another region $\mathcal{E}$ that is smaller than $\mathcal{D}$ and in which $w_i$ lies with probability $\gamma$.
We can partition $E$ into two parts, $A = E \cap D$ and $E_2 = E \cap D'$, where $D'$ is the complement of the set $D$. We also define $D_2 = D \cap E'$. We know that

$$\int_D p(w_i)\, dw_i = \int_A p(w_i)\, dw_i + \int_{D_2} p(w_i)\, dw_i \tag{56}$$
$$= \int_E p(w_i)\, dw_i = \int_A p(w_i)\, dw_i + \int_{E_2} p(w_i)\, dw_i = \gamma. \tag{57}$$

Hence, $\int_{D_2} p(w_i)\, dw_i = \int_{E_2} p(w_i)\, dw_i$. Since

$$p(w_i) \le C_1 < p(y), \quad \forall w_i \in E_2,\ \forall y \in D_2, \tag{58}$$

applying the argument of (51)–(53) to the regions $D_2$ (higher density) and $E_2$ (lower density), we have $\int_{D_2} dw_i \le \int_{E_2} dw_i$. This means that the set $E$ has a volume at least as large as that of $D$, which contradicts our assumption. This proves that region $D$ is the smallest region in $\mathbb{R}^{d+1}$ that has probability $\gamma$.

Next, we find the relation between $C_2$ and $\gamma$. Since $(w_i-\mu_k)^T\Sigma_k^{-1}(w_i-\mu_k)$ follows a chi-square distribution with $d+1$ degrees of freedom, $\gamma = \int_D p(w_i)\, dw_i$ can be calculated using the chi-square distribution [48]. The region $D$ can then be written as

$$D = \left\{ w_i \,\middle|\, (w_i-\mu_k)^T\Sigma_k^{-1}(w_i-\mu_k) \le Q_{d+1}(\gamma) \right\}, \tag{59}$$

where $Q_{d+1}(\gamma)$ is the quantile function of the chi-square distribution with $d+1$ degrees of freedom, defined as $Q_{d+1}(\gamma) = \inf\left\{x \in \mathbb{R} \,\middle|\, \gamma \le \int_0^x \chi^2_{d+1}(u)\, du\right\}$, with $\chi^2_{d+1}(\cdot)$ denoting the chi-square probability density function with $d+1$ degrees of freedom.

Having defined the confidence region based on $\gamma$, we now find the edges of this ellipsoid. We know that the center of this ellipsoid is $\mu_k$. We need to solve the following optimization problem:

$$\min_{w_i} \text{ or } \max_{w_i} \; e_j^T w_i, \quad \text{subject to } w_i \in D, \tag{60}$$

where $e_j$ is the unit vector in $\mathbb{R}^{d+1}$ with 1 in its $j$-th element and zeros elsewhere. Since the objective is linear, both optima lie on the boundary of $D$. Using the KKT conditions for the above problem, where the sign of the multiplier $\lambda$ distinguishes the maximum ($\lambda > 0$) from the minimum ($\lambda < 0$), we have:

$$e_j - \lambda\Sigma_k^{-1}(w_i-\mu_k) = 0, \tag{61a}$$
$$(w_i-\mu_k)^T\Sigma_k^{-1}(w_i-\mu_k) \le Q_{d+1}(\gamma), \tag{61b}$$
$$\lambda\left[(w_i-\mu_k)^T\Sigma_k^{-1}(w_i-\mu_k) - Q_{d+1}(\gamma)\right] = 0, \quad \lambda \neq 0. \tag{61c}$$

The inequality in (61b) is therefore tight. From (61a), we have $w_i - \mu_k = \frac{1}{\lambda}\Sigma_k e_j$, and substituting this into the tight constraint gives

$$\frac{1}{\lambda^2} e_j^T\Sigma_k\Sigma_k^{-1}\Sigma_k e_j = \frac{1}{\lambda^2} e_j^T\Sigma_k e_j = Q_{d+1}(\gamma).$$
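The edges of $D$ sought in (60) can also be located numerically, which gives an independent check on the closed form $\mu_k(j) \pm \sqrt{Q_{d+1}(\gamma)\, e_j^T\Sigma_k e_j}$ that this derivation yields. The Python sketch below is a minimal illustration under stated assumptions: it uses only the standard library, computes $Q_{d+1}(\gamma)$ by bisection on a power-series chi-square CDF (a stand-in for a statistics library), draws random points on the boundary of the ellipsoid (59), and compares the largest $j$-th coordinate with the closed form. The function names and the example $\mu_k$ and $\Sigma_k$ are hypothetical, not values or code from this paper.

```python
import math
import random


def chi2_cdf(x: float, dof: int) -> float:
    """Chi-square CDF, i.e. the regularized lower incomplete gamma P(dof/2, x/2), via its power series."""
    if x <= 0.0:
        return 0.0
    s, z = dof / 2.0, x / 2.0
    term = 1.0 / s  # n = 0 term of sum_n z^n / (s (s+1) ... (s+n))
    total = term
    n = 0
    while term > 1e-16 * total and n < 10_000:
        n += 1
        term *= z / (s + n)
        total += term
    return math.exp(s * math.log(z) - z - math.lgamma(s)) * total


def chi2_quantile(gamma: float, dof: int) -> float:
    """Q_dof(gamma): smallest x with chi2_cdf(x, dof) >= gamma, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, dof) < gamma:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, dof) < gamma:
            lo = mid
        else:
            hi = mid
    return hi


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite matrix a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def delay_interval(mu, sigma, gamma):
    """Interval (62) for the last coordinate: mu[-1] -/+ sqrt(Q_{d+1}(gamma) * Sigma[-1][-1])."""
    half = math.sqrt(chi2_quantile(gamma, len(mu)) * sigma[-1][-1])
    return mu[-1] - half, mu[-1] + half


def boundary_max(mu, sigma, gamma, j, samples=20_000, seed=0):
    """Brute-force version of (60): largest e_j^T w over random points on the boundary of D."""
    rng = random.Random(seed)
    L = cholesky(sigma)
    radius = math.sqrt(chi2_quantile(gamma, len(mu)))
    best = -math.inf
    for _ in range(samples):
        u = [rng.gauss(0.0, 1.0) for _ in mu]
        norm = math.sqrt(sum(x * x for x in u))
        # w - mu = radius * L u / |u| gives (w - mu)^T Sigma^{-1} (w - mu) = Q_{d+1}(gamma)
        best = max(best, mu[j] + radius * sum(L[j][k] * u[k] for k in range(len(mu))) / norm)
    return best


def coverage(mu, sigma, gamma, samples=20_000, seed=1):
    """Fraction of Gaussian draws w ~ N(mu, Sigma) whose last coordinate falls inside (62)."""
    rng = random.Random(seed)
    L = cholesky(sigma)
    lo, hi = delay_interval(mu, sigma, gamma)
    inside = 0
    for _ in range(samples):
        z = [rng.gauss(0.0, 1.0) for _ in mu]
        beta = mu[-1] + sum(L[-1][k] * z[k] for k in range(len(mu)))
        if lo < beta < hi:
            inside += 1
    return inside / samples


# Hypothetical example with d + 1 = 3 features; the last entry plays the role of beta_i(t).
MU = [0.2, -0.1, 0.3]
SIGMA = [[1.0, 0.3, 0.1], [0.3, 0.5, 0.2], [0.1, 0.2, 0.4]]
print(delay_interval(MU, SIGMA, 0.95), boundary_max(MU, SIGMA, 0.95, j=2), coverage(MU, SIGMA, 0.95))
```

Sampling the boundary only approaches the maximum from below, so the brute-force value should sit slightly under the upper end of the interval. The empirical coverage should exceed $\gamma$, because projecting the ellipsoid onto a single coordinate gives a conservative interval, which is why (62) holds with probability at least $\gamma$.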
Solving for $\lambda$ and substituting it back, we obtain

$$w_i = \pm\sqrt{\frac{Q_{d+1}(\gamma)}{e_j^T\Sigma_k e_j}}\,\Sigma_k e_j + \mu_k, \qquad \lambda = \pm\sqrt{\frac{e_j^T\Sigma_k e_j}{Q_{d+1}(\gamma)}},$$

and

$$e_j^T w_i = \pm\sqrt{Q_{d+1}(\gamma)\, e_j^T\Sigma_k e_j} + \mu_k(j).$$

If $\lambda$ is positive, we obtain the maximum, $+\sqrt{Q_{d+1}(\gamma)\, e_j^T\Sigma_k e_j} + \mu_k(j)$, and if $\lambda$ is negative, we obtain the minimum, $-\sqrt{Q_{d+1}(\gamma)\, e_j^T\Sigma_k e_j} + \mu_k(j)$. Here, $\mu_k(j)$ is the $j$-th element of $\mu_k$. If we set $j = d+1$, then the delay perception of user $i$ lies in the following range:

$$-\sqrt{Q_{d+1}(\gamma)\, e_{d+1}^T\Sigma_k e_{d+1}} < \beta_i(t) - \mu_k(d+1) < \sqrt{Q_{d+1}(\gamma)\, e_{d+1}^T\Sigma_k e_{d+1}}, \tag{62}$$

with probability at least $\gamma$. Hence, Theorem 1 is proved.

APPENDIX B
PROOF OF THEOREM 2

Since the queuing delay is much smaller than the duration of each time slot, we can assume that each packet arriving in a specific time slot will be served in the same time slot. To analyze the packet delay, we consider a packet that has just arrived in the system in time slot $\tau_k$, and find $\Pr(D > D_i^{\max})$ for this packet. When this packet arrives, there are $m$ packets in the system. From Lemma 2, we know that the service time is an exponential random variable. Since the exponential distribution is memoryless, there is no distinction between a packet already in service and the other packets. Therefore, the waiting time of the packet that has just arrived is the sum of $m$ exponential random variables. Also, the transmission delay of this packet is another exponential random variable. Hence, the delay of a packet that arrives in time slot $\tau_k$ while there are $m$ packets in the system can be written as:

$$d(\tau_k, m) = t_s + t_1(\tau_k) + t_2(\tau_k) + \cdots + t_{m-1}(\tau_k) + t_c(\tau_k), \tag{63}$$

where $t_i(\tau_k)$ is the service time of packet $i$ in the queue, $t_c(\tau_k)$ is the remaining service time of the packet already in service, and $t_s$ is the service time of the packet that has just arrived.
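The decomposition (63) also lends itself to a quick Monte Carlo check of the tail probability that this appendix arrives at in (70). The sketch below is illustrative only. It assumes, as in this proof, that the number of packets found by an arrival is geometric with ratio $a_i(\tau_k)/r_i(\tau_k)$, that all $m+1$ service times in (63) are i.i.d. exponential with rate $r_i(\tau_k)$ (the memoryless argument above), and that the arrival slot is uniform. The per-slot rates, the threshold, and the function names are hypothetical choices for the example.

```python
import math
import random


def delay_tail_closed_form(rates, d_max):
    """Eq. (70): (1/t) * sum_k exp(-(r_k - a_k) * d_max), assuming a_k < r_k in every slot."""
    return sum(math.exp(-(r - a) * d_max) for a, r in rates) / len(rates)


def delay_tail_monte_carlo(rates, d_max, samples=100_000, seed=0):
    """Empirical Pr(D > d_max) for a packet that arrives in a uniformly chosen slot tau_k."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(samples):
        a, r = rates[rng.randrange(len(rates))]  # Pr(tau_k) = 1/t
        rho = a / r
        m = 0
        while rng.random() < rho:  # Pr(m | tau_k) = rho^m (1 - rho) packets already in the system
            m += 1
        # (63): m + 1 i.i.d. exponential service times with rate r, i.e. a Gamma(m + 1, r) delay as in (65)
        delay = sum(rng.expovariate(r) for _ in range(m + 1))
        if delay > d_max:
            exceed += 1
    return exceed / samples


# Hypothetical (arrival rate a_k, service rate r_k) pairs in packets/s for t = 3 slots, and a 10 ms threshold.
RATES = [(400.0, 500.0), (300.0, 500.0), (450.0, 600.0)]
print(delay_tail_closed_form(RATES, 0.01), delay_tail_monte_carlo(RATES, 0.01))
```

For these rates the closed form evaluates to about 0.242, and the Monte Carlo estimate should land within sampling error of it. The check inherits the modeling assumptions of the proof (a stationary M/M/1 queue within each slot); it does not validate them.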
We seek $\Pr(D > D_i^{\max})$, where $D$ denotes the delay $d(\tau_k, m)$ of an arriving packet, with both the arrival slot $\tau_k$ and the number $m$ of packets found in the system being random:

$$\Pr\left(D > D_i^{\max}\right) = \sum_{m,k} \Pr\left(D > D_i^{\max} \mid m, \tau_k\right)\Pr(m, \tau_k) = \sum_{m,k} \Pr\left(D > D_i^{\max} \mid m, \tau_k\right)\Pr(m \mid \tau_k)\Pr(\tau_k). \tag{64}$$

The probability that there are $m$ packets in an M/M/1 queue in time slot $\tau_k$, i.e., $\Pr(m \mid \tau_k)$, can be written as (see [49]):

$$\Pr(m \mid \tau_k) = \left(\frac{a_i(\tau_k)}{r_i(\tau_k)}\right)^m \left(1 - \frac{a_i(\tau_k)}{r_i(\tau_k)}\right).$$

Since we assumed that the time slots have equal lengths, a packet arrives in each time slot with equal probability $\Pr(\tau_k) = \frac{1}{t}$, where $t$ is the total number of time slots. The sum of $m+1$ independent and identically distributed exponential random variables with mean $\frac{1}{r_i(\tau_k)}$ is a gamma random variable. Consequently, if a packet arrives in time slot $\tau_k$ while there are $m$ packets in the system, the distribution of its delay is

$$f_D(\phi \mid m, \tau_k) = \frac{r_i(\tau_k)^{m+1}}{\Gamma(m+1)}\phi^m e^{-r_i(\tau_k)\phi}. \tag{65}$$

As a result, we can write the probability of the delay exceeding a threshold $D_i^{\max}$ as

$$\Pr(D > D_i^{\max}) = \int_{D_i^{\max}}^{\infty} \sum_{m,k} f_D(\phi \mid m, \tau_k)\Pr(m \mid \tau_k)\Pr(\tau_k)\, d\phi \tag{66}$$
$$= \int_{D_i^{\max}}^{\infty} \frac{1}{t}\sum_{m,k} \frac{r_i(\tau_k)^{m+1}}{m!}\phi^m e^{-r_i(\tau_k)\phi}\left(\frac{a_i(\tau_k)}{r_i(\tau_k)}\right)^m\left(1 - \frac{a_i(\tau_k)}{r_i(\tau_k)}\right) d\phi \tag{67}$$
$$= \frac{1}{t}\sum_{k=1}^{t}\int_{D_i^{\max}}^{\infty} \left(r_i(\tau_k) - a_i(\tau_k)\right)e^{-r_i(\tau_k)\phi}\sum_{m=0}^{\infty}\frac{\left(\phi\, a_i(\tau_k)\right)^m}{m!}\, d\phi \tag{68}$$
$$= \frac{1}{t}\sum_{k=1}^{t}\int_{D_i^{\max}}^{\infty} \left(r_i(\tau_k) - a_i(\tau_k)\right)e^{-\left(r_i(\tau_k) - a_i(\tau_k)\right)\phi}\, d\phi \tag{69}$$
$$= \frac{1}{t}\sum_{k=1}^{t} e^{-\left(r_i(\tau_k) - a_i(\tau_k)\right)D_i^{\max}}, \tag{70}$$

where (68) uses $r_i(\tau_k)^{m+1}\left(\frac{a_i(\tau_k)}{r_i(\tau_k)}\right)^m\left(1-\frac{a_i(\tau_k)}{r_i(\tau_k)}\right) = a_i(\tau_k)^m\left(r_i(\tau_k)-a_i(\tau_k)\right)$, (69) uses $\sum_{m=0}^{\infty} x^m/m! = e^x$, and the integral in (69) is finite because a stable M/M/1 queue requires $a_i(\tau_k) < r_i(\tau_k)$. This proves the theorem.

REFERENCES

[1] A. T. Z. Kasgari, W. Saad, and M. Debbah, "Brain-aware wireless networks: Learning and resource management," in Proc. 51st Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, Nov 2017.
[2] W. Saad, M. Bennis, and M. Chen, "A vision of 6G wireless systems: Applications, trends, technologies, and open research problems," arXiv preprint, 2019.
[3] E. Gobbetti and R. Scateni, "Virtual reality: Past, present, and future," Virtual Environments in Clinical Psychology and Neuroscience: Methods and Techniques in Advanced Patient-Therapist Interaction, Nov 1998.
[4] M. Chen, W. Saad, and C. Yin, "Virtual reality over wireless networks: Quality-of-service model and learning-based resource management," IEEE Transactions on Communications, vol. 66, no. 11, pp. 5621–5635, Nov 2018.
[5] O. Semiari, W. Saad, S. Valentin, M. Bennis, and H. V. Poor, "Context-aware small cell networks: How social metrics improve wireless resource allocation," IEEE Transactions on Wireless Communications, vol. 14, no. 11, pp. 5927–5940, Nov 2015.
[6] H. Intraub, "Rapid conceptual identification of sequentially presented pictures," Journal of Experimental Psychology: Human Perception and Performance, vol. 7, no. 3, pp. 604–610, 1981.
[7] K. Ur Rehman Laghari, R. Gupta, S. Arndt, J. N. Antons, R. Schleicher, S. Möller, and T. H. Falk, "Neurophysiological experimental facility for quality of experience (QoE) assessment," in Proc. of IFIP/IEEE International Symposium on Integrated Network Management (IM), Ghent, Belgium, May 2013, pp. 1300–1305.
[8] I. Wechsung and K. De Moor, "Quality of experience versus user experience," in Quality of Experience: Advanced Concepts, Applications and Methods, S. Möller and A. Raake, Eds. Springer International Publishing, 2014, pp. 35–54.
[9] T. Zhao, Q. Liu, and C. W. Chen, "QoE in video transmission: A user experience-driven strategy," IEEE Communications Surveys & Tutorials, vol. 19, no. 1, pp. 285–302, First quarter 2017.
[10] Y. Chen, K. Wu, and Q. Zhang, "From QoS to QoE: A tutorial on video quality assessment," IEEE Communications Surveys & Tutorials, vol. 17, no. 2, pp. 1126–1165, Second quarter 2015.
[11] A. Rahmati, X. He, I. Guvenc, and H. Dai, "Dynamic mobility-aware interference avoidance for aerial base stations in cognitive radio networks," arXiv preprint arXiv:1901.02613, 2019.
[12] M. Mozaffari, A. Taleb Zadeh Kasgari, W. Saad, M. Bennis, and M. Debbah, "Beyond 5G with UAVs: Foundations of a 3D wireless cellular network," IEEE Transactions on Wireless Communications, vol. 18, no. 1, pp. 357–372, Jan 2019.
[13] Z. Zhou, H. Yu, C. Xu, Y. Zhang, S. Mumtaz, and J. Rodriguez, "Dependable content distribution in D2D-based cooperative vehicular networks: A big data-integrated coalition game approach," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 3, pp. 953–964, March 2018.
[14] A. Ferdowsi, U. Challita, W. Saad, and N. B. Mandayam, "Robust deep reinforcement learning for security and safety in autonomous vehicle systems," in Proc. of International Conference on Intelligent Transportation Systems, Maui, HI, USA, Nov. 2018.
[15] Z. Zhou, C. Gao, C. Xu, Y. Zhang, S. Mumtaz, and J. Rodriguez, "Social big-data-based content dissemination in internet of vehicles," IEEE Transactions on Industrial Informatics, vol. 14, no. 2, pp. 768–777, Feb 2018.
[16] Z. Zhou, H. Liao, B. Gu, K. M. S. Huq, S. Mumtaz, and J. Rodriguez, "Robust mobile crowd sensing: When deep learning meets edge computing," IEEE Network, vol. 32, no. 4, pp. 54–60, July 2018.
[17] M. Alam, D. Yang, K. Huq, F. Saghezchi, S. Mumtaz, and J. Rodriguez, "Towards 5G: Context aware resource allocation for energy saving," Jour. of Signal Proc. Systems, vol. 83, no. 2, pp. 279–291, May 2016.
[18] Y. Lin, R. Zhang, C. Li, L. Yang, and L. Hanzo, "Graph-based joint user-centric overlapped clustering and resource allocation in ultradense networks," IEEE Transactions on Vehicular Technology, vol. 67, no. 5, pp. 4440–4453, May 2018.
[19] J. Zhao, Y. Liu, K. K. Chai, M. Elkashlan, and Y. Chen, "Matching with peer effects for context-aware resource allocation in D2D communications," IEEE Communications Letters, vol. 21, no. 4, pp. 837–840, April 2017.
[20] M. Zalghout, S. Abdul-Nabi, A. Khalil, M. Helard, and M. Crussiere, "Optimizing context-aware resource and network assignment in heterogeneous wireless networks," in Proc. of IEEE Wireless Communications and Networking Conference (WCNC), San Francisco, CA, USA, March 2017, pp. 1–6.
[21] E. Bastug, M. Bennis, and M. Debbah, "Living on the edge: The role of proactive caching in 5G wireless networks," IEEE Communications Magazine, vol. 52, no. 8, pp. 82–89, Aug 2014.
[22] P. Makris, D. N. Skoutas, and C. Skianis, "A survey on context-aware mobile and wireless networking: On networking and computing environments' integration," IEEE Communications Surveys & Tutorials, vol. 15, no. 1, pp. 362–386, First quarter 2013.
[23] Y. Nijsure, Y. Chen, C. Yuen, and Y. H. Chew, "Location-aware spectrum and power allocation in joint cognitive communication-radar networks," in 2011 6th International ICST Conference on Cognitive Radio Oriented Wireless Networks and Communications (CROWNCOM), June 2011, pp. 171–175.
[24] M. Proebster, M. Kaschub, and S. Valentin, "Context-aware resource allocation to improve the quality of service of heterogeneous traffic," in Proc. of IEEE International Conference on Communications, Kyoto, Japan, Jul. 2011.
[25] C. Perera, A. Zaslavsky, P. Christen, and D. Georgakopoulos, "Context aware computing for the Internet of Things: A survey," IEEE Communications Surveys & Tutorials, vol. 16, no. 1, pp. 414–454, First quarter 2014.
[26] L. Huang, "System intelligence: Model, bounds and algorithms," in Proc. of the 17th ACM International Symposium on Mobile Ad Hoc Networking and Computing. New York, NY, USA: ACM, 2016, pp. 171–180.
[27] H. Modares, I. Ranatunga, F. L. Lewis, and D. O. Popa, "Optimized assistive human–robot interaction using reinforcement learning," IEEE Transactions on Cybernetics, vol. 46, no. 3, pp. 655–667, March 2016.
[28] B. Qian, X. Wang, N. Cao, Y. Jiang, and I. Davidson, "Learning multiple relative attributes with humans in the loop," IEEE Transactions on Image Processing, vol. 23, no. 12, pp. 5573–5585, Dec 2014.
[29] Q. Wu and R. Zhang, "Delay-constrained throughput maximization in UAV-enabled OFDM systems," in Proc. of 23rd Asia-Pacific Conference on Communications (APCC), Perth, WA, Australia, Dec 2017, pp. 1–6.
[30] C. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer Information Science and Statistics, 2007.
[31] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, "Machine learning for wireless networks with artificial intelligence: A tutorial on neural networks," arXiv preprint, Oct. 2017.
[32] S. Petkoski, A. Spiegler, T. Proix, and V. Jirsa, "Effects of multimodal distribution of delays in brain network dynamics," BMC Neuroscience, vol. 16, no. Suppl 1, p. P109, 2015.
[33] Y. Yang, L. T. Park, N. B. Mandayam, I. Seskar, A. L. Glass, and N. Sinha, "Prospect pricing in cognitive radio networks," IEEE Transactions on Cognitive Communications and Networking, vol. 1, no. 1, pp. 56–70, Oct. 2015.
[34] M. G. Baydogan and G. Runger, "Learning a symbolic representation for multivariate time series classification," Data Mining and Knowledge Discovery, vol. 29, no. 2, pp. 400–422, Mar 2015. [Online]. Available: https://doi.org/10.1007/s10618-014-0349-y
[35] T. K. Moon, "The expectation-maximization algorithm," IEEE Signal Processing Magazine, vol. 13, no. 6, pp. 47–60, Nov 1996.
[36] A. Jaimes and N. Sebe, "Multimodal human–computer interaction: A survey," Computer Vision and Image Understanding, vol. 108, no. 1-2, pp. 116–134, Oct. 2007.
[37] G. McLachlan and T. Krishnan, "The EM algorithm and extensions," vol. 382, 03 1998.
[38] J. Friedman, T. Hastie, and R. Tibshirani, The Elements of Statistical Learning. Springer Series in Statistics, New York, 2001, vol. 1, no. 10.
[39] R. Hari, S. Levänen, and T. Raij, "Timing of human cortical functions during cognition: Role of MEG," Trends in Cognitive Sciences, vol. 4, no. 12, pp. 455–462, 2000.
[40] M. D. Fox and M. E. Raichle, "Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging," Nature Reviews Neuroscience, vol. 8, no. 9, p. 700, 2007.
[41] H. Laufs, K. Krakow, P. Sterzer, E. Eger, A. Beyerle, A. Salek-Haddadi, and A. Kleinschmidt, "Electroencephalographic signatures of attentional and cognitive default modes in spontaneous brain activity fluctuations at rest," Proceedings of the National Academy of Sciences, vol. 100, no. 19, pp. 11053–11058, 2003.
[42] M. J. Neely, "Stochastic network optimization with application to communication and queueing systems," Synthesis Lectures on Communication Networks, vol. 3, no. 1, pp. 1–211, 2010.
[43] M. J. Neely, E. Modiano, and C. P. Li, "Fairness and optimal stochastic control for heterogeneous networks," IEEE/ACM Transactions on Networking, vol. 16, no. 2, pp. 396–409, April 2008.
[44] K. Seong, M. Mohseni, and J. M. Cioffi, "Optimal resource allocation for OFDMA downlink systems," in Proc. of IEEE International Symposium on Information Theory, Seattle, WA, USA, July 2006, pp. 1394–1398.
[45] W. Yu and R. Lui, "Dual methods for nonconvex spectrum optimization of multicarrier systems," IEEE Transactions on Communications, vol. 54, no. 7, pp. 1310–1322, July 2006.
[46] S. Boyd and A. Mutapcic, "Subgradient methods," Lecture notes of EE364b, Stanford University, Winter Quarter, vol. 2007, 2006.
[47] M. Chen, W. Saad, and C. Yin, "Resource management for wireless virtual reality: Machine learning meets multi-attribute utility," in Proc. of IEEE Global Communications Conference (Globecom), Singapore, Dec 2017, pp. 1–7.
[48] J. Berger, "A robust generalized Bayes estimator and confidence region for a multivariate normal mean," The Annals of Statistics, pp. 716–761, 1980.
[49] A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic Processes. Tata McGraw-Hill Education, 2002.
