Self-Organizing mmWave Networks: A Power Allocation Scheme Based on Machine Learning
Authors: Roohollah Amiri, Hani Mehrpouyan
School of Electrical and Computer Engineering, Boise State University, {roohollahamiri,hanimehrpouyan}@boisestate.edu

Abstract—Millimeter-wave (mmWave) communication is anticipated to provide significant throughput gains in urban scenarios. To this end, network densification is a necessity to meet the high traffic volume generated by smartphones, tablets, and sensory devices while overcoming the large path loss and frequent blockages at mmWave frequencies. These denser networks are created by users deploying small mmWave base stations (BSs) in a plug-and-play fashion. Although this deployment method provides the required density, the amorphous placement of BSs calls for distributed management. To address this difficulty, we propose a self-organizing method to allocate power to mmWave BSs in an ultra-dense network. The proposed method consists of two parts: clustering using fast local clustering, and power allocation via Q-learning. The important features of the proposed method are its scalability and self-organizing capability, both key requirements of 5G. Our simulations demonstrate that the introduced method provides the required quality of service (QoS) for all users, independent of the size of the network.

I. INTRODUCTION

Millimeter-wave (mmWave) communication is one of the main technologies of the next generation of cellular networks (5G). The large bandwidth at mmWave frequencies has the potential to enhance network throughput tenfold [1]. However, large path loss and shadowing limit the performance of mmWave systems and need to be dealt with. One approach to overcoming this problem is to increase the density of access points [2], [3]. However, as the number of access points increases, so does the complexity of network management.
Keeping this in mind, one of the features of future mmWave base stations (BSs) is self-deployment by users. In other words, access points can be deployed in a plug-and-play fashion, and the network architecture may change frequently. Considering the above points, 5G needs self-organizing methods to configure, adapt, or heal itself when necessary. In this paper, a self-organizing algorithm is proposed to maximize the sum capacity in a dense mmWave network while providing users with their required quality of service (QoS). The algorithm consists of clustering, based on fast local clustering (FLOC), and distributed power allocation, via Q-learning. The scalability and fast convergence of FLOC, together with the adaptability and distributed nature of Q-learning, make their combination a suitable tool for achieving self-organization in a dense network.

II. SYSTEM MODEL

The system model considers a dense outdoor urban scenario as an important example of 5G, i.e., we consider the downlink of densely deployed mmWave BSs. To this end, let us consider N mmWave BSs distributed according to a homogeneous spatial Poisson point process (SPPP) with density \lambda_{BS} [4]. Each BS is associated with one user. The BSs share a single frequency resource block (FRB) to support their associated users. We assume a time-invariant channel model, i.e., slow fading. The channel between BS i and user k can be written as

    H_{i,k} = L_{i,k}^{-1} \, g_{i,k},    (1)

where L_{i,k} and g_{i,k} denote the path loss and the path gain between BS i and user k, respectively. The path loss between BS i and its associated user i, L_{i,i}, follows free-space propagation based on Friis' law [1]. Here, we consider that the majority of interferers have non-line-of-sight (NLOS) paths [5].
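As a numeric illustration, the channel model of Eq. (1) combined with the log-distance NLOS path loss of Eq. (2) below can be sketched as follows. The parameter values (beta_1 = 72.0, beta_2 = 2.92, zeta = 8.7 dB) are those of Table I; the unit path gain g = 1 and the fixed shadowing seed are simplifying assumptions for this sketch.

```python
import math
import random

def nlos_path_loss_db(d_m, beta1=72.0, beta2=2.92, zeta_db=8.7, rng=None):
    """Log-distance NLOS path loss with lognormal shadowing, as in Eq. (2).

    beta1, beta2, and the shadowing std zeta are the values of Table I.
    A fixed-seed RNG is used here so the sketch is reproducible.
    """
    rng = rng or random.Random(0)
    shadowing = rng.gauss(0.0, zeta_db)  # X_zeta ~ N(0, zeta^2), in dB
    return beta1 + 10.0 * beta2 * math.log10(d_m) + shadowing

def channel_gain(d_m, g=1.0, rng=None):
    """Linear channel gain H = L^{-1} * g, as in Eq. (1); unit path gain assumed."""
    loss_db = nlos_path_loss_db(d_m, rng=rng)
    return g * 10.0 ** (-loss_db / 10.0)

# An interferer 100 m away contributes a much weaker channel than a BS 10 m away.
h_near = channel_gain(10.0)
h_far = channel_gain(100.0)
assert h_near > h_far
```

With beta_2 = 2.92, each decade of distance costs about 29.2 dB, which is why only neighboring BSs contribute meaningful interference, motivating the local clustering of Sec. IV.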
Accordingly, the NLOS path loss L_{i,k} (i ≠ k) can be written as [1]

    L_{i,k} [dB] = \beta_1 + 10 \beta_2 \log_{10} d_{i,k} + X_\zeta,    (2)

where \beta_1 and \beta_2 are factors used to achieve the best fit to channel measurements, d_{i,k} is the distance between BS i and user k, and X_\zeta denotes the logarithmic shadowing factor, with X_\zeta ~ N(0, \zeta^2), where \zeta^2 is the lognormal shadowing variance.

The received downlink signal at the k-th user includes the desired signal from its associated BS (BS k), interference from neighboring BSs, and thermal noise. Hence, the signal-to-interference-plus-noise ratio (SINR) at the k-th user is given by

    SINR_k = P_k H_{k,k} / ( \sum_{i \in D_k, i \neq k} P_i H_{i,k} + \sigma^2 ),    (3)

where P_k denotes the power transmitted by the k-th BS, D_k is the set of interfering BSs, and \sigma^2 denotes the variance of the additive white Gaussian noise. Accordingly, the normalized capacity at the k-th user is given by

    C_k = \log_2(1 + SINR_k).    (4)

III. PROBLEM FORMULATION

The goal of the optimization problem is to find the power allocation across the mmWave BSs, \bar{P}, that maximizes the sum capacity of the network while supporting all users with their required QoS. The optimization problem (P1) can be formulated as

    maximize_{\bar{P}}  \sum_{k=1}^{N} \log_2(1 + SINR_k)    (5a)
    subject to  P_k \le P_{max},  k = 1, ..., N,    (5b)
                SINR_k \ge q_k,  k = 1, ..., N.    (5c)

Here, the objective (5a) is to maximize the sum capacity of the network while providing all users with their required QoS in (5c). The first constraint, (5b), reflects the power limitation of every BS. The term q_k in (5c) is the minimum required SINR for the k-th user. The objective (5a) contains the interference term in the denominator of the SINR, and in a dense network this term cannot be ignored [6]. Due to the presence of the interference term, the objective function (5a) is non-concave [7]. The solution to P1 should have certain features.
First, it should be distributed, since there is no central authority in this network. Second, the range of mmWave BSs is limited, so each user receives interference only from the BSs in its neighborhood; therefore, the solution should use local clustering to reduce the computation overhead. The third feature is self-healing: the number of BSs in the network changes sporadically, which means the solution should adapt to new architectures. Considering the above, in this paper we propose a method with two parts: a fast local clustering method to locally cluster the BSs, and, within each cluster, power selection by the BSs based on Q-learning [8]. Q-learning is model-free (adaptable) and gives the BSs the ability to learn from their environment by interacting with it (self-organization).

IV. CLUSTER-BASED DISTRIBUTED POWER ALLOCATION USING Q-LEARNING (CDP-Q)

In our proposed method, the mmWave BSs are the agents of Q-learning, so the terms agent and mmWave BS are used interchangeably. CDP-Q is a distributed method in which multiple agents (mmWave BSs) find a sub-optimal policy (power allocation) to maximize the network capacity. CDP-Q consists of two parts: (1) clustering and (2) power allocation. Clustering is based on a local clustering method, and power allocation is based on Q-learning. Each part is detailed in the following.

A. mmWave BS Clustering

Since mmWave signals suffer from high path loss and shadowing, only neighboring BSs that are close in distance interfere with each other. Consequently, we propose a clustering mechanism that divides the BSs into clusters such that the interference of one cluster on other clusters' users is negligible. In this paper, we use fast local clustering (FLOC) [9] to divide the mmWave BSs into clusters.
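The actual FLOC protocol is a distributed, message-passing algorithm (its concepts are detailed next); as a purely hypothetical, centralized stand-in, the effect of distance-based cluster formation with an in-bound radius of 100 m and an out-band radius of 200 m (the mmWave link ranges cited from [1]) can be sketched as:

```python
import math

def cluster_bss(positions, r_in=100.0, r_out=200.0):
    """Greedy, centralized sketch of IB/OB cluster formation.

    This is NOT the real FLOC (which is distributed and self-healing);
    it only illustrates the in-bound/out-band geometry.
    positions: dict node_id -> (x, y) in meters.
    Returns: dict node_id -> id of its cluster head (CH).
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    assignment = {}
    heads = []
    for nid, pos in positions.items():
        # Join the first CH within the in-bound radius, if any.
        ib = [h for h in heads if dist(pos, positions[h]) <= r_in]
        if ib:
            assignment[nid] = ib[0]
            continue
        # Otherwise join a CH within out-band range, or become a new CH.
        ob = [h for h in heads if dist(pos, positions[h]) <= r_out]
        if ob:
            assignment[nid] = ob[0]  # joins as an out-band (OB) node
        else:
            heads.append(nid)
            assignment[nid] = nid    # this node becomes a cluster head
    return assignment

bss = {0: (0, 0), 1: (50, 0), 2: (150, 0), 3: (400, 0)}
print(cluster_bss(bss))  # {0: 0, 1: 0, 2: 0, 3: 3}
```

Here node 1 joins cluster 0 as an in-bound node (50 m < 100 m), node 2 joins as an out-band node (150 m < 200 m), and node 3 is beyond the cluster edge, so it starts its own cluster.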
FLOC is a distributed message-passing clustering method with O(1) complexity, which guarantees scalability, and it produces non-overlapping clusters. Another feature of FLOC is local self-healing, which means that re-clustering due to the addition or removal of a node does not propagate through all clusters. To apply FLOC in a mmWave network, the following concepts are defined:

• Cluster head (CH): The mmWave BS chosen as the head of the cluster. In our algorithm, there is no priority between a cluster head and the other members of the cluster.

• In-bound (IB) and out-band (OB) node: In FLOC, a node is in-bound if it is within a unit distance of a CH. A unit distance is a set value, in this case based on the range of mmWave links, i.e., 100-200 m [1]. Accordingly, we define the in-bound distance as 100 m, which indicates strong interference, and the out-band distance as 200 m, which marks the edge of the cluster around a CH. Finally, if a node j is within the out-band distance of a cluster i and not within the in-bound distance of any other cluster, then node j joins cluster i as an OB node.

B. Distributed Power Allocation Using Q-Learning

The output of Q-learning is a decision policy (power allocation) represented as a function called the Q-function. Here, the Q-function of agent k is represented as a table called a Q-table (Q^k). The columns of the Q-table are the actions (a^k), and the rows are the states (s^k) of agent k.

In multi-agent Q-learning, agents can act independently or cooperatively. In independent learning, each agent interacts with the environment without communicating with the other agents; in effect, it treats the other agents as part of the environment. Independent learning has shown good performance in many applications [10].
In independent learning, since the environment is not stationary, the agents may oscillate and take longer to converge; however, because it incurs no inter-agent communication overhead compared to cooperative learning, we choose independent learning. Accordingly, the agents select their actions as [11]

    a_t^k = argmax_a Q^k(s_t^k, a),    (6)

in which the subscript t denotes time step t of Q-learning. The CDP-Q algorithm is presented in Algorithm 1.

Algorithm 1 The proposed CDP-Q algorithm
 1: Cluster formation based on Sec. IV-A
 2: for all clusters in parallel do
 3:   for all agents do
 4:     Initialize Q^k(s, a) arbitrarily
 5:     Initialize s_t^k
 6:     for all episodes do
 7:       Send Q^k(s_t^k, :) to the other agents of the cluster
 8:       Receive Q^j(s_t^j, :), j ∈ D_k, j ≠ k
 9:       Choose a_t^k according to Eq. (6)
10:       Take action a_t^k, observe R_t^k
11:       Q^k(s_t^k, a_t^k) ← (1 − α) Q^k(s_t^k, a_t^k) + α (R_t^k + γ Q^k(s_{t+1}^k, a_t^k))
12:     end for
13:   end for
14: end for

In the following, the actions, states, and reward function of the proposed Q-learning method are defined.

1) Actions: The set of actions (powers) A is defined as A = {a_1, a_2, ..., a_{N_power}}, which uniformly covers the range between the minimum (a_1 = P_min) and maximum (a_{N_power} = P_max) power.

2) States: We define N_r equally spaced concentric circles around the cluster head (CH) of each cluster. These circles define N_r rings, with r units of spacing, around the CH. The state of agent k at time step t is defined as s_t^k = (n), the number of the ring that the agent lies in. Given the definition of the Q-table and the states above, if an agent's location is fixed, the agent searches only one row of its Q-table for the best action.

3) Reward: R_t^k is the immediate reward incurred by selecting action a_t^k in state s_t^k at time step t.
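For a single agent, steps 9-11 of Algorithm 1 reduce to the greedy action choice of Eq. (6) followed by a tabular update. A minimal single-agent sketch with the paper's α = 0.5 and γ = 0.9 is below; the reward callback is a hypothetical stand-in for the measured R_t^k, the Q-table exchange of steps 7-8 is omitted, and ε-greedy exploration is added (the paper's agents use the purely greedy choice, relying on the exchanged Q-tables instead).

```python
import random

def run_q_agent(reward_fn, n_actions=31, n_states=4, alpha=0.5, gamma=0.9,
                episodes=2000, state=0, epsilon=0.1, seed=0):
    """Single-agent tabular sketch of the inner loop of Algorithm 1.

    reward_fn(state, action) -> float is a hypothetical stand-in for R_t^k.
    With a fixed BS location the state never changes, so only one Q-table
    row is ever updated (as noted in Sec. IV-B.2).
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    rng = random.Random(seed)
    for _ in range(episodes):
        row = q[state]
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)   # exploration (sketch-only)
        else:
            best = max(row)                     # Eq. (6): greedy choice
            action = rng.choice([a for a, v in enumerate(row) if v == best])
        r = reward_fn(state, action)
        # Step 11: Q <- (1 - alpha) Q + alpha (R + gamma * Q(s_{t+1}, a_t));
        # a fixed location means s_{t+1} = s_t, i.e., the same table cell.
        q[state][action] = (1 - alpha) * q[state][action] + alpha * (r + gamma * q[state][action])
    return q

# Toy run: rewards in [0, 1], so Q-values stay within [0, 1/(1 - gamma)] = [0, 10].
q = run_q_agent(lambda s, a: min(a / 30.0, 1.0))
assert 0.0 < max(q[0]) <= 10.0
```

Since rewards are bounded by +1 (see the reward design below), the Q-values converge toward at most 1/(1 − γ) = 10, which the final assertion checks.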
The constraint in (5c) can be represented as C_t^k ≥ log_2(q_k), for k = 1, ..., N, where C_t^k is the normalized capacity of agent k at time step t. Based on this, the proposed normalized reward function for agent k at time step t is defined as

    R_t^k = (1 / (2 \log_2 q_k)) ( C_t^k − | C_t^k − 2 \log_2 q_k | ),    (7)

where the normalizing prefactor is term (a), C_t^k is term (b), and |C_t^k − 2 log_2 q_k| is term (c). The rationale behind the proposed reward function is as follows:

• Term (a) normalizes the value of the reward function.
• The objective of the optimization problem is to maximize the capacity of the network, so term (b) yields a higher reward for higher agent capacity.
• To satisfy the QoS constraint for agent k, the deviation of its associated user's capacity from the required QoS, term (c), lowers the reward.
• The reward is capped at a maximum of +1, which provides fairness between the agents, as shown in Fig. 1.
• The proposed reward function is a first-order function of C_t^k, which reduces the complexity of each iteration.

Fig. 1: Proposed reward function (RF).

V. SIMULATION RESULTS

In this section, the simulation setup is detailed, and then the results of the simulations are presented.

A. Simulation Setup

A dense mmWave BS network with approximately 120 BSs in a 1 km^2 area is considered. The BSs are distributed based on an SPPP and operate independently in the network. Each BS supports one user equipment (UE), located within a radius of 10 m around the BS. The QoS for a user is defined as the SINR required to support the user's service; the value q_k = 2.83 is used for all users. For Q-learning, the learning rate is α = 0.5, the discount factor is γ = 0.9, N_power = 31, r = 50 m, and N_r = 4. The maximum number of iterations is set to 50,000. The remaining simulation parameters are listed in Table I.

TABLE I: Simulation Parameters
    Param.   Value      Param.   Value
    f        28 GHz     P_min    -10 dBm
    ζ        8.7 dB     P_max    35 dBm
    β_1      72.0       β_2      2.92
    σ^2      -120 dBm

B. Clustering Results

The clustering algorithm is implemented as an event-driven, message-passing distributed program in C++. Every BS is simulated as an independent thread and is added to the network at a random time in [0, 10] seconds. The clustering algorithm converges in less than 15 seconds for the assumed value of λ_BS. The resulting clusters for two different BS distributions are shown in different colors in Figs. 2 and 3; each cluster head (CH) is marked with a filled color.

Fig. 2: 124 BSs in 1 km^2.
Fig. 3: 122 BSs in 1 km^2.

C. Power Allocation Results

According to [1], [2], the coverage range of millimeter-wave communication is 100-200 m, which corresponds to a maximum coverage of 0.12 km^2 per mmWave BS. Under the interference-limited assumption and with λ_BS = 120 BS/km^2, a cluster might contain up to 14 mmWave BSs. Hence, the CDP-Q algorithm results in clusters of 2 to 14 BSs.

The results of power allocation using the proposed reward function are compared with the exponential reward function proposed in [12], labeled EXP-Q in the simulations. For all possible cluster sizes, power allocation using the proposed reward function is simulated, and the normalized capacities of all BSs in the clusters are plotted in Fig. 4. The same simulations for EXP-Q are presented in Fig. 5.

Fig. 4: Capacity of clusters' members (CDP-Q).
Fig. 5: Capacity of clusters' members (EXP-Q).
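The reward of Eq. (7), together with Jain's fairness index [13] used to compare CDP-Q and EXP-Q, can be sketched as follows (q_k = 2.83, as in the simulation setup):

```python
import math

def reward(c, q=2.83):
    """Proposed reward of Eq. (7): first-order in capacity, capped at +1.

    c: normalized capacity C_t^k (b/s/Hz); q: required SINR q_k.
    """
    t = 2.0 * math.log2(q)        # reward saturates at 2 * log2(q_k)
    return (c - abs(c - t)) / t   # equals (2c - t)/t below t, and +1 at or above t

def jain_index(xs):
    """Jain's fairness index [13]: 1 means perfectly equal allocations."""
    return sum(xs) ** 2 / (len(xs) * sum(x * x for x in xs))

# The reward reaches its cap of +1 at twice the QoS threshold and stays there:
t = 2.0 * math.log2(2.83)
assert abs(reward(t) - 1.0) < 1e-12 and abs(reward(t + 5.0) - 1.0) < 1e-12

# Equal per-user capacities are maximally fair; skewed ones are not:
assert abs(jain_index([4.0, 4.0, 4.0]) - 1.0) < 1e-12
assert jain_index([1.0, 1.0, 10.0]) < 1.0
```

The cap at +1 is what drives the fairness behavior in Fig. 6: once an agent exceeds twice its QoS threshold, raising its power further earns no extra reward, leaving headroom for the other agents in the cluster.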
As Figs. 4 and 5 show, both reward functions satisfy the required QoS of all members for all cluster sizes, but the normalized capacities of the users under CDP-Q are close to each other, whereas under EXP-Q they are much more diverse. This diversity of normalized capacity values under EXP-Q affects the fairness index. The fairness in each cluster is measured using Jain's fairness index [13] and is shown in Fig. 6: CDP-Q maintains fairness for all cluster sizes, while EXP-Q fails to provide fairness for large clusters. The total capacity of the clusters versus cluster size is shown in Fig. 7; CDP-Q provides higher capacity than EXP-Q for all cluster sizes.

Fig. 6: Jain's fairness index.
Fig. 7: Sum capacity of clusters.

VI. CONCLUSION

In this paper, a self-organized distributed power allocation algorithm was presented. The proposed algorithm reduces the optimization complexity by using a distributed clustering method and provides adaptability in power allocation through Q-learning. The proposed reward function satisfies the required QoS of the users for all resulting cluster sizes and outperforms the exponential reward function.

REFERENCES

[1] S. Rangan, T. S. Rappaport, and E. Erkip, "Millimeter-wave cellular wireless networks: Potentials and challenges," Proceedings of the IEEE, vol. 102, no. 3, pp. 366-385, March 2014.
[2] R. Baldemair, T. Irnich, K. Balachandran, E. Dahlman, G. Mildh, Y. Seln, S. Parkvall, M. Meyer, and A. Osseiran, "Ultra-dense networks in millimeter-wave frequencies," IEEE Commun. Mag., vol. 53, no. 1, pp. 202-208, January 2015.
[3] T. Bai and R. W. Heath, "Coverage in dense millimeter wave cellular networks," in Asilomar Conference on Signals, Systems and Computers, Nov. 2013, pp. 2062-2066.
[4] D. P. Kroese and Z. Botev, "Spatial process generation," Aug. 2013.
[5] M. Rebato, M. Mezzavilla, S. Rangan, F. Boccardi, and M. Zorzi, "Understanding noise and interference regimes in 5G millimeter-wave cellular networks," in 22nd European Wireless Conference, May 2016, pp. 1-5.
[6] S. Niknam and B. Natarajan, "On the regimes in millimeter wave networks: Noise-limited or interference-limited?" CoRR, Apr. 2018. [Online]. Available: https://arxiv.org/abs/1804.03618
[7] Z.-Q. Luo and W. Yu, "An introduction to convex optimization for communications and signal processing," IEEE J. Select. Areas Commun., vol. 24, no. 8, pp. 1426-1438, Aug. 2006.
[8] C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3, pp. 279-292, 1992. [Online]. Available: http://dx.doi.org/10.1007/BF00992698
[9] M. Demirbas, A. Arora, V. Mittal, and V. Kulathumani, "A fault-local self-stabilizing clustering service for wireless ad hoc networks," IEEE Trans. Parallel Distrib. Syst., vol. 17, no. 9, pp. 912-922, Sept. 2006.
[10] L. Panait and S. Luke, "Cooperative multi-agent learning: The state of the art," Autonomous Agents and Multi-Agent Systems, vol. 11, no. 3, pp. 387-434, Nov. 2005. [Online]. Available: https://doi.org/10.1007/s10458-005-2631-2
[11] R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, 1st ed. Cambridge, MA, USA: MIT Press, 1998.
[12] H. Saad, A. Mohamed, and T. ElBatt, "Distributed cooperative Q-learning for power allocation in cognitive femtocell networks," in Proc. IEEE Veh. Technol. Conf., Sept. 2012, pp. 1-5.
[13] R. Jain, D. Chiu, and W. Hawe, "A quantitative measure of fairness and discrimination for resource allocation in shared computer systems," CoRR, 1998. [Online]. Available: http://arxiv.org/abs/cs.NI/9809099