Dynamic Graph Configuration with Reinforcement Learning for Connected Autonomous Vehicle Trajectories
Udesh Gunarathna, Hairuo Xie, Egemen Tanin, Shanika Karunasekara, Renata Borovica-Gajic
pgunarathna@student.unimelb.edu.au, {xieh, etanin, karus, renata.borovica}@unimelb.edu.au
School of Computing and Information Systems, University of Melbourne

Abstract—Traditional traffic optimization solutions assume that the graph structure of road networks is static, missing opportunities for further traffic flow optimization. We are interested in optimizing traffic flows as a new type of graph-based problem, where the graph structure of a road network can adapt to traffic conditions in real time. In particular, we focus on the dynamic configuration of traffic-lane directions, which can help balance the usage of traffic lanes in opposite directions. The rise of connected autonomous vehicles offers an opportunity to apply this type of dynamic traffic optimization at a large scale. The existing techniques for optimizing lane-directions are, however, not suitable for dynamic traffic environments due to their high computational complexity and static nature. In this paper, we propose an efficient traffic optimization solution, called Coordinated Learning-based Lane Allocation (CLLA), which is suitable for dynamic configuration of lane-directions. CLLA consists of a two-layer multi-agent architecture, where the bottom-layer agents use a machine learning technique to find a suitable configuration of lane-directions around individual road intersections. The lane-direction changes proposed by the learning agents are then coordinated at a higher level to reduce the negative impact of the changes on other parts of the road network. Our experimental results show that CLLA can reduce the average travel time significantly in congested road networks. We believe our method is general enough to be applied to other types of networks as well.
Index Terms—Graphs, Spatial Database, Reinforcement Learning

I. INTRODUCTION

The goal of traffic optimization is to improve traffic flows in road networks. Traditional solutions normally assume that the structure of road networks is static regardless of how traffic can change in real time [1], [2]. A less common way to optimize traffic is to make limited changes to the road network itself; where such changes are in use, they are deployed at a very small scale. We focus on dynamic lane-direction changes, which can help balance the usage of traffic lanes in many circumstances, e.g., when the traffic lanes in one direction become congested while the traffic lanes in the opposite direction are underused [3], [4]. Unfortunately, the existing techniques for optimizing lane-directions are not suitable for dynamic traffic environments at large scale due to their high computational complexity [5]–[7]. We develop an efficient solution for optimizing lane-directions in highly dynamic traffic environments. Our solution is based on an algorithm that modifies the properties of a road network graph to improve traffic flow in the corresponding road network, introducing a new graph problem.

Figure 1: The impact of lane-direction change on traffic flow. There are 20 vehicles moving in the north-bound direction and 2 vehicles moving in the south-bound direction. (a) Traffic before lane-direction change. (b) Traffic after lane-direction change.

The impact of dynamic lane-direction configurations can be shown in the following example, where 20 vehicles are moving north-bound and 2 vehicles are moving south-bound (Figure 1) at a certain time. In Figure 1a, there are 4 north-bound lanes and 4 south-bound lanes. Due to the large number of vehicles and the limited number of lanes, the north-bound traffic is highly congested. At the same time, the south-bound vehicles are moving at a high speed as the south-bound lanes are almost empty.
Figure 1b shows the dramatic change of traffic flow after lane-direction changes are applied when congestion is observed, where the direction of lanes E, F and G is reversed. The north-bound vehicles are distributed into the additional lanes, resulting in a higher average speed of the vehicles. At the same time, the number of south-bound lanes is reduced to 1. Due to the low number of south-bound vehicles, the average speed of south-bound traffic is not affected. The lane-direction change helps improve the overall traffic efficiency in this case. This observation has been used by traffic engineers on certain road segments for many years, applied in a more static way. We aim to scale this to extreme levels in time and space.

The benefit of dynamic lane-direction changes can also be observed in preliminary tests, where we compare the average travel time of vehicles in two scenarios: one uses dynamic lane-direction configurations, the other uses static lane-direction configurations. The dynamic lane-direction configurations are computed with a straightforward solution (Section IV). The result shows that lane-direction changes reduce travel times by 14% on average when the traffic increases (see Figure 2). In traffic engineering terms this is a dramatic reduction.

Figure 2: The average travel time of vehicles when using static and dynamic lane-direction configurations (x-axis: the number of vehicles generated; y-axis: average travel time in minutes).

Despite their potential benefit, dynamic lane-direction changes cannot be easily applied to existing traffic systems as they require additional signage and safety features [8]. The emergence of connected autonomous vehicles (CAVs) [9], however, can make dynamic lane-direction changes a common practice in the future. Our previous work shows that CAVs have the potential to enable innovative traffic management solutions [10].
Compared to human-driven vehicles, CAVs are more capable of responding to a given command in a timely manner [6]. CAVs can also provide detailed traffic telemetry data to a central traffic management system in real time. This helps the system to adapt to dynamic traffic conditions.

We formulate lane allocation based on real-time traffic as a new graph problem, with the aim to find a new graph G'_t from a road network graph G (i.e., to dynamically optimize the graph) such that the total travel time of all vehicles in the road network is minimized. In order to optimize the flow of the whole network, all the traffic lanes in the network must be considered. In many circumstances, one cannot simply allocate more traffic lanes at a road segment for a specific direction when there is more traffic demand in that direction. This is because a lane-direction change at a road segment can affect not only the flow in both directions at that road segment but also the flow at other road segments. Due to the complexity of the problem, the computation time can be very high with the existing approaches, as they aim to find the optimal configurations based on linear programming [5]–[7], and hence are not suitable for frequent recomputation over large networks.

To address the above-mentioned issues, we propose a lightweight and effective framework, called the Coordinated Learning-based Lane Allocation (CLLA) framework, for optimizing lane-directions in dynamic traffic environments. The CLLA approach finds the configurations that effectively improve the traffic efficiency of the whole network, while keeping the computation cost of the solution low.
The key idea is that traffic optimization can be decoupled into two processes: (i) a local process that proposes lane-direction changes based on local traffic conditions around road intersections, and (ii) a global process that evaluates the proposed lane-direction changes based on their large-scale impact.

Figure 3: The hierarchical architecture of our traffic management solution based on lane-direction changes.

The architecture of our hierarchical solution is illustrated in Figure 3. The bottom layer consists of a set of autonomous agents that operate at the intersection level. An agent finds suitable lane-direction changes for the road segments that connect to a specific intersection. The agent uses reinforcement learning [11], which helps determine the best changes based on multiple dynamic factors. The agents send the proposed lane-direction changes to the upper layer, which consists of a coordinating agent. The coordinating agent maintains a data structure, named Path Dependency Graph (PDG), which is built based on the trip information of connected autonomous vehicles. With the help of the data structure, the coordinating agent evaluates the global impact of the proposed lane-direction changes and decides what changes should be made to the traffic lanes. The decision is sent back to the bottom-layer agents, which will make the changes accordingly.

The main contributions of our work are as follows:
• We formulate dynamic lane allocation as a new graph problem (the Dynamic Resource Allocation problem).
• We propose a hierarchical multi-agent solution (called CLLA) for efficient dynamic optimization of lane-directions that uses reinforcement learning to capture dynamic changes in the traffic.
• We introduce an algorithm and an innovative data structure (called the path dependency graph) for coordinating lane-direction changes at the global level.
• Extensive experimental evaluation shows that CLLA significantly outperforms other traffic management solutions, making it a viable tool for mitigating traffic congestion in future traffic networks.

II. RELATED WORK

A. Traffic Optimization Algorithms

Existing traffic optimization algorithms are commonly based on traffic flow optimization with linear programming [2], [12], [13]. These algorithms are suitable for situations where traffic demand and congestion levels are relatively static. When there is a significant change in the network, the optimal solutions normally need to be re-computed from scratch. Due to the high computational complexity of finding an optimal solution, the algorithms may not be suitable for highly dynamic traffic environments.

With the rise of reinforcement learning [14], a new generation of traffic optimization algorithms has emerged [15]–[17]. In reinforcement learning, a learning agent can find the rules to achieve an objective by repeatedly interacting with an environment. The interactive process can be modelled as a finite Markov Decision Process, which requires a set of states S and a set of actions A per state. Given a state s of the environment, the agent takes an action a. As a result of the action, the environment state may change to s' with a reward r. The agent then decides on the next action in order to maximize the reward in the next round. Reinforcement learning-based approaches can suggest the best actions for traffic optimization given a combination of network states, such as the queue size at intersections [18], [19]. They have an advantage over linear programming-based approaches since, if trained well, they can optimize traffic in a highly dynamic network. In other words, there is no need to re-train the agent when there is a change in the network. For example, Arel et al.
show that a multi-agent system can optimize the timing of adaptive traffic lights based on reinforcement learning [19]. Different to the existing approaches, our solution uses reinforcement learning for optimizing lane-directions.

A common problem with reinforcement learning is that the state space can grow exponentially when the dimensionality of the state space grows linearly. For example, let us assume that the initial state space only has one dimension, the queue size at intersections. If we add two dimensions to the state space, traffic signal phase and traffic lane configuration, there will be three dimensions, and the state space is four times as large as the original state space. The fast growth of the state space can make reinforcement learning unsuitable for real deployments. This problem is known as the curse of dimensionality [20]. A common way to mitigate the problem is to use a function approximator such as a neural network. Such techniques have mainly been used for dynamic traffic signal control [21], [22], while we extend the use of the technique to dynamic lane-direction configurations.

Many existing traffic optimization solutions use model-based reinforcement learning, where one needs to know the exact probability that a specific state transitions to another specific state as a result of a specific action [23], [24]. Nonetheless, such an assumption is unrealistic, since full knowledge of state transition probabilities can hardly be obtained for highly complex traffic systems. Different to model-based approaches, our optimization solution employs a model-free algorithm, Q-learning [25], which does not require such knowledge and hence is much more applicable to real traffic systems. Coordination of multi-agent reinforcement learning can be achieved through a joint state space or through a coordination graph [26]. Such techniques however require agents to be trained on the targeted network.
Since our approach uses an implicit mechanism to coordinate, once an agent is trained, it can be used in any road network.

B. Lane-direction Configurations

Research shows that dynamic lane-direction changes can be an effective way to improve traffic efficiency [3]. However, existing approaches for optimizing lane-directions are based on linear programming [5]–[7], [27], which is unsuitable for dynamic traffic environments due to its high computational complexity. For example, Chu et al. use linear programming to make lane-allocation plans by considering the schedule of connected autonomous vehicles [6]. Their experiments show that the total travel time can be reduced. However, the computation time grows exponentially when the number of vehicles grows linearly, which can make the approach unsuitable for highly dynamic traffic environments. Other approaches perform optimization based on two processes that interact with each other [5], [7], [27]. One process minimizes the total system cost by reversing lane directions, while the other process makes route decisions for individual vehicles such that all the vehicles can minimize their travel times. To find a good optimization solution, the two processes need to interact with each other iteratively. The high computational cost of these approaches can make them unsuitable for dynamic traffic optimization. Furthermore, all these approaches assume that exact knowledge of traffic demand over the time horizon is available beforehand; this assumption does not hold when traffic demand is stochastic [28]. On the contrary, CLLA is lightweight and can adapt to highly dynamic situations based on reinforcement learning. The learning agents can find effective lane-direction changes for individual road intersections even when traffic demand changes dramatically.

C. Traffic Management with Connected Autonomous Vehicles

Some recent developments of traffic management solutions are tailored for the era of connected autonomous vehicles. Talebpour and Mahmassani develop a traffic management model that combines connected autonomous vehicles and intelligent road infrastructure for improving traffic flow [29]. Guler et al. develop an approach to improve traffic efficiency at intersections using connected autonomous vehicles [30]. We use CAVs as an opportunity for lane optimization. To the best of our knowledge, we are the first to study dynamic lane-direction changes in large-scale networks in the era of connected autonomous vehicles.

III. PROBLEM DEFINITION

In this section, we formalize the problem of traffic optimization based on dynamic configuration of lane directions. Our problem is similar to the Network Design Problem (NDP) [31]; however, NDP assumes that the traffic demand for the whole time horizon is known at time zero, and the output network is designed for a common state. We instead configure a graph (road network) at regular time intervals based on real-time traffic; thus we name this problem the Dynamic Graph Resource Allocation problem.

Let G(V, E) be a road network graph, where V is a set of vertices and E is a set of edges. Let us assume that edge e ∈ E connects vertex x ∈ V and vertex y ∈ V. The edge has three properties. The first property is the total number of lanes, n_e, which is a constant number. The second property is the number of lanes that start from x and end in y, n_e1. The third property is the number of lanes in the opposite direction (from y to x), n_e2. n_e1 and n_e2 can change, but they are always subject to the following constraint:

n_e1 + n_e2 = n_e    (1)

We assume that a CAV follows a pre-determined path based on an origin-destination (O-D) pair. Let the number of unique O-D pairs of the existing vehicles be k at a given time t.
For the i-th (i ≤ k) O-D pair, let d_{i,t} be the traffic demand at time t, i.e., the number of vehicles with the same O-D pair at that time. The traffic demand can be stochastic. Let the travel time of vehicle j with the i-th O-D pair be TT_{i,j}, which is the duration for the vehicle to move from the origin to the destination. For a given time t, the average travel time of all the vehicles that will reach their destinations during a time period T after t can be defined as

ATT = ( Σ_{i=1}^{k} Σ_{j=1}^{m_i} TT_{i,j} ) / ( Σ_{i=1}^{k} m_i )    (2)

where m_i is the number of vehicles with the i-th O-D pair that will complete their trips between t and t + T.

We define and solve a version of the problem where, at frequent regular intervals, we optimize travel time while changing the lane arrangement on all edges. We find a new graph G'_t(V, E') at a given time t from the previous graph G at the previous time step. Let e'_1, e'_2 ∈ E', where e'_1 connects vertex x to vertex y and e'_2 connects vertex y to vertex x. We find, for all edges, the values of n_{e'_1} and n_{e'_2} such that the average travel time ATT is minimized. We call this the Dynamic Resource Allocation problem.

IV. DEMAND-BASED LANE ALLOCATION (DLA)

When considering dynamic lane-direction changes, a straightforward solution can use a centralized approach to optimize lane-directions based on full knowledge of traffic demand, i.e., the number of vehicle paths that pass through the road links. We call this solution Demand-based Lane Allocation (DLA). Algorithm 1 shows the implementation (in pseudocode) of such an idea to compute the configuration of lane-directions. DLA allocates more lanes to a specific direction when the average traffic demand per lane in that direction is higher than the average traffic demand per lane in the opposite direction. To specify the directions, we define two terms, upstream and downstream. The terms are defined as follows.
Let us assume that all the vertices of the road network graph are ordered by their identification numbers. Given two vertices, v1 and v2, and a direction that points from v1 to v2, we say that the direction is upstream if v1 is lower than v2, or downstream if v1 is higher than v2.

DLA first computes the traffic demand at the edges that are on the paths of the vehicles (Lines 1-6). The traffic demand is computed for the upstream direction (up_e) and the downstream direction (down_e) separately. Then it quantifies the difference in the average traffic demand per lane between the two directions (Lines 9-11). Based on the difference between the two directions, DLA decides whether the number of lanes in a specific direction needs to be increased (Lines 11-14). We should note that increasing the number of lanes in one direction implies that the number of lanes in the opposite direction is reduced. DLA only reduces the number of lanes in a direction if the traffic demand in that direction is lower than a threshold (Line 12). The complexity of the algorithm is O(k|E|), where |E| is the number of edges in G and k is the number of O-D pairs.

While straightforward to implement and effective, there are two notable drawbacks to DLA. First, the algorithm does not consider real-time traffic conditions, such as the queue length at a given time, during optimization; the only information used for optimization is the (assumed a priori known) traffic demand, and exact knowledge of traffic demand is difficult to obtain in dynamic road networks [28]. This can make the lane-direction configuration less adaptive (and less applicable) to real-time traffic conditions. Second, the lane-direction optimization for individual road segments is performed individually, without considering the potential impact of a lane-direction change at a road segment on other road segments in the same road network.
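The per-edge demand-gap rule described above can be sketched in runnable form as follows. This is a minimal illustration, not the paper's implementation: the `Edge` class, the `(demand, hops)` path encoding, and the one-lane lower bound on each direction are our own assumptions.

```python
# Illustrative sketch of the DLA demand-gap rule (Algorithm 1).
from dataclasses import dataclass

@dataclass
class Edge:
    up_lanes: int          # lanes in the upstream direction (assumed >= 1)
    down_lanes: int        # lanes in the downstream direction (assumed >= 1)
    up_demand: float = 0.0
    down_demand: float = 0.0

def dla_step(edges, paths, th, g):
    """edges: {edge_id: Edge}; paths: list of (demand, [(edge_id, 'up'|'down'), ...])."""
    # Lines 1-6: accumulate directional demand along every vehicle path.
    for demand, hops in paths:
        for edge_id, direction in hops:
            e = edges[edge_id]
            if direction == 'up':
                e.up_demand += demand
            else:
                e.down_demand += demand
    # Lines 7-16: rebalance lanes where the per-lane demand gap is large enough.
    for e in edges.values():
        up_per_lane = e.up_demand / e.up_lanes
        down_per_lane = e.down_demand / e.down_lanes
        if up_per_lane + down_per_lane == 0:
            continue  # no demand on this edge, nothing to rebalance
        gap = (down_per_lane - up_per_lane) / (up_per_lane + down_per_lane)
        if min(e.up_demand, e.down_demand) < th:
            if gap > g and e.up_lanes > 1:        # downstream is busier
                e.up_lanes -= 1
                e.down_lanes += 1
            elif gap < -g and e.down_lanes > 1:   # upstream is busier
                e.down_lanes -= 1
                e.up_lanes += 1
```

For instance, with a single 4+4-lane edge carrying 20 upstream and 2 downstream vehicles (as in Figure 1), one step of this rule reassigns a downstream lane to the upstream direction.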
Therefore, a lane-direction change that helps improve traffic efficiency at one road link may lead to a decrease of traffic efficiency in other parts of the road network.

V. COORDINATED LEARNING-BASED LANE ALLOCATION (CLLA)

To tackle the problems of the straightforward solution, we propose a fundamentally different solution, the Coordinated Learning-based Lane Allocation (CLLA) framework. CLLA uses a machine learning technique to help optimize lane-direction configurations, which allows the framework to adapt to a high variety of real-time traffic conditions. In addition, CLLA coordinates the lane-direction changes by considering the impact of a potential lane-direction change on different parts of the road network. DLA, on the other hand, does not consider the global impact of lane-direction changes. Another difference between the two is that DLA requires the full paths of vehicles to be known for computing traffic demand. As detailed later, CLLA only needs to know partial information about vehicle paths, in addition to certain information about real-time traffic conditions, such as intersection queue lengths and the lane configurations of road segments, which can be obtained from inductive-loop traffic detectors.

CLLA uses a two-layer multi-agent architecture. The bottom layer consists of learning agents that are responsible for optimizing the direction of lanes connected to specific intersections. Using the multi-agent approach can significantly boost the speed of learning. The lane-direction changes that are decided by the learning agents are aggregated and evaluated by a coordinating agent at the upper layer, which will send the globally optimized lane-direction configuration to the bottom layer for making the changes.

A more detailed overview of CLLA is shown in Figure 4. As the figure shows, an agent in the bottom layer only observes the local traffic condition around a specific intersection.
Algorithm 1: Demand-based Lane Allocation (DLA)
Input: A road network graph G(V, E).
Input: The set of paths. A path is a sequence of edges on the shortest path between a specific origin-destination (O-D) pair. The set of paths includes the paths of all unique O-D pairs.
Input: The demands associated with the paths, where a demand is the number of vehicles that follow a specific path.
Input: th: demand threshold.
Input: g: minimal gap in traffic demand that can trigger a lane-direction change.

1  foreach p ∈ paths do
2      foreach e ∈ p do
3          if p passes e in the upstream direction then
4              up_e += demand of p
5          if p passes e in the downstream direction then
6              down_e += demand of p
7  foreach e ∈ E do
8      minLoad ← min(up_e, down_e)
9      down'_e ← down_e / number of downstream lanes
10     up'_e ← up_e / number of upstream lanes
11     gap ← (down'_e − up'_e) / (up'_e + down'_e)
12     if minLoad < th then
13         if gap > g then
14             move one upstream lane to the set of downstream lanes
15         if gap < −g then
16             move one downstream lane to the set of upstream lanes

Figure 4: An overview of the CLLA architecture (bottom layer: reinforcement learning agents, each observing the area around one intersection; upper layer: the coordinating agent, which maintains the path dependency graph and receives the O-D pairs of users and the proposed lane-direction changes).

Agents make decisions on lane-direction changes independently. Whenever an agent needs to make a lane-direction change, it sends the proposed change to the coordinating agent in the upper layer. The agents also send certain traffic information to the upper layer periodically. The information can help indicate whether there is an imbalance between upstream traffic and downstream traffic at specific road segments. The coordinating agent evaluates whether a change would be beneficial at the global level. The evaluation process involves a novel data structure, Path Dependency Graph (PDG), to inform decisions sent from the bottom layer.
The coordinator may allow or deny a lane-direction change request from the bottom layer. It may also decide to make further changes to lane-directions in addition to the requested changes. After evaluation, the coordinating agent informs the bottom-layer agents of the changes to be made.

Figure 5: The CLLA communication timeline between the agents and the coordinator (requests from agents are buffered and the lane-direction changes are issued once per interval T).

We should note that the coordinator does not need to evaluate a lane-direction change request as soon as it arrives. As shown in Figure 5, the coordinator evaluates the lane-direction changes periodically. The time interval between the evaluations is T. All the requests from the bottom-layer agents are buffered during the interval. The exact value of the interval needs to be adjusted case by case. A short interval may increase the computational cost of the solution. A long interval may decrease the effectiveness of the optimization.

A. CLLA Algorithm

Algorithm 2 shows the entire optimization process of CLLA. During one iteration of the algorithm, each learning agent finds the lane-direction changes around a specific road intersection using the process detailed in Section V-B. Each proposed change is stored as an edge-change pair, which is buffered in the system (Line 5). When it is time to evaluate the proposed changes, the system uses the Direction-Change Evaluation algorithm (Section V-C) to quantify the conflicts between the proposed changes (Line 8). For example, when a learning agent proposes to increase the number of upstream lanes on road segment s1 while another agent proposes a lane-direction change on a different road segment s2 that can lead to an increase of the downstream traffic flow on s1, there is a conflict between the proposed changes for s1. The Direction-Change Evaluation algorithm also expands the set of proposed changes with further changes that may be beneficial.
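The periodic buffering behaviour shown in Figure 5 can be sketched as follows. The tick-based clock and the `evaluate` callback, which stands in for the Direction-Change Evaluation algorithm, are illustrative assumptions rather than the paper's implementation.

```python
# Illustrative sketch of the coordinator's periodic request buffering (Figure 5).
class Coordinator:
    def __init__(self, interval, evaluate):
        self.interval = interval   # evaluation period T, measured in ticks
        self.evaluate = evaluate   # callback standing in for Direction-Change Evaluation
        self.buffer = []           # pending edge-change requests from agents
        self.tick = 0

    def submit(self, edge_id, change):
        """Called by a bottom-layer agent; change: +1 = upstream, -1 = downstream."""
        self.buffer.append((edge_id, change))

    def step(self):
        """Advance the clock; evaluate buffered requests once every T ticks."""
        self.tick += 1
        approved = []
        if self.tick % self.interval == 0:
            approved = self.evaluate(self.buffer)
            self.buffer = []       # requests are consumed at each evaluation
        return approved
```

A long `interval` batches more requests per evaluation (cheaper but less responsive), while a short one does the opposite, mirroring the trade-off discussed above.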
Upon returning from the Direction-Change Evaluation algorithm, CLLA checks the expanded set of edge-change pairs. For each edge-change pair, if the number of conflicts for the edge is below a given limit, the change is applied to the edge (Line 12).

Algorithm 2: Coordinated Lane Allocation (CLLA)
Input: T, action evaluation interval.
Input: EC_initial, set of edge-change pairs proposed by the learning agents.
Input: EC_expanded, set of edge-change pairs given by the coordinator.
Input: mc, the maximum number of conflicts between lane-direction changes before a proposed lane-direction change can be applied to an edge.
Input: G(V, E), a road network graph.
Input: l, the lookup distance for building the Path Dependency Graph.
Input: dp, the maximum search depth in the Path Dependency Graph for evaluating lane-direction changes.

1  while True do
2      foreach agent ∈ Agents do
3          determine the best lane-direction change for all the edges (road segments) that connect to the vertex (road intersection) controlled by the agent
4          foreach edge e that needs a lane-direction change do
5              EC_initial.insert({e, change})
6      if T = t then
7          t ← 0
8          EC_expanded ← Direction-Change Evaluation(EC_initial, G, l, dp)
9          EC_initial ← ∅
10         foreach {e, change} in EC_expanded do
11             if number of conflicts on e is less than mc then
12                 apply the lane-direction change to e
13     t ← t + 1

B. Learning-based Lane-direction Configuration

In the CLLA framework, the bottom-layer agents use the Q-learning technique to find suitable lane-direction changes based on real-time traffic conditions. Q-learning aims to find a policy that maps a state to an action. The algorithm relies on an action value function, Q(s, a), which computes the quality of a state-action combination. In Q-learning, an agent tries to find the optimal policy that leads to the maximum action
value. Q-learning updates the action value function using an iterative process, as shown in Equation 3:

Q_t^new(s, a) = (1 − α) Q_t(s, a) + α ( r_{t+1} + γ max_a Q(s_{t+1}, a) )    (3)

where s is the current state, a is a specific action, s_{t+1} is the next state as a result of the action, max_a Q(s_{t+1}, a) is the estimated optimal action value in the next state, r_{t+1} is the observed reward at the next state, α is the learning rate and γ is the discount factor. In CLLA, the states, actions and rewards used by the learning agents are defined as follows.

1) States: A learning agent works with four types of states. The first state represents the current traffic signal phase at an intersection. The second state represents the queue length of incoming vehicles that are going to pass the intersection without turning. The third state represents the queue length of incoming vehicles that are going to turn at the intersection. The fourth state represents the queue length of outgoing vehicles, i.e., the vehicles that have passed the intersection. Although it is possible to add other types of states, we find that the combination of these four states works well for traffic optimization.

2) Actions: There are three possible actions: increasing the number of upstream lanes by 1, increasing the number of downstream lanes by 1, or keeping the current configuration. When the number of lanes in one direction is increased, the number of lanes in the opposite direction is decreased at the same time. Since a learning agent controls a specific road intersection, the agent determines the action for each individual road segment that connects with the intersection. An agent is allowed to take a lane-changing action only when there is a traffic imbalance on the road segment (see Equation 4 for the definition of traffic imbalance).

3) Rewards: We define the rewards based on two factors. The first factor is the waiting time of vehicles at an intersection. When the waiting time decreases, there is generally an improvement of traffic efficiency.
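The tabular update in Equation 3 can be sketched as follows, assuming hashable states and the three lane-change actions defined above. The class structure, zero-initialized table, and parameter values are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of the tabular Q-learning update in Equation 3.
from collections import defaultdict

class QAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)   # Q(s, a), zero-initialized
        self.actions = actions        # e.g. (+1, 0, -1) lane-direction actions
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor

    def update(self, s, a, reward, s_next):
        # max_a' Q(s_{t+1}, a'): the estimated optimal value of the next state.
        best_next = max(self.q[(s_next, a2)] for a2 in self.actions)
        # Equation 3: blend the old estimate with the bootstrapped target.
        self.q[(s, a)] = ((1 - self.alpha) * self.q[(s, a)]
                          + self.alpha * (reward + self.gamma * best_next))

    def best_action(self, s):
        return max(self.actions, key=lambda a: self.q[(s, a)])
```

Because the update is model-free, the agent needs only observed (s, a, r, s') transitions, with no state-transition probabilities, which is the property the paper relies on for applicability to real traffic systems.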
Hence, the rewards should consider the difference between the current waiting time and the updated waiting time of all the vehicles that are approaching the intersection. The second factor is the difference between the lengths of the vehicle queues at the different approaches to an intersection. When the queue length of one approaching road is significantly longer than the queue length of another approaching road, there is a higher chance that the traffic becomes congested on the former road. Therefore we need to penalize the actions that increase the difference between the longest queue length and the shortest queue length. The following reward function combines the two factors. A parameter β is used to weight the two factors. We normalize the two factors to stabilize the learning process by limiting the reward function to values between −1 and 1. To give equal priority to both factors, we set β to 0.5 in the experiments.

R = (1 − β) × (Current wait time − Next wait time) / max(Next wait time, Current wait time) − β × (Queue length difference / Aggregated road capacity)

C. Coordination of Lane-direction Changes

We develop the coordinating process based on the observation that a locally optimized lane-direction change may conflict with the lane-direction changes that happen in the surrounding areas. A conflict can happen due to the fact that the effect of a lane-direction change can spread from one road segment to other road segments. For example, let us assume that a constant portion of the upstream traffic that passes through road segment x will also pass through road segment y in the upstream direction later on. An increase of the upstream lanes on x can lead to a significant increase of upstream traffic on x due to the increased traffic capacity in that direction.
Over time, the traffic volume change on x can lead to an increase of the upstream traffic on y, which implies that the number of upstream lanes at y may need to be increased to suit the change of traffic volume. In this case, the lane-direction change at y can be seen as a consequential change caused by the change at x. However, the learning agent that controls the lane-directions at y may suggest an increase of downstream lanes based on the current local traffic condition at y. If this is the case, the locally optimized change will conflict with the consequential change.

The key task of the coordinating process is quantifying such conflicts in road networks. If there are a large number of conflicts at a road segment, the locally optimized change should not be applied because it may later have a negative impact on the traffic flows at the global level. This is a key idea behind the coordination process of our solution. As shown in Section V-A, our solution applies a proposed lane-direction change to a road segment only when the number of conflicts is below a given threshold.

To help identify the conflicts between lane-direction changes, we develop a novel data structure, named Path Dependency Graph (PDG). The data structure maintains several types of traffic information, including the path of traffic flow, the proposed lane-direction changes and the current traffic conditions. The coordinating agent uses the PDG to search for the locations of consequential lane-direction changes. The conflicts between lane-direction changes are then identified by comparing the consequential lane-direction changes and the proposed lane-direction changes at the same locations. The coordinating agent also proposes additional lane-direction changes using the PDG.

A PDG, PDG(V_PDG, E_PDG), consists of a number of vertices and a number of directional edges. A vertex v ∈ V_PDG represents a road segment.
An edge connects two vertices whose corresponding road segments both appear in the path of a vehicle. A vertex can connect to a number of out-degree edges and a number of in-degree edges. The direction of an edge depends on the order in which the traffic flow goes through the two road segments. An edge that starts from vertex v1 and ends in vertex v2 shows that the traffic flow passes through v1's corresponding road segment first and then passes through v2's corresponding road segment. We should also note that the two road segments linked by an edge do not have to share a common road intersection, i.e., they can be disjoint. Given the paths of all the vehicles, a PDG can be constructed such that all the unique road segments on the vehicle paths have corresponding vertices in the graph. For each pair of road segments on a vehicle path, there is a corresponding edge in the graph. If the paths of two or more vehicles go through the same pair of road segments, there is only one corresponding edge in the graph.

A vertex of a PDG has the following properties.

• Proposed Change: The proposed lane-direction change at the corresponding road segment. This may be submitted by a learning agent or created by the system during the coordinating process. The property value can be 1, 0 or −1. A value of 1 means the upstream direction gets one more lane. A value of 0 means there is no need for a change. A value of −1 means the downstream direction gets one more lane.

• Consequential Changes: A list of potential lane-direction changes caused by lane-direction changes at other road segments. Similar to the Proposed Change property, the value of a consequential change can be 1, 0 or −1.

• Imbalance: The lane direction that has a considerably higher traffic load than the other direction. The property value can be upstream, downstream or none.
In our implementation, the imbalance of traffic load is measured based on the queue lengths in the opposite directions. Let q_up be the upstream queue length and q_down be the downstream queue length. Let the total queue length in both directions be q_total. Let P be a threshold percentage. The property value is computed as follows.

Imbalance = upstream,   if q_up / q_total > P
            downstream, if q_down / q_total > P
            none,       otherwise                    (4)

Due to the dynamic nature of traffic, the imbalance value may change frequently, leading to frequent changes of lane-directions. This may not be ideal in practice. One can obtain a steadier imbalance value by adding certain restrictions to the computation. For example, one may require that the ratio between the upstream queue length and the total queue length must be above the threshold for a certain period of time before setting the imbalance value to upstream.

• Current Lane Configuration: The number of upstream lanes and the number of downstream lanes in the corresponding road segment.

An edge of a PDG has a property called impact, which shows whether a lane-direction change at the starting vertex can lead to the same change at the ending vertex. The value of this property can be 1 or −1. A value of 1 means the change at both vertices will be the same. For example, if the change at the starting vertex is increasing the number of upstream lanes, the change at the ending vertex will also be increasing the number of upstream lanes. A value of −1 means the changes at the two vertices will be opposite to each other. The relationship between the changes and the property value is shown in Equation 5, where the starting vertex is v1 and the ending vertex is v2. The property value is determined based on the path of the majority of the vehicles that move between the two corresponding road segments. If the path passes through both road segments in the same direction, the property value is 1. Otherwise, the property value is −1.
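The imbalance test (Equation 4) and the impact value described above can be sketched as follows, assuming queue lengths are observed per road segment and P = 0.65 as in our experiments; the function names are our own illustration.

```python
P = 0.65  # imbalance threshold percentage (the value used in our experiments)

def imbalance(q_up, q_down):
    """Equation 4: classify the traffic imbalance on a road segment."""
    q_total = q_up + q_down
    if q_total == 0:
        return "none"          # no queued vehicles, so no imbalance
    if q_up / q_total > P:
        return "upstream"
    if q_down / q_total > P:
        return "downstream"
    return "none"

def impact(same_direction):
    """Impact value of a PDG edge: 1 if the majority path passes through
    both segments in the same direction, -1 otherwise."""
    return 1 if same_direction else -1
```

In practice the imbalance would additionally be required to persist for a minimum period, as discussed above, before a change is allowed.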
The impact property is key for finding the consequential change at the ending vertex given the change at the starting vertex. As shown in Equation 6, the consequential change at the ending vertex can be computed from the property value and the initial change at the starting vertex.

impact(v1, v2) = change_v1 × change_v2   (5)

change_v2 = impact(v1, v2) × change_v1   (6)

When constructing a PDG, it may not be necessary to consider the full path of vehicles, for two reasons. First, the full path of a vehicle can consist of a large number of road segments, and since an edge is created for each pair of segments on a path, the size of the graph grows quadratically with the path length. Second, due to the highly dynamic nature of traffic, the coordination of lane-direction changes should only consider the traffic conditions in the near future. Therefore, in our implementation, we set an upper limit on the number of road segments in vehicle paths when building a PDG. The limit is called the lookup distance in our experiments.

We show an example road network (Figure 6a) and its corresponding PDG (Figure 6b). The road network has 12 road segments (A to L). There are two paths going through the network, path α and path β. Path α passes through 4 road segments (A, F, I, J). Path β passes through 3 road segments (C, F, H). The 6 unique road segments on the two paths correspond to 6 vertices in the PDG. The PDG contains 3 edges starting from A (A-F, A-I, A-J) because path α passes through A, F, I and J in the road network. Similarly, the PDG contains 2 edges starting from F. For each edge in the PDG, the value of its impact property is attached to the edge. As path α goes through the segments A, F, I and J in the upstream direction (Figure 6a), the impact value of all the edges between the corresponding vertices is 1 in the PDG (Figure 6b). In contrast, the impact value of the edge C-H is −1 in the PDG. This is because path β goes through C in the upstream direction but goes through H in the downstream direction.
Figure 6: (a) A simple road network with two paths, path α and path β. (b) The path dependency graph (PDG) based on the road network in Figure 6a.

The coordinator uses the Direction-Change Evaluation algorithm (Algorithm 3) to quantify the conflicts between lane-direction changes. The algorithm traverses a PDG in a breadth-first manner, in iterations. The number of iterations is controlled by a depth parameter (shown as dp in Algorithm 3). In the first iteration, the algorithm starts with the lane-direction changes that are proposed by the bottom-layer learning agents. For each vertex with a proposed change, its first-depth neighbours (out-degree nodes) are visited (Step 6). For each of the neighbours, the consequential change caused by the proposed change is computed, using the process shown in Equation 6. Then the algorithm updates the count of conflicts at the neighbour's corresponding road network edge. In the next iteration, the algorithm starts with all the neighbour vertices that were visited in the previous iteration. After each iteration, dp is decremented. The algorithm stops when dp reaches zero.

The Direction-Change Evaluation algorithm not only quantifies the conflicts between lane-direction changes but also expands the set of lane-direction changes for the road segments that are visited during the traversal of the PDG. The rationale is that the bottom-layer learning agents may not propose lane-direction changes for road segments when they do not predict any benefit of the change based on local traffic conditions. However, the lane-direction changes in other parts of the road network may eventually affect the traffic conditions at these road segments, leading to traffic congestion.
The algorithm pre-emptively attempts lane-direction changes for these road segments when it predicts that there can be consequential changes caused by the changes in other parts of the road network. This can help mitigate upcoming traffic congestion. As shown in Step 6 of Algorithm 3, a Direction-Change Creation algorithm is used for proposing additional lane-direction changes. Details of the Direction-Change Creation algorithm are shown in Algorithm 4. Every time the coordinator executes Direction-Change Evaluation, a new lane configuration is computed and a new graph G'_t(V, E') is generated.

Complexity of the Coordinating Process. Let us assume there are m requests from the bottom-layer agents. The degree of a node in the PDG is deg(v), where v ∈ V_PDG. The algorithm performs m breadth-first traversals of the PDG to a certain depth dp, so the complexity of the m traversals is O(m · deg(v)^dp). However, according to Lemma A.1, deg(v) is independent of the road network size and the number of paths for a given lookup distance l. The depth dp is constant irrespective of the road network size. The algorithm complexity can therefore be reduced to O(m); that is, it is linear in the number of requests from agents within the buffering period. In the worst case, there can be a request for each road segment of the road network G(V, E), leading to a complexity of O(|E|).

Distributed version. Algorithm 3 can work with a set of distributed coordinating agents in the upper layer. In Algorithm 3, execution is independent of the order of the requests coming from the bottom-layer agents. Therefore, the requests can be processed in a distributed manner. Every coordinating agent traverses its first depth and informs the other agents of the changes. Once every agent finishes its first depth, all coordinating agents start their second depth, and so on. In such a setting, the complexity of the algorithm is O(1).
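The breadth-first evaluation described above can be sketched as follows. This is a simplified illustration over the PDG edge map (assumed to be {(v1, v2): impact}); where Algorithm 3 calls Direction-Change Creation for unassigned vertices, this sketch simply adopts the consequential change.

```python
from collections import defaultdict

def evaluate_changes(pdg, proposed, dp):
    """Simplified Direction-Change Evaluation.

    pdg:      {(v1, v2): impact} with impact in {1, -1}
    proposed: {vertex: change} with change in {1, -1}, from learning agents
    dp:       depth of the breadth-first traversal
    Returns (conflict counts per vertex, expanded change set).
    """
    conflicts = defaultdict(int)
    consequential = defaultdict(list)
    frontier = set(proposed)
    changes = dict(proposed)
    while dp > 0 and frontier:
        next_frontier = set()
        for v in frontier:
            for (v1, v2), imp in pdg.items():
                if v1 != v:
                    continue
                conseq = imp * changes[v]           # Equation 6
                if conseq not in consequential[v2]:
                    consequential[v2].append(conseq)
                if v2 in changes and changes[v2] != conseq:
                    conflicts[v2] += 1              # proposed vs consequential
                else:
                    changes.setdefault(v2, conseq)  # expand the change set
                next_frontier.add(v2)
        frontier = next_frontier
        dp -= 1
    return conflicts, changes
```

A change at a vertex would then be applied only when its conflict count stays below the given threshold.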
In this work, we implemented the centralized version; however, when applied to larger road networks, the distributed version can be implemented.

VI. EXPERIMENTAL METHODOLOGY

We compare the proposed algorithm, CLLA, against DLA and two other baseline algorithms using traffic simulations. We evaluate the performance of the algorithms on two road networks based on real traffic data. The effects of the individual parameters of CLLA and DLA are also evaluated. The rest of the section details the settings of the experiments.

A. Experimental setup

Simulation Model. We simulate a traffic system similar to the ones used for reinforcement learning-based traffic optimization in [23], [32]. In our implementation, vehicles on a road link are modelled based on travel time, which is the sum of two values, pure transmit time and waiting time. Pure transmit time is the time taken by a vehicle to travel through the road link at the free-flow speed. Waiting time is the duration that a vehicle waits in a traffic signal queue. When the direction of a lane needs to be changed, all existing vehicles in the lane need to leave the lane and move into an adjacent lane in the same direction. Vehicles travelling in the opposite direction can use the lane only after it is cleared of traffic.

Algorithm 3: Direction-Change Evaluation
Input: EC_initial, a set of edge-change pairs proposed by the learning agents
Input: G(V, E), a road network graph. Each edge in the graph has a property, conflict count, an integer value that is set to 0 initially.
Input: l, the lookup distance of the PDG
Input: dp, the depth of search
Output: EC_expanded, a set of edge-change pairs given by the coordinator
1 Build a PDG based on the next l road segments on the path of vehicles. For each PDG vertex, its properties, proposed change and consequential changes, are set to empty values initially.
2 Create an empty set N. For each edge-change pair in EC_initial, find the corresponding vertex v in the PDG and update its proposed change property. Add v to N.
3 Set the current depth of search to dp.
4 If the current depth is above 0, do the following steps. Otherwise, jump to Step 8.
5 Create an empty set N'.
6 For each v in N, first check whether v has a proposed change. If not, get a proposed change for v using the Direction-Change Creation algorithm. Then for each of v's neighbours at the end of its out-degree arcs, v_o, identify the consequential change at that vertex that is caused by the proposed change at v. Add the consequential change to the consequential changes of v_o if the change does not exist in the list. If v_o already has a proposed change but the proposed change is different from the consequential change at v_o, increase the conflict count of the corresponding road network edge by 1. Add v_o to N'.
7 Decrease the current depth of search by 1. Replace the vertices in N with the vertices in N'. Go back to Step 4.
8 For each PDG vertex v with a proposed change, create a corresponding edge-change pair and add the pair to EC_expanded. Exit the algorithm.

Road Networks. We run experiments based on real taxi trip data from New York City [33]. The data includes the source, the destination and the start time of the taxi trips in the city. We pick two areas for simulation (Figure 7a and Figure 7b) because these areas contain a larger number of sources and destinations than other areas. The road network of the simulation areas is loaded from OpenStreetMap [34]. For a specific taxi trip, the source and the destination are mapped to the nearest OpenStreetMap nodes. The shortest path between the source and the destination is calculated. The simulated vehicles follow the shortest paths generated from the taxi trip data.

Algorithm 4: Direction-Change Creation
Input: v, a PDG vertex that corresponds to an edge e in a road network graph. The value of the imbalance property is set to none initially.
Output: change, the proposed lane-direction change for e, which can be 1 (upstream), 0 (none) or −1 (downstream). The default value is 0.
1 consequential_up: whether the consequential changes at v include one that increases the number of upstream lanes.
2 consequential_down: whether the consequential changes at v include one that increases the number of downstream lanes.
3 if imbalance = upstream then
4   if consequential_up = True and consequential_down = False then
5     change ← 1 (change one lane from downstream to upstream)
6 if imbalance = downstream then
7   if consequential_up = False and consequential_down = True then
8     change ← −1 (change one lane from upstream to downstream)
9 if imbalance = none then
10   if consequential_up = True and downstream has more lanes than upstream then
11     change ← 1 (change one lane from downstream to upstream)
12   if consequential_down = True and upstream has more lanes than downstream then
13     change ← −1 (change one lane from upstream to downstream)

Figure 7: The road network of the two simulation areas in New York: (a) Long Island (LI), (b) Midtown Manhattan (MM).

Comparison baselines. Unlike the proposed solution, CLLA, the existing approaches for optimizing lane-directions are based on linear programming, which makes them unsuitable for large-scale dynamic optimization due to their high computation cost. Due to the lack of comparable solutions, we define three baseline solutions to compare against CLLA. In our experiments, the traffic signals use static timing and phasing, regardless of which solution is used. We conduct comparative tests against the following solutions:

• No Lane-direction Allocations (no-LA): This solution does not make any lane-direction change. The traffic is controlled by static traffic signals only.
• Demand-based Lane Allocations (DLA): In this solution, the lane-direction changes are computed with Algorithm 1.

• Local Lane-direction Allocations (LLA): This solution uses multiple learning agents to decide lane-direction changes. The optimization is performed using the approach described in Section V-B. LLA is similar to CLLA but there is no coordination between the agents.

• Coordinated Learning-based Lane Allocations (CLLA): This is the two-layer optimization framework described in Section V-A.

B. Evaluation Metrics

We measure the performance of the solutions based on the following metrics.

Average travel time: The travel time of a vehicle is the duration that the vehicle spends on travelling from its source to its destination. We compute the average travel time based on all the vehicles that complete their trips during a simulation. A higher average travel time indicates that the traffic is more congested during the simulation. Our proposed solutions aim to reduce the average travel time. More information about this metric is given in Section III.

Deviation from free-flow travel time: The free-flow travel time of a vehicle is the shortest possible travel time, achieved when the vehicle travels at the speed limit of the roads without slowing down at traffic lights during its entire trip. The Deviation From Free-flow travel Time (DFFT) is defined as in Equation 7, where t_a is the actual travel time and t_f is the free-flow travel time. The lowest value of DFFT is 1, which is also the best value that a vehicle can achieve.

DFFT = t_a / t_f   (7)

C. Parameter Sensitivity Testing

We evaluate the effects of the hyper-parameters of CLLA and DLA that are directly related to lane-direction changes in the simulation model. To evaluate the effects of a specific parameter, we run a set of tests varying the value of that parameter (while keeping the values of the other parameters at their defaults, reported in Table I).
The average travel time is reported for each of the tests. The detailed settings of the parameters are shown in Table I. We describe the parameters as follows.

Cost of lane-direction change in CLLA: The cost of a lane-direction change is the time spent on clearing the lane that needs to be changed. When the direction of a lane needs to be changed, all the existing vehicles in the lane need to leave the lane before the lane can be used by the vehicles from the opposite direction. The time spent on clearing the lane can vary due to various random factors in the real world. For example, the vehicles in the lane may not be able to move to an adjacent lane immediately if the adjacent lane is highly congested. We vary the value of this parameter over a large range, from 40 seconds to 480 seconds.

Table I: Settings used in the parameter sensitivity experiments

Parameter | Range | Default value
Cost of lane-direction change in CLLA and DLA (seconds) | 40-480 | 120
Aggressiveness of lane-direction change in CLLA (seconds) | 100-1000 | 300
Depth in CLLA | 1-5 | 2
Lookup distance in CLLA | 3-7 | 5
Update period in CLLA (minutes) | 0.3-20 | 0.3
Update period in DLA (minutes) | 2.5-20 | 10

Aggressiveness of lane-direction change in CLLA: This parameter affects the minimum interval between lane-direction changes. A lane-direction change can only happen when there is a traffic imbalance between the two directions of a road segment. The imbalance is computed based on the model shown in Equation 4 (Section V-C). Based on an existing study [3], we set the threshold percentage P of the model to 65% and require that the traffic imbalance must last for a minimum time period before a lane-direction change can be performed. We define the aggressiveness of lane-direction changes in CLLA as the length of this period. When the period is short, the system can perform lane-direction changes at smaller intervals, and vice versa.

Depth in CLLA: This is the parameter dp used in Algorithm 3.
When the depth is larger, CLLA can explore more vertices in the PDG, which allows it to detect the impact of a lane-direction change on road segments that are further away from the location of the change.

Lookup distance in CLLA: This is the parameter l used in Algorithm 3. It affects the number of vertices and the number of edges in a PDG. With a higher lookup distance, the PDG considers more road segments in the path of vehicles, which can help identify the impact of lane-direction changes at a longer distance but increases the size of the graph at the same time.

Update period in CLLA: This parameter controls the frequency at which the coordinating agents decide on lane-direction changes. CLLA is suitable for highly dynamic traffic environments, hence the update period Δt can be set to a low value. We vary the value of this parameter between 0.3 minutes and 20 minutes, with the default value set to 0.3 minutes.

Update period in DLA: This parameter affects the frequency at which DLA optimizes lane-direction changes. DLA decides on lane-direction changes based on the traffic demand collected within the update period Δt prior to the optimization process. We vary the value of this parameter between 2.5 minutes and 20 minutes, with the default value set to 10 minutes.

VII. EXPERIMENTAL RESULTS

We now present the experimental results when comparing CLLA against the baseline algorithms in the first part, and the sensitivity analysis of the parameter values of the algorithms in the second part.

Table II: The percentage of vehicles with a DFFT higher than 10

Solution | Long Island | Midtown Manhattan
DLA | 10.04% | 49.52%
LLA | 7.78% | 44.19%
CLLA | 7.77% | 46.12%

A. Comparison against the baselines

This experiment compares the performance of the four solutions described in Section VI-A. We run a number of simulations in this experiment.
For each simulation, we extract taxi trip information for one hour from the real taxi trip data from New York. Based on the real data, we generate traffic in the simulation. The experiment is done for the two areas shown in Figure 7a and Figure 7b. To simulate a larger variety of traffic scenarios, we also up-sample the trip data to generate more vehicles. We define an Up-Sampled Factor, which is the number of vehicles that are generated based on each taxi trip in the taxi data. For LLA and CLLA, the learning rate α is 0.001 and the discount factor used by Q-learning is 0.75. The parameter minLoad of DLA is set to 100. For the other parameters of the solutions, we use the default values shown in Table I.

Average travel time: Figure 8a and Figure 8b show the average travel time achieved with the four solutions. CLLA outperforms the other solutions in both simulation areas. We can observe that the average travel time of LLA and CLLA is significantly lower than the average travel time of no-LA, which shows the benefit of dynamic lane-direction changes. Although DLA achieves lower travel times than no-LA, it does not perform well compared to CLLA in either area. CLLA performs consistently better than LLA, because LLA only makes lane-direction changes based on local traffic information, without coordination.

We also test the performance of the solutions in a different scenario, where the traffic demand is static. Vehicles are generated at a constant rate during a 30-minute period. Under this setting, the traffic is less dynamic than in the previous scenario, where the traffic demand is based on real data. Figure 9a and Figure 9b show the average travel time achieved with the four solutions. Interestingly, DLA performs as well as CLLA. This is due to the fact that DLA optimizes traffic based on the estimated traffic demand.
As the traffic demand is kept constant, the estimated demand can match the actual demand, resulting in the good performance of DLA. On the other hand, CLLA is developed for highly dynamic traffic environments; when the traffic is static, as in this scenario, the advantage of the solution is limited. The results show that DLA can work well with static traffic but does not work well with highly dynamic traffic. CLLA, on the other hand, works well in both environments: it substantially outperforms the baselines in dynamic environments and matches the performance of DLA in static environments.

Deviation from free-flow travel time (DFFT): Table II shows the percentage of vehicles whose travel time is 10 times or more their free-flow travel time. The results show that LLA and CLLA achieve a lower deviation from the free-flow travel time compared to DLA.

Figure 8: Performance of the four solutions with dynamic traffic: (a) Long Island, (b) Midtown Manhattan.

Figure 9: Performance of the four solutions with static traffic: (a) Long Island, (b) Midtown Manhattan.

B. Parameter sensitivity testing

To evaluate the effects of the individual parameters, we run simulations in the area shown in Figure 7a. Each simulation lasts for one hour, during which the traffic is generated based on the real taxi trip data from the area. Figure 10 shows the effects of four parameters of CLLA. Figure 11 compares the effects of the update period between DLA and CLLA. Our results show that the travel time increases when the cost of a lane-direction change increases (Figure 10a).
The result indicates that lane-direction changes may not be beneficial in all circumstances. When the cost of lane-direction changes is high, performing the changes can cause significant interruption to the traffic and negate the benefit of the changes.

Figure 10b shows how the aggressiveness of lane-direction changes can affect the travel time of vehicles. The result shows that both a low level and a high level of aggressiveness have a negative impact on travel times. When the level of aggressiveness is low, lane-direction changes can only happen at large intervals, hence the changes may not adapt to the dynamic change of traffic. When the level of aggressiveness is high, the system changes the direction of lanes at a high frequency, which can cause significant interruption to the traffic due to the time taken to clear the lanes during the changes.

Our results show that the best depth for traversing the PDG is 2 (Figure 10c). When the depth changes from 1 to 2, we observe a decrease in travel time. However, when the depth is higher than 2, we do not observe a further decrease in travel time. With a higher depth, the system can identify the impact of a lane-direction change on road segments that are further away, but that impact can become negligible when the lane-direction change is far away. This is the reason there is no improvement in travel time when the depth is higher than 2.

Figure 10d shows that a larger lookup distance can result in a lower average travel time. When the lookup distance increases, CLLA considers more road segments in a vehicle path when building the PDG. This helps identify the consequential lane-direction changes on the same path. The reduction in the average travel time becomes less significant when the lookup distance is higher than 2. This is because the impact of a lane-direction change reduces when the change is further away.

Figure 10: Effects of four parameters of CLLA: (a) cost of change, (b) aggressiveness of change, (c) depth (dp), (d) lookup distance (l).

Figure 11: Effects of the update period of (a) DLA and (b) CLLA.

When the update period Δt of DLA is below 5 minutes or beyond 15 minutes, it is less likely to get a good estimation of the traffic demand, which can lead to a relatively high travel time (Figure 11a). The average travel time is at its minimum when Δt is set to 10 minutes. Different to DLA, the travel time achieved with CLLA grows slowly with the increase of Δt until Δt goes beyond 15 minutes. The relatively steady performance of CLLA shows that the coordination between lane-direction changes can help mitigate traffic congestion for a certain period of time into the future. If minimizing the average travel time is the priority, one can set Δt to a very low value, e.g., 5 minutes. If one needs to reduce the computation cost of the optimization while achieving a reasonably good travel time, Δt can be set to a larger value, e.g., 15 minutes.
VIII. CONCLUSION

We have shown that effective traffic optimization can be achieved with dynamic lane-direction configurations. Our proposed hierarchical multi-agent solution, CLLA, helps reduce travel time by combining machine learning with global coordination of lane-direction changes. The proposed solution adapts to significant changes of traffic demand in a timely manner, making it a viable choice for realizing the potential of connected autonomous vehicles in traffic optimization. Compared to state-of-the-art solutions based on lane-direction configuration, CLLA runs more efficiently and is scalable to large networks.

There are many directions one can investigate further. An interesting extension would be to incorporate dynamic traffic signals into the optimization process. Currently we assume that the connected autonomous vehicles follow a pre-determined path during their trip. An exciting direction for further research is to dynamically change vehicle routes in addition to the lane-direction changes. The dynamic change of the speed limit of roads could also be included in an extension to CLLA.

REFERENCES

[1] L. R. Ford and D. R. Fulkerson, "Maximal flow through a network," in Classic Papers in Combinatorics. Springer, 2009, pp. 243–248.
[2] L. Fleischer and M. Skutella, "Quickest flows over time," SIAM J. Comput., vol. 36, no. 6, pp. 1600–1630, 2007.
[3] B. Wolshon and L. Lambert, "Planning and operational practices for reversible roadways," ITE Journal, vol. 76, no. 8, pp. 38–43, August 2006.
[4] L. Lambert and B. Wolshon, "Characterization and comparison of traffic flow on reversible roadways," Journal of Advanced Transportation, vol. 44, no. 2, pp. 113–122, 2010.
[5] J. J. Wu, H. J. Sun, Z. Y. Gao, and H. Z. Zhang, "Reversible lane-based traffic network optimization with an advanced traveller information system," Engineering Optimization, vol. 41, no. 1, pp. 87–97, 2009.
APPENDIX
DEGREE OF A VERTEX IN PDG

Lemma A.1. The maximum node degree ∆(PDG) is independent of the underlying road network G(V, E) and the number of vehicle paths.

Proof. Let v ∈ V_PDG. The maximum vertex degree ∆(PDG) can be found as follows. The degree of a vertex v_G ∈ G(V, E) (the road network) depends on the number of roads connected to an intersection.
A special property of a road graph is that the degree of a node does not increase with the network size. Let there be n roads, on average, connected to one intersection in G. Then deg(v_G) = n. Now consider v ∈ V_PDG, which is also v ∈ E, i.e., v is a road in the road network. Starting from v, within a lookup distance of l, there can be a maximum of n^l roads. Since n does not increase with the network size, n^l also does not increase with the network size. Assuming the worst case, there can be paths from v to each of these n^l roads. Let R be the set of these n^l roads. According to the definition of PDG, if there is a path between v and r ∈ R, then ∃ e_{v,r} ∈ E_PDG ∀ r ∈ R. This means there are n^l edges from v. Therefore deg(v) = n^l, and the maximum node degree is ∆(PDG) = n^l. Note that ∆(PDG) is independent of the size of G(V, E) and the number of paths.
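The intuition behind the lemma can be checked with a toy experiment. The sketch below is hypothetical: the paper gives no implementation, and the grid road layout, the `grid_roads`/`max_pdg_degree` names, and the hop-based notion of lookup distance are our simplifying assumptions. It counts, for grid networks of two different sizes, the largest number of roads any single road can reach within l hops, which upper-bounds the PDG degree, and shows that this count does not grow with the network.

```python
from itertools import product

def grid_roads(k):
    """Directed roads of a k x k grid road network: one road per direction
    for every pair of adjacent intersections."""
    roads = set()
    for x, y in product(range(k), repeat=2):
        for dx, dy in ((1, 0), (0, 1)):
            u, v = (x, y), (x + dx, y + dy)
            if v[0] < k and v[1] < k:
                roads.add((u, v))
                roads.add((v, u))
    return roads

def roads_within(roads, start, l):
    """Roads reachable from road `start` in at most l hops, where a road
    (u, v) is followed by any road leaving intersection v."""
    succ = {}
    for u, v in roads:
        succ.setdefault(u, []).append((u, v))
    seen, frontier = {start}, [start]
    for _ in range(l):
        nxt = []
        for _, v in frontier:
            for r in succ.get(v, []):
                if r not in seen:
                    seen.add(r)
                    nxt.append(r)
        frontier = nxt
    return seen - {start}

def max_pdg_degree(k, l):
    """Worst-case PDG degree bound: the largest number of roads any single
    road can reach within the lookup distance l."""
    roads = grid_roads(k)
    return max(len(roads_within(roads, r, l)) for r in roads)

# Doubling the network size leaves the bound unchanged, as the lemma claims:
print(max_pdg_degree(8, 2), max_pdg_degree(16, 2))
```

Because each intersection connects a bounded number of roads (n = 4 in a grid), the reachable set within a fixed lookup distance saturates at a constant, matching the ∆(PDG) = n^l bound in the proof.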