INTERACTION Dataset: An INTERnational, Adversarial and Cooperative moTION Dataset in Interactive Driving Scenarios with Semantic Maps
Wei Zhan 1, Liting Sun 1, Di Wang 2,*, Haojie Shi 3,*, Aubrey Clausse 4, Maximilian Naumann 5,*, Julius Kümmerle 5, Hendrik Königshof 5, Christoph Stiller 5, Arnaud de La Fortelle 4, and Masayoshi Tomizuka 1

Abstract—Interactive motion datasets of road participants are vital to the development of autonomous vehicles in both industry and academia. Research areas such as motion prediction, motion planning, representation learning, imitation learning, behavior modeling, behavior generation, and algorithm testing require support from high-quality motion datasets containing interactive driving scenarios with different driving cultures. In this paper, we present an INTERnational, Adversarial and Cooperative moTION dataset (INTERACTION dataset) in interactive driving scenarios with semantic maps. Five features of the dataset are highlighted. 1) The interactive driving scenarios are diverse, including urban/highway/ramp merging and lane changes, roundabouts with yield/stop signs, signalized intersections, intersections with one/two/all-way stops, etc. 2) Motion data from different countries and different continents are collected, so that driving preferences and styles in different cultures are naturally included. 3) The driving behavior is highly interactive and complex, with adversarial and cooperative motions of various traffic participants. Highly complex behavior such as negotiations, aggressive/irrational decisions and traffic rule violations are densely contained in the dataset, while regular behavior can also be found, from cautious car-following, stops and left/right/U-turns to rational lane changes, cycling and pedestrian crossing, etc. 4) The levels of criticality span a wide range, from regular safe operations to dangerous, near-collision maneuvers.
Real collisions, although relatively slight, are also included. 5) Maps with complete semantic information are provided, with physical layers, reference lines, lanelet connections and traffic rules. The data is recorded from drones and traffic cameras, and the processing pipelines for both are briefly described. Statistics of the dataset in terms of the number of entities and interaction density are also provided, along with some utilization examples in the areas of motion prediction, imitation learning, decision-making and planning, representation learning, interaction extraction and social behavior generation. The dataset can be downloaded via https://interaction-dataset.com.

1 W. Zhan, L. Sun, and M. Tomizuka are with the Mechanical Systems Control (MSC) Laboratory, Department of Mechanical Engineering, University of California, Berkeley, CA 94720 USA (e-mail: wzhan@berkeley.edu). 2 D. Wang is with Xi'an Jiaotong University, Xi'an, P.R. China. 3 H. Shi is with Harbin Institute of Technology, Harbin, China. 4 A. Clausse and A. de La Fortelle are with MINES ParisTech, Paris, France. 5 M. Naumann, J. Kümmerle, H. Königshof and C. Stiller are with FZI Research Center for Information Technology and Karlsruhe Institute of Technology, Karlsruhe, Germany. * The work was conducted during their visit to the MSC Lab at University of California, Berkeley.

Fig. 1: Examples of the detection and tracking results in highly interactive driving scenarios in the dataset.

I. INTRODUCTION

In order to enable fully autonomous driving in complex scenarios, comprehensive understanding and accurate prediction of the behavior and motion of other road users are required. Moreover, autonomous vehicles need to behave like vehicles with human drivers to make themselves more predictable to others and thus facilitate cooperation. These are two of the major challenges in the field of autonomous driving.
To overcome these challenges, a considerable amount of research effort has been devoted to: i) predicting the future intention and motion of other road users [1]–[3], ii) modeling and analyzing driving behavior [4], [5], iii) clustering the motion and finding representations of motion primitives [6], [7], iv) cloning and imitating human and expert behavior [8], [9], and v) generating human-like and social behavior and motion [10]–[12]. All the aforementioned research areas require interactive vehicle motion data from real-world driving scenarios, which is the most fundamental and indispensable asset. The NGSIM dataset [13] is the most popular one used in the aforementioned areas, such as prediction [14]–[16], behavior modeling [5], social behavior generation and planning [10], and representation learning [6], since it is publicly available with decent scale and quality. The recently released highD dataset [17] also greatly assists behavior-related research such as prediction [18].

Public motion datasets such as NGSIM and highD have facilitated, but also restricted, behavior-related research due to the limited diversity, complexity and criticality of their scenarios and behavior. Also, the importance of map information and the completeness of interaction entities were under-addressed in most of the existing datasets. However, these missing points are crucial for behavior-related research, as discussed in the following.

1) Diversity of interactive driving scenarios: Recent behavior-related research using public datasets was mostly restricted to highway scenarios due to data availability. There are many more highly interactive driving scenarios to explore, such as roundabouts with yield/stop signs, (unsignalized) one/two/all-way stop intersections (shown in Fig. 1), signalized intersections with unprotected left turns, zipper merges in cities, etc.
2) International driving cultures: Most of the existing datasets only contain driving data from one specific country. However, driving cultures in different countries and continents can be distinct even for very similar scenarios. Without motion data in similar scenarios from different countries, it is not possible to incorporate the impact of driving cultures, such as driving styles, preferences, risk tolerance and understanding of traffic rules, into behavior modeling and analysis, or into the design of prediction and planning algorithms adapted to different countries.

3) Complexity of the scenarios and behavior: Most of the scenarios in the existing public datasets are relatively simple and structured, with explicit right-of-way. The behavior of the drivers is only occasionally impacted by others. There is very little social pressure (such as several vehicles waiting behind and even honking) on the drivers, so their behavior is cautious, without aggressive or irrational decisions. A motion dataset with much more complex and interactive behavior and scenarios is expected to facilitate research tackling real and challenging problems.

4) Criticality of the situations: Critical situations (such as near-collision cases) are much more challenging and valuable than others for behavior-related research. For instance, [15] proposed a fatality-aware prediction benchmark emphasizing prediction inaccuracies in critical situations. However, critical situations are too sparse in existing motion datasets, and can hardly be identified. Therefore, a motion dataset with denser critical situations is necessary to facilitate research efforts on those difficult problems.

5) Map information: Map information with references and semantics, such as lanelet connections and traffic rules, is crucial for behavior-related research areas such as motion planning and prediction.
It provides key information on inputs (features), such as the route and goal point [9], the distance to the merging point [14], [15], the lateral position within the lane [10], etc., and makes the algorithms generalizable to other scenarios. Such semantic maps are currently missing for most of the existing public motion datasets.

6) Completeness of interaction entities: In order to accurately model, predict and imitate interactive vehicle behavior, it is crucial that the dataset provide the motions of all surrounding entities which may impact that behavior. This requirement was often overlooked when using motion data collected by onboard sensors due to occlusions and the limited field of view of the sensors. Although existing motion datasets collected from onboard sensors cover a wide range of areas over long time periods, complete and meaningful interaction pairs are relatively sparse in them.

In this paper, we emphasize all the aforementioned aspects to construct an international motion dataset collected by drones and traffic cameras.

• Diverse and international: It contains a variety of highly interactive driving scenarios from different countries, such as roundabouts, signalized/unsignalized intersections, as well as highway/urban merging and lane changes.

• Complex and critical: Part of the scenarios are relatively unstructured with inexplicit right-of-way. The driving behaviors in the dataset are highly impacted by other drivers, whose behavior can be aggressive or irrational due to social pressure. Near-collision and slight-collision scenes are contained in the dataset to facilitate research on critical situations.

• Semantic map and complete information: HD maps with semantics are provided to generate key features in the context. The motions of all entities which may influence the driving behavior are included in the dataset.
The proposed dataset can significantly facilitate behavior-related research such as motion prediction, imitation learning, decision-making and planning, representation learning, interaction extraction and social behavior generation. Results from exemplar methods in all these areas are provided utilizing the proposed dataset.

II. RELATED WORK

A. Datasets from Bird's-Eye View

As mentioned in Section I, the NGSIM dataset [13] is the most popular vehicle motion dataset among the behavior-related research communities. The raw data was collected by cameras mounted on buildings and processed automatically [20]. The accuracy of the dataset is mostly acceptable. However, there may be steady errors, and the image projection can significantly enlarge the size of the vehicles. Researchers proposed methods [21] to rectify the errors, but they can only improve the quality of a small part of the dataset. In view of the problems in NGSIM, the highD dataset [17] was constructed using a drone, with more accurate vehicle motions and a larger amount of highway driving data than NGSIM. Other datasets [22], [23] from bird's-eye view are more focused on pedestrian behavior without strong vehicle interactions.

The driving scenarios presented in NGSIM and highD are quite limited. NGSIM contains highway driving (including ramp merging and double lane change) and signalized intersection scenarios.
In fact, signalized intersections are mostly controlled by traffic lights, and interactions there are very rare and slight. A small number of lane changes are interactive, but most of them are neither interactive nor critical. Ramp merging and double lane change can be highly interactive when the traffic is relatively dense, but the amount of interaction is still relatively limited in NGSIM. HighD only contains highway driving scenarios with car following and lane change. Urban scenarios which contain dense and highly interactive behavior, such as roundabouts and unsignalized intersections, are not included in either of the two public datasets of vehicle motions.

TABLE I: Comparison with existing motion datasets

| Dataset | Highly interactive scenarios | Complexity of scenarios | Density of aggressive behavior | Near-collision situations and collisions | HD maps with semantics | Completeness of interaction entities & viewpoint |
| NGSIM [13] | ramp merging, (double) lane change | structured roads, explicit right-of-way | low | very few near-collisions | no | yes; bird's-eye view from a building |
| highD [17] | lane change | structured roads, explicit right-of-way | low | very few near-collisions | no | yes; bird's-eye view from a drone |
| Argoverse [19] | unsignalized intersections, pedestrian crossing | unstructured roads, inexplicit right-of-way | low | no | yes, but partially | only for the ego data-collection vehicle |
| INTERACTION | roundabouts, ramp merging, double lane change, unsignalized intersections | unstructured roads, inexplicit right-of-way | high | yes | yes | yes; bird's-eye view from a drone |

B. Datasets from Onboard Sensors

In addition to the bird's-eye-view motion datasets, two types of onboard-sensor-based datasets are also publicly available. One type includes motion data of surrounding entities from onboard LiDARs and front-view cameras, such as Argoverse [19] and the HDD dataset [24]. The other only contains the motions of many data-collection vehicles from onboard GPS, such as the 100-car study [25].
There are two major advantages of datasets from onboard sensors. One is that a variety of driving scenarios with relatively long recording times are usually included, such as urban driving at signalized/unsignalized intersections and highway driving with ramp merging. The other is that the occlusions of the LiDARs and cameras are recorded, so that the actual occlusions from the perspective of the ego vehicle can be partially recovered.

Completeness of interaction entities is a major problem when using datasets from onboard sensors for behavior-related research. For motion datasets with GPS-based fleets, it is hard to determine whether the vehicles in an "interactive" motion segment were actually interacting with each other, since there is no motion recording of surrounding vehicles (or even pedestrians) without installed GPS devices. For motion datasets constructed from onboard LiDARs and cameras, it is hard to guarantee that all the surrounding objects impacting the behavior of other vehicles are included in the dataset when predicting the motions of those others. Therefore, complete interactions are relatively sparse in such datasets. If the sensors cannot cover the full field of view, it is even impossible to guarantee the completeness of information for the surrounding entities of the ego data-collection vehicle. Also, data collected over a large area may lead to very few repetitions at the same location. It is then hard to learn multi-modal driving behavior for prediction or planning, since only one sequence of motions can be found with similar features at the same location.

Map information is also missing in most of the motion datasets. To the best of our knowledge, Argoverse is the only motion dataset providing relatively rich map information. A physical layer (locations of curbs, road markings, etc.) is contained, and the semantic information (lane bounds and turn directions, etc.)
required by prediction and planning is partially included.

Table I provides a comparison of the three most useful public vehicle motion datasets as well as the one presented in this article. The proposed dataset contains much more diverse, complex and critical scenarios and vehicle motions compared to the other three. In addition, HD maps with full semantic information are provided, and the completeness of interaction entities is superior to that of datasets from onboard sensors.

III. FEATURES OF THE DATASET

In this section, we illustrate the features of the proposed dataset by highlighting its diversity, internationality, complexity, criticality, and semantic maps.

A. Diversity

Fig. 2 illustrates a variety of highly interactive driving scenarios from traffic cameras and drones in our dataset, including zipper merging in a city (Fig. 2 (a)), ramp merging and lane change on a highway (Fig. 2 (b)), five roundabouts with yield and stop signs (Fig. 2 (c)-(g)), several unsignalized intersections with one/two/all-way stops (Fig. 2 (h)-(j)), and an unprotected left turn at a signalized intersection (Fig. 2 (k)). In Fig. 2, the first two letters of each name indicate the source of the data (drone as DR and traffic camera as TC), the next three letters indicate the corresponding country, and the last two are the scenario code in the dataset. The numbers in circles denote the branch ID for each scenario.

Fig. 2 (b) contains several subscenarios. The subscenario with the upper two lanes (which finally merge into one) is a zipper merging similar to the urban counterpart in Fig. 2 (a), where vehicles strongly interact with each other. It also serves as a ramp for the middle two lanes. The subscenario with the lower three lanes (which finally merge into two) is a forced merging where vehicles have to change lanes. The roundabout in Fig. 2 (f) is an extremely busy 7-way roundabout with one "yield" branch and six "stop" branches.
Fig. 2: A variety of highly interactive driving scenarios recorded by drones and traffic cameras in the dataset, including: (a) urban merging, (b) highway ramp merging and lane change, (c)-(g) five roundabouts, (h)-(j) unsignalized intersections, and (k) an unprotected left turn at a signalized intersection. Panels: (a) DR_DEU_Merging_MT, (b) DR_CHN_Merging_ZS, (c) DR_USA_Roundabout_SR, (d) DR_CHN_Roundabout_LN, (e) DR_DEU_Roundabout_OF, (f) DR_USA_Roundabout_FT, (g) DR_USA_Roundabout_EP, (h) DR_USA_Intersection_EP, (i) DR_USA_Intersection_MA, (j) DR_USA_Intersection_GL, (k) TC_BGR_Intersection_VA.

Lots of vehicles enter the roundabout at the same time, with intensive interactions and relatively high speeds. The branches of the roundabouts in Fig. 2 (c)-(e) are controlled by yield signs, while all branches of the roundabout in Fig. 2 (g) are controlled by stop signs.

Fig. 2 (i) shows an extremely busy all-way-stop intersection with 9 lanes controlled by stop signs, where multiple vehicles interactively inch forward to compete. The scenario shown in Fig. 2 (j) contains three branches (Branches 1, 2 and 5) controlled by stop signs, while vehicles from Branches 3 and 6 have the right-of-way (RoW). Lots of vehicles enter the intersection from all branches (except Branch 4), and the vehicles holding RoW on the straight road travel at relatively high speed. A busy all-way-stop T-intersection is shown in Fig. 2 (h), where the other three branches (Branches 4-6) are also controlled by stop signs.

B. Internationality

The motion data was collected on three continents (North America, Asia and Europe). The motion data collected by drones come from four countries, namely the US, China, Germany and Bulgaria, as indicated in the scenario names (USA/CHN/DEU/BGR). Vehicles in all these countries are driven on the right-hand side of the road.
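As a minimal illustration of the naming convention described above, the scenario names can be split into their four fields. This helper is a sketch reconstructed from the text, not part of the dataset's official toolkit:

```python
# Minimal sketch: parse INTERACTION scenario names such as "DR_USA_Roundabout_FT".
# The field layout (source_country_type_code) follows the convention described
# in the text; the parser itself is illustrative, not an official utility.

SOURCES = {"DR": "drone", "TC": "traffic camera"}

def parse_scenario_name(name: str) -> dict:
    source, country, scenario_type, code = name.split("_")
    return {
        "source": SOURCES[source],   # DR = drone, TC = traffic camera
        "country": country,          # USA/CHN/DEU/BGR
        "type": scenario_type,       # e.g. Roundabout, Intersection, Merging
        "code": code,                # two-letter location code, e.g. FT, GL
    }

info = parse_scenario_name("DR_USA_Roundabout_FT")
```

For example, `info["country"]` is `"USA"` and `info["source"]` is `"drone"` for the busy 7-way roundabout of Fig. 2 (f).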
However, the driving cultures in these countries show remarkable distinctions. We provide motion data from three roundabouts with similar traffic rules, namely SR from the US, OF from Germany and LN from China. None of the three roundabouts has stop signs, and the nominal traffic rule is that vehicles entering the roundabout should yield to those already in the roundabout. We also provide motion data from two zipper merging scenarios, namely MT from Germany and ZS from China (the upper two lanes in Fig. 2 (b)). Although MT is an urban road and ZS is a highway entrance, the "zipper" rule remains the same, and the speeds are similar when the traffic is heavy.

C. Complexity

In addition to regular driving behavior such as car-following, lane changes, stops and left/right/U-turns, our dataset emphasizes highly interactive and complex driving behavior with cooperative and adversarial motions of the vehicles. By carefully choosing the locations and the corresponding rush hours for data collection, we were able to gather large numbers of strong interactions within relatively short periods of time. Strongly interactive pairs of vehicles can even appear every few seconds from time to time in scenarios such as the ramp in ZS, the entrance branches in FT, the all-way-stop intersections in EP and MA, as well as the two-way-stop intersection in GL.

Fig. 3: A sequence of images (t = 0, t = 1 s, t = 3.2 s) of a dangerous insertion in GL in the proposed dataset.

Also, the scenarios in FT and GL are relatively unstructured, since there are no explicit lane restrictions in the roundabout or intersections. Vehicles can exploit the space to achieve their goals, sometimes showing irrational and highly dangerous behavior. For instance, Fig. 3 shows a dangerous insertion by V0 between two vehicles (V1 and V2) that were stopping and making left turns from Branch 3 to Branch 5 in GL (refer to Fig. 2).
The driver of V0 intended to drive from Branch 1 to Branch 4, but there was no explicit road structure for the driver. Moreover, aggressive or irrational behavior can often be found due to inexplicit nominal or practical RoW. Vehicles may arrive at the stop bars almost at the same time, and drivers may negotiate with each other by inching or even accelerating in MA and EP. The traffic in FT and GL can be very busy, and it may take even minutes for a vehicle without nominal RoW to enter and pass, making the driver impatient. Also, there may be a queue of vehicles waiting behind and even honking to put social pressure on the one at the front of the queue.

Fig. 4: A sequence of images (t = 0, t = 1 s, t = 2 s) of a violation of the right-of-way in a roundabout in the proposed dataset.

Although there are explicit traffic rules on who goes first at roundabouts and two-way-stop intersections, vehicles without nominal RoW may be aggressive, and vehicles with nominal RoW are mostly aware of such potential violations and ready to react. For example, V0 in Fig. 4 was entering the roundabout in FT from Branch 3, while V1 was in the roundabout holding the RoW. However, V0 violated the rule and forced V1 to stop and yield. These factors significantly increase the complexity of the motions in the dataset and bring forward many challenging but valuable research topics for the community.

D. Criticality

As discussed in Section III-C, vehicles holding the nominal RoW (in the roundabout of FT or on the straight road of GL) may often encounter slight violations by vehicles without nominal RoW (entering the roundabout or intersection from branches controlled by stop signs). Moreover, the vehicles holding the RoW may have relatively high speeds (40 km/h or even higher). Therefore, critical situations can be observed in the dataset where the time-to-conflict-point (TTCP) can be extremely low. A slight collision can even be found in the dataset.
Fig. 5: A sequence of images (t = 0, t = 0.2 s, t = 0.4 s) of a near-collision case in the proposed dataset.

Fig. 5 shows a near-collision case in GL. V0 was making a left turn from Branch 5 (with a stop sign) to Branch 6, while V1 (with the RoW) was going straight from Branch 6 to Branch 3 at a relatively high speed. V1 had to execute an emergency swerve to avoid a collision with V0, which was very dangerous.

Fig. 6: A sequence of images (t = 0, t = 0.17 s, t = 0.27 s) of a slight collision in the proposed dataset.

Besides the critical, near-collision cases, a slight collision, shown in Fig. 6, can also be found in the dataset in GL. V0 was making a right turn from Branch 5 (with a stop sign) to Branch 3, while V1 (with the RoW) was making a right turn from Branch 6 to Branch 4. In this situation, the driver of V0 might have predicted that V1 was going straight to Branch 3, so that V0 could accelerate in advance.

E. Semantic Map

Map information is crucial for behavior-related research areas. The information required is twofold. The basic requirement is the physical layer, containing a set of points or curves representing curbs, road markings (lane markings, stop bars, etc.) and other key features. In addition to the physical layer, semantic information is also necessary, which includes, but is not limited to, 1) reference paths, 2) lanelets as well as their connections and turn directions, and 3) the associated traffic rules and RoW. Moreover, such information needs to be organized in a consistent format with a toolkit to facilitate users when utilizing the map. All the aforementioned requirements are met in our dataset, and more detailed information on map construction can be found in Section IV-C.
IV. CONSTRUCTION OF MOTION DATA AND MAPS

In this section, we discuss the pipelines for constructing the motion data from both drones and traffic cameras, as well as the corresponding semantic maps.

A. Motions from Drone Data

We used drones such as the DJI Mavic 2 and DJI Phantom 4 to collect the raw video data. The raw videos were 4K (3840x2160) at 30 Hz. We downsampled the video to 10 Hz before processing the data. The processed results are partially illustrated in Fig. 1. The bounding boxes are very accurate and the paths are smooth after going through our processing pipeline with the following three steps.

• Video stabilization and alignment: Due to gradual or sudden drift and rotation of the drones, the collected videos need to be stabilized via video stabilization algorithms with a transformation estimator. Also, a similarity transformation is applied to project all frames to the first one and align them with the map.

• Detection: In order to obtain accurate bounding boxes of the moving obstacles, Faster R-CNN [26] is applied. The boxes are highly accurate, and the very few inaccurate detections are rectified manually.

• Data association, tracking and smoothing: A Kalman filter is applied for data association and tracking. To obtain smooth vehicle motions, a Rauch-Tung-Striebel (RTS) smoother [27] is also incorporated.

Fig. 7: An exemplary physical layer of a lanelet2 map [34].

B. Motions from Traffic Camera Data

The data processing pipeline for motions from traffic camera data mainly contains the following steps; more details, including the camera parameter estimation, can be found in [28].

• Detection: To detect vehicles and pedestrians in each frame, we use a state-of-the-art object detector [29], which provides detections with a 2D bounding box, instance mask and instance type.
• Data association: Detections are grouped into tracks using a combination of an Intersection-over-Union [30] tracker, which associates detections with high mask overlap in successive frames, and a visual tracker [31] to compensate for missed detections.

• Tracking and smoothing: Once detections are grouped into tracks, trajectories on the ground plane are estimated using an RTS smoother. For the observation model, we use a pin-hole camera model [32]. This allows measurements and uncertainty to be incorporated directly in pixels, capturing the uncertainty due to the resolution, position and orientation of the camera. For vehicles, the RTS smoother uses a bicycle model [33] as the process model, capturing the kinematic constraints of vehicles.

C. Construction of the High Definition Maps

As public roads are structured environments, the particular road layout of a certain area strongly affects the motion of all traffic participants. The structure for vehicles mostly starts by subdividing the road into lanes, which are later combined to create junctions, roundabouts, on-ramps and so on. Further, movement within this structured area is guided by traffic rules, such as speed limits or the prioritization of one road over another. In order to model such coherence, simply mapping the center-lines of all lanes is no longer sufficient.

TABLE II: Summary of the dataset.
| Scenario category | Location | Video length (min) | Number of vehicles | Total length (min) | Total vehicles |
| roundabout | USA Roundabout SR | 40.90 | 965 | 365.1 | 10479 |
|  | CHN Roundabout LN | 24.24 | 227 |  |  |
|  | DEU Roundabout OF | 55.04 | 1083 |  |  |
|  | USA Roundabout FT | 207.62 | 7496 |  |  |
|  | USA Roundabout EP | 37.30 | 708 |  |  |
| unsignalized intersection | USA Intersection EP | 66.53 | 1367 | 433.33 | 14867 |
|  | USA Intersection MA | 107.37 | 2982 |  |  |
|  | USA Intersection GL | 259.43 | 10518 |  |  |
| merging and lane change | DEU Merging MT | 37.93 | 574 | 132.55 | 10933 |
|  | CHN Merging ZS | 94.62 | 10359 |  |  |
| signalized intersection | TC Intersection VA | 60 | 3775 | 60 | 3775 |

Thus, in order to allow for a thorough analysis of the recorded trajectories, we provide centimeter-accurate high-definition maps in the lanelet2 format [34]. Within lanelet2, the physical layer of the road network, such as road borders, lane markings and traffic signs, is stored. An exemplary physical layer is visualized in Fig. 7. From this layer, atomic lane elements, called lanelets, are created. They describe the course of a lane and form the basis for so-called regulatory elements, which determine traffic regulations such as the right of way or the speed limit. When used alongside the recorded trajectories, these lanelet2 maps facilitate reasoning about why some vehicles decelerate while approaching a junction while others do not, depending on the right of way but also on the presence of other traffic participants that potentially interact.

V. STATISTICS OF THE DATASET

A. Scenarios and Vehicle Density

The dataset contains motion data collected in four categories of scenarios: roundabout, unsignalized intersection, signalized intersection, and merging and lane change, as shown in Fig. 2. A detailed summary of the dataset is listed in Table II. In the roundabout scenarios, 10479 trajectories of vehicles from five different locations were recorded over around 365 minutes.
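The per-category totals in Table II can be checked against the per-location entries. A short consistency sketch (the numbers are copied from the table; the dictionary layout is our own):

```python
# Consistency check for Table II: the per-category totals (minutes, vehicles)
# should equal the sums of the per-location entries. Numbers are copied from
# the table above; this snippet is illustrative, not part of the dataset.
table2 = {
    "roundabout": ([40.90, 24.24, 55.04, 207.62, 37.30], [965, 227, 1083, 7496, 708]),
    "unsignalized intersection": ([66.53, 107.37, 259.43], [1367, 2982, 10518]),
    "merging and lane change": ([37.93, 94.62], [574, 10359]),
    "signalized intersection": ([60.0], [3775]),
}
# Sum minutes (rounded to 2 decimals) and vehicle counts per category.
totals = {k: (round(sum(mins), 2), sum(veh)) for k, (mins, veh) in table2.items()}
```

Summing the entries reproduces the stated totals, e.g. 365.1 minutes and 10479 vehicles for the roundabout category.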
Similarly, in the unsignalized intersection scenarios, three locations were included and 14867 trajectories were collected over around 433 minutes. In the merging and lane change scenarios, 10933 trajectories were recorded at two locations over around 133 minutes. Finally, one location was selected for the signalized intersection, which provided 3775 trajectories over around 60 minutes.

B. Metrics for Interactive Behavior Identification

To represent the density of the interactive behavior in the proposed dataset, we use the metric number of interaction pairs per vehicle (IPV), as proposed in [35]. To calculate the IPV, a set of rules was proposed in [35] to extract interactive behavior under different spatial representations of vehicle paths. The rules and metrics are briefly reviewed below.

1) Minimum time-to-conflict-point difference (ΔTTCP_min): ΔTTCP_min is a metric describing the relative states of two moving vehicles in a scenario where the paths of the two vehicles share a conflict point but involve no forced stop. As shown in Fig. 8, such vehicle paths fall into two categories: (1) paths with static crossing or merging points, such as intersections (Fig. 8 (a)-(b)), and (2) paths with dynamic crossing or merging points, such as ramp merging and lane changing (Fig. 8 (c)-(d)). In the latter scenarios, merging can happen anywhere in the shaded area. We define ΔTTCP_min as

ΔTTCP_min = min_{t ∈ [T_start, T_end]} ΔTTCP_t = min_{t ∈ [T_start, T_end]} |TTCP_t^1 − TTCP_t^2|,   (1)

where TTCP_t^i = Δd_t^i / v_t^i, i = 1, 2, is the traveling time to the conflict point of each vehicle in the interactive pair, and v_t^i and Δd_t^i are, respectively, the speed of the i-th vehicle at time t and its distance to the conflict point along its path. For the scenarios with dynamic merging points, we use the actual merging points of the vehicle trajectories as the conflict points.
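Given sampled states of an interacting pair, the metric in (1) can be sketched as follows. This is a minimal illustrative reimplementation, not the authors' released code; the distances to the conflict point are assumed to be precomputed along each path, and the 3 s interaction threshold follows [35]:

```python
# Minimal sketch of the ΔTTCP_min metric in (1). Inputs are per-timestep
# distances to the conflict point along each vehicle's path (d1, d2, meters)
# and speeds (v1, v2, m/s), sampled over [T_start, T_end].
# Illustrative reimplementation, not the authors' released code.

def delta_ttcp_min(d1, v1, d2, v2):
    """Minimum over time of |d1/v1 - d2/v2| in seconds, skipping stopped frames."""
    best = float("inf")
    for a, va, b, vb in zip(d1, v1, d2, v2):
        if va <= 0.0 or vb <= 0.0:      # metric applies only when neither vehicle is stopped
            continue
        best = min(best, abs(a / va - b / vb))
    return best

def is_interactive(d1, v1, d2, v2, threshold_s=3.0):
    """Interaction exists when ΔTTCP_min does not exceed the 3 s threshold of [35]."""
    return delta_ttcp_min(d1, v1, d2, v2) <= threshold_s

# Two vehicles approaching the same conflict point at constant speed:
d1 = [30.0, 20.0, 10.0]; v1 = [10.0, 10.0, 10.0]   # reaches the point in 3, 2, 1 s
d2 = [24.0, 16.0, 8.0];  v2 = [8.0, 8.0, 8.0]      # reaches the point in 3, 2, 1 s
```

Here both vehicles would reach the conflict point simultaneously at every sampled step, so ΔTTCP_min is 0 and the pair is flagged as interactive.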
In (1), T_start and T_end are set to be long enough to cover the interaction period between the vehicles. If ΔTTCP_min ≤ 3 s, then interaction is defined to exist.

2) Waiting Period (WP): WP is a metric for vehicles with forced stops along their paths. In [35], the default waiting period at stops was set as 3 s, and the deviation of the behavior from this default was used as an indicator of interactivity, i.e., interaction exists when WP > 3 s.

C. Distribution of Interactivity

Based on the set of rules, there are 13375 interactive pairs of vehicles in the proposed dataset. We compare the interactivity among three datasets: the proposed INTERACTION dataset, the highD dataset, and the NGSIM dataset. Results are shown in Fig. 9, where the x-axis represents the length of ΔTTCP_min in seconds, and the y-axes are the number of vehicles (Fig. 9 (a)) and the density of vehicles¹ (Fig. 9 (b)), respectively. We can see that the INTERACTION dataset contains more intensive interactions with ΔTTCP_min ≤ 1 s.

¹The density is given by: density = (number of vehicles with a particular ΔTTCP_min) / (total number of vehicles in the dataset).

Fig. 8: Geometry of different interactive paths. In (a), the crossing/merging points between two paths are static and fixed, while in (b), the crossing/merging points are dynamic.

We also summarized the distributions of ΔTTCP_min and WP of all vehicles in the dataset over different driving scenarios. The results are shown in Fig. 10. Similarly, the x-axis represents the length of ΔTTCP_min and WP in seconds, and the y-axis is the density of vehicles in each scenario. We can see that the dataset contains highly interactive trajectories with a high density of ΔTTCP_min ≤ 1 s and WP greater than 3 s.
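Under the definitions above, the interactivity test can be sketched in a few lines. This is a minimal illustration, not the authors' extraction code from [35]; in particular, we take the absolute gap for the ≤ 3 s threshold, which is the natural reading of Eq. (1):

```python
def ttcp(dist_to_conflict, speed, eps=1e-6):
    """Traveling time to the conflict point: TTCP = Δd / v (infinite if stopped)."""
    return dist_to_conflict / speed if speed > eps else float("inf")

def delta_ttcp_min(traj1, traj2):
    """traj_i: time-aligned samples of (distance to conflict point, speed).

    ΔTTCP_min from Eq. (1), taken here as the minimum absolute TTCP gap
    over the interaction window [T_start, T_end].
    """
    return min(abs(ttcp(d1, v1) - ttcp(d2, v2))
               for (d1, v1), (d2, v2) in zip(traj1, traj2))

def is_interactive(traj1, traj2, waiting_period=None,
                   ttcp_threshold=3.0, default_wp=3.0):
    """Forced-stop pairs use the waiting-period rule; all others use ΔTTCP_min."""
    if waiting_period is not None:
        return waiting_period > default_wp
    return delta_ttcp_min(traj1, traj2) <= ttcp_threshold
```

For example, two vehicles 10 m and 12 m from a crossing point at 5 m/s and 4 m/s have TTCPs of 2 s and 3 s, so the pair already qualifies as interactive at that instant.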
VI. UTILIZATION EXAMPLES

The proposed dataset is intended to facilitate research related to driving behavior, as mentioned in Section I. In this section, we provide several utilization examples of the proposed dataset, including motion/trajectory prediction, imitation learning, motion planning and validation, motion clustering and representation, interaction extraction, and human-like behavior generation.

A. Motion Prediction and Behavior Analysis

Motion/trajectory prediction is of vital importance for autonomous vehicles, particularly in situations where intensive interaction happens. To obtain an accurate probabilistic prediction model of vehicle motion, both learning- and planning-based approaches have been extensively explored. By providing high-density interactive trajectories along with HD semantic maps, the proposed dataset can be used for both approaches. For instance, [36] proposed a deep latent variable model based on a Wasserstein auto-encoder (WAE) to improve interpretability. It incorporated the structure of a recurrent neural network with a vehicle kinematic model such that the output can be constrained. The motion data in FT were utilized to train and test the model in comparison with other state-of-the-art models such as the variational auto-encoder (VAE), auto-encoder, and generative adversarial network (GAN). The quantitative results shown in Table III demonstrate that the proposed WAE-based method can outperform the other state-of-the-art models,

TABLE III: Comparisons of prediction accuracy from [36].
  Methods               Feature   RMSE           MAE
  WAE-based approach    x         0.013/0.011    0.046/0.035
                        y         0.006/0.014    0.019/0.041
                        ψ         0.006/0.008    0.018/0.042
  VAE                   x         0.018/0.016    0.25/0.22
                        y         0.006/0.003    0.14/0.22
                        ψ         0.006/0.008    0.13/0.21
  Auto-encoder          x         0.315/0.044    1.026/0.315
                        y         0.057/0.141    0.182/0.479
                        ψ         0.011/0.066    0.037/0.078
  GAN                   x         0.024/0.020    0.324/0.273
                        y         0.007/0.017    0.188/0.241
                        ψ         0.005/0.048    0.107/0.286

when comparing the root mean square error (RMSE) and mean absolute error (MAE) of the prediction for position and yaw angle.

On the other hand, [37] took advantage of the HD semantic maps and combined the learning-based and planning-based prediction methods. A deep learning model based on a conditional variational auto-encoder (CVAE) and an optimal planning framework based on inverse reinforcement learning are dynamically combined to predict both irrational and rational behavior of the vehicles. Benefiting from the HD semantic information, features for the deep learning model were defined in the Frenet frame, which generated much better prediction performance in terms of generalization. Some exemplar results are given in Fig. 11.

B. Imitation Learning

The driving behavior in the proposed dataset can also be used for imitation learning, which directly imitates how humans drive in complicated scenarios. We extended the fast integrated learning and control framework proposed in [38] in the FT roundabout scenario. As shown in Fig. 12, both the semantic HD map information and the states of surrounding vehicles (the red boxes) were included as the features. The grey box represents the current position of the ego vehicle. The green boxes and blue boxes are, respectively, the ground-truth future positions and the future positions of the ego vehicle generated by the imitation network.
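The two error metrics compared in Table III are standard; for a predicted sequence of positions or yaw angles they reduce to the following small helpers (a self-contained illustration, not code from [36]):

```python
import math

def rmse(pred, true):
    """Root mean square error between prediction and ground truth."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))

def mae(pred, true):
    """Mean absolute error between prediction and ground truth."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)
```

RMSE penalizes large single-step errors more heavily than MAE, which is why the two columns in Table III need not rank the methods identically.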
C. Validation of Decision and Planning

Besides motion prediction and imitation, the motion data and maps in the dataset can also be used for testing different decision-making and motion planning algorithms. The data-replay motions in the dataset are most suitable for testing the performance of the decision-maker and planner when the motions of surrounding entities are independent of the ego motion. For example, the motion of the ego vehicle may not affect others when it does not have the right of way (RoW), or when it has the RoW but others violate the rules or ignore the ego motion.

The environmental representation and motion planning methods proposed in [39] were tested in the FT roundabout scenario.

Fig. 9: Distribution of ΔTTCP_min in three vehicle motion datasets: the proposed INTERACTION dataset, the highD dataset and the NGSIM dataset.

Fig. 10: Distribution of ΔTTCP_min and WP across different locations and scenarios in the dataset.

Fig. 11: Some exemplar prediction results from [37]: (a) original samples, (b) satisfied samples, (c) pure learning-based method, (d) the new method.

Fig. 12: Two examples of the imitation learning results by employing the method in [38].

Fig. 13 is a bird's-eye-view screenshot of the simulation. The red rectangle represents the autonomous vehicle with the planner in [39]. It was decelerating to avoid a collision with a vehicle entering the roundabout, although it had the RoW. We also combined the integrated decision and planning framework proposed in [40] with the sample-based motion planner proposed in [41] to design the decision-maker and planner under uncertainty. The predictor was designed according to [42], based on a dynamic Bayesian network (DBN), to provide the probabilities of the intentions of others.
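The way such a probabilistic predictor feeds the decision-maker can be shown schematically: proceed against an unlikely threat only while a fallback (a full stop before the conflict point) remains guaranteed. The sketch below is only an illustration of the non-conservatively defensive idea in [40] with hypothetical numbers and function names, not the authors' implementation:

```python
def can_stop_before(dist_to_conflict, speed, max_decel):
    """Feasibility of the fallback: can the ego fully stop before the conflict point?"""
    braking_distance = speed ** 2 / (2.0 * max_decel)
    return braking_distance <= dist_to_conflict

def short_term_action(p_yield_needed, dist_to_conflict, speed,
                      max_decel=4.0, p_threshold=0.5):
    """Proceed against a low-probability threat only while the worst-case stop stays feasible."""
    if p_yield_needed < p_threshold and can_stop_before(dist_to_conflict, speed, max_decel):
        return "accelerate"   # unlikely threat; fallback stop still guaranteed
    return "decelerate"       # yield: threat likely, or stopping is no longer feasible
```

With P(exit) = 0.626 for the target vehicle, the probability that yielding is needed is about 0.374, so under this rule the ego keeps accelerating as long as the full stop before the conflict point remains dynamically feasible.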
Figure 14 shows the results of the planned speed profile with corresponding bird's-eye-view screenshots of the situations at specific time steps. The host autonomous vehicle was entering the FT roundabout, and the vehicle in the roundabout, retrieved from the proposed dataset, was exiting. When it was not clear whether the target vehicle was going to exit or not, such as at the time step in Fig. 14 (a), the predictor returned P(exit) = 0.626. With the non-conservatively defensive strategy proposed in [40], the ego vehicle was able to keep accelerating to enter the roundabout as planned for the next 0.5 s, so that the potential threat with low probability (the target vehicle staying in the roundabout) did not affect the efficiency and comfort of the ego vehicle. The long-term planning corresponding to the yielding case (red curve in Fig. 14 (b)) guaranteed that the ego vehicle was able to fully stop in the worst case.

Fig. 13: A screenshot of the simulation when testing the motion planner in [39] with the proposed dataset.

Fig. 14: Screenshots of the situations and corresponding planned speed profiles obtained by implementing the decision and planning methods in [40], [41] with the predictor in [42], utilizing the proposed dataset.

D. Motion Clustering and Representation Learning

The X-means algorithm [43] was employed to cluster the trajectories and obtain motion patterns, with results shown in Fig. 15. We constructed a feature space from vehicle motions in the Frenet frame based on map information. Fig. 15 (a) shows the clustered trajectory segments in different colors on the map. Fig. 15 (b) and (d) demonstrate the clustering results with the longitudinal positions and speeds of the two interacting vehicles as the coordinates. The clustering results with the first and second components of principal component analysis (PCA) of the feature space are shown in Fig. 15 (c).
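X-means extends k-means by scoring candidate cluster counts with a Bayesian information criterion (BIC) and accepting splits that improve the score. A compact, dependency-free sketch of that ingredient on scalar features (a simplified stand-in for [43], with deterministic farthest-point seeding and an RSS-based BIC surrogate) is:

```python
import math

def kmeans_1d(points, k, iters=25):
    """Lloyd's algorithm on scalars with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:  # seed each new center at the point farthest from all seeds
        centers.append(max(points, key=lambda p: min((p - c) ** 2 for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: (p - centers[i]) ** 2)].append(p)
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    rss = sum(min((p - c) ** 2 for c in centers) for p in points)
    return sorted(centers), rss

def bic_score(points, k):
    """BIC-style score: data fit (residual sum of squares) plus a complexity penalty.

    Lower is better; X-means accepts a split when it lowers such a score.
    """
    n = len(points)
    _, rss = kmeans_1d(points, k)
    return n * math.log(rss / n + 1e-12) + k * math.log(n)
```

On two well-separated groups the k = 2 score beats k = 1, which is exactly the signal X-means uses to keep splitting until the score stops improving.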
From the figures we can see that different interactive motions are separated and similar ones are clustered, which are the desired results for obtaining motion patterns.

Fig. 15: Results of X-means [43] motion clustering using the proposed dataset.

E. Extraction of Interactive Agents and Trajectories

The proposed dataset can also be used to learn the interaction relationships between agents. We implemented the learning method and network structure proposed in [44] to extract the interaction frames of two agents. Some example results are given in Fig. 16, where Fig. 16 (a) and (b) provide one exemplar pair of interacting cars in the FT scenario, while Fig. 16 (c) and (d) represent another pair. In Fig. 16 (a) and (c), the paths of both interacting cars are provided, and in Fig. 16 (b) and (d), the trajectories along the longitudinal direction are shown. We can see that the extracted interaction frames (purple circles) align quite well with the ground-truth frames (blue stars).

F. Human-like Decision and Behavior Generation

We can also learn decision-making models that generate human-like decisions and behaviors with the proposed dataset. In [45], an interpretable human behavior model was proposed based on cumulative prospect theory (CPT). As a non-expected utility theory, CPT can well explain some systematically biased or "irrational" behavior/decisions of humans that cannot be explained by expected utility theory. Parameters of three different models were learned and tested using the data in the FT roundabout scenario: a predefined model based on the time-to-collision-point (TTCP), a learning-based model based on neural networks, and the proposed CPT-based model. The results (Fig. 17) showed that the CPT-based model outperformed the TTCP model and achieved performance similar to the learning-based model with much less training data and better interpretability.
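The core ingredients of a CPT model are a value function that is risk-averse for gains and loss-averse for losses, and a probability weighting function that overweights small probabilities. The sketch below uses the classic Tversky-Kahneman forms with their commonly cited parameters, and the simpler separable weighting rather than the full rank-dependent cumulative form; it illustrates the kind of model fitted in [45], not the authors' implementation:

```python
def cpt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Value function: concave for gains, convex and steeper (loss-averse) for losses."""
    return x ** alpha if x >= 0 else -lam * ((-x) ** beta)

def cpt_weight(p, gamma=0.61):
    """Probability weighting: overweights small p, underweights large p."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def cpt_utility(outcomes):
    """Subjective utility of a list of (probability, outcome) pairs under CPT."""
    return sum(cpt_weight(p) * cpt_value(x) for p, x in outcomes)
```

Because losses loom larger than gains (λ > 1), a symmetric 50/50 gamble gets negative CPT utility, the kind of systematic bias that an expected-utility model cannot reproduce.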
VII. CONCLUSION

In this paper, we presented a motion dataset in a variety of highly interactive driving scenarios from the US, Germany, China and other countries, including signalized/unsignalized intersections, roundabouts, and ramp merging and lane change on city roads and highways. Complex interactive motions were captured, featuring inexplicit right of way, relatively unstructured roads, as well as aggressive and irrational behavior caused by impatience and social pressure. Critical (near-collision and slight-collision) situations can be found in the dataset. We also included high-definition (HD) maps with semantic information for all scenarios in our dataset. The data was recorded from drones and traffic cameras, and the data processing pipeline was briefly described. Our map-aided dataset, with its diversity, internationality, complexity and criticality of scenarios and behavior, can significantly facilitate driving-behavior-related research such as motion prediction, imitation learning, decision-making and planning, representation learning, interaction extraction, and human-like behavior generation. Results from various kinds of methods in these research areas were demonstrated utilizing the proposed dataset.

VIII. ACKNOWLEDGEMENT

The authors would like to thank the Karlsruhe House of Young Scientists (KHYS) for their support of Maximilian's research visit at the MSC Lab.

REFERENCES

[1] S. Lefèvre, D. Vasquez, and C. Laugier, "A survey on motion prediction and risk assessment for intelligent vehicles," ROBOMECH Journal, vol. 1, no. 1, pp. 1–14, Jul. 2014.
[2] A. Rudenko, L. Palmieri, M. Herman, K. M. Kitani, D. M. Gavrila, and K. O. Arras, "Human motion trajectory prediction: A survey," arXiv preprint arXiv:1905.06113, 2019.
[3] W. Zhan, A. de La Fortelle, Y.-T. Chen, C.-Y. Chan, and M.
Tomizuka, "Probabilistic prediction from planning perspective: Problem formulation, representation simplification and evaluation metric," in 2018 IEEE Intelligent Vehicles Symposium (IV), 2018, pp. 1150–1156.
[4] H. Okuda, N. Ikami, T. Suzuki, Y. Tazaki, and K. Takeda, "Modeling and Analysis of Driving Behavior Based on a Probability-Weighted ARX Model," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 1, pp. 98–112, Mar. 2013.
[5] K. Driggs-Campbell, V. Govindarajan, and R. Bajcsy, "Integrating Intuitive Driver Models in Autonomous Planning for Interactive Maneuvers," IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 12, pp. 3461–3472, Dec. 2017.
[6] Q. Lin, Y. Zhang, S. Verwer, and J. Wang, "MOHA: A Multi-Mode Hybrid Automaton Model for Learning Car-Following Behaviors," IEEE Transactions on Intelligent Transportation Systems, pp. 1–8, 2018.
[7] W. Wang, W. Zhang, and D. Zhao, "Understanding V2V Driving Scenarios through Traffic Primitives," arXiv:1807.10422 [cs, stat], Jul. 2018.
[8] M. Kelly, C. Sidrane, K. Driggs-Campbell, and M. J. Kochenderfer, "HG-DAgger: Interactive Imitation Learning with Human Experts," to appear in IEEE International Conference on Robotics and Automation (ICRA), 2019.

Fig. 16: Two examples of the extracted interaction pairs by implementing the learning method and network structure in [44]: (a), (c) paths of the two interacting cars; (b), (d) trajectories along the longitudinal direction.

Fig. 17: Results of the interpretable human behavior model based on cumulative prospect theory (CPT) [45] using the proposed dataset.

[9] N. Rhinehart, R. McAllister, and S. Levine, "Deep Imitative Models for Flexible Inference, Planning, and Control," Oct. 2018.
[10] L. Sun, W. Zhan, M. Tomizuka, and A. D.
Dragan, "Courteous autonomous cars," in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018, pp. 663–670.
[11] C. Guo, K. Kidono, R. Terashima, and Y. Kojima, "Toward Human-like Behavior Generation in Urban Environment Based on Markov Decision Process With Hybrid Potential Maps," in 2018 IEEE Intelligent Vehicles Symposium (IV), Jun. 2018, pp. 2209–2215.
[12] M. Naumann, M. Lauer, and C. Stiller, "Generating Comfortable, Safe and Comprehensible Trajectories for Automated Vehicles in Mixed Traffic," in Proc. IEEE Intl. Conf. Intelligent Transportation Systems, Hawaii, USA, Nov. 2018, pp. 575–582.
[13] V. Alexiadis, J. Colyar, J. Halkias, R. Hranac, and G. McHale, "The Next Generation Simulation Program," Institute of Transportation Engineers. ITE Journal; Washington, vol. 74, no. 8, pp. 22–26, Aug. 2004.
[14] L. Sun, W. Zhan, and M. Tomizuka, "Probabilistic Prediction of Interactive Driving Behavior via Hierarchical Inverse Reinforcement Learning," in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Nov. 2018, pp. 2111–2117.
[15] W. Zhan, L. Sun, Y. Hu, J. Li, and M. Tomizuka, "Towards a Fatality-Aware Benchmark of Probabilistic Reaction Prediction in Highly Interactive Driving Scenarios," in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Nov. 2018, pp. 3274–3280.
[16] F. Altché and A. de La Fortelle, "An LSTM network for highway trajectory prediction," in 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Oct. 2017, pp. 353–359.
[17] R. Krajewski, J. Bock, L. Kloeker, and L. Eckstein, "The highD Dataset: A Drone Dataset of Naturalistic Vehicle Trajectories on German Highways for Validation of Highly Automated Driving Systems," in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Nov. 2018, pp. 2118–2125.
[18] K. Messaoud, I.
Yahiaoui, A. Verroust-Blondet, and F. Nashashibi, "Relational recurrent neural networks for vehicle trajectory prediction," in 2019 IEEE Intelligent Transportation Systems Conference (ITSC), 2019.
[19] M.-F. Chang, J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang, P. Carr, S. Lucey, D. Ramanan, and J. Hays, "Argoverse: 3d Tracking and Forecasting With Rich Maps," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 8748–8757.
[20] Z. Kim, G. Gomes, R. Hranac, and A. Skabardonis, "A machine vision system for generating vehicle trajectories over extended freeway segments," in 12th World Congress on Intelligent Transportation Systems, 2005.
[21] B. Coifman and L. Li, "A critical evaluation of the Next Generation Simulation (NGSIM) vehicle trajectory dataset," Transportation Research Part B: Methodological, vol. 105, pp. 362–377, Nov. 2017.
[22] D. Yang, L. Li, K. Redmill, and U. Özgüner, "Top-view Trajectories: A Pedestrian Dataset of Vehicle-Crowd Interaction from Controlled Experiments and Crowded Campus," arXiv:1902.00487 [cs], Feb. 2019.
[23] A. Robicquet, A. Sadeghian, A. Alahi, and S. Savarese, "Learning Social Etiquette: Human Trajectory Understanding In Crowded Scenes," in ECCV 2016. Springer International Publishing, 2016, pp. 549–565.
[24] V. Ramanishka, Y.-T. Chen, T. Misu, and K. Saenko, "Toward Driving Scene Understanding: A Dataset for Learning Driver Behavior and Causal Reasoning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7699–7707.
[25] V. L. Neale, T. A. Dingus, S. G. Klauer, J. Sudweeks, and M. Goodman, "An overview of the 100-car naturalistic study and findings," National Highway Traffic Safety Administration, Paper, vol. 5, p. 0400, 2005.
[26] S. Ren, K. He, R. Girshick, and J.
Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," in Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[27] D. Simon, Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches. New York, NY, USA: Wiley-Interscience, 2006.
[28] A. Clausse, S. Benslimane, and A. De La Fortelle, "Large-scale extraction of accurate vehicle trajectories for driving behavior learning," 30th IEEE Intelligent Vehicles Symposium (IV), 2019.
[29] K. He, G. Gkioxari, P. Dollár, and R. B. Girshick, "Mask R-CNN," 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2980–2988, 2017.
[30] E. Bochinski, V. Eiselein, and T. Sikora, "High-speed tracking-by-detection without using image information," in International Workshop on Traffic and Street Surveillance for Safety and Security at IEEE AVSS 2017, Lecce, Italy, Aug. 2017. [Online]. Available: http://elvera.nue.tu-berlin.de/files/1517Bochinski2017.pdf
[31] A. Lukežič, T. Vojíř, L. Čehovin Zajc, J. Matas, and M. Kristan, "Discriminative correlation filter tracker with channel and spatial reliability," International Journal of Computer Vision, 2018.
[32] D. C. Brown, "Close-range camera calibration," PHOTOGRAMMETRIC ENGINEERING, vol. 37, no. 8, pp. 855–866, 1971.
[33] P. Polack, F. Altché, B. d'Andréa-Novel, and A. de La Fortelle, "The kinematic bicycle model: A consistent model for planning feasible trajectories for autonomous vehicles?" in 2017 IEEE Intelligent Vehicles Symposium (IV), June 2017, pp. 812–818.
[34] F. Poggenhans, J. Pauls, J. Janosovits, S. Orf, M. Naumann, F. Kuhnt, and M. Mayr, "Lanelet2: A high-definition map framework for the future of automated driving," in 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Nov. 2018, pp. 1672–1679.
[35] W. Zhan, L. Sun, D. Wang, Y. Jin, and M.
Tomizuka, "Constructing a Highly Interactive Vehicle Motion Dataset," in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019.
[36] H. Ma, J. Li, W. Zhan, and M. Tomizuka, "Wasserstein Generative Learning with Kinematic Constraints for Probabilistic Prediction of Interactive Driving Behavior," in 2019 IEEE Intelligent Vehicles Symposium, 2019.
[37] Y. Hu, L. Sun, and M. Tomizuka, "Generic prediction architecture considering both rational and irrational driving behaviors," in 2019 22nd International Conference on Intelligent Transportation Systems (ITSC), to appear, 2019.
[38] L. Sun, C. Peng, W. Zhan, and M. Tomizuka, "A Fast Integrated Planning and Control Framework for Autonomous Driving via Imitation Learning," in ASME 2018 Dynamic Systems and Control Conference. American Society of Mechanical Engineers, Sep. 2018, pp. 1–11.
[39] W. Zhan, J. Chen, C. Y. Chan, C. Liu, and M. Tomizuka, "Spatially-partitioned environmental representation and planning architecture for on-road autonomous driving," in 2017 IEEE Intelligent Vehicles Symposium (IV), Jun. 2017, pp. 632–639.
[40] W. Zhan, C. Liu, C. Y. Chan, and M. Tomizuka, "A non-conservatively defensive strategy for urban autonomous driving," in 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), pp. 459–464.
[41] T. Gu, J. Atwood, C. Dong, J. M. Dolan, and J.-W. Lee, "Tunable and stable real-time trajectory planning for urban autonomous driving," in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2015, pp. 250–256.
[42] J. Schulz, C. Hubmann, J. Löchner, and D. Burschka, "Interaction-Aware Probabilistic Behavior Prediction in Urban Environments," in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct. 2018, pp. 3999–4006.
[43] D. Pelleg, A. W. Moore et al.
, "X-means: Extending k-means with efficient estimation of the number of clusters," in ICML, vol. 1, 2000, pp. 727–734.
[44] T. Shu, Y. Peng, L. Fan, H. Lu, and S.-C. Zhu, "Perception of human interaction based on motion trajectories: From aerial videos to decontextualized animations," Topics in Cognitive Science, vol. 10, no. 1, pp. 225–241, 2018.
[45] L. Sun, W. Zhan, Y. Hu, and M. Tomizuka, "Interpretable modelling of driving behaviors in interactive driving scenarios based on cumulative prospect theory," in 2019 IEEE Intelligent Transportation Systems Conference (ITSC), 2019.