Deconfounded Lifelong Learning for Autonomous Driving via Dynamic Knowledge Spaces

Jiayuan Du∗, Yuebing Song∗, Yiming Zhao, Xianghui Pan, Jiawei Lian, Yuchu Lu, Liuyi Wang, Chengju Liu†, and Qijun Chen†
Tongji University, Shanghai, China
{dujiayuan, 2431997, liuchengju, qjchen}@tongji.edu.cn
∗ Equal contributions   † Correspondence

Abstract. End-to-End autonomous driving (E2E-AD) systems face challenges in lifelong learning, including catastrophic forgetting, difficulty in knowledge transfer across diverse scenarios, and spurious correlations between unobservable confounders and true driving intents. To address these issues, we propose DeLL, a Deconfounded Lifelong Learning framework that integrates a Dirichlet process mixture model (DPMM) with the front-door adjustment mechanism from causal inference. The DPMM is employed to construct two dynamic knowledge spaces: a trajectory knowledge space for clustering explicit driving behaviors and an implicit feature knowledge space for discovering latent driving abilities. Leveraging the non-parametric Bayesian nature of the DPMM, our framework enables adaptive expansion and incremental updating of knowledge without predefining the number of clusters, thereby mitigating catastrophic forgetting. Meanwhile, the front-door adjustment mechanism utilizes the DPMM-derived knowledge as valid mediators to deconfound spurious correlations, such as those induced by sensor noise or environmental changes, and enhances the causal expressiveness of the learned representations. Additionally, we introduce an evolutionary trajectory decoder that enables non-autoregressive planning. To evaluate the lifelong learning performance of E2E-AD, we propose new evaluation protocols and metrics based on Bench2Drive.
Extensive evaluations in the closed-loop CARLA simulator demonstrate that our framework significantly improves adaptability to new driving scenarios and overall driving performance, while effectively retaining previously acquired knowledge.

Keywords: End-to-End Autonomous Driving · Lifelong Learning · CARLA

1 Introduction

End-to-End autonomous driving (E2E-AD) methods [15, 17, 18, 40, 46, 57] have achieved remarkable performance in the closed-loop CARLA [11] simulator. However, their deployment in open-world, non-stationary environments is severely hindered by catastrophic forgetting and causal confusion [7, 10]. Imitation learning architectures, operating primarily as correlational engines, struggle to continuously assimilate new scenarios without overwriting historical parameters, as shown in Fig. 1.

Fig. 1: Driving score, success rate and multi-ability success rate during the lifelong learning process. Our method not only demonstrates superior overall driving performance but also substantially mitigates catastrophic forgetting.

Furthermore, treating driving as a partially observable Markov decision process reveals that these models frequently capture spurious correlations induced by unobserved confounders, leading to decision-making failures under continuous covariate shifts [42].

Existing methods either improve modeling ability through transformer-based architectures and diverse auxiliary tasks [15, 17, 21, 46, 57], improve model interpretability through interpretable intermediate representations and visualizations [8, 19, 44], or improve the reasoning ability and trajectory generalization of the model by means of large language models (LLMs) and diffusion policies [13, 30, 40, 49, 52].
However, empowering models with lifelong learning capabilities and mitigating causal confusion through dynamic knowledge spaces remains largely unexplored. The closed-loop simulator CARLA [11] also lacks a lifelong learning benchmark.

To address the aforementioned problems, we introduce DeLL, a Deconfounded Lifelong Learning framework for E2E-AD. The proposed architecture fundamentally resolves the rigidity of fixed-capacity networks by introducing dynamic dual knowledge spaces governed by a Dirichlet process mixture model (DPMM) [28]. As a Bayesian non-parametric model, it dynamically instantiates new cluster components (knowledge anchors) as novel driving scenarios are encountered, preserving historical knowledge in isolated, specialized distributions. Our dynamic knowledge spaces alleviate catastrophic forgetting without relying on rigid task boundaries [35] or computationally expensive experience replay buffers [3]. The dynamically generated knowledge anchors also serve as the mediator variables required for causal front-door adjustment [38, 48] in our causal feature enhancement module. We implement front-door adjustment in a novel attention-based way to mitigate the spurious correlations of unobservable confounders. By continuously injecting accumulated historical priors back into the network's forward propagation, our framework achieves highly efficient knowledge transfer, fundamentally supporting the lifelong learning goals. We also design the evolutionary trajectory decoder to match our dynamic knowledge spaces. To verify the effectiveness of the proposed method, we integrate the evaluation protocols of lifelong learning, design a lifelong learning task sequence based on Bench2Drive's multi-ability classification, and introduce the corresponding evaluation metrics.
Our method outperforms the previous state-of-the-art in both lifelong learning and full-data learning settings. Our contributions are summarized as follows:

– A novel deconfounded lifelong learning framework, DeLL, for E2E-AD.
– DPMM-based dual knowledge spaces that dynamically preserve and update latent feature and trajectory knowledge.
– A causal feature enhancement module that leverages the knowledge spaces and front-door adjustment to enhance features and alleviate spurious correlations caused by unobservable confounders.
– An evolutionary trajectory decoder that natively supports non-autoregressive, parallel trajectory generation.
– A new lifelong learning evaluation protocol based on the Bench2Drive benchmark, on which our method achieves state-of-the-art closed-loop performance.

2 Related Work

2.1 End-to-End Autonomous Driving in CARLA

E2E-AD in the CARLA [11] closed-loop simulator is predominantly built upon behavior cloning paradigms. Early works like LBC [6] pioneer the teacher-student distillation approach, transferring privileged information to a vision-only policy. LAV [5] expands this idea by incorporating multi-agent trajectory data to better understand social interactions. TCP [50] proposes trajectory-guided control with uncertainty estimation for safer decision-making.

The advent of transformers revolutionized sensor fusion. Transfuser [9] first introduces cross-modal attention mechanisms. Its successor Transfuser++ [17, 57] enhances temporal modeling for occluded scenarios and long-horizon planning. InterFuser [44] and E2E-Parking [51] use a transformer encoder to fuse multi-modal features and a transformer decoder to generate sequential waypoints. DriveTransformer [21] further unifies perception and planning within transformer frameworks.

To address the "black-box" critique, many works introduce interpretability through auxiliary tasks.
Approaches such as PanT [41], UniAD [15], and VAD [22] emphasize joint optimization of driving sub-tasks. SimLingo [40] further integrates large language models to jointly address closed-loop driving, vision-language understanding, and language-action alignment. ORION [13] combines neural planners with symbolic guards to enforce explicit traffic rules. ThinkTwice [19] employs a two-stage decoder with attention-based rationalization, while HiP-AD [46] uses hierarchical predictive modeling.

Resource-efficient designs have also been explored. AD-MLP [55] replaces transformers with MLP-Mixers for real-time inference. DriveMoE [52] employs dynamic mixture-of-experts routing. DriveAdapter [18] enables plug-and-play adaptation of foundation models. DiffAD [49] applies diffusion models to handle ambiguous scenarios through probabilistic sampling.

However, existing closed-loop E2E-AD methods still face fundamental challenges in incrementally learning new driving abilities in dynamic open environments as humans do. In this paper, we explore how to integrate lifelong learning into an E2E-AD framework.

2.2 Lifelong Learning

Lifelong learning, also known as incremental learning, aims to enable models to acquire new knowledge while retaining previously learned information, with catastrophic forgetting being the core challenge [47]. Existing approaches fall into three categories [34]. Architectural strategies dynamically expand network structures: PNN [43] allocates isolated columns for each task with lateral connections, while ExpertGate [2] trains dedicated experts with a gating network for selection.
Regularization strategies constrain parameter updates based on importance: EWC [26] penalizes changes to critical parameters via Fisher information, MAS [1] computes importance from output sensitivity, LwF [29] applies knowledge distillation using old model outputs as supervision, and AFA [54] further introduces multi-level feature alignment losses. Rehearsal strategies revisit past data by storing exemplars or generating synthetic samples: iCaRL [3] combines distillation with prototype replay, FearNet [24] mimics biological memory with separate modules for rapid learning and long-term storage, and generative models [14, 25] produce pseudo-samples for replay [45, 56].

Most existing lifelong learning methods focus on alleviating forgetting, while paying insufficient attention to the organization of knowledge itself. Moreover, lifelong learning in closed-loop autonomous driving in CARLA remains largely unexplored.

3 Method

We introduce DeLL, a novel deconfounded lifelong learning framework for E2E-AD. As shown in Fig. 2, the overall workflow features core modules including a multi-modal perception backbone, dynamic dual knowledge spaces, causal feature enhancement modules, and an evolutionary trajectory decoder.

3.1 Multi-modal Perception Backbone

The front end of the network utilizes Transfuser++ [17, 57], which boasts state-of-the-art performance, as the base extractor. It first processes RGB image sequences and LiDAR point clouds independently using RegNetY [39] to obtain multi-scale features. Subsequently, cross-modal cues are fused via a multi-scale cross-attention mechanism, projecting them to generate high-dimensional Bird's Eye View (BEV) feature maps F_bev ∈ R^{8×8×256}.
Fig. 2: Overview architecture of our proposed method.

We also introduce auxiliary learning tasks (BEV semantic segmentation and detection) to enrich the BEV representations with clear geometric and semantic boundaries. These geometrically constrained BEV features are then transformed and concatenated with the ego-vehicle's current velocity feature and target point feature. Finally, a transformer decoder equipped with 11 learnable queries pools these panoramic features into a highly compact fused scene representation vector F_fused ∈ R^{11×256}.

3.2 Dynamic Dual Knowledge Spaces

To overcome the memory bottleneck of fixed-capacity networks under continuous tasks, we employ the Dirichlet process mixture model (DPMM) [28] to construct explicit and implicit dual knowledge spaces in a dynamic manner. As a typical Bayesian non-parametric statistical model, the DPMM allows the data to spontaneously dictate the number of cluster components, a property naturally suited to lifelong learning, where unknown data continuously flow in and the boundary is unknown.
The DPMM employs the Dirichlet process as a prior to characterize how data are generated from an infinite number of potential clusters. The Dirichlet process can be formalized as G ∼ DP(α, H), where G is a random probability measure over the parameter space, H is the base distribution representing the prior expected parameter distribution, and α is the concentration parameter, which controls the aggressiveness of generating new clusters. Expressed via the stick-breaking process, the generative process of the DPMM is:

\[ \theta_k \mid \lambda \sim H(\lambda), \quad \pi \mid \alpha \sim \mathrm{GEM}(\alpha), \quad v_i \mid \pi \sim \mathrm{Cat}(\pi), \quad x_i \mid v_i \sim F(\theta_{v_i}), \tag{1} \]

where θ_k is a latent component parameter drawn independently from the Dirichlet process prior with base distribution H. The mixing proportions π are sampled from the GEM (stick-breaking) distribution. The variable v_i assigns data point x_i to a cluster and takes the value k with probability π_k, drawn from a categorical distribution (Cat). Each data point x_i is then sampled from the distribution F(θ_{v_i}) of its assigned cluster. Clustering emerges naturally in the DPMM because observations that share the same latent parameter θ_k, drawn from a discrete distribution, are automatically grouped together.

In our approach, we assume that the parameters of each component in the DPMM adhere to a shared Normal-Wishart base distribution. For computational efficiency, we also assume that each active component can be represented as a multivariate Gaussian with a diagonal covariance matrix. Our framework instantiates DPMM knowledge spaces at two different levels:

Feature Knowledge Space (FKS) is an implicit feature space used to cluster the fused features F_fused extracted by the backbone network. The mission of the FKS is to mine and distill latent topological causal structures in the environment.
As the learning sequence progresses, this space automatically extracts the center point of each cluster as feature knowledge anchors, denoted A_feat ∈ R^{K_f×2816}, where K_f is the total number of environmental feature patterns currently identified by the DPMM.

Trajectory Knowledge Space (TKS) is an explicit kinematic space that directly clusters the ground-truth expert trajectories from historical training data using the DPMM. It constructs a physical prior action library covering maneuvers such as lane changes and sharp turns. The corresponding cluster centers are extracted as trajectory knowledge anchors, denoted A_traj ∈ R^{K_t×20}, where K_t is the dynamically evolving number of trajectory prototypes.

Given the intractability of exact posterior inference in the DPMM, we adopt memoVB [16] as an efficient approximation. memoVB decomposes global sufficient statistics into mini-batch summations, enabling online coordinate-ascent updates. It leverages the nonparametric nature of the DPMM to dynamically adjust cluster counts via birth and merge heuristics, helping the model escape local optima and continuously adapt to evolving data streams. During training, the DPMM and the neural network are updated in an alternating manner.

3.3 Causal Feature Enhancement Module

Unobservable confounders in the latent space may cause spurious correlations between perception and action. To mitigate their potential influence, we introduce the causal feature enhancement module inspired by front-door adjustment [38, 48]. The core idea of front-door adjustment is to construct a front-door path between input X and output Y via a mediator variable M (X → M → Y), where M intercepts all directed paths from X to Y and there are no unblocked back-door paths from M to Y.
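As a concrete illustration of the stick-breaking generative process in Eq. (1), the sketch below draws data from a truncated DPMM with one-dimensional Gaussian components. The truncation level, the uniform base distribution H, and the unit observation variance are simplifying assumptions for illustration only; our knowledge spaces use a Normal-Wishart base over high-dimensional features.

```python
import random

def stick_breaking_weights(alpha, truncation, rng):
    """Draw mixing proportions pi ~ GEM(alpha), truncated at `truncation` sticks."""
    remaining, weights = 1.0, []
    for _ in range(truncation):
        b = rng.betavariate(1.0, alpha)    # stick proportion b ~ Beta(1, alpha)
        weights.append(remaining * b)
        remaining *= 1.0 - b
    weights.append(remaining)              # leftover mass on the final stick
    return weights

def sample_dpmm(n_points, alpha=1.0, truncation=20, seed=0):
    """Generate points from a truncated DPMM with 1-D Gaussian components."""
    rng = random.Random(seed)
    pi = stick_breaking_weights(alpha, truncation, rng)
    means = [rng.uniform(-10.0, 10.0) for _ in pi]   # theta_k ~ H (illustrative H)
    assignments, data = [], []
    for _ in range(n_points):
        u, acc, k = rng.random(), 0.0, 0
        for k, w in enumerate(pi):         # v_i ~ Cat(pi)
            acc += w
            if u <= acc:
                break
        assignments.append(k)
        data.append(rng.gauss(means[k], 1.0))        # x_i ~ F(theta_{v_i})
    return pi, assignments, data

pi, z, x = sample_dpmm(500, alpha=2.0)
print("effective clusters used:", len(set(z)))
```

Because the number of occupied clusters is determined by the draws themselves, a larger α tends to occupy more components, which mirrors how the knowledge spaces expand as new scenarios arrive.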
Then, even in the presence of an unobserved confounder U (X ← U → Y), the interventional distribution P(Y | do(X)) can still be unbiasedly identified and computed via the standard front-door adjustment formula:

\[ P(Y = y \mid do(X = x)) = \sum_{m} P(m \mid x) \sum_{x'} P(y \mid x', m)\, P(x'). \tag{2} \]

In our design, the sets of knowledge anchors generated by the DPMM (A_feat and A_traj) serve as the discrete state space for this valid observed mediator variable M. The causal intervention process is divided into two cascaded sub-modules, both adopting a unified dual-attention and gated fusion architecture to improve code reusability.

Fused Feature Enhancement Module (FFEM) processes the raw multi-modal fused features F_fused ∈ R^{11×256}. First, the feature knowledge anchors A_feat are mapped to a latent space via a projection network to obtain \hat{F}_fused ∈ R^{K_f×256}. Next, a self-attention layer extracts the internal dependencies of the current input features, followed by a cross-attention layer that uses the input features as queries (Q) and the projected knowledge anchors as keys (K) and values (V). This operation essentially computes the expectation term \sum_m P(m \mid x) in the formula, searching for the historical causal template that best fits the current cluttered scene:

\[ F_{enhan} = \mathrm{Attn}(Q = F_{input},\; K = V = F_{knowledge}) + F_{input}. \tag{3} \]

Finally, a learnable gating network built with a Sigmoid-activated multi-layer perceptron (MLP) adaptively computes the fusion weight w, smoothly combining the raw input with the causally enhanced features:

\[ F_{output} = w \odot F_{enhan} + (1 - w) \odot F_{input}, \quad w = \sigma\big(\mathrm{MLP}(F_{enhan}) + \mathrm{MLP}(\mathrm{Attn}(Q = K = V = F_{input}))\big). \tag{4} \]

The FFEM ultimately outputs the deconfounded enhanced fused features F_fused' ∈ R^{11×256}.
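To make the identification claim of Eq. (2) concrete, the toy model below wires up a binary structural causal model with a hidden confounder U (X ← U → Y) and a full mediator M (X → M → Y), then checks numerically that the front-door formula, computed from purely observational marginals, matches the ground-truth interventional distribution that uses the hidden U. All probability tables are invented for illustration and are not part of our method.

```python
from itertools import product

P_U = {0: 0.5, 1: 0.5}
P_X_given_U = {(1, 0): 0.2, (1, 1): 0.8}             # P(X=1 | U=u)
P_M_given_X = {(1, 0): 0.1, (1, 1): 0.9}             # P(M=1 | X=x)
P_Y_given_MU = {(1, m, u): 0.3 + 0.4 * m + 0.2 * u   # P(Y=1 | M=m, U=u)
                for m in (0, 1) for u in (0, 1)}

def bern(table, val, *cond):
    p1 = table[(1, *cond)]
    return p1 if val == 1 else 1.0 - p1

def joint(u, x, m, y):
    return (P_U[u] * bern(P_X_given_U, x, u)
            * bern(P_M_given_X, m, x) * bern(P_Y_given_MU, y, m, u))

def marginal(**fixed):
    total = 0.0
    for u, x, m, y in product((0, 1), repeat=4):
        point = dict(u=u, x=x, m=m, y=y)
        if all(point[k] == v for k, v in fixed.items()):
            total += joint(u, x, m, y)
    return total

def frontdoor(y, x):
    """P(Y=y | do(X=x)) via Eq. (2), using only observational quantities."""
    total = 0.0
    for m in (0, 1):
        p_m_given_x = marginal(x=x, m=m) / marginal(x=x)
        inner = sum(marginal(x=xp, m=m, y=y) / marginal(x=xp, m=m) * marginal(x=xp)
                    for xp in (0, 1))
        total += p_m_given_x * inner
    return total

def truth(y, x):
    """Ground-truth interventional distribution, using the hidden U directly."""
    return sum(P_U[u] * bern(P_M_given_X, m, x) * bern(P_Y_given_MU, y, m, u)
               for u in (0, 1) for m in (0, 1))

for x in (0, 1):
    print(f"do(X={x}): front-door={frontdoor(1, x):.4f}  truth={truth(1, x):.4f}")
```

For this model both routes agree exactly (P(Y=1 | do(X=1)) = 0.76, P(Y=1 | do(X=0)) = 0.44), while the naive conditional P(Y=1 | X=1) is biased upward by U; the DPMM anchors play precisely this observed-mediator role in the FFEM.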
Trajectory Feature Enhancement Module (TFEM) receives the subset of the FFEM output responsible for trajectory prediction, F_traj = F_fused'_{1:10} ∈ R^{10×256}. The TFEM transforms the geometric coordinate anchors A_traj from the trajectory knowledge space into high-dimensional spatiotemporal features \hat{F}_traj ∈ R^{K_t×10×256} via positional encoding and a temporal extension projection. Subsequently, the TFEM reuses the same cross-attention and gated intervention mechanism, outputting trajectory features F_traj' ∈ R^{10×256} that carry strong causal kinematic constraints. Meanwhile, the target speed feature F_speed = F_fused'_{11} is passed directly, in residual form, to the downstream speed decoder.

3.4 Evolutionary Trajectory Decoder

To address the architectural rigidity of traditional fixed-channel decoders in lifelong learning, we propose an evolutionary trajectory decoder driven by a dynamic trajectory knowledge base. Leveraging the transformer's permutation invariance and sequence flexibility, the decoder maps heterogeneous trajectory anchors A_traj into dynamic planning token embeddings via a temporal encoding network. As the DPMM expands the cluster count K_t with accumulated experience, this token pool grows naturally to achieve unbounded knowledge acquisition. These tokens then serve as queries in a planning interaction transformer, performing cross-attention against scene context features to evaluate the relevance of historical driving patterns in parallel. For trajectory generation, a dual-branch decoupled prediction head replaces traditional autoregressive methods with a parallel strategy: a coarse-grained branch computes selection scores Y_logits ∈ R^{K_t} for each anchor, while a fine-grained branch predicts geometric offsets Y_offsets ∈ R^{K_t×20} to refine the predicted coordinates.
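The dual-branch scheme, together with the temperature-scaled scoring and Top-K routing described next, can be sketched in a few lines. The anchors, logits, and offsets below are random stand-ins for network outputs, and the helper name is ours, not part of the method.

```python
import math
import random

def decode_trajectories(anchors, logits, offsets, tau=1.0, k=1):
    """Refine every anchor with its predicted offset, score it with a
    temperature-scaled softmax, and keep the top-k candidates.
    anchors/offsets: lists of 20-dim vectors (10 waypoints x 2 coordinates)."""
    candidates = [[a + o for a, o in zip(anc, off)]      # fine-grained branch
                  for anc, off in zip(anchors, offsets)]
    exps = [math.exp(s / tau) for s in logits]           # coarse-grained branch
    z = sum(exps)
    probs = [e / z for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(i, probs[i], candidates[i]) for i in ranked[:k]]

rng = random.Random(0)
K_t = 4                                                  # current anchor count
anchors = [[rng.uniform(-1, 1) for _ in range(20)] for _ in range(K_t)]
offsets = [[rng.uniform(-0.1, 0.1) for _ in range(20)] for _ in range(K_t)]
logits = [0.2, 2.5, -1.0, 0.7]                           # made-up selection scores
idx, prob, traj = decode_trajectories(anchors, logits, offsets, tau=0.5, k=1)[0]
print(idx, len(traj))   # the anchor with the largest logit wins; 20 coordinates
```

Because every anchor is scored and refined in one shot, the decoder stays non-autoregressive, and adding a new DPMM cluster only appends one more row to `anchors`.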
The Y_logits are then transformed into a temperature-scaled probability distribution \hat{Y}_probs. Finally, the system generates a candidate set \hat{Y}_trajs by applying the offsets to the original anchors, with the final execution output \hat{Y}_traj selected via a Top-K routing mechanism based on \hat{Y}_probs. The core decoding equations are:

\[ \hat{Y}_{trajs} = \mathrm{MLP}\big(\mathrm{Attn}(Q = E(A_{traj}),\; K = V = F_{traj})\big) + A_{traj}, \quad \hat{Y}_{probs} = \varphi\big[\mathrm{MLP}\big(\mathrm{Attn}(Q = E(A_{traj}),\; K = V = F_{traj})\big), \tau\big], \quad \hat{Y}_{traj} = \hat{Y}_{trajs}\big[\mathrm{TopK}(\hat{Y}_{probs}, k)\big], \tag{5} \]

where φ[·] is the Softmax function and τ is the temperature factor.

3.5 Loss Function

The overall loss function is:

\[ L = L_{sem} + L_{det} + L_{traj} + L_{speed}, \tag{6} \]

where the BEV semantic segmentation loss L_sem and the target speed loss L_speed are both cross-entropy losses. The BEV detection loss L_det follows the pattern of CenterNet [12], consisting of a heatmap loss, an offset loss, and a size loss. The trajectory loss L_traj consists of three parts:

\[ L_{traj} = L_{prob} + L_{best} + L_{weighted}. \tag{7} \]

For the anchor selection probability loss L_prob, we use the KL divergence to minimize the difference between the predicted probability distribution and the ground-truth probability distribution:

\[ L_{prob} = D_{KL}\big(Y_{probs} \,\|\, \hat{Y}_{probs}\big), \quad Y_{probs} = \varphi\big[-\lVert \hat{Y}_{trajs} - Y_{traj} \rVert^2, \tau\big], \tag{8} \]

where the ground-truth anchor selection probability distribution Y_probs is defined by applying the softmax function to the negated distances between all predicted trajectories and the ground-truth trajectory Y_traj. The best-trajectory loss L_best and the weighted-trajectory loss L_weighted are both supervised using the smooth L1 loss. The former measures the deviation between the ground truth and the closest predicted trajectory, while the latter computes the weighted sum of deviations of all anchor trajectories, with weights given by the ground-truth selection probabilities.
This is formulated as follows:

\[ L_{best} = \mathrm{SmoothL1}\big(\hat{Y}_{traj} - Y_{traj}\big), \quad L_{weighted} = Y_{probs} \cdot \mathrm{SmoothL1}\big(\hat{Y}_{trajs} - Y_{traj}\big). \tag{9} \]

4 Experiments

4.1 Dataset and Metrics

Inspired by existing lifelong learning methods [31, 36, 53], we construct a streaming-data training pipeline for lifelong learning based on the Bench2Drive benchmark [20]. The dataset comprises approximately 512K frames covering five key driving competencies: Emergency Braking, Traffic Sign Recognition, Merging, Overtaking, and Giving Way. We leverage this inherent categorization to define five sequential learning tasks, simulating a lifelong learning scenario where tasks are encountered in order. The data volume of each competency decreases sequentially, with approximately 184K, 146K, 92K, 78K, and only 11K frames, respectively. To comprehensively evaluate the lifelong learning capability of models, we organize the tasks in the aforementioned order of decreasing data volume and increasing learning difficulty, thereby rigorously testing the model's resistance to forgetting and its knowledge transfer ability.

In addition to the well-established evaluation criteria in CARLA [11] and Bench2Drive [20], we define a suite of lifelong learning metrics that follows common practices in the field [23, 32, 37]. Our framework comprises three categories: vertical metrics for temporal stability, horizontal metrics for knowledge transferability, and comprehensive metrics for overall task proficiency. While the categories are inspired by classic works [33, 36], the exact calculations are adapted to align with the Bench2Drive benchmark's success criteria and the nature of our tasks. Each metric is detailed in the following sections.
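The five-task stream described above can be written down as a simple schedule. The task identifiers and the helper below are illustrative, and the frame counts are the approximate sizes quoted in the text.

```python
# Approximate frame counts per competency, in the training order used here.
TASK_STREAM = [
    ("EmergencyBraking", 184_000),
    ("TrafficSignRecognition", 146_000),
    ("Merging", 92_000),
    ("Overtaking", 78_000),
    ("GivingWay", 11_000),
]

def lifelong_schedule(tasks):
    """Yield tasks in order, checking the decreasing-volume curriculum."""
    prev = float("inf")
    for name, frames in tasks:
        assert frames < prev, "stream must follow decreasing data volume"
        prev = frames
        yield name, frames

for name, frames in lifelong_schedule(TASK_STREAM):
    print(f"train on {name}: ~{frames} frames")
```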
Vertical Dimension (Time-Series Stability Metrics):

– Forgetting Ratio (FR) ↓: Measures the degree of performance decay on historically learned tasks after the model learns subsequent new tasks, relative to the performance immediately after learning them. Here SR_{i,j} denotes the success rate on task j after learning the i-th task; lower values indicate stronger resistance to forgetting:

\[ FR = \frac{1}{N-1} \sum_{i=1}^{N-1} \frac{SR_{i,i} - SR_{N,i}}{SR_{i,i}}. \tag{10} \]

– Process Forgetting Ratio (PFR) ↓: Captures the severity of performance oscillations during the learning process caused by interference from other tasks, measuring the deviation from the historical best state:

\[ PFR = \frac{1}{N-1} \sum_{j=1}^{N-1} \frac{1}{N-j} \sum_{i=j+1}^{N} \frac{H_{i,j} - SR_{i,j}}{H_{i,j}}, \tag{11} \]

where H_{i,j} = \max\{SR_{l,j} \mid l < i\} is the model's historical best performance on task j before training on the i-th task.

Horizontal Dimension (Knowledge Transferability Metrics):

– Forward Transfer (FT) ↑: Evaluates the zero-shot generalization or facilitating capability of the currently accumulated causal knowledge pool on subsequent unseen tasks:

\[ FT = \frac{1}{N-1} \sum_{i=1}^{N-1} \frac{1}{N-i} \sum_{j=i+1}^{N} SR_{i,j}. \tag{12} \]

– Backward Transfer (BT) ↑: Quantifies the system's average maintenance, or even reverse enhancement, across all past task sets after continuous learning:

\[ BT = \frac{1}{N-1} \sum_{i=2}^{N} \frac{1}{i-1} \sum_{j=1}^{i-1} SR_{i,j}. \tag{13} \]

Comprehensive Overall Performance Metrics: We also adopt the official metrics, such as driving score (DS), success rate (SR), and multi-ability success rate, following CARLA [11] and Bench2Drive [20].

– Average Driving Score ↑: Comprehensively considers route completion and penalty deductions, reflecting the overall performance of the method.
– Average Success Rate ↑: The ratio of routes fully completed within strict time limits without any major collisions or infractions.
– Average Multi-Ability Success Rate ↑: The average success metric derived after decoupling the five different ability dimensions, reflecting the overall balance of the method.

4.2 Implementation Details

We adopt different training strategies under different settings. For full-data learning, a two-stage approach is adopted. In the first stage, we pre-train the backbone with only the auxiliary task losses L_sem and L_det; in the second stage, the entire model is trained with all losses. Each stage runs for 30 epochs. We use a learning rate schedule from 3e-4 to 3e-5. For full-data learning, our model is trained on 4 NVIDIA L40 GPUs with a total batch size of 64 for about 64 hours. The GPU memory usage of our model at inference is approximately 1.69 GB, and a single forward pass takes about 26.29 ms. For lifelong learning, the first task follows the aforementioned two-stage protocol, while for subsequent tasks the backbone is frozen and training proceeds in a single stage for 30 epochs. We reset the intrinsic parameters of the optimizers before initiating training on every new incoming task.

4.3 Main Results

Table 1: Lifelong learning results on Bench2Drive [20]. For every two rows of data, the top row is the result of the baseline model and the bottom row is the result of ours.
All entries are percentages; higher is better. DS and SR are overall metrics; the remaining columns are per-ability success rates.

| After Task   | DS    | SR    | Merge | Overtake | EmgBrake | GiveWay | TSign | Multi-Ab Mean |
|--------------|-------|-------|-------|----------|----------|---------|-------|---------------|
| 1 (EmgBrake) | 75.89 | 53.18 | 62.50 | 8.89     | 40.00    | 50.00   | 70.00 | 46.28         |
|              | 79.88 | 56.82 | 43.75 | 20.00    | 90.00    | 50.00   | 79.47 | 56.64         |
| 2 (TSign)    | 74.31 | 52.27 | 42.50 | 15.56    | 83.33    | 50.00   | 78.42 | 53.69         |
|              | 77.24 | 55.91 | 52.50 | 15.56    | 80.00    | 50.00   | 82.11 | 56.03         |
| 3 (Merge)    | 72.57 | 48.18 | 53.75 | 11.11    | 51.67    | 50.00   | 74.74 | 48.25         |
|              | 75.26 | 53.18 | 55.00 | 17.76    | 68.33    | 50.00   | 77.89 | 53.80         |
| 4 (Overtake) | 69.12 | 39.09 | 37.50 | 46.67    | 21.67    | 50.00   | 57.37 | 42.64         |
|              | 72.09 | 45.00 | 42.50 | 40.00    | 41.67    | 50.00   | 65.26 | 47.89         |
| 5 (GiveWay)  | 60.89 | 30.00 | 35.00 | 8.89     | 26.67    | 50.00   | 55.79 | 35.27         |
|              | 68.97 | 42.73 | 36.25 | 22.22    | 51.67    | 50.00   | 70.00 | 46.03         |

Lifelong metrics:

| Model    | FR ↓  | PFR ↓ | FT ↑  | BT ↑  | Avg DS ↑ | Avg SR ↑ | Avg Multi-Ab SR ↑ |
|----------|-------|-------|-------|-------|----------|----------|-------------------|
| Baseline | 44.50 | 40.25 | 41.11 | 52.83 | 70.55    | 44.54    | 45.23             |
| Ours     | 33.97 | 29.80 | 42.88 | 79.63 | 74.69    | 50.73    | 52.08             |

The lifelong learning evaluation on Bench2Drive demonstrates that our framework substantially outperforms the baseline TF++ [17, 57] across all sequential tasks, as shown in Table 1. After the final task, our method achieves an average driving score of 74.69%, while the average success rate improves to 50.73%. On the data-scarce 'GiveWay' task, our model maintains a 68.97% driving score and a 42.73% success rate, versus the baseline's decline to 60.89% and 30.00%. Our approach also reduces the process forgetting ratio from 40.25% to 29.80% and raises backward transfer from 52.83% to 79.63%, confirming that the dynamic knowledge spaces and causal intervention effectively mitigate catastrophic forgetting and enable positive knowledge transfer.

Under the full-data training paradigm, our method achieves state-of-the-art performance among all compared end-to-end models, as shown in Table 2. It attains the highest driving score of 86.86% and also leads in the average multi-ability success rate with 68.90%, reflecting well-rounded competence across diverse driving scenarios. These results confirm that the architectural innovations introduced for lifelong learning also enhance representational power and causal reasoning in static training settings.

Table 2: Full-data learning results of E2E-AD methods on Bench2Drive [20]. The best result in each column is marked with *.

| Method                | DS     | SR     | Merge  | Overtake | EmgBrake | GiveWay | TSign  | Multi-Ab Mean |
|-----------------------|--------|--------|--------|----------|----------|---------|--------|---------------|
| AD-MLP [55]           | 18.05  | 0.00   | 0.00   | 0.00     | 0.00     | 0.00    | 4.35   | 0.87          |
| TCP [50]              | 40.70  | 15.00  | 16.18  | 20.00    | 20.00    | 10.00   | 6.69   | 14.63         |
| VAD [22]              | 42.35  | 15.00  | 8.11   | 24.44    | 18.64    | 20.00   | 19.15  | 18.07         |
| UniAD [15]            | 45.81  | 16.36  | 14.10  | 17.78    | 21.67    | 10.00   | 14.21  | 15.55         |
| ThinkTwice [19]       | 62.44  | 31.23  | 27.38  | 18.42    | 35.82    | 50.00   | 54.23  | 37.17         |
| DriveTransformer [21] | 63.46  | 35.01  | 17.57  | 35.00    | 48.36    | 40.00   | 52.10  | 38.60         |
| DriveAdapter [18]     | 64.22  | 33.08  | 28.82  | 26.38    | 48.76    | 50.00   | 56.43  | 42.08         |
| DiffAD [49]           | 67.92  | 38.64  | 30.00  | 35.55    | 46.66    | 40.00   | 46.32  | 38.79         |
| DriveMoE [52]         | 74.22  | 48.64  | 34.67  | 40.00    | 65.45    | 40.00   | 59.44  | 47.91         |
| ORION [13]            | 77.74  | 54.62  | 25.00  | 71.11    | 78.33    | 30.00   | 69.00  | 54.72         |
| TF++ [17, 57]         | 84.21  | 67.27  | 58.75  | 57.77    | 83.33    | 40.00   | 82.11  | 64.39         |
| SimLingo [40]         | 85.07  | 67.27  | 54.01  | 57.04    | 88.33*   | 53.33   | 82.45* | 67.03         |
| HiP-AD [46]           | 86.77  | 69.09* | 50.00  | 84.44*   | 83.33    | 40.00   | 72.10  | 65.98         |
| DeLL (Ours)           | 86.86* | 68.63  | 61.25* | 62.22    | 80.00    | 60.00*  | 81.05  | 68.90*        |

4.4 Ablation Study

Table 3: Ablation study on Bench2Drive [20] under the lifelong learning setting. "w/o" denotes the full model without a certain module; TF++ [17, 57] is used as the baseline model.

| Method         | Avg DS ↑ | Avg SR ↑ | Avg Multi-Ab SR ↑ | FR ↓  | PFR ↓ | BT ↑  | FT ↑  |
|----------------|----------|----------|-------------------|-------|-------|-------|-------|
| Baseline Model | 70.55    | 44.54    | 45.23             | 44.50 | 40.25 | 52.83 | 41.11 |
| w/o ET Dec.    | 72.94    | 49.82    | 50.74             | 33.12 | 31.76 | 72.21 | 42.98 |
| w/o TFEM       | 73.00    | 48.36    | 49.92             | 36.43 | 32.86 | 77.14 | 40.04 |
| w/o FFEM       | 73.10    | 49.33    | 49.02             | 38.33 | 30.49 | 77.32 | 36.59 |
| Full Model     | 74.69    | 50.73    | 52.08             | 33.97 | 29.80 | 79.63 | 42.88 |

Ablation experiments quantify the contribution of each core component within our framework, as shown in Table 3. The evolutionary trajectory decoder proves indispensable for adaptive planning.
Its removal reduces the average driving score to 72.94% and backward transfer to 72.21%. This suggests that the dynamically expanding trajectory anchor pool continually improves driving performance and is crucial for effective knowledge preservation. The TFEM plays a vital role in imposing causally consistent kinematic constraints. Without it, the average driving score falls to 73%, while the progress forgetting ratio rises to 32.86%. This degradation confirms that enhancing trajectory features with front-door adjustment helps preserve maneuver-specific knowledge. The FFEM is critical for causal feature purification, as its removal causes the forgetting ratio to increase to 38.33% and forward transfer to drop to 36.59%, indicating that deconfounding raw multimodal features is essential for both knowledge retention and generalization to new tasks. The full model achieves the best performance, demonstrating that the synergistic combination of knowledge spaces and cascaded front-door adjustment is essential for robust lifelong autonomous driving.

4.5 Visualization

Fig. 3: Lifelong performance on the CARLA benchmark DEV10 [21]. Panels (1) and (3) show TF++; panels (2) and (4) show ours, with sub-panels (a)-(f) per row.

As shown in Fig. 3, TF++ [17, 57] initially learns to decelerate and follow a slow-moving bicycle ahead after training on Task 1 (1a), but subsequent training causes it to forget this speed-control strategy, leading to a collision (1b, 1c). Our method also learns timely deceleration and low-speed following initially (2a) but does not change lanes to overtake due to the absence of training data (2b). It acquires the lane-changing overtaking skill after training on Task 4 (2c). TF++ and our method are both unable to perform parking-spot exit maneuvers during the early stages of training (1d, 2d). After learning from Task 3, both methods acquire this capability (1e, 2e).
However, following subsequent training, TF++ exhibits incorrect velocity predictions, potentially having learned a mistaken causal relationship between zero velocity and a stationary vehicle ahead (1f). In contrast, our method continues to predict velocity correctly under the same conditions (2f). TF++ cannot reliably master lane merging (3a, 3b, 3c), whereas our method merges decisively by exploiting traffic gaps (4a) and gradually learns to decelerate in dense traffic before accelerating to complete lane changes (4b, 4c), illustrating the progressive accumulation of driving knowledge. When facing unseen scenarios at the early stage of learning, TF++'s planned trajectories often conflict with nearby vehicles (3d, 3e), while our method generates reasonable trajectories (4d, 4e), indicating forward transfer of knowledge. After learning subsequent tasks, TF++ forgets to decelerate at stop signs (3f), while our method retains this ability (4f), demonstrating resistance to forgetting.

Fig. 4 presents the clustering results of the dynamic knowledge spaces during the learning process, including cluster counts, IDs, and visualizations. In particular, we sample 50 data points per cluster from the feature knowledge space and project them into 2D using t-SNE [27] for visualization, with colors indicating their associated driving ability. Notably, some driving capabilities encompass multiple clusters. While most clusters are clearly separated, a small number of clusters from different capabilities exhibit spatial proximity or partial overlap, indicating coupling relationships among distinct driving abilities.

Fig. 4: Clustering results of dynamic knowledge spaces during the learning process.

5 Conclusion

In this paper, we propose DeLL, a deconfounded lifelong learning framework for end-to-end autonomous driving.
Our method introduces dual dynamic knowledge spaces based on DPMM to incrementally preserve and update both latent feature representations and explicit trajectory priors. By leveraging these knowledge anchors as mediators in a causal front-door adjustment mechanism, the framework effectively mitigates spurious correlations caused by unobserved confounders. Additionally, an evolutionary trajectory decoder enables non-autoregressive, parallel planning. Extensive experiments in CARLA demonstrate that DeLL achieves superior driving performance, significantly reduces catastrophic forgetting, and enhances knowledge transfer across sequential tasks.

Despite its effectiveness, our method has certain limitations. The alternating optimization between DPMM updates and network training introduces computational overhead and training cost. In addition, the current framework operates within simulated environments, so the domain gap between CARLA and real-world deployment remains an open challenge.

Acknowledgements

We thank Zhiyong Bao (2353604@tongji.edu.cn) for his contribution to the Comparison with Adaptation of Mainstream Lifelong Learning Methods section and for his help in building the lifelong learning benchmark on the CARLA simulator for autonomous driving.

References

1. Aljundi, R., Babiloni, F., Elhoseiny, M., Rohrbach, M., Tuytelaars, T.: Memory aware synapses: Learning what (not) to forget. In: ECCV. pp. 139–154 (2018)
2. Aljundi, R., Chakravarty, P., Tuytelaars, T.: Expert gate: Lifelong learning with a network of experts. In: CVPR. pp. 3366–3375 (2017)
3. Castro, F.M., Marín-Jiménez, M.J., Guil, N., Schmid, C., Alahari, K.: End-to-end incremental learning. In: ECCV. pp. 233–248 (2018)
4. Chaudhry, A., Rohrbach, M., Elhoseiny, M., Ajanthan, T., Dokania, P.K., Torr, P.H., Ranzato, M.: On tiny episodic memories in continual learning. arXiv preprint arXiv:1902.10486 (2019)
5.
Chen, D., Krähenbühl, P.: Learning from all vehicles. In: CVPR. pp. 17222–17231 (2022)
6. Chen, D., Zhou, B., Koltun, V., Krähenbühl, P.: Learning by cheating. In: CoRL. pp. 66–75. PMLR (2020)
7. Chen, L., Wu, P., Chitta, K., Jaeger, B., Geiger, A., Li, H.: End-to-end autonomous driving: Challenges and frontiers. TPAMI 46(12), 10164–10183 (2024)
8. Chitta, K., Prakash, A., Geiger, A.: Neat: Neural attention fields for end-to-end autonomous driving. In: ICCV. pp. 15793–15803 (Oct 2021)
9. Chitta, K., Prakash, A., Jaeger, B., Yu, Z., Renz, K., Geiger, A.: Transfuser: Imitation with transformer-based sensor fusion for autonomous driving. TPAMI 45(11), 12878–12895 (2022)
10. De Haan, P., Jayaraman, D., Levine, S.: Causal confusion in imitation learning. Advances in Neural Information Processing Systems 32 (2019)
11. Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: CARLA: An open urban driving simulator. In: CoRL. pp. 1–16. PMLR (2017)
12. Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., Tian, Q.: Centernet: Keypoint triplets for object detection. In: ICCV. pp. 6569–6578 (2019)
13. Fu, H., Zhang, D., Zhao, Z., Cui, J., Liang, D., Zhang, C., Zhang, D., Xie, H., Wang, B., Bai, X.: Orion: A holistic end-to-end autonomous driving framework by vision-language instructed action generation. arXiv preprint (2025)
14. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial networks. Communications of the ACM 63(11), 139–144 (2020)
15. Hu, Y., Yang, J., Chen, L., Li, K., Sima, C., Zhu, X., Chai, S., Du, S., Lin, T., Wang, W., et al.: Planning-oriented autonomous driving. In: CVPR. pp. 17853–17862 (2023)
16. Hughes, M.C., Sudderth, E.: Memoized online variational inference for Dirichlet process mixture models. Advances in Neural Information Processing Systems 26 (2013)
17.
Jaeger, B., Chitta, K., Geiger, A.: Hidden biases of end-to-end driving models. In: ICCV. pp. 8240–8249 (2023)
18. Jia, X., Gao, Y., Chen, L., Yan, J., Liu, P.L., Li, H.: Driveadapter: Breaking the coupling barrier of perception and planning in end-to-end autonomous driving. In: ICCV. pp. 7953–7963 (2023)
19. Jia, X., Wu, P., Chen, L., Xie, J., He, C., Yan, J., Li, H.: Think twice before driving: Towards scalable decoders for end-to-end autonomous driving. In: CVPR. pp. 21983–21994 (2023)
20. Jia, X., Yang, Z., Li, Q., Zhang, Z., Yan, J.: Bench2Drive: Towards multi-ability benchmarking of closed-loop end-to-end autonomous driving. In: NeurIPS (2024)
21. Jia, X., You, J., Zhang, Z., Yan, J.: Drivetransformer: Unified transformer for scalable end-to-end autonomous driving. arXiv preprint arXiv:2503.07656 (2025)
22. Jiang, B., Chen, S., Xu, Q., Liao, B., Chen, J., Zhou, H., Zhang, Q., Liu, W., Huang, C., Wang, X.: Vad: Vectorized scene representation for efficient autonomous driving. In: ICCV. pp. 8340–8350 (2023)
23. Jiang, M., Fan, J., Li, F.: Advances in continual learning: A comprehensive review. Expert Systems with Applications 294, 128739 (2025)
24. Kemker, R., Kanan, C.: Fearnet: Brain-inspired model for incremental learning. arXiv preprint arXiv:1711.10563 (2017)
25. Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114 (2013)
26. Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A.A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., et al.: Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences 114(13), 3521–3526 (2017)
27. Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. Journal of Machine Learning Research 9, 2579–2605 (2008)
28. Li, Y., Schofield, E., Gönen, M.: A tutorial on Dirichlet process mixture modeling.
Journal of Mathematical Psychology 91, 128–144 (2019)
29. Li, Z., Hoiem, D.: Learning without forgetting. TPAMI 40(12), 2935–2947 (2017)
30. Liao, B., Chen, S., Yin, H., Jiang, B., Wang, C., Yan, S., Zhang, X., Li, X., Zhang, Y., Zhang, Q., et al.: Diffusiondrive: Truncated diffusion model for end-to-end autonomous driving. In: CVPR. pp. 12037–12047 (2025)
31. Lin, Y., Li, Z., Du, G., Zhao, X., Gong, C., Wang, X., Lu, C., Gong, J.: H2c: Hippocampal circuit-inspired continual learning for lifelong trajectory prediction in autonomous driving. arXiv preprint arXiv:2508.01158 (2025)
32. Liu, B., Zhu, Y., Gao, C., Feng, Y., Liu, Q., Zhu, Y., Stone, P.: Libero: Benchmarking knowledge transfer for lifelong robot learning. Advances in Neural Information Processing Systems 36, 44776–44791 (2023)
33. Lopez-Paz, D., Ranzato, M.: Gradient episodic memory for continual learning. Advances in Neural Information Processing Systems 30 (2017)
34. Luo, Y., Yin, L., Bai, W., Mao, K.: An appraisal of incremental learning methods. Entropy 22(11), 1190 (2020)
35. Mallya, A., Lazebnik, S.: Packnet: Adding multiple tasks to a single network by iterative pruning. In: CVPR. pp. 7765–7773 (2018)
36. Meng, Y., Bing, Z., Yao, X., Chen, K., Huang, K., Gao, Y., Sun, F., Knoll, A.: Preserving and combining knowledge in robotic lifelong reinforcement learning. Nature Machine Intelligence pp. 1–14 (2025)
37. Parisi, G.I., Kemker, R., Part, J.L., Kanan, C., Wermter, S.: Continual lifelong learning with neural networks: A review. Neural Networks 113, 54–71 (2019)
38. Pearl, J., Glymour, M., Jewell, N.P.: Causal inference in statistics: A primer. John Wiley & Sons (2016)
39. Radosavovic, I., Kosaraju, R.P., Girshick, R., He, K., Dollár, P.: Designing network design spaces. In: CVPR. pp. 10428–10436 (June 2020)
40.
Renz, K., Chen, L., Arani, E., Sinavski, O.: Simlingo: Vision-only closed-loop autonomous driving with language-action alignment. In: CVPR (2025)
41. Renz, K., Chitta, K., Mercea, O.B., Koepke, A., Akata, Z., Geiger, A.: Plant: Explainable planning transformers via object-level representations. arXiv preprint arXiv:2210.14222 (2022)
42. Ruan, K.: When Causality Meets Autonomy: Causal Imitation Learning to Unravel Unobserved Influences in Autonomous Driving Decision-Making. Columbia University (2024)
43. Rusu, A.A., Rabinowitz, N.C., Desjardins, G., Soyer, H., Kirkpatrick, J., Kavukcuoglu, K., Pascanu, R., Hadsell, R.: Progressive neural networks. arXiv preprint arXiv:1606.04671 (2016)
44. Shao, H., Wang, L., Chen, R., Li, H., Liu, Y.: Safety-enhanced autonomous driving using interpretable sensor fusion transformer. In: CoRL. pp. 726–737. PMLR (2023)
45. Shin, H., Lee, J.K., Kim, J., Kim, J.: Continual learning with deep generative replay. Advances in Neural Information Processing Systems 30 (2017)
46. Tang, Y., Xu, Z., Meng, Z., Cheng, E.: Hip-ad: Hierarchical and multi-granularity planning with deformable attention for autonomous driving in a single decoder. arXiv preprint arXiv:2503.08612 (2025)
47. Van de Ven, G.M., Tuytelaars, T., Tolias, A.S.: Three types of incremental learning. Nature Machine Intelligence 4(12), 1185–1197 (2022)
48. Wang, L., He, Z., Dang, R., Shen, M., Liu, C., Chen, Q.: Vision-and-language navigation via causal learning. In: CVPR. pp. 13139–13150 (2024)
49. Wang, T., Zhang, C., Qu, X., Li, K., Liu, W., Huang, C.: Diffad: A unified diffusion modeling approach for autonomous driving. arXiv preprint (2025)
50. Wu, P., Jia, X., Chen, L., Yan, J., Li, H., Qiao, Y.: Trajectory-guided control prediction for end-to-end autonomous driving: A simple yet strong baseline. Advances in Neural Information Processing Systems 35, 6119–6132 (2022)
51.
Yang, Y., Chen, D., Qin, T., Mu, X., Xu, C., Yang, M.: E2e parking: Autonomous parking by the end-to-end neural network on the carla simulator. In: 2024 IEEE Intelligent Vehicles Symposium (IV). pp. 2375–2382. IEEE (2024)
52. Yang, Z., Chai, Y., Jia, X., Li, Q., Shao, Y., Zhu, X., Su, H., Yan, J.: Drivemoe: Mixture-of-experts for vision-language-action model in end-to-end autonomous driving. arXiv preprint arXiv:2505.16278 (2025)
53. Yao, H., Li, P., Jin, B., Zheng, Y., Liu, A., Mu, L., Su, Q., Zhang, Q., Chen, Y., Li, P.: Lilodriver: A lifelong learning framework for closed-loop motion planning in long-tail autonomous driving scenarios. arXiv preprint arXiv:2505.17209 (2025)
54. Yao, X., Huang, T., Wu, C., Zhang, R.X., Sun, L.: Adversarial feature alignment: Avoid catastrophic forgetting in incremental task lifelong learning. Neural Computation 31(11), 2266–2291 (2019)
55. Zhai, J.T., Feng, Z., Du, J., Mao, Y., Liu, J.J., Tan, Z., Zhang, Y., Ye, X., Wang, J.: Rethinking the open-loop evaluation of end-to-end autonomous driving in nuscenes. arXiv preprint arXiv:2305.10430 (2023)
56. Zhai, M., Chen, L., Tung, F., He, J., Nawhal, M., Mori, G.: Lifelong gan: Continual learning for conditional image generation. In: ICCV. pp. 2759–2768 (2019)
57. Zimmerlin, J., Beißwenger, J., Jaeger, B., Geiger, A., Chitta, K.: Hidden biases of end-to-end driving datasets. arXiv preprint arXiv:2412.09602 (2024)

Deconfounded Lifelong Learning for Autonomous Driving via Dynamic Knowledge Spaces
Supplementary Material

A Comparison with Adaptation of Mainstream Lifelong Learning Methods

To the best of our knowledge, our DeLL is the first work to address lifelong learning in the field of closed-loop end-to-end autonomous driving (E2E-AD). Therefore, we adapt Experience Replay (ER) [4] and PackNet [35] to the E2E-AD domain for fair comparison, denoted as "Baseline + ER" and "Baseline + PackNet".
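As a rehearsal-based method, the adapted ER baseline keeps a bounded buffer of past-task samples and mixes them into each new task's mini-batches, which is also the source of its extra data cost. The following is a minimal sketch of such an adaptation, not our exact implementation: the buffer capacity, reservoir-sampling policy, and `replay_ratio` are illustrative choices.

```python
import random

class ReplayBuffer:
    """Bounded buffer of past-task samples, maintained by reservoir sampling
    so every sample seen so far has equal probability of being retained.
    (Capacity and sampling policy are illustrative, not the paper's exact setup.)"""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, sample):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            # Replace a random slot with probability capacity / seen.
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = sample

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

def mixed_batch(task_batch, buffer, replay_ratio=0.25):
    """Augment a current-task batch with replayed past-task samples."""
    n_replay = int(len(task_batch) * replay_ratio)
    return task_batch + buffer.sample(n_replay)
```

In this scheme the buffer grows the effective training set of each later task, which matches the higher data cost reported for "Baseline + ER" in Table 4, while leaving training time essentially unchanged.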
Table 4: Comparison with lifelong learning methods adapted to E2E-AD on the Bench2Drive [20] dataset. Bold means best.

| Method | Data Cost ↓ | Time Cost ↓ | FR ↓ | PFR ↓ | FT ↑ | BT ↑ | DS ↑ | SR ↑ | Multi-Ability ↑ |
|---|---|---|---|---|---|---|---|---|---|
| Baseline [17, 57] | ×1 | ×1 | 44.50 | 40.25 | 41.11 | 52.83 | 70.55 | 44.54 | 45.23 |
| Baseline + ER [4] | ×1.35 | ×1 | 41.52 | 31.11 | 39.26 | 79.17 | 72.80 | 47.57 | 49.86 |
| Baseline + PackNet [35] | ×1 | ×2 | 34.11 | 25.75 | 42.81 | 74.50 | 74.46 | 50.65 | 51.64 |
| DeLL (Ours) | ×1 | ×1 | 33.97 | 29.80 | 42.88 | 79.63 | 74.69 | 50.73 | 52.08 |

Table 4 shows that our method is the best in almost every metric. It is worth noting that the training cost of our method is also relatively low. ER [4] requires extra space to cache past data because it is a rehearsal-based method, while PackNet [35] requires twice the training time to prune the network. In contrast, our proposed dynamic knowledge space approach is effective without either overhead.

B Lifelong Learning Results in Reverse Sequence

We also consider learning the task sequence in reverse order and evaluate the lifelong learning ability of the models, as shown in Table 5. Our proposed method comprehensively outperforms the baseline [17, 57] model.

C Ablation Study under Full-data Learning

We also conduct ablation experiments under the full-data learning setting to verify the effectiveness of the causal feature enhancement module (CFEM) and evolutionary trajectory decoder (ET Dec.), as shown in Table 6.

Table 5: Lifelong learning results on Bench2Drive [20] (in reverse sequence). For every two rows of data, the top row is the result of the baseline model [17, 57] and the bottom row is the result of ours. Bold means best.
| After Task | Method | DS | SR | Merge | Overtake | EmgBrake | GiveWay | TSign | Multi-Ab Mean |
|---|---|---|---|---|---|---|---|---|---|
| 1 (GiveWay) | TF++ | 42.01 | 11.82 | 10.00 | 8.89 | 15.00 | 30.00 | 26.84 | 18.14 |
| | DeLL (Ours) | 49.35 | 17.73 | 18.75 | 6.67 | 21.67 | 50.00 | 37.89 | 27.00 |
| 2 (Overtake) | TF++ | 69.53 | 39.09 | 38.75 | 57.78 | 23.33 | 60.00 | 40.00 | 43.97 |
| | DeLL (Ours) | 69.26 | 36.36 | 41.25 | 31.11 | 16.67 | 40.00 | 64.21 | 38.65 |
| 3 (Merge) | TF++ | 68.90 | 40.91 | 55.00 | 17.78 | 33.33 | 40.00 | 56.84 | 40.59 |
| | DeLL (Ours) | 74.23 | 50.45 | 56.25 | 26.67 | 46.67 | 60.00 | 74.74 | 52.86 |
| 4 (TSign) | TF++ | 69.40 | 41.82 | 45.00 | 13.33 | 51.67 | 40.00 | 65.49 | 43.16 |
| | DeLL (Ours) | 79.23 | 59.09 | 65.00 | 20.00 | 70.00 | 20.00 | 86.32 | 52.26 |
| 5 (EmgBrake) | TF++ | 69.12 | 40.00 | 41.25 | 11.11 | 48.33 | 60.00 | 60.53 | 44.24 |
| | DeLL (Ours) | 76.73 | 55.46 | 45.00 | 17.78 | 86.67 | 40.00 | 78.95 | 53.68 |

Lifelong metrics (Vert. ↓: FR, PFR; Hori. ↑: FT, BT; Overall ↑: Avg DS, Avg SR, Avg Multi-Ab SR):

| Method | FR ↓ | PFR ↓ | FT ↑ | BT ↑ | Avg DS ↑ | Avg SR ↑ | Avg Multi-Ab SR ↑ |
|---|---|---|---|---|---|---|---|
| TF++ | -57.97 | 12.26 | 35.00 | 26.50 | 63.79 | 34.73 | 38.02 |
| DeLL (Ours) | -72.95 | 0.29 | 32.28 | 38.13 | 69.76 | 43.82 | 44.89 |

Table 6: Ablation study on Bench2Drive [20] under full-data learning.

| Method | DS ↑ | RC ↑ | IS ↑ | SR ↑ | Multi-Ab Mean ↑ |
|---|---|---|---|---|---|
| w/o CFEM | 83.39 | 95.17 | 0.87 | 64.55 | 64.9 |
| w/o ET Dec. | 86.66 | 97.24 | 0.89 | 71.36 | 65.5 |
| Full Model | 86.86 | 98.25 | 0.88 | 68.63 | 68.9 |

*w/o: full model without a certain module.
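All of the vertical and horizontal metrics reported above are derived from a per-task performance matrix recorded after each training stage. For reference, here is a minimal sketch of the standard continual-learning quantities (GEM-style backward transfer and forgetting [33]) computed from such a matrix; the exact FR/PFR/FT/BT normalizations used in our protocol are defined in the main text and may differ from these textbook forms.

```python
def lifelong_metrics(R):
    """Standard continual-learning summary metrics.

    R[i][j] is the score on task j measured after training on tasks 0..i
    (an N x N matrix for an N-task sequence). These are the textbook
    GEM-style definitions; benchmark-specific ratios such as FR and PFR
    would be computed from the same matrix with different normalizations.
    """
    n = len(R)
    final = R[-1]
    avg_score = sum(final) / n
    # Backward transfer: change on each old task after all training finishes
    # (negative values indicate forgetting, positive values indicate gains).
    bwt = sum(final[j] - R[j][j] for j in range(n - 1)) / (n - 1)
    # Forgetting: drop from each old task's best-ever score to its final score.
    forgetting = sum(max(R[i][j] for i in range(n)) - final[j]
                     for j in range(n - 1)) / (n - 1)
    return {"avg": avg_score, "bwt": bwt, "forgetting": forgetting}
```

For example, with R = [[80, 0, 0], [70, 75, 0], [60, 65, 72]] the backward transfer is -15 and the forgetting is 15, reflecting the drop on the first two tasks after the third is learned.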