Reliability of stochastic capacity estimates
Stochastic traffic capacity is used in traffic modelling and control for unidirectional sections of road infrastructure, although some of the estimation methods have recently proved flawed. However, even sound estimation methods require sufficient da…
Authors: Igor Mikolasek
Igor Mikolasek
Transport Research Centre CDV, Lisenska 33a, 636 00 Brno, Czech Republic
igor.mikolasek@cdv.gov.cz

Abstract: Stochastic traffic capacity is used in traffic modelling and control for unidirectional sections of road infrastructure, although some of the estimation methods have recently proved flawed. However, even sound estimation methods require sufficient data. Because breakdowns are rare, the number of recorded breakdowns effectively determines sample size. This is especially relevant for temporary traffic infrastructure, but also for permanent bottlenecks (e.g., on- and off-ramps), where practitioners must know when estimates are reliable enough for control or design decisions. This paper studies this reliability along with the impact of censored data using synthetic data with a known capacity distribution. A corrected maximum-likelihood estimator is applied to varied samples. In total, 360 artificial measurements are created and used to estimate the capacity distribution, and the deviation from the pre-defined distribution is then quantified. Results indicate that at least 50 recorded breakdowns are necessary; 100-200 are the recommended minimum for temporary measurements. Beyond this, further improvements are marginal, with the expected average relative error below 5 %.

Keywords: maximum likelihood estimation; censored data; synthetic data generation; traffic breakdown probability; sample size; traffic control; traffic modelling

1 Introduction

Stochastic traffic capacity is an established concept used in traffic modelling and control for one-directional sections of road infrastructure (Brilon et al., 2005; Kianfar & Abdoli, 2021; Lorenz & Elefteriadou, 2001; Shojaat et al., 2018; Wang et al., 2022). It enables realistic modelling of traffic flow (TF) and breakdown behaviour in relevant use cases.
It can be especially useful in traffic control applications to establish an operating point that minimizes breakdown probability while maximizing TF intensity, using concepts such as the sustained flow index (Shojaat et al., 2016) – although it is notably sensitive to the aggregation interval and may not be reliable (Mikolasek, 2025) – or traffic efficiency (Sohrabi & Ermagun, 2018). The corrected maximum likelihood estimator (MLE) should be used to estimate breakdown probability in either case (Mikolasek, 2025).

Multiple methods have been used to estimate the breakdown probability distribution (Arnesen & Hjelkrem, 2018; Brilon et al., 2005; Polus & Pollatschek, 2002). However, some have recently been found to be unsuitable or flawed. Specifically, the Kaplan-Meier estimator (product limit method) is inherently unsuitable for this use case because of the differences between age and lifetime on one side and TF intensity and capacity on the other. Moreover, as hinted above, MLE has been applied incorrectly to this problem in the past (Mikolasek, 2025).

This paper evaluates the reliability of capacity estimates with respect to recorded sample size. It is useful to know how many (reliable) breakdown observations must be recorded before the estimates can be used in modelling or traffic control without the risk of significantly skewing results. This is particularly relevant for temporary bottlenecks such as work zones, where the time frame to gather data is limited.

2 Methodology

2.1 Maximum likelihood estimation

MLE is a well-established survival analysis method, although it has been misapplied to the capacity problem in the past. Here, it is used to estimate the capacity (breakdown probability) distribution. The correct formulation, derived by Mikolasek (2025), is:

$$\ln L(\alpha, \beta) = \sum_{i=1}^{n} \left[ \delta_i \ln F(I_i) + (1 - \delta_i) \ln\left(1 - F(I_i)\right) \right], \quad (1)$$

where $\alpha$ (shape) and $\beta$ (scale) are parameters of the Weibull distribution (other distributions can be used analogously) optimized by maximizing the log-likelihood $\ln L$. $\delta_i$ is the failure indicator for observation $i$ (1 failure/breakdown, 0 survival, i.e., censored), and $F(I_i)$ is the capacity cumulative distribution function (CDF) evaluated at TF intensity $I_i$, which also defines the breakdown probability, since breakdown occurs when current capacity is exceeded. For the Weibull distribution, it is:

$$F(I) = 1 - \exp\left[-\left(\frac{I}{\beta}\right)^{\alpha}\right]. \quad (2)$$

Refer to Mikolasek (2025) for more details about the concepts of capacity, traffic breakdowns, and censoring.

2.2 Synthetic data generation

For empirical data, the true capacity distribution is unknown. To address this, a synthetic data generation approach is employed to evaluate the reliability of the estimates by comparing them against a known, pre-defined distribution. Similar methodologies have been applied in other disciplines, such as geology (Lin & Shearer, 2005) and meteorology (Haupt et al., 2006), when real-world data cannot provide a reliable benchmark due to the absence of ground-truth values.

Figure 1: Left – CDF of capacity representing the breakdown probability ($F(I_j)$; red triangles) and the number of records of individual traffic flow intensities ($r_{I_j}$; blue circles). Right – the resulting cumulative frequency of breakdowns calculated via (3) and (4).

The basis for synthetic data generation was a dataset of real TF intensity records $r_{I_j}$, the number of records at TF intensity $I_j$ (Mikolasek, 2026b), from Mikolasek (2025). Details on data collection and processing are provided there (parameters such as aggregation interval and thresholds may vary by use case). Data from the period without TF harmonisation, comprising 7,447 overlapping 3-minute TF records, 52 of which directly preceded a TF breakdown (exactly one uncensored flow corresponds to one breakdown), were used. While a fully synthetic dataset could be used, employing real data preserves realistic characteristics of the modelled situation and supports the choice of pre-defined capacity-distribution parameters based on the same study.
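Returning briefly to the estimator of Section 2.1, the following is a minimal sketch of how a Weibull capacity distribution might be fitted. It assumes the corrected likelihood takes the binary form in which each breakdown observation contributes ln F(I) and each censored observation contributes ln(1 − F(I)). The function names, the Nelder-Mead optimiser, the log-parameter transform, and the starting values are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize


def weibull_cdf(intensity, shape, scale):
    """Weibull capacity CDF, i.e. breakdown probability at a given TF intensity."""
    x = np.asarray(intensity, dtype=float)
    return 1.0 - np.exp(-((x / scale) ** shape))


def neg_log_likelihood(log_params, intensity, breakdown):
    """Negative log-likelihood with F for breakdowns and 1 - F for censored records."""
    shape, scale = np.exp(log_params)  # optimise in log space so both stay positive
    F = np.clip(weibull_cdf(intensity, shape, scale), 1e-300, 1.0 - 1e-12)
    d = np.asarray(breakdown, dtype=float)
    return -np.sum(d * np.log(F) + (1.0 - d) * np.log1p(-F))


def fit_capacity(intensity, breakdown, start=(5.0, 150.0)):
    """Return (shape, scale) maximising the likelihood; `start` is an assumed guess."""
    result = minimize(
        neg_log_likelihood,
        np.log(start),
        args=(np.asarray(intensity, dtype=float), breakdown),
        method="Nelder-Mead",
    )
    shape, scale = np.exp(result.x)
    return shape, scale
```

Calling `fit_capacity(intensity, breakdown)` with one entry per TF record (intensity and a 0/1 breakdown flag) returns the estimated shape and scale. A gradient-based optimiser or a different distribution could be swapped in without changing the structure of the likelihood.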
The CDF of capacity can be used to compute the expected, theoretical number of breakdowns $b_{I_j}$ at each TF intensity level $I_j$ by (3), where $r_{I_j}$ is the number of records at that level and $F(I_j)$ is the capacity CDF:

$$b_{I_j} = r_{I_j} \, F(I_j). \quad (3)$$

$$\mathrm{CFB}(I_k) = \sum_{j \le k} b_{I_j}. \quad (4)$$

The corresponding theoretical (and/or empirical) cumulative frequency of breakdowns (CFB; Figure 1, right) is then calculated via (4), where $b_{I_j}$ can be either calculated from (3) (theoretical CFB) or the empirical count. Indices $j$ and $k$ denote different TF intensity levels. The values $b_{I_j}$ serve as the basis for generating synthetic pseudo-empirical TF breakdown observations using (6). The (pseudo-)empirical CFB curves do not match the theoretical CFB exactly but fluctuate around it, with the deviation depending on the number of recorded breakdowns, which leads to capacity estimation errors.

The synthetic data generation was performed as a series of Bernoulli trials at each level of TF intensity to replicate stochastic variability. For $b_{I_j} \le 1$, the number of breakdowns $B_{I_j}$ at level $I_j$ is modelled as $B_{I_j} \sim \mathrm{Bernoulli}(b_{I_j})$. Implementation is simple: draw $u \sim U(0, 1)$ and compare it to $b_{I_j}$:

$$B_{I_j} = \begin{cases} 1, & u < b_{I_j}, \\ 0, & \text{otherwise}. \end{cases} \quad (5)$$

For $b_{I_j} > 1$, (5) would always yield $B_{I_j} = 1$, but $b_{I_j}$ can be split into $M$ smaller components $b_{I_j,m}$ such that $\sum_{m} b_{I_j,m} = b_{I_j}$ and each $b_{I_j,m} \le 1$, allowing the Bernoulli trial to be applied separately to each component. This is expressed in (6), where $u_m$ is a realisation of $U(0, 1)$:

$$B_{I_j} = \sum_{m=1}^{M} \mathbb{1}\left[u_m < b_{I_j,m}\right]. \quad (6)$$

This yields the number of hypothetical breakdowns $B_{I_j}$ with expected value $b_{I_j}$ at each TF intensity level $I_j$. Because the count cannot exceed the number of components, $M > b_{I_j}$ is needed to generate more than the expected number of breakdowns. The resulting pseudo-empirical CFB curves are then computed via (4).

It is possible to generate a virtually infinite number of synthetic datasets for a hypothetical motorway with a pre-defined capacity distribution. The capacity distribution can then be estimated for each such pseudo-empirical dataset using MLE, and the estimated capacity CDFs can be compared to the pre-defined “true” CDF.
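The generation procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the author's implementation: the expected count at each level is split into floor(b) + 1 equal components (one of many valid splits, chosen so that every component is below 1 and counts above the expectation remain possible), and all function names are hypothetical.

```python
import numpy as np


def expected_breakdowns(records, cdf_values):
    """Expected breakdowns per TF level: number of records times breakdown probability."""
    return np.asarray(records, dtype=float) * np.asarray(cdf_values, dtype=float)


def simulate_breakdowns(expected, rng):
    """Draw a pseudo-empirical breakdown count for each TF intensity level."""
    expected = np.asarray(expected, dtype=float)
    counts = np.zeros(expected.shape, dtype=int)
    for j, b in enumerate(expected):
        n_parts = int(np.floor(b)) + 1   # assumed split: strictly more parts than b
        p = b / n_parts                  # each component is therefore below 1
        counts[j] = int(np.sum(rng.random(n_parts) < p))  # one Bernoulli trial per part
    return counts


def cumulative_frequency(counts):
    """Cumulative frequency of breakdowns over increasing TF intensity levels."""
    return np.cumsum(counts)
```

With `rng = np.random.default_rng(seed)`, repeated calls to `simulate_breakdowns` yield independent pseudo-empirical datasets whose counts average to the expected values. Scaling `records` up or down before calling `expected_breakdowns` changes the expected number of breakdowns while leaving the capacity distribution fixed.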
2.3 Reliability analysis of the CDF estimates

The ability to generate multiple pseudo-empirical datasets enables a sensitivity analysis of capacity distribution error with respect to sample size. The original dataset with 7,447 TF intensity records yields approximately 52 expected breakdowns under the pre-defined capacity distribution. To assess the effect of sample size, the original TF data were resampled by multiplying (or dividing) the number of records at each level to create eight datasets with 13, 26, 52, 78, 104, 156, 208, and 260 expected breakdowns. Fifteen synthetic pseudo-empirical datasets were generated for each. The underlying capacity distribution remained fixed, ensuring a constant ratio between the expected breakdowns and total TF records (i.e., the censoring rate).

The capacity distribution was then estimated using the corrected MLE formula. For each sample size, the mean and standard deviation of the estimates and the associated errors were computed across the fifteen replications. Note that interpreting the standard deviation under a normality assumption is precarious for relative errors, which are bounded below at 0 % and have a long right tail; empirical distributions should be considered instead. The standard deviations are more informative for other variables, such as the estimated Weibull parameters, for which the normality assumption is sounder.

Root mean squared error (RMSE), average relative error (ARE), and average weighted relative error (AWRE; (7)) of the CFB curves and, primarily, of the capacity CDF were calculated using the pre-defined CDFs and corresponding theoretical CFB curves as ground truth.
Weights $w_j$ assign greater weight to TF levels where most breakdowns occur, since larger errors at the outer parts of the estimated curves are typically less consequential for practical performance – assuming traffic patterns and capacity at the site (or at sites with comparable capacity distribution based on similar layout and traffic composition) do not change significantly.

$$\mathrm{AWRE} = \frac{\sum_{j} w_j \, \dfrac{\left|\hat{F}(I_j) - F(I_j)\right|}{F(I_j)}}{\sum_{j} w_j}, \quad (7)$$

where $\hat{F}(I_j)$ is the estimated and $F(I_j)$ the pre-defined value at TF intensity level $I_j$ (analogously for the CFB curves).

It was further hypothesised that the censoring rate (determined by the capacity distribution and the demand patterns) affects the reliability of the estimates. To test this, additional simulations were conducted using varied capacity distributions. Specifically, new sample sets were generated by slightly modifying the original theoretical capacity distribution while keeping the expected number of breakdowns approximately aligned with those of the original eight sample sizes. This was achieved by proportionally multiplying the number of TF records at each level. For eight additional sets, the breakdown probability was reduced by a factor of two; for another eight sets, by a factor of eight. As with the original sets, each new set consisted of 15 synthetic pseudo-empirical datasets, yielding a total of 360 datasets across three censoring levels (capacity distributions), each with eight sample sizes and 15 simulation runs.

These were used to estimate a regression model for the average weighted relative error of the estimated CFB and capacity CDF curves. The total number of TF intensity records (sum of $r_{I_j}$), the number of breakdowns $\mathrm{CFB}(I_{max})$, their ratio (i.e., the censoring rate), and the natural logarithms of these three variables were considered as candidate explanatory variables in the regression model. Model selection was guided by the coefficient of determination R² and significance of the included variables (P-value < 0.05).
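The three error measures of Section 2.3 can be sketched as below. The weighted form is a generic weighted mean of relative errors; the paper's exact weight definition is not reproduced here, so `weights` is left as an input (using the expected number of breakdowns per TF level would be one natural, but assumed, choice). Function names are illustrative.

```python
import numpy as np


def rmse(estimate, truth):
    """Root mean squared error between an estimated and a reference curve."""
    e = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))


def relative_errors(estimate, truth):
    """Pointwise absolute relative error; `truth` must be non-zero at every level."""
    est = np.asarray(estimate, dtype=float)
    ref = np.asarray(truth, dtype=float)
    return np.abs(est - ref) / ref


def are(estimate, truth):
    """Average relative error (unweighted mean over TF levels)."""
    return float(np.mean(relative_errors(estimate, truth)))


def awre(estimate, truth, weights):
    """Average weighted relative error; weights emphasise the levels that matter most."""
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * relative_errors(estimate, truth)) / np.sum(w))
```

Because relative errors blow up where the reference curve is close to zero, the weighting matters most at low TF intensities, where the breakdown probability is tiny and few breakdowns occur.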
3 Results and discussion

Given the extent of the results, only a few illustrative examples of the pseudo-empirical measurements and capacity-model estimation simulations are provided, along with regression models that predict estimation error. The full models and results are available in a data repository (Mikolasek, 2026a).

Figure 2: Illustrative graphs of the simulated measurements and capacity model estimation from the set with roughly 8x reduced breakdown probability, 8x increased traffic flow data, and 50 expected breakdowns: (a) run 1, (b) run 9, (c) run 11 (see Table 1).

Figure 2, together with the numbers in Table 1, illustrates the impact of randomness on the accuracy of capacity distribution estimates. Graph (a) shows a smooth empirical CFB curve that deviates from the theoretical (unknown in practice) curve mainly in the second half, yet it yields a severely skewed capacity estimate. By contrast, the slightly more rugged curve in graph (c) produces an almost perfect estimate. Graph (b) also shows a relatively smooth CFB curve that gradually diverges from the theoretical one. The estimated CDF appears to match the true curve visually, but this is a scale effect of the left part of the plot – the average relative error is in fact 29.81% (AWRE = 28.45%). This highlights that little or no information about the reliability of the capacity estimates can be inferred from the CFB curve shape, tempting as it may be when it is the only available “clue” in practice. While there is a clear correlation between the CFB and CDF error (Table 1), the CFB error cannot be calculated in practice, since the theoretical CFB curve is unknown, and thus cannot be used to infer the CDF error.

Table 1: Excerpt from the simulated capacity estimation results (see Figure 2).

| Quantity | Run 1 | Run 9 | Run 10 | Run 11 | Run 12 | Mean (of 15) | Maximum (of 15) |
|---|---|---|---|---|---|---|---|
| Records (all runs) | 59576 | | | | | | |
| True shape (all runs) | 7.5 | | | | | | |
| True scale (all runs) | 183.0 | | | | | | |
| Estimated shape | 8.896 | 8.331 | 7.898 | 7.570 | 8.058 | 7.592 | 9.099 |
| Estimated scale | 157.7 | 175.8 | 172.2 | 180.8 | 174.5 | 184.9 | 233.2 |
| Theoretical breakdowns (all runs) | 50.32 | | | | | | |
| Recorded breakdowns | 62 | 36 | 59 | 52 | 47 | 49.93 | 62 |
| Predicted breakdowns | 62.02 | 36.00 | 59.00 | 52.00 | 47.01 | 49.94 | 62.02 |
| RMSE CFB | 5.693 | 9.542 | 4.825 | 0.952 | 2.776 | 4.044 | 9.542 |
| ARE CFB | 20.00% | 38.93% | 9.91% | 2.01% | 16.11% | 17.54% | 38.93% |
| AWRE CFB | 12.96% | 37.16% | 10.48% | 2.23% | 14.36% | 14.71% | 37.16% |
| RMSE CDF | 0.0081 | 0.0011 | 0.0031 | 0.0005 | 0.0010 | 0.0026 | 0.0081 |
| ARE CDF | 38.23% | 29.81% | 16.63% | 3.11% | 12.54% | 19.66% | 38.23% |
| AWRE CDF | 29.41% | 28.45% | 17.38% | 3.35% | 8.54% | 15.93% | 29.41% |
| RMSE CFB (vs. emp.) | 1.604 | 1.427 | 1.070 | 1.783 | 1.214 | 1.494 | 2.827 |
| ARE CFB (vs. emp.) | 10.14% | 14.22% | 8.26% | 18.39% | 6.84% | 12.09% | 18.94% |

The calculated errors of the estimated capacity CDF curves cast a new light on earlier studies on stochastic capacity and underscore the need to collect sufficient data before drawing conclusions about the capacity distribution. This primarily concerns the breakdowns, but the intermediate, censored, non-breakdown data must always be included, too. For example, TF harmonisation was recently found to reduce breakdown probability by 40-50 % for a given TF at a 2-to-1 lane motorway work zone (Mikolasek, 2025). Accounting for possible error – up to ± 30 % for either case (with or without the harmonisation), given the number of recorded breakdowns – suggests that, while such a fringe statistical fluke is extremely unlikely, if the possible errors aligned unfavourably, the estimated effect of harmonisation could in fact be negative. However, the impact of harmonisation still appears to be statistically significant; it is also possible that the true effect is even more positive than previously estimated.

Table 2 shows the details of three notable regression models for CDF AWRE. The simple Model 6 was chosen as the most suitable and is shown in Figure 3, which plots CDF AWRE vs. the number of recorded breakdowns and reveals a clear logarithmic relation. While Model 10 has a marginally better R² and Model 4 an even higher R² (but with one statistically insignificant variable, P-value = 0.24), Model 6 is preferred for interpretability and practicality. The censoring rate does not appear to affect estimate reliability. Although it enters Model 10 via the term x3 (the ratio of total TF records to breakdowns), that term primarily reflects the number of breakdowns; the impact of the total number of TF records is offset by the other (log) term with a negative sign. Table 3 reports analogous models for CFB AWRE, with the same qualitative conclusions.

Figure 3: Regression model of expected breakdown probability (capacity CDF) average weighted relative error (AWRE) and the underlying pseudo-empirical data.

Table 2: Notable models for breakdown probability model error (AWRE).

Model 4 (R² = 0.4499)

| Variable | Coefficient | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|
| Intercept | 0.6854 | 2.671E-12 | 0.4994 | 0.8715 |
| x3: sum r_Ij / CFB(I_max) | 9.621E-05 | 5.680E-3 | 2.821E-05 | 1.642E-4 |
| x4: ln(sum r_Ij) | -0.04996 | 8.355E-3 | -0.08701 | -0.01291 |
| x5: ln(CFB(I_max)) | -0.02283 | 0.2427 | -0.06120 | 0.01554 |

Model 6 (R² = 0.4379)

| Variable | Coefficient | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|
| Intercept | 0.4456 | 1.882E-72 | 0.4074 | 0.4837 |
| x5: ln(CFB(I_max)) | -0.07348 | 1.059E-46 | -0.08213 | -0.06482 |

Model 10 (R² = 0.4477)

| Variable | Coefficient | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|
| Intercept | 0.7858 | 4.729E-59 | 0.7074 | 0.8642 |
| x3: sum r_Ij / CFB(I_max) | 1.344E-4 | 1.682E-27 | 1.121E-4 | 1.568E-4 |
| x4: ln(sum r_Ij) | -0.07145 | 1.500E-47 | -0.07976 | -0.06314 |

The spread of estimated parameters for the same pre-defined capacity (see Table 1) also sheds new light on fixing the shape parameter, as discussed in Mikolasek (2025). Fixing it appears even less appropriate, as it limits the model’s ability to adjust the capacity distribution shape in the relevant region, increasing the risk of severely under- or over-estimating breakdown probability at low or high TF intensities.
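As a worked illustration of how Model 6 in Table 2 might be used in practice, the sketch below evaluates the fitted relation AWRE = 0.4456 − 0.07348 · ln(number of recorded breakdowns) and inverts it for a target error. The coefficients are taken from Table 2; the function names are hypothetical, and the result is only the expected (mean) error, since with R² of about 0.44 individual estimates scatter widely around it.

```python
import math

# Model 6 coefficients for capacity CDF AWRE (Table 2)
INTERCEPT = 0.4456
SLOPE_LN_BREAKDOWNS = -0.07348


def expected_cdf_awre(n_breakdowns):
    """Expected AWRE of the capacity CDF for a given number of recorded breakdowns."""
    if n_breakdowns <= 0:
        raise ValueError("number of breakdowns must be positive")
    return INTERCEPT + SLOPE_LN_BREAKDOWNS * math.log(n_breakdowns)


def breakdowns_for_target(target_awre):
    """Number of breakdowns at which the regression line reaches the target AWRE."""
    return math.exp((target_awre - INTERCEPT) / SLOPE_LN_BREAKDOWNS)


if __name__ == "__main__":
    for n in (50, 100, 200):
        print(n, round(expected_cdf_awre(n), 3))
```

For 50, 100, and 200 breakdowns the regression line gives roughly 0.16, 0.11, and 0.06, respectively, and it crosses 5 % at a little over 200 breakdowns. The model should not be extrapolated far beyond the simulated range of about 13 to 260 expected breakdowns.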
Table 3: Notable models for cumulative frequency of breakdowns model error (AWRE).

Model 4 (R² = 0.3415)

| Variable | Coefficient | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|
| Intercept | 0.7212 | 1.491E-09 | 0.4928 | 0.9497 |
| x3: sum r_Ij / CFB(I_max) | 8.819E-05 | 0.03853 | 4.6807E-06 | 1.717E-4 |
| x4: ln(sum r_Ij) | -0.05657 | 0.01497 | -0.1021 | -0.01107 |
| x5: ln(CFB(I_max)) | -0.01466 | 0.5409 | -0.06178 | 0.03246 |

Model 6 (R² = 0.3282)

| Variable | Coefficient | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|
| Intercept | 0.4355 | 2.671E-53 | 0.3887 | 0.4823 |
| x5: ln(CFB(I_max)) | -0.07141 | 8.663E-33 | -0.08203 | -0.06079 |

Model 10 (R² = 0.3408)

| Variable | Coefficient | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|
| Intercept | 0.7857 | 4.140E-44 | 0.6895 | 0.8818 |
| x3: sum r_Ij / CFB(I_max) | 1.127E-4 | 9.105E-15 | 8.5354E-05 | 1.401E-4 |
| x4: ln(sum r_Ij) | -0.07037 | 3.671E-34 | -0.08055 | -0.06018 |

4 Conclusions

The results show that data volume, specifically the number of recorded breakdowns, plays a crucial role in the reliability of estimates of stochastic capacity and the breakdown probability distribution. Based on the results, fewer than 50 breakdowns are likely to yield substantial errors in the estimated capacity distribution. Even with about 100 breakdowns, the expected average weighted relative error is roughly 10 % and can commonly reach 30 %. With more than 200 recorded breakdowns, the AWRE is likely to remain below 10 %, with an expected value of about 5 % or lower as data increase. This is especially important where the traffic flow measurements are temporary, such as in work zones. Where sites are sufficiently similar – or where key variables beyond TF intensity are controlled – data from multiple locations can be pooled to obtain a more robust capacity model. These findings likely generalize to other stochastic capacity estimation methods, though the aggregation interval and TF intensity bin size may affect results and warrant further investigation.
Acknowledgements

This article was produced with financial support from the Czech Ministry of Transport within the programme of long-term conceptual development of research institutions.

References

Arnesen, P., & Hjelkrem, O. A. (2018). An Estimator for Traffic Breakdown Probability Based on Classification of Transitional Breakdown Events. Transportation Science, 52(3). https://doi.org/10.1287/trsc.2017.0776

Brilon, W., Geistefeldt, J., & Regler, M. (2005). Reliability of Freeway Traffic Flow: A stochastic Concept of Capacity. Proceedings of the 16th International Symposium on Transportation and Traffic Theory, 125–144. https://www.ruhr-uni-bochum.de/verkehrswesen/download/literatur/ISTTT16_Brilon_Geistefeldt_Regler_final_citation.pdf

Haupt, S. E., Young, G. S., & Allen, C. T. (2006). Validation of a receptor-dispersion model coupled with a genetic algorithm using synthetic data. Journal of Applied Meteorology and Climatology, 45(3), 476–490. https://doi.org/10.1175/JAM2359.1

Kianfar, J., & Abdoli, S. (2021). Deterministic and Stochastic Capacity in Work Zones: Findings from a Long-Term Work Zone. Journal of Transportation Engineering, Part A: Systems, 147(1). https://doi.org/10.1061/JTEPBS.0000470

Lin, G., & Shearer, P. (2005). Tests of relative earthquake location techniques using synthetic data. Journal of Geophysical Research: Solid Earth, 110(B4), 1–14. https://doi.org/10.1029/2004JB003380

Lorenz, M. R., & Elefteriadou, L. (2001). Defining freeway capacity as function of breakdown probability. Transportation Research Record, (1776), 43–51. https://doi.org/10.3141/1776-06

Mikolasek, I. (2025). Stochastic highway capacity: Unsuitable Kaplan-Meier estimator, revised maximum likelihood estimator, and impact of speed harmonisation. ArXiv Preprint ArXiv:2507.00893.

Mikolasek, I. (2026a). Reliability of stochastic traffic capacity estimates - supplementary material. Zenodo. https://doi.org/10.5281/zenodo.18735590

Mikolasek, I. (2026b). Stochastic highway capacity: Unsuitable Kaplan-Meier estimator, revised maximum likelihood estimator, and impact of speed harmonisation - supplementary material. Zenodo. https://doi.org/10.5281/zenodo.18733055

Polus, A., & Pollatschek, M. A. (2002). Stochastic nature of freeway capacity and its estimation. Canadian Journal of Civil Engineering, 29(6), 842–852. https://doi.org/10.1139/l02-093

Shojaat, S., Geistefeldt, J., Parr, S. A., Escobar, L., & Wolshon, B. (2018). Defining freeway design capacity based on stochastic observations. Transportation Research Record, 2672(15), 131–141. https://doi.org/10.1177/0361198118784401

Shojaat, S., Geistefeldt, J., Parr, S. A., Wilmot, C. G., & Wolshon, B. (2016). Sustained flow index: Stochastic measure of freeway performance. Transportation Research Record, 2554, 158–165. https://doi.org/10.3141/2554-17

Sohrabi, S., & Ermagun, A. (2018). Optimum Capacity of Freeways: A Stochastic Approach. Journal of Transportation Engineering, Part A: Systems, 144(7), 04018032. https://doi.org/10.1061/jtepbs.0000156

Wang, Y., Cheng, Q., Wang, M., & Liu, Z. (2022). Weibull Distribution-Based Neural Network for Stochastic Capacity Estimation. Journal of Transportation Engineering, Part A: Systems, 148(4). https://doi.org/10.1061/jtepbs.0000646