Accident Analysis and Prevention 62 (2014) 161–167


Accident Analysis and Prevention journal homepage: www.elsevier.com/locate/aap

Using hierarchical Bayesian binary probit models to analyze crash injury severity on high speed facilities with real-time traffic data

Rongjie Yu a,∗, Mohamed Abdel-Aty b

a School of Transportation Engineering, Tongji University, 4800 Cao’an Road, 201804 Shanghai, China
b Department of Civil, Environmental and Construction Engineering, University of Central Florida, Orlando, FL 32826-2450, United States

Article info

Article history: Received 6 November 2012; received in revised form 24 July 2013; accepted 10 August 2013

Keywords: Crash injury severity; Binary probit model; Random effects; Hierarchical probit model; Bayesian inference

Abstract

Severe crashes cause serious social and economic losses; consequently, reducing crash injury severity has become one of the key objectives of high speed facility (freeway and expressway) management. Traditional crash injury severity analyses have relied mainly on crash reports, which describe the crash occurrence, the drivers’ characteristics and the roadway geometry. In this study, real-time traffic and weather data were introduced to analyze crash injury severity. The space mean speeds captured by the Automatic Vehicle Identification (AVI) systems on two roadways were used as explanatory variables, and data from a mountainous freeway (I-70 in Colorado) and an urban expressway (State Road 408 in Orlando) were used to check the consistency of the results. Binary probit (BP) models were estimated to classify non-severe (property damage only) crashes and severe (injury and fatality) crashes. First, Bayesian BP models were compared with BP models estimated by maximum likelihood estimation (MLE), and Bayesian inference was found to be superior because more variables were significant. Then hierarchical Bayesian BP models were developed with random effects accounting for unobserved heterogeneity at the segment level and at the individual crash level, respectively. Modeling results from both locations demonstrate that large speed variations prior to the crash increase the likelihood of a severe crash. Moreover, accounting for unobserved heterogeneity in the Bayesian BP models substantially improved model goodness-of-fit. Finally, possible future applications of the model results and of hierarchical Bayesian probit models are discussed. © 2013 Elsevier Ltd. All rights reserved.

R. Yu, M. Abdel-Aty / Accident Analysis and Prevention 62 (2014) 161–167

∗ Corresponding author at: School of Transportation Engineering, Tongji University, 4800 Cao’an Road, 201804 Shanghai, China. Tel.: +86 13761572327. E-mail addresses: [email protected], [email protected] (R. Yu). http://dx.doi.org/10.1016/j.aap.2013.08.009

1. Introduction

Traffic safety is a major concern in the transportation industry, and researchers and engineers have devoted tremendous effort to improving it. Crashes, especially injury and fatality crashes, cause immense human, economic and social losses. According to the National Highway Traffic Safety Administration (NHTSA), the fatality rate per 100 million vehicle miles traveled (VMT) reached its lowest point, 1.10, in 2010, while the injury rate per 100 million VMT remained the same as in 2009 (NHTSA, 2012). Although safety conditions for injury and fatality crashes appear to have improved, taking into account the improvements in vehicles and roadway design, more effort is still needed to reduce injury and fatality crashes.

Crash occurrence is attributed to a combination of driver behavior, environmental conditions, traffic flow, and geometric characteristics. Most previous crash injury severity studies analyzed data obtained from crash reports and the roadway characteristics inventory (RCI) database; however, this approach cannot capture important micro-level contributing factors (such as average speed) that are hypothesized to be critical for severe crash occurrence. Our previous study (Yu et al., 2013) introduced real-time traffic and weather data in developing safety performance functions, and the results demonstrated that incorporating these data in crash frequency models yields a better understanding of crash occurrence mechanisms for different seasons and crash types, as well as superior accuracy in model implementation. As an extension of that study, real-time traffic data are used here to analyze crash injury severity.

This study focuses on crash injury severity on high speed facilities (freeways and expressways). Two distinct facilities were selected: (1) a 15-mile mountainous section of I-70 in Colorado, which connects ski resorts with major cities; and (2) State Road (SR) 408, an expressway in the metro Orlando area that passes through downtown Orlando. Both roadways are equipped with Automatic Vehicle Identification (AVI) systems originally designed for electronic toll collection (ETC) and/or travel time estimation. The AVI system records the travel times of vehicles passing through the ETC lanes and, based on these data, provides travel time estimates to travelers. Because the AVI detectors are at fixed locations and travel times are available from many vehicles, space mean speeds can be calculated. Previous studies (Ahmed and Abdel-Aty, 2011; Ahmed et al., 2012a, 2012b) have demonstrated that AVI data can be used in traffic safety studies. In this study, crash injury severity analyses (severe vs. non-severe) were conducted for both roadways, with AVI data representing the traffic status prior to the crash. Identical analyses were conducted on the two facilities to check the transferability of using the standard deviation of AVI speed to predict crash injury severity. Binary probit (BP) models were estimated by both maximum likelihood estimation and Bayesian inference. Moreover, to account for unobserved heterogeneity at different levels, hierarchical Bayesian binary probit (HBBP) model structures were proposed and applied. Finally, the importance of including real-time traffic data in crash injury severity studies and future applications of hierarchical Bayesian probit models are discussed.

2. Background

Ordered probit (OP) models are a natural fit for crash injury severity analysis because of the inherently ordered outcomes (property damage only, injury, and fatality); consequently, they have been widely adopted. The BP model can be regarded as a special case of the OP model with two outcome categories. Abdel-Aty (2003) used ordered probit models to analyze drivers’ injury severity levels on roadway sections, at signalized intersections, and at toll plazas separately.
The results of the ordered probit models were compared with those of multinomial logit and nested logit models, and it was concluded that the ordered probit approach has a simple structure and provides good results. Xie et al. (2009) also employed ordered probit models to analyze the relationship between crash injury severity and driver characteristics, vehicle types and roadway conditions. Instead of the usual maximum likelihood estimation (MLE), they adopted Bayesian inference and compared the results with those of MLE. The authors demonstrated that for a large sample the two inference methods give similar results, whereas for smaller samples Bayesian OP models produce more reliable parameter estimates. Haleem and Abdel-Aty (2010) examined crash injury severity at unsignalized intersections in Florida, analyzing three-legged and four-legged intersections separately with different approaches and models. Ordered probit models were estimated with five injury levels (no injury, possible injury, non-incapacitating injury, incapacitating injury, and fatal injury), and binary probit models were estimated with the aggregated outcomes (severe and non-severe). The model fit indicated that the aggregated binary probit models were superior to the disaggregated ordered probit models. Building on the basic OP formulation, extensions such as generalized ordered probit models have also been introduced in crash injury severity analysis. Lemp et al. (2011) used heteroskedastic ordered probit models to analyze large truck crash severity. Unlike in the traditional ordered probit model, the variance of the error term in the heteroskedastic OP model can be parameterized as a function of covariates. Bayesian inference rather than MLE was used in their study. Because the heteroskedastic model permits variation in the unobserved components, the authors concluded that it performs better than the traditional OP model.

Besides OP models, various other statistical methods have been used in crash injury severity analysis. Shankar et al. (1996) estimated a nested logit model of crash severity levels and concluded that a nested logit model accounting for unobservable factors shared between property damage and possible injury accidents had the best fit. Huang et al. (2008) introduced hierarchical Bayesian binomial logistic models to analyze multi-vehicle crash injury severity. By accounting for the correlation between driver-vehicle units in the same crash and for unobserved heterogeneity at the crash level, they achieved a better model fit. Moreover, by calculating the intra-class correlation coefficient (ICC), they found that 28.9% of the unexplained variation resulted from between-crash variance, which was accounted for by the random effects of the proposed hierarchical model. Milton et al. (2008) used a mixed logit model, together with a crash frequency model, to investigate accident severities. The mixed logit (random parameters logit) model allows the effects of some variables to vary across roadway segments and can thereby account for unobserved effects (roadway characteristics, environmental factors and driver behavior). Wang and Abdel-Aty (2008) investigated left-turn crash injury severity with partial proportional odds models. The partial proportional odds model is a generalized ordered probability model that allows the coefficients to vary across alternatives. They found that generalized ordered logit models provide a better model fit, and parameter interpretations that yield better insight into the contributing factors, than ordered logit models. Quddus et al. (2010) investigated the relationship between traffic congestion and crash severity using crash reports, traffic characteristics and roadway geometry data. Different ordered response models were adopted, and it was concluded that an increase in traffic flow reduces crash severity. More recently, Savolainen et al. (2011) reviewed the methodological alternatives used in highway crash injury severity studies. The review shows that the most popular methods are multinomial logit, nested logit, ordered logit, ordered probit and random parameters (mixed) logit models. In addition to these traditional statistical models, data mining techniques such as artificial neural networks (Abdelwahab and Abdel-Aty, 2001), classification and regression tree (CART) models (Chang and Wang, 2006), and support vector machines (Li et al., 2012) have also been used. Bayesian inference is an emerging approach in traffic safety; it has been widely used in crash frequency studies (Guo et al., 2010; Ahmed et al., 2011; Miaou et al., 2011; Yu et al., 2013), real-time crash risk evaluation (Ahmed et al., 2012a, 2012b), and crash injury severity analysis (Xie et al., 2009; Quddus et al., 2010). In this study, basic BP models are estimated with both MLE and Bayesian inference and the results are compared. Moreover, hierarchical Bayesian BP models accounting for unobserved heterogeneity at both the segment and the individual crash levels are developed.

3. Data preparation

3.1. I-70 data (Colorado)

Four datasets were combined for the I-70 study: (i) I-70 crash data provided by the Colorado Department of Transportation (CDOT) (based on AVI data availability, crashes from July 2007 to July 2009 and from August 2010 to April 2011 were used); (ii) roadway segment geometry data obtained from the Roadway Characteristics Inventory (RCI); (iii) real-time weather data recorded by 6 weather stations along the study section; and (iv) real-time traffic data detected by 20 Automatic Vehicle Identification (AVI) detectors located in the eastbound and westbound directions of I-70. For each crash, the corresponding weather and traffic data were identified and matched.

AVI traffic data on I-70 provide the average segment speed at 2-min intervals; to reduce noise, the raw data were aggregated into 6-min intervals. The mean, standard deviation and coefficient of variation of the AVI speed in each 6-min interval were calculated at the aggregation step and used as explanatory variables representing the traffic flow conditions at the time of the crash. Moreover, to avoid errors in the reported crash time, the speed variables (mean and standard deviation) 6–12 min prior to the documented crash time were used as the crash traffic status. Crashes were assigned to AVI segments based on mile markers (MM). As Fig. 1 shows, if a crash occurred at 10:26 at MM 211.3, its traffic status is represented by the conditions during the interval 10:14–10:20 recorded on the AVI segment between MM 211 and MM 213.6.

Fig. 1. Example of an AVI segment and crash location. [Figure: a crash occurrence on the segment between the AVI detectors at MM 211 and MM 213.6.]

For the weather data, visibility information from the six weather stations was used. Each crash was assigned to the nearest weather station according to the mile markers. Weather conditions were archived in real time: whenever the change in weather conditions reached a preset threshold, a new record was documented.
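The traffic and weather matching rules described in this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ processing code: the use of the sample (n − 1) standard deviation, the sorted mile-marker lists and all function names are hypothetical choices. Only the 6-min aggregation, the 6–12 min look-back window, segment assignment by mile marker, the nearest-station rule and the use of the closest weather record prior to the crash time come from the text.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
import statistics

def summarize_interval(speeds):
    """Mean, std and CV of the AVI speeds inside one 6-min interval.

    Three readings for I-70's 2-min data, six for SR 408's 1-min data.
    Assumption: the sample (n - 1) standard deviation is used.
    """
    if len(speeds) < 2:
        raise ValueError("need at least two readings to compute a std")
    mean = statistics.mean(speeds)
    std = statistics.stdev(speeds)
    return {"mean": mean, "std": std, "cv": std / mean}

def prior_window(crash_time):
    """Interval running from 12 min to 6 min before the reported crash time."""
    return crash_time - timedelta(minutes=12), crash_time - timedelta(minutes=6)

def assign_segment(crash_mm, detector_mms):
    """Return the (lower, upper) detector mile markers bracketing crash_mm.

    detector_mms must be sorted in ascending order.
    """
    k = bisect_right(detector_mms, crash_mm)
    if k == 0 or k == len(detector_mms):
        raise ValueError("crash lies outside the instrumented section")
    return detector_mms[k - 1], detector_mms[k]

def nearest_station(crash_mm, station_mms):
    """Mile marker of the weather station closest to the crash location."""
    return min(station_mms, key=lambda mm: abs(mm - crash_mm))

def latest_prior_record(records, crash_time):
    """Latest (timestamp, visibility) record at or before crash_time, or None.

    records must be sorted by timestamp. Because a new record is written only
    when conditions change past a threshold, the latest one is still in force.
    """
    times = [t for t, _ in records]
    k = bisect_right(times, crash_time)
    return records[k - 1] if k > 0 else None

if __name__ == "__main__":
    crash = datetime(2008, 1, 5, 10, 26)   # hypothetical date; clock time from Fig. 1
    print(prior_window(crash))             # 10:14 and 10:20 on that date
    # MM 211 and 213.6 are from Fig. 1; 208.5 and 216.0 are invented neighbors.
    print(assign_segment(211.3, [208.5, 211.0, 213.6, 216.0]))  # (211.0, 213.6)
    print(summarize_interval([60.0, 62.0, 64.0]))
```

Using `bisect` keeps segment and record lookups logarithmic in the number of detectors or records, which matters little for 20 detectors but keeps the same code usable for long weather logs.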
The weather data were matched with each crash based on the reported crash time, and the closest weather record prior to the crash time was extracted and used as the crash weather condition. A total of 670 crashes were documented within the study period: 2 fatal crashes, 47 injury crashes and 621 property damage only (PDO) crashes. To estimate the binary probit models, the crashes were classified as non-severe (PDO crashes only) or severe (injury and fatal crashes). Table 1 summarizes the descriptive statistics of the variables in the final models. In addition, the correlation matrix of the variables entered in the final models was checked to avoid multicollinearity.

3.2. SR 408 data (Florida)

Two datasets were used for State Road 408: (i) crash data for the year 2009 obtained from the Crash Analysis Reporting (CAR) system; and (ii) real-time AVI data provided by the Orlando-Orange County Expressway Authority (OOCEA) for January 7 to December 31, 2009. The crash data downloaded from the CAR system were augmented with the principal roadway geometric characteristics by matching the crashes with the Florida Department of Transportation (FDOT) RCI system. The CAR data contain 315 crashes, of which 112 are PDO crashes and 203 are severe (injury and fatal) crashes. AVI data for SR 408 were archived at 1-min intervals and then aggregated into the same 6-min intervals. Comparable data preparation procedures were used to match the crash data with the AVI speed information. After matching with the AVI data, a total of 249 crashes were retained, of which 88 were PDO crashes and 161 were severe crashes. Table 2 summarizes the descriptive statistics of the variables in the final models.

4. Methodology

In probit models (BP or OP), a latent continuous variable represents the functional relationship between the explanatory variables and an error term. The standardized normal latent variable can be written as

$$U_i = \mu_i + \varepsilon_i = \mathbf{X}_i\boldsymbol{\beta} + \varepsilon_i,$$

with $\varepsilon_i \sim N(0, 1)$,

where $\mathbf{X}_i$ is the vector of explanatory variables for crash $i$, $\boldsymbol{\beta}$ is the vector of coefficients to be estimated, $\mu_i = \mathbf{X}_i\boldsymbol{\beta}$ is the linear predictor, and $\varepsilon_i$ is the error term. For a binary response, we specify $Y_i = 1$ if $U_i > 0$ and $Y_i = 0$ if $U_i \le 0$. It then follows that

$$P(Y_i = 1 \mid \mathbf{X}_i) = P(U_i > 0) = 1 - P(U_i \le 0) = 1 - P(U_i - \mathbf{X}_i\boldsymbol{\beta} \le -\mathbf{X}_i\boldsymbol{\beta}) = 1 - \Phi(-\mathbf{X}_i\boldsymbol{\beta}) = \Phi(\mathbf{X}_i\boldsymbol{\beta}) = \Phi(\mu_i),$$

where $P(Y_i = 1 \mid \mathbf{X}_i)$ is the probability that crash $i$ belongs to category 1 (severe) and $\Phi(\cdot)$ is the standard normal cumulative distribution function. Building on this latent variable formulation, we introduce the random effects probit model, which contains random effects arising from either a group-specific error term or an individual error term:

$$U_{it} = \mu_{it} + u_i + \varepsilon_{it} = \mathbf{X}_{it}\boldsymbol{\beta} + u_i + \varepsilon_{it},$$

where the data come from groups $i = 1, \ldots, N$ with individuals $t = 1, \ldots, T_i$ in each group. The random effects $u_i$ are assumed to be independent of the covariates and of the error term. With the random effects $u_i$ accounting for unobserved heterogeneity at the group level (in this study, a group is a roadway segment along the freeway), the formulation can be understood as a hierarchical binary probit (HBP) model. Alternatively, random effects can be specified at the individual crash level by allowing the variance of the error term, $\sigma^2_{it}$, to follow a distribution that varies across crashes, for example

$$\varepsilon_{it} \sim N(0, \sigma^2_{it}), \qquad \sigma^2_{it} \sim \mathrm{Gamma}(1, 1).$$
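To make the latent-variable derivation concrete, the Python sketch below evaluates P(Y = 1 | X) = Φ(Xβ), the binary probit log-likelihood that MLE maximizes, and the effect of a segment-level random effect. It assumes the segment effect is normal, u ~ N(0, σ_u²), which the text does not specify; under that assumption, integrating u out gives P(Y = 1) = Φ(μ/√(1 + σ_u²)), and the Monte Carlo function checks this by simulating the latent variable U = μ + u + ε directly. The latent-scale share σ_u²/(σ_u² + 1) is a probit analogue of the intra-class correlation mentioned in the Background. All names are illustrative.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1); its .cdf is the probit link Phi

def probit_prob(x, beta):
    """P(Y = 1 | x) = Phi(mu), with linear predictor mu = x . beta."""
    mu = sum(xi * bi for xi, bi in zip(x, beta))
    return STD_NORMAL.cdf(mu)

def log_likelihood(X, y, beta):
    """Binary probit log-likelihood; MLE chooses beta to maximize this."""
    total = 0.0
    for x, yi in zip(X, y):
        p = probit_prob(x, beta)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

def marginal_prob(mu, sigma_u):
    """P(Y = 1) after integrating u ~ N(0, sigma_u^2) out of Phi(mu + u)."""
    return STD_NORMAL.cdf(mu / math.sqrt(1.0 + sigma_u ** 2))

def simulate_marginal(mu, sigma_u, n=200_000, seed=1):
    """Monte Carlo check: share of draws with latent U = mu + u + eps > 0."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n)
               if mu + rng.gauss(0.0, sigma_u) + rng.gauss(0.0, 1.0) > 0)
    return hits / n

def latent_icc(sigma_u):
    """Share of latent-scale variance due to the segment effect (Var(eps) = 1)."""
    return sigma_u ** 2 / (sigma_u ** 2 + 1.0)

if __name__ == "__main__":
    # The closed form and the simulation should agree to about two decimals.
    print(marginal_prob(0.5, 0.8), simulate_marginal(0.5, 0.8))
```

The attenuation factor 1/√(1 + σ_u²) is one way to see why coefficients from random effects probit models tend to be larger in magnitude than those from the basic model: they are conditional on the segment effect rather than averaged over it.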

Full Bayesian inference was employed in this study. The key ‘hierarchical’ feature of these models is that the random effects $\boldsymbol{\theta}$ ($u_i$, $\varepsilon_{it}$) depend on a hyperparameter $\boldsymbol{\phi}$ that is unknown and thus has its own prior distribution $p(\boldsymbol{\phi})$. The joint prior distribution is (Gelman et al., 2004)

$$p(\boldsymbol{\phi}, \boldsymbol{\theta}) = p(\boldsymbol{\phi})\,p(\boldsymbol{\theta} \mid \boldsymbol{\phi}),$$

and the joint posterior distribution is

$$p(\boldsymbol{\phi}, \boldsymbol{\theta} \mid y) \propto p(\boldsymbol{\phi}, \boldsymbol{\theta})\,p(y \mid \boldsymbol{\phi}, \boldsymbol{\theta}) = p(\boldsymbol{\phi}, \boldsymbol{\theta})\,p(y \mid \boldsymbol{\theta}),$$

where the last equality holds because the data depend on $\boldsymbol{\phi}$ only through $\boldsymbol{\theta}$.

Based on these equations, four types of models were estimated: MLE binary probit models; Bayesian binary probit models; hierarchical Bayesian binary probit (HBBP) models with segment-level random effects; and HBBP models with individual crash-level random effects. The MLE BP models were estimated in SAS (SAS Institute, 2004) and the Bayesian models in WinBUGS (Spiegelhalter et al., 2003). For each Bayesian model, three chains of 15,000 iterations were run in WinBUGS, with 5000 iterations used for burn-in. Convergence was checked by monitoring the MCMC trace plots of the model parameters: if all values stayed within a zone without strong periodicity or trends, the model was considered to have converged. Significant variables were selected based on 95% confidence intervals (MLE models) or 95% credible intervals (Bayesian models).
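The convergence check above is visual. A common numerical complement, not reported in the paper, is the Gelman–Rubin potential scale reduction factor R̂ computed from the three chains; values close to 1 (a frequently used rule of thumb is below about 1.1) suggest that the chains have mixed. The sketch below implements the standard between-chain/within-chain formula for a single parameter in plain Python; the chain length of 10,000 and the shifted-chain demonstration are hypothetical.

```python
import random
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for one scalar parameter.

    chains: list of m >= 2 equal-length lists of post-burn-in draws.
    """
    m, n = len(chains), len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    # Between-chain variance B and mean within-chain variance W.
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5

if __name__ == "__main__":
    rng = random.Random(0)
    mixed = [[rng.gauss(0.0, 1.0) for _ in range(10_000)] for _ in range(3)]
    stuck = [[rng.gauss(mu, 1.0) for _ in range(10_000)] for mu in (0.0, 0.0, 3.0)]
    print(round(gelman_rubin(mixed), 3))  # close to 1
    print(round(gelman_rubin(stuck), 3))  # well above 1.1: one chain is elsewhere
```

Because R̂ compares chains started from different points, it detects the "one chain stuck in a different region" failure that a single trace plot can hide.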


Table 1
Summary of descriptive statistics of the variables for the I-70 data.

Variable        Description                                                    Mean    Std. dev.   Min     Max
Dependent variable
Severity        1 if severe crash; 0 if non-severe crash                       0.073   0.26        0       1
Independent variables
Snow season     1 if crash occurred during snow season; 0 if dry season        0.81    0.39        0       1
Steep grade     1 if longitudinal grade ≥ 4%; 0 if longitudinal grade < 4%     0.52    0.49        0       1
Std speed       Standard deviation of speed                                    1.34    1.92        0       13.2
Visibility      Visibility condition (mile)                                    1.81    1.61        0       7.1
LogAADT         Logarithmic transformation of AADT                             10.25   0.06        10.14   10.28

5. Modeling results and discussions

5.1. I-70 models

Four models were estimated for the 15-mile mountainous section of I-70: an MLE binary probit model, a Bayesian binary probit model and two hierarchical Bayesian binary probit models. As previous studies have shown (e.g., Ahmed et al., 2011), season has a significant effect on crash occurrence. There are two distinct seasons in this area: the snow season from October to April and the dry season from May to September. Moreover, the freeway has grades of up to 8%, and these steep grades have been shown to strongly influence crash occurrence. In this study, these factors, along with the other explanatory variables, were used to analyze crash injury severity. The hierarchical models were specified at the homogeneous segment level and at the individual crash level. The study section was divided into 120 homogeneous segments for both directions based on geometric characteristics and AADT; more information about the segmentation can be found in the previous study (Ahmed et al., 2011).

Tables 3 and 4 show the estimates and 95% intervals of the significant variables and the model goodness-of-fit. A total of four explanatory variables were found to be significant: two binary indicators, one real-time traffic variable and one weather variable. For the two binary factors: (1) the snow season indicator has a negative sign, which indicates that, relative to non-severe crashes, severe crashes are more likely to happen during the dry season; during the snow season, adverse weather and slippery road surfaces are frequent, so drivers are more likely to drive slowly and carefully; (2) the steep grade indicator is significant with a positive sign, which can be interpreted as steep grades increasing the probability of severe crashes. Regarding the real-time traffic and weather variables, the standard deviation of speed 6–12 min prior to the crash time is significant with a positive sign. This is an interesting finding, since this is the first time that real-time speed has been used in crash injury severity studies (Quddus et al. (2010) tried to include the 15-min average speed, but it was removed from the model because it was highly associated with the key variable of traffic congestion). When the freeway operates at relatively high speed, sudden speed variations are likely to increase the probability of a severe crash. Furthermore, the interaction between visibility and the snow season is significant: during the snow season, visibility plays an important role in distinguishing severe from non-severe crashes, and poor visibility increases the likelihood of severe crashes. With poor visibility, driving maneuvers (such as lane changing) are much more difficult than in good visibility, which could lead to more severe crashes. In the dry season, by contrast, visibility has no significant impact on severe crash occurrence, since its 95% credible interval includes zero. Moreover, the exposure variable LogAADT is not significant in any of the models.

The MLE and Bayesian binary probit models were compared first. Although the parameter estimates of the two models are similar, the Bayesian BP model is superior to the MLE BP model: two explanatory variables (Steep grade and Visibility × snow) are significant only at the 90% level in the MLE model, but at the 95% level in the Bayesian BP model. The Bayesian BP model was then compared with the hierarchical BP model with segment-level random effects, using DIC as the evaluation measure. DIC, recognized as a Bayesian generalization of the AIC (Akaike information criterion), combines a measure of model fit with the effective number of parameters. According to Spiegelhalter et al. (2003), a difference of more than 10 definitely rules out the model with the higher DIC. With random effects accounting for segment-level unobserved heterogeneity, the DIC of the hierarchical model is substantially lower than that of the basic model (306.57 vs. 337.08).
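The DIC comparison above can be reproduced arithmetically, and the sketch below also shows how DIC is assembled from MCMC output. It assumes the definition used in WinBUGS, DIC = D̄ + p_D with p_D = D̄ − D(θ̄), where D(θ) = −2 log p(y | θ) is evaluated at each posterior draw and at the posterior mean. The LR index reported for the MLE models in Tables 3 and 5 is assumed here to be McFadden’s likelihood ratio index 1 − LL(β̂)/LL(0); the paper does not state its formula. Function names are illustrative.

```python
import statistics

def dic(deviance_draws, deviance_at_mean):
    """Return (DIC, pD), with pD = Dbar - D(theta_bar) and DIC = Dbar + pD."""
    d_bar = statistics.fmean(deviance_draws)
    p_d = d_bar - deviance_at_mean
    return d_bar + p_d, p_d

def compare_dic(dic_base, dic_alt, threshold=10.0):
    """Return (reduction, decisive): how much lower dic_alt is than dic_base,
    and whether the reduction exceeds the 'more than 10' rule of thumb."""
    reduction = dic_base - dic_alt
    return reduction, reduction > threshold

def lr_index(ll_model, ll_null):
    """Likelihood ratio index 1 - LL(model) / LL(null) (assumed definition)."""
    return 1.0 - ll_model / ll_null

if __name__ == "__main__":
    # I-70 DIC values as reported in Tables 3 and 4.
    print(compare_dic(337.08, 306.57))  # reduction of about 30.5, decisive
    print(compare_dic(306.57, 271.08))  # reduction of about 35.5, decisive
```

Both I-70 reductions exceed the threshold of 10 by a wide margin, which is consistent with the text’s conclusion that each added level of random effects is worthwhile.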
The lower DIC of the segment-level model indicates that the basic model missed some unobserved heterogeneity at the homogeneous segment level: crashes on the same homogeneous segment share some unobserved factors (such as uncaptured geometric characteristics, driver behavior, etc.). Finally, the two hierarchical Bayesian binary probit models were compared. Introducing crash-level random effects reduced the DIC further (from 306.57 to 271.08), which demonstrates that with a more flexible error term the model fit can be substantially improved.

Table 2
Summary of descriptive statistics of the variables for the SR 408 data.

Variable         Description                                            Mean   Std. dev.   Min   Max
Dependent variable
Severity         1 if severe crash; 0 if non-severe crash               0.64   0.48        0     1
Independent variables
Passenger car    1 if passenger car; 0 if other vehicle types           0.59   0.49        0     1
Daytime          1 if crash occurred during daytime; 0 if nighttime     0.64   0.48        0     1
Impact point     1 if driver side; 0 if others                          0.42   0.49        0     1
Shoulder width   Outside shoulder width                                 9.62   0.80        8     12
Std speed        Standard deviation of speed                            0.65   1.24        0     10.92

Table 3
Parameter estimates and model fit for I-70 binary probit models.

                         MLE binary probit model            Bayesian binary probit model
Variable                 Mean    Std    2.5%     97.5%      Mean    Std    2.5%    97.5%
Snow season              −1.01   0.29   −1.58    −0.44      −1.04   0.29   −1.64   −0.49
Steep grade              0.32*   0.16   −0.006   0.63       0.32    0.16   0.03    0.64
Std speed                0.09    0.03   0.03     0.15       0.09    0.03   0.03    0.15
Visibility × snow        −0.19*  0.09   −0.37    0.01       −0.21   0.10   −0.44   −0.04
Visibility × dry         0.01    0.06   −0.11    0.13       0.01    0.06   −0.12   0.13
Threshold                1.62    0.29   1.05     2.18       0.91    0.27   0.35    1.42
LR index                 0.081                              N/A
DIC                      N/A                                337.08
Number of observations   670                                670
* Significant at the 90% level.

Table 4
Parameter estimates and model fit for I-70 hierarchical Bayesian BP models.

                         Segment-level random effects model   Crash-level random effects model
Variable                 Mean    Std    2.5%    97.5%         Mean    Std    2.5%    97.5%
Snow season              −1.51   0.45   −2.39   −0.61         −2.03   0.73   −3.64   −0.77
Steep grade              0.48    0.25   0.01    0.99          0.66    0.35   0.06    1.44
Std speed                0.14    0.05   0.04    0.23          0.18    0.08   0.05    0.34
Visibility × snow        −0.31   0.15   −0.65   −0.04         −0.42   0.22   −0.94   −0.06
Visibility × dry         0.005   0.09   −0.19   0.18          0.02    0.13   −0.24   0.26
Threshold                1.31    0.43   0.43    2.16          1.81    0.74   0.56    3.57
Random effects           0.18    0.19   0.004   0.69          0.57    0.66   0.13    2.29
DIC                      306.57                               271.08
Number of observations   670                                  670

5.2. SR 408 models

The same four models were also estimated for the SR 408 expressway in the Orlando area. Unlike I-70, where only 7.3% of crashes are severe, the majority of SR 408 crashes are severe. The two roadways are substantially different: unlike I-70, SR 408 is flat and weather does not play an important role. However, because SR 408 has a similar AVI system, an identical crash injury severity analysis can confirm the effect of the AVI speed standard deviation on crash injury severity. In addition, as the expressway passes through downtown Orlando, reducing the number of severe crashes is important for decreasing traffic delays and economic losses. Tables 5 and 6 provide the estimates of the significant variables and the goodness-of-fit of the developed models.

A total of five explanatory variables were found to be significant in the SR 408 models. The passenger car indicator is significant with a positive sign, indicating that passenger cars are more likely to be involved in severe crashes, which is consistent with a previous crash injury severity study of the same study area (Abdel-Aty, 2003). The daytime indicator (relative to nighttime) is significant with a positive sign, which indicates that severe crashes are more likely during the daytime. This result contradicts most previous studies, which concluded that severe crashes are more likely at night (under poor lighting). However, this roadway is a newly built expressway with good lighting, and exposure is higher during the daytime, which may explain why severe crashes are more likely in the daytime period. Moreover, the impact point indicator (driver side) is also significant, which can be interpreted as crashes with impact on the driver side of the vehicle having greater potential to be severe. Among the roadway geometric characteristics, the outside shoulder width is significant with a negative sign, which can be understood as segments with wider outside shoulders being less likely to have severe crashes. Furthermore, the standard deviation of speed during the 6-min interval is again significant and shows a positive relationship with severe crash occurrence. As explained for the I-70 models, larger speed variation increases the risk of severe crashes.

Model comparisons were also made for the SR 408 models. Compared with the MLE binary probit model, the Bayesian binary probit model increases the significance level of the shoulder width and speed standard deviation variables, and the two models yield comparable coefficient estimates. In addition, adding segment-level random effects to the Bayesian BP model yields a substantially lower DIC (293.98 vs. 314.62). In this case, the segment-level random effects were based on the 24 AVI segments along the expressway; these segments differ in traffic composition, peak hours, surrounding land use types, etc. Allowing the random errors to vary across crashes reduces the DIC further (272.80 vs. 293.98 for the segment-level random effects model).

Table 5
Parameter estimates and model fit for SR 408 binary probit models.

                         MLE binary probit model            Bayesian binary probit model
Variable                 Mean    Std    2.5%    97.5%       Mean    Std    2.5%    97.5%
Passenger car            0.34    0.17   0.007   0.67        0.35    0.18   0.01    0.71
Daytime                  0.29*   0.17   −0.04   0.62        0.28*   0.18   −0.06   0.63
Impact point             0.44    0.18   0.08    0.79        0.45    0.18   0.11    0.80
Shoulder width           −0.19*  0.11   0.29    1.25        −0.22   0.11   −0.46   −0.05
Std speed                0.16*   0.09   −0.01   0.34        0.18    0.09   0.01    0.38
Threshold                −2.2    1.08   −4.32   −0.08       −1.88   1.03   −4.2    −0.17
LR index                 0.064                              N/A
DIC                      N/A                                314.62
Number of observations   249                                249
* Significant at the 90% level.

Table 6
Parameter estimates and model fit for SR 408 hierarchical Bayesian BP models.

                         Segment-level random effects model   Crash-level random effects model
Variable                 Mean    Std    2.5%    97.5%         Mean    Std    2.5%    97.5%
Passenger car            0.57    0.26   0.02    1.07          0.63*   0.38   −0.04   1.48
Daytime                  0.47    0.24   0.004   0.95          0.55*   0.36   −0.08   1.4
Impact point             0.64    0.25   0.16    1.13          0.83    0.38   0.19    1.68
Shoulder width           −0.36   0.10   −0.56   −0.15         −0.38   0.21   −0.78   −0.004
Std speed                0.29    0.15   0.02    0.62          0.33    0.2    0.006   0.79
Threshold                −2.95   1.02   −4.79   −0.98         −3.21   1.03   −5.03   −0.47
Random effects           0.31    0.20   0.03    0.56          0.64    0.63   0.14    2.45
DIC                      293.98                               272.80
Number of observations   249                                  249
* Significant at the 90% level.
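The probit coefficients in Tables 5 and 6 give the direction of each effect on the latent scale, but not the change in probability. A standard way to express magnitudes, not used in the paper, is the marginal effect: φ(Xβ)β_k for a continuous variable such as Std speed, and Φ(Xβ + β_k) − Φ(Xβ) for an indicator such as Passenger car, where φ is the standard normal density. The sketch below assumes that a linear index μ = Xβ has already been computed for the covariate profile of interest; the index value 0.3 in the demonstration is hypothetical, and the coefficients 0.18 and 0.35 are simply the Bayesian BP means from Table 5.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: .pdf is phi, .cdf is Phi

def marginal_effect_continuous(mu, beta_k):
    """dP(Y=1)/dx_k = phi(mu) * beta_k at linear index mu = X beta."""
    return STD_NORMAL.pdf(mu) * beta_k

def marginal_effect_indicator(mu_without, beta_k):
    """Change in P(Y=1) when an indicator switches from 0 to 1.

    mu_without is the linear index with the indicator set to 0.
    """
    return STD_NORMAL.cdf(mu_without + beta_k) - STD_NORMAL.cdf(mu_without)

if __name__ == "__main__":
    mu = 0.3  # hypothetical linear index for one covariate profile
    print(marginal_effect_continuous(mu, 0.18))  # per unit of Std speed
    print(marginal_effect_indicator(mu, 0.35))   # passenger car vs. other
```

Because φ peaks at zero, the same coefficient translates into the largest probability change for profiles near a 50% predicted severity, so marginal effects should be reported together with the profile at which they were evaluated.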

6. Conclusion

Crash injury severity analysis is important for freeway traffic management strategies. This paper investigated crash injury severity on a mountainous freeway and an urban expressway separately. In addition to the explanatory variables traditionally used in crash injury severity analysis, such as crash report data and geometric variables from the roadway characteristics inventory (RCI), this study introduced real-time traffic and weather data to investigate the mechanisms behind severe crash occurrence. Identical analyses were performed for the two roadways to examine whether the effect of the standard deviation of AVI speed on crash injury severity is transferable. The Automatic Vehicle Identification (AVI) systems were implemented for toll collection and/or travel time estimation along the roadways, and the AVI data provide the space mean speed for every AVI segment. Moreover, real-time weather data are available for I-70, with visibility conditions recorded by 6 weather stations along the studied freeway segment. Binary probit models were developed to distinguish severe from non-severe crashes. For the I-70 data, severe crashes were found to be more likely during dry seasons and on steep-grade segments (longitudinal grades greater than 4%). During snow seasons, poor visibility increases the likelihood of severe crashes. More importantly, the standard deviation of speed 6–12 min prior to the crash is significant with a positive sign, indicating that large speed variations increase the probability of severe crashes.
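
The key real-time predictor on both roadways is the standard deviation of AVI space mean speed observed 6–12 min before the crash. A minimal sketch of how such a feature could be extracted is shown below. The record layout (time in minutes, speed), the half-open window [t − 12, t − 6), the 2-min spacing in the example and the use of the sample standard deviation are all assumptions for illustration, not details taken from the paper; the function name is hypothetical.

```python
from statistics import stdev


def speed_std_before_crash(records, crash_time, start=6.0, end=12.0):
    """Sample standard deviation of AVI speeds observed from `end` up to
    (but not including) `start` minutes before `crash_time`.

    records    -- iterable of (time_in_minutes, space_mean_speed) pairs (hypothetical layout)
    crash_time -- reported crash time, in the same minute units
    """
    window = [
        speed
        for t, speed in records
        if crash_time - end <= t < crash_time - start
    ]
    if len(window) < 2:
        # A standard deviation needs at least two observations.
        raise ValueError("need at least two speed observations in the window")
    return stdev(window)


if __name__ == "__main__":
    # Hypothetical 2-min AVI readings before a crash reported at minute 480.
    readings = [(466, 61.0), (468, 58.0), (470, 52.0), (472, 40.0), (474, 35.0), (476, 30.0)]
    print(speed_std_before_crash(readings, crash_time=480))
```

With a crash at minute 480, only the readings at minutes 468, 470 and 472 fall inside the window, so the feature is the sample standard deviation of 58, 52 and 40.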

For SR 408, passenger cars were found to be more likely to be involved in severe crashes, and most severe crashes occurred during daytime, possibly because of higher traffic volumes. Furthermore, crashes whose impact point was on the driver's side had a higher probability of being severe. Segments with wider outside shoulders were associated with fewer severe crashes, and again the standard deviation of speed 6–12 min prior to the reported crash time is significant with a positive sign. These findings can inform traffic management strategies such as variable speed limit (VSL) systems: once a large speed variation is detected, speed limits can be adjusted to reduce the variation and improve speed homogeneity. Owing to differences in geometric characteristics, demographics and driver behavior, different sets of significant variables were identified for the two roadways. Nevertheless, the same real-time traffic variable, the standard deviation of speed 6–12 min prior to the reported crash time, was significant with a consistent effect on both. As for the methodology, several binary probit models were developed for each study location. With the basic binary probit model structure, the Bayesian inference approach outperformed maximum likelihood estimation in that it improved the significance levels of the variables. Furthermore, different levels of random effects were introduced in the hierarchical Bayesian BP models: (1) random effects at the segment level, accounting for grouped unobserved heterogeneity, improved the model fit substantially; (2) random effects at the individual crash level, capturing observation-level unobserved heterogeneity, also improved the model fit significantly. These findings indicate that the standard error term following N(0, 1) is restrictive and that more flexible error terms can achieve a better model fit.
One example is the heteroskedastic ordered probit model, in which the variance of the error term is allowed to be a function of covariates; this can improve model performance significantly (Lemp et al., 2011). Moreover, the proposed hierarchical Bayesian probit models have promising future applications in addressing the spatial and temporal correlations of crash injury severity data. Similar to the random effects, they can be formulated to capture the spatial dependence between neighboring segments and intersections (e.g., a conditional autoregressive model) while accounting for segment- or intersection-level unobserved heterogeneity. The proposed BP model framework can also be used to estimate heteroskedastic ordered probit models and multi-level crash injury analyses. However, this study has limitations regarding the crash injury severity predictors. Because the AVI data contain only speed information, the influences of other traffic flow parameters (e.g., volume and occupancy) on severe crash occurrence still need to be investigated. Moreover, the models in this study do not incorporate driver information variables (e.g., driver's age and gender). Future studies are recommended to focus on roadways equipped with additional real-time traffic detectors (such as loop detectors and video recording systems) and with more complete driver information. Furthermore, the effects of driver behavior on severe crash occurrence are not included in this study; with naturalistic driving study data, the influence of human error on crash occurrence could also be investigated.

References

Abdel-Aty, M., 2003. Analysis of driver injury severity levels at multiple locations using ordered probit models. Journal of Safety Research 34, 597–603.
Abdelwahab, H., Abdel-Aty, M., 2001. Development of artificial neural network models to predict driver injury severity in traffic accidents at signalized intersections. Transportation Research Record: Journal of the Transportation Research Board 1746, 6–13.
Ahmed, M., Abdel-Aty, M., Yu, R., 2012a. Assessment of the interaction between crash occurrence, mountainous freeway geometry, real-time weather and AVI traffic data. In: Proceedings of the Transportation Research Board, Washington, DC.
Ahmed, M., Huang, H., Abdel-Aty, M., Guevara, B., 2011. Exploring a Bayesian hierarchical approach for developing safety performance functions for a mountainous freeway. Accident Analysis and Prevention 43, 1581–1589.
Ahmed, M., Abdel-Aty, M., Yu, R., 2012b. A Bayesian updating approach for real-time safety evaluation using AVI data. In: Proceedings of the Transportation Research Board, Washington, DC.
Ahmed, M., Abdel-Aty, M.A., 2011. The viability of using automatic vehicle identification data for real-time crash prediction. IEEE Transactions on Intelligent Transportation Systems 99, 1–10.
Chang, L., Wang, H., 2006. Analysis of traffic injury severity: an application of nonparametric classification tree techniques. Accident Analysis and Prevention 38, 1019–1027.
Gelman, A., Carlin, J., Stern, H., Rubin, D., 2004. Bayesian Data Analysis. Chapman and Hall/CRC Press, Boca Raton, FL.


Guo, F., Wang, X., Abdel-Aty, M., 2010. Modeling signalized intersection safety with corridor-level spatial correlations. Accident Analysis and Prevention 42, 84–92.
Haleem, K., Abdel-Aty, M., 2010. Examining traffic crash injury severity at unsignalized intersections. Journal of Safety Research 41, 347–357.
Huang, H., Chin, H., Haque, M., 2008. Severity of driver injury and vehicle damage in traffic crashes at intersections: a Bayesian hierarchical analysis. Accident Analysis and Prevention 40, 45–54.
Lemp, J., Kockelman, K., Unnikrishnan, A., 2011. Analysis of large truck crash severity using heteroskedastic ordered probit models. Accident Analysis and Prevention 43, 370–380.
Li, Z., Liu, P., Wang, W., Xu, C., 2012. Using support vector machine models for crash injury severity analysis. Accident Analysis and Prevention 45, 478–486.
Miaou, S., Song, J., Mallick, B.K., 2011. Roadway traffic crash mapping: a space–time modeling approach. Journal of Transportation and Statistics 6, 33–58.
Milton, J., Shankar, V., Mannering, F., 2008. Highway accident severities and the mixed logit model: an exploratory empirical analysis. Accident Analysis and Prevention 40, 260–266.
NHTSA, 2012. 2010 Motor Vehicle Crashes: Overview. http://www-nrd.nhtsa.dot.gov/Pubs/811552.pdf (accessed 15.09.12).
Quddus, M., Wang, C., Ison, S., 2010. Road traffic congestion and crash severity: econometric analysis using ordered response models. Journal of Transportation Engineering 136, 424–435.
SAS Institute, 2004. SAS/GRAPH® 9.1 Reference. SAS Institute Inc., Cary, NC.
Savolainen, P., Mannering, F., Lord, D., Quddus, M., 2011. The statistical analysis of highway crash-injury severities: a review and assessment of methodological alternatives. Accident Analysis and Prevention 43, 370–380.
Shankar, V., Mannering, F., Barfield, W., 1996. Statistical analysis of accident severity on rural freeways. Accident Analysis and Prevention 28, 391–401.
Spiegelhalter, D., Thomas, A., Best, N., Lunn, D., 2003. WinBUGS User Manual. MRC Biostatistics Unit, Cambridge.
Wang, X., Abdel-Aty, M., 2008. Analysis of left-turn crash injury severity by conflicting pattern using partial proportional odds models. Accident Analysis and Prevention 40, 1674–1682.
Xie, Y., Zhang, Y., Liang, F., 2009. Crash injury severity analysis using Bayesian ordered probit models. Journal of Transportation Engineering 135, 18–25.
Yu, R., Abdel-Aty, M., Ahmed, M., 2013. Bayesian random effect models incorporating real-time weather and traffic data to investigate mountainous freeway hazardous factors. Accident Analysis and Prevention 50, 371–376.
