Examining the Generalized Waring Model for the Analysis of Traffic Crashes

Peng, Yichuan

Date

2013-05-03

Abstract

Statistical models are one of the major data analysis methods and play an important role in traffic safety analysis. Crash data commonly exhibit overdispersion, in which the variance of the counts exceeds the mean, and this phenomenon has been discussed and investigated frequently in recent years. Researchers have therefore proposed several models to handle overdispersion, such as the Poisson-gamma (PG), also known as the Negative Binomial (NB), the Poisson-lognormal, and the Poisson-Weibull. Unfortunately, very few models have been proposed specifically for analyzing the sources of dispersion in the data. A better understanding of the sources of variation and overdispersion could help in managing safety more efficiently, for example when establishing relationships and applying appropriate treatments or countermeasures.
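As a minimal illustration of overdispersion (not taken from the thesis), the sketch below simulates crash counts at hypothetical sites under a pure Poisson model and under a Poisson-gamma (NB) mixture with the same mean, then compares their variance-to-mean ratios. The number of sites, the mean crash count, and the gamma shape are assumed values chosen only for demonstration.

```python
import numpy as np


def dispersion_index(counts):
    """Return the sample variance divided by the sample mean."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()


rng = np.random.default_rng(0)

# Assumed, illustrative settings (not from the source).
n_sites = 20000        # number of hypothetical road segments
mean_crashes = 2.0     # common mean crash count
gamma_shape = 1.5      # gamma shape (inverse dispersion) for the mixture

# Pure Poisson counts: variance equals the mean.
poisson_counts = rng.poisson(mean_crashes, size=n_sites)

# Poisson-gamma mixture: each site draws its own rate from a gamma
# distribution with mean `mean_crashes`, which yields NB counts.
site_rates = rng.gamma(gamma_shape, mean_crashes / gamma_shape, size=n_sites)
nb_counts = rng.poisson(site_rates)

print("Poisson dispersion index:", dispersion_index(poisson_counts))
print("NB dispersion index:     ", dispersion_index(nb_counts))
```

Under these assumptions the Poisson index should sit close to 1, while the mixture index should be clearly above 1, since the NB variance is the mean plus the squared mean divided by the gamma shape.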

Given the limitations of existing models for exploring the sources of overdispersion in crash data, this research examined a new model, the Generalized Waring (GW) model, that can be applied to explore sources of extra variability. This model, which was recently introduced by statisticians, divides the observed variability into three components: randomness, internal differences between road segments or intersections, and the variance caused by external factors that have not been included as covariates in the model. GW models were evaluated using both simulated and empirical crash datasets, and the results were compared with those of the most commonly used NB model and the recently developed NB-Lindley model. For parameter estimation, both the maximum likelihood method and a Bayesian approach were adopted to allow a better comparison.
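The three-way split described above can be sketched numerically. The example below assumes the common univariate GW parameterization with parameters a, k, and rho (rho > 2), which the abstract does not state, and uses the standard variance partition into randomness, liability (external factors), and proneness (internal differences between sites). Because the distribution is symmetric in a and k, the assignment of the last two labels depends on convention, so the function and key names here are illustrative only.

```python
def gw_variance(a, k, rho):
    """Closed-form variance of a GW(a, k, rho) count, assuming rho > 2."""
    return a * k * (a + rho - 1) * (k + rho - 1) / ((rho - 1) ** 2 * (rho - 2))


def gw_variance_components(a, k, rho):
    """Split the GW variance into three additive components.

    The labels follow one common convention; they are an assumption
    and not a statement of the parameterization used in the thesis.
    """
    if rho <= 2:
        raise ValueError("the variance is finite only for rho > 2")
    mean = a * k / (rho - 1)
    return {
        # Pure Poisson-type randomness: equal to the mean.
        "randomness": mean,
        # Variability attributed to external factors not in the model.
        "liability": a * k * (k + 1) / ((rho - 1) * (rho - 2)),
        # Variability attributed to internal differences between sites.
        "proneness": a ** 2 * k * (k + rho - 1) / ((rho - 1) ** 2 * (rho - 2)),
    }


# Hypothetical parameter values chosen for easy arithmetic.
parts = gw_variance_components(a=2.0, k=3.0, rho=4.0)
total = sum(parts.values())
shares = {name: value / total for name, value in parts.items()}
print(parts)
print(shares)
```

With these hypothetical values the three components sum to the closed-form variance, and the shares indicate how much of the total variability each source would account for.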

A simulation study showed that the GW model performs better than the NB model for overdispersed data, and an application to empirical crash data illustrated its capability to model datasets with great accuracy and to explore the sources of overdispersion.

The hotspot identification performance of the two kinds of models (i.e., GW models and NB models) was also examined and compared, based on the models estimated from the empirical dataset. Finally, the bias properties related to the choice of prior distributions for the parameters of the GW model were examined using a simulation study. In addition, suggestions on the choice of minimum sample size and priors are presented for different kinds of datasets.