Conclusion.
A number of
studies have attempted to ascertain whether NHTSA frontal crash tests reflect
real-world crash outcomes. Generally,
studies that restrict the range of crashes considered as closely as possible to
the specific circumstances of the crash test (head-on collision with the driver
wearing seat belts) indicate a substantial difference in risk of fatality or
serious injury for vehicles with better crash test performance compared to
those with worse crash test performance.
Outside those specific circumstances, however, six studies provide little
evidence that crash test performance consistently signals safety in serious
crashes generally. Yet these studies did not
adequately address the problem of driver heterogeneity: systematic differences
across vehicles in the distribution of driver characteristics, including the
propensity for riskier driving behaviors. Our analytical approach differs in
that we use only the variation in crash test ratings within a vehicle line to
test whether crash test ratings predict driver fatality risk. Instead of
inferring the relationship between fatality risk and ratings by, in effect,
comparing a high-performance sports car with better crash test ratings to a
staid sedan with lower ratings, we construct our estimates from differences in
fatality risk when the sports car or the staid sedan is retested. By doing so,
we control for those differences in driver characteristics that persist within
a vehicle line.
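The within-vehicle-line logic can be illustrated with a minimal sketch. It is not the model estimated in this paper: it is a simple linear fit on invented toy numbers, and the names `ols_slope`, `within_line_slope`, and `records` are hypothetical. It only shows how a pooled comparison across vehicle lines can give a misleading sign when riskier drivers choose the better-rated vehicle, and how subtracting each line's own averages removes that persistent difference.

```python
from collections import defaultdict

def ols_slope(xs, ys):
    """Slope of a one-variable least-squares fit of ys on xs."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def within_line_slope(records):
    """Slope after subtracting each vehicle line's own means (fixed effects)."""
    by_line = defaultdict(list)
    for line, stars, risk in records:
        by_line[line].append((stars, risk))
    xs, ys = [], []
    for obs in by_line.values():
        s_bar = sum(s for s, _ in obs) / len(obs)
        r_bar = sum(r for _, r in obs) / len(obs)
        xs.extend(s - s_bar for s, _ in obs)
        ys.extend(r - r_bar for _, r in obs)
    return ols_slope(xs, ys)

# Invented toy data (NOT from the paper): (vehicle line, star rating, risk index).
# The sports car has riskier drivers, so its risk is higher at every rating,
# yet within each line an extra star lowers the risk index by 0.1.
records = [
    ("sports_car", 4, 1.0), ("sports_car", 5, 0.9),
    ("staid_sedan", 2, 0.5), ("staid_sedan", 3, 0.4),
]

# Pooled comparison across lines: about +0.18 (more stars look more dangerous).
pooled = ols_slope([s for _, s, _ in records], [r for _, _, r in records])
# Within-line comparison: about -0.10 (more stars, lower risk).
within = within_line_slope(records)
print(pooled, within)
```

In this toy example the pooled slope is positive only because the better-rated vehicle also has the riskier drivers. The within-line slope recovers the negative relationship built into the data.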
Our conclusions regarding the predictive validity of crash test ratings differ
markedly by vehicle type. For passenger cars, our analyses provide the
strongest evidence to date supporting the predictive validity of crash test
ratings for driver fatality risk. In our
main analysis, we find statistically significant differences in fatality risk
for NHTSA one-, two-, three-, and four-star ratings compared to five-star
ratings, with our estimates indicating a 7% to 36% increase in driver deaths
compared to a vehicle with a five-star rating.
The only anomaly in our results for the NHTSA star ratings is that the
two-star rating has the highest risk (a 36% increase in deaths relative to a
five-star rating versus 18% for the one-star rating). The sizes of these
estimates are consistent with the estimates in Kahane (1994), but whereas he
found no statistically significant evidence of differences in fatality risk
between better and average crash test performance,
we do observe statistically and practically significant differences between
five-star ratings and four- or three-star ratings. For trucks (pickups, sport utility vehicles,
minivans, and vans), however, we find no statistically significant differences
in fatality risk for different NHTSA star ratings (and even the order of the estimates
does not correspond to the star ratings).
We also examined vehicle lines tested twice or more in IIHS offset crash tests. Although the sample was only approximately one-fifth the size of that for the NHTSA crash tests, the pattern of results was similar to that for the NHTSA ratings. In our main analysis of passenger cars, we found large, statistically significant differences in risk for vehicles rated "Poor" and "Marginal" compared to vehicles rated "Good"; for trucks, however, there were no statistically significant differences and the point estimates did not correspond to the IIHS ratings (e.g., the point estimate for a rating of "Poor" suggested 31% fewer driver deaths compared to a rating of "Good"). Finally, among the 22 passenger car vehicle lines tested twice or more by both IIHS and NHTSA, we found evidence that the two types of crash tests conveyed different information (because, for example, the tests showed differences even in the direction of the ratings when retests occurred), but that both sets of ratings predicted driver fatality risk. Hence, our tentative evidence, based on just 22 passenger car vehicle lines, suggests that the two crash test types provide complementary information about passenger car safety.