Skip to main content

A simulation study to quantify the impacts of exposure measurement error on air pollution health risk estimates in copollutant time-series models

Abstract

Background

Exposure measurement error in copollutant epidemiologic models has the potential to introduce bias in relative risk (RR) estimates. A simulation study was conducted using empirical data to quantify the impact of correlated measurement errors in time-series analyses of air pollution and health.

Methods

ZIP-code level estimates of exposure for six pollutants (CO, NOx, EC, PM2.5, SO4, O3) from 1999 to 2002 in the Atlanta metropolitan area were used to calculate spatial, population (i.e. ambient versus personal), and total exposure measurement error.

Empirically determined covariance of pollutant concentration pairs and the associated measurement errors were used to simulate true exposure (exposure without error) from observed exposure. Daily emergency department visits for respiratory diseases were simulated using a Poisson time-series model with a main pollutant RR = 1.05 per interquartile range, and a null association for the copollutant (RR = 1). Monte Carlo experiments were used to evaluate the impacts of correlated exposure errors of different copollutant pairs.

Results

Substantial attenuation of RRs due to exposure error was evident in nearly all copollutant pairs studied, ranging from 10 to 40% attenuation for spatial error, 3–85% for population error, and 31–85% for total error. When CO, NOx or EC is the main pollutant, we demonstrated the possibility of false positives, specifically identifying significant, positive associations for copollutants based on the estimated type I error rate.

Conclusions

The impact of exposure error must be considered when interpreting results of copollutant epidemiologic models, due to the possibility of attenuation of main pollutant RRs and the increased probability of false positives when measurement error is present.

Peer Review reports

Background

Many epidemiologic studies of the health effects associated with ambient air pollution exposure have focused on evaluating the effects associated with a single pollutant. In reality, humans are exposed to a complex mixture of pollutants that can vary spatially and temporally. The challenges of examining the relationship between multipollutant exposures and health effects include high correlations between pollutants preventing the use of standard statistical models, correlation of measurement errors across pollutants, varying degrees of exposure error across pollutants, the potential for interaction between pollutants, and pollutant mixtures varying by location [15]. Further, these epidemiologic studies often use data from centrally located measurement sites as the exposure estimate, which may not accurately capture the exposure variability due to the spatial variability of pollutant concentrations. As an example, analysis of pollutants with primarily local sources (CO, NOx, and EC) which exhibit a large degree of spatial heterogeneity [68] may be more error prone when central-site (CS) monitor measurements are used to estimate exposure, compared to pollutants primarily originating from regional sources (PM2.5, SO4, O3) which may exhibit spatially homogeneous patterns and thus have a lower degree of measurement error when CS monitor estimates are used. Fixed-location ambient monitors also do not account for human exposure factors such as where people move in time and space (e.g. in-vehicle concentrations impacting exposure), and concentrations in different microenvironments (e.g. infiltration-related factors). There is therefore the potential for exposure measurement error when CS measurements are used as a surrogate for exposure in epidemiologic studies. The presence of exposure measurement error can lead to effect attenuation and reduced statistical power in the resultant health risk estimate [9].

Billionnet et al. summarized the literature on statistical methods to study the effect of multiple pollutants [4], and Oakes et al. published a review of multipollutant exposure metrics [10]. Both identified a comprehensive understanding of exposure measurement error in the context of multipollutant studies as a key issue to address and investigate further. However while these papers detail work on multipollutant exposure metrics and statistical methods for analyzing multipollutant exposures, previous work to quantify the impact of exposure error on health risk estimates has focused on single-pollutant time-series models [68, 1114]. While there are inherent difficulties in examining multipollutant exposures, we still do not have a clear understanding of the relationship of exposure measurement error in a more simplistic two pollutant model [1517].

To our knowledge this is the first work which uses empirical pollutant relationships and health data to quantify the impact of correlated exposure measurement error in copollutant time-series models on resultant health risk estimates in a simulation study. We consider Poisson regression and allow for additive, multiplicative, and correlated measurement errors, which are typically not considered in the classical/Berkson error framework [16]. We use previously described, daily exposure metrics ranging from CS measurements to more complex modeling approaches [18] to calculate empirical relationships between pollutants (PM2.5, SO4, O3, CO, NOx, EC). Relationships between exposure metrics were previously used to calculate estimates of exposure measurement error due to spatial variability of pollutant concentrations, and human exposure factors at the ZIP code level [17], but did not include the use of empirical health data nor analysis of an epidemiological model. Our previous work showed the potential for bias in model coefficients for copollutant models, motivating the current work to examine the degree of attenuation of relative risks (RRs) empirically. In the current paper, the previous work is extended in a simulation study by considering a Poisson time-series analysis of emergency department (ED) visits for respiratory disease in the Atlanta, GA metropolitan area to obtain empirical estimates of the attenuation in health risk estimates due to various sources of exposure measurement error.

Methods

Estimates of exposure and exposure measurement error

Three estimates of daily exposure to ambient PM2.5, EC, SO4, CO, NOx, and O3 were derived for 193 ZIP codes in the 20-county Atlanta metropolitan area. Pollutant-specific measured concentrations, modeled exposure estimates, and summary statistics have been described previously in detail [17, 18]. Briefly the three exposure assessment approaches include: (1) CS measurements (CS), (2) ambient air quality (AQ) estimates obtained by combining simulations from an air quality model for regional background and a dispersion model for the local contribution, and (3) estimates from a stochastic population exposure model (PE). For PE, the contribution from indoor sources was not included because of the desire to examine the association between the health outcomes and exposure to ambient pollution, and for comparability with the CS measurements and AQ estimates. All three approaches estimate exposures to ambient pollution at each ZIP code centroid in the study area. Daily estimates (8-h maximum for O3, 24-h average for other pollutants) for 1999–2002 were generated for each approach.

For each ZIP code, three types of exposure error (δspatial, δpopulation, δtotal) were calculated as the difference between two exposure metrics. Detailed summary statistics including the magnitude and variance of δ, and between-pollutant correlations of δ are presented in [17]. Briefly, the ZIP code-specific exposure error due to a lack of spatial refinement in the exposure estimate is represented by δspatial = AQ - CS, assuming that the difference between the more (i.e. AQ) and less (i.e. CS) spatially refined metrics gives an estimate of the amount of spatial measurement error that is present in the CS metric. This assumption is made because our air quality models add spatial variability to the AQ metric compared with CS measurements, which lack spatial variability because the same CS measurement was used to represent exposure in each ZIP code. Exposure error introduced when human exposure factors are not included in an exposure estimate is represented by δpopulation = PE – AQ. Our PE metric includes variability due to human exposure factors such as time-location-activity patterns of individuals, commuting patterns, and infiltration of ambient pollutants to indoor environments. A third type of exposure error, δtotal = PE – CS, represents the combined exposure error when both spatial variability and human exposure factors are not accounted for. δtotal does not represent all potential sources of exposure error that may be present in a study; instead it represents the total exposure error that we were able to assess in this analysis.

Epidemiologic model

To conduct the simulation, health data from a previously described epidemiologic study were used [19]. Briefly, individual-level data on ED visits for respiratory outcomes from 41 hospitals in the 20-county Atlanta area were aggregated to daily counts for each ZIP code. The same Poisson time-series model in S. Sarnat et al. [19] was used in this simulation study to examine the association between the above described exposure metrics and daily counts of ED visits. The basic form of the model was:

$$ \begin{array}{l} \log \left(E\left({Y}_{kt}\right)\right)=\propto +{\beta}_1 pollutio{n}_{1,kt}+{\beta}_2 pollutio{n}_{2,kt}+\\ {}{{\displaystyle \sum}}_k{\lambda}_kZI{P}_k+{{\displaystyle \sum}}_m{\lambda}_m DO{W}_{mt}+{{\displaystyle \sum}}_n{\nu}_n hospita{l}_{nt}+\\ {}g\left({\gamma}_1,\dots, {\gamma}_N;tim{e}_t\right)+{{\displaystyle \sum}}_o{\xi}_o IOtem{p}_{ot}+\\ {}{\eta}_1 dewp{t}_t+{\eta}_2 dewp{t}_t^2+{\eta}_3 dewp{t}_t^3+\\ {}{\delta}_1tem{p}_t+{\delta}_2tem{p}_t^2+{\delta}_3tem{p}_t^3\end{array} $$
(1)

where Y kt is the count of ED visits for respiratory outcomes in ZIP code k on day t. For each pollutant (pollution 1 and pollution 2 ), daily averages (8-h maximum for O3, 24-h average for all other pollutants) of same-day concentrations were used. To control for spatial autocorrelation in the baseline ED visits across the ZIP codes, an indicator variable for the kth ZIP code (ZIP k ) was used to represent the areas from which ED counts were spatially aggregated. Dummy variables for day of week and holidays (DOW, indexed by m) and for hospital (hospital, indexed by n) were used; the latter accounted for the differing durations of time each hospital was included in the study. Long-term trends and seasonality of health outcomes (time) were controlled for with parametric cubic splines with monthly knots (g(γ 1, …, γ N ; x)). Meteorology was controlled for using maximum temperature (IO temp ) with indicators for each degree Celsius, a cubic function for dew point (dewpt), a cubic function for minimum temperature (temp), and dummy variables for seasons. When simulating the health effect we assume zero lag, but expect results to be generalizable to single day lags.

Simulation of true exposures and health outcomes

True exposure was defined as an exposure estimate without spatial and/or population measurement error. To simulate the true daily exposure for each pollutant, linear models for each type of daily exposure were fitted separately for each ZIP code and for each pollutant; for example, for spatial measurement error, the model is given by AQ kt  = θ k,1 + θ k,2 * CS kt  + ε k,t for ZIP code k and day t, such that both additive (θ 1) and multiplicative bias (θ 2) between the two exposure metrics were accounted for. See Additional file 1: Table S1 for the mean and standard deviation across ZIP codes of estimates of θ k,1 and θ k,2. For each Monte Carlo iteration, we first calculated a ‘true’ exposure mean for each day by using the unrefined measurement as the predictor (i.e., CS for spatial or total measurement error, AQ for population measurement error) and a realization of the measurement error coefficients (θ 1 and θ 2) drawn from their asymptotic bivariate normal distribution. Across-ZIP code spatial heterogeneity in the measurement error coefficients was allowed, however for each ZIP code we assumed that additive and multiplicative bias remains constant across days. Finally, the residual of the true exposure (ε t ) was subsequently drawn from a bivariate normal distribution with covariance equal to the ZIP code specific empirical measurement error covariance (see Additional file 1: Table S2 for a summary of the median across ZIP codes of the correlation of measurement error covariance) for the two pollutants of interest, and then combined with the true exposure mean to obtain the full true exposure value (true_exp). We chose to simulate true exposure from the less refined exposure because our measurement error framework assumes that the true exposure is more variable than the error-prone (unrefined) exposure. The resultant measurement error between the true and error-prone exposures, derived here from empirical data, do not strictly follow the standard classical or Berkson measurement error models (Zeger et al. [16]).

Using observed health data from the study described in S. Sarnat et al. [19], a design matrix \( X \) (without the two pollutants) was constructed for the Poisson model in Equation (1). The Poisson mean \( \mu \) for each day and each ZIP code was calculated as:

$$ \log \left(\mu \right)=X\times B+ \log R{R}_1\ \frac{true\_ ex{p}_1}{IQ{R}_1}+ \log R{R}_2\ \frac{true\_ ex{p}_2}{IQ{R}_2} $$
(2)

where B is a vector of the estimated regression coefficients from Equation (1), subscripts 1 and 2 denote pollutants 1 and 2, RR 1 and RR 2 are the hypothetical RRs per interquartile range (IQR) increase to allow for comparison across pollutants, true_exp 1 and true_exp 2 are the simulated true exposures, and IQR 1 and IQR 2 are the interquartile ranges for the corresponding true exposure metrics. We assumed RR 1 = 1.05 and RR 2 = 1, and in subsequent text refer to pollutant 1 as the ‘main pollutant’ (i.e., the pollutant with an assumed health effect) and to pollutant 2 as the ‘copollutant’ (i.e., the pollutant assumed to have no effect). The assumed RR 1 = 1.05 was chosen based on previously established single-pollutant models for the same data set, specifically the O3-respiratory analysis [19]. Poisson counts were drawn from a Poisson distribution of the simulated Poisson mean μ. Two versions of the Poisson time-series model (Equation (1)) were then fitted: one using the simulated Poisson counts with the true exposure, and another using the same simulated Poisson counts, together with the unrefined exposure values (standardized by their corresponding IQRs). This resulted in two sets of estimated log RR: \( {\widehat{\beta}}_{1,\ true} \) and \( {\widehat{\beta}}_{2,\ true} \) for the ‘true exposure’ scenario, and \( {\widehat{\beta}}_{1,\ noisy} \) and \( {\widehat{\beta}}_{2,\ noisy} \) for the ‘noisy exposure’ scenario (i.e., using the unrefined exposure metric).

The above simulation was run 1000 times for each pollutant pair and measurement error scenario. The mean over the 1000 estimates for each RR was calculated and used as the overall point-estimate associated with each pollutant pair and measurement error scenario. As an example, the RR for the main pollutant, true exposure scenario, was calculated as:

$$ {\overline{RR}}_{1,\ true}= \exp \left(\frac{{\displaystyle {\sum}_{n=1}^N}{\widehat{\beta}}_{1,\ true,\ n}}{N}\right) $$
(3)

where n indexes the simulation iteration and N = 1000. Standard error of the RR was calculated as the standard error of the 1000 βs. A student’s two-sided t-test was used to determine significant differences between \( {\overline{RR}}_{noisy,a} \) and \( {\overline{RR}}_{noisy,b} \) for two different copollutants a and b, when paired with the same main pollutant, across Monte Carlo simulations.

Statistical analyses

Mean RR estimates from using the true and noisy exposures were compared to the true RR in order to determine if the presence of measurement error induces bias and a loss of power. Various summary statistics are presented, including percent attenuation, root mean-square error (RMSE), and power/type I error.

Percent attenuation of the RR for the main pollutant was calculated to determine the impact of inclusion of measurement error on attenuation of the RR. RMSE was calculated to assess both bias and loss of precision in the estimates. Lastly, statistical power/type I error was calculated to assess false positive associations. As an example, for pollutant 1, noisy scenario, we define:

$$ \%\ attenuatio{n}_{1, noisy} = \frac{R{R}_1 - \overline{R{R}_{1,\ noisy}}}{R{R}_1-1} $$
(4)
$$ RMS{E}_{1, noisy}=\sqrt{\frac{{\displaystyle {\sum}_{n=1}^N}{\left({\widehat{\beta}}_{1, noisy,\ n} - \log R{R}_1\right)}^2}{N}} $$
(5)
$$ Powe{r}_{1, noisy} = \mathrm{\mathbb{P}}\left(\frac{{\widehat{\beta}}_{1,\ noisy}}{s_{\beta_{1, noisy}}}>1.96\right) $$
(6)
$$ S{E}_{Powe{r}_1, noisy} = \sqrt{\frac{Powe{r}_{1, noisy}*\left(1- Powe{r}_{1, noisy}\right)}{N}} $$
(7)

where RR 1 = 1.05, RR = 1, N = 1000, β and s β (standard error of β) are obtained from the Poisson model (equation 1), and \( {\widehat{RR}}_{1,\ noisy} \) is defined as in equation 3. Quantities were similarly calculated for each pollutant pair and measurement error scenario. RMSE ratio was defined as RMSE noisy /RMSE true for pollutant 1 or 2.

All statistical analyses and model simulations were completed in R, version 3.0.2 (R Foundation for Statistical Computing; http://www.r-project.org/).

Results

Mean RRs for the main pollutant over 1000 runs of the health model for both the noisy and true exposure metrics are shown in Fig. 1. For all types of measurement error, resultant RR of the main pollutants (pollutant 1) when the health model used the true exposure metric (i.e., the simulated exposure metric without measurement error) is close to 1.05. Similarly, the resultant RR of the copollutants (pollutant 2) using the true exposure metric is close to 1, as expected due to the assumed RR 1 = 1.05 and RR 2 = 1. In contrast the presence of measurement error in the noisy exposure metric results in considerable attenuation for all types of measurement error (Fig. 1, Table 1). For spatial measurement error, we see greater attenuation for \( {\overline{RR}}_{1, noisy} \) when a local pollutant (CO, NOx, EC) is the main pollutant (29–40%) compared to a regional pollutant (PM2.5, SO4, O3; 10–15%) (ranges represent the attenuation across all copollutants models for a specific main pollutant). For population measurement error, there is little attenuation when CO is the main pollutant (3–4%), and substantial attenuation (82–85%) of \( {\overline{RR}}_{1, noisy} \) when NOx is the main pollutant. Similarly for total measurement error, the highest level of attenuation of \( {\overline{RR}}_{1, noisy} \) is seen when NOx is the main pollutant (85%), and the least attenuation for CO (31–32%). For all other pollutants, for both population and total measurement error, attenuation of \( {\overline{RR}}_{1, noisy} \) is moderate-high (Table 1). For all pollutants and all types of measurement error, attenuation of \( {\overline{RR}}_{1, noisy} \) mirrors the patterns of attenuation hypothesized in previous analyses (see figure four of Dionisio et al., 2014) [17] . The scenario with an assumed RR 1 = 1.05 and RR 2 = 1.05 was run as a sensitivity analysis. Resultant RR from the sensitivity analysis are presented in Additional file 1: Figure S1. Two-sided t-tests were used to compare differences in attenuation of the main pollutant RR by copollutant; there were minimal differences seen across copollutants for the case of RR 1 = 1.05 and RR 2 = 1, with more differences seen for the sensitivity analysis (see Additional file 1: Supplemental Text, Table S3, and Table S4). The coverage of the 95% confidence intervals was calculated for the main pollutant and copollutant for each of the different scenarios analyzed, and with one exception was in the range of 0.93–0.98.

Fig. 1
figure 1

Attenuation of RR due to measurement error in a copollutant model (RR1 = 1.05, RR2 = 1). For x-axis labels, top row indicates the main pollutant (pollutant 1), bottom row indicates the copollutant (pollutant 2). Overall point estimates shown are the mean over 1000 estimates; error bars indicate the 95th confidence interval for the 1000 estimates (i.e., the 2.5th and 97.5th percentiles of simulated effect estimates; note that extremely narrow confidence intervals result in some non-visible error bars). Note that when copollutant true and noisy model results do not differ substantially, plotted data points overlap. a Spatial measurement error (δspatial). b Population measurement error (δpopulation). c Total measurement error (δtotal)

Table 1 Percent attenuation of RR1,noisy by main pollutant and measurement error type

The RMSE ratios comparing the RMSE for the noisy and true model estimates are presented in Table 2, reflecting both bias and loss of precision. The RMSE ratio for the copollutants is typically small (≤2.0), while the RMSE ratio for the main pollutants ranges from 1.3 to 26.1. While the main pollutant RMSE ratio for regional pollutants (PM2.5, SO4, O3) ranges from 1.8 to 13.7, there is little difference seen across copollutants and measurement error type for the same main pollutant. In contrast, the copollutant can have a substantial impact on the main pollutant RMSE ratio when local pollutants (CO, NOx, EC) are the main pollutant. For example, for δspatial for CO, NOx, and EC as the main pollutant, the RMSE ratio of the main pollutant ranges from 6.6–11.5, 6.0–10.4, and 6.6 – 12.1 respectively, dependent upon the copollutant (Table 2).

Table 2 RMSE ratios from comparison of bipollutant models (RR1 = 1.05, RR2 = 1) with and without measurement error

Power estimates for all main pollutants were equal to 1, indicating we will always detect the health effect association for the main pollutant (results not shown) given the simulation setup. Copollutant type I error for the true scenarios (i.e., no measurement error) were approximately 0.05 (results not shown). We present the type I error for the copollutant, noisy scenario (i.e., with measurement error), when the copollutant has no effect (RR2 = 1) (Fig. 2). Figure 2 shows that for CO, NOx, or EC as the main pollutant there is a substantial increase in the type I error of the copollutant when spatial or total exposure measurement error is present in both the main pollutant and the copollutant, increasing the likelihood of detecting false positive associations for the copollutants when CO, NOx or EC are the main pollutant in these instances. For copollutants paired with NOx under the δpopulation scenario we see larger type I error values, particularly for NOx paired with EC. For PM2.5, SO4, and O3 as the main pollutant for δspatial and δtotal and for all pollutants except NOx for δpopulation, there is less likelihood of detecting false positive associations.

Fig. 2
figure 2

Type I error for the copollutant (pollutant 2) in a copollutant model (RR1 = 1.05, RR2 = 1), with exposure measurement error. For x-axis labels, top row indicates the main pollutant (pollutant 1), bottom row indicates the copollutant (pollutant 2). Red line indicates type I error = 0.05. Overall point estimates shown are the mean over 1000 estimates; error bars indicate the 95th confidence interval for the 1000 estimates. a Spatial measurement error (δspatial). b Population measurement error (δpopulation). c Total measurement error (δtotal)

Discussion

While previous work has examined the impact of exposure measurement error in single pollutant models [13] and studies have proposed statistical methods for handling multipollutant effects [15, 2022], little work has been done to quantify the impact of measurement error on health risk estimates in multipollutant models, including copollutant models. This study builds on previous work examining the potential attenuation of model coefficients in copollutant epidemiologic models based on empirical covariance structures [17]. While the previous work built the foundation for the current analysis through development of exposure metrics and calculation of empirical relationships between pollutants, the current manuscript extends the work with the addition of empirical health data applied in an epidemiological model with the previously developed exposure metrics. Using the empirical measurement error covariance structures, estimates of additive and multiplicative bias between exposures, and between-pollutant relationships, we extended the previous work by conducting a simulation study to assess the impact of measurement error on health risk estimates obtained from a copollutant time-series model. We show the substantial attenuation of RRs resulting from the presence of exposure measurement error in exposure estimates for most pollutants and measurement error types examined, an empirical result which agrees with the predictions in previous work [17]. Lastly, we have shown evidence for the detection of false positive associations between the copollutant and the health outcome, particularly when CO, NOx, or EC are the main pollutant.

The RMSE ratio comparing the RMSE of the risk estimates from the noisy and true exposures reflects both the bias and standard error of the estimates. When we compare the RMSE ratio for a main pollutant-measurement error type, we see that for regional pollutants as the main pollutant, RMSE ratio changes very little across copollutants. However the RMSE ratio of the local pollutants as main pollutants is impacted substantially depending upon whether regional or local pollutants are the copollutants. For all local pollutant- (as main pollutant) measurement error pairs except for CO-population measurement error, we see a substantial increase in RMSE ratio for the main pollutant when the copollutant is a regional pollutant, compared to when the copollutant is a local pollutant. Thus from examination of the RMSEs (Table 2) we conclude that the inclusion of a regional copollutant in a model with a local main pollutant substantially increases the degree of bias and standard error of the main pollutant estimates, compared to having a local copollutant. However Fig. 1 demonstrates that the copollutant (whether regional or local) has little effect on the degree of bias in the main pollutant’s effect estimate. The two results taken together allow us to conclude that the differences in RMSE across copollutant type are due to differences in standard error, rather than bias, with the spatial structure of the copollutant influencing the standard error but not the bias of the main pollutant’s effect estimate. We also see differences in the RMSE ratio of local pollutants as main pollutants when the copollutant is varied, even when the copollutants have an assumed null effect (RR = 1). This indicates that relationship of these copollutants and their error with the main pollutant does have an impact on the bias and standard error of main pollutant RR estimates.

Type I error estimates in this study show the likelihood of detecting a false positive effect for a copollutant with an assumed null effect (i.e., RR = 1) is often reduced when measurement error is not present, especially when CO, NOx, and EC are the main pollutants. It is likely that this impact is more evident for CO, NOx, and EC than it is for PM2.5, SO4, and O3 because the former are dominated by local sources, thus tend to be more spatially variable [17] and have higher degrees of correlated exposure measurement error. This can be important when trying to understand the independent effect of a pollutant that is a part of a complex mixture. For example, the EPA’s most recent Integrated Science Assessments (ISAs) for CO [23] and NOx [24] both state that while there is evidence of consistent positive associations between short-term exposure to CO and NO2 and effects on the respiratory system in epidemiologic studies, the challenge remains to disentangle the independent effect of CO or NO2 related health effects from the larger air pollution mixture. There is the possibility that CO and/or NOx are serving as indicators of combustion-related emissions, particularly from traffic, for some health outcomes. The moderate to high correlations between CO, NOx, and other pollutants generated from combustion processes (e.g., EC, PM2.5) further complicates interpretation of epidemiologic studies of these pollutants. This provides motivation for the use of exposure metrics which minimize measurement error in epidemiologic analyses.

While the simulation presented here has many strengths, including accounting for multiplicative bias and correlated measurement errors, and the use of empirical data, there are inherent limitations. First, we assume that the refined exposure metric in each of our three constructed measurement error types provides an accurate representation of measurement error. That is, we assume that the AQ exposure metric accurately and completely accounts for spatial variability ignored in the CS metric, and we assume the PE exposure metric appropriately accounts for population variability ignored in the AQ metric. The authors acknowledge that the AQ metric and PE metrics themselves include some error, but in the absence of a ‘true’ spatially refined metric (e.g., a fine-scale measurement/monitor network in the study area) and a set of population-based personal exposure measurements, we operate under the above assumptions. Any error present in the refined metrics will remain an influence on the calculated RRs. Further, though it is common to use an individual’s exposure to ambient-generated pollution in an epidemiologic study, the inclusion of exposure from indoor sources would allow for analysis of the impact of exclusion of this exposure source on commonly calculated RRs. Though we expect results presented here to have general applicability regarding the impact on bias of RRs in a copollutant time-series model, we acknowledge that results presented here are dependent on empirical data from the Atlanta metropolitan area. Multipollutant relationships may be different in other cities depending on the city-specific source profile and the magnitude of measurement errors may depend on the location of central monitor and population characteristics.

Simulation studies have examined the impact of measurement error in hypothetical multipollutant epidemiologic studies [16, 25]. Exposure measurement error is often categorized into two classes: Berkson error and classical error, which are often assumed to be independent of the true exposure [16]. In reality, exposure measurement error is a combination of the two. Adding to the complexity, the consequences of each error type are different [5, 26]. In most time-series studies, Berkson error will not bias the effect estimates, but will tend to increase the variance of the regression coefficients (increasing the width of the CIs and decreasing power), while classical measurement error tends to bias the true effect towards the null, with the magnitude of the effect attenuation depending on the error variance of the exposure estimate relative to the variance of the true exposure [16]. Empirical analyses have shown impacts of both types of error in single pollutant models [14, 2729]. When two pollutants are measured with error, the correlation between the pollutants, and the correlation between their errors, predicts the magnitude of the bias, while the sign of the correlation between the pollutants predicts the direction of the bias [16, 20]. In this analysis, we do not follow standard classical or Berkson measurement error models, but utilize the empirical relationships between different exposure metrics from a real-world epidemiologic study. By assuming that the more refined exposure metric is the true exposure while treating the less refined metric as the error-prone exposure, we observe effect attenuation and bias away from the null, depending on the copollutant pair considered. In addition, we note that time-series analyses typically rely on a small number of monitors to derive population-averaged exposure metrics. Recently there has been increasing interest in conducting spatio-temporal modeling of air pollutants via land-use regression models [30] and data fusion methods [31] to capture spatial heterogeneity within the study region. However, measurement errors are also associated with spatial predictions due to spatial smoothing and estimation uncertainty [21, 32].

Conclusions

To the best knowledge of the authors, this study is the first to quantify the impact of measurement error on health risk estimates in copollutant models using empirical data. In addition, this study directly addresses a question often considered when examining the collective body of evidence with respect to copollutant models: the impact of varying degrees of exposure measurement error on resultant risk estimates. Based on results presented here, the impact of measurement error in future studies of the health effects of exposure to air pollution must be considered, so that the true health impact of air pollution exposure is not underestimated, and to avoid false conclusions. In addition, future studies may investigate additional techniques and statistical methods for measurement error correction.

Abbreviations

AQ:

Ambient air quality estimates

CS:

Central-site measurements

ED:

Emergency department

ISA:

Integrated science assessment

PE:

Population exposure model estimates

RMSE:

Root mean-square error

RR:

Relative risk

References

  1. Dominici F, Peng RD, Barr CD, Bell ML. Protecting human health from Air pollution: shifting from a single-pollutant to a multipollutant approach. Epidemiology. 2010;21:187–94.

    Article  Google Scholar 

  2. Mauderly JL, Burnett RT, Castillejos M, Özkaynak H, Samet JM, Stieb DM, Vedal S, Wyzga RE. Is the air pollution health research community prepared to support a multipollutant air quality management framework? Inhal Toxicol. 2010;22:1–19.

    Article  CAS  Google Scholar 

  3. Vedal S, Kaufman JD. What does multi-pollutant air pollution research mean? Am J Respir Crit Care Med. 2011;183:4–6.

    Article  Google Scholar 

  4. Billionnet C, Sherrill D, Annesi-Maesano I. Study ObotG: estimating the health effects of exposure to multi-pollutant mixture. Ann Epidemiol. 2012;22:126–41.

    Article  Google Scholar 

  5. Bateson TF, Coull BA, Hubbell B, Ito K, Jerrett M, Lumley T, Thomas D, Vedal S, Ross M. Panel discussion review: session three - issues involved in interpretation of epidemiologic analyses - statistical modeling. J Expo Sci Environ Epidemiol. 2007;17:S90–6.

    Article  Google Scholar 

  6. Goldman GT, Mulholland JA, Russell AG, Srivastava A, Strickland MJ, Klein M, Waller LA, Tolbert PE, Edgerton ES. Ambient Air pollutant measurement error: characterization and impacts in a time-series epidemiologic study in Atlanta. Environ Sci Technol. 2010;44:7692–8.

    Article  CAS  Google Scholar 

  7. Sarnat SE, Klein M, Sarnat JA, Flanders WD, Waller LA, Mulholland JA, Russell AG, Tolbert PE. An examination of exposure measurement error from air pollutant spatial variability in time-series studies. J Expo Sci Environ Epidemiol. 2010;20:135–46.

    Article  CAS  Google Scholar 

  8. Strickland MJ, Gass KM, Goldman GT, Mulholland JA. Effects of ambient air pollution measurement error on health effect estimates in time-series studies: a simulation-based analysis. J Expo Sci Environ Epidemiol. 2015;25:160–6.

    Article  CAS  Google Scholar 

  9. Shy CM, Kleinbaum DG, Morgenstern H. The effect of misclassification of exposure status in epidemiological studies of air pollution health effects. Bull NY Acad Med. 1978;54:1155–65.

    CAS  Google Scholar 

  10. Oakes M, Baxter L, Long T. Evaluating the application of multipollutant exposure metrics in air pollution health studies. Environ Int. 2014;69:90–9.

    Article  CAS  Google Scholar 

  11. Setton E, Marshall JD, Brauer M, Lundquist KR, Hystad P, Keller P, Cloutier-Fisher D. The impact of daily mobility on exposure to traffic-related air pollution and health effect estimates. J Expo Sci Environ Epidemiol. 2011;21:42–8.

    Article  Google Scholar 

  12. Goldman GT, Mulholland JA, Russell AG, Strickland MJ, Klein M, Waller LA, Tolbert PE. Impact of exposure measurement error in air pollution epidemiology: effect of error type in time-series studies. Environ Health. 2011;10:1–11.

    Article  Google Scholar 

  13. Dominici F, Zeger SL, Samet JM. A measurement error model for time-series studies of air pollution and mortality. Biostatistics. 2000;1:157–75.

    Article  CAS  Google Scholar 

  14. Hart JE, Liao X, Hong B, Puett RC, Yanosky JD, Suh H, Kioumourtzoglou M-A, Spiegelman D, Laden F. The association of long-term exposure to PM2.5 on all-cause mortality in the Nurses’ Health Study and the impact of measurement-error correction. Environ Health. 2015;14:1–9.

    Article  CAS  Google Scholar 

  15. Chang HH, Peng RD, Dominici F. Estimating the acute health effects of coarse particulate matter accounting for exposure measurement error. Biostatistics. 2011;12:637–52.

    Article  Google Scholar 

  16. Zeger SL, Thomas D, Dominici F, Samet JM, Schwartz J, Dockery D, Cohen A. Exposure measurement error in time-series studies of air pollution: concepts and consequences. Environ Health Perspect. 2000;108:419–26.

    Article  CAS  Google Scholar 

  17. Dionisio KL, Baxter LK, Chang HH. An empirical assessment of exposure measurement error and effect attenuation in bipollutant epidemiologic models. Environ Health Perspect. 2014;122:1216–24.

    CAS  Google Scholar 

  18. Dionisio KL, Isakov V, Baxter L, Sarnat JA, Sarnat SE, Burke J, Graham SE, Mulholland J, Özkaynak H. Development and evaluation of alternative approaches for exposure assessment of multiple air pollutants in Atlanta, Georgia. J Expo Sci Environ Epidemiol. 2013;23:581–92.

    Article  CAS  Google Scholar 

  19. Sarnat SE, Sarnat JA, Mulholland J, Isakov V, Özkaynak H, Chang H, Klein M, Tolbert PE. Application of alternative spatiotemporal metrics of ambient air pollution exposure in a time-series epidemiological study in Atlanta. J Expo Sci Environ Epidemiol. 2013;23:593–605.

    Article  Google Scholar 

  20. Zeka A, Schwartz J. Estimating the independent effects of multiple pollutants in the presence of measurement error: an application of a measurement-error-resistant technique. Environ Health Perspect. 2004;112:1686–90.

    Article  CAS  Google Scholar 

  21. Bergen S, Sheppard L, Kaufman JD, Szpiro AA. Multipollutant measurement error in air pollution epidemiology studies arising from predicting exposures with penalized regression splines. J R Stat Soc: Ser C: Appl Stat. 2016;65(5):731–53.

  22. Schwartz J, Coull BA. Control for confounding in the presence of measurement error in hierarchical models. Biostatistics. 2003;4:539–53.

    Article  Google Scholar 

  23. U.S. EPA. Integrated science assessment for carbon monoxide. vol. EPA/600/R-09/019F. Research Triangle Park: National Center for Environmental Assessment-RTP Division, Office of Research and Development, U.S. Environmental Protection Agency; 2010.

    Google Scholar 

  24. EPA US. Integrated science assessment for oxides of nitrogen-health criteria. vol. EPA/600/R-08/071. Research Triangle Park: National Center for Environmental Assessment-RTP Division, Office of Research and Development, U.S. Environmental Protection Agency; 2008.

    Google Scholar 

  25. Zidek JV, Wong H, Le ND, Burnett R. Causality, measurement error and multicollinearity in epidemiology. Environmetrics. 1996;7:441–51.

    Article  Google Scholar 

  26. Thomas D, Stram D, Dwyer J. Exposure measurement error: influence on exposure-disease relationships and methods of correction. Annu Rev Public Health. 1993;14:69–93.

    Article  CAS  Google Scholar 

  27. Hart JE, Spiegelman D, Beelen R, Hoek G, Brunekreef B, Schouten LJ, Brandt Pvd. Long-Term Ambient Residential Traffic-Related Exposures and Measurement Error-Adjusted Risk of Incident Lung Cancer in the Netherlands Cohort Study on Diet and Cancer. Environ Health Perspect. 2015;123(9):860–66.

  28. Van Roosbroeck S, Li R, Hoek G, Lebret E, Brunekreef B, Spiegelman D. Traffic-related outdoor air pollution and respiratory symptoms in children: the impact of adjustment for exposure measurement error. Epidemiology. 2008;19:409–16.

    Article  Google Scholar 

  29. Li R, Weller E, Dockery DW, Neas LM, Spiegelman D. Association of indoor nitrogen dioxide with respiratory symptoms in children: application of measurement error correction techniques to utilize data from multiple surrogates. J Expo Sci Environ Epidemiol. 2006;16:342–50.

    Article  CAS  Google Scholar 

  30. Hu X, Waller LA, Lyapustin A, Wang Y, Al-Hamdan MZ, Crosson WL, Estes Jr MG, Estes SM, Quattrochi DA, Puttaswamy SJ, Liu Y. Estimating ground-level PM2.5 concentrations in the Southeastern United States using MAIAC AOD retrievals and a two-stage model. Remote Sens Environ. 2014;140:220–32.

    Article  Google Scholar 

  31. Friberg M, Zhai X, Holmes H, Chang HH, Strickland MJ, Sarnat SE, Tolbert PE, Russell AG, Mulholland JA. Method for fusing observational data and chemical transport model simulations to estimate spatiotemporally resolved ambient air pollution. Environ Sci Technol. 2016;50:3695–705.

    Article  CAS  Google Scholar 

  32. Szpiro AA, Paciorek CJ. Measurement error in two-stage analyses, with application to air pollution epidemiology. Environmetrics. 2013;24:501–17.

    Article  Google Scholar 

Download references

Acknowledgments

The authors would like to acknowledge Haluk Ozkaynak (U.S. EPA, retired), Vlad Isakov (U.S. EPA), Stefanie Sarnat (Emory University), Jeremy Sarnat (Emory University), and Jim Mulholland (Georgia Tech) for their contributions to the project. Although this work was reviewed by EPA and approved for publication, it may not necessarily reflect official Agency policy. EPA does not endorse the purchase of any commercial products or services mentioned in this publication.

Funding

This publication was made possible in part by US EPA grant R834799 and by grant R21ES022795-02 from the National Institutes of Health. Funding bodies did not play a role in the study design, data analysis, interpretation of results, or manuscript writing.

Availability of data and materials

After passing through internal clearance, the dataset supporting the conclusions of this article will be available on the publicly accessible Environmental Dataset Gateway (EDG) website of the U.S. EPA (https://edg.epa.gov/metadata/).

Authors’ contributions

All authors contributed to study design, provided critical input in manuscript development, and reviewed and approved the final manuscript. HHC provided guidance on statistical analyses and interpretation. KLD led data analysis and completed the first draft of the manuscript. LKB contributed to the introduction and discussion.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Kathie L. Dionisio.

Additional file

Additional file 1:

Supplemental materials including supplemental text, figures, and tables, are available as an additional file. (PDF 392 kb)

Rights and permissions

COPYRIGHT NOTICE. The article is a work of the United States Government; Title 17 U.S.C 105 provides that copyright protection is not available for any work of the United States government in the United States. Additionally, this is an open access article distributed under the terms of the Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0), which permits worldwide unrestricted use, distribution, and reproduction in any medium for any lawful purpose.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Dionisio, K.L., Chang, H.H. & Baxter, L.K. A simulation study to quantify the impacts of exposure measurement error on air pollution health risk estimates in copollutant time-series models. Environ Health 15, 114 (2016). https://0-doi-org.brum.beds.ac.uk/10.1186/s12940-016-0186-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://0-doi-org.brum.beds.ac.uk/10.1186/s12940-016-0186-0

Keywords