Missing data has been a challenge in air quality measurement. In this study, we develop an input-adaptive proxy, which selects input variables of other air quality variables based on their correlation coefficients with the output variable. The proxy uses ordinary least squares regression model with robust optimization and limits the input variables to a maximum of three to avoid overfitting. The adaptive proxy learns from the data set and generates the best model evaluated by adjusted coefficient of determination (adjR2). In case of missing data in the input variables, the proposed adaptive proxy then uses the second-best model until all the missing data gaps are filled up. We estimated black carbon (BC) concentration by using the input-adaptive proxy in two sites in Helsinki, which respectively represent street canyon and urban background scenario, as a case study. Accumulation mode, traffic counts, nitrogen dioxide and lung deposited surface area are found as input variables in models with the top rank. In contrast to traditional proxy, which gives 20–80% of data, the input-adaptive proxy manages to give full continuous BC estimation. The newly developed adaptive proxy also gives generally accurate BC (street canyon: adjR2 = 0.86–0.94; urban background: adjR2 = 0.74–0.91) depending on different seasons and day of the week. Due to its flexibility and reliability, the adaptive proxy can be further extend to estimate other air quality parameters. It can also act as an air quality virtual sensor in support with on-site measurements in the future.
|DOI - pysyväislinkit|
|Tila||Julkaistu - 2020|
|OKM-julkaisutyyppi||A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä, vertaisarvioitu|
- 112 Tilastotiede
- 114 Fysiikka