Constructing a data-driven receptor model for organic and inorganic aerosol: a synthesis analysis of eight mass spectrometric data sets from a boreal forest site

Research output: Contribution to journal > Article > Scientific > peer-review

Abstract

The interactions between organic and inorganic aerosol chemical components are integral to understanding and modelling climate- and health-relevant aerosol physicochemical properties, such as volatility, hygroscopicity, light scattering and toxicity. This study presents a synthesis analysis of eight data sets of non-refractory aerosol composition measured at a boreal forest site. The measurements, performed with an aerosol mass spectrometer, cover around 9 months in total over the course of 3 years. In our statistical analysis, we use the complete organic and inorganic unit-resolution mass spectra, as opposed to the more common approach of including only the organic fraction. The analysis is based on the iterative, combined use of (1) data reduction, (2) classification and (3) scaling tools, producing a data-driven chemical mass balance type of model capable of describing site-specific aerosol composition. The receptor model we constructed was able to explain 83 ± 8 % of the variation in the data, which increased to 96 ± 3 % when signals from low signal-to-noise variables were not considered. The resulting interpretation of an extensive set of aerosol mass spectrometric data infers seven distinct aerosol chemical components for a rural boreal forest site: ammonium sulfate (35 ± 7 % of mass), low-volatility and semi-volatile oxidised organic aerosols (27 ± 8 % and 12 ± 7 %), biomass burning organic aerosol (11 ± 7 %), a nitrate-containing organic aerosol type (7 ± 2 %), ammonium nitrate (5 ± 2 %), and hydrocarbon-like organic aerosol (3 ± 1 %). Some of the additionally observed, rare outlier aerosol types likely emerge due to surface ionisation effects and probably represent amine compounds from an unknown source and alkaline metals from emissions of a nearby district heating plant.

Compared to traditional, ion-balance-based inorganics apportionment schemes for aerosol mass spectrometer data, our statistics-based method provides an improved, more robust approach, yielding readily useful information for the modelling of the physical and chemical properties of submicron atmospheric aerosol. The results also shed light on the division between organic and inorganic aerosol types and on the dynamics of salt formation in aerosol. Equally importantly, the combined methodology exemplifies an iterative analysis, using consecutive analysis steps with a combination of statistical methods. Such an approach offers new ways to home in on physicochemically sensible solutions with minimal need for a priori information or analyst interference. We therefore suggest that similar statistics-based approaches offer significant potential for un- or semi-supervised machine-learning applications in future analyses of aerosol mass spectrometric data.
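
As a rough illustration of what a chemical mass balance type of model and its "explained variation" can look like, the sketch below fits non-negative contributions of fixed component profiles to a set of spectra and reports the fraction of variation reproduced. Everything in it is an assumption made for illustration: the two toy profiles, the toy contributions, the multiplicative-update solver and the particular definition of explained variation (1 minus residual over total sum of squares) are hypothetical and are not taken from the study, whose own data reduction, classification and scaling steps are not reproduced here.

```python
# Hypothetical toy example: the profiles and contributions are invented,
# not data or code from the study described above.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def fit_contributions(spectra, profiles, n_iter=500, eps=1e-12):
    """Estimate non-negative contributions G so that spectra ~= G * profiles.

    The profiles are held fixed and G is refined with multiplicative
    updates (as used in non-negative matrix factorisation), so every
    entry of G stays non-negative. This solver is an assumed stand-in.
    """
    n, k = len(spectra), len(profiles)
    g = [[1.0] * k for _ in range(n)]      # strictly positive starting guess
    pt = transpose(profiles)
    numer = matmul(spectra, pt)            # X F^T, constant over iterations
    ppt = matmul(profiles, pt)             # F F^T
    for _ in range(n_iter):
        denom = matmul(g, ppt)             # G F F^T
        g = [[g[i][j] * numer[i][j] / (denom[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return g


def explained_variation(spectra, model):
    """Return 1 - SS_residual / SS_total, with SS_total about each variable's mean."""
    means = [sum(col) / len(col) for col in transpose(spectra)]
    ss_res = sum((x - m) ** 2
                 for row_x, row_m in zip(spectra, model)
                 for x, m in zip(row_x, row_m))
    ss_tot = sum((x - mu) ** 2 for row_x in spectra for x, mu in zip(row_x, means))
    return 1.0 - ss_res / ss_tot


# Two invented component profiles over four m/z channels (each sums to 1).
profiles = [[0.6, 0.3, 0.1, 0.0],
            [0.0, 0.1, 0.3, 0.6]]
# Invented "true" contributions for three time steps, used to build the spectra.
true_contributions = [[2.0, 1.0], [1.0, 3.0], [0.5, 0.5]]
spectra = matmul(true_contributions, profiles)

contributions = fit_contributions(spectra, profiles)
explained = explained_variation(spectra, matmul(contributions, profiles))
print("fitted contributions:", contributions)
print("explained variation:", explained)
```

Because the toy spectra are built exactly from the two profiles, the fit should recover the invented contributions almost perfectly; with real, noisy measurements the explained fraction would be lower, which is the sense in which low signal-to-noise variables can pull the figure down.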
Original language: English
Journal: Atmospheric Chemistry and Physics
Volume: 19
Issue number: 6
Pages (from-to): 3645-3672
Number of pages: 28
ISSN: 1680-7316
DOI: 10.5194/acp-19-3645-2019
Publication status: Published - 21 Mar 2019
MoE publication type: A1 Journal article-refereed

Fields of Science

  • POSITIVE MATRIX FACTORIZATION
  • QUALITY INTERACTIONS EUCAARI
  • EUROPEAN INTEGRATED PROJECT
  • SOURCE APPORTIONMENT
  • CHEMICAL-COMPOSITION
  • MULTILINEAR ENGINE
  • SOUTHERN FINLAND
  • NEURAL-NETWORKS
  • CLOUD CLIMATE
  • SECONDARY
  • 1172 Environmental sciences
  • 116 Chemical sciences
  • 114 Physical sciences

Cite this

@article{9e7290dd03424815a54617a3e0303f1c,
title = "Constructing a data-driven receptor model for organic and inorganic aerosol: a synthesis analysis of eight mass spectrometric data sets from a boreal forest site",
keywords = "POSITIVE MATRIX FACTORIZATION, QUALITY INTERACTIONS EUCAARI, EUROPEAN INTEGRATED PROJECT, SOURCE APPORTIONMENT, CHEMICAL-COMPOSITION, MULTILINEAR ENGINE, SOUTHERN FINLAND, NEURAL-NETWORKS, CLOUD CLIMATE, SECONDARY, 1172 Environmental sciences, 116 Chemical sciences, 114 Physical sciences",
author = "Mikko {\"A}ij{\"a}l{\"a} and Daellenbach, {Kaspar R.} and Francesco Canonaco and Liine Heikkinen and Heikki Junninen and Tuukka Pet{\"a}j{\"a} and Markku Kulmala and Prevot, {Andre S. H.} and Mikael Ehn",
year = "2019",
month = "3",
day = "21",
doi = "10.5194/acp-19-3645-2019",
language = "English",
volume = "19",
pages = "3645--3672",
journal = "Atmospheric Chemistry and Physics",
issn = "1680-7316",
publisher = "COPERNICUS GESELLSCHAFT MBH",
number = "6",

}
