Significance of Patterns in Data Visualisations

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

In this paper we consider the following important problem: when we explore data visually and observe patterns, how can we determine their statistical significance? Patterns observed in exploratory analysis are traditionally met with scepticism, since the hypotheses are formulated while viewing the data, rather than before doing so. In contrast to this belief, we show that it is, in fact, possible to evaluate the significance of patterns also during exploratory analysis, and that the knowledge of the analyst can be leveraged to improve statistical power by reducing the amount of simultaneous comparisons. We develop a principled framework for determining the statistical significance of visually observed patterns. Furthermore, we show how the significance of visual patterns observed during iterative data exploration can be determined. We perform an empirical investigation on real and synthetic tabular data and time series, using different test statistics and methods for generating surrogate data. We conclude that the proposed framework allows determining the significance of visual patterns during exploratory analysis.
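The abstract describes two ideas. First, a pattern seen in a plot can be tested against surrogate data. Second, limiting the family of simultaneous comparisons to what the analyst actually looked at can increase statistical power. The Python below is a minimal, generic sketch of these ideas and is not the authors' framework. It assumes three things: independent column permutation as the null model, the absolute Pearson correlation as the test statistic, and a single-step max-statistic adjustment for multiple comparisons. The function names `correlation`, `surrogate` and `maxT_pvalues` and the synthetic data are hypothetical.

```python
import random

# Hypothetical, generic sketch: surrogate-data (permutation) testing of
# visually spotted "linear trend" patterns between pairs of columns.


def correlation(x, y):
    """Pearson correlation, used here as an assumed test statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def surrogate(columns, rng):
    """Assumed null model: shuffle every column independently, which keeps
    each column's values but destroys dependencies between columns."""
    return [rng.sample(col, len(col)) for col in columns]


def maxT_pvalues(columns, pairs, n_surrogates=199, seed=0):
    """Single-step max-statistic adjusted p-values for |correlation| over
    the family `pairs` (family-wise error control under the assumed null)."""
    rng = random.Random(seed)
    observed = [abs(correlation(columns[i], columns[j])) for i, j in pairs]
    exceed = [0] * len(pairs)
    for _ in range(n_surrogates):
        s = surrogate(columns, rng)
        # Largest statistic over the whole family in this surrogate data set.
        m = max(abs(correlation(s[i], s[j])) for i, j in pairs)
        for k, t in enumerate(observed):
            if m >= t:
                exceed[k] += 1
    # Empirical p-value (r + 1) / (N + 1), counting the real data as one draw.
    return [(e + 1) / (n_surrogates + 1) for e in exceed]


# Synthetic example: column 1 depends on column 0; columns 2-5 are noise.
data_rng = random.Random(1)
n = 40
x = [data_rng.gauss(0, 1) for _ in range(n)]
columns = [x, [v + data_rng.gauss(0, 0.5) for v in x]]
columns += [[data_rng.gauss(0, 1) for _ in range(n)] for _ in range(4)]

# Family 1: every pair the analyst could have looked at (15 comparisons).
all_pairs = [(i, j) for i in range(6) for j in range(i + 1, 6)]
p_all = maxT_pvalues(columns, all_pairs)
# Family 2: only the pair the analyst actually singled out (1 comparison).
p_one = maxT_pvalues(columns, [(0, 1)])

print("p for (0, 1) among all pairs:", p_all[0])
print("p for (0, 1) tested alone:   ", p_one[0])
```

Both calls use the same seed, so they see identical surrogate data. As a result, the adjusted p-value for the single pair the analyst singled out can never exceed its value within the full family of 15 pairs. With a weaker dependency than in this example, that gap can decide whether the pattern passes a significance threshold.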

Original language: English
Title of host publication: KDD'19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
Number of pages: 9
Place of publication: New York, NY
Publisher: ACM
Publication date: 2019
Pages: 1509-1517
ISBN (Print): 978-1-4503-6201-6
DOIs: https://doi.org/10.1145/3292500.3330994
State: Published - 2019
Ministry of Education (OKM) publication type: A4 Article in conference proceedings
Event: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Anchorage, United States
Duration: 4 Aug 2019 to 8 Aug 2019
Conference number: 25

Fields of Science

  • 113 Computer and information sciences

Cite this

Savvides, R., Henelius, A., Oikarinen, E., & Puolamäki, K. (2019). Significance of Patterns in Data Visualisations. In KDD'19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1509-1517). New York, NY: ACM. https://doi.org/10.1145/3292500.3330994
Savvides, Rafael ; Henelius, Andreas ; Oikarinen, Emilia ; Puolamäki, Kai. / Significance of Patterns in Data Visualisations. KDD'19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York, NY : ACM, 2019. pp. 1509-1517
@inproceedings{a91a3e5d00e04571829603dbcd2dceaf,
title = "Significance of Patterns in Data Visualisations",
abstract = "In this paper we consider the following important problem: when we explore data visually and observe patterns, how can we determine their statistical significance? Patterns observed in exploratory analysis are traditionally met with scepticism, since the hypotheses are formulated while viewing the data, rather than before doing so. In contrast to this belief, we show that it is, in fact, possible to evaluate the significance of patterns also during exploratory analysis, and that the knowledge of the analyst can be leveraged to improve statistical power by reducing the amount of simultaneous comparisons. We develop a principled framework for determining the statistical significance of visually observed patterns. Furthermore, we show how the significance of visual patterns observed during iterative data exploration can be determined. We perform an empirical investigation on real and synthetic tabular data and time series, using different test statistics and methods for generating surrogate data. We conclude that the proposed framework allows determining the significance of visual patterns during exploratory analysis.",
keywords = "exploratory data analysis, significance testing, visual analytics, 113 Computer and information sciences",
author = "Rafael Savvides and Andreas Henelius and Emilia Oikarinen and Kai Puolam{\"a}ki",
year = "2019",
doi = "10.1145/3292500.3330994",
language = "English",
isbn = "978-1-4503-6201-6",
pages = "1509--1517",
booktitle = "KDD'19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining",
publisher = "ACM",
address = "United States",

}

Savvides, R, Henelius, A, Oikarinen, E & Puolamäki, K 2019, Significance of Patterns in Data Visualisations. in KDD'19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, New York, NY, pp. 1509-1517, ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, United States, 04/08/2019. https://doi.org/10.1145/3292500.3330994

Significance of Patterns in Data Visualisations. / Savvides, Rafael; Henelius, Andreas; Oikarinen, Emilia; Puolamäki, Kai.

KDD'19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York, NY : ACM, 2019. pp. 1509-1517.

TY - GEN

T1 - Significance of Patterns in Data Visualisations

AU - Savvides, Rafael

AU - Henelius, Andreas

AU - Oikarinen, Emilia

AU - Puolamäki, Kai

PY - 2019

Y1 - 2019

N2 - In this paper we consider the following important problem: when we explore data visually and observe patterns, how can we determine their statistical significance? Patterns observed in exploratory analysis are traditionally met with scepticism, since the hypotheses are formulated while viewing the data, rather than before doing so. In contrast to this belief, we show that it is, in fact, possible to evaluate the significance of patterns also during exploratory analysis, and that the knowledge of the analyst can be leveraged to improve statistical power by reducing the amount of simultaneous comparisons. We develop a principled framework for determining the statistical significance of visually observed patterns. Furthermore, we show how the significance of visual patterns observed during iterative data exploration can be determined. We perform an empirical investigation on real and synthetic tabular data and time series, using different test statistics and methods for generating surrogate data. We conclude that the proposed framework allows determining the significance of visual patterns during exploratory analysis.

AB - In this paper we consider the following important problem: when we explore data visually and observe patterns, how can we determine their statistical significance? Patterns observed in exploratory analysis are traditionally met with scepticism, since the hypotheses are formulated while viewing the data, rather than before doing so. In contrast to this belief, we show that it is, in fact, possible to evaluate the significance of patterns also during exploratory analysis, and that the knowledge of the analyst can be leveraged to improve statistical power by reducing the amount of simultaneous comparisons. We develop a principled framework for determining the statistical significance of visually observed patterns. Furthermore, we show how the significance of visual patterns observed during iterative data exploration can be determined. We perform an empirical investigation on real and synthetic tabular data and time series, using different test statistics and methods for generating surrogate data. We conclude that the proposed framework allows determining the significance of visual patterns during exploratory analysis.

KW - exploratory data analysis

KW - significance testing

KW - visual analytics

KW - 113 Computer and information sciences

U2 - 10.1145/3292500.3330994

DO - 10.1145/3292500.3330994

M3 - Conference contribution

SN - 978-1-4503-6201-6

SP - 1509

EP - 1517

BT - KDD'19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

PB - ACM

CY - New York, NY

ER -

Savvides R, Henelius A, Oikarinen E, Puolamäki K. Significance of Patterns in Data Visualisations. In KDD'19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York, NY: ACM. 2019. p. 1509-1517 https://doi.org/10.1145/3292500.3330994