Gaussian Clusters and Noise: An Approach Based on the Minimum Description Length Principle

Research output: Chapter in Book/Report/Conference proceedingConference contributionScientificpeer-review

Abstract

We introduce a well-grounded minimum description length (MDL) based quality measure for a clustering consisting of either spherical or axis-aligned normally distributed clusters and a cluster with a uniform distribution in an axis-aligned rectangular box. The uniform component extends the practical usability of the model e.g. in the presence of noise, and using the MDL principle for the model selection makes comparing the quality of clusterings with a different number of clusters possible. We also introduce a novel search heuristic for finding the best clustering with an unknown number of clusters. The heuristic is based on the idea of moving points from the Gaussian clusters to the uniform one and using MDL for determining the optimal amount of noise. Tests with synthetic data having a clear cluster structure imply that the search method is effective in finding the intuitively correct clustering.
Original languageEnglish
Title of host publicationDiscovery Science : 13th International Conference, DS 2010, Canberra, Australia, October 6-8, 2010. Proceedings
EditorsBernhard Pfahringer, Geoff Holmes, Achim Hoffmann
Number of pages15
Place of PublicationBerlin Heidelberg
PublisherSpringer
Publication date2010
Pages251-265
ISBN (Print)978-3-642-16183-4
ISBN (Electronic)3-642-16183-9
DOIs
Publication statusPublished - 2010
MoE publication typeA4 Article in conference proceedings
EventDS 2010 - Canberra, Australia
Duration: 6 Oct 20108 Oct 2010
Conference number: 13

Publication series

NameLecture Notes in Computer Science
PublisherSpringer
Volume6332

Fields of Science

  • 113 Computer and information sciences

Cite this

Luosto, P., Kivinen, J., & Mannila, H. (2010). Gaussian Clusters and Noise: An Approach Based on the Minimum Description Length Principle. In B. Pfahringer, G. Holmes, & A. Hoffmann (Eds.), Discovery Science: 13th International Conference, DS 2010, Canberra, Australia, October 6-8, 2010. Proceedings (pp. 251-265). (Lecture Notes in Computer Science; Vol. 6332). Berlin Heidelberg: Springer. https://doi.org/10.1007/978-3-642-16184-1_18
Luosto, Panu ; Kivinen, Jyrki ; Mannila, Heikki. / Gaussian Clusters and Noise: An Approach Based on the Minimum Description Length Principle. Discovery Science: 13th International Conference, DS 2010, Canberra, Australia, October 6-8, 2010. Proceedings. editor / Bernhard Pfahringer ; Geoff Holmes ; Achim Hoffmann. Berlin Heidelberg : Springer, 2010. pp. 251-265 (Lecture Notes in Computer Science).
@inproceedings{12c22c4e6fc84853ba2a2f2804a048f1,
title = "Gaussian Clusters and Noise: An Approach Based on the Minimum Description Length Principle",
abstract = "We introduce a well-grounded minimum description length (MDL) based quality measure for a clustering consisting of either spherical or axis-aligned normally distributed clusters and a cluster with a uniform distribution in an axis-aligned rectangular box. The uniform component extends the practical usability of the model e.g. in the presence of noise, and using the MDL principle for the model selection makes comparing the quality of clusterings with a different number of clusters possible. We also introduce a novel search heuristic for finding the best clustering with an unknown number of clusters. The heuristic is based on the idea of moving points from the Gaussian clusters to the uniform one and using MDL for determining the optimal amount of noise. Tests with synthetic data having a clear cluster structure imply that the search method is effective in finding the intuitively correct clustering.",
keywords = "113 Computer and information sciences",
author = "Panu Luosto and Jyrki Kivinen and Heikki Mannila",
note = "Volume: Proceeding volume:",
year = "2010",
doi = "10.1007/978-3-642-16184-1_18",
language = "English",
isbn = "978-3-642-16183-4",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
pages = "251--265",
editor = "Bernhard Pfahringer and Geoff Holmes and Achim Hoffmann",
booktitle = "Discovery Science",
address = "United States",

}

Luosto, P, Kivinen, J & Mannila, H 2010, Gaussian Clusters and Noise: An Approach Based on the Minimum Description Length Principle. in B Pfahringer, G Holmes & A Hoffmann (eds), Discovery Science: 13th International Conference, DS 2010, Canberra, Australia, October 6-8, 2010. Proceedings. Lecture Notes in Computer Science, vol. 6332, Springer, Berlin Heidelberg, pp. 251-265, DS 2010, Canberra, Australia, 06/10/2010. https://doi.org/10.1007/978-3-642-16184-1_18

Gaussian Clusters and Noise: An Approach Based on the Minimum Description Length Principle. / Luosto, Panu; Kivinen, Jyrki; Mannila, Heikki.

Discovery Science: 13th International Conference, DS 2010, Canberra, Australia, October 6-8, 2010. Proceedings. ed. / Bernhard Pfahringer; Geoff Holmes; Achim Hoffmann. Berlin Heidelberg : Springer, 2010. p. 251-265 (Lecture Notes in Computer Science; Vol. 6332).

Research output: Chapter in Book/Report/Conference proceedingConference contributionScientificpeer-review

TY - GEN

T1 - Gaussian Clusters and Noise: An Approach Based on the Minimum Description Length Principle

AU - Luosto, Panu

AU - Kivinen, Jyrki

AU - Mannila, Heikki

N1 - Volume: Proceeding volume:

PY - 2010

Y1 - 2010

N2 - We introduce a well-grounded minimum description length (MDL) based quality measure for a clustering consisting of either spherical or axis-aligned normally distributed clusters and a cluster with a uniform distribution in an axis-aligned rectangular box. The uniform component extends the practical usability of the model e.g. in the presence of noise, and using the MDL principle for the model selection makes comparing the quality of clusterings with a different number of clusters possible. We also introduce a novel search heuristic for finding the best clustering with an unknown number of clusters. The heuristic is based on the idea of moving points from the Gaussian clusters to the uniform one and using MDL for determining the optimal amount of noise. Tests with synthetic data having a clear cluster structure imply that the search method is effective in finding the intuitively correct clustering.

AB - We introduce a well-grounded minimum description length (MDL) based quality measure for a clustering consisting of either spherical or axis-aligned normally distributed clusters and a cluster with a uniform distribution in an axis-aligned rectangular box. The uniform component extends the practical usability of the model e.g. in the presence of noise, and using the MDL principle for the model selection makes comparing the quality of clusterings with a different number of clusters possible. We also introduce a novel search heuristic for finding the best clustering with an unknown number of clusters. The heuristic is based on the idea of moving points from the Gaussian clusters to the uniform one and using MDL for determining the optimal amount of noise. Tests with synthetic data having a clear cluster structure imply that the search method is effective in finding the intuitively correct clustering.

KW - 113 Computer and information sciences

U2 - 10.1007/978-3-642-16184-1_18

DO - 10.1007/978-3-642-16184-1_18

M3 - Conference contribution

SN - 978-3-642-16183-4

T3 - Lecture Notes in Computer Science

SP - 251

EP - 265

BT - Discovery Science

A2 - Pfahringer, Bernhard

A2 - Holmes, Geoff

A2 - Hoffmann, Achim

PB - Springer

CY - Berlin Heidelberg

ER -

Luosto P, Kivinen J, Mannila H. Gaussian Clusters and Noise: An Approach Based on the Minimum Description Length Principle. In Pfahringer B, Holmes G, Hoffmann A, editors, Discovery Science: 13th International Conference, DS 2010, Canberra, Australia, October 6-8, 2010. Proceedings. Berlin Heidelberg: Springer. 2010. p. 251-265. (Lecture Notes in Computer Science). https://doi.org/10.1007/978-3-642-16184-1_18