Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems

Jiaheng Lu, Yuxing Chen, Herodotos Herodotou, Shivnath Babu

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Database and big data analytics systems such as Hadoop and Spark have a large number of configuration parameters that control memory distribution, I/O optimization, parallelism, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune them to achieve good performance.
In this tutorial, we review existing approaches on automatic parameter tuning for databases, Hadoop, and Spark, which we classify into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We describe the foundations of different automatic parameter tuning algorithms and present pros and cons of each approach. We also highlight real-world applications and systems, and identify research challenges for handling cloud services, resource heterogeneity, and real-time analytics.
Original language: English
Journal: Proceedings of the VLDB Endowment
Volume: 12
Issue number: 12
Pages (from-to): 1970-1973
Number of pages: 4
ISSN: 2150-8097
DOIs: 10.14778/3352063.3352112
Publication status: Published - 26 Aug 2019
MoE publication type: A1 Journal article-refereed
Event: International Conference on Very Large Data Bases - Los Angeles, United States
Duration: 26 Aug 2019 - 30 Aug 2019
Conference number: 45
https://vldb.org/2019/

Fields of Science

  • 113 Computer and information sciences

Cite this

@article{75886ff1a3694e079f8a320fdc6e79fe,
title = "Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems",
abstract = "Database and big data analytics systems such as Hadoop and Spark have a large number of configuration parameters that control memory distribution, I/O optimization, parallelism, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune them to achieve good performance. In this tutorial, we review existing approaches on automatic parameter tuning for databases, Hadoop, and Spark, which we classify into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We describe the foundations of different automatic parameter tuning algorithms and present pros and cons of each approach. We also highlight real-world applications and systems, and identify research challenges for handling cloud services, resource heterogeneity, and real-time analytics.",
keywords = "113 Computer and information sciences",
author = "Jiaheng Lu and Yuxing Chen and Herodotos Herodotou and Shivnath Babu",
year = "2019",
month = "8",
day = "26",
doi = "10.14778/3352063.3352112",
language = "English",
volume = "12",
pages = "1970--1973",
journal = "Proceedings of the VLDB Endowment",
issn = "2150-8097",
publisher = "Association for Computing Machinery",
number = "12",

}

Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems. / Lu, Jiaheng; Chen, Yuxing; Herodotou, Herodotos; Babu, Shivnath.

In: Proceedings of the VLDB Endowment, Vol. 12, No. 12, 26.08.2019, p. 1970-1973.

TY  - JOUR
T1  - Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems
AU  - Lu, Jiaheng
AU  - Chen, Yuxing
AU  - Herodotou, Herodotos
AU  - Babu, Shivnath
PY  - 2019/8/26
Y1  - 2019/8/26
N2  - Database and big data analytics systems such as Hadoop and Spark have a large number of configuration parameters that control memory distribution, I/O optimization, parallelism, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune them to achieve good performance. In this tutorial, we review existing approaches on automatic parameter tuning for databases, Hadoop, and Spark, which we classify into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We describe the foundations of different automatic parameter tuning algorithms and present pros and cons of each approach. We also highlight real-world applications and systems, and identify research challenges for handling cloud services, resource heterogeneity, and real-time analytics.
AB  - Database and big data analytics systems such as Hadoop and Spark have a large number of configuration parameters that control memory distribution, I/O optimization, parallelism, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune them to achieve good performance. In this tutorial, we review existing approaches on automatic parameter tuning for databases, Hadoop, and Spark, which we classify into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We describe the foundations of different automatic parameter tuning algorithms and present pros and cons of each approach. We also highlight real-world applications and systems, and identify research challenges for handling cloud services, resource heterogeneity, and real-time analytics.
KW  - 113 Computer and information sciences
U2  - 10.14778/3352063.3352112
DO  - 10.14778/3352063.3352112
M3  - Article
VL  - 12
SP  - 1970
EP  - 1973
JO  - Proceedings of the VLDB Endowment
JF  - Proceedings of the VLDB Endowment
SN  - 2150-8097
IS  - 12
ER  - 