Abstract
Database and big data analytics systems such as Hadoop and Spark have a large number of configuration parameters that control memory distribution, I/O optimization, parallelism, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune them to achieve good performance.
In this tutorial, we review existing approaches to automatic parameter tuning for databases, Hadoop, and Spark, which we classify into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We describe the foundations of the different automatic parameter tuning algorithms and present the pros and cons of each approach. We also highlight real-world applications and systems, and identify research challenges in handling cloud services, resource heterogeneity, and real-time analytics.
Original language | English |
---|---|
Journal | Proceedings of the VLDB Endowment |
Volume | 12 |
Issue number | 12 |
Pages (from-to) | 1970-1973 |
Number of pages | 4 |
ISSN | 2150-8097 |
DOIs | |
Publication status | Published - 26 Aug 2019 |
MoE publication type | A1 Journal article-refereed |
Event | 45th International Conference on Very Large Data Bases, Los Angeles, United States; 26 Aug 2019 → 30 Aug 2019; https://vldb.org/2019/ |
Fields of Science
- 113 Computer and information sciences