Speedup Your Analytics: Automatic Parameter Tuning for Databases and Big Data Systems

Jiaheng Lu, Yuxing Chen, Herodotos Herodotou, Shivnath Babu

Research output: Contribution to journal › Article › Scientific › peer-review


Database and big data analytics systems such as Hadoop and Spark have a large number of configuration parameters that control memory distribution, I/O optimization, parallelism, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune them to achieve good performance.
In this tutorial, we review existing approaches to automatic parameter tuning for databases, Hadoop, and Spark, which we classify into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We describe the foundations of the different automatic parameter tuning algorithms and present the pros and cons of each approach. We also highlight real-world applications and systems, and identify research challenges for handling cloud services, resource heterogeneity, and real-time analytics.
Original language: English
Journal: Proceedings of the VLDB Endowment
Issue number: 12
Pages (from-to): 1970-1973
Number of pages: 4
Publication status: Published - 26 Aug 2019
MoE publication type: A1 Journal article-refereed
Event: International Conference on Very Large Data Bases - Los Angeles, United States
Duration: 26 Aug 2019 to 30 Aug 2019
Conference number: 45

Fields of Science

  • 113 Computer and information sciences
