A Survey on Automatic Parameter Tuning for Big Data Processing Systems

Herodotos Herodotou, Yuxing Chen, Jiaheng Lu

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of configuration parameters controlling parallelism, I/O behavior, memory settings, and compression. Improper parameter settings can cause significant performance degradation and stability issues. However, regular users and even expert administrators struggle to understand and tune these parameters to achieve good performance. We investigate existing approaches to parameter tuning for both batch and stream data processing systems and classify them into six categories: rule-based, cost modeling, simulation-based, experiment-driven, machine learning, and adaptive tuning. We summarize the pros and cons of each approach and raise some open research problems for automatic parameter tuning.
Original language: English
Article number: 43
Journal: ACM Computing Surveys
Volume: 53
Issue number: 2
Pages (from-to): 1-37
Number of pages: 37
ISSN: 0360-0300
Publication status: Published - Apr 2020
MoE publication type: A1 Journal article-refereed

Fields of Science

  • 113 Computer and information sciences
  • Parameter tuning
  • self-tuning
  • MapReduce
  • Spark
  • Storm
  • stream
  • MAPREDUCE
  • PERFORMANCE
  • OPTIMIZATION
  • MANAGEMENT
  • SIMULATION
  • TOOLKIT
