Cost-effective Resource Provisioning for Spark Workloads

Yuxing Chen, Jiaheng Lu, Chen Chen, Mohammad Ashraful Hoque, Sasu Tarkoma

Research output: Article in book/report/conference proceedings; Conference article; Scientific; Peer-reviewed

Abstract

Spark is one of the most prevalent big data analytics platforms. Configuring proper resource provisioning for Spark jobs is challenging but essential for organizations to save time, achieve high resource utilization, and remain cost-effective. In this paper, we study the challenge of determining the proper parameter values that meet the performance requirements of workloads while minimizing both resource cost and resource utilization time. We propose a simulation-based cost model to predict the performance of jobs accurately. We achieve low-cost training by taking advantage of a simulation framework, i.e., Monte Carlo (MC) simulation, which uses a small amount of data and resources to make a reliable prediction for larger datasets and clusters. The salient feature of our method is that it incurs a low training cost while still producing accurate predictions. Through experiments with six benchmark workloads, we demonstrate that the cost model yields an average prediction error of less than 7% and that the recommendation achieves up to 5x savings in resource cost.
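
The general idea of using Monte Carlo simulation to extrapolate from a small profiling run can be sketched as follows. This is only an illustrative sketch, not the cost model from the paper: the greedy slot scheduler, the function names (`simulate_stage`, `predict_runtime`, `cheapest_config`), and the pricing parameter `price_per_slot_second` are all assumptions made for the example.

```python
import heapq
import random


def simulate_stage(task_samples, num_tasks, num_slots, rng):
    """One Monte Carlo trial: run num_tasks tasks greedily on num_slots slots.

    Task durations are drawn with replacement from task_samples, which stands
    in for durations measured on a small profiling run (an assumption).
    """
    # Min-heap holding the time at which each slot next becomes free.
    slots = [0.0] * num_slots
    heapq.heapify(slots)
    for _ in range(num_tasks):
        free_at = heapq.heappop(slots)
        heapq.heappush(slots, free_at + rng.choice(task_samples))
    # The stage finishes when the busiest slot finishes.
    return max(slots)


def predict_runtime(task_samples, num_tasks, num_slots, trials=200, seed=0):
    """Average the simulated stage runtime over many independent trials."""
    rng = random.Random(seed)
    runs = [simulate_stage(task_samples, num_tasks, num_slots, rng)
            for _ in range(trials)]
    return sum(runs) / trials


def cheapest_config(task_samples, num_tasks, slot_options, deadline,
                    price_per_slot_second):
    """Return (slots, runtime, cost) for the cheapest option meeting the deadline."""
    best = None
    for slots in slot_options:
        runtime = predict_runtime(task_samples, num_tasks, slots)
        if runtime > deadline:
            continue  # This configuration misses the performance requirement.
        cost = runtime * slots * price_per_slot_second
        if best is None or cost < best[2]:
            best = (slots, runtime, cost)
    return best
```

Calling `cheapest_config` with durations sampled from a small run, a larger `num_tasks`, and a list of candidate slot counts returns the lowest-cost candidate whose predicted runtime stays within the deadline, or `None` if no candidate qualifies.
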
Original language: English
Title: CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
Number of pages: 4
Place of publication: New York, NY
Publisher: ACM
Publication date: 3 Nov 2019
Pages: 2477-2480
ISBN (print): 978-1-4503-6976-3
DOI - permanent links
Status: Published - 3 Nov 2019
Ministry of Education and Culture (OKM) publication type: A4 Article in conference proceedings
Event: ACM International Conference on Information and Knowledge Management - Beijing, China
Duration: 3 Nov 2019 to 7 Nov 2019
Conference number: 28
http://www.cikm2019.net/

Fields of science

  • 113 Computer and information sciences
