An approach to Machine Learning with Big Data

Ella Emilia Peltonen

Research output: Thesis, Master's thesis, Theses

Abstract

Cloud computing offers important resources, performance, and services now that it has become popular to collect, store, and analyze large data sets. This thesis builds on the Berkeley Data Analysis Stack (BDAS), a cloud computing environment designed for Big Data handling and analysis. Two parts of the BDAS in particular will be introduced: the cluster resource manager Mesos and the distribution manager Spark. They offer important features for cloud computing, such as efficiency, multi-tenancy, and fault tolerance. The Spark system extends MapReduce, the well-known cloud computing paradigm.

Machine learning algorithms can predict trends and anomalies in large data sets. This thesis will present one of them, a distributed decision tree algorithm, implemented on the Spark system. As an example case, the decision tree will be applied to the diverse energy consumption data of the Carat project, collected from mobile devices such as smartphones and tablets. The data consists of information about device usage, such as which applications have been running, network connections, battery temperatures, and screen brightness.

The decision tree aims to find chains of data features that might lead to energy consumption anomalies. Results of the analysis can be used to advise users on how to improve their battery life. This thesis will present selected analysis results together with advantages and disadvantages of the decision tree analysis.
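
The idea of a "chain of data features" can be illustrated with a small single-machine sketch. It is not the thesis's distributed Spark implementation: the abstract does not state the split criterion, so an ID3-style information gain is assumed here, and the feature names (`screen`, `network`), the label `anomaly`, and the sample rows are hypothetical. Each root-to-leaf path that ends in an anomaly leaf is reported as one chain.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, feature, label="anomaly"):
    """Entropy reduction obtained by splitting rows on one feature."""
    base = entropy([r[label] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[feature], []).append(r[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_feature(rows, features, label="anomaly"):
    """Feature with the highest information gain (assumed criterion)."""
    return max(features, key=lambda f: information_gain(rows, f, label))


def build_tree(rows, features, label="anomaly"):
    """Recursively build a tree: a leaf is a label, a node is (feature, branches)."""
    labels = [r[label] for r in rows]
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]  # majority label
    feature = best_feature(rows, features, label)
    remaining = [f for f in features if f != feature]
    branches = {}
    for value in sorted({r[feature] for r in rows}):
        subset = [r for r in rows if r[feature] == value]
        branches[value] = build_tree(subset, remaining, label)
    return (feature, branches)


def anomaly_chains(tree, prefix=()):
    """List every root-to-leaf path that ends in an anomaly (True) leaf."""
    if not isinstance(tree, tuple):
        return [list(prefix)] if tree is True else []
    feature, branches = tree
    chains = []
    for value, subtree in branches.items():
        chains += anomaly_chains(subtree, prefix + ((feature, value),))
    return chains


# Hypothetical samples; real Carat records contain many more features.
samples = [
    {"screen": "bright", "network": "wifi", "anomaly": True},
    {"screen": "bright", "network": "mobile", "anomaly": True},
    {"screen": "dim", "network": "wifi", "anomaly": False},
    {"screen": "dim", "network": "mobile", "anomaly": False},
]

tree = build_tree(samples, ["screen", "network"])
print(anomaly_chains(tree))
```

In this toy data the screen setting alone separates anomalous from normal samples, so the only chain found is the single step `screen = bright`. A distributed version would have to compute the per-feature label counts across cluster nodes rather than in one loop.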
Original language: English
Publication status: Published - 2 Oct 2013
MoE publication type: G2 Master's thesis, polytechnic Master's thesis

Fields of Science

  • 113 Computer and information sciences

Cite this

Peltonen, Ella Emilia. / An approach to Machine Learning with Big Data. 2013. 65 p.
@mastersthesis{a61d650605eb430dbc944e00a58ae560,
title = "An approach to Machine Learning with Big Data",
abstract = "Cloud computing offers important resources, performance, and services now that it has become popular to collect, store, and analyze large data sets. This thesis builds on the Berkeley Data Analysis Stack (BDAS), a cloud computing environment designed for Big Data handling and analysis. Two parts of the BDAS in particular will be introduced: the cluster resource manager Mesos and the distribution manager Spark. They offer important features for cloud computing, such as efficiency, multi-tenancy, and fault tolerance. The Spark system extends MapReduce, the well-known cloud computing paradigm. Machine learning algorithms can predict trends and anomalies in large data sets. This thesis will present one of them, a distributed decision tree algorithm, implemented on the Spark system. As an example case, the decision tree will be applied to the diverse energy consumption data of the Carat project, collected from mobile devices such as smartphones and tablets. The data consists of information about device usage, such as which applications have been running, network connections, battery temperatures, and screen brightness. The decision tree aims to find chains of data features that might lead to energy consumption anomalies. Results of the analysis can be used to advise users on how to improve their battery life. This thesis will present selected analysis results together with advantages and disadvantages of the decision tree analysis.",
keywords = "113 Computer and information sciences, Data analysis, Big Data, Cloud Computing, Machine learning",
author = "Peltonen, {Ella Emilia}",
year = "2013",
month = "10",
day = "2",
language = "English",

}
