Machine Learning Techniques to Model Data-Intensive Application Performance

Big Data is becoming increasingly important, and many sectors of the economy are now guided by data-driven decision processes. Spark is becoming the reference framework, while at the infrastructure layer cloud computing provides flexible and cost-effective solutions for allocating large clusters on demand, often based on GPGPUs. Using such resources efficiently requires a model of these systems that is both accurate and cheap to evaluate.

One common way to model multi-class systems is to use analytical models such as queueing networks or Petri nets. Although these models predict performance accurately, their significant computational complexity limits their use. Machine learning techniques can address this problem by producing models that are both accurate and scalable.

This project involves the development and validation of performance models for Big Data clusters based on Spark, or for GPGPU-based clusters that support the training of deep learning applications. The project will develop benchmarking scripts to gather operational data and will compare multiple machine learning algorithms, such as support vector regression, linear regression, and random forests.
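
As a rough illustration of the kind of comparison described above, the following sketch fits the three regressors with scikit-learn and reports their error on held-out runs. Everything specific in it is an assumption made for illustration: the choice of scikit-learn, the feature names `cores` and `data_gb`, the formula used to generate execution times, and the hyperparameters. The data is synthetic and does not come from any real Spark or GPGPU benchmark.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for benchmark logs (NOT real measurements).
n_runs = 300
cores = rng.integers(2, 65, size=n_runs)       # hypothetical cluster sizes
data_gb = rng.uniform(10, 500, size=n_runs)    # hypothetical input sizes

# Hypothetical ground truth: fixed overhead + parallel work + noise.
exec_time = 20.0 + 8.0 * data_gb / cores + rng.normal(0.0, 2.0, size=n_runs)

# Features: raw configuration plus an assumed "work per core" term.
X = np.column_stack([cores, data_gb, data_gb / cores])
y = exec_time

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Candidate models; hyperparameters are illustrative, not tuned.
models = {
    "Linear regression": LinearRegression(),
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0)),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Mean absolute percentage error on the held-out runs, per model.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = float(np.mean(np.abs((y_test - pred) / y_test)))

for name, mape in results.items():
    print(f"{name}: MAPE = {mape:.3f}")
```

In a real study, the synthetic arrays would be replaced by the operational data gathered by the benchmarking scripts, and the hyperparameters would be selected by cross-validation rather than fixed by hand.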

Computer Engineering Project

GeoInformatic Project
