Early Task Failure Prediction in Cloud Computing Using Machine Learning and Feature Analytics

Authors

P. Das, M. R. Emon, A. R. Chowdhury

DOI:

https://doi.org/10.63512/sustjst.2024.2003

Keywords:

Task Failure Prediction, Machine Learning, Cloud Computing, Resource Allocation, Google Borg Dataset, Ensemble Methods

Abstract

In large-scale cloud computing, tasks that fail after resources have been allocated waste computation and degrade performance. We present a machine learning approach that predicts the probability that a task will fail once it has requested resources, using the Google Borg Cluster Trace 2019 dataset. We trained four supervised models (Logistic Regression, Random Forest, XGBoost, and a Feedforward Neural Network) on scheduling and resource-request metadata. The Random Forest classifier performed best, with an accuracy of 85.06% and a recall of 86% on the failure class, so it detected most high-risk tasks. Beyond prediction, the study examines the internal structure and patterns of the dataset using feature analysis and clustering techniques. These findings can guide both model improvement and scheduling strategy. The paper shows that predicting task failure at an early stage is feasible and useful for reducing wasted resources and improving reliability. Future work will explore better ways to recognize patterns in the data and extract meaningful features. Incorporating domain knowledge and handling class imbalance more effectively could make predictions more accurate and reliable.
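
The abstract reports two metrics: overall accuracy and recall on the failure class. As a minimal sketch of how these two numbers relate to a model's predictions, the following computes both from a pair of label lists. The label encoding (1 = failed, 0 = finished), the function names, and the toy labels are assumptions made for illustration; they are not taken from the paper or from the Borg trace.

```python
from typing import Sequence

FAILED = 1  # assumed label encoding: 1 = task failed, 0 = task finished


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of tasks whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def failure_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of truly failed tasks that were predicted as failures."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must be equal in length")
    actual_failures = sum(t == FAILED for t in y_true)
    if actual_failures == 0:
        raise ValueError("recall is undefined when there are no failed tasks")
    caught = sum(t == FAILED and p == FAILED for t, p in zip(y_true, y_pred))
    return caught / actual_failures


# Toy labels invented for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

print(f"accuracy       = {accuracy(y_true, y_pred):.2f}")
print(f"failure recall = {failure_recall(y_true, y_pred):.2f}")
```

Recall on the failure class matters here because a missed failure (a task predicted to finish that actually fails) is the case that wastes allocated resources, and accuracy alone can hide such misses when failures are the minority class.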

Published

2026-05-16

How to Cite

Das, P., Emon, M. R., & Chowdhury, A. R. (2026). Early Task Failure Prediction in Cloud Computing Using Machine Learning and Feature Analytics. SUST Journal of Science and Technology (SUST JST), 34(2). https://doi.org/10.63512/sustjst.2024.2003

Section

Articles