e-ISSN 2231-8526
ISSN 0128-7680
Avita Katal, Susheela Dahiya and Tanupriya Choudhury
Pertanika Journal of Science & Technology, Volume 31, Issue 5, August 2023
DOI: https://doi.org/10.47836/pjst.31.5.27
Keywords: Classification, cloud data center, clustering, Gaussian mixture model, K Means, workload
Published on: 31 July 2023
Advancements in virtualization technology have led to better utilization of existing infrastructure. Virtualization allows numerous virtual machines with different workloads to coexist on the same physical server, resulting in a pool of server resources. Understanding enterprise workloads is critical to correctly creating and configuring existing and future support in such pools. Managing resources in a cloud data center is one of the most difficult tasks, owing to the dynamic nature of the cloud environment and its high level of uncertainty. The diverse Quality of Service (QoS) requirements of the hosted applications make data center management harder still. Accurate forecasting of future resource demand is required to meet QoS needs and ensure better resource utilization. Consequently, data center workload modeling and categorization are needed to meet quality requirements cost-effectively. This paper uses traces of Bitbrains data to characterize and categorize workload. Clustering (K Means and Gaussian mixture model, GMM) and classification strategies (K Nearest Neighbors, Logistic Regression, Decision Trees, Random Forest, and Support Vector Machine) are used to characterize and model the workload traces. K Means outperforms GMM on both the Calinski-Harabasz index and the Davies-Bouldin score. The results show that the Decision Tree achieves the highest accuracy, 99.18%, followed by K Nearest Neighbor (KNN), Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), Multi-Layer Perceptron (MLP), and Back Propagation Neural Networks.
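The two cluster-validity measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the six (CPU %, memory %) points and both label assignments below are hypothetical toy values, not taken from the trace data, and the helper names are chosen here for illustration. A higher Calinski-Harabasz index and a lower Davies-Bouldin score both indicate compact, well-separated clusters.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def group_by_label(points, labels):
    """Collect the points of each cluster under its label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters

def calinski_harabasz(points, labels):
    """Between-cluster dispersion over within-cluster dispersion,
    each divided by its degrees of freedom. Higher is better."""
    clusters = group_by_label(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(len(m) * math.dist(centroid(m), overall) ** 2
                  for m in clusters.values())
    within = sum(math.dist(p, centroid(m)) ** 2
                 for m in clusters.values() for p in m)
    return (between / (k - 1)) / (within / (n - k))

def davies_bouldin(points, labels):
    """Mean, over clusters, of the worst (scatter_i + scatter_j) / distance
    ratio against any other cluster. Lower is better."""
    clusters = list(group_by_label(points, labels).values())
    cents = [centroid(m) for m in clusters]
    # Scatter: average distance from each member to its cluster centroid.
    scatter = [sum(math.dist(p, c) for p in m) / len(m)
               for m, c in zip(clusters, cents)]
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

# Hypothetical (CPU %, memory %) samples: three lightly and three heavily loaded VMs.
points = [(10, 20), (12, 22), (11, 19), (80, 70), (82, 73), (79, 71)]
good_labels = [0, 0, 0, 1, 1, 1]   # separates light from heavy load
poor_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

for name, labels in [("good", good_labels), ("poor", poor_labels)]:
    print(name, calinski_harabasz(points, labels), davies_bouldin(points, labels))
```

In practice, scikit-learn provides `calinski_harabasz_score` and `davies_bouldin_score` in `sklearn.metrics`, which can be applied to the labels produced by any clustering model.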