e-ISSN 2231-8526
ISSN 0128-7680
Sharifah Nurulhikmah Syed Yasin, Aiman Azhar and Rajeswari Raju
Pertanika Journal of Science & Technology, Volume 33, Issue S3, December 2025
DOI: https://doi.org/10.47836/pjst.33.S3.10
Keywords: Algorithm, breast cancer, machine learning, prediction, random forest
Published on: 2025-04-24
Breast cancer (BC) is an invasive and potentially fatal disease that affects women worldwide, and it is listed as a significant disease among Malaysian women. Early detection and accurate diagnosis are important for improving treatment outcomes, because fatality rates rise when BC is diagnosed at an advanced stage. Conventional diagnostic methods are effective, but they face challenges such as high cost, radiation exposure, and the need for specialized operators. This study therefore develops a BC prediction system based on the Random Forest (RF) algorithm. The model is trained on the "BC Wisconsin (Diagnostic) Data Set" from Kaggle, which consists of 570 records, using eight critical attributes selected for prediction. The algorithm and system are developed in Python and evaluated on accuracy, precision, recall, and F1-score, achieving 91.23%, 90.70%, 86.67%, and 88.89%, respectively. As an additional experiment, RF was combined with AdaBoost and XGBoost, which produced better results than RF alone. Validation by a specialist confirmed the reliability of the dataset and the accuracy of the prediction system, highlighting its potential as a tool for early BC detection. The study concludes that the RF-based system provides robust predictions, making it a promising approach for enhancing BC diagnostic processes.
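The workflow summarized in the abstract (train an RF classifier on eight attributes of the Wisconsin diagnostic data, then report accuracy, precision, recall, and F1-score) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions, because the abstract does not state them: scikit-learn's bundled copy of the dataset stands in for the Kaggle file, the split is 80/20, the eight attributes are simply the eight with the highest RF importance on the training data, and the hyperparameters (`n_estimators=200`, `random_state=42`) are arbitrary. The numbers it prints are therefore not expected to match those reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled Wisconsin Diagnostic data stands in
# for the Kaggle CSV used in the study.
data = load_breast_cancer()
X, y = data.data, data.target  # in this copy: 0 = malignant, 1 = benign

# Assumption: an 80/20 stratified split; the abstract does not give the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Assumption: the study's eight attributes are not named in the abstract, so
# this sketch keeps the eight features ranked highest by RF importance,
# computed on the training data only.
ranker = RandomForestClassifier(n_estimators=200, random_state=42)
ranker.fit(X_train, y_train)
top8 = ranker.feature_importances_.argsort()[::-1][:8]
selected_names = [data.feature_names[i] for i in top8]

# Train the final RF model on the eight selected attributes.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train[:, top8], y_train)
y_pred = model.predict(X_test[:, top8])

# Malignant (label 0) is treated as the positive class for precision/recall/F1.
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, pos_label=0),
    "recall": recall_score(y_test, y_pred, pos_label=0),
    "f1": f1_score(y_test, y_pred, pos_label=0),
}

print("Selected attributes:", ", ".join(selected_names))
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Ranking features on the training split only keeps the held-out test records out of attribute selection, so the reported metrics are not inflated by information leaking from the test set.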