PERTANIKA JOURNAL OF SCIENCE AND TECHNOLOGY

 

e-ISSN 2231-8526
ISSN 0128-7680


Breast Cancer Prediction: A Random Forest-based System with Expert Validation

Sharifah Nurulhikmah Syed Yasin, Aiman Azhar and Rajeswari Raju

Pertanika Journal of Science & Technology, Volume 33, Issue S3, December 2025

DOI: https://doi.org/10.47836/pjst.33.S3.10

Keywords: Algorithm, breast cancer, machine learning, prediction, random forest

Published on: 2025-04-24

Breast cancer (BC) is an invasive and potentially fatal disease that affects women worldwide, and it is listed as a significant disease among Malaysian women. Early detection and accurate diagnosis are important for improving a patient's treatment outcome, because fatality rates rise when BC reaches an advanced stage. Conventional diagnostic methods are effective, but they face challenges such as high cost, radiation exposure, and the need for specialized operators. This study therefore develops a BC prediction system based on the Random Forest (RF) algorithm. The model is trained on the "Breast Cancer Wisconsin (Diagnostic) Data Set" from Kaggle, which consists of 570 records, using eight critical attributes selected for prediction. The algorithm and system were developed in Python and evaluated on accuracy, precision, recall, and F1-score, achieving 91.23%, 90.70%, 86.67%, and 88.89%, respectively. As an additional experiment, RF was combined with AdaBoost and XGBoost, which produced better results than RF alone. Expert validation by a specialist confirmed the reliability of the dataset and the accuracy of the prediction system, highlighting its potential as a tool for early BC detection. The study concludes that the RF-based system provides robust predictions, making it a promising approach for enhancing BC diagnostic processes.
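The four evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The sketch below is illustrative only: the `ConfusionCounts` type, the `classification_metrics` helper, and the counts passed to it are hypothetical and are not taken from the study, whose confusion matrix is not given in this abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container, not from the paper)."""
    tp: int  # malignant cases predicted malignant
    fp: int  # benign cases predicted malignant
    fn: int  # malignant cases predicted benign
    tn: int  # benign cases predicted benign


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute accuracy, precision, recall, and F1-score from the counts."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = _ratio(c.tp + c.tn, total)
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Assumed example counts for a 100-record test split (illustration only).
    example = ConfusionCounts(tp=40, fp=5, fn=10, tn=45)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.2%}")
```

In a diagnostic setting, recall is the share of malignant cases the model catches, so it is usually read alongside accuracy rather than after it.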


Article ID

JST(S)-0655-2024
