International Journal of Innovative Research in Computer and Communication Engineering

ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines

| Monthly, Peer-Reviewed, Refereed, Scholarly, Multidisciplinary and Open Access Journal | High Impact Factor 8.771 (Calculated by Google Scholar and Semantic Scholar) | AI-Powered Research Tool | Indexing in all Major Databases & Metadata, Citation Generator | Digital Object Identifier (DOI) |


TITLE AI-Driven CI/CD Platform Reliability: A Framework for Intelligent Fault Detection, Predictive Maintenance, and Automated Recovery
ABSTRACT Continuous Integration and Continuous Delivery (CI/CD) pipelines are the operational backbone of modern software delivery. As organizations scale these pipelines to support hundreds of microservices and thousands of daily deployments, ensuring platform reliability becomes a critical engineering challenge. This paper presents a framework for AI-Driven CI/CD Platform Reliability that integrates intelligent fault detection, predictive maintenance, and automated recovery mechanisms. The proposed system employs machine learning models trained on historical pipeline telemetry, build logs, and infrastructure metrics to identify anomalies before they cause service disruption. A predictive maintenance module anticipates component degradation and schedules preventive actions, while an automated recovery engine applies validated remediation playbooks in response to detected faults. Experimental evaluation on simulated and production-trace datasets demonstrates significant reductions in mean time to recovery (MTTR), pipeline failure rates, and manual operator interventions. The framework advances the state of intelligent DevOps by combining anomaly detection, time-series forecasting, and reinforcement-learning-based self-healing into a unified, extensible platform.
AUTHOR SWAPNEEL G, SUJAN M A, SIDDARAJU H S, VISHWANATH B S, TALAKAL KRUSHNA, VINOD D (UG Students, Dept. of CSE, Jain Institute of Technology, Davangere, Karnataka, India); PROF. MANJULA P (Assistant Professor, Dept. of CSE, Jain Institute of Technology, Davangere, Karnataka, India)
VOLUME 184
DOI 10.15680/IJIRCCE.2026.1405063
PDF pdf/63_AI-Driven CI CD Platform Reliability A Framework for Intelligent Fault Detection, Predictive Maintenance, and Automated Recovery.pdf
KEYWORDS
Copyright © IJIRCCE 2020. All rights reserved.