International Journal of Innovative Research in Computer and Communication Engineering

ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines



TITLE An Automated and Scalable Server Health Monitoring System
ABSTRACT Automation is critical for maintaining system stability and performance across large networks and server infrastructures in today's IT environments. Manual monitoring of server resources (CPU, memory, disk usage, etc.) is time-consuming and prone to human error. To address these issues, this project implements an automated system health monitoring framework using Python, Prometheus, and Grafana. A Python automation script connects via SSH to multiple servers, gathers live system metrics, parses them, and exposes them to Prometheus for continuous monitoring. Grafana visualizes these metrics, giving the systems team immediate insight into system performance and resource consumption. The framework supports early failure detection, improves operational efficiency, and reduces manual monitoring effort. This approach provides high reliability, scalability, and improved visibility across multi-vendor environments (Linux, AIX, and Solaris servers).
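The collect, parse, and expose pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. For self-containment it assumes the system `ssh` client invoked through `subprocess` instead of an SSH library, covers Linux hosts only, and uses hypothetical metric names (`server_load1`, `server_memory_used_ratio`, `server_disk_used_ratio`) and a hypothetical command set. The output of `render` follows the Prometheus text exposition format; serving it over HTTP for Prometheus to scrape is left out.

```python
import subprocess
from typing import Callable, Dict

# Hypothetical command set for Linux hosts (an assumption of this sketch);
# AIX and Solaris would need their own commands and parsers.
COMMANDS = {
    "load": "cat /proc/loadavg",
    "mem": "cat /proc/meminfo",
    "disk": "df -P /",
}


def run_over_ssh(host: str, command: str) -> str:
    """Run one command on a remote host with the system ssh client."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True, text=True, timeout=15, check=True,
    )
    return result.stdout


def parse_load1(text: str) -> float:
    """First field of /proc/loadavg is the 1-minute load average."""
    return float(text.split()[0])


def parse_memory_used_ratio(text: str) -> float:
    """Fraction of memory in use, from MemTotal and MemAvailable."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = float(parts[0])
    return 1.0 - fields["MemAvailable"] / fields["MemTotal"]


def parse_disk_used_ratio(text: str) -> float:
    """Capacity column (fifth field) of the first data row of `df -P`."""
    columns = text.strip().splitlines()[1].split()
    return float(columns[4].rstrip("%")) / 100.0


def collect(host: str,
            run: Callable[[str, str], str] = run_over_ssh) -> Dict[str, float]:
    """Gather and parse all metrics for one host."""
    return {
        "server_load1": parse_load1(run(host, COMMANDS["load"])),
        "server_memory_used_ratio":
            parse_memory_used_ratio(run(host, COMMANDS["mem"])),
        "server_disk_used_ratio":
            parse_disk_used_ratio(run(host, COMMANDS["disk"])),
    }


def render(samples: Dict[str, Dict[str, float]]) -> str:
    """Format per-host metrics as Prometheus text exposition lines."""
    by_metric: Dict[str, list] = {}
    for host, metrics in samples.items():
        for name, value in metrics.items():
            by_metric.setdefault(name, []).append((host, value))
    lines = []
    for name in sorted(by_metric):
        # Keep all samples of one metric together under a single TYPE line.
        lines.append(f"# TYPE {name} gauge")
        for host, value in by_metric[name]:
            lines.append(f'{name}{{host="{host}"}} {value}')
    return "\n".join(lines) + "\n"
```

Passing the command runner as a parameter keeps the parsers testable without network access: a call such as `render({h: collect(h) for h in hosts})` would use real SSH, while a stub runner can replay captured command output.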
AUTHOR DR. THILAGAVATHY A, KOYI VISHNU VARDHAN REDDY, MANNEPALLI SASI KUMAR, MOSES PRANEETH RAJ. Associate Professor, Department of Science and Engineering, R.M.K. Engineering College, Chennai, Tamil Nadu, India; UG Students, Department of Science and Engineering, R.M.K. Engineering College, Chennai, Tamil Nadu, India
VOLUME 182
DOI 10.15680/IJIRCCE.2026.1403072
PDF pdf/72_An Automated and Scalable Server Health Monitoring System.pdf
KEYWORDS
Copyright © IJIRCCE 2020. All rights reserved