International Journal of Innovative Research in Computer and Communication Engineering
ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines
Monthly, Peer-Reviewed, Refereed, Scholarly, Multidisciplinary and Open Access Journal | Impact Factor Calculated by Google Scholar and Semantic Scholar | Indexed in All Major Databases | Digital Object Identifier (DOI)
| TITLE | Deep Learning-Based Image Caption Generator with AI-Powered Text-to-Speech Integration |
|---|---|
| ABSTRACT | This work presents an image and live-video captioning system that automatically describes visual content and converts the generated text into speech. The system uses BLIP for image captioning and YOLO for real-time object detection in video streams. Captions created by these models are transformed into audio using gTTS for images and pyttsx3 for video. A simple Streamlit interface enables users to upload an image or activate the webcam to receive instant text and voice output. The system is designed to enhance accessibility for users—especially those with visual impairments—by providing fast, clear, and meaningful narration of visual scenes. Experiments show that the system delivers accurate descriptions and smooth audio responses across a wide range of inputs. |
| AUTHOR | CHAITRA K C, PUNITH T A, ABHISHEKA B J, SHASHANK M R, RAGHOOTTAM S GAD |
| VOLUME | 176 |
| DOI | 10.15680/IJIRCCE.2025.1311096 |
| KEYWORDS |