International Journal of Innovative Research in Computer and Communication Engineering

ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines

TITLE Deep Learning-Based Image Caption Generator with AI-Powered Text-to-Speech Integration
ABSTRACT This work presents an image and live-video captioning system that automatically describes visual content and converts the generated text into speech. The system uses BLIP for image captioning and YOLO for real-time object detection in video streams. The generated captions are converted to audio with gTTS for images and pyttsx3 for video. A simple Streamlit interface lets users upload an image or activate the webcam to receive instant text and voice output. The system is designed to enhance accessibility for users, especially those with visual impairments, by providing fast, clear, and meaningful narration of visual scenes. Experiments show that the system delivers accurate descriptions and smooth audio responses across a wide range of inputs.
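The pipeline described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the BLIP checkpoint name, the helper names (`caption_image`, `save_caption_audio`, `speak_now`, `describe_detections`), and the way detected labels are turned into a sentence are all assumptions, since the abstract does not specify them. The YOLO detector itself is left out; `describe_detections` only assumes it yields a list of class names per frame.

```python
def describe_detections(labels):
    """Turn detector class names (assumed YOLO output) into one short sentence."""
    unique = list(dict.fromkeys(labels))  # drop repeats, keep first-seen order
    if not unique:
        return "No objects detected."
    if len(unique) == 1:
        return f"Detected: {unique[0]}."
    return "Detected: " + ", ".join(unique[:-1]) + " and " + unique[-1] + "."


def caption_image(image_path):
    """Caption a still image with BLIP (needs transformers, torch, Pillow)."""
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    name = "Salesforce/blip-image-captioning-base"  # assumed checkpoint
    processor = BlipProcessor.from_pretrained(name)
    model = BlipForConditionalGeneration.from_pretrained(name)
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)


def save_caption_audio(text, mp3_path):
    """Write the caption to an MP3 file with gTTS (needs network access)."""
    from gtts import gTTS

    gTTS(text=text, lang="en").save(mp3_path)


def speak_now(text):
    """Speak text through the offline pyttsx3 engine, as for live video."""
    import pyttsx3

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
```

In a Streamlit front end, an uploaded image would presumably go through `caption_image` and then `save_caption_audio`, while each webcam frame's detections would go through `describe_detections` and then `speak_now`. The third-party imports are kept inside the functions so the label helper can be used without installing the models.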
AUTHOR CHAITRA K C, PUNITH T A, ABHISHEKA B J, SHASHANK M R, RAGHOOTTAM S GAD
VOLUME 176
DOI 10.15680/IJIRCCE.2025.1311096
PDF pdf/96_Deep Learning-Based Image Caption Generator with AI- Powered Text -to-Speech Integration.pdf
KEYWORDS
image
Copyright © IJIRCCE 2020. All rights reserved.