Deep Learning-Based Image Caption Generator with AI-Powered Text-to-Speech Integration

International Journal of Innovative Research in Computer and Communication Engineering

ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines


TITLE	Deep Learning-Based Image Caption Generator with AI-Powered Text-to-Speech Integration
ABSTRACT	This work presents an image and live-video captioning system that automatically describes visual content and converts the generated text into speech. The system uses BLIP for image captioning and YOLO for real-time object detection in video streams. The resulting text is converted to audio with gTTS for images and pyttsx3 for video. A simple Streamlit interface lets users upload an image or activate the webcam to receive instant text and voice output. The system is designed to improve accessibility, especially for users with visual impairments, by providing fast, clear, and meaningful narration of visual scenes. Experiments show that the system delivers accurate descriptions and smooth audio responses across a wide range of inputs.
AUTHOR	CHAITRA K C, PUNITH T A, ABHISHEKA B J, SHASHANK M R, RAGHOOTTAM S GAD
VOLUME	176
DOI	10.15680/IJIRCCE.2025.1311096
PDF	pdf/96_Deep Learning-Based Image Caption Generator with AI- Powered Text -to-Speech Integration.pdf
KEYWORDS
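
The pipeline outlined in the abstract (BLIP captions for images, YOLO detections for video frames, gTTS and pyttsx3 for speech) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the checkpoint name `Salesforce/blip-image-captioning-base`, the use of the `ultralytics` package for YOLO, and the helper `describe_detections` with its sentence format are all hypothetical choices made here, since the abstract does not say how detections are turned into spoken text.

```python
from collections import Counter


def describe_detections(labels):
    """Summarize detected class names as one sentence suitable for speech.

    Hypothetical helper: the sentence format is an assumption, not from the paper.
    """
    if not labels:
        return "No objects detected."
    counts = Counter(labels)
    # most_common() orders by count, keeping first-seen order for ties
    parts = [f"{name} ({n})" for name, n in counts.most_common()]
    return "Detected: " + ", ".join(parts) + "."


def caption_image(path, model_name="Salesforce/blip-image-captioning-base"):
    """Caption one image with BLIP (needs transformers, torch, Pillow).

    The checkpoint name is an assumption; the abstract only says "BLIP".
    """
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_name)
    model = BlipForConditionalGeneration.from_pretrained(model_name)
    inputs = processor(Image.open(path).convert("RGB"), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)


def detect_labels(model, frame):
    """Return class names found in one frame by an ultralytics YOLO model.

    `model` is assumed to be e.g. ultralytics.YOLO("yolov8n.pt"), loaded once
    by the caller so that weights are not reloaded for every frame.
    """
    result = model(frame)[0]
    return [result.names[int(c)] for c in result.boxes.cls]


def save_speech(text, out_path="caption.mp3"):
    """Write speech for `text` to an MP3 file with gTTS (needs network access)."""
    from gtts import gTTS

    gTTS(text=text, lang="en").save(out_path)
    return out_path


def speak(text):
    """Speak `text` immediately with the offline pyttsx3 engine."""
    import pyttsx3

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


if __name__ == "__main__":
    # Pure-Python part only; the model-backed functions need extra packages.
    print(describe_detections(["person", "dog", "person"]))
    # prints "Detected: person (2), dog (1)."
```

The heavy imports sit inside the functions so that the text-formatting step can be exercised without installing any model library. In a Streamlit front end, `caption_image` followed by `save_speech` would serve the upload path, while `detect_labels`, `describe_detections`, and `speak` would serve the webcam path.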
