International Journal of Innovative Research in Computer and Communication Engineering
ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines
| TITLE | An Adaptive and Noise-Resilient AI Framework for Deepfake Voice Detection in Secure Multi-Agency Defence Communications |
|---|---|
| ABSTRACT | The proliferation of AI-generated synthetic voices poses a critical and escalating threat to secure multi-agency defence communications, where voice-based authentication and command relay are integral to operational integrity. Traditional speaker verification systems are highly vulnerable to advanced neural text-to-speech (TTS) and voice conversion (VC) techniques. This paper proposes an Adaptive and Noise-Resilient AI Framework for Deepfake Voice Detection (ANRF-DVD) that employs a hybrid deep learning architecture combining Temporal Convolutional Networks (TCN), Bidirectional Long Short-Term Memory (BiLSTM), and a Transformer-based attention mechanism to detect synthetic speech in real-time under adverse acoustic conditions. The framework integrates multi-domain acoustic feature fusion — encompassing Mel-Frequency Cepstral Coefficients (MFCC), Constant-Q Cepstral Coefficients (CQCC), and raw waveform embeddings — with an adaptive noise suppression module calibrated for tactical radio channel distortions. Trained on a composite dataset of 210,000 utterances spanning six voice spoofing categories, the proposed model achieves an Equal Error Rate (EER) of 1.24% and an Area Under the Receiver Operating Characteristic Curve (AUC-ROC) of 99.61%, surpassing state-of-the-art anti-spoofing baselines. The framework processes audio segments in under 28 milliseconds on an NVIDIA Jetson AGX Xavier edge platform, enabling deployment in latency-critical defence communication nodes. A real-world evaluation across three simulated tactical communication scenarios demonstrates a detection accuracy of 97.8% under battlefield noise profiles (SNR: 5–25 dB). This work represents a significant advancement in securing AI-enabled voice authentication pipelines for national defence and inter-agency coordination systems. |
| AUTHOR | PROF. MANJULA P, SAGAR K S, VINAY KARTHIK K R; Asst. Professor, Department of Computer Science and Engineering, Jain Institute of Technology, Davangere, Karnataka, India; Department of Computer Science and Engineering, Jain Institute of Technology, Davangere, Karnataka, India |
| VOLUME | 184 |
| DOI | 10.15680/IJIRCCE.2026.1405067 |
| KEYWORDS | |
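
The abstract reports its headline result as an Equal Error Rate (EER), the operating point at which the false acceptance rate on spoofed speech equals the false rejection rate on bona fide speech. As a minimal sketch of how such a figure can be computed from detector scores, the Python below sweeps a decision threshold over the observed scores. The function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold), assuming higher scores mean 'more bona fide'.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct score observed, in ascending order.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide utterances scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed utterances scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, float(thresholds[idx])


# Toy scores, invented for this example only.
eer, threshold = equal_error_rate(
    bonafide_scores=[0.9, 0.8, 0.7, 0.35],
    spoof_scores=[0.1, 0.2, 0.3, 0.4],
)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On real evaluation sets the two error curves rarely cross exactly at an observed score, so toolkits often interpolate between neighbouring thresholds; the nearest-point average used here is a simplification.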