An Adaptive and Noise-Resilient AI Framework for Deepfake Voice Detection in Secure Multi-Agency Defence Communications

International Journal of Innovative Research in Computer and Communication Engineering

ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines


TITLE	An Adaptive and Noise-Resilient AI Framework for Deepfake Voice Detection in Secure Multi-Agency Defence Communications
ABSTRACT	AI-generated synthetic voices pose an escalating threat to secure multi-agency defence communications, where voice-based authentication and command relay are integral to operational integrity. Traditional speaker verification systems are highly vulnerable to advanced neural text-to-speech (TTS) and voice conversion (VC) techniques. This paper proposes an Adaptive and Noise-Resilient AI Framework for Deepfake Voice Detection (ANRF-DVD), a hybrid deep learning architecture that combines Temporal Convolutional Networks (TCN), Bidirectional Long Short-Term Memory (BiLSTM), and a Transformer-based attention mechanism to detect synthetic speech in real time under adverse acoustic conditions. The framework integrates multi-domain acoustic feature fusion, covering Mel-Frequency Cepstral Coefficients (MFCC), Constant-Q Cepstral Coefficients (CQCC), and raw waveform embeddings, with an adaptive noise suppression module calibrated for tactical radio channel distortions. Trained on a composite dataset of 210,000 utterances spanning six voice spoofing categories, the proposed model achieves an Equal Error Rate (EER) of 1.24% and an Area Under the Receiver Operating Characteristic Curve (AUC-ROC) of 99.61%, surpassing state-of-the-art anti-spoofing baselines. The framework processes each audio segment in under 28 milliseconds on an NVIDIA Jetson AGX Xavier edge platform, enabling deployment in latency-critical defence communication nodes. An evaluation across three simulated tactical communication scenarios shows a detection accuracy of 97.8% under battlefield noise profiles (SNR: 5–25 dB). These results support the use of the framework for securing AI-enabled voice authentication pipelines in national defence and inter-agency coordination systems.
AUTHOR	PROF. MANJULA P, SAGAR K S, VINAY KARTHIK K R; Asst. Professor, Department of Computer Science and Engineering, Jain Institute of Technology, Davangere, Karnataka, India; Department of Computer Science and Engineering, Jain Institute of Technology, Davangere, Karnataka, India
VOLUME	184
DOI	10.15680/IJIRCCE.2026.1405067
PDF	pdf/67_An Adaptive and Noise-Resilient AI Framework for Deepfake Voice Detection in Secure Multi-Agency Defence Communications.pdf
KEYWORDS
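The abstract reports an Equal Error Rate (EER) of 1.24%. For readers unfamiliar with the metric, the following is a minimal sketch of how an EER can be estimated from detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for illustration. The EER is the operating point where the false acceptance rate (spoofed speech accepted) equals the false rejection rate (genuine speech rejected); with a finite score set the two rates rarely match exactly, so the sketch picks the threshold where they are closest and averages them.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    bonafide_scores: Sequence[float],
    spoof_scores: Sequence[float],
) -> Tuple[float, float]:
    """Estimate the EER and the threshold at which it occurs.

    Assumption (not from the paper): a trial is accepted as bona fide
    when its score is >= the threshold.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))

    best_gap = None
    best_eer = 0.0
    best_threshold = thresholds[0]
    for t in thresholds:
        # False acceptance rate: fraction of spoofed trials accepted.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection rate: fraction of bona fide trials rejected.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        # Keep the first threshold where the two rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
            best_threshold = t
    return best_eer, best_threshold


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    bonafide = [0.9, 0.8, 0.7, 0.35]
    spoof = [0.1, 0.2, 0.3, 0.75]
    eer, threshold = equal_error_rate(bonafide, spoof)
    print(eer, threshold)  # 0.25 0.7
```

This brute-force scan is quadratic in the number of trials, which is acceptable for a small illustration; an evaluation over hundreds of thousands of utterances would normally sort the scores once and sweep the threshold in a single pass.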
