
VishGuard AI detects fake or manipulated voice in vishing attacks using TensorFlow, Flask, and librosa. It uses CNN/CNN+LSTM on log-mel spectrograms and delivers real-time predictions via a web interface.
Python | TensorFlow | Keras | Flask | librosa | scikit-learn | NumPy | Matplotlib | Seaborn | Jupyter Notebook | HTML | CSS | JavaScript
Voice phishing, commonly called vishing, is one of the fastest-growing social engineering threats today. Attackers use AI-generated voices to impersonate bank executives, government officials, or family members, and victims often cannot tell the difference. VishGuard AI is a final year project built to detect exactly these kinds of attacks using deep learning on audio signals.
The system trains two neural network architectures, a CNN and a CNN+LSTM, on 128-band log-mel spectrograms extracted from labelled audio files. After training, the best-performing model is automatically selected by F1 score and deployed through a Flask web application where users can upload audio or record directly in the browser and receive a real-time prediction.
If you are looking for a cybersecurity or AI final year project that goes beyond the usual classification demos, this one has enough depth to hold up during viva. You can explore more cybersecurity and AI projects on the cybersecurity projects and AI and ML projects pages.
The project takes a raw audio file, resamples it to 16 kHz, trims silence, normalises the amplitude, and pads or truncates it to exactly four seconds. It then extracts a 128-band log-mel spectrogram from the processed audio and feeds it into the trained model. The model outputs a probability score for two classes: bonafide (genuine human voice) and spoof (synthetic or wavelet-manipulated fake voice). Results are displayed alongside a colour-coded spectrogram rendered directly in the browser.
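The length and amplitude handling in that pipeline can be sketched in plain NumPy. This is an illustrative version, not the project's exact code: in the full pipeline the waveform would come from `librosa.load(path, sr=16000)` after silence trimming with `librosa.effects.trim`, and the 128-band log-mel spectrogram would be produced by `librosa.feature.melspectrogram(y=y, sr=16000, n_mels=128)` followed by `librosa.power_to_db`.

```python
import numpy as np

SR = 16_000            # target sample rate (the project resamples to 16 kHz)
CLIP_SECONDS = 4       # every clip is padded or truncated to exactly 4 s
TARGET_LEN = SR * CLIP_SECONDS


def fix_length_and_normalise(y: np.ndarray) -> np.ndarray:
    """Peak-normalise a waveform and pad/truncate it to exactly 4 seconds."""
    peak = np.max(np.abs(y))
    if peak > 0:
        y = y / peak                                  # amplitude normalisation
    if len(y) >= TARGET_LEN:
        return y[:TARGET_LEN]                         # truncate long clips
    return np.pad(y, (0, TARGET_LEN - len(y)))        # zero-pad short clips
```

Fixing every clip to the same length is what guarantees the spectrograms all have the same shape, which the CNN input layer requires.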
The CNN model uses three convolutional blocks with batch normalisation, ReLU activation, and max pooling, followed by global average pooling and a dense classification head with dropout. It is fast, generalises well on small datasets, and learns local spectral patterns in the mel spectrogram.
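A Keras sketch of that architecture is shown below. The filter counts, dropout rate, and the 126-frame time dimension are illustrative assumptions, not the project's exact hyperparameters; only the overall shape (three conv blocks with batch norm, ReLU, and max pooling, then global average pooling and a dense head with dropout) follows the description above.

```python
from tensorflow.keras import layers, models


def build_cnn(input_shape=(128, 126, 1), n_classes=2):
    """Three conv blocks (Conv2D -> BatchNorm -> ReLU -> MaxPool),
    global average pooling, then a dropout-regularised dense head."""
    inp = layers.Input(shape=input_shape)
    x = inp
    for filters in (16, 32, 64):          # three convolutional blocks
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.3)(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Global average pooling keeps the parameter count small, which matters on a 100-file dataset.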
The CNN+LSTM model extends this by reshaping the convolutional output and passing it through two LSTM layers before the classification head. This allows the model to capture temporal dynamics in the spectrogram, which is useful for detecting synthesis artefacts that change over time. Both models are trained with early stopping and a learning rate scheduler to avoid overfitting on the 100-file dataset.
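The reshape-then-LSTM idea can be sketched like this. Again the unit counts are illustrative, not the project's exact values; the key step is permuting the conv output so the time axis becomes the sequence dimension before the two stacked LSTM layers.

```python
from tensorflow.keras import layers, models


def build_cnn_lstm(input_shape=(128, 126, 1), n_classes=2):
    """Conv blocks as in the CNN, then reshape so the spectrogram's
    time axis drives two stacked LSTM layers."""
    inp = layers.Input(shape=input_shape)
    x = inp
    for filters in (16, 32):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    # (mel, time, channels) -> (time, mel * channels) so the LSTMs
    # step over time frames rather than mel bands
    _, mel, time, ch = x.shape
    x = layers.Permute((2, 1, 3))(x)
    x = layers.Reshape((time, mel * ch))(x)
    x = layers.LSTM(64, return_sequences=True)(x)
    x = layers.LSTM(32)(x)
    x = layers.Dropout(0.3)(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The LSTMs see one feature vector per time frame, which is what lets the model pick up synthesis artefacts that evolve over the clip.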
The project uses the Fake Audio dataset which contains 50 bonafide recordings and 50 spoof recordings across five matched speaker pairs. The spoof files are generated through wavelet-domain manipulation using the db10 wavelet at four decomposition levels. The training set is augmented to roughly 280 samples using noise injection, time shifting, and pitch shifting. The 70/15/15 train-validation-test split is stratified to maintain class balance.
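Two of the three augmentations, noise injection and time shifting, can be sketched in plain NumPy as below; the third, pitch shifting, is typically done with `librosa.effects.pitch_shift(y, sr=16000, n_steps=n)`. The SNR and shift values here are illustrative assumptions, not the project's exact settings.

```python
import numpy as np


def add_noise(y, snr_db=20.0, rng=None):
    """Inject white Gaussian noise at a given signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return y + rng.normal(0.0, np.sqrt(noise_power), size=y.shape)


def time_shift(y, max_shift=1600, rng=None):
    """Circularly shift the waveform by up to max_shift samples
    (1600 samples = 0.1 s at 16 kHz)."""
    rng = rng or np.random.default_rng(0)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(y, shift)
```

Because both transforms preserve the clip length, augmented samples flow through the same 4-second spectrogram pipeline as the originals.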
The source code includes the complete Jupyter notebook with 10 annotated training cells, the Flask application with all four HTML templates, and a requirements file. After training, the notebook saves the best model as best_model.h5 and a config.pkl that ensures identical preprocessing during inference. The project runs locally on CPU in 5 to 20 minutes of training time and requires roughly 4 GB of RAM.
You can also get a pre-built project report, custom presentation slides, or a research paper draft through our research paper publishing service. If you need the project set up on your machine with a walkthrough of the code, the project setup and explanation service covers exactly that.
This project works well for BCA, MCA, BTech CSE, and BSc IT students looking for a final year project in cybersecurity or deep learning. The combination of audio processing, two neural architectures, a REST API, and a full web UI gives you enough material to write a solid report and defend the project in front of an evaluation panel. Students who want a more unusual topic than the standard disease prediction or chatbot project will find this one stands out.
If you are comparing similar projects, take a look at the deepfake detection project and the PhishGuard AI phishing email detection project for related work in media authenticity and fraud detection.
Add any of these professional upgrades to save time and impress your evaluators.
We'll install and configure the project on your PC via remote session (Google Meet, Zoom, or AnyDesk).
1-hour live session to explain the logic, flow, model design, and key features.
Want to know exactly how the setup works? Review our detailed step-by-step process before scheduling your session.
Fully customized to match your college format, guidelines, and submission standards.
Need feature changes, UI updates, or new features added? Charges vary based on complexity; we'll review your request and provide a clear quote before starting work.