AI-Powered Fake Job Detector | NLP & Machine Learning Final Year Project


JobGuard AI is a system that detects fake job postings using NLP and machine learning. It analyzes job descriptions in real time and helps users spot recruitment scams, attaching a confidence score to every prediction.

Technology Used

Python | Flask | Scikit-learn | NLTK | TF-IDF | Imbalanced-learn (SMOTE) | SciPy | SQLite | Joblib | HTML5 | CSS3 | Vanilla JavaScript


JobGuard AI – Fake Job Posting Detector

Every day, thousands of fresh graduates and job seekers scroll through online job boards looking for their first break. Hidden among the genuine listings are scam postings built to harvest personal data, charge fake "registration fees", or run phishing attacks. JobGuard AI is a complete, end-to-end machine learning project that reads a job posting and tells you whether it is genuine or fraudulent before anyone falls for it.

This is a full-stack data science build, not a notebook demo. It pairs a properly trained NLP classifier with a production-style Flask web application, a SQLite prediction history, and a clean dark-gold interface. If you are a final year CSE or IT student looking for an AI/ML project that actually solves a real problem and looks polished in front of an evaluator, this one checks every box. You can explore more options like it in our AI & Machine Learning projects collection.

What the project actually does

The model is trained on the Employment Scam Aegean Dataset (EMSCAD), which holds 17,880 real-world job listings. Only about 4.8% of them are fraudulent, so the data is heavily imbalanced. That single fact shapes the entire engineering approach. A lazy model that labels everything "Real" would score about 95% accuracy while catching zero scams, which makes it useless. JobGuard AI instead optimises for the F1 score on the fake class, so it is judged on how well it catches actual fraud, not on a misleading accuracy number.
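The accuracy trap is easy to check with plain arithmetic. This small sketch scores a hypothetical "always Real" classifier on 17,880 postings at the stated 4.8% fraud rate. Because 4.8% is approximate, the fake count it derives (858) is an estimate, not the exact EMSCAD figure, and the `majority_baseline` name is illustrative only.

```python
def majority_baseline(n_total: int, fake_rate: float) -> dict:
    """Score a classifier that labels every posting 'Real' (Fake = positive class)."""
    n_fake = round(n_total * fake_rate)   # estimated number of fraudulent postings
    n_real = n_total - n_fake
    # Confusion-matrix counts when nothing is ever flagged as Fake:
    tp, fp = 0, 0                         # no posting is predicted Fake
    fn, tn = n_fake, n_real               # every fake is missed, every real is correct
    accuracy = (tp + tn) / n_total
    fake_recall = tp / (tp + fn)
    # F1 = 2TP / (2TP + FP + FN); this form stays defined even when precision is 0/0
    fake_f1 = 2 * tp / (2 * tp + fp + fn)
    return {"n_fake": n_fake, "accuracy": accuracy,
            "fake_recall": fake_recall, "fake_f1": fake_f1}


stats = majority_baseline(17_880, 0.048)
print(f"accuracy={stats['accuracy']:.1%}  fake recall={stats['fake_recall']:.0%}  "
      f"fake F1={stats['fake_f1']:.2f}")
# → accuracy=95.2%  fake recall=0%  fake F1=0.00
```

This is why F1 on the fake class is the headline metric: a model that never flags anything scores 0 there, however good its accuracy looks.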

Under the hood, raw text from the title, company profile, description, requirements, and benefits is merged, cleaned, lemmatised, and converted into TF-IDF features. These are combined with meta signals like whether the company has a logo, whether the role is remote-only, and how many scam keywords appear in the text. SMOTE balances the training data without ever touching the test set, and a Linear SVM does the final classification.
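SMOTE (Synthetic Minority Over-sampling Technique) creates new minority-class rows by picking a fraudulent sample, choosing one of its k nearest fraudulent neighbours, and placing a synthetic point at a random position on the line between them. The important ordering is split first, oversample second, so synthetic points are built only from training rows. The project itself uses imbalanced-learn's SMOTE; the plain-Python version below only illustrates the mechanism. The function names (`smote`, `split_then_smote`), the label convention (1 = Fake), and the small dense 2-D toy vectors standing in for the real sparse TF-IDF plus meta-feature matrix are all assumptions.

```python
import math
import random


def smote(minority: list[list[float]], n_new: int, k: int = 5, seed: int = 42) -> list[list[float]]:
    """Create n_new synthetic points, each on the segment between a random minority
    sample and one of its k nearest minority neighbours (Euclidean distance)."""
    if len(minority) < 2 or n_new <= 0:
        return []
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot ask for more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # 0 <= gap < 1, so the new point lies between base and other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def split_then_smote(X, y, test_ratio=0.25, seed=42):
    """Stratified train/test split FIRST, then oversample the minority class (label 1)
    in the training portion only. The test set is returned untouched."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_ratio)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]

    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]

    # Only training-set fakes are used as seeds for synthetic rows
    minority = [x for x, label in zip(X_train, y_train) if label == 1]
    n_new = y_train.count(0) - len(minority)
    extra = smote(minority, n_new, seed=seed)
    return X_train + extra, y_train + [1] * len(extra), X_test, y_test


# Toy data: 40 "real" points near (0, 0) and 8 "fake" points near (5, 5)
data_rng = random.Random(0)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)] + \
    [[data_rng.gauss(5, 1), data_rng.gauss(5, 1)] for _ in range(8)]
y = [0] * 40 + [1] * 8

X_tr, y_tr, X_te, y_te = split_then_smote(X, y)
print(y_tr.count(0), y_tr.count(1), len(y_te))
# → 30 30 12
```

With imbalanced-learn the same ordering would be `train_test_split(..., stratify=y)` followed by `SMOTE().fit_resample(X_train, y_train)`. Calling `fit_resample` on the full dataset before splitting would let synthetic rows built from test-set postings leak into training and inflate the evaluation.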

Key Features

  • Real-time fraud prediction — paste any job description and get an instant Real or Fake verdict with a confidence score.
  • Explainable risk factors — instead of a black-box answer, it surfaces reasons such as suspicious phrases, missing company logo, remote-only posting, or an unusually short description.
  • Four-model comparison built in — Linear SVM, Random Forest, Logistic Regression, and Multinomial Naive Bayes are all trained and benchmarked, so you can defend exactly why one model was chosen.
  • Handles imbalanced data the right way — SMOTE oversampling on training data only, with the test set kept completely clean for honest evaluation.
  • Prediction history and live stats — every check is stored in a SQLite database and shown in a running history with real-vs-fake counters.
  • Animated Obsidian-Gold web UI — a dark, minimal interface with a confidence gauge and counter animations, built in plain HTML, CSS, and vanilla JavaScript (no heavy frontend framework to set up).
  • Report-ready EDA — class balance, word clouds, confusion matrices, ROC curves, and TF-IDF term plots are generated automatically and saved for your documentation.
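The explainable risk factors can be pictured as simple rule checks that run next to the classifier and report why a posting looks risky. This is a minimal sketch under assumptions, not the project's implementation: the `SCAM_PHRASES` list, the 50-word `MIN_WORDS` cut-off, and the `risk_factors` function are all hypothetical.

```python
import re

# Hypothetical phrase list; the project's actual scam-keyword list is not shown here.
SCAM_PHRASES = [
    "registration fee", "no experience needed", "work from home",
    "earn per day", "wire transfer", "limited slots",
]
MIN_WORDS = 50  # assumed cut-off for an "unusually short" description


def risk_factors(description: str, has_company_logo: bool, remote_only: bool) -> list[str]:
    """Return human-readable reasons a posting looks risky (empty list = no flags)."""
    text = description.lower()
    reasons = []
    # Simple substring match, in the order the phrases appear in SCAM_PHRASES
    found = [p for p in SCAM_PHRASES if p in text]
    if found:
        reasons.append("suspicious phrases: " + ", ".join(found))
    if not has_company_logo:
        reasons.append("no company logo")
    if remote_only:
        reasons.append("remote-only posting")
    n_words = len(re.findall(r"[a-z']+", text))
    if n_words < MIN_WORDS:
        reasons.append(f"short description ({n_words} words)")
    return reasons


post = "Work from home! Pay a small registration fee and earn per day."
print(risk_factors(post, has_company_logo=False, remote_only=True))
# → ['suspicious phrases: registration fee, work from home, earn per day', 'no company logo', 'remote-only posting', 'short description (12 words)']
```

Keeping these checks rule-based makes each reason easy to read out in a demo, while the Linear SVM still makes the actual Real or Fake call.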

Model Performance

The chosen Linear SVM reaches 98.4% overall accuracy, but more importantly it scores 0.84 F1 on the fake class with 83% precision and 85% recall. In plain terms: it catches most scams while keeping false alarms low. Random Forest had a slightly higher ROC-AUC but missed a third of all fake postings, which is unacceptable for a safety tool. The project documents this trade-off clearly, which is exactly the kind of reasoning that earns marks in a viva.
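The reported F1 follows directly from the precision and recall figures, since F1 is their harmonic mean (P = precision, R = recall, both on the fake class):

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.83 \times 0.85}{0.83 + 0.85}
    = \frac{1.411}{1.68}
    \approx 0.84
```

Recall also gives the miss rate directly: at R = 0.85 the SVM lets roughly 15 of every 100 scams through, whereas missing a third of fake postings means about 33 of every 100 slip past Random Forest.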

Real-World Applications

  • Job portals and recruitment platforms — auto-flag suspicious listings before they go live to users.
  • Campus placement cells — screen off-campus opportunities shared with students for safety.
  • Browser safety extensions — the future-work roadmap includes embedding detection directly into LinkedIn and Indeed.
  • HR and trust-and-safety teams — a first-pass filter that pairs naturally with recruitment tooling like our AI-powered career and recruitment systems.
  • Academic submissions — a complete, well-documented final year project covering NLP, classification, imbalanced learning, and web deployment.

Why students pick this project

It covers the full data science lifecycle in one place: data cleaning, NLP preprocessing, feature engineering, handling class imbalance, model comparison, evaluation, and finally serving the model through a real web app with a database behind it. That breadth makes it easy to write a strong project report and answer almost any question an examiner throws at you. You get the source code, the trained model artifacts, the dataset, generated plots, and a setup guide, so there is nothing left half-finished.

Need it installed on your own machine, explained line by line, or customised to your college format? Our team handles setup sessions, documentation, presentations, and research paper guidance through CodeAj's project support services. You can also browse the full catalogue of 500+ ready projects on the CodeAj Marketplace.

Tech Stack at a Glance

Python and Flask power the backend, scikit-learn and imbalanced-learn handle the machine learning, NLTK does the language processing, SQLite stores predictions, and a vanilla HTML/CSS/JavaScript frontend keeps the interface fast and dependency-free. Everything runs locally on Python 3.10+ with a single requirements file.
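The prediction history needs nothing beyond Python's built-in `sqlite3` module. The sketch below shows one possible way to store each check and compute the real-vs-fake counters. The `predictions` table, its columns, and the helper names are hypothetical, since the project's actual schema is not shown here, and the Flask route that would call these helpers is omitted.

```python
import sqlite3


def init_db(path: str = "predictions.db") -> sqlite3.Connection:
    """Open (or create) the database and make sure the history table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               snippet TEXT NOT NULL,
               label TEXT NOT NULL CHECK (label IN ('Real', 'Fake')),
               confidence REAL NOT NULL,
               created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def save_prediction(conn: sqlite3.Connection, text: str, label: str, confidence: float) -> None:
    with conn:  # commits on success, rolls back if the insert raises
        conn.execute(
            "INSERT INTO predictions (snippet, label, confidence) VALUES (?, ?, ?)",
            (text[:200], label, confidence),  # store only the first 200 characters
        )


def counts(conn: sqlite3.Connection) -> dict:
    """Real-vs-fake counters; labels with no rows yet default to 0."""
    rows = conn.execute("SELECT label, COUNT(*) FROM predictions GROUP BY label").fetchall()
    result = {"Real": 0, "Fake": 0}
    result.update(dict(rows))
    return result


def recent(conn: sqlite3.Connection, limit: int = 10) -> list:
    """Newest predictions first."""
    return conn.execute(
        "SELECT snippet, label, confidence FROM predictions ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()


conn = init_db(":memory:")
save_prediction(conn, "Senior Python developer, on-site...", "Real", 0.93)
save_prediction(conn, "Pay a registration fee to start...", "Fake", 0.88)
save_prediction(conn, "Data analyst intern...", "Real", 0.71)
print(counts(conn))     # → {'Real': 2, 'Fake': 1}
print(recent(conn, 1))  # → [('Data analyst intern...', 'Real', 0.71)]
```

The `CHECK` constraint keeps stray labels out of the table, so the counters shown in the UI can only ever be Real or Fake.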

Extra Add-Ons Available – Elevate Your Project

Add any of these professional upgrades to save time and impress your evaluators.

Project Setup

We'll install and configure the project on your PC via remote session (Google Meet, Zoom, or AnyDesk).

Source Code Explanation

1-hour live session to explain logic, flow, database design, and key features.

Want to know exactly how the setup works? Review our detailed step-by-step process before scheduling your session.

₹999

Custom Documents (College-Tailored)

  • Custom Project Report: ₹1,500
  • Custom Research Paper: ₹1,000
  • Custom PPT: ₹800

Fully customized to match your college format, guidelines, and submission standards.

Project Modification

Need existing features changed, the UI updated, or new features added?

Charges vary based on complexity.

We'll review your request and provide a clear quote before starting work.
