
JobGuard AI detects fake job postings using NLP and machine learning. It analyzes job descriptions in real time and helps users identify recruitment scams with confidence-based predictions.
Python | Flask | Scikit-learn | NLTK | TF-IDF | Imbalanced-learn (SMOTE) | SciPy | SQLite | Joblib | HTML5 | CSS3 | Vanilla JavaScript
Every day, thousands of fresh graduates and job seekers scroll through online job boards looking for their first break. Hidden among the genuine listings are scam postings built to harvest personal data, charge fake "registration fees", or run phishing attacks. JobGuard AI is a complete, end-to-end machine learning project that reads a job posting and tells you whether it is genuine or fraudulent before anyone falls for it.
This is a full-stack data science build, not a notebook demo. It pairs a properly trained NLP classifier with a production-style Flask web application, a SQLite prediction history, and a clean dark-gold interface. If you are a final year CSE or IT student looking for an AI/ML project that actually solves a real problem and looks polished in front of an evaluator, this one checks every box. You can explore more options like it in our AI & Machine Learning projects collection.
The model is trained on the Employment Scam Aegean Dataset (EMSCAD), which holds 17,880 real-world job listings. Only about 4.8% of them are fraudulent, so the data is heavily imbalanced. That single fact shapes the entire engineering approach. A lazy model that labels everything "Real" would score about 95% accuracy while catching zero scams, which is useless. JobGuard AI instead optimises for the F1 score on the fake class, so it is judged on how well it catches actual fraud, not on a misleading accuracy number.
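The accuracy trap described above is easy to check by hand. The short sketch below uses only the two figures quoted in this description (17,880 listings, about 4.8% fraudulent); the exact fraud count is rounded from that percentage, so treat it as an approximation and not the dataset's official number.

```python
# Figures taken from the description: 17,880 listings, about 4.8% fraudulent.
total = 17_880
fake = round(total * 0.048)  # approximate count of fraudulent listings
real = total - fake

# Hypothetical baseline that labels every posting "Real".
true_positives = 0       # it never flags a scam
false_negatives = fake   # so every scam is missed
true_negatives = real    # every genuine posting is labelled correctly

accuracy = (true_positives + true_negatives) / total
recall_fake = true_positives / (true_positives + false_negatives)

print(f"accuracy: {accuracy:.3f}")
print(f"recall on fake class: {recall_fake:.3f}")
```

High accuracy with zero recall on the fake class is why the fake-class F1 score is the more honest yardstick here.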
Under the hood, the raw text from the title, company profile, description, requirements, and benefits fields is merged, cleaned, lemmatised, and converted into TF-IDF features. These are combined with meta signals such as whether the company has a logo, whether the role is remote-only, and how many scam keywords appear in the text. SMOTE then oversamples the fraudulent class in the training split only, so the test set keeps its real-world imbalance, and a Linear SVM does the final classification.
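As a rough illustration of that feature pipeline, here is a minimal sketch using scikit-learn and SciPy. It is not the project's actual code. The field names follow the public EMSCAD CSV columns, while the keyword list, helper names, and toy postings are hypothetical. Lemmatisation (NLTK) and SMOTE (imbalanced-learn) are left out to keep the example small; in a real run SMOTE would be applied to the training matrix only, after the train/test split.

```python
import re

from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

TEXT_FIELDS = ("title", "company_profile", "description", "requirements", "benefits")
# Hypothetical keyword list, for illustration only.
SCAM_KEYWORDS = ("registration fee", "wire transfer", "no experience", "earn money")


def merge_text(posting):
    """Join the text fields, lowercase them, and keep letters and single spaces."""
    merged = " ".join(posting.get(field) or "" for field in TEXT_FIELDS).lower()
    letters_only = re.sub(r"[^a-z\s]", " ", merged)
    return re.sub(r"\s+", " ", letters_only).strip()


def meta_features(posting, text):
    """Return [has_logo, remote_only, scam_keyword_count] for one posting."""
    keyword_hits = sum(text.count(keyword) for keyword in SCAM_KEYWORDS)
    return [
        float(posting.get("has_company_logo", 0)),
        float(posting.get("telecommuting", 0)),
        float(keyword_hits),
    ]


def build_matrix(postings, vectorizer, fit=False):
    """Stack TF-IDF columns and the three meta columns into one sparse matrix."""
    texts = [merge_text(p) for p in postings]
    tfidf = vectorizer.fit_transform(texts) if fit else vectorizer.transform(texts)
    meta = csr_matrix([meta_features(p, t) for p, t in zip(postings, texts)])
    return hstack([tfidf, meta]).tocsr()


# Toy training data (made up); label 1 means fraudulent.
train_postings = [
    {"title": "Software Engineer", "description": "Build backend services in Python.",
     "requirements": "Three years of experience.", "has_company_logo": 1, "telecommuting": 0},
    {"title": "Work From Home Agent",
     "description": "Earn money fast! Pay a small registration fee to start.",
     "benefits": "No experience needed.", "has_company_logo": 0, "telecommuting": 1},
    {"title": "Accountant", "company_profile": "Regional audit firm.",
     "description": "Prepare monthly reports.", "has_company_logo": 1, "telecommuting": 0},
    {"title": "Data Entry Clerk",
     "description": "Send a wire transfer to receive your starter kit.",
     "has_company_logo": 0, "telecommuting": 1},
]
train_labels = [0, 1, 0, 1]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train = build_matrix(train_postings, vectorizer, fit=True)
model = LinearSVC().fit(X_train, train_labels)

new_posting = {"title": "Online Assistant",
               "description": "Earn money daily, registration fee required.",
               "has_company_logo": 0, "telecommuting": 1}
X_new = build_matrix([new_posting], vectorizer)
prediction = int(model.predict(X_new)[0])
print("Fake" if prediction == 1 else "Real")
```

The vectorizer is fitted on training text only and reused for new postings, which mirrors the rule that nothing learned from the test set should leak into training. The meta columns are left unscaled here for brevity.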
The chosen Linear SVM reaches 98.4% overall accuracy, but more importantly it scores 0.84 F1 on the fake class with 83% precision and 85% recall. In plain terms: it catches most scams while keeping false alarms low. Random Forest had a slightly higher ROC-AUC but missed a third of all fake postings, which is unacceptable for a safety tool. The project documents this trade-off clearly, which is exactly the kind of reasoning that earns marks in a viva.
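The reported fake-class numbers are internally consistent, which is worth being able to show in a viva. F1 is the harmonic mean of precision and recall, so the quoted 83% precision and 85% recall can be plugged straight into the formula:

```python
# Precision and recall on the fake class, as quoted in the description.
precision = 0.83
recall = 0.85

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 2))
```

Because the harmonic mean is pulled towards the lower of the two values, a model with weak recall on the fake class cannot hide behind strong precision.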
It covers the full data science lifecycle in one place: data cleaning, NLP preprocessing, feature engineering, handling class imbalance, model comparison, evaluation, and finally serving the model through a real web app with a database behind it. That breadth makes it easy to write a strong project report and answer almost any question an examiner throws at you. You get the source code, the trained model artifacts, the dataset, generated plots, and a setup guide, so there is nothing left half-finished.
Need it installed on your own machine, explained line by line, or customised to your college format? Our team handles setup sessions, documentation, presentations, and research paper guidance through CodeAj's project support services. You can also browse the full catalogue of 500+ ready projects on the CodeAj Marketplace.
Python and Flask power the backend, scikit-learn and imbalanced-learn handle the machine learning, NLTK does the language processing, SQLite stores predictions, and a vanilla HTML/CSS/JavaScript frontend keeps the interface fast and dependency-free. Everything runs locally on Python 3.10+ with a single requirements file.
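To give a feel for the prediction history, here is a minimal sketch of how predictions could be stored and read back with Python's built-in sqlite3 module. The table name, column names, and helper functions are assumptions made for illustration; the project's actual schema may differ.

```python
import sqlite3
from datetime import datetime, timezone


def init_db(conn):
    """Create a hypothetical predictions table if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               created_at TEXT NOT NULL,
               job_title TEXT NOT NULL,
               label TEXT NOT NULL CHECK (label IN ('Real', 'Fake')),
               confidence REAL NOT NULL
           )"""
    )


def save_prediction(conn, job_title, label, confidence):
    """Insert one prediction and return its row id."""
    cursor = conn.execute(
        "INSERT INTO predictions (created_at, job_title, label, confidence) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), job_title, label, confidence),
    )
    conn.commit()
    return cursor.lastrowid


def recent_predictions(conn, limit=10):
    """Return the newest predictions first."""
    rows = conn.execute(
        "SELECT job_title, label, confidence FROM predictions "
        "ORDER BY id DESC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()


# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
init_db(conn)
save_prediction(conn, "Data Entry Clerk", "Fake", 0.91)
save_prediction(conn, "Backend Engineer", "Real", 0.88)
history = recent_predictions(conn)
print(history)
```

Parameterised queries (the `?` placeholders) keep user-supplied job text from being interpreted as SQL, which matters for a web app that accepts pasted postings.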
Add any of these professional upgrades to save time and impress your evaluators.
We'll install and configure the project on your PC via remote session (Google Meet, Zoom, or AnyDesk).
1-hour live session to explain logic, flow, database design, and key features.
Want to know exactly how the setup works? Review our detailed step-by-step process before scheduling your session.
Fully customized to match your college format, guidelines, and submission standards.
Need feature changes, UI updates, or entirely new features? Charges vary with complexity. We'll review your request and provide a clear quote before starting work.