
A Python final year project that fetches live pollution data from the WAQI API, processes it through a trained XGBoost model, and delivers real-time AQI readings plus a next-day forecast for any Indian city.
Python | Flask | XGBoost | scikit-learn | SQLite | pandas | NumPy | Vanilla JS | HTML5 | CSS3 | WAQI API | joblib | Jupyter Notebook
ClearSky AI is a full-stack web application built as a final year project for students in BCA, MCA, BTech CSE, and BSc IT programs. It pulls live pollution data from the World Air Quality Index (WAQI) API for any Indian city and runs it through a trained XGBoost gradient-boosted regression model to predict the next day's AQI. The output is a clean, functional system that covers real-time monitoring, machine learning inference, weather data, and persistent search history — all in one place.
The model was trained on roughly five and a half years of daily pollution records (January 2015 to July 2020) from 29 major Indian cities, sourced from the Central Pollution Control Board (CPCB) dataset. That historical depth exposes the model to India's seasonal pollution patterns, such as winter smog peaks and monsoon lows, so its predictions are better grounded than those of a model trained on a short or generic dataset.
If you want an AI and machine learning final year project that combines a live data source, a trained ML model, and a working web interface, ClearSky AI delivers all three. It ships with source code and is available with a pre-built college-format project report and optional mentorship from the CodeAj team.
The main dashboard lets users search any Indian city by name — autocomplete covers 60-plus cities — and immediately see the current AQI, a breakdown of nine individual pollutants (PM2.5, PM10, NO2, SO2, CO, O3, NO, NOx, NH3), live weather conditions, and a machine learning forecast for the next day. A separate Weather tab goes deeper: UV index, dew point, Beaufort-scale wind classification, humidity with a progress bar, and contextual outdoor activity recommendations based on actual readings.
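As a rough illustration of the Beaufort-scale wind classification mentioned above, the sketch below maps a wind speed in km/h to a Beaufort number and label using the standard scale boundaries. The function name and the choice of km/h as the input unit are assumptions; the project's own code may be organised differently.

```python
import bisect

# Lower bounds (km/h) at which Beaufort numbers 1 through 12 begin.
BEAUFORT_LOWER_BOUNDS = [1, 6, 12, 20, 29, 39, 50, 62, 75, 89, 103, 118]

BEAUFORT_LABELS = [
    "Calm", "Light air", "Light breeze", "Gentle breeze",
    "Moderate breeze", "Fresh breeze", "Strong breeze", "Near gale",
    "Gale", "Strong gale", "Storm", "Violent storm", "Hurricane force",
]


def classify_wind(speed_kmh):
    """Return (beaufort_number, label) for a wind speed in km/h.

    Hypothetical helper: name and input unit are assumptions.
    """
    if speed_kmh < 0:
        raise ValueError("wind speed cannot be negative")
    # Count how many lower bounds the speed has reached or passed.
    number = bisect.bisect_right(BEAUFORT_LOWER_BOUNDS, speed_kmh)
    return number, BEAUFORT_LABELS[number]
```

For example, `classify_wind(15)` falls in the 12 to 19 km/h band and is classified as a gentle breeze.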
A manual prediction page lets users enter their own pollutant values and get a predicted AQI from the same model. This is useful for offline testing, for classrooms where instructors want to show how individual pollutants affect the index, or for students who want to explore the model's behavior before their viva.
All searches are saved to a local SQLite database and shown in a Recent Searches sidebar, so frequently monitored cities are one click away on every session.
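A minimal sketch of how search history could be stored and read back with Python's built-in sqlite3 module, without an ORM. The table name, column names, and helper functions here are illustrative assumptions, not the project's actual schema.

```python
import sqlite3


def init_db(conn):
    """Create the (hypothetical) search_history table if it is missing."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS search_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            city TEXT NOT NULL,
            aqi REAL,
            searched_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    conn.commit()


def save_search(conn, city, aqi):
    """Record one search; parameters are bound to avoid SQL injection."""
    conn.execute(
        "INSERT INTO search_history (city, aqi) VALUES (?, ?)",
        (city, aqi),
    )
    conn.commit()


def recent_cities(conn, limit=5):
    """Return distinct cities, most recently searched first."""
    rows = conn.execute(
        "SELECT city FROM search_history "
        "GROUP BY city ORDER BY MAX(id) DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]
```

Grouping by city keeps the sidebar free of duplicates when the same city is searched repeatedly. For a quick check, the same functions work against `sqlite3.connect(":memory:")`.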
The XGBoost model was trained on the city_day.csv file from the CPCB dataset — 29,531 daily records across 29 Indian cities from January 2015 to July 2020. After dropping null AQI rows and applying city-level median imputation for missing pollutant values, 24,824 records went into training. The target variable is AQI_Tomorrow, created by shifting the AQI column backward by one day within each city group to prevent data leakage across cities.
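The cleaning and target-creation steps described above can be sketched in pandas roughly as follows. The column names ("City", "Date", "AQI", and the pollutant columns) are assumed to match city_day.csv, and the function name is hypothetical. Note that once null-AQI rows are dropped, the "next" row within a city may not always be the next calendar day.

```python
import pandas as pd

# Pollutant columns assumed to be present in city_day.csv.
POLLUTANTS = [
    "PM2.5", "PM10", "NO", "NO2", "NOx", "NH3",
    "CO", "SO2", "O3", "Benzene", "Toluene", "Xylene",
]


def prepare_training_frame(df):
    """Hypothetical helper mirroring the cleaning steps described above."""
    df = df.copy()
    df["Date"] = pd.to_datetime(df["Date"])

    # 1. Drop rows with no AQI, then order each city's records by date.
    df = df.dropna(subset=["AQI"]).sort_values(["City", "Date"])

    # 2. City-level median imputation for missing pollutant values.
    cols = [c for c in POLLUTANTS if c in df.columns]
    city_medians = df.groupby("City")[cols].transform("median")
    df[cols] = df[cols].fillna(city_medians)

    # 3. Target: the following row's AQI within the same city, so the
    #    last record of one city never borrows a value from another city.
    df["AQI_Tomorrow"] = df.groupby("City")["AQI"].shift(-1)

    # 4. Calendar features (Monday = 0 for DayOfWeek).
    df["Month"] = df["Date"].dt.month
    df["DayOfWeek"] = df["Date"].dt.dayofweek

    # The final row of each city has no "tomorrow", so it is dropped.
    return df.dropna(subset=["AQI_Tomorrow"]).reset_index(drop=True)
```

Shifting inside `groupby("City")` is what keeps the target from leaking across cities: a plain `df["AQI"].shift(-1)` would pair the last record of one city with the first record of the next.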
The feature vector has 14 inputs: PM2.5, PM10, NO, NO2, NOx, NH3, CO, SO2, O3, Benzene, Toluene, Xylene, Month, and DayOfWeek. The model runs 500 trees at a learning rate of 0.05, max depth 6, with L1 and L2 regularization. A StandardScaler is applied before inference. Both the scaler and the trained model are saved as .pkl files via joblib and loaded at Flask startup.
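Because the scaler and model expect the 14 inputs in a fixed order, it helps to build the feature row in one place. The sketch below assembles that row from a dictionary of pollutant readings and a date. The helper name, the dictionary input format, and the Monday = 0 convention for DayOfWeek are assumptions; the artifact filenames in the comments are placeholders.

```python
from datetime import date

POLLUTANT_FEATURES = [
    "PM2.5", "PM10", "NO", "NO2", "NOx", "NH3",
    "CO", "SO2", "O3", "Benzene", "Toluene", "Xylene",
]
FEATURE_ORDER = POLLUTANT_FEATURES + ["Month", "DayOfWeek"]


def build_feature_row(pollutants, day):
    """Return the 14 model inputs as a list of floats in FEATURE_ORDER.

    Hypothetical helper: `pollutants` maps feature name to reading,
    `day` is a datetime.date.
    """
    missing = [name for name in POLLUTANT_FEATURES if name not in pollutants]
    if missing:
        raise ValueError(f"missing pollutant values: {missing}")
    row = [float(pollutants[name]) for name in POLLUTANT_FEATURES]
    row.append(float(day.month))       # Month: 1 to 12
    row.append(float(day.weekday()))   # DayOfWeek: Monday = 0 (assumed)
    return row


# At inference time the row would be scaled and passed to the model,
# for example (filenames are placeholders, not the project's real ones):
#   scaler = joblib.load("scaler.pkl")
#   model = joblib.load("model.pkl")
#   prediction = model.predict(scaler.transform([row]))[0]
```

Raising on missing pollutants, rather than silently filling zeros, avoids feeding the model a row that looks valid but was never seen in training.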
The full training pipeline lives in a Jupyter Notebook with ten clearly labelled cells covering data loading, cleaning, EDA, feature engineering, model training, evaluation, and artifact export. The notebook can be run end-to-end to reproduce the model from scratch — which is exactly what viva panels want to see. You can browse more final year projects with source code on the CodeAj marketplace to compare tech stacks.
API endpoints (/api/aqi, /api/predict, /api/history, /api/cities) that work independently from the UI
ClearSky AI was designed as an academic project, but the architecture maps directly to real scenarios. Environmental monitoring agencies use similar Flask-based stacks to display live pollution data publicly. Smart city dashboards integrate prediction endpoints to trigger alerts when forecast AQI crosses a health threshold. Health apps use pollutant breakdowns to give users location-aware outdoor activity advice.
For final year students, this project covers topics that come up repeatedly in viva sessions: missing value handling with domain-specific imputation, supervised regression with a gradient-boosted ensemble, REST API design in Flask, SQLite integration without an ORM, and a frontend communicating with the backend through fetch calls. It covers enough ground to hold up under detailed technical questioning. You can also look at the related air quality route planner project or the Indian climate monitor project for alternative approaches in this domain.
Most AQI projects available online stop at training a model in a notebook. ClearSky AI goes further — the trained model is connected to a live data source, wrapped in a production-ready Flask application, and delivered through a UI that a non-technical examiner can actually interact with. The codebase is structured clearly, the Jupyter Notebook reproduces everything in one run, and the dataset is publicly verifiable through CPCB and Kaggle. These three things together make it much easier to defend in a viva than a project with opaque data or a model that cannot be re-trained from scratch.
CodeAj provides this project with source code, a pre-built college-format project report, and optional add-on services. You can explore the full range of Python and AI final year projects on the marketplace, or check our project services page for report writing, setup sessions, and research paper support.
Add any of these professional upgrades to save time and impress your evaluators.
We'll install and configure the project on your PC via remote session (Google Meet, Zoom, or AnyDesk).
1-hour live session to explain logic, flow, database design, and key features.
Want to know exactly how the setup works? Review our detailed step-by-step process before scheduling your session.
Fully customized to match your college format, guidelines, and submission standards.
Need feature changes, UI updates, or new features added?
Charges vary based on complexity.
We'll review your request and provide a clear quote before starting work.