
Razi Haider Bhatti

FAST · 2019 · i19 - 1762
Email
Phone
LinkedIn
GitHub

Academic

Program
CGPA
Year
2019
Education
Address
DOB

Career

Current role
Target role
Skills
Python, Streamlit, Scikit-learn, MySQL, GitHub, Tableau

Verbatim text

DriftCach: Enabling Drift in CacheJoin for Near Real Time Data Warehouse
Our project focuses on improving how data is joined in a data warehouse using machine learning. We are creating a new algorithm that optimizes the process by storing popular items in a special cache module, reducing the disk lookups that would otherwise slow the join down. Unlike previous work, our algorithm uses machine learning to predict which items will be most popular at any given time, ensuring they are placed in the cache with minimal delay. With the right data always at hand, the join runs faster still. Overall, this boosts the speed of the ETL layer.
Features include:
Forecasting the popularity of different products ahead of time using machine learning.
Filling the cache with popular items so that more join operations are performed in main memory rather than on disk.
Speeding up the ETL layer overall by optimizing semi-stream join operations, which reduces disk I/O.
A Tableau-powered near-real-time dashboard for exploratory data analysis.
A frontend to monitor the performance (joining speed) of the algorithm.
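The core idea above can be sketched in a few lines: keep master data "on disk", preload an in-memory cache with the keys a model predicts will be popular, and join stream tuples against the cache first. This is a minimal illustration, not the project's actual implementation; the class and variable names are hypothetical.

```python
from collections import OrderedDict

class CacheJoin:
    """Minimal sketch of a cache-assisted semi-stream join.

    Master data lives on "disk" (here a dict standing in for disk
    lookups); keys predicted to be popular are kept in an in-memory
    cache so most probes avoid the slow path.
    """

    def __init__(self, master_on_disk, cache_size=2):
        self.master_on_disk = master_on_disk  # simulated disk store
        self.cache = OrderedDict()            # in-memory hot partition
        self.cache_size = cache_size
        self.disk_probes = 0                  # counts slow-path lookups

    def preload(self, predicted_hot_keys):
        """Fill the cache with keys a model predicts will be popular."""
        for k in predicted_hot_keys[: self.cache_size]:
            if k in self.master_on_disk:
                self.cache[k] = self.master_on_disk[k]

    def probe(self, key):
        """Join one stream tuple against the master data."""
        if key in self.cache:                 # fast path: main memory
            return self.cache[key]
        self.disk_probes += 1                 # slow path: disk I/O
        return self.master_on_disk.get(key)

master = {"A": "row-A", "B": "row-B", "C": "row-C"}
join = CacheJoin(master)
join.preload(["A", "B"])                      # model predicts A, B are hot
stream = ["A", "A", "B", "C", "A"]
results = [join.probe(k) for k in stream]
print(results)           # ['row-A', 'row-A', 'row-B', 'row-C', 'row-A']
print(join.disk_probes)  # 1 -- only the cold key C went to disk
```

With a good popularity forecast, most probes hit the cache, which is exactly the disk-I/O reduction the features above describe.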

DriftCach
DriftCach forecasts drift (changes in the frequency of data) in the data stream using machine learning to optimize the ETL layer for near-real-time data warehousing.
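Forecasting drift can be as simple as fitting a model to each key's access counts over past time windows and predicting the next window. The sketch below uses scikit-learn's `LinearRegression` for this; the data and feature choice are illustrative assumptions, not the project's actual model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Per-key access counts over four past time windows (made-up numbers).
history = {
    "A": [10, 12, 14, 16],   # rising popularity (upward drift)
    "B": [20, 15, 10, 5],    # falling popularity (downward drift)
    "C": [3, 3, 3, 3],       # stable
}

windows = np.arange(4).reshape(-1, 1)  # time index as the only feature

# Fit one tiny model per key and extrapolate to the next window.
forecast = {}
for key, counts in history.items():
    model = LinearRegression().fit(windows, np.array(counts))
    forecast[key] = float(model.predict([[4]])[0])

# Cache the keys predicted to be hottest in the next window.
hot_keys = sorted(forecast, key=forecast.get, reverse=True)[:2]
print(hot_keys)  # ['A', 'C'] -- B is falling and gets evicted
```

The predicted-hot keys would then be fed to the cache-preload step, so items are cached before their popularity peaks rather than after.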

Technology Used:
Python, Streamlit, Scikit-learn, MySQL, GitHub, Tableau
Supervisor Name:
Dr. M. Asif Naeem
Group Members:
Razi Haider Bhatti (i19 - 1762)

AI enrichment

Razi Haider Bhatti is a student who developed a machine learning-based caching algorithm to optimize ETL processes and reduce disk I/O in data warehousing. The project involved forecasting data drift using Python and Scikit-learn, implementing a Streamlit frontend, and creating a Tableau dashboard for real-time monitoring.
Skills (AI)
["Python", "Scikit-learn", "Streamlit", "Tableau", "MySQL", "Machine Learning", "Data Warehousing", "ETL Optimization", "Git"]
Status: ai_done
Provenance
Source file: FAST - School of Computing -Graduate Directory-2023.pdf
From job #14 page 442
Created: 1778112746