
Muhammad Tauha Kashif

NUST · 2022
Email
muhammad.tauha@outlook.com
Phone
923334727249
LinkedIn
https://www.linkedin.com/in/muhammad-tauha-

Academic

Program
Bachelor's in Software Engineering
Year
2022
Education
School of Electrical Engineering and Computer Sciences
Address
House 25, Block 16, Sector B-I, Township, Lahore, Pakistan

Career

Skills
AWS, GCP, Data Engineering, CI/CD, Jenkins, Docker, Prometheus, Grafana, ETL, PostgreSQL, Pandas, Dask, Polars, Parquet, React, TypeScript, FastAPI, WebSocket, ChromaDB, LangGraph, LLM, Python, Node.js, Git

Verbatim text

Muhammad Tauha Kashif
Cell: 923334727249 | Email: muhammad.tauha@outlook.com
LinkedIn: https://www.linkedin.com/in/muhammad-tauha-
Address: House 25, Block 16, Sector B-I, Township, Lahore, Pakistan
PROFESSIONAL PROFILE
Recent Software Engineering graduate passionate about data engineering, with hands-on experience in AWS and GCP pipelines,
real-time processing, and dashboarding. Focused on developing efficient data-processing systems that transform raw data into
actionable insights. Experienced in batch data processing, cloud-infrastructure optimization, and building analytics dashboards
that drive business decisions.
EDUCATION
Bachelor's in Software Engineering
School of Electrical Engineering and Computer Sciences, Islamabad (2022)
INTERNSHIP EXPERIENCE
Software Productivity Strategists (SPS) Inc.
01-Jul-2025 - 22-Sep-2025
DevOps/DataOps Intern
• Shipped CI/CD for containerized data services with Jenkins (multibranch) and reproducible Docker builds; gated quality via pre-commit (Black, Ruff).
• Instrumented ETL containers and hosts with Prometheus/Grafana (cAdvisor, Node Exporter, Alertmanager); built scrape configs and dashboards for latency, errors, and capacity.
• Standardized project scaffolding (.env, .dockerignore, Makefile, READMEs) so new services go from repo → deploy with fewer steps and cleaner diffs.
Buildables
22-Jul-2025 - 18-Oct-2025
Data Engineering Intern
• Built production-grade ETL pipelines with hash-based change detection (MD5) and PostgreSQL upserts, processing incremental loads while maintaining full audit trails and execution metadata for data lineage tracking.
• Designed a star-schema data warehouse implementing SCD Type 2 for historical tracking; wrote complex analytical queries using CTEs, window functions, and joins to derive business insights from e-commerce transaction data.
• Optimized large-dataset processing by benchmarking Pandas vs. Dask vs. Polars on 2M+ records, achieving 10-15x performance gains through lazy evaluation and Parquet columnar storage, reducing file sizes by 75%.
• Developed a modular data quality framework with custom cleaners for standardizing inconsistent formats (dates, currencies, names), achieving 99.7% parse success across messy real-world datasets.
• Containerized the entire data stack using Docker Compose with isolated PostgreSQL instances, automated schema initialization, and health checks, ensuring reproducible deployments across environments.
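The hash-based change detection and upsert pattern described in the first bullet can be sketched as follows. This is a minimal, hypothetical illustration, not the actual pipeline code: it uses SQLite's ON CONFLICT clause (which mirrors the PostgreSQL upsert syntax) so it runs without a database server, and the table and column names are invented.

```python
import hashlib
import json
import sqlite3

def row_hash(row: dict) -> str:
    """Deterministic MD5 over a row's canonical JSON form,
    used to detect whether an incoming record actually changed."""
    canonical = json.dumps(row, sort_keys=True, default=str)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        id       INTEGER PRIMARY KEY,
        name     TEXT,
        email    TEXT,
        row_md5  TEXT
    )
""")

def upsert(conn: sqlite3.Connection, row: dict) -> None:
    # Insert new rows; on key conflict, update only when the
    # content hash differs (i.e., the record genuinely changed).
    conn.execute(
        """
        INSERT INTO customers (id, name, email, row_md5)
        VALUES (:id, :name, :email, :h)
        ON CONFLICT(id) DO UPDATE SET
            name    = excluded.name,
            email   = excluded.email,
            row_md5 = excluded.row_md5
        WHERE customers.row_md5 != excluded.row_md5
        """,
        {**row, "h": row_hash(row)},
    )

upsert(conn, {"id": 1, "name": "Ada", "email": "ada@example.com"})
upsert(conn, {"id": 1, "name": "Ada", "email": "ada@example.com"})  # same hash: no-op
upsert(conn, {"id": 1, "name": "Ada", "email": "ada@new.com"})      # hash changed: update
```

In PostgreSQL the same statement works with `ON CONFLICT (id) DO UPDATE ... WHERE`, and the conditional WHERE keeps unchanged rows from being rewritten, which is what makes incremental loads cheap.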
FINAL YEAR PROJECT
AI Development Environment Troubleshooting Copilot
- Autonomous AI Agent System: Developed an intelligent troubleshooting copilot that automates diagnosis and resolution of development environment configuration issues (Docker, package managers, CLI toolchains) using LLM-powered workflow orchestration.
- System Profiling & Context Extraction: Built modular CLI utilities for capturing structured error traces and comprehensive system snapshots (hardware, processes, services, network, installed packages) to provide rich diagnostic context.
- Hybrid Web Architecture: Engineered a full-stack solution with a React/TypeScript frontend and FastAPI backend, featuring real-time WebSocket event streaming, a chat-based interface, and agent workflow visualization.
- RAG-Enhanced Troubleshooting Pipeline: Implemented a vector-based semantic retrieval system using ChromaDB with e5-base-v2 embeddings for context-aware error diagnosis, combined with LangGraph-based multi-stage workflow orchestration (initialization → context gathering → step generation → error resolution).
- Structured Data Pipeline: Designed a JSONL-based training-data format for error normalization and context-requirement detection across multiple domains (Python, Node.js, Docker, Git), enabling future fine-tuning and knowledge-base expansion.
- Production-Ready Features: Integrated error recovery mechanisms, command safety verification, diff-based state
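The multi-stage workflow named above (initialization → context gathering → step generation → error resolution) can be sketched as a plain-Python pipeline over a shared state object. This is a hypothetical stand-in for the LangGraph orchestration, not the project's code: every stage body, field name, and rule here is invented purely for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    """Shared state each stage reads and updates, mirroring the
    graph-state pattern used by LangGraph-style orchestrators."""
    error_trace: str
    context: dict = field(default_factory=dict)
    steps: list = field(default_factory=list)
    resolved: bool = False

def initialize(state: AgentState) -> AgentState:
    # Normalize the raw error trace for downstream matching.
    state.context["normalized_error"] = state.error_trace.strip().lower()
    return state

def gather_context(state: AgentState) -> AgentState:
    # In the real system this stage would invoke the system-profiling
    # CLI utilities (hardware, processes, installed packages, ...).
    state.context["os"] = "linux"
    return state

def generate_steps(state: AgentState) -> AgentState:
    # Toy rule standing in for LLM/RAG-driven step generation.
    if "docker" in state.context["normalized_error"]:
        state.steps.append("check that the docker daemon is running")
    return state

def resolve(state: AgentState) -> AgentState:
    state.resolved = bool(state.steps)
    return state

PIPELINE = [initialize, gather_context, generate_steps, resolve]

def run(error_trace: str) -> AgentState:
    state = AgentState(error_trace=error_trace)
    for stage in PIPELINE:
        state = stage(state)
    return state

result = run("Docker: Cannot connect to the Docker daemon")
```

A real graph orchestrator adds conditional edges and retries between stages, but the core idea is the same: stages are pure-ish functions over one typed state, so each stage can be tested in isolation.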

AI enrichment

Muhammad Tauha Kashif is a recent Software Engineering graduate with internship experience in Data Engineering and DevOps, focusing on cloud pipelines and ETL systems. He has demonstrated skills in building production-grade data workflows, optimizing performance with tools like Dask and Polars, and implementing monitoring solutions using Prometheus and Grafana.
Skills (AI)
["Data Engineering", "ETL Pipelines", "Python", "PostgreSQL", "Docker", "CI/CD", "Jenkins", "Prometheus", "Grafana", "AWS", "GCP", "Dask", "Polars", "SQL", "React", "FastAPI", "LangGraph", "RAG"]
Status: ai_done
Provenance
Source file: SEECS - Software Engineering-2026(1).pdf
From job #260 page 1
Created: 1778138736