Gagan Saini

Data Scientist & ML Engineer

I build ML systems that reduce cost, automate complexity, and surface actionable insights from large-scale operational data.

Delhi - NCR, India

About Me

Data Scientist with 2.5 years of professional experience building ML algorithms and statistical models that lower costs, simplify complex operations, and surface business intelligence from large-scale data. This work has delivered a 25% reduction in client cloud computing costs, a 40% saving on report creation, and a 30% reduction in incident resolution time. Currently pursuing an MSc in Data Science at the University of Aberdeen (expected June 2026) and holding five certifications, including Microsoft Certified: Azure Data Scientist Associate and Microsoft Certified: Azure AI Engineer Associate.

Experience

Data Engineer I — CloudEQ Software India Pvt. Ltd.

Jul 2022 – Dec 2024 | India

  • Designed a machine learning-based resource clustering algorithm in Python (Scikit-learn, TensorFlow) using DBSCAN to detect utilization patterns across 500+ cloud resources; the cost-saving patterns it uncovered led to a 25% decrease in client cloud spend
  • Built automated ETL flows in Python and SQL to collect, clean, and transform large volumes of infrastructure telemetry from AWS, Azure, and GCP API endpoints, cutting data preparation time for statistical analysis by up to 40%
  • Built an anomaly detection system in Python that applies threshold-based statistical models to live metrics from 500+ monitored infrastructure components, detecting performance degradation trends and reducing incident response time by 30%
  • Applied feature engineering and clustering algorithms in Python to multi-source cloud utilization data to group similar infrastructure, producing structured data for downstream analysis
  • Implemented scalable pipelines for ingesting structured data using Data Factory in Microsoft Fabric
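
A threshold-based statistical check of the kind described in the anomaly detection bullet above could look like the following minimal sketch. It assumes a trailing-window z-score rule; the function name, window size, threshold, and sample CPU values are illustrative assumptions and do not reproduce the production system.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=5, z_threshold=3.0):
    """Return indices of points that deviate from the trailing-window mean
    by more than z_threshold sample standard deviations.

    Hypothetical helper for illustration only.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")

    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]  # the `window` points before index i
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: a z-score is undefined, so skip this point.
            continue
        if abs(values[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies


# Made-up CPU utilisation samples (percent) with one obvious spike.
cpu_percent = [41.0, 43.0, 42.0, 40.0, 44.0, 42.0, 95.0, 43.0, 41.0]
flagged = detect_anomalies(cpu_percent)
print(flagged)  # [6]
```

Only the spike at index 6 is flagged. The points after it are not, because the spike inflates the trailing standard deviation; a real system would likely use a more robust baseline (for example a median-based one) to avoid that masking effect.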

Projects

Bayesian Synthetic Fraud Dataset Generator for Music Streaming

Built in collaboration with XYNQ for their ORIGIN fraud detection platform. The generator is an expert-elicited Bayesian Network with a hand-crafted DAG (10 nodes, 12 edges) that produces 200,000 synthetic music upload records. It is grounded in published fraud cases, including a federal prosecution involving $10M in misappropriated royalties.

  • Two-phase generation pipeline embedding real-world fraud signals: audio hash reuse (4.6×), IP rotation (40%), and temporal bursting
  • Validated with Spearman ρ, Cramér's V (Bergsma bias-corrected), and correlation ratio η, with a mean absolute deviation ≤ 0.01 across all key variables
  • Confirmed statistically significant class separation (user agent 0.77, duration 0.45) suitable for downstream fraud classifier training
  • Deterministic, dictionary-free phoneme-based text generation for reproducible artist names and track titles without hardcoded word lists
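
For readers unfamiliar with the bias-corrected Cramér's V named in the validation bullet above, here is a minimal standard-library sketch of the Bergsma correction. The function name and the toy label/category data are assumptions for illustration; this is not the project's validation code.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V (Bergsma correction) for two categorical sequences.

    Hypothetical helper for illustration only.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")

    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)
    if r < 2 or k < 2:
        return 0.0  # a constant variable carries no association

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_count in row_totals.items():
        for b, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bergsma correction: subtract the expected bias and shrink the table dimensions.
    phi2_corr = max(0.0, phi2 - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0
    return sqrt(phi2_corr / denom)


# Toy data: a label that is fully determined by a category, and one that is independent of it.
labels = ["fraud", "legit"] * 50
agents = ["bot", "browser"] * 50
v_dependent = cramers_v_corrected(labels, agents)

v_independent = cramers_v_corrected(
    ["fraud", "fraud", "legit", "legit"] * 25,
    ["bot", "browser", "bot", "browser"] * 25,
)
```

With these toy inputs the fully dependent pair scores approximately 1 and the independent pair scores 0, which are the two ends of the statistic's range.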

Tech: Python, Bayesian Networks, NumPy, Pandas, Statistical Validation
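
The deterministic, phoneme-based name generation listed among this project's features can be illustrated with a short sketch. It assumes names are assembled from onset, nucleus, and coda fragments chosen by a seeded generator; the phoneme inventories and the function name are invented for this example and are not taken from the project.

```python
import random

# Small illustrative phoneme inventories (assumptions, not the project's own).
ONSETS = ["b", "d", "k", "l", "m", "n", "r", "s", "t", "v", "sh", "th"]
NUCLEI = ["a", "e", "i", "o", "u", "ai", "ou"]
CODAS = ["", "n", "r", "s", "l"]


def generate_name(seed, min_syllables=2, max_syllables=3):
    """Build a pronounceable name from phoneme fragments.

    The same seed always yields the same name within a given Python version,
    because all randomness comes from a locally seeded generator.
    """
    rng = random.Random(seed)
    count = rng.randint(min_syllables, max_syllables)
    syllables = [
        rng.choice(ONSETS) + rng.choice(NUCLEI) + rng.choice(CODAS)
        for _ in range(count)
    ]
    return "".join(syllables).capitalize()


# Record IDs double as seeds, so regenerating a record reproduces its name.
names = [generate_name(record_id) for record_id in range(5)]
```

Using a per-record seed rather than the global `random` state keeps each name reproducible regardless of the order in which records are generated.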

Certifications

Skills & Technologies

Languages: Python, SQL, Bash

ML / Data Science: Machine Learning, Scikit-learn, XGBoost, TensorFlow, Pandas, NumPy, Matplotlib, Statistical Modelling, Predictive Modelling, Regression, Clustering, Anomaly Detection, Bayesian Networks, Feature Engineering, EDA

Cloud: AWS (EC2, S3, Lambda, SageMaker, CloudWatch), Azure (ML Studio, Data Factory, Monitor), GCP (BigQuery, Dataflow, Operations Suite)

Data Engineering: ETL/ELT Pipelines, Apache Airflow, Data Warehousing, Data Modelling

DevOps / MLOps: Docker, Kubernetes, Terraform, Jenkins, CI/CD, GitHub Actions

Databases: PostgreSQL, MySQL, MongoDB, Redis

Tools: Git, Jupyter, VS Code, Linux

Education

MSc Data Science — University of Aberdeen, UK (Expected June 2026)

B.Tech Civil Engineering — Ajay Kumar Garg Engineering College, Dr. A.P.J. Abdul Kalam Technical University, India (2017–2021)

Contact

Email: sainigagan163@gmail.com

Phone: +91 9911774256

GitHub: github.com/sainigagan163

LinkedIn: linkedin.com/in/gagansaini29

Location: Delhi - NCR, India