Supercharge Your Model Training
Updated Nov 12, 2025 - Python
Efficient Deep Learning Systems course materials (HSE, YSDA)
Designing IT and ML Applications using Systems Thinking Approach at IIT Bhilai (CS559)
Structured notes on designing scalable and fault-tolerant ML systems, to refresh your knowledge and help you prepare for a system design interview. Covers system design, MLOps, and case studies.
Experimental web application demonstrating how an offline-trained financial fraud detection model can be exposed through a web interface. Built with Flask and a pre-trained XGBoost model to showcase ML inference flow, feature engineering, and result communication — not a production fraud prevention system.
Benchmarking and optimizing transformer inference across PyTorch, ONNXRuntime, and TensorRT with latency/throughput analysis on GPU and CPU.
Deterministic decision gate for AI/ML systems. Risk-Gate enforces strict, schema-driven admissibility boundaries between AI/LLM intent and real system actions. It provides a fixed, human-owned decision structure with deterministic allow/block outcomes, explicit audit logging, and environment-specific policy via configuration — no ML, no heuristics,
Introduction to Machine Learning Systems - Educational materials for ML systems architecture, deployment, and production considerations.
An automated preprocessing pipeline for Telco Customer Churn data, including cleaning, feature engineering, and CI with GitHub Actions.
End-to-end personalized feed ranking system demonstrating retrieval → ranking pipelines, offline evaluation, realistic simulation, and business-aligned diagnostics inspired by large-scale social platforms.
Public engineering notes (ML systems, CV, MIT courses). Notes-only; sources linked.
End-to-end fraud anomaly detection system using FastAPI, Isolation Forest, Streamlit, Docker, and a CI/CD pipeline.
Production-style ML inference system for pneumonia detection from chest X-rays, featuring custom CNN architectures, versioned model serving, preprocessing parity, observability, drift detection, and rollback using FastAPI and Docker.
Failure-first analysis of retrieval-augmented and agentic systems, focused on isolating and attributing failures across retrieval, planning, execution, memory, and policy layers.
A lightweight, reverse-mode Automatic Differentiation (AD) engine built from scratch using Python and NumPy. Supports dynamic computational graphs and complex linear algebra operations.
Scalable Training Telemetry and Metrics Visualization