← Devansh Shukla — all projects

tiny-autograd

Reverse-mode autodiff engine from scratch, gradient-verified against PyTorch.

a Summary

A from-scratch reverse-mode automatic differentiation engine built to verify, not just reproduce, how backprop works: a scalar Value engine plus a numpy Tensor engine with broadcasting-aware gradients, an MLP/losses layer, and SGD/Momentum/Adam written from first principles.
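As a rough illustration of the scalar half of that design, here is a minimal sketch of a reverse-mode scalar node. It is not the project's code: the class layout, the `_parents` attribute, and the `backward` traversal are assumptions made for this example, and only `+` and `*` are covered. It shows the two ideas the summary relies on: each operation stores a closure that pushes its output gradient to its inputs, and gradients accumulate with `+=` so a variable used more than once gets the sum over all paths.

```python
class Value:
    """Hypothetical minimal scalar node; not the project's actual class."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = d(a + b)/db = 1; accumulate rather than overwrite.
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # Product rule: each operand receives the other's value times the upstream gradient.
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Build a topological order so a node's closure runs only after
        # every node that consumes it has already contributed to its grad.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


# x is reused three times, so its gradient is a sum over three paths.
x = Value(3.0)
y = x * x + x * 2.0  # dy/dx = 2x + 2
y.backward()
print(y.data, x.grad)  # 15.0 8.0
```

If the closures assigned with `=` instead of `+=`, the reuse of `x` would silently drop contributions, which is the failure mode that variable-reuse tests are meant to catch.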

Every operator's gradient is checked two independent ways — central finite differences and PyTorch float64 autograd — on the argument that the two oracles fail differently, so agreement with both is strong evidence of correctness.
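The finite-difference half of that check can be sketched in a few lines. This is an illustrative, dependency-free example, not the project's test code: `numerical_grad`, the test function `f`, and the step size `h=1e-5` are assumptions chosen for the sketch, and the PyTorch half of the comparison is left out to keep it runnable without extra packages.

```python
import math


def numerical_grad(f, xs, h=1e-5):
    """Central-difference estimate of df/dx_i for each input x_i."""
    grads = []
    for i in range(len(xs)):
        up = list(xs)
        down = list(xs)
        up[i] += h
        down[i] -= h
        # (f(x + h) - f(x - h)) / 2h has O(h^2) truncation error.
        grads.append((f(up) - f(down)) / (2.0 * h))
    return grads


def f(v):
    a, b = v
    return a * b + math.tanh(a)


def analytic_grad(v):
    # Hand-derived gradient standing in for what an autodiff engine would return.
    a, b = v
    return [b + 1.0 - math.tanh(a) ** 2, a]


point = [0.7, -1.3]
num = numerical_grad(f, point)
ana = analytic_grad(point)
max_err = max(abs(n - a) for n, a in zip(num, ana))
assert max_err < 1e-6
```

Finite differences are limited by step size and rounding, while a second autodiff implementation could share a conceptual mistake, which is why the text treats agreement with both as stronger evidence than either alone.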

b Results

Key results of tiny-autograd
measurement | value | note
parity vs PyTorch | ~3.5e-15 | worst max gradient difference over 500 random expressions (roughly 16x float64 machine epsilon, 2.2e-16)
per-op asserts | < 1e-6 | finite-difference agreement ~1e-7
two-moons MLP (2-16-16-1) | 0.985 accuracy | trained with its own Adam; CI-enforced test asserts > 0.95
edge cases | diamond graphs, variable reuse | gradient-accumulation tests
Figure (tinyautograd_decision_boundary.png): The nonlinear decision boundary the from-scratch MLP learns on two-moons; the engine trains a real classifier.
Figure (tinyautograd_loss_curve.png): MSE training loss over 120 epochs, converging from ~1.5 to ~0.1.
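For readers unfamiliar with the optimizer behind that training run, the standard Adam update can be sketched as follows. This is not the project's optimizer class: the function name `adam_step`, the dict-based `state`, and the toy quadratic objective are assumptions for the example, and the default hyperparameters are the commonly used ones from the Adam paper.

```python
import math


def adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over a flat list of floats; returns the new parameters."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and its square.
        state["m"][i] = beta1 * state["m"][i] + (1.0 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1.0 - beta2) * g * g
        # Bias correction compensates for the zero initialisation of m and v.
        m_hat = state["m"][i] / (1.0 - beta1 ** t)
        v_hat = state["v"][i] / (1.0 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


# Toy objective: f(w) = (w0 - 2)^2 + (w1 + 1)^2, minimised at (2, -1).
w = [0.0, 0.0]
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
for _ in range(2000):
    grads = [2.0 * (w[0] - 2.0), 2.0 * (w[1] + 1.0)]
    w = adam_step(w, grads, state, lr=0.05)
print(w)
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly `lr` regardless of gradient magnitude.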

caveat: Pedagogical scale. The tensor engine covers add/mul/matmul/relu/sum; no GPU; only toy-task training.

c Code

Broadcast-aware product rule and matmul gradients via backward closures, from tinygrad_scratch/tensor.py. Full source and tests are on GitHub; the walkthrough notebook reproduces the results table above in Colab.

def __mul__(self, other):
    # Wrap plain numbers or arrays so both operands are Tensors.
    other = other if isinstance(other, Tensor) else Tensor(other)
    out = Tensor(self.data * other.data, (self, other), "*")
    def _backward():
        # Product rule. _unbroadcast sums the gradient back down to each
        # operand's own shape; grads accumulate so reused tensors sum over paths.
        self.grad = self.grad + _unbroadcast(other.data * out.grad, self.data.shape)
        other.grad = other.grad + _unbroadcast(self.data * out.grad, other.data.shape)
    out._backward = _backward
    return out

def matmul(self, other):
    out = Tensor(self.data @ other.data, (self, other), "@")
    def _backward():
        # For out = A @ B with 2-D operands:
        # dL/dA = dL/dout @ B.T and dL/dB = A.T @ dL/dout.
        self.grad = self.grad + out.grad @ other.data.T
        other.grad = other.grad + self.data.T @ out.grad
    out._backward = _backward
    return out
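The excerpt calls `_unbroadcast` without showing it. One plausible way such a helper could work is sketched below; this is an assumption about its behavior inferred from how it is called, not the project's implementation, and it is named `unbroadcast_sketch` to keep that distinction clear. The idea is that when numpy broadcasts an operand up to a larger shape, the gradient flowing back must be summed over every axis that was added or stretched.

```python
import numpy as np


def unbroadcast_sketch(grad, shape):
    """Sum `grad` down to `shape`, reversing numpy broadcasting (hypothetical helper)."""
    # Broadcasting prepends axes on the left; sum those away entirely.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # Axes that had size 1 in the operand were stretched; sum them but keep the axis.
    for axis, size in enumerate(shape):
        if size == 1 and grad.shape[axis] != 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad


x = np.ones((4, 3))
b = np.arange(3.0)          # shape (3,), broadcast across the 4 rows in x * b
upstream = np.ones((4, 3))  # stand-in for out.grad

# Gradient of (x * b) with respect to b, reduced back to b's shape.
grad_b = unbroadcast_sketch(x * upstream, b.shape)
print(grad_b)  # [4. 4. 4.]
```

Each element of `b` contributed to four output rows, so its gradient is the sum of four upstream entries.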
stack: Python, numpy (sole runtime dependency); dev: PyTorch (parity oracle), pytest, ruff
tests: 28 pytest tests; GitHub Actions CI (ruff + pytest)
notebook: notebooks/01_walkthrough.ipynb on Colab