Completed
August 2024

Convolutional Neural Network from Scratch in C++ — 92% MNIST Accuracy

C++ · Deep Learning · Autodiff · MNIST · Backpropagation

Overview

Built a complete neural network implementation from scratch in C++, with no machine learning library dependencies. The implementation includes: a tensor class with numerical operations, a reverse-mode automatic differentiation (autodiff) engine, layer definitions (fully-connected, convolutional, pooling, activation), a training loop with mini-batch stochastic gradient descent, and a data loader for the MNIST dataset. The network achieved 92% classification accuracy on the MNIST handwritten digit test set.
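To make the first of these components concrete, a dense tensor type might be sketched as follows. This is a minimal illustration with hypothetical names, not the project's actual class; it shows row-major storage and one elementwise operation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal dense 2-D tensor sketch: row-major storage in a flat vector.
struct Tensor {
    std::size_t rows, cols;
    std::vector<float> data;

    Tensor(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}

    // Element access via the row-major index i * cols + j.
    float& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Elementwise addition; shapes must match exactly.
Tensor add(const Tensor& a, const Tensor& b) {
    assert(a.rows == b.rows && a.cols == b.cols);
    Tensor out(a.rows, a.cols);
    for (std::size_t i = 0; i < out.data.size(); ++i)
        out.data[i] = a.data[i] + b.data[i];
    return out;
}
```

A flat row-major buffer like this keeps memory layout explicit, which matters for the cache behavior of the convolution and matrix-multiply loops built on top of it.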

Background

Most deep learning practitioners work through high-level frameworks like PyTorch or TensorFlow, which abstract away the mathematical machinery underlying gradient computation and parameter updates. While these frameworks are invaluable for research productivity, they can obscure the details that matter most for understanding model behavior from the ground up.

This project was motivated by the desire to have a complete, first-principles understanding of how neural networks work computationally—not just mathematically on a chalkboard, but in terms of actual memory layout, numerical precision, and the specific data structures that make backpropagation efficient. C++ was chosen for its directness: there are no garbage collectors, no dynamic dispatch overheads, and no hidden copies, so every design decision is visible and consequential.

Technical Details

The implementation covers the full stack: a tensor class with the core numerical operations, a reverse-mode autodiff engine, layer definitions (fully-connected, convolutional, pooling, activation), a mini-batch SGD training loop, and an MNIST data loader.

The final architecture was a small CNN: two convolutional blocks (conv + ReLU + max pool), followed by two fully-connected layers with a softmax output over the 10 digit classes.
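To illustrate how the spatial dimensions shrink through such a network, the sketch below walks a 28×28 MNIST input through two conv/pool blocks. The 5×5 kernels and 2×2 pooling windows are illustrative assumptions; the text does not specify the exact sizes used:

```cpp
#include <cassert>

// Output spatial size of a "valid" (no padding) convolution with stride 1.
int conv_out(int in, int kernel) { return in - kernel + 1; }

// Output spatial size of non-overlapping pooling with the given window.
int pool_out(int in, int window) { return in / window; }

// Feature-map size after two conv/pool blocks, assuming 5x5 kernels
// and 2x2 pooling (hypothetical values for illustration).
int feature_size(int input) {
    int s = pool_out(conv_out(input, 5), 2); // block 1: 28 -> 24 -> 12
    s = pool_out(conv_out(s, 5), 2);         // block 2: 12 -> 8 -> 4
    return s;
}
```

Under these assumptions the flattened input to the first fully-connected layer is 4 × 4 × (number of channels in the second conv layer).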

Discussion

The most instructive part of this project was implementing the autodiff engine. Unlike symbolic differentiation, reverse-mode autodiff requires tracking a runtime computation graph and ensuring that gradients are correctly accumulated when a node is referenced multiple times (i.e., handling shared subexpressions). Getting this right in C++ without reference-counting headaches required careful design of the node ownership model.
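A minimal scalar version of such an engine might look like the following. This is a simplified sketch, not the project's actual code: nodes are owned via `std::shared_ptr`, gradients are accumulated with `+=` so a shared subexpression receives a contribution from every consumer, and the backward pass visits nodes in reverse topological order so each node fires exactly once:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <unordered_set>
#include <vector>

// One node in the runtime computation graph.
struct Node {
    float value = 0.0f;
    float grad = 0.0f;                        // accumulated dL/d(this)
    std::vector<std::shared_ptr<Node>> parents;
    std::function<void(Node&)> backward_fn;   // pushes this->grad to parents
};

using NodePtr = std::shared_ptr<Node>;

NodePtr make_leaf(float v) {
    auto n = std::make_shared<Node>();
    n->value = v;
    return n;
}

NodePtr add(NodePtr a, NodePtr b) {
    auto n = std::make_shared<Node>();
    n->value = a->value + b->value;
    n->parents = {a, b};
    n->backward_fn = [a, b](Node& self) {
        a->grad += self.grad;                 // d(a+b)/da = 1
        b->grad += self.grad;                 // d(a+b)/db = 1
    };
    return n;
}

NodePtr mul(NodePtr a, NodePtr b) {
    auto n = std::make_shared<Node>();
    n->value = a->value * b->value;
    n->parents = {a, b};
    n->backward_fn = [a, b](Node& self) {
        a->grad += self.grad * b->value;      // d(ab)/da = b
        b->grad += self.grad * a->value;      // d(ab)/db = a
    };
    return n;
}

// Reverse topological order guarantees a node's backward_fn runs only
// after all of its consumers have deposited their gradient contributions.
void backward(NodePtr root) {
    std::vector<Node*> order;
    std::unordered_set<Node*> seen;
    std::function<void(Node*)> topo = [&](Node* n) {
        if (!seen.insert(n).second) return;   // already visited
        for (auto& p : n->parents) topo(p.get());
        order.push_back(n);
    };
    topo(root.get());
    root->grad = 1.0f;                        // dL/dL = 1
    for (auto it = order.rbegin(); it != order.rend(); ++it)
        if ((*it)->backward_fn) (*it)->backward_fn(**it);
}
```

For y = x·x + x, the leaf x is referenced three times; the `+=` accumulation yields the correct total derivative dy/dx = 2x + 1.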

The 92% accuracy figure is below what a well-tuned modern CNN achieves on MNIST (>99%), but the point was not to maximize accuracy with regularization tricks—it was to validate that the entire computational pipeline (forward pass, loss computation, backpropagation, weight update) was numerically correct and producing reasonable learning dynamics.
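One standard way to validate that kind of pipeline is a finite-difference gradient check: compare the analytic gradient produced by backpropagation against a central-difference estimate of the same derivative. The sketch below shows the idea on a toy squared-error loss (a hypothetical example; the text does not state the exact validation procedure used):

```cpp
#include <cassert>
#include <cmath>

// Toy loss: L(w) = (w*x - y)^2 for a single scalar weight.
float loss(float w, float x, float y) {
    float r = w * x - y;
    return r * r;
}

// Analytic gradient dL/dw = 2*(w*x - y)*x, via the chain rule.
float analytic_grad(float w, float x, float y) {
    return 2.0f * (w * x - y) * x;
}

// Central-difference estimate of dL/dw; agreement with the analytic
// gradient (within epsilon-dependent tolerance) validates the backward pass.
float numeric_grad(float w, float x, float y, float eps = 1e-3f) {
    return (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2.0f * eps);
}
```

Running this check on every layer's parameters catches sign errors and missing chain-rule factors long before training-curve symptoms appear.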

This project directly informed my later mechanistic interpretability research: having implemented the mathematical objects (activations, gradients, Jacobians) by hand gives strong intuitions for what patching experiments measure and what the limitations of gradient-based attribution methods are.

Interactive Playground

A WebAssembly-compiled version of the CNN for in-browser digit classification is planned. Check back later.