Automatic differentiation is the heart of modern machine learning. Lamp++'s autograd system builds computation graphs on-the-fly and computes gradients using the chain rule. This guide explains how it works and how to use it effectively.
The big picture
When you do math with Variables (instead of plain Tensors), Lamp++ secretly builds a computation graph. Each operation creates a node that remembers how to compute gradients. When you call `.backward()`, it walks through this graph backward, computing derivatives using the chain rule.
Here's the simplest possible example:
#include "lamppp/lamppp.hpp"
int main() {
lmp::Tensor data({2.0f}, {1}, lmp::DeviceType::CPU, lmp::DataType::Float32);
y.backward();
std::cout << "Gradient: " << x.grad() << std::endl;
return 0;
}
Variables: Tensors with gradient tracking
Variables are wrappers around Tensors that can track gradients. Every Variable has four key components:

var.data();           // the underlying Tensor holding the values
var.grad();           // the accumulated gradient Tensor
var.requires_grad();  // whether this Variable participates in gradient computation
var.grad_fn();        // the backward function, non-null when the Variable was produced by an operation
Creating Variables

// From an existing Tensor: the second constructor argument sets requires_grad
std::vector<float> data = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f};
lmp::Tensor tensor(data, {2, 3}, lmp::DeviceType::CPU);
lmp::Variable var(tensor, true);

// Factory functions that build Variables directly
lmp::Variable zeros_var = lmp::autograd::zeros({2, 3}, lmp::DeviceType::CPU,
                                               lmp::DataType::Float32, true);
lmp::Variable ones_var = lmp::autograd::ones({2, 3}, lmp::DeviceType::CPU,
                                             lmp::DataType::Float32, true);
lmp::Variable rand_var = lmp::autograd::rand({2, 3}, lmp::DeviceType::CPU,
                                             lmp::DataType::Float32, true);

// From nested data
std::vector<std::vector<float>> nested_data = {{1.0f, 2.0f}, {3.0f, 4.0f}};
lmp::Variable tensor_var = lmp::autograd::tensor(nested_data, lmp::DeviceType::CPU,
                                                 lmp::DataType::Float32, true);
Gradient requirements

Only Variables with `requires_grad=true` participate in gradient computation.

**Rule**: if any input requires gradients, the result requires gradients.
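A minimal sketch of how the rule propagates, using the factory functions shown above (the exact overloads are assumptions based on this guide):

// a requires gradients, b does not
lmp::Variable a = lmp::autograd::ones({2, 2}, lmp::DeviceType::CPU, lmp::DataType::Float32, true);
lmp::Variable b = lmp::autograd::ones({2, 2}, lmp::DeviceType::CPU, lmp::DataType::Float32, false);

lmp::Variable c = a + b;  // one input requires gradients, so c.requires_grad() is true
lmp::Variable d = b * b;  // neither input requires gradients, so d.requires_grad() is false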
Operations and the computation graph
Every differentiable operation on Variables creates a node in the computation graph.
What creates gradient nodes
**Operations that create backward nodes:**
- Arithmetic: `+`, `-`, `*`, `/`
- Math functions: `exp()`, `log()`, `sqrt()`, `sin()`, `cos()`, etc.
- Reductions: `sum()`, `max()`, `min()`, `prod()`
- Matrix ops: `matmul()`, `transpose()`
- Shape ops: `reshape()`, `squeeze()`, `expand_dims()`, `to()` (device transfer)
**Operations that DON'T create gradient nodes:**
- Comparison ops: `==`, `!=`, `>`, `<`, etc. (they return boolean tensors without gradients)
- Data access: `.data()`, `.shape()`, `.device()`, etc.
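Only the differentiable operations attach a `grad_fn`. A quick sketch (operation names taken from the lists above; exact overloads are assumptions):

lmp::Variable x = lmp::autograd::rand({3}, lmp::DeviceType::CPU, lmp::DataType::Float32, true);
lmp::Variable y = x * x + x;                         // arithmetic: backward nodes are created
std::cout << (y.grad_fn() != nullptr) << std::endl;  // prints 1: y knows how to propagate gradients
auto shape = y.data().shape();                       // data access: nothing is added to the graph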
The backward pass
The backward pass computes gradients using reverse-mode automatic differentiation:
lmp::Variable x(lmp::Tensor({3.0f}, {1}, lmp::DeviceType::CPU, lmp::DataType::Float32), true);
lmp::Variable y = x * x;  // y = x^2
y.backward();             // reverse-mode pass fills x.grad()
std::cout << "dy/dx: " << x.grad() << std::endl;  // 2x = 6
Gradient accumulation

Gradients accumulate by default:

lmp::Variable x(lmp::Tensor({2.0f}, {1}, lmp::DeviceType::CPU, lmp::DataType::Float32), true);
lmp::Variable y1 = x * x;      // dy1/dx = 2x = 4
lmp::Variable y2 = x * x * x;  // dy2/dx = 3x^2 = 12
y1.backward();
std::cout << "First gradient: " << x.grad() << std::endl;  // 4
y2.backward();
std::cout << "Accumulated: " << x.grad() << std::endl;     // 4 + 12 = 16
x.zero_grad();  // reset the gradient to zero

**Important**: always call `zero_grad()` before computing new gradients if you don't want accumulation.
How gradients flow
Understanding gradient flow helps when debugging neural networks.
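The sketch below illustrates the idea with a hypothetical two-parameter model; calling `lmp::matmul` and `lmp::sum` on Variables is an assumption based on the operation lists above:

lmp::Variable w = lmp::autograd::rand({4, 2}, lmp::DeviceType::CPU, lmp::DataType::Float32, true);
lmp::Variable b = lmp::autograd::zeros({1, 2}, lmp::DeviceType::CPU, lmp::DataType::Float32, true);
lmp::Variable x = lmp::autograd::ones({1, 4}, lmp::DeviceType::CPU, lmp::DataType::Float32, false);

lmp::Variable h = lmp::matmul(x, w) + b;  // forward pass records the graph: x, w -> h -> loss
lmp::Variable loss = lmp::sum(h);         // reduce to a scalar loss
loss.backward();                          // gradients flow loss -> h -> {w, b};
                                          // x gets no gradient because requires_grad is false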
Training loop structure
float learning_rate = 0.01f;

for (int epoch = 0; epoch < num_epochs; ++epoch) {
  for (auto& batch : data_loader) {
    // num_epochs, data_loader, weights, and bias are assumed to be defined elsewhere

    // 1. Clear gradients left over from the previous iteration
    weights.zero_grad();
    bias.zero_grad();

    // 2. Forward pass -- `forward` and `compute_loss` are placeholders for your model and loss
    lmp::Variable loss = compute_loss(forward(batch, weights, bias), batch);

    // 3. Backward pass fills in weights.grad() and bias.grad()
    loss.backward();

    // 4. Plain gradient-descent update: rebuild each parameter from its updated Tensor
    weights = lmp::Variable(weights.data() - learning_rate * weights.grad(), true);
    bias = lmp::Variable(bias.data() - learning_rate * bias.grad(), true);
  }
}
Understanding the computation graph
When you call `backward()`, Lamp++ performs a topological sort so that gradients are computed in the right order: each node's backward function runs only after the gradients from every operation that consumed its output have been accumulated.
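For intuition, consider a Variable that feeds into the output along two paths; the ordering guarantees that both branches contribute before `x.grad()` is read (same assumed API as the earlier examples):

lmp::Variable x(lmp::Tensor({3.0f}, {1}, lmp::DeviceType::CPU, lmp::DataType::Float32), true);
lmp::Variable a = x * x;  // branch 1: da/dx = 2x = 6
lmp::Variable b = x + x;  // branch 2: db/dx = 2
lmp::Variable y = a + b;
y.backward();             // topological order: y first, then a and b, then x
std::cout << x.grad() << std::endl;  // dy/dx = 6 + 2 = 8, accumulated across both branches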
Working with gradients
Checking and manipulating gradients
std::cout << "Requires grad: " << var.requires_grad() << std::endl;
std::cout << "Has grad_fn: " << (var.grad_fn() != nullptr) << std::endl;
lmp::Tensor custom_grad(grad_data, var.data().shape(), var.data().device());
var.incr_grad(custom_grad);
var.zero_grad();
Debugging gradients
std::cout << "Gradient magnitude: " << lmp::sum(param.grad() * param.grad()) << std::endl;
auto grad_vector = param.grad().to_vector<float>();
bool has_nan = std::any_of(grad_vector.begin(), grad_vector.end(),
[](float x) { return std::isnan(x); });
Performance considerations
Memory usage
- Each operation stores references to its inputs for the backward pass
- Computation graphs can use significant memory for long sequences
- Clear gradients with `zero_grad()` when you no longer need them
When to use Tensors vs. Variables

**Use Tensors for:**
- Raw data that never needs gradients, such as input batches, labels, and values you only inspect or log

**Use Variables for:**
- Trainable parameters and any value that gradients must flow through
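As a rough sketch of the difference (the Tensor arithmetic overloads are assumed from the snippets earlier in this guide):

// Plain Tensor math: no graph is recorded, no gradients are available
lmp::Tensor t({1.0f, 2.0f}, {2}, lmp::DeviceType::CPU, lmp::DataType::Float32);
lmp::Tensor t_sq = t * t;

// Variable math: the same computation, but recorded so gradients can flow back
lmp::Variable v(t, true);
lmp::Variable v_sq = v * v;  // v_sq.grad_fn() is set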
Complete example
For a complete neural network implementation using all these concepts, see the MNIST example in examples/mnist.cpp. It demonstrates:
- Parameter initialization with `autograd::rand()`
- Forward pass with `matmul()`, activation functions, and loss computation
- Backward pass and gradient-based parameter updates
- Proper gradient clearing in training loops
The key insight is that autograd turns the tedious job of computing derivatives into an automatic process, letting you focus on the interesting parts of machine learning: designing architectures and solving problems.