We've constructed our neural network model, but before we can train it, we need to compile it. The step can seem peculiar if you're new to machine learning, but it is where the model is given everything it needs in order to learn.
The name is borrowed from traditional programming, but the meaning here is different. Compiling a model does not translate source code into machine code; it configures the model for training by attaching an optimizer, a loss function, and the metrics to report. These settings determine how our model learns. Hyperparameters, the configuration values that control the learning process (the learning rate, for example), are beyond our current scope, but tuning them is one of the most fascinating aspects of machine learning engineering and worth exploring as you advance your skills.
Every model we build this way includes a compile method, which reflects how central this step is to the training pipeline. Let's walk through the key parameters we'll configure.
First, we'll set our optimizer to Adam, currently one of the most robust and widely adopted optimization algorithms in deep learning. Adam combines momentum (a running average of recent gradients) with a separate, adaptive step size for each parameter, which makes it a dependable default across the varied loss landscapes of neural network training.
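To make the two ingredients of Adam concrete, here is a minimal sketch of its update rule in plain Python, applied to a toy one-dimensional function rather than a neural network. The function name `adam_minimize`, the toy objective, and the learning rate of 0.1 are illustrative assumptions; the beta and epsilon values are the commonly used defaults.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=300):
    """Minimize a one-dimensional function with Adam, given its gradient."""
    x = x0
    m = 0.0  # running average of gradients (the momentum part)
    v = 0.0  # running average of squared gradients (the adaptive part)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Both averages start at zero, so correct their early bias.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Step along the averaged gradient, scaled by its recent magnitude.
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # should land close to the true minimum at x = 3
```

In a real network the same update is applied to every weight at once, with a separate `m` and `v` tracked for each one.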
Next comes the loss function, which deserves a moment of explanation. Think of loss as your model's internal compass: it quantifies the gap between what the model predicts and what actually happened, and that number is the feedback signal that drives learning. As training progresses, we want the loss to decrease steadily.
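As a rough illustration of how a loss turns "how wrong was this prediction" into a number, the sketch below computes cross-entropy for a single image. Cross-entropy is a common choice for digit classification, but the text above does not say which loss it uses, so treat that choice, the helper name `cross_entropy`, and the probability values as assumptions.

```python
import math

def cross_entropy(probs, true_label):
    """Negative log of the probability the model gave to the correct class."""
    return -math.log(probs[true_label])

# Hypothetical model outputs for one image whose true digit is 7.
confident_right = [0.01] * 10
confident_right[7] = 0.91   # most of the probability on the correct digit
unsure = [0.1] * 10          # equal probability on every digit
confident_wrong = [0.01] * 10
confident_wrong[2] = 0.91   # most of the probability on the wrong digit

cases = [
    ("confident and right", confident_right),
    ("unsure", unsure),
    ("confident and wrong", confident_wrong),
]
for name, probs in cases:
    print(f"{name}: loss = {cross_entropy(probs, 7):.3f}")
```

The loss is smallest when the model puts high probability on the correct digit and largest when it is confidently wrong, which is exactly the signal the optimizer needs.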
Finally, we need to define our success metrics. While you might choose precision, recall, or F1-score depending on your specific needs, we'll focus on accuracy—the straightforward percentage of correct predictions. For our digit recognition task, this gives us a clear, interpretable measure of performance that's easy to track and communicate.
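To see how accuracy relates to the other metrics mentioned above, here is a small sketch in plain Python. The helper names and the eight example labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def precision_recall(y_true, y_pred, digit):
    """Precision and recall for one digit, treated as the positive class."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == digit and p == digit)
    predicted = sum(1 for p in y_pred if p == digit)
    actual = sum(1 for t in y_true if t == digit)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall

# Hypothetical labels: the model misreads one 8 as a 3 and one 1 as a 7.
y_true = [3, 8, 8, 1, 3, 8, 0, 1]
y_pred = [3, 8, 3, 1, 3, 8, 0, 7]

print(accuracy(y_true, y_pred))             # 6 of 8 correct
print(precision_recall(y_true, y_pred, 8))  # every predicted 8 is right, but one real 8 was missed
```

Accuracy summarizes all ten digits in one number, while precision and recall show how the model behaves on a single class.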
Now that we've compiled our model, it's time for the exciting part: training. We'll use the fit method, passing in our prepared training data.
Our X_train contains the normalized training images, the pixel values we scaled to improve training stability. Our Y_train holds the corresponding labels, the ground-truth digits 0 through 9 that tell the model what each image actually represents. The epochs parameter (pronounced "EE-poks," though you'll hear variations) determines how many complete passes through the training data the model will make. We'll start with 5 epochs, which is typically enough to see meaningful improvement on a task like this while keeping the risk of overfitting low.
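Put together, the compile and fit steps might look like the sketch below. It assumes the Keras API bundled with TensorFlow, flattened 28x28 images, a small two-layer architecture, and sparse categorical cross-entropy as the loss; none of these are confirmed by the text, and the model built earlier may differ. Random stand-in data is used only so the example runs without a download, so its accuracy figures mean nothing.

```python
import numpy as np
import tensorflow as tf

# Stand-in data shaped like flattened 28x28 digit images (assumption).
rng = np.random.default_rng(0)
X_train = rng.random((256, 784)).astype("float32")  # values already in [0, 1]
Y_train = rng.integers(0, 10, size=256)             # integer labels 0 through 9

# Hypothetical architecture; substitute the model built earlier.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Attach the optimizer, the loss, and the metric to report.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # assumed; suits integer labels
    metrics=["accuracy"],
)

# Five complete passes over the training data.
history = model.fit(X_train, Y_train, epochs=5, verbose=0)
print(history.history["loss"])      # one loss value per epoch
print(history.history["accuracy"])  # one training accuracy value per epoch
```

The object returned by fit keeps the per-epoch loss and accuracy, which is handy for plotting training curves later.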
Watch what happens as training begins. You'll see the model's accuracy climb from around 83% in the first epoch to 85%, then 86%. This isn't random: the model is genuinely learning to recognize patterns in the digit images. At the same time, the loss values decrease, confirming that the model's predictions are getting closer to the true labels with each epoch.
By the fourth epoch, we're hitting 98% accuracy on the training data, a strong result that shows how well neural networks can handle image classification tasks. Notice how the loss continues to decrease even as accuracy plateaus: the model is assigning higher probability to answers it was already getting right.
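The sketch below shows why loss can keep falling while accuracy stays flat. It uses three classes instead of ten to keep the numbers readable, assumes a cross-entropy loss, and the probabilities are invented for illustration.

```python
import math

def mean_loss_and_accuracy(prob_rows, labels):
    """Average cross-entropy loss and accuracy over a small batch."""
    losses = [-math.log(row[label]) for row, label in zip(prob_rows, labels)]
    # The predicted class is the index with the highest probability.
    hits = [
        max(range(len(row)), key=row.__getitem__) == label
        for row, label in zip(prob_rows, labels)
    ]
    return sum(losses) / len(losses), sum(hits) / len(hits)

labels = [0, 1, 2]
# Hypothetical outputs from an earlier and a later epoch.
earlier = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
later = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]

early_loss, early_acc = mean_loss_and_accuracy(earlier, labels)
late_loss, late_acc = mean_loss_and_accuracy(later, labels)
print(early_loss, early_acc)
print(late_loss, late_acc)
```

Both batches are classified perfectly, so accuracy is identical, but the later one has a lower loss because each correct answer receives more probability.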
The fifth epoch reveals something crucial about machine learning: diminishing returns. While we reach nearly 99% accuracy, the improvement from epoch 4 to 5 is just 0.3 to 0.4 percentage points, much smaller than the gains we saw earlier. This pattern is fundamental to neural network training and raises important questions about when to stop training, how to avoid overfitting, and how to balance computational cost with performance gains.
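One simple way to put a number on diminishing returns is to compute the gain between consecutive epochs and flag the first epoch where it drops below a threshold. The accuracy values below are hypothetical, not measurements from this run, and the helper names and the 0.5-point threshold are arbitrary choices for illustration.

```python
# Hypothetical per-epoch accuracies, not measurements from this tutorial.
accuracies = [0.900, 0.950, 0.970, 0.980, 0.984]

def gains_in_points(values):
    """Epoch-over-epoch improvement, in percentage points."""
    return [round((b - a) * 100, 2) for a, b in zip(values, values[1:])]

def first_epoch_below(values, min_gain_points):
    """Return the 1-based epoch whose gain first falls below the threshold."""
    for epoch, gain in enumerate(gains_in_points(values), start=2):
        if gain < min_gain_points:
            return epoch
    return None  # every epoch improved by at least the threshold

print(gains_in_points(accuracies))
print(first_epoch_below(accuracies, min_gain_points=0.5))
```

This is a simplification: in practice the decision to stop is usually based on accuracy or loss measured on held-out validation data rather than on the training data.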
In our next section, we'll dive deep into these training dynamics, exploring how to choose optimal epoch numbers, interpret training curves, and make informed decisions about when your model has learned enough.