Now let's examine the predictions our model generates and analyze their structure. In our upcoming neural networks deep-dive, we'll conduct a comprehensive accuracy analysis at scale. For now, we'll focus on understanding the prediction output itself and what it reveals about our model's decision-making process.

We'll create a predictions variable by calling our model.predict method on the normalized testing images. This computational step requires the model to process each test image through its trained neural network layers, which takes a moment to complete.

The call completes in roughly one second. That is quick for a single run, but prediction time grows with the number of images, so larger datasets take noticeably longer.

Let's examine the raw output structure. When we print the first prediction, predictions[0], we get a dense array of ten floating-point numbers that looks cryptic at first. Together these values form a probability distribution over our ten possible digit classes.

The scientific notation is easier to read once you know the pattern: 1.13 × 10⁻⁷ is an extremely low confidence (essentially zero), while a value near 0.99 indicates about 99% confidence. An entry printed as 9.99 × 10⁻¹ is simply 0.999, the same kind of high-confidence value written in scientific notation. This raw format is mathematically precise, but it needs interpretation before it becomes useful.

Each position in the array corresponds directly to one digit (0-9): index 0 represents the digit zero, index 1 represents one, and so forth. Most values sit near zero, sometimes as low as 0.000001%, which means the model is nearly certain that the image does not show those particular digits.
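The index-to-digit mapping can be made explicit with a short sketch. The values below are illustrative stand-ins chosen to resemble the output described here, not the model's actual numbers, and the names first_prediction and likely_digits, as well as the 1% threshold, are assumptions made for the example.

```python
# Illustrative stand-in for predictions[0]; these are not the model's real outputs.
first_prediction = [1.13e-07, 2.0e-08, 3.1e-06, 4.0e-04, 5.0e-09,
                    6.0e-08, 1.0e-10, 9.996e-01, 2.0e-07, 8.0e-07]

# The position in the array is the digit that the probability refers to.
for digit, probability in enumerate(first_prediction):
    # Fixed-point formatting is easier to scan than scientific notation.
    print(f"digit {digit}: {probability:.8f}")

# Digits whose probability exceeds a hypothetical 1% threshold.
likely_digits = [d for d, p in enumerate(first_prediction) if p > 0.01]
print(likely_digits)
```

Printing the values in fixed-point form makes it easier to see that only one position carries almost all of the probability.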

By manually counting through the array positions (zero, one, two, three, four, five, six, seven), we find the largest value at index 7: the model is 99.96% confident that our first test image represents the digit seven. For this sample, at least, the network separates the correct class cleanly from the rest.


To make these predictions more readable, we'll format them with a Python list comprehension. This converts the raw probability values into percentages with a fixed number of decimal places, which makes the results much easier to scan.

The expression applies three steps to each value: it multiplies by 100 to get a percentage, converts the result to a plain Python float, and rounds to two decimal places for clean presentation. In code: round(float(prediction * 100), 2).

A common pitfall here is iterating over the wrong array. We need predictions[0] rather than the entire predictions array, because we are formatting the ten probabilities for a single image, not the results for all 10,000 test images at once.
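A minimal sketch of the formatting step follows. It assumes predictions is a two-dimensional NumPy array with one row of ten probabilities per image; the two rows shown are made-up stand-ins rather than real model output, and formatted is a hypothetical variable name.

```python
import numpy as np

# Stand-in for the model output: one row of ten probabilities per test image.
# Only two illustrative rows are shown; the real array has one row per image.
predictions = np.array([
    [1.13e-07, 2.0e-08, 3.1e-06, 4.0e-04, 5.0e-09,
     6.0e-08, 1.0e-10, 9.996e-01, 2.0e-07, 8.0e-07],
    [1.0e-06, 2.0e-04, 9.997e-01, 5.0e-05, 1.0e-09,
     2.0e-08, 4.0e-05, 1.0e-10, 9.0e-06, 1.0e-09],
])

# Iterate over predictions[0] (one image), not predictions (every image).
formatted = [round(float(p * 100), 2) for p in predictions[0]]
print(formatted)
```

Iterating over predictions instead would hand each whole row to float(), which cannot convert a ten-element array to a single number.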

The formatted output reveals a clear decision pattern: 0%, 0%, 0%, 0.04%, 0%, 0%, 0%, 99.96%, 0%, 0%. Counting positions from zero, index 7 holds 99.96% and index 3 holds the remaining 0.04%, so the model is almost certain of digit seven, with only a marginal possibility of digit three. For this sample, the decision is unambiguous.

To verify our manual counting, we can leverage NumPy's argmax function, which returns the index of the highest value in an array. Using np.argmax(predictions[0]) programmatically confirms our prediction: seven. This eliminates human counting errors and provides reliable index identification.
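As a sketch of that check, the snippet below applies np.argmax to a stand-in array. The values are illustrative rather than the model's real output, and first_prediction, predicted_digit, and confidence are hypothetical names.

```python
import numpy as np

# Stand-in for predictions[0]; illustrative values only.
first_prediction = np.array([1.13e-07, 2.0e-08, 3.1e-06, 4.0e-04, 5.0e-09,
                             6.0e-08, 1.0e-10, 9.996e-01, 2.0e-07, 8.0e-07])

# argmax returns the index of the largest value, which is the predicted digit.
predicted_digit = int(np.argmax(first_prediction))

# Looking up that index gives the confidence attached to the prediction.
confidence = float(first_prediction[predicted_digit])
print(predicted_digit, f"{confidence:.2%}")
```

Wrapping the result in int() turns NumPy's integer type into a plain Python integer, which prints more cleanly in lists.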

We can validate this prediction against our ground truth labels. Checking testing_labels[0] confirms the correct answer was indeed seven, demonstrating accurate model prediction for this sample.


For broader accuracy assessment, we'll generate predicted digits for multiple samples using list comprehension. The expression converts each prediction array into its most likely digit classification: [int(np.argmax(prediction)) for prediction in predictions]. This creates a clean list of predicted digit values for comparison.

Similarly, we'll format our correct answers for direct comparison: [int(label) for label in testing_labels]. This parallel structure enables systematic accuracy evaluation across our test dataset.
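The two comprehensions can be tried on a small stand-in batch. The three prediction rows and their labels below are invented for illustration, the integer label type is an assumption, and correct_answers is a hypothetical name for the second list.

```python
import numpy as np

# Hypothetical stand-ins: three prediction rows and their matching labels.
predictions = np.array([
    [0.0, 0.0, 0.0, 0.0004, 0.0, 0.0, 0.0, 0.9996, 0.0, 0.0],
    [0.0, 0.0002, 0.9997, 0.0001, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0001, 0.9990, 0.0002, 0.0, 0.0003, 0.0, 0.0, 0.0004, 0.0, 0.0],
])
testing_labels = np.array([7, 2, 1], dtype=np.uint8)  # assumed integer labels

# One predicted digit per row: the index of that row's largest probability.
predicted_digits = [int(np.argmax(prediction)) for prediction in predictions]

# Plain Python ints so both lists print in the same style.
correct_answers = [int(label) for label in testing_labels]

print(predicted_digits)
print(correct_answers)
```

An equivalent vectorized form is np.argmax(predictions, axis=1), which returns the same digits as a NumPy array without an explicit loop.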

Examining the first 30 predictions against their correct answers reveals perfect accuracy—every single prediction matches its corresponding label. This pattern continues through samples 30-60, maintaining flawless performance across our initial evaluation set.

Extending our analysis through samples 60-90 and 90-120 continues to show perfect accuracy: the model correctly identified all 120 examined samples. That is encouraging, but 120 samples are a small fraction of the 10,000-image test set, so this spot check suggests strong performance on our handwritten digit recognition task rather than proving it.
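Comparing windows by eye becomes tedious, so the same spot check can be sketched in code. The two lists below are made-up stand-ins, with one deliberate disagreement included only to show what a mismatch looks like; it is not a result from the lesson. The helper mismatches and the window size of 5 are assumptions for the example.

```python
# Hypothetical stand-ins; in the lesson these come from the model and the test labels.
predicted_digits = [7, 2, 1, 0, 4, 1, 4, 9, 5, 9]
correct_answers = [7, 2, 1, 0, 4, 1, 4, 9, 6, 9]


def mismatches(predicted, expected, start, stop):
    """Return (index, predicted, expected) for each disagreement in [start, stop)."""
    pairs = zip(predicted[start:stop], expected[start:stop])
    return [(i, p, e) for i, (p, e) in enumerate(pairs, start=start) if p != e]


# Walk through the samples one window at a time, as in the manual check.
window = 5
for start in range(0, len(predicted_digits), window):
    wrong = mismatches(predicted_digits, correct_answers, start, start + window)
    print(f"samples {start}-{start + window}: {len(wrong)} mismatch(es)", wrong)
```

With real data, a window size of 30 would mirror the manual comparison, and an empty list for every window corresponds to the perfect accuracy observed on the samples examined.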

These results illustrate how well a trained neural network can handle image classification tasks. In our next lesson, we'll move beyond manual spot-checking to implement comprehensive accuracy metrics that measure our model's performance across the entire test dataset. We'll also explore new problem domains and examine fine-tuning techniques, including the critical balance between optimization and overfitting that can make or break production machine learning systems.