How do we interpret the outputs of a neural network trained on classification?
Author(s)
Xie, Yudi
Download
YudiXie_How_do_we_interpret_the_outputs_of_a_neural_network_trained_on_classification.pdf (2.522 MB)
Terms of use
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Abstract
Deep neural networks are widely used for classification tasks, but the interpretation of their output activations is often unclear. This tutorial article explains how these outputs can be understood as approximations of the Bayesian posterior. We show that, in theory, the loss function for classification tasks, derived by maximum likelihood, is minimized by the Bayesian posterior. We conduct empirical studies training neural networks to classify synthetic data from a known generative model. In a simple classification task, the network closely approximates the theoretically derived posterior. However, a few changes to the task can make an accurate approximation much more difficult. The ability of the networks to approximate the posterior depends on multiple factors, such as the complexity of the posterior and whether there is sufficient data for learning.
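
The abstract's theoretical claim can be made concrete with a short derivation. A minimal sketch in standard notation (the article's own notation may differ): let $p(y \mid x)$ denote the true Bayesian posterior over labels and $q_\theta(y \mid x)$ the network's softmax output. The expected maximum-likelihood (cross-entropy) loss then decomposes as

\mathbb{E}_{p(x,y)}\bigl[-\log q_\theta(y \mid x)\bigr]
  = \mathbb{E}_{p(x)}\Bigl[ H\bigl(p(\cdot \mid x)\bigr)
  + D_{\mathrm{KL}}\bigl(p(\cdot \mid x)\,\Vert\, q_\theta(\cdot \mid x)\bigr) \Bigr].

The entropy term does not depend on $\theta$, and the KL divergence is nonnegative and vanishes exactly when the two distributions coincide, so the loss is minimized by $q_\theta(y \mid x) = p(y \mid x)$, the Bayesian posterior.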
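
A minimal empirical sketch in the same spirit, assuming a deliberately simple generative model (two equally likely classes with 1-D Gaussian likelihoods; the means, network size, and training details below are illustrative choices, not the article's exact setup):

    # Illustrative sketch (assumed setup, not the article's exact experiment):
    # train a small classifier on synthetic data from a known generative model,
    # then compare its softmax outputs with the analytic Bayesian posterior.
    import numpy as np
    import torch
    import torch.nn as nn

    rng = np.random.default_rng(0)

    # Generative model: two equally likely classes, 1-D Gaussian likelihoods.
    MU = np.array([-1.0, 1.0])   # class-conditional means (assumed values)
    SIGMA = 1.0

    def sample(n):
        y = rng.integers(0, 2, size=n)
        x = rng.normal(MU[y], SIGMA)
        return x.astype(np.float32), y

    def true_posterior(x):
        # p(y=1 | x) by Bayes' rule, with equal priors and equal variances.
        l0 = np.exp(-0.5 * ((x - MU[0]) / SIGMA) ** 2)
        l1 = np.exp(-0.5 * ((x - MU[1]) / SIGMA) ** 2)
        return l1 / (l0 + l1)

    x_train, y_train = sample(10_000)
    X = torch.from_numpy(x_train).unsqueeze(1)
    Y = torch.from_numpy(y_train).long()

    net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()  # the maximum-likelihood loss for classification

    for step in range(2000):
        opt.zero_grad()
        loss = loss_fn(net(X), Y)
        loss.backward()
        opt.step()

    # Compare the network's output with the analytic posterior on a grid.
    grid = np.linspace(-3, 3, 7, dtype=np.float32)
    with torch.no_grad():
        probs = torch.softmax(net(torch.from_numpy(grid).unsqueeze(1)), dim=1)[:, 1]
    for g, p_net, p_true in zip(grid, probs.numpy(), true_posterior(grid)):
        print(f"x={g:+.1f}  network={p_net:.3f}  posterior={p_true:.3f}")

In this easy setting the network column and the posterior column should agree closely after training; making the posterior more complex or shrinking the training set is the kind of change the abstract reports as degrading the approximation.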
Description
Blogposts Track. ICLR 2025, 24-28 April, Singapore.
Date issued
2025-04-28
Department
Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
Publisher
International Conference on Learning Representations
Citation
Xie, Yudi. 2025. "How do we interpret the outputs of a neural network trained on classification?"
Version: Author's final manuscript