How do we interpret the outputs of a neural network trained on classification?
Author(s)
Xie, Yudi
Download
YudiXie_How_do_we_interpret_the_outputs_of_a_neural_network_trained_on_classification.pdf (2.522 MB)
Terms of use
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Abstract
Deep neural networks are widely used for classification tasks, but the interpretation of their output activations is often unclear. This tutorial article explains how these outputs can be understood as approximations of the Bayesian posterior. We show that, in theory, the loss function for classification tasks, derived by maximum likelihood, is minimized by the Bayesian posterior. We conduct empirical studies training neural networks to classify synthetic data from a known generative model. In a simple classification task, the network closely approximates the theoretically derived posterior. However, a few changes to the task can make an accurate approximation much more difficult. The ability of the networks to approximate the posterior depends on multiple factors, such as the complexity of the posterior and whether there is sufficient data for learning.
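
The abstract's theoretical claim can be made concrete with a short derivation. A minimal sketch in standard notation (the article's own notation may differ): let $p(y \mid x)$ denote the true Bayesian posterior over labels and $q_\theta(y \mid x)$ the network's softmax output. The expected maximum-likelihood (cross-entropy) loss then decomposes as

\mathbb{E}_{p(x,y)}\bigl[-\log q_\theta(y \mid x)\bigr]
  = \mathbb{E}_{p(x)}\Bigl[ H\bigl(p(\cdot \mid x)\bigr)
  + D_{\mathrm{KL}}\bigl(p(\cdot \mid x)\,\Vert\, q_\theta(\cdot \mid x)\bigr) \Bigr].

The entropy term does not depend on $\theta$, and the KL divergence is nonnegative and vanishes exactly when the two distributions coincide, so the loss is minimized by $q_\theta(y \mid x) = p(y \mid x)$, the Bayesian posterior.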
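
A minimal empirical sketch in the same spirit, assuming a deliberately simple generative model (two equally likely classes with 1-D Gaussian likelihoods; the means, network size, and training details below are illustrative choices, not the article's exact setup):

    # Illustrative sketch (assumed setup, not the article's exact experiment):
    # train a small classifier on synthetic data from a known generative model,
    # then compare its softmax outputs with the analytic Bayesian posterior.
    import numpy as np
    import torch
    import torch.nn as nn

    rng = np.random.default_rng(0)

    # Generative model: two equally likely classes, 1-D Gaussian likelihoods.
    MU = np.array([-1.0, 1.0])   # class-conditional means (assumed values)
    SIGMA = 1.0

    def sample(n):
        y = rng.integers(0, 2, size=n)
        x = rng.normal(MU[y], SIGMA)
        return x.astype(np.float32), y

    def true_posterior(x):
        # p(y=1 | x) by Bayes' rule, with equal priors and equal variances.
        l0 = np.exp(-0.5 * ((x - MU[0]) / SIGMA) ** 2)
        l1 = np.exp(-0.5 * ((x - MU[1]) / SIGMA) ** 2)
        return l1 / (l0 + l1)

    x_train, y_train = sample(10_000)
    X = torch.from_numpy(x_train).unsqueeze(1)
    Y = torch.from_numpy(y_train).long()

    net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()  # the maximum-likelihood loss for classification

    for step in range(2000):
        opt.zero_grad()
        loss = loss_fn(net(X), Y)
        loss.backward()
        opt.step()

    # Compare the network's output with the analytic posterior on a grid.
    grid = np.linspace(-3, 3, 7, dtype=np.float32)
    with torch.no_grad():
        probs = torch.softmax(net(torch.from_numpy(grid).unsqueeze(1)), dim=1)[:, 1]
    for g, p_net, p_true in zip(grid, probs.numpy(), true_posterior(grid)):
        print(f"x={g:+.1f}  network={p_net:.3f}  posterior={p_true:.3f}")

In this easy setting the network column and the posterior column should agree closely after training; making the posterior more complex or shrinking the training set is the kind of change the abstract reports as degrading the approximation.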
Description
Blogposts Track. ICLR 2025, 24-28 April, Singapore.
Date issued
2025-04-28
Department
Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
Publisher
International Conference on Learning Representations
Citation
Xie, Yudi. 2025. "How do we interpret the outputs of a neural network trained on classification?"
Version: Author's final manuscript