This page is a stub. It does not yet cover much of what is known on the topic.
Our understanding is that the degree to which neural networks learn human-understandable concepts is an open question.
A very incomplete list of sources on the topic:
- Acquisition of Chess Knowledge in AlphaZero (McGrath et al., 2021)
From the paper: ‘…In this work we provide evidence that human knowledge is acquired by the AlphaZero neural network as it trains on the game of chess. By probing for a broad range of human chess concepts we show when and where these concepts are represented in the AlphaZero network….’
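The probing technique the quote refers to can be sketched in miniature: train a simple classifier to predict a human concept from a network's internal activations, and take high decoding accuracy as evidence the concept is represented at that layer. AlphaZero's activations are not public, so the activations and concept labels below are synthetic stand-ins, and the probe is a plain logistic regression rather than whatever the authors used.

```python
# A minimal linear-probe sketch. Both "activations" and "concept" labels
# are synthetic stand-ins for illustration; real probing would use
# activations recorded from a trained network and human-defined labels
# (e.g. "does the side to move have a material advantage?").
import numpy as np

rng = np.random.default_rng(0)

# Fake activations: 500 positions x 64 hidden units.
acts = rng.normal(size=(500, 64))

# Fake binary concept labels that depend linearly on a few units,
# simulating a concept that IS linearly represented in this layer.
concept = (acts[:, :4].sum(axis=1) > 0).astype(float)

# Logistic-regression probe trained by gradient descent.
w = np.zeros(64)
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(acts @ w + b)))   # predicted probabilities
    w -= 0.5 * (acts.T @ (p - concept)) / len(concept)
    b -= 0.5 * (p - concept).mean()

# High accuracy suggests the concept is linearly decodable from this
# layer; near-chance accuracy would suggest it is not.
pred = (1.0 / (1.0 + np.exp(-(acts @ w + b)))) > 0.5
accuracy = (pred == concept).mean()
```

Running the probe at different layers and training checkpoints, as the paper does, shows *where* and *when* a concept becomes decodable.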
- Zoom In: An Introduction to Circuits (Olah et al., 2020)
From the paper: ‘In contrast to the typical picture of neural networks as a black box, we’ve been surprised how approachable the network is on this scale. Not only do neurons seem understandable (even ones that initially seemed inscrutable), but the “circuits” of connections between them seem to be meaningful algorithms corresponding to facts about the world. You can watch a circle detector be assembled from curves. You can see a dog head be assembled from eyes, snout, fur and tongue. You can observe how a car is composed from wheels and windows. You can even find circuits implementing simple logic: cases where the network implements AND, OR or XOR over high-level visual features.’
Featured image: from Olah et al., “Zoom In: An Introduction to Circuits”, Distill, 2020, CC-BY 4.0.
- McGrath, Thomas, Andrei Kapishnikov, Nenad Tomašev, Adam Pearce, Demis Hassabis, Been Kim, Ulrich Paquet, and Vladimir Kramnik. “Acquisition of Chess Knowledge in AlphaZero.” arXiv:2111.09259 [cs, stat], November 27, 2021. http://arxiv.org/abs/2111.09259.
- Olah, Chris, Nick Cammarata, Ludwig Schubert, Gabriel Goh, Michael Petrov, and Shan Carter. “Zoom In: An Introduction to Circuits.” Distill 5, no. 3 (March 10, 2020): e00024.001. https://doi.org/10.23915/distill.00024.001.