Traversed Edges Per Second (TEPS) is a benchmark for measuring a computer’s ability to communicate information internally. Given several assumptions, we can also estimate the human brain’s communication performance in terms of TEPS, and use this to meaningfully compare brains to computers. We estimate that (given these assumptions) the human brain performs around 0.18 – 6.4 * 10^14 TEPS. This ranges from roughly the level of the best existing supercomputers to around thirty times more.
At current prices for TEPS, we estimate that it costs around $4,700 – $170,000/hour to perform at the level of the brain. Our best guess is that ‘human-level’ TEPS performance will cost less than $100/hour in seven to thirteen years, though this is highly uncertain.
Motivation: why measure the brain in TEPS?
Why measure communication?
Performance benchmarks such as floating point operations per second (FLOPS) and millions of instructions per second (MIPS) mostly measure how fast a computer can perform individual operations. However a computer also needs to move information around between the various components performing operations.1 This communication takes time, space and wiring, and so can substantially affect overall performance of a computer, especially on data intensive applications. Consequently when comparing computers it is useful to have performance metrics that emphasize communication as well as ones that emphasize computation. When comparing computers to the brain, there are further reasons to be interested in communication performance, as we shall see below.
Communication is a plausible bottleneck for the brain
In modern high performance computing, communication between and within processors and memory is often a significant cost.2 3 4 5 Our impression is that in many applications it is more expensive than performing individual bit operations, making operations per second a less relevant measure of computing performance.
We should expect computers to become increasingly bottlenecked on communication as they grow larger, for theoretical reasons. If you scale up a computer, it requires linearly more processors, but superlinearly more connections for those processors to communicate with one another quickly. And empirically, this is what happens: the computers which prompted the creation of the TEPS benchmark were large supercomputers.
It’s hard to estimate the relative importance of computation and communication in the brain. But there are some indications that communication is an important expense for the human brain as well. A substantial part of the brain’s energy is used to transmit action potentials along axons rather than to do non-trivial computation.6 Our impression is also that the parts of the brain responsible for communication (e.g. axons) comprise a substantial fraction of the brain’s mass. That substantial resources are spent on communication suggests that communication is high value on the margin for the brain. Otherwise, resources would likely have been directed elsewhere during our evolutionary history.
Today, our impression is that artificial neural networks are typically implemented on single machines because communication between processors is otherwise very expensive. But the power of individual processors is not increasing as rapidly as costs are falling, and even today it would be economical to use thousands of machines if doing so could yield human-level AI. So it seems quite plausible that communication will become a very large bottleneck as neural networks scale further.
In sum, we suspect communication is a bottleneck for the brain for three reasons: the brain is a large computer, similar computing tasks tend to be bottlenecked in this way, and the brain uses substantial resources on communication.
If communication is a bottleneck for the brain, this suggests that it will also be a bottleneck for computers with similar performance to the brain. It does not strongly imply this: a different kind of architecture might be bottlenecked by different factors.
Cost-effectiveness of measuring communication costs
It is much easier to estimate communication within the brain than to estimate computation. This is because action potentials seem to be responsible for most of the long-distance communication7, and their information content is relatively easy to quantify. It is much less clear how many ‘operations’ are being done in the brain, because we don’t know in detail how the brain represents the computations it is doing.
Another issue that makes computing performance relatively hard to evaluate is the potential for custom hardware. If someone wants to do a lot of similar computations, it is possible to design custom hardware which computes much faster than a generic computer. This could happen with AI, making timing estimates based on generic computers too late. Communication may also be improved by appropriate hardware, but we expect the performance gains to be substantially smaller. We have not investigated this question.
Measuring the brain in terms of communication is especially valuable because it is a relatively independent complement to estimates of the brain’s performance based on computation. Moravec, Kurzweil, and Sandberg and Bostrom have all estimated the brain’s computing performance, and used this to deduce AI timelines. We don’t know of estimates of the total communication within the brain, or the cost of programs with similar communication requirements on modern computers. These are an important and complementary aspect of the cost of ‘human-level’ computing hardware.
Traversed edges per second (TEPS) is a metric that was recently developed to measure communication costs, which were seen as neglected in high performance computing.8 The TEPS benchmark measures the time required to perform a breadth-first search on a large random graph, requiring propagating information across every edge of the graph (either by accessing memory locations associated with different nodes, or communicating between different processors associated with different nodes).9 You can read about the benchmark in more detail at the Graph 500 site.
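To make the idea of the benchmark concrete, here is a minimal sketch of a TEPS-style measurement: time a breadth-first search and divide the number of edges in the searched component by the elapsed time. It is an illustration only. The helper names (`random_graph`, `bfs_parents`, `measure_teps`) are hypothetical, the graph uses uniformly random endpoints as a simplifying assumption rather than the generator used by the real benchmark, and a single-threaded Python loop says nothing about what real hardware achieves.

```python
import random
import time
from collections import deque


def random_graph(num_nodes, edge_factor, seed=0):
    """Build an undirected random graph as adjacency lists.

    Assumption: endpoints are chosen uniformly at random, which is a
    simplification and not the generator the Graph 500 benchmark uses.
    """
    rng = random.Random(seed)
    adjacency = [[] for _ in range(num_nodes)]
    for _ in range(num_nodes * edge_factor):
        u = rng.randrange(num_nodes)
        v = rng.randrange(num_nodes)
        adjacency[u].append(v)
        adjacency[v].append(u)
    return adjacency


def bfs_parents(adjacency, root):
    """Breadth-first search from root.

    Returns the parent array and the number of undirected edges in the
    component that was searched.
    """
    parent = [-1] * len(adjacency)
    parent[root] = root
    queue = deque([root])
    endpoint_visits = 0  # each undirected edge is seen from both of its ends
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            endpoint_visits += 1
            if parent[v] == -1:
                parent[v] = u
                queue.append(v)
    return parent, endpoint_visits // 2


def measure_teps(adjacency, root):
    """Time one search and return (edges per second, edges, parent array)."""
    start = time.perf_counter()
    parent, edges = bfs_parents(adjacency, root)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return edges / elapsed, edges, parent


if __name__ == "__main__":
    graph = random_graph(num_nodes=100_000, edge_factor=16)
    teps, edges, _ = measure_teps(graph, root=0)
    print(f"traversed {edges} edges at {teps:.3g} TEPS")
```

The edge factor of 16 mirrors the ‘edgefactor’ setting mentioned in the notes; everything else about the sketch is chosen for brevity.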
TEPS as a meaningful way to compare brains and computers
Basic outline of how to measure a brain in TEPS
Though a brain cannot run the TEPS benchmark, we can roughly assess the brain’s communication ability in terms of TEPS. The brain is a large network of neurons, so we can ask how many edges between the neurons (synapses) are traversed (transmit signals) every second. This is equivalent to TEPS performance in a computer in the sense that the brain is sending messages along edges in a graph. However it differs in other senses. For instance, a computer with a certain TEPS performance can represent many different graphs and transmit signals in them, whereas we at least do not know how to use the brain so flexibly. This calculation also makes various assumptions, to be discussed shortly.
One important interpretation of the brain’s TEPS performance calculated in this way is as a lower bound on communication ability needed to simulate a brain on a computer to a level of detail that included neural connections and firing. The computer running the simulation would need to be traversing this many edges per second in the graph that represented the brain’s network of neurons.
Most relevant communication is between neurons
The brain could be simulated at many levels of detail. For instance, in the brain, there is both communication between neurons and communication within neurons. We are considering only communication between neurons. This means we might underestimate communication taking place in the brain.
Our impression is that essentially all long-distance communication in the brain takes place between neurons, and that such long-distance communication is a substantial fraction of the brain’s communication. The reasons for expecting communication to be a bottleneck—that the brain spends much matter and energy on it; that it is a large cost in large computers; and that algorithms which seem similar to the brain tend to suffer greatly from communication costs—also suggest that long distance communication alone is a substantial bottleneck.
Traversing an edge is relevantly similar to spiking
We are assuming that a computer traversing an edge in a graph (as in the TEPS benchmark) is sufficient to functionally replicate a neuron spiking. This might not be true, for instance if the neuron spike sends more information than the edge traversal. This might happen if there were more perceptibly different times each second at which the neuron could send a signal. We could usefully refine the current estimate by measuring the information contained in neuron spikes and traversed edges.10
Distributions of edges traversed don’t make a material difference
The distribution of edges traversed in the brain is presumably quite different from the one used in the TEPS benchmark. We are ignoring this, assuming that it doesn’t make a large difference to the number of edges that can be traversed. This might not be true, if for instance the ‘short’ connections in the brain are used more often. We know of no particular reason to expect this, but it would be a good thing to check in future.
Graph characteristics are relevantly similar
Graphs vary in how many nodes they contain, how many connections exist between nodes, and how the connections are distributed. If these parameters are quite different for the brain and the computers tested on the TEPS benchmark, we should be more wary interpreting computer TEPS performance as equivalent to what the brain does. For instance, if the brain consisted of a very large number of nodes with very few connections, and computers could perform at a certain level on much smaller graphs with many connections, then even if the computer could traverse as many edges per second, it may not be able to carry out the edge traversals that the brain is doing.
However graphs with different numbers of nodes are more comparable than they might seem. Ten connected nodes with ten links each can be treated as one node with around ninety links. The links connecting the ten nodes are a small fraction of those acting as outgoing links, so whether the central ‘node’ is really ten connected nodes should make little difference to a computer’s ability to deal with the graph. The most important parameters are the number of edges and the number of times they are traversed.
We can compare the characteristics of brains and graphs in the TEPS benchmark. The TEPS benchmark uses graphs with up to 2 * 10^12 nodes,11 while the human brain has around 10^11 nodes (neurons). Thus the human brain is around twenty times smaller (in terms of nodes) than the largest graphs used in the TEPS benchmark.
The brain contains many more links than the TEPS benchmark graphs. TEPS graphs appear to have average degree 32 (that is, each node has 32 links on average),12 while the brain apparently has average degree around 3,600 – 6,400.13
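The node and degree comparison can be reproduced with a few lines of arithmetic. This is a sketch using only the figures quoted in the text and notes (problem scale 41, ‘edgefactor’ 16, 10^11 neurons, 1.8 – 3.2 * 10^14 synapses); the variable names are our own.

```python
# Figures quoted in the text and notes; variable names are illustrative.
BRAIN_NEURONS = 1e11
BRAIN_SYNAPSES = (1.8e14, 3.2e14)   # low and high estimates
GRAPH500_SCALE = 41                 # log2 of the largest vertex count
GRAPH500_EDGEFACTOR = 16            # described as half of average degree

graph500_nodes = 2 ** GRAPH500_SCALE
graph500_degree = 2 * GRAPH500_EDGEFACTOR

# Each synapse touches two neurons, so it counts toward the degree of both.
brain_degree = tuple(2 * s / BRAIN_NEURONS for s in BRAIN_SYNAPSES)

node_ratio = graph500_nodes / BRAIN_NEURONS

print(f"largest benchmark graph: {graph500_nodes:.2g} nodes")
print(f"benchmark graph is {node_ratio:.0f}x larger in nodes than the brain")
print(f"benchmark average degree: {graph500_degree}")
print(f"brain average degree: {brain_degree[0]:.0f} to {brain_degree[1]:.0f}")
```

The ratio comes out at about 22, which the text rounds to twenty.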
The distributions of connections in the brain and in the TEPS benchmark are probably different. Both are small-world distributions, with some highly connected nodes and some sparsely connected nodes, though we haven’t compared them in depth. The TEPS graphs are produced randomly, which should be a particularly difficult case for traversing edges in them (according to our understanding). If the brain has more local connections, traversing edges in it should be somewhat easier.
We expect the distribution of connections to make a small difference. In general, the time required to do a breadth first search depends linearly on the number of edges, and doesn’t depend on degree. The TEPS benchmark is essentially a breadth first search, so we should expect it to basically have this character. However in a physical computer, degree probably matters somewhat. We expect that in practice the cost scales with edges * log(edges), because the difficulty of traversing each edge should scale with log(edges) as edges become more complex to specify. A graph with more local connections and fewer long-distance connections is much like a smaller graph, so that too should not change difficulty much.
How many TEPS does the brain perform?
We can calculate TEPS performed by the brain as follows:
TEPS = synapse-spikes/second in the brain
= Number of synapses in the brain * Average spikes/second in synapses
≈ Number of synapses in the brain * Average spikes/second in neurons
= 1.8-3.2 x 10^14 * 0.1-2
= 0.18 – 6.4 * 10^14
That is, the brain performs at around 18-640 trillion TEPS.
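The calculation above is an interval product: the low end multiplies the low synapse count by the low firing rate, and the high end multiplies the two high values. A minimal sketch, with variable names of our own choosing:

```python
# Inputs from the calculation above (low, high).
SYNAPSES = (1.8e14, 3.2e14)          # synapses in the brain
SPIKES_PER_SECOND = (0.1, 2.0)       # average firing rate per neuron

# Low end uses both low inputs; high end uses both high inputs.
teps_low = SYNAPSES[0] * SPIKES_PER_SECOND[0]
teps_high = SYNAPSES[1] * SPIKES_PER_SECOND[1]

print(f"brain TEPS: {teps_low:.2g} to {teps_high:.2g}")
print(f"in trillions: {teps_low / 1e12:.0f} to {teps_high / 1e12:.0f}")
```

Note that almost all of the width of the range comes from the firing rate, which spans a factor of twenty, rather than from the synapse count, which spans less than a factor of two.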
Note that the average firing rate of neurons is not necessarily equal to the average firing rate in synapses, even though each spike involves both a neuron and synapses. Neurons have many synapses, so if neurons that fire faster tend to have more or fewer synapses than slower neurons, the average rates will diverge. We are assuming here that average rates are similar. This could be investigated further.
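A toy example shows how the two averages can diverge. The population below is entirely hypothetical, invented for illustration, and is not a claim about real neurons: two sparsely connected slow neurons and one densely connected fast one.

```python
def neuron_average_rate(neurons):
    """Mean firing rate with every neuron weighted equally."""
    return sum(rate for _, rate in neurons) / len(neurons)


def synapse_average_rate(neurons):
    """Mean firing rate with every synapse weighted equally."""
    total_synapses = sum(count for count, _ in neurons)
    return sum(count * rate for count, rate in neurons) / total_synapses


# Hypothetical population: (synapse count, spikes per second).
neurons = [(1000, 0.5), (1000, 0.5), (8000, 2.0)]

print(f"per-neuron average:  {neuron_average_rate(neurons):.2f} spikes/s")
print(f"per-synapse average: {synapse_average_rate(neurons):.2f} spikes/s")
```

Here the per-synapse average is higher than the per-neuron average because the fast neuron owns most of the synapses. If synapse count and firing rate were uncorrelated, the two averages would agree, which is the assumption made in the estimate.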
For comparison, the highest TEPS performance by a computer is 2.3 * 10^13 TEPS (23 trillion TEPS)14, which according to the above figures is within the plausible range of brains (at the very lower end of the range).
That the brain performs at around 18-640 trillion TEPS means that if communication is in fact a major bottleneck for brains, and also for computer hardware functionally replicating brains, then existing hardware can probably already perform at the level of a brain, or at least at one thirtieth of that level.
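The "one thirtieth" figure can be checked by dividing the best measured computer performance by each end of the brain estimate. This sketch uses the Sequoia figure from the notes (23,751 GTEPS); the names are illustrative.

```python
SEQUOIA_TEPS = 23_751 * 1e9            # 23,751 GTEPS, from the notes
BRAIN_TEPS_RANGE = (0.18e14, 6.4e14)   # low and high brain estimates

# Fraction of brain-level performance the computer reaches at each end.
fraction_of_low = SEQUOIA_TEPS / BRAIN_TEPS_RANGE[0]
fraction_of_high = SEQUOIA_TEPS / BRAIN_TEPS_RANGE[1]

print(f"vs low brain estimate:  {fraction_of_low:.2f}x")
print(f"vs high brain estimate: 1/{1 / fraction_of_high:.0f}")
```

Against the low estimate the computer already exceeds the brain; against the high estimate it reaches a little under one twenty-seventh, which the text rounds to one thirtieth.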
Cost of ‘human-level’ TEPS performance
We can also calculate the price of a machine equivalent to a brain in TEPS performance, given current prices for TEPS:
Price of brain-equivalence = TEPS performance of brain * price of TEPS
= TEPS performance of brain/billion * price of GTEPS
= 0.18 – 6.4 * 10^14/10^9 * $0.26/hour
= $0.047 – 1.7 * 10^5/hour
= $4,700 – $170,000/hour
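The same price calculation as a short sketch, with a hypothetical helper `brain_price_per_hour`; the $0.26 per GTEPS-hour figure is the one used above.

```python
PRICE_PER_GTEPS_HOUR = 0.26            # dollars per GTEPS per hour, from above
BRAIN_TEPS_RANGE = (0.18e14, 6.4e14)   # low and high brain estimates


def brain_price_per_hour(teps, price_per_gteps_hour=PRICE_PER_GTEPS_HOUR):
    """Hourly cost of hardware delivering the given TEPS."""
    gteps = teps / 1e9  # convert TEPS to billions of TEPS
    return gteps * price_per_gteps_hour


price_low = brain_price_per_hour(BRAIN_TEPS_RANGE[0])
price_high = brain_price_per_hour(BRAIN_TEPS_RANGE[1])

print(f"${price_low:,.0f} to ${price_high:,.0f} per hour")
```

The unrounded values are about $4,680 and $166,400, which the text rounds to two significant figures.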
For comparison, supercomputers seem to cost around $2,000-40,000/hour to run, if we amortize their costs across three years.15 So the lower end of our price range for brain-level TEPS is within what people pay for computing applications (naturally, since the brain appears to be around as powerful as the largest supercomputers, in terms of TEPS). The lower end of the range is still about 1.5 orders of magnitude more than what people regularly pay for labor, though the highest paid CEOs appear to make at least $12k/hour.16
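The hourly supercomputer costs come from spreading the purchase price over three years of continuous operation. A minimal sketch under that assumption, using the build and operating costs quoted in the notes; the helper name and the choice of 24 * 365 hours per year are ours.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the machine runs continuously


def hourly_cost(build_cost, yearly_operating_cost=0.0, years=3):
    """Amortized hourly cost of a machine over the given lifetime."""
    yearly_cost = build_cost / years + yearly_operating_cost
    return yearly_cost / HOURS_PER_YEAR


# Figures quoted in the notes, in dollars.
k_build_only = hourly_cost(1e9)
k_with_operation = hourly_cost(1e9, yearly_operating_cost=1e7)
sequoia_build_only = hourly_cost(250e6)

print(f"K computer, build cost only: ${k_build_only:,.0f}/hour")
print(f"K computer, with operation:  ${k_with_operation:,.0f}/hour")
print(f"Sequoia, build cost only:    ${sequoia_build_only:,.0f}/hour")
```

With these inputs the K computer comes out at a little over $38,000/hour on build cost alone, and Sequoia at somewhat under $10,000/hour, both inside the quoted range.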
Timespan for ‘human-level’ TEPS to arrive
Our best guess is that TEPS/$ grows by a factor of ten every four years, roughly. Thus for computer hardware to compete on TEPS with a human who costs $100/hour should take about seven to thirteen years.17 We are fairly unsure of the growth rate of TEPS however.
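If the price of TEPS falls tenfold every four years, the time to reach a target price is four years times the base-ten logarithm of the ratio between the current and target prices. A short sketch of that arithmetic, with a hypothetical helper name:

```python
import math

YEARS_PER_TENFOLD = 4.0    # assumed growth rate of TEPS per dollar, from above
TARGET_PRICE = 100.0       # dollars per hour


def years_until_price(current_price, target_price=TARGET_PRICE,
                      years_per_tenfold=YEARS_PER_TENFOLD):
    """Years for the hourly price to fall from current_price to target_price."""
    return years_per_tenfold * math.log10(current_price / target_price)


years_low = years_until_price(4_700)
years_high = years_until_price(170_000)

print(f"{years_low:.1f} to {years_high:.1f} years")
```

This gives roughly 6.7 to 12.9 years. The result is quite sensitive to the assumed growth rate: doubling the time per tenfold improvement doubles both ends of the range.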
- “According to Richard Murphy, a Principal Member of the Technical Staff at Sandia, “The Graph500’s goal is to promote awareness of complex data problems.” He goes on to explain, “Traditional HPC benchmarks – HPL being the preeminent – focus more on compute performance. Current technology trends have led to tremendous imbalance between the computer’s ability to calculate and to move data around, and in some sense produced a less powerful system as a result. Because “big data” problems tend to be more data movement and less computation oriented, the benchmark was created to draw awareness to the problem.”…And yet another perspective comes from Intel’s John Gustafson, a Director at Intel Labs in Santa Clara, CA, “The answer is simple: Graph 500 stresses the performance bottleneck for modern supercomputers. The Top 500 stresses double precision floating-point, which vendors have made so fast that it has become almost completely irrelevant at predicting performance for the full range of applications. Graph 500 is communication-intensive, which is exactly what we need to improve the most. Make it a benchmark to win, and vendors will work harder at relieving the bottleneck of communication.”” – Marvyn, The Case for the Graph 500 – Really Fast or Really Productive? Pick One
- “Unfortunately, due to a lack of locality, graph applications are often memory-bound on shared-memory systems or communication-bound on clusters.” – Beamer et al, Graph Algorithm Platform
- “While traditional performance benchmarks for high-performance computers measure the speed of arithmetic operations, memory access time is a more useful performance gauge for many large problems today. The Graph 500 benchmark has been developed to measure a computer’s performance in memory retrieval…Results are explained in detail in terms of the machine architecture, which demonstrates that the Graph 500 benchmark indeed provides a measure of memory access as the chief bottleneck for many applications.” Angel et al (2012), The Graph 500 Benchmark on a Medium-Size Distributed-Memory Cluster with High-Performance Interconnect
- “The Graph 500 was created to chart how well the world’s largest computers handle such data intensive workloads…In a nutshell, the Graph 500 benchmark looks at “how fast [a system] can trace through random memory addresses,” Bader said. With data intensive workloads, “the bottleneck in the machine is often your memory bandwidth rather than your peak floating point processing rate,” he added.” Jackson (2012) World’s most powerful big data machines charted on Graph 500
- “Making transistors — the tiny on-off switches of silicon chips — smaller and smaller has enabled the computer revolution and the $1 trillion-plus electronics industry. But if some smart scientist doesn’t figure out how to make copper wires better, progress could grind to a halt. In fact, the copper interconnection between transistors on a chip is now a bigger challenge than making the transistors smaller.” Takahashi (2012) Copper wires might be the bottleneck in the way of Moore’s Law
- See Lennie (2003), table 1. Spikes and resting potentials appear to make up around 40% of energy use in the brain. Around 30% of energy in spikes is spent on axons, and we suspect more of the energy on resting potentials is spent on axons. Thus we estimate that at least 10% of energy in the brain is used on communication. We don’t know a lot about the other components of energy use in this chart, so the fraction could be much higher.
- “To achieve long distance, rapid communication, neurons have evolved special abilities for sending electrical signals (action potentials) along axons. This mechanism, called conduction, is how the cell body of a neuron communicates with its own terminals via the axon. Communication between neurons is achieved at synapses by the process of neurotransmission.” – Stufflebeam (2008), Neurons, Synapses, Action Potentials and Neurotransmission
- “According to Richard Murphy, a Principal Member of the Technical Staff at Sandia, “The Graph500’s goal is to promote awareness of complex data problems.” He goes on to explain, “Traditional HPC benchmarks – HPL being the preeminent – focus more on compute performance. Current technology trends have led to tremendous imbalance between the computer’s ability to calculate and to move data around, and in some sense produced a less powerful system as a result. Because “big data” problems tend to be more data movement and less computation oriented, the benchmark was created to draw awareness to the problem.”- Marvyn, The Case for the Graph 500 – Really Fast or Really Productive? Pick One
“The Graph 500 was created to chart how well the world’s largest computers handle such data intensive workloads…In a nutshell, the Graph 500 benchmark looks at “how fast [a system] can trace through random memory addresses,” Bader said. With data intensive workloads, “the bottleneck in the machine is often your memory bandwidth rather than your peak floating point processing rate,” he added.” Jackson (2012) World’s most powerful big data machines charted on Graph 500
“While traditional performance benchmarks for high-performance computers measure the speed of arithmetic operations, memory access time is a more useful performance gauge for many large problems today. The Graph 500 benchmark has been developed to measure a computer’s performance in memory retrieval…Results are explained in detail in terms of the machine architecture, which demonstrates that the Graph 500 benchmark indeed provides a measure of memory access as the chief bottleneck for many applications.” Angel et al (2012), The Graph 500 Benchmark on a Medium-Size Distributed-Memory Cluster with High-Performance Interconnect
- From Graph 500 specifications page:
The benchmark performs the following steps:
- Generate the edge list.
- Construct a graph from the edge list (timed, kernel 1).
- Randomly sample 64 unique search keys with degree at least one, not counting self-loops.
- For each search key:
- Compute the parent array (timed, kernel 2).
- Validate that the parent array is a correct BFS [breadth first search] search tree for the given search tree.
- Compute and output performance information.
- One author personally expects this to make a difference of less than about a factor of two. He would be surprised if action potentials transferred a lot more information than edge traversals in the TEPS benchmark. Also, in general, increasing time resolution only increases the information contained in a signal logarithmically. That is, if neurons can send signals at twice as many different times, this only adds one bit of information to their message. However we have not investigated this topic.
- According to the Graph 500, 2014 list sorted by problem scale, ‘Problem scale’ refers to base two logarithm of the number of graph vertices, and the largest problem scale is 41 (for Sequoia). 2^41 = 2.2 * 10^12
- This page (section 3.4) at the Graph 500 site suggests that ‘edgefactor’ is 16 for the parameter settings they use, and that ‘edgefactor’ is half of degree. Note that our count for the ‘degree’ of a neuron also reflects both incoming and outgoing synapses.
- The brain has 1.8-3.2 x 10^14 synapses and 10^11 neurons, implying each neuron is connected to an average of 1.8-3.2 x 10^14 * 2 / 10^11 synapses, which is 3,600 – 6,400
- According to the Graph 500 November 2014 rankings, Sequoia at Lawrence Livermore National Laboratory can perform at 23,751 GTEPS.
- “The K Computer in Japan, for example, cost more than $1 billion to build and $10 million to operate each year. Livermore told us it spent roughly $250 million on Sequoia.” – Ars Technica, 2012. This makes the K computer over $38,000/hour.
“In other UK supercomputer news today Daresbury Laboratory in Cheshire has become home to the UK’s most powerful supercomputer…The cost of this system appears to be 10 times (£37.5 million) the above mentioned grant to develop the Emerald GPU supercomputer.” – Hexus, 2012. This places Blue Joule at around $2,100/hour to run. We evaluated the costs of several other supercomputers, and they fell roughly in this range.
- According to Forbes, seven CEOs earn more than $50M per year. If we assume they work 80 hour weeks and take no holidays, this is around $12k/hour
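- 4*log10(47) – 4*log10(1,700), that is, roughly 6.7 – 12.9 years: the time for the price to fall from $4,700 – $170,000/hour to $100/hour if it drops tenfold every four years.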