As of November 2017, we estimate the price for one GFLOPS to be between $0.03 and $3 for single or double precision performance, using GPUs (therefore excluding some applications). Amortized over three years, this is $3.4 × 10⁻⁶ to $3.4 × 10⁻⁵ per GFLOPShour.
We have written about long-term trends and short-term trends in the costs of computing hardware. We are interested in evaluating current prices more thoroughly, both to validate the trend data and because current hardware prices are particularly important to know.
We separately investigated CPUs, GPUs, computing as a service, and supercomputers. We used somewhat different methods to estimate the price in each of these categories, based on the data available. We did not find any definitive source on the most cost-effective hardware in any category, or in general, so our examples are probably not the very cheapest. Nevertheless, these figures give a crude sense of the cost of computation in the contemporary market. Our data is here.
For CPUs and GPUs, we include only the original recommended retail price of the CPU or GPU, and not other computer components (i.e. we do not even include the cost of CPUs in the price of GPUs). In 2015 we compared prices between one complete rack server and the set of four processors inside it, and found the complete server was around 36% more expensive ($30,000 vs. $22,000). We expect this premium is representative at this scale, but that it diminishes at larger scales.
For computing services, we list the cheapest price for renting the instance for a long period, with no additional features. We do not include spot prices.
For supercomputers, we list costs cited where we could find them, which don’t tend to come with elaboration. We expect that they only include upfront costs, and that most of the costs are for hardware.
We have not included the costs of energy or other ongoing expenses in any prices. Non-energy costs are hard to find, and we suspect they are a relatively small and consistent fraction of costs. In 2015 we estimated energy costs to be around 10% of hardware costs.[1]
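As a rough check on the energy figure, the sketch below reproduces the arithmetic in the footnote to this paragraph, using the Xeon E5-2699 and Titan numbers given there. It assumes continuous operation under load for three 365-day years; the function names are ours, chosen for illustration only.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: continuous, year-round operation


def energy_cost(watts, years=3, dollars_per_kwh=0.05):
    """Electricity cost in dollars of drawing `watts` continuously for `years`."""
    kwh = watts / 1000 * HOURS_PER_YEAR * years
    return kwh * dollars_per_kwh


def energy_share(watts, hardware_price, years=3, dollars_per_kwh=0.05):
    """Energy cost as a fraction of the hardware purchase price."""
    return energy_cost(watts, years, dollars_per_kwh) / hardware_price


# Xeon E5-2699 figures from the footnote: 527.8 W under load, $5,190.
xeon_energy = energy_cost(527.8)
xeon_share = energy_share(527.8, 5190)
print(f"Xeon: ${xeon_energy:.0f} of energy over three years "
      f"({xeon_share:.1%} of hardware cost)")

# Titan figures from the footnote: about 10 million watts,
# and about $4,000 per hour of amortized hardware cost.
titan_energy_per_hour = 10_000_000 / 1000 * 0.05
titan_share = titan_energy_per_hour / 4000
print(f"Titan: ${titan_energy_per_hour:.0f} of energy per hour "
      f"({titan_share:.1%} of hourly hardware cost)")
```

Both shares come out at roughly an eighth of hardware cost, in line with the footnote's figure of about 13%.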
We are interested in empirical performance figures from benchmark tests, but often the only data we could find was for theoretical maximums. We try to use figures for LINPACK and sometimes for DGEMM benchmarks, depending on which are available. LINPACK relies heavily on DGEMM, suggesting DGEMM is fairly comparable.[2]
Graphics processing units (GPUs) and Xeon Phi machines
We collected performance and price figures from Wikipedia,[3] which are available here (see ‘Wikipedia GeForce, Radeon, Phi simplified’). These are theoretical performance figures, which we understand to generally be between somewhat optimistic and ten times too high. So this data suggests real prices of around $0.03-$0.3/GFLOPS. We collected both single and double precision figures, but the cheapest were similar.
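The step from a theoretical figure to a plausible real price range can be sketched as follows. This is only an illustration of the reasoning above: the example card is hypothetical rather than taken from the dataset, and the function names are ours.

```python
def price_per_gflops(price_dollars, peak_gflops):
    """Dollars per GFLOPS implied by a list price and a peak performance figure."""
    return price_dollars / peak_gflops


def realistic_price_range(theoretical_price, max_overstatement=10):
    """Return (low, high) $/GFLOPS, assuming real performance lies between
    the theoretical peak and `max_overstatement` times less than it."""
    return theoretical_price, theoretical_price * max_overstatement


# Hypothetical card, not from the dataset: $300 for 10,000 peak GFLOPS.
theoretical = price_per_gflops(300, 10_000)
low, high = realistic_price_range(theoretical)
print(f"theoretical: ${theoretical:.2f}/GFLOPS")
print(f"plausible real range: ${low:.2f}-${high:.2f}/GFLOPS")
```

A tenfold overstatement of performance translates directly into a tenfold higher real price, which is where the upper end of the range comes from.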
Note that GPUs are typically significantly restricted in the kinds of applications they can run efficiently; this performance is achieved for highly regular computations that can be carried out in parallel throughout a GPU (of the sort that are required for rendering scenes, but which have also proved useful in scientific computing). Xeon Phi units are similar to GPUs, and have broader application,[4] but in this dataset were not among the cheapest machines.
Central processing units (CPUs)
We looked at a small number of popular CPUs on Geekbench from the past five years, and found the cheapest to be around $0.71/GFLOPS.[5] However, there appear to be 5x disparities between different versions of Geekbench, so we do not trust these numbers a great deal (these figures are from the version we have seen to give relatively high performance figures, and thus low implied prices).
We did not investigate these numbers in great depth, or search far for cheaper CPUs, because CPUs seem to be expensive relative to GPUs; both this minimal investigation and our previous investigation in 2015 support this.
Computing as a service
Another way to purchase FLOPS is via virtual computers.
Amazon Elastic Compute Cloud (EC2) is a major seller of virtual computing. Based on their pricing as of October 5th, 2017, renting a c4.8xlarge instance costs $0.621 per hour (if you purchase it for three years, and pay upfront).
According to a Geekbench report from 2015, a c4.8xlarge instance delivers around 97.5 GFLOPS.[6] We do not know if ‘c4.8xlarge’ referred to the same computing hardware in 2015, and we do know that the current version of Geekbench gives substantially different answers from the version used in that report. However, we estimate that the hardware should be less than twice as good as it was, and Geekbench seems unlikely to underestimate performance by more than an order of magnitude.
This implies that a GFLOPShour costs about $6.3 × 10⁻³ ($0.621 per hour divided by 97.5 GFLOPS), or optimistically as little as $3.2 × 10⁻⁴ (if real performance is twenty times higher: up to twice from better hardware and up to ten times from Geekbench underestimating). This is much higher than a GPU, at $3.4 × 10⁻⁶ per GFLOPShour, if we suppose the hardware is used over around three years. Amazon is probably not the cheapest provider of cloud computing; however, the difference seems to be something like a factor of two,[7] which is not enough to make cloud computing competitive with GPUs.
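The cloud figures above can be reproduced with a few lines of arithmetic. The sketch below uses only the EC2 price and Geekbench score quoted in the text; the function and variable names are ours, and small differences from the quoted figures are due to rounding.

```python
def price_per_gflops_hour(dollars_per_hour, gflops):
    """Rental price per GFLOPS of measured performance, per hour."""
    return dollars_per_hour / gflops


EC2_PRICE_PER_HOUR = 0.621  # c4.8xlarge, three years paid upfront (from the text)
GEEKBENCH_GFLOPS = 97.5     # multi-core DGEMM score reported in 2015 (from the text)

# Central estimate: take the benchmark figure at face value.
central = price_per_gflops_hour(EC2_PRICE_PER_HOUR, GEEKBENCH_GFLOPS)

# Optimistic bound: hardware up to 2x better than in 2015,
# and Geekbench understating performance by up to 10x.
optimistic = price_per_gflops_hour(EC2_PRICE_PER_HOUR, GEEKBENCH_GFLOPS * 2 * 10)

print(f"central:    ${central:.5f} per GFLOPShour")
print(f"optimistic: ${optimistic:.6f} per GFLOPShour")
```

The optimistic bound is simply the central estimate divided by twenty, so any error in either adjustment factor carries through proportionally.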
In sum, virtual computing appears to cost two to three orders of magnitude more than GPUs. This high price is presumably partly because there are non-hardware costs which we have not accounted for in the prices of buying hardware, but which are naturally included in the cost of renting it. However, it is unlikely that these additional costs make up a factor of one hundred to one thousand, so cloud computing does not seem competitive.
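The "two to three orders of magnitude" comparison can be checked by taking the base-10 logarithm of the price ratios. This is a minimal sketch using the per-GFLOPShour figures quoted in this section; the helper name is ours.

```python
import math

GPU = 3.4e-6         # $/GFLOPShour, GPU figure quoted in the text
CLOUD_LOW = 3.2e-4   # $/GFLOPShour, optimistic cloud figure quoted in the text
CLOUD_HIGH = 6.3e-3  # $/GFLOPShour, central cloud figure quoted in the text


def orders_of_magnitude(a, b):
    """How many factors of ten `a` is larger than `b`."""
    return math.log10(a / b)


gap_low = orders_of_magnitude(CLOUD_LOW, GPU)
gap_high = orders_of_magnitude(CLOUD_HIGH, GPU)
print(f"cloud is {gap_low:.1f} to {gap_high:.1f} orders of magnitude above GPUs")
```

The optimistic cloud price sits about two orders of magnitude above the GPU figure, and the central price a little over three.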
Supercomputing
A top supercomputer could deliver a GFLOPS for around $3 in 2017 (see Price performance trend in top supercomputers).
Tensor processing units (TPUs)
Tensor processing units appear to deliver a GFLOPS for around $1, as of February 2018. However, it is unclear how this GFLOPS is measured (e.g. whether it is single precision or double precision), which makes it somewhat harder to compare. Such a high price is also at odds with rumors we have heard that TPUs are an especially cheap source of computing, so possibly TPUs are more efficient for a particular set of applications other than the ones where most of these machines have been measured.
In 2015, we estimated GPUs to cost around $3/GFLOPS, i.e. 10-100 times more than we would currently estimate. We do not believe that there has been nearly that much improvement in the past two years, so this discrepancy must be due to error and noise. We remain uncertain about the source of all of the difference, so until we resolve that question, it is plausible that our current GPU estimate errs. If so, the price should still be no higher than $3/GFLOPS (our previous estimate, and our current estimate for supercomputer prices).
The lowest estimated GFLOPS prices we know of are $0.03-$3/GFLOPS, for GPUs and TPUs.
This is a summary of all of the prices we found:
| Type of computer | Source | Type of performance | Current price ($/GFLOPS) | Comments |
|---|---|---|---|---|
| GPUs and Xeon Phi (single precision) | Wikipedia | Theoretical peak | 0.03-0.3 | $0.03/GFLOPS is given, but is an underestimate |
| GPUs and Xeon Phi (double precision) | Wikipedia | Theoretical peak | 0.3-0.8 | Upward sloping; probably not optimized for (in GPUs) |
| Cloud | Amazon EC2 and Geekbench | Empirical | 158 | Expensive, so less relevant; shallow investigation |
| Supercomputing | Top500 and misc prices | Empirical | 2.94 | Expensive, so less relevant; shallow investigation |
| CPUs | Geekbench and misc prices | Empirical | 0.71 | Unreliable, 5x disagreements between Geekbench versions |
| TPUs | Google Cloud Platform Blog | Unclear | 0.95 | |
- The Intel Xeon E5-2699 could be bought here for $5,190 as of April 1, 2015. Its energy consumption is 527.8 watts under load, or 90.9 watts idle. Over three years under load, at $0.05/kWh, this is $694, or 13% of the hardware cost.
Titan also uses about 13% of its hardware cost in energy over three years. Titan cost about $4,000 per hour amortized over three years, and consumes about 10 million watts, at a cost of $500 per hour (assuming $0.05 per kWh).
- For instance, this presentation (page ‘Results on a single node’) reports Linpack performance of 95% and 89% of DGEMM performance for their hardware in two tests.
- Wikipedia pages: Xeon Phi, List of Nvidia Graphics Processing Units, List of AMD Graphics Processing Units
Other sources are visible in the last column of our dataset (see ‘Wikipedia GeForce, Radeon, Phi simplified’ sheet)
- “Since it was originally based on an earlier GPU design by Intel, it shares application areas with GPUs. The main difference between Xeon Phi and a GPGPU like Nvidia Tesla is that Xeon Phi, with an x86-compatible core, can, with less modification, run software that was originally targeted at a standard x86 CPU.” – Wikipedia
- See ‘Geekbench 4 History’ tab
- Geekbench Browser allows users to measure performance in FLOPS using a variety of tasks. 97.5 is the multi-core DGEMM score a user reported for c4.8xlarge. We use a multi-core score because the cost cited is for purchasing all of the cores. On other tasks, Geekbench reports scores from 46 to 199 GFLOPS.
- We wrote in 2015: “Other sources of virtual computing seem to be similarly priced. An informal comparison of computing providers suggests that on a set of “real-world java benchmarks” three providers are quite closely comparable, with all between just above Amazon’s price and just under half Amazon’s price for completing the benchmarks, across different instance sizes. This analysis also suggests Amazon is a relatively costly provider…”