A comparison of high-end GTX and Tesla GPUs for double-precision computing
Many of our customers have discovered that high-end NVIDIA GTX cards, such as the GTX Titan, offer a relatively low-cost alternative to NVIDIA Tesla cards such as the K20 or K40. Like the Teslas, the Titans offer double-precision calculations with comparable TFLOPS and CUDA core counts. The difference is price: a Titan costs around $1,000, while a Tesla card can cost three or four times more.
Take the new GTX Titan Black, for example, as compared to the Tesla K20, K20X and K40. The Titan Black offers 2,880 CUDA cores, more than the K20's 2,496 and the K20X's 2,688, and exactly the same count as the K40. The new Titan also delivers approximately 1.3 peak double-precision TFLOPS, more than the K20's 1.17 TFLOPS, nearly identical to the K20X's 1.31 TFLOPS and slightly under the K40's 1.43 TFLOPS. The Titan Black's 6GB of GDDR5 memory is likewise comparable to the 5GB of the K20 and 6GB of the K20X, although it falls short of the K40's 12GB.
But all in all, the Titan sits comfortably in the middle of the specs of the various Tesla cards. So, our customers often ask, why purchase a Tesla at all? Why not just use one of these high-end GTX Titan cards for all cluster development and deployment needs?
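To make the cost question concrete, here is a quick back-of-the-envelope comparison of price per double-precision TFLOPS using the specs quoted above. The Tesla prices are rough assumptions derived from the "three or four times more" figure, not quoted list prices.

```python
# Back-of-the-envelope cost per peak FP64 TFLOPS.
# Titan price and all TFLOPS figures come from the article;
# the Tesla prices are assumed multiples of the Titan's ~$1,000.
cards = {
    # name: (price_usd, peak_fp64_tflops)
    "GTX Titan Black": (1000, 1.30),
    "Tesla K20":       (3500, 1.17),  # assumed ~3.5x Titan price
    "Tesla K40":       (4000, 1.43),  # assumed ~4x Titan price
}

for name, (price, tflops) in cards.items():
    print(f"{name:16s} ${price / tflops:,.0f} per FP64 TFLOPS")
```

On these assumptions the Titan looks several times cheaper per TFLOPS, which is exactly why the question above comes up so often; the rest of the article explains what that raw ratio leaves out.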
The answer to this question is threefold. First, and most importantly, high-end GTX GPUs, unlike the Teslas, do not use ECC (error checking and correction) memory. ECC memory includes extra bits that detect and correct memory errors, which is of paramount importance to the successful completion of long-running, double-precision code. ECC ensures that the results of a computation run on a Tesla are the same every time; the same job run on a high-end GTX card like the Titan can vary from run to run because of silent bit flips. For scientific computing, the Tesla clearly offers the better consistency.
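To illustrate what ECC memory actually does, here is a toy Python sketch of single-error correction using a Hamming(7,4) code. Real ECC DRAM uses a wider SECDED code (typically 72 bits protecting 64 bits of data), but the principle is the same: extra parity bits pinpoint a flipped bit so it can be corrected in place.

```python
def encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def decode(c):
    """Return (corrected data bits, 1-based error position or 0 if none)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # syndrome encodes the bad bit's position
    if pos:
        c[pos - 1] ^= 1         # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], pos

word = [1, 0, 1, 1]
code = encode(word)
code[4] ^= 1                    # simulate a cosmic-ray bit flip in storage
data, pos = decode(code)
assert data == word             # the flip was detected and corrected
```

Without the parity bits, the flipped bit would silently change the stored value, which is exactly how a non-ECC card can return different results from identical runs.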
Secondly, because Titans are not designed for sustained high performance computing in a dense environment, we've seen much shorter lifespans from these cards when used that way, as compared to the Teslas. One reason is cooling: Titans are actively cooled (with fans), while Teslas are passively cooled (no fans, relying on chassis airflow). Installed in the cramped space of a standard rack-mount server case, a Titan's fans are overworked, increasing the likelihood of fan failure and the subsequent overheating of the card. If several Titans fail, the total cost ends up equivalent to simply purchasing a Tesla at the outset. Additionally, NVIDIA provides full support, bug fixes and feature requests for high performance computing on all varieties of Tesla, whereas Titan cards are supported only by their third-party manufacturers.
Lastly, Tesla cards are optimized for cluster usage, including full support for InfiniBand and RDMA (remote direct memory access) to allow for high-throughput and low-latency inter-node communication. They also include built-in tools for GPU and cluster management. If a developer's intent is to use CUDA to design a program for use in a cluster, a Tesla will best support this effort.
Our opinion at this point is that while Tesla cards should be considered for any high performance computing tasks that require consistency and accuracy, GTX Titans have their place as well. Titan cards are certainly valuable for initial development efforts, when budgets are often smaller and a code's success is indeterminate. A Titan can be placed in a workstation, using a chassis large enough to facilitate the Titan's fan cooling, and used to test code; if and when the code reaches the deployment stage, a Tesla can then be employed. Please consult your Advanced Clustering sales professional at
for more information and to learn how to get one for your own use.