New features of NVIDIA's Tesla K20 GPU promise fast, efficient performance
NVIDIA is gearing up for the release of its new Tesla K20 GPU, which should start shipping at the end of this year. Unlike the currently available K10, which is designed for single-precision computing, the K20 is optimized for parallel computing and introduces two new features: Hyper-Q and Dynamic Parallelism.
The Hyper-Q feature allows the GPU to handle up to 32 Message Passing Interface (MPI) processes simultaneously, whereas earlier GPU models could handle only one MPI process at a time. In recent tests, NVIDIA engineers saw a 2.5-fold speedup with Hyper-Q while running molecular simulation code.
Without this new feature, MPI jobs ran little faster on the GPU than on CPUs alone, and the GPU was vastly underutilized. With Hyper-Q turned on, the GPU works at a much higher capacity.
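To give a rough feel for the idea, here is a minimal CUDA sketch. It is not NVIDIA's benchmark code. It assumes that 32 CUDA streams inside a single process can stand in for 32 separate MPI processes, and the kernel name `scale`, the buffer size, and the `CHECK` macro are all made up for illustration. Each stream submits a small kernel that cannot fill the GPU on its own; on hardware with multiple work queues, such independent launches are able to run side by side instead of waiting in a single line.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative error-checking helper (not part of the CUDA API).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s\n",                       \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical small kernel: one launch is too small to occupy the whole GPU.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int kStreams = 32;   // assumption: one stream per simulated MPI process
    const int kN = 4096;       // elements per stream
    const size_t bytes = kN * sizeof(float);

    std::vector<float> input(kN, 1.0f);
    cudaStream_t streams[kStreams];
    float *dev[kStreams];

    // Give every simulated process its own stream and device buffer.
    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaStreamCreate(&streams[s]));
        CHECK(cudaMalloc(&dev[s], bytes));
        CHECK(cudaMemcpy(dev[s], input.data(), bytes, cudaMemcpyHostToDevice));
    }

    // Submit all the small kernels without waiting in between. The streams
    // are independent, so the hardware is free to overlap them.
    for (int s = 0; s < kStreams; ++s) {
        scale<<<(kN + 255) / 256, 256, 0, streams[s]>>>(dev[s], kN, 2.0f);
    }
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    // Copy results back and confirm every element was doubled.
    bool ok = true;
    std::vector<float> result(kN);
    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaMemcpy(result.data(), dev[s], bytes, cudaMemcpyDeviceToHost));
        for (int i = 0; i < kN; ++i) {
            if (result[i] != 2.0f) ok = false;
        }
        CHECK(cudaFree(dev[s]));
        CHECK(cudaStreamDestroy(streams[s]));
    }
    printf("%s\n", ok ? "all streams produced the expected values"
                      : "mismatch detected");
    return ok ? 0 : 1;
}
```

The program produces the same results on any CUDA-capable GPU; what differs is how much of the work overlaps. A profiler timeline is the usual way to see whether the kernels actually ran concurrently.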
The second new feature available in the K20, Dynamic Parallelism, allows the GPU to distribute work among its cores rather than sending requests back to the CPU each time subsequent calculations need to be made. The GPU launches new threads as needed on its own, thus freeing the CPUs to perform other work.
Because data no longer has to be transferred continually between the GPU and CPU, NVIDIA's testing shows that performance doubles. CUDA code can also shrink to half its size, since there is less GPU-CPU communication to manage.
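As an illustration only, the sketch below shows the basic pattern: a kernel that launches further kernels from the GPU itself. The kernel names `parent` and `child`, the segment sizes, and the `CHECK` macro are hypothetical and not taken from NVIDIA's examples. Each parent thread looks at one segment of work and, if the segment is not empty, starts a child grid sized to fit it, with no round trip to the CPU. Device-side launches need a GPU of compute capability 3.5 or higher and relocatable device code, for example `nvcc -rdc=true -arch=<your GPU architecture> file.cu -lcudadevrt`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative error-checking helper (not part of the CUDA API).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s\n",                       \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Child kernel: fills one segment of the output with its global indices.
__global__ void child(int *out, int offset, int count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) out[offset + i] = offset + i;
}

// Parent kernel: one thread per segment decides on the GPU how much
// work to launch, instead of reporting back to the CPU.
__global__ void parent(int *out, const int *counts, const int *offsets,
                       int segments) {
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s < segments && counts[s] > 0) {
        int threads = 128;
        int blocks = (counts[s] + threads - 1) / threads;
        child<<<blocks, threads>>>(out, offsets[s], counts[s]);
    }
}

int main() {
    // Hypothetical uneven workload: four segments, one of them empty.
    const int segments = 4;
    std::vector<int> counts  = {100, 0, 300, 50};
    std::vector<int> offsets = {0, 100, 100, 400};
    const int total = 450;

    int *d_out, *d_counts, *d_offsets;
    CHECK(cudaMalloc(&d_out, total * sizeof(int)));
    CHECK(cudaMalloc(&d_counts, segments * sizeof(int)));
    CHECK(cudaMalloc(&d_offsets, segments * sizeof(int)));
    CHECK(cudaMemset(d_out, 0xFF, total * sizeof(int)));  // fill with -1
    CHECK(cudaMemcpy(d_counts, counts.data(), segments * sizeof(int),
                     cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_offsets, offsets.data(), segments * sizeof(int),
                     cudaMemcpyHostToDevice));

    // A single launch from the CPU; the GPU creates the rest of the work.
    parent<<<1, 32>>>(d_out, d_counts, d_offsets, segments);
    CHECK(cudaGetLastError());
    // The parent grid only finishes after all of its child grids finish.
    CHECK(cudaDeviceSynchronize());

    std::vector<int> out(total);
    CHECK(cudaMemcpy(out.data(), d_out, total * sizeof(int),
                     cudaMemcpyDeviceToHost));
    bool ok = true;
    for (int i = 0; i < total; ++i) {
        if (out[i] != i) ok = false;
    }
    printf("%s\n", ok ? "child grids filled every element"
                      : "mismatch detected");

    CHECK(cudaFree(d_out));
    CHECK(cudaFree(d_counts));
    CHECK(cudaFree(d_offsets));
    return ok ? 0 : 1;
}
```

Without device-side launches, the CPU would have to read the segment sizes back and issue each child launch itself, which is the kind of back-and-forth this feature is meant to avoid.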
For more information on Hyper-Q and Dynamic Parallelism, along with performance graphs and more detailed examples, please visit NVIDIA's blog posts on these subjects:
If you have any questions about this up-and-coming product, please contact us at