In this post, I will introduce throughput and compute capabilities on NVIDIA’s GPUs. The post doesn’t go into hardware details.
It may seem like common sense that half-precision floats run faster on GPUs, as this post by Intel suggests.
However, it is a different story on NVIDIA’s GPUs. For example, you may find that the GeForce 10 series delivers high GFLOPS with single-precision floats but poor GFLOPS with half-precision floats.
The GeForce 20 series, on the other hand, greatly improves half-precision performance.
In fact, the arithmetic throughput for each precision depends on the GPU’s compute capability. The version number of the compute capability identifies the features supported by the GPU hardware and is used by applications at runtime to determine which hardware features and/or instructions are available on the present GPU.
See more details in this official doc.
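To make the link between compute capability and throughput concrete, here is a minimal sketch in Python. The `THROUGHPUT` table and the `fp16_to_fp32_ratio` helper are my own illustrative names, not part of any NVIDIA API. The numbers are results per clock cycle per multiprocessor for add/multiply/multiply-add as I recall them from the arithmetic instruction table in the CUDA C++ Programming Guide, so please treat them as assumptions and check them against the current guide. I also assume that the GeForce 10 series corresponds to compute capability 6.1 and the GeForce 20 series to 7.5.

```python
from fractions import Fraction

# Assumed values (results per clock cycle per multiprocessor) for
# add / multiply / multiply-add, keyed by (major, minor) compute capability.
# Recalled from the CUDA C++ Programming Guide; verify before relying on them.
THROUGHPUT = {
    (6, 0): {"fp16": 128, "fp32": 64},   # e.g. Tesla P100
    (6, 1): {"fp16": 2,   "fp32": 128},  # e.g. GeForce 10 series
    (7, 0): {"fp16": 128, "fp32": 64},   # e.g. Tesla V100
    (7, 5): {"fp16": 128, "fp32": 64},   # e.g. GeForce 20 series
}


def fp16_to_fp32_ratio(major, minor):
    """Return FP16 throughput relative to FP32 for a compute capability."""
    entry = THROUGHPUT[(major, minor)]
    return Fraction(entry["fp16"], entry["fp32"])


if __name__ == "__main__":
    for major, minor in sorted(THROUGHPUT):
        ratio = fp16_to_fp32_ratio(major, minor)
        print(f"compute capability {major}.{minor}: FP16/FP32 = {ratio}")
```

Under these assumed numbers, half precision runs at 1/64 of the single-precision rate on compute capability 6.1 but at twice the single-precision rate on 7.5, which matches the gap between the GeForce 10 and 20 series described above.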