Notes documenting various GPU hardware
| Name | TFLOPS (FP32, single precision) | TFLOPS tensor (FP16) | TFLOPS tensor (FP16, sparse) | Tensor cores | CUDA cores | RAM |
|------|--------------------------------|----------------------|------------------------------|--------------|------------|-----|
| RTX 3080 Ti | 34.1 | 136 | 273 | 320 | 10,240 | 12 GB |
| V100 (specs) | 14 | 112 | | 640 | 5,120 | 16 GB |
| RTX 3070 | 20.31 | | | 184 | 5,888 | 8 GB |
| GTX 1080 | 8.8 | | | | 2,560 | 8 GB |
| PS5 | 10.3 | | | | | |
| Xbox Series X | 12.1 | | | | | |
| PS4 | 1.8 | | | | | |
| Xbox One | 1.4 | | | | | |
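The FP32 numbers in the table can be reproduced from the core counts: peak FP32 throughput is CUDA cores x boost clock x 2 (one fused multiply-add counts as 2 FLOPs). A minimal sketch, where the boost clocks are assumed reference values (1,665 MHz for the 3080 Ti, 1,380 MHz for the V100 SXM2) and not taken from the table:

```python
def peak_fp32_tflops(cuda_cores: int, boost_ghz: float) -> float:
    """Theoretical peak single-precision TFLOPS: cores x clock x 2 FLOPs/FMA."""
    return cuda_cores * boost_ghz * 2 / 1000.0

# Assumed boost clocks; table values for core counts.
print(round(peak_fp32_tflops(10_240, 1.665), 1))  # RTX 3080 Ti -> 34.1
print(round(peak_fp32_tflops(5_120, 1.380), 1))   # V100 -> 14.1
```

The FP16-sparse column is simply double the dense tensor number (273 vs. 136 for the 3080 Ti), reflecting Ampere's 2:4 structured-sparsity speedup.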
Note that the V100 is used in the AWS p3.2xlarge instance type. The V100's numbers are in general smaller than the 3080 Ti's, and with the WSL2 TensorFlow 2.12 libraries, the 3080 Ti outperforms the V100 on the 50,000-epoch test, 736 seconds to 928 seconds; here the 3080 Ti is 26% faster. (Caveat: extremely small test set, only my ml-style-transfer code.)
(Using the Windows-native TensorFlow 2.11 libraries, the V100 outperformed the 3080 Ti on the 50,000-epoch test, 928 seconds to 1,063 seconds; here the V100 is about 15% faster, using the same convention as above of measuring relative to the faster run's time.)
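The percentages above depend on which time is taken as the baseline; this note measures the gap relative to the faster run's own time, which gives 26% for the first comparison and roughly 13-15% for the second depending on the convention. A quick check:

```python
def pct_faster(slow_s: float, fast_s: float) -> float:
    """How much faster the quicker run is, relative to its own time."""
    return (slow_s - fast_s) / fast_s * 100

print(round(pct_faster(928, 736)))   # WSL2 TF 2.12: 3080 Ti vs V100 -> 26
print(round(pct_faster(1063, 928)))  # Windows-native TF 2.11: V100 vs 3080 Ti -> 15
```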
It looks like the p3.2xlarge has been around since late 2017. It started at $3.06/hour and is still the same price today (April 2023). The V100's price seems to have dropped from about $6,000 in 2019 to about $3,500 today.
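A rough rent-vs-buy break-even follows from those two numbers (ignoring the host machine, power, and depreciation, which would shift the result):

```python
# Figures from the note above: p3.2xlarge hourly rate and current V100 price.
hourly_rate = 3.06   # USD per hour, p3.2xlarge
card_price = 3_500   # USD, approximate V100 street price

breakeven_hours = card_price / hourly_rate
print(round(breakeven_hours))  # -> 1144 hours (~48 days of continuous use)
```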
Node Replacement Factor (NRF) – NVIDIA documentation