Teraflops Comparison

Documenting the specs of various GPU hardware.

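| Name | TFLOPS (single precision) | TFLOPS tensor perf (FP16) | TFLOPS (FP16-Sparse) | Tensor cores | CUDA cores | RAM |
|---|---|---|---|---|---|---|
| RTX 3080Ti | 34.1 | 136 | 273 | 320 | 10,240 | 12 GB |
| V100 (specs) | 14 | 112 | | 640 | 5,120 | 16 GB |
| RTX 3070 | 20.31 | | | 184 | 5,888 | 8 GB |
| GTX 1080 | 8.8 | | | | 2,560 | 8 GB |
| PS5 | 10.3 | | | | | |
| Xbox X | 12.1 | | | | | |
| PS4 | 1.8 | | | | | |
| Xbox One | 1.4 | | | | | |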

Note that the V100 is used in the AWS p3.2xlarge instance type. The V100's TFLOPS numbers are generally lower than the 3080Ti's, and with the WSL2 tensorflow 2.12 libraries the 3080Ti out-performs the V100 on the 50,000 epoch test, 736 seconds to 928 seconds; here the 3080Ti is 26% faster. (Caveat – extremely small test set – only my ml-style-transfer code.)

(Using the “Windows Native tensorflow 2.11” libraries, the V100 out-performed the 3080Ti on the 50,000 epoch test, 928 seconds to 1063 seconds; here the V100 takes about 13% less time, which works out to about 14.5% faster by run-time ratio.)
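A "percent faster" figure can mean either the ratio of run times or the share of wall-clock time saved. Here is a minimal sketch of both, using only the timings quoted above; the function names are my own, chosen for illustration.

```python
def speedup_pct(slow_s: float, fast_s: float) -> float:
    """Percent faster by run-time ratio (more work done per second)."""
    return (slow_s / fast_s - 1.0) * 100.0


def time_saved_pct(slow_s: float, fast_s: float) -> float:
    """Percent less wall-clock time taken by the faster run."""
    return (1.0 - fast_s / slow_s) * 100.0


# Timings in seconds from the 50,000 epoch test, as (slower, faster).
runs = {
    "WSL2 tensorflow 2.12 (3080Ti ahead)": (928, 736),
    "Windows native tensorflow 2.11 (V100 ahead)": (1063, 928),
}

for label, (slow_s, fast_s) in runs.items():
    print(f"{label}: {speedup_pct(slow_s, fast_s):.1f}% faster, "
          f"{time_saved_pct(slow_s, fast_s):.1f}% less time")
```

The two measures differ noticeably for the same pair of timings, so it is worth saying which one a given percentage refers to.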

It looks like the p3.2xlarge has been around since late 2017. It started at $3.06/hour and is still the same price today (April 2023). The V100 price seems to have dropped from $6,000 in 2019 to $3,500 today.
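For a rough sense of renting versus buying, the prices above can be combined with the 928-second V100 run time. This is a back-of-the-envelope sketch only. It assumes the V100 timing was taken on a p3.2xlarge, and it ignores power, the rest of the machine, storage and data transfer. The variable names are mine.

```python
P3_2XLARGE_USD_PER_HOUR = 3.06  # on-demand price quoted above
V100_PRICE_USD = 3500           # approximate card price quoted above
V100_RUN_SECONDS = 928          # V100 time on the 50,000 epoch test

# Cost of a single test run at the hourly rate.
cost_per_run = V100_RUN_SECONDS / 3600 * P3_2XLARGE_USD_PER_HOUR

# Rental hours whose total cost equals the price of one card.
break_even_hours = V100_PRICE_USD / P3_2XLARGE_USD_PER_HOUR

print(f"Cost of one test run on p3.2xlarge: ${cost_per_run:.2f}")
print(f"Rental hours equal to one V100 card: {break_even_hours:.0f}")
```

With these numbers, a single run costs well under a dollar, and it takes over a thousand hours of rental to match the price of the card alone.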

Node Replacement Factor (NRF) – nvidia documentation
