Notes documenting various GPU hardware
| Name | TFLOPS (FP32, single precision) | TFLOPS tensor (FP16) | TFLOPS tensor (FP16, sparse) | Tensor cores | CUDA cores | RAM |
|------|--------------------------------|----------------------|------------------------------|--------------|------------|-----|
| RTX 3080 Ti | 34.1 | 136 | 273 | 320 | 10,240 | 12 GB |
| V100 (specs) | 14 | 112 | | 640 | 5,120 | 16 GB |
| RTX 3070 | 20.31 | | | 184 | 5,888 | 8 GB |
| GTX 1080 | 8.8 | | | | 2,560 | 8 GB |
| PS5 | 10.3 | | | | | |
| Xbox Series X | 12.1 | | | | | |
| PS4 | 1.8 | | | | | |
| Xbox One | 1.4 | | | | | |
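The FP32 numbers in the table can be reproduced from the core counts: peak FP32 throughput is CUDA cores x boost clock x 2 (one fused multiply-add counts as 2 FLOPs). A minimal sketch, where the boost clocks are assumed reference values (1,665 MHz for the 3080 Ti, 1,380 MHz for the V100 SXM2) and not taken from the table:

```python
def peak_fp32_tflops(cuda_cores: int, boost_ghz: float) -> float:
    """Theoretical peak single-precision TFLOPS: cores x clock x 2 FLOPs/FMA."""
    return cuda_cores * boost_ghz * 2 / 1000.0

# Assumed boost clocks; table values for core counts.
print(round(peak_fp32_tflops(10_240, 1.665), 1))  # RTX 3080 Ti -> 34.1
print(round(peak_fp32_tflops(5_120, 1.380), 1))   # V100 -> 14.1
```

The FP16-sparse column is simply double the dense tensor number (273 vs. 136 for the 3080 Ti), reflecting Ampere's 2:4 structured-sparsity speedup.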
Note that the V100 is used in the AWS p3.2xlarge instance type. The V100's numbers are in general smaller than the 3080 Ti's, and with the WSL2 TensorFlow 2.12 libraries, the 3080 Ti outperforms the V100 on the 50,000-epoch test, 736 seconds to 928 seconds; here the 3080 Ti is 26% faster. (Caveat: extremely small test set, only my ml-style-transfer code.)
(Using the Windows-native TensorFlow 2.11 libraries, the V100 outperformed the 3080 Ti on the 50,000-epoch test, 928 seconds to 1,063 seconds; here the V100 is about 15% faster, using the same convention as above of measuring relative to the faster run's time.)
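The percentages above depend on which time is taken as the baseline; this note measures the gap relative to the faster run's own time, which gives 26% for the first comparison and roughly 13-15% for the second depending on the convention. A quick check:

```python
def pct_faster(slow_s: float, fast_s: float) -> float:
    """How much faster the quicker run is, relative to its own time."""
    return (slow_s - fast_s) / fast_s * 100

print(round(pct_faster(928, 736)))   # WSL2 TF 2.12: 3080 Ti vs V100 -> 26
print(round(pct_faster(1063, 928)))  # Windows-native TF 2.11: V100 vs 3080 Ti -> 15
```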
It looks like the p3.2xlarge has been around since late 2017. It started at $3.06/hour and is still the same price today (April 2023). The V100's price seems to have dropped from about $6,000 in 2019 to about $3,500 today.
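A rough rent-vs-buy break-even follows from those two numbers (ignoring the host machine, power, and depreciation, which would shift the result):

```python
# Figures from the note above: p3.2xlarge hourly rate and current V100 price.
hourly_rate = 3.06   # USD per hour, p3.2xlarge
card_price = 3_500   # USD, approximate V100 street price

breakeven_hours = card_price / hourly_rate
print(round(breakeven_hours))  # -> 1144 hours (~48 days of continuous use)
```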
Node Replacement Factor (NRF) – NVIDIA documentation