How to find out the maximum FLOPS

Author: 小怪兽狂殴奥特曼 | Published: 2024-06-19 11:49 | Views: 0

Maximum FLOPS

The maximum FLOPS (floating-point operations per second) of a GPU can be found with the following formula:

maximum_flop = CUDA_core_number * clock_speed * 2

Let's take the RTX 3070 as an example. It has two clock speeds:

  • base clock speed: 1500 MHz
  • boost clock speed: 1725 MHz

The RTX 3070 also has 5888 CUDA cores.

For single-precision (float32) arithmetic at the boost clock, maximum_flop = 1725 MHz * 5888 * 2 = 20.31 TFLOP/s.
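The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the formula: the function name `peak_flops` and the constant names are illustrative choices, and the result is a theoretical upper bound, not a measured figure.

```python
def peak_flops(cuda_cores: int, clock_hz: float, flop_per_cycle: int = 2) -> float:
    """Theoretical peak throughput in FLOP/s.

    flop_per_cycle defaults to 2 because one fused multiply-add (FMA)
    is counted as two floating-point operations.
    """
    return cuda_cores * clock_hz * flop_per_cycle


# RTX 3070 figures quoted in the text
RTX3070_CUDA_CORES = 5888
RTX3070_BASE_HZ = 1500e6   # 1500 MHz
RTX3070_BOOST_HZ = 1725e6  # 1725 MHz

boost_peak = peak_flops(RTX3070_CUDA_CORES, RTX3070_BOOST_HZ)
base_peak = peak_flops(RTX3070_CUDA_CORES, RTX3070_BASE_HZ)

print(f"boost clock: {boost_peak / 1e12:.2f} TFLOP/s")  # 20.31 TFLOP/s
print(f"base clock:  {base_peak / 1e12:.2f} TFLOP/s")   # 17.66 TFLOP/s
```

The boost clock gives the headline number; the base clock gives a more conservative bound for sustained workloads.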

Why multiply by 2?
Each CUDA core can issue one fused multiply-add (FMA) instruction per clock cycle. An FMA computes a * b + c, that is, one multiplication and one addition, so it is counted as two floating-point operations.

Maximum Bandwidth

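The maximum memory bandwidth of a GPU can be found with the following formula, where memory_clock_speed is the effective per-pin data rate in bits per second and dividing by 8 converts bits to bytes:

maximum_bandwidth = memory_clock_speed * memory_interface_width / 8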

The RTX 3070 has the following specifications:

  • memory_clock_speed: 14 Gbps (effective data rate per pin)
  • memory_interface_width: 256 bits

Then maximum_bandwidth = 14 Gbps * 256 / 8 = 448 GB/s.
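The same calculation as a small Python sketch. The function and parameter names are hypothetical, and the 14 Gbps value is assumed to be the effective data rate per pin, as memory speeds are usually quoted on spec sheets.

```python
def peak_bandwidth(data_rate_bits_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in bytes per second.

    data_rate_bits_per_s is the effective per-pin data rate;
    dividing by 8 converts bits to bytes.
    """
    return data_rate_bits_per_s * bus_width_bits / 8


# RTX 3070 figures quoted in the text
bandwidth = peak_bandwidth(14e9, 256)

print(f"{bandwidth / 1e9:.0f} GB/s")  # 448 GB/s
```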

Computing Memory Ratio

computing_memory_ratio = maximum_flop / maximum_bandwidth

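For the RTX 3070 working on 32-bit floating-point data, each value occupies 4 bytes, so the bandwidth is first converted from bytes per second to float32 values per second: computing_memory_ratio = (20.31 TFLOP/s) / (0.448 TB/s / 4) = 181.4.
This means that for each float32 value read from or written to memory, the GPU can perform about 181 floating-point operations.
Any operation that performs more than 181.4 floating-point operations per float32 value accessed is compute-bound; otherwise it is memory-bound.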
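A rough Python sketch of this classification, using the RTX 3070 numbers from the text. The helper names `machine_balance` and `classify` are hypothetical, and the two sample workloads (vector addition and a square matrix multiply with ideal data reuse) are illustrative assumptions rather than measurements.

```python
def machine_balance(peak_flops: float, peak_bandwidth: float,
                    bytes_per_element: int = 4) -> float:
    """FLOP the GPU can perform per element moved to or from memory."""
    elements_per_second = peak_bandwidth / bytes_per_element
    return peak_flops / elements_per_second


def classify(flop_count: float, elements_accessed: float, balance: float) -> str:
    """Compare an operation's FLOP-per-element ratio with the GPU's balance."""
    intensity = flop_count / elements_accessed
    return "compute-bound" if intensity > balance else "memory-bound"


PEAK_FLOPS = 5888 * 1725e6 * 2    # FLOP/s at boost clock
PEAK_BANDWIDTH = 14e9 * 256 / 8   # bytes/s

balance = machine_balance(PEAK_FLOPS, PEAK_BANDWIDTH)
print(f"ratio: {balance:.1f} FLOP per float32")  # 181.4

# Vector add c = a + b: n additions, 3n values accessed (read a, read b, write c)
n = 1 << 20
vector_add = classify(n, 3 * n, balance)

# N x N matrix multiply: 2*N^3 FLOP, and (assuming each matrix is moved
# through memory only once) 3*N^2 values accessed
N = 4096
matmul = classify(2 * N**3, 3 * N**2, balance)

print("vector add:", vector_add)  # memory-bound
print("matmul:    ", matmul)      # compute-bound
```

Vector addition performs about 0.33 operations per value, far below 181.4, while the idealized matrix multiply performs roughly 2N/3 operations per value, which is why the two land on opposite sides of the threshold.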
Reference: CUDA: From Correctness to Performance

Article link: https://www.haomeiwen.com/subject/kiuqcjtx.html