FLOPs: short for floating point operations (the lowercase s marks the plural). It refers to the number of floating-point operations, understood as the amount of computation, and can be used to measure the complexity of an algorithm or model: the higher the value, the higher the computational complexity of the network. FLOPs does not take into account a few important factors that have a considerable impact on speed: MAC (memory access cost), the degree of parallelism, and the platform.
FLOPS: short for floating point operations per second. It refers to the number of floating-point operations performed per second, understood as the speed of computation, and is a metric for measuring the performance of hardware. It comes in several units:
One MFLOPS (megaFLOPS) is equal to one million (= 10^6) floating-point operations per second, one GFLOPS (gigaFLOPS) is equal to one billion (= 10^9) floating-point operations per second, and one TFLOPS (teraFLOPS) is equal to one trillion (= 10^12) floating-point operations per second.
One PFLOPS (petaFLOPS) is equal to one quadrillion (= 10^15) floating-point operations per second, one EFLOPS (exaFLOPS) is equal to one quintillion (= 10^18) floating-point operations per second, and one ZFLOPS (zettaFLOPS) is equal to one sextillion (= 10^21) floating-point operations per second.
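As a small illustration of the unit ladder above, the following sketch renders a raw operations-per-second figure with the largest prefix that fits. The name format_flops and the two-decimal formatting are choices made for this example, not part of any library.

```python
# Decimal prefixes for FLOPS, from zetta (10**21) down to mega (10**6).
PREFIXES = [
    ("Z", 10**21),
    ("E", 10**18),
    ("P", 10**15),
    ("T", 10**12),
    ("G", 10**9),
    ("M", 10**6),
]


def format_flops(ops_per_second: float) -> str:
    """Render a rate with the largest prefix that fits (hypothetical helper)."""
    for symbol, factor in PREFIXES:
        if ops_per_second >= factor:
            return f"{ops_per_second / factor:.2f} {symbol}FLOPS"
    # Below one million operations per second, no prefix is applied.
    return f"{ops_per_second:.2f} FLOPS"


if __name__ == "__main__":
    print(format_flops(2.5e12))  # a rate in the tera range
    print(format_flops(4.0e6))   # a rate in the mega range
```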
GFLOPS is one of the more popular units in ***. GFLOPS (giga floating-point operations per second) counts, in billions, the floating-point operations that can be performed per second. It is a measure of the floating-point arithmetic power of a computing device, which is equivalent to its computing speed: the higher the value, the higher the hardware performance and the faster the speed. The magnitude is usually M (10^6), G (10^9), or T (10^12). For example, 9.6 GFLOPS means 9.6 G floating-point operations per second (9.6 billion floating-point operations per second).
Limitations of FLOPS: FLOPS does not fully reflect the computing performance of the hardware, because it does not capture many factors that affect execution performance, for example I/O performance, the memory architecture, cache coherence, and so on.
MACs (multiply-accumulate operations, also called multiply-add operations) are often confused with FLOPs. One MAC consists of one multiplication and one addition, so it amounts to approximately 2 FLOPs; MAC counts and FLOP counts therefore usually differ by a factor of 2. MACs and MAdds mean the same thing.
MACS, by contrast, is the number of fixed-point multiply-accumulate operations performed per second. It is a measure of a computer's fixed-point processing power and is often used for calculations that require a large number of fixed-point multiply-accumulate operations. One GMACS is equal to one billion (= 10^9) fixed-point multiply-accumulate operations per second.
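To see why the relationship between MACs and FLOPs is approximately, rather than exactly, a factor of 2, consider a dot product of length n. The sketch below counts both ways; dot_product_cost is a hypothetical helper written for this illustration.

```python
def dot_product_cost(n: int) -> dict:
    """Operation counts for a dot product of two length-n vectors."""
    macs = n                      # one multiply-accumulate per element pair
    exact_flops = n + (n - 1)     # n multiplications plus n - 1 additions
    approx_flops = 2 * macs       # the usual "1 MAC is about 2 FLOPs" rule
    return {"macs": macs, "exact_flops": exact_flops, "approx_flops": approx_flops}


if __name__ == "__main__":
    cost = dot_product_cost(1000)
    # The exact and approximate counts differ by a single addition.
    print(cost)
```

For large n the difference of one addition is negligible, which is why the 2x rule is used in practice.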
Assume a simple CNN is used to run a forward pass. The fraction of the hardware's theoretical computing power that the network actually achieves is called utilization, and it can be calculated in four steps:
1. Compute the amount of computation of the network, which is usually counted as the number of multiply-accumulate operations.
2. Measure the time it takes for the network to run.
3. Convert the multiply-accumulate count to FLOPs (about 2 FLOPs per multiply-accumulate), divide by the measured time, and divide by 10^9 to get the GFLOPS achieved by the network.
4. Divide the achieved GFLOPS by the theoretical GFLOPS of the hardware; the result is the utilization.
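The steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the multiply-accumulate count and the theoretical peak are supplied by the caller, each multiply-accumulate is counted as 2 FLOPs, and measure_seconds and utilization are hypothetical helper names. The callable passed to measure_seconds stands in for one forward pass.

```python
import time


def measure_seconds(forward, repeats: int = 10) -> float:
    """Average wall-clock time of one call to forward(), after one warm-up run."""
    forward()  # warm-up, so one-time setup cost is not measured
    start = time.perf_counter()
    for _ in range(repeats):
        forward()
    return (time.perf_counter() - start) / repeats


def utilization(total_macs: int, seconds: float, peak_gflops: float) -> float:
    """Achieved GFLOPS divided by theoretical GFLOPS."""
    # Assumption: 1 multiply-accumulate is counted as 2 floating-point operations.
    achieved_gflops = 2 * total_macs / seconds / 1e9
    return achieved_gflops / peak_gflops


if __name__ == "__main__":
    # Illustrative numbers only: 1e9 MACs in 0.5 s on hardware with a 16 GFLOPS peak.
    print(utilization(total_macs=1_000_000_000, seconds=0.5, peak_gflops=16.0))
```

On accelerators that execute asynchronously, the timing function would also need to wait for the device to finish before reading the clock; that detail is omitted here.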
Convolution operations account for more than 90% of the forward computation of the network, so the focus is on how to calculate the amount of computation of convolution. To simplify the problem, the following discussion assumes that convolution is implemented with a sliding window and ignores the overhead of nonlinear calculations. Suppose that the parameters of a convolutional layer in a CNN are: an input feature map with Cin channels, height Hin, and width Win; an output feature map with Cout channels, height Hout, and width Wout; a convolution kernel of size k; a kernel channel count equal to Cin; and a number of kernels equal to Cout. Then the amount of computation for convolving the kernels with the feature map is as follows.
Here the 1 represents the bias, the b in wx + b: each convolution kernel has one bias value, so there are Cout bias values in total. Dividing the FLOPs by 10^9 gives GFLOPs.
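As a rough sketch, one count that fits this description (Cin × k × k weight operations plus 1 bias operation per output value, over Hout × Wout positions and Cout kernels) is (Cin × k × k + 1) × Hout × Wout × Cout. This exact form is an assumption; some sources double the Cin × k × k term to count multiplications and additions separately. The function name conv_ops is hypothetical.

```python
def conv_ops(c_in: int, c_out: int, h_out: int, w_out: int, k: int,
             bias: bool = True) -> int:
    """Operation count of one convolutional layer under a sliding-window model.

    Assumed form: (Cin * k * k + 1) * Hout * Wout * Cout, where the +1 is the
    bias term of wx + b and each weight operation is counted once.
    """
    per_output_value = c_in * k * k + (1 if bias else 0)
    return per_output_value * h_out * w_out * c_out


if __name__ == "__main__":
    # Illustrative layer: 3 input channels, 64 kernels of size 3, 224 x 224 output.
    ops = conv_ops(c_in=3, c_out=64, h_out=224, w_out=224, k=3)
    print(ops, "operations")
    print(ops / 1e9, "G operations")  # divide by 10^9 for the giga unit
```

Under the 2x convention, doubling the weight term of this count gives an approximate FLOP figure for the layer.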