Difference: FloatingPointUnit (31 vs. 32)

Revision 322022-11-16 - PeterSchmid

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
%DASHBOARD{ section="banner"
Line: 274 to 274
 

Added:
>
>

Performance Estimation

STM32WB @ 32 MHz

: test ( -- n ) \ test 1000 times sin return n in ms
  osKernelGetTickCount  cr
  pi 2e f* 1000e f/  \ 2*pi/1000
  cr
  1000 0 do
    dup i s>f f*      drop
\    dup i s>f f* fsin drop
\    i .  dup i s>f f* fsin fs.   cr
\    i .  dup i s>f f* fsin hex.  cr
  loop
  drop
  osKernelGetTickCount swap -
;

With fsin it takes about 7 ms, without about 1 ms. Therefore a fsin word takes about 6 us.

Basic operations like f/ are defined as inline. First check fsin and f/ with the builtin disassembler:

see fsin
08007BE8: B500  push { lr }
08007BEA: 4630  mov r0 r6
08007BEC: F025  bl  0802D694
08007BEE: FD52
08007BF0: 4606  mov r6 r0
08007BF2: BD00  pop { pc }
 ok.
The FPU instructions are unknown to the disassembler
see f/
0800745A: EE00
0800745C: 6A90
0800745E: CF40  ldmia r7 { r6 }
08007460: EE00
08007462: 6A10
08007464: EE80
08007466: 0A20
08007468: EE10
0800746A: 6A10
0800746C: 4770  bx lr
From fpu.s on GitHub
@ -----------------------------------------------------------------------------
        Wortbirne Flag_foldable_2|Flag_inline, "f/"
f_slash:
        @ ( r1 r2 -- r3 ) Divide r1 by r2, giving the quotient r3.
@ -----------------------------------------------------------------------------
	vmov 	s1, tos                      1
	drop               ldmia r7 { r6 }   1
	vmov 	s0, tos                      1
	vdiv.f32 s0, s0, s1                  14
	vmov 	tos, s0                      1
	bx 		lr
 
About 20 cycles (625 ns @ 32 MHz) for a division, 10 (300 ns) for multiplication, and 5 (150 ns) for +/-.

 
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2025 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback