Difference: FloatingPointUnit (32 vs. 33)

Revision 33 2022-11-17 - PeterSchmid

Line: 1 to 1

META TOPICPARENT	name="WebHome"

%DASHBOARD{ section="banner"

Line: 205 to 205

d- ( r1 r2 -- r3 ) subtract r2 from r1, giving r3
x* ( r1 r2 -- r3 ) multiply r1 by r2, giving r3
x/ ( r1 r2 -- r3 ) divide r1 by r2, giving the quotient r3

Added:

>
>

x. ( r -- ) display, with a trailing space, the fixed-point number r
x.n ( r n -- ) print a fixed-point number r with n fractional digits (truncated)
x#S ( n1 -- n2 ) Adds 32 comma-digits to number output
x# ( n1 -- n2 ) Adds one comma-digit to number output

Added:

>
>

d>s s>d

Words from fixpt-math-lib.fs

sqrt ( r1 -- r2 ) r2 is the square root of r1
sin
cos

Line: 234 to 241

+inf -inf

Deleted:

<
< *) fixpt-mat-lib.fs

Line: 250 to 256

;

27k 100k f|| fm. 21.3k ok.

Added:

>
> 2.2n 47k f* fm. 103u ok.

: f. ( r -- )  \ display, with a trailing space, the floating-point number r in fixed-point notation

Added:

>
> dup f0< if 45 emit fabs then dup

$3F000000 \ .5 precision 0 do $41200000 f/ \ 10.0 /

Line: 279 to 291

Performance Estimation

Changed:

<
< STM32WB @ 32 MHz

>
>

All measurements and calculations are based on the Cortex-M4F MCU STM32WB55 @ 32 MHz.

Simple test program to estimate execution time of fsin and fsqrt:

: test ( -- n ) \ test 1000 times sin return n in ms
  osKernelGetTickCount  cr

Line: 296 to 310

;

Changed:

<
< With fsin it takes about 7 ms, without about 1 ms. Therefore a fsin word takes about 6 us.

>
>

With fsin the loop takes about 7 ms for 1000 iterations, without it about 1 ms. Therefore a fsin word takes about 6 us.

With fsqrt the loop takes about 2 ms for 1000 iterations. Therefore a fsqrt word takes about 1 us or less.
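The per-call figures above come from subtracting the empty-loop time and dividing by the iteration count. A minimal sketch of that arithmetic, assuming the standard Forth word `*/` (with its double-width intermediate result) is available; `per-call-ns` is a hypothetical helper name, not something defined on this page:

```forth
\ Hypothetical helper, not from the original page.
\ t-with, t-without: loop times in ms; n: number of iterations.
: per-call-ns ( t-with t-without n -- ns )
  >r -              \ net time in ms spent in the word under test
  1000000 r> */     \ convert ms to ns, then divide by the iteration count
;

7 1 1000 per-call-ns .   \ fsin:  6000 ns, i.e. about 6 us
2 1 1000 per-call-ns .   \ fsqrt: 1000 ns, i.e. about 1 us
```

Because the tick counter has a resolution of 1 ms, results near 1 us are only rough upper bounds.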

Basic operations like f/ are defined as inline. First check fsin and f/ with the built-in disassembler:

Line: 336 to 352

  vdiv.f32 s0, s0, s1   14
  vmov tos, s0           1
  bx lr

Added:

>
>                Cycles 18

About 20 cycles (625 ns @ 32 MHz) for a division, 10 (about 300 ns) for multiplication, and 5 (about 150 ns) for +/-. vsqrt.f32 takes 14 cycles.
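The nanosecond figures follow directly from the cycle counts and the 32 MHz clock (31.25 ns per cycle). A short sketch of the conversion in standard Forth; `cycles>ns` is a hypothetical name introduced here for illustration, and the 32 MHz clock is hard-coded as an assumption:

```forth
\ Hypothetical helper, not from the original page.
\ Converts a cycle count to nanoseconds for a 32 MHz core clock.
: cycles>ns ( cycles -- ns )
  1000 32 */        \ cycles * 1000 / 32, truncated to whole ns
;

20 cycles>ns .   \ 625
10 cycles>ns .   \ 312
 5 cycles>ns .   \ 156
```

The values quoted in the text for multiplication and addition are these results rounded down to 300 ns and 150 ns.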

Added:

>
>

include /fsr/fixpt-math-lib.fs  ok.

: test-fix ( -- n ) \ test 1000 times fixed-point sin return n in ms
  osKernelGetTickCount  cr
\ pi 2e f* 1000e f/  \ 2*pi/1000
  360,0 1000,0 x/
  cr
  1000 0 do
\   2dup i 0 swap x*      2drop
\   2dup i 0 swap x* sin  2drop
    2dup i 0 swap x* sqrt  2drop
\   i .  2dup i 0 swap x* sin x.   cr
\   i .  2dup i 0 swap x* sqrt x.   cr
\   i .  2dup i 0 swap x* sin hex. hex. cr
  loop
  2drop
  osKernelGetTickCount swap -
;
test-fix .

323

Changed:

<
< About 20 cycles (625 ns @ 32 MHz) for a division, 10 (300 ns) for multiplication, and 5 (150 ns) for +/-.

>
>

With sqrt the loop takes about 323 ms for 1000 iterations, without it about 6 ms. A fixed-point sqrt therefore takes about 317 us, whereas with the FPU it takes less than 1 us. A simple fixed-point multiplication takes about 6 us (FPU: about 300 ns).

Only addition and subtraction are comparable:

see d+
080008B6: CF07  ldmia r7 { r0  r1  r2 }   1
080008B8: 1812  adds r2 r2 r0             1
080008BA: 414E  adcs r6 r1                1
080008BC: 3F04  subs r7 #4                1
080008BE: 603A  str r2 [ r7 #0 ]          1
080008C0: 4770  bx lr
                                   Cycles 5

Added:

>
>

@ -----------------------------------------------------------------------------
  Wortbirne Flag_foldable_2|Flag_inline, "f+"
f_add:
  @ ( r1 r2 -- r3 ) Add r1 to r2 giving the sum r3.
@ -----------------------------------------------------------------------------
  vmov s1, tos        1
  drop                1
  vmov s0, tos        1
  vadd.f32 s0, s1     1
  vmov tos, s0        1
  bx lr
               Cycles 5

Conclusion

As long as you only do elementary arithmetic, fixed- and floating-point have comparable execution times (although fixed-point multiplication and division are about an order of magnitude slower). For more elaborate calculations (trigonometric or exponential functions), fixed-point is at least two orders of magnitude slower.

If time is not an issue in either development or execution, you can easily do without the FPU.
