Line: 205 to 205
d-   ( r1 r2 -- r3 )  subtract r2 from r1, giving r3
x*   ( r1 r2 -- r3 )  multiply r1 by r2, giving r3
x/   ( r1 r2 -- r3 )  divide r1 by r2, giving the quotient r3
Added:
> > x.   ( r -- )     display, with a trailing space, the fixed-point number r
x.n  ( r n -- )   print a fixed-point number r with n fractional digits (truncated)
x#S  ( n1 -- n2 ) Adds 32 comma-digits to number output
x#   ( n1 -- n2 ) Adds one comma-digit to number output
Added:
> > d>s
s>d
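The following C sketch models what d-, x*, x/ and x.n do with the bits of a fixed-point number on a host machine. It assumes the s31.32 layout (a double cell with 32 fractional bits, as the 32 comma-digits of x#S suggest). The type fix and the functions fix_sub, fix_mul, fix_div and fix_format are hypothetical names, not Mecrisp words, and this is not the Mecrisp implementation. Multiplication is assembled from four 32x32->64 partial products and division from a 32-step shift-and-subtract loop, which hints at why these words cost more on a 32-bit core than a single FPU instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: s31.32 fixed point, a signed 64-bit value whose low 32 bits
   hold the fraction (the "32 comma-digits" of x#S). Names are hypothetical. */
typedef int64_t fix;
#define FIX_ONE ((fix)1 << 32)

static uint64_t magnitude(fix x) { return x < 0 ? -(uint64_t)x : (uint64_t)x; }

/* d-: no rescaling needed, plain 64-bit subtraction */
static fix fix_sub(fix a, fix b) { return a - b; }

/* x*: product of the magnitudes shifted right by 32, assembled from four
   32x32->64 partial products as a 32-bit core has to do it.
   The result must fit in s31.32; extra fractional bits are truncated. */
static fix fix_mul(fix a, fix b) {
    uint64_t ua = magnitude(a), ub = magnitude(b);
    uint64_t ah = ua >> 32, al = ua & 0xFFFFFFFFu;
    uint64_t bh = ub >> 32, bl = ub & 0xFFFFFFFFu;
    uint64_t r = ((ah * bh) << 32) + ah * bl + al * bh + ((al * bl) >> 32);
    return ((a < 0) != (b < 0)) ? -(fix)r : (fix)r;
}

/* x/: integer quotient first, then 32 fraction bits by shift-and-subtract */
static fix fix_div(fix a, fix b) {
    assert(b != 0);
    uint64_t ua = magnitude(a), ub = magnitude(b);
    uint64_t q = ua / ub, rem = ua % ub;
    for (int i = 0; i < 32; i++) {
        rem <<= 1;                 /* rem < ub <= 2^63, so no overflow */
        q <<= 1;
        if (rem >= ub) {
            rem -= ub;
            q |= 1u;
        }
    }
    return ((a < 0) != (b < 0)) ? -(fix)q : (fix)q;
}

/* x.n: print n fractional digits, truncated. Each multiplication by 10
   pushes the next decimal digit out of the 32-bit fraction into bits 32..35. */
static char *fix_format(fix x, int n, char *buf, size_t size) {
    assert(buf != NULL && size > 0);
    uint64_t mag = magnitude(x);
    unsigned long ipart = (unsigned long)(mag >> 32);
    uint64_t frac = mag & 0xFFFFFFFFu;
    int pos = snprintf(buf, size, "%s%lu.", x < 0 ? "-" : "", ipart);
    if (pos < 0 || (size_t)pos >= size)
        return buf;                            /* output already truncated */
    for (int i = 0; i < n && (size_t)pos + 1 < size; i++) {
        frac *= 10;
        buf[pos++] = (char)('0' + (int)(frac >> 32));
        frac &= 0xFFFFFFFFu;                   /* keep the remaining fraction */
    }
    buf[pos] = '\0';
    return buf;
}
```

For example, fix_div(360 * FIX_ONE, 1000 * FIX_ONE) gives the 0.36 step that test-fix computes with `360,0 1000,0 x/`, and fix_format(FIX_ONE + FIX_ONE / 2, 3, buf, sizeof buf) writes "1.500". The sketch prints a decimal point as separator.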
Words from fixpt-math-lib.fs
sqrt ( r1 -- r2 ) r2 is the square root of r1
sin
cos
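The diff does not show how sqrt is implemented in fixpt-math-lib.fs, so the following C sketch is only one common technique, the binary digit-by-digit square root, applied to values in an assumed s31.32 layout. fix_sqrt is a hypothetical name. It loops 48 times, consuming two radicand bits per step, to produce a 48-bit root, while the FPU computes a square root with the single instruction vsqrt.f32. That difference is consistent with the large timing gap measured further down.

```c
#include <assert.h>
#include <stdint.h>

/* Square root of a non-negative s31.32 value (assumed layout), result in
   s31.32. Binary digit-by-digit method: the radicand is the 96-bit value
   raw << 32, consumed two bits per step, which yields a 48-bit root. */
static int64_t fix_sqrt(int64_t x) {
    assert(x >= 0);
    uint64_t raw = (uint64_t)x;
    uint64_t rem = 0, root = 0;
    for (int i = 0; i < 48; i++) {
        /* next two radicand bits: taken from raw for 32 steps, then zeros */
        uint64_t pair = i < 32 ? (raw >> (62 - 2 * i)) & 3u : 0u;
        rem = (rem << 2) | pair;
        uint64_t trial = (root << 2) | 1u;    /* 4 * root + 1 */
        root <<= 1;
        if (rem >= trial) {                   /* next root bit is 1 */
            rem -= trial;
            root |= 1u;
        }
    }
    return (int64_t)root;
}
```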
Line: 234 to 241
+inf -inf
Deleted:
< < *) fixpt-mat-lib.fs
Line: 250 to 256
;
27k 100k f|| fm. 21.3k ok.
Added:
> > 2.2n 47k f* fm. 103u ok.
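The two session lines compute a parallel resistance, r1*r2/(r1+r2), and an RC time constant, and fm. shows each result with a metric prefix. Below is a C sketch of the same arithmetic in single precision, matching the FPU's float format. parallel, rc and si_scale are hypothetical helpers; si_scale only mimics the idea of picking a prefix and is not how fm. is implemented.

```c
#include <assert.h>

/* Parallel resistance r1 || r2 = r1*r2 / (r1 + r2), in single precision. */
static float parallel(float r1, float r2) {
    assert(r1 + r2 != 0.0f);
    return r1 * r2 / (r1 + r2);
}

/* RC time constant tau = R * C (seconds for ohms and farads). */
static float rc(float c, float r) { return c * r; }

/* Hypothetical helper: scale v into [1, 1000) and return the matching
   SI prefix character (' ' means no prefix). Covers pico to giga. */
static char si_scale(double v, double *mantissa) {
    static const char prefixes[] = "pnum kMG";
    int idx = 4;                              /* prefixes[4] == ' ' */
    double m = v < 0 ? -v : v;
    while (m >= 1000.0 && idx < 7) { m /= 1000.0; idx++; }
    while (m != 0.0 && m < 1.0 && idx > 0) { m *= 1000.0; idx--; }
    *mantissa = v < 0 ? -m : m;
    return prefixes[idx];
}
```

With these helpers, 27k || 100k scales to about 21.26 with prefix 'k', and 2.2n times 47k to about 103.4 with prefix 'u', matching the 21.3k and 103u printed above.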
: f. ( r -- ) \ display, with a trailing space, the floating-point number r in fixed-point notation
Added:
> > dup f0< if 45 emit fabs then dup
$3F000000 \ .5
precision 0 do $41200000 f/ \ 10.0 /
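Only part of f. is visible in this diff: it prints a minus sign for negative numbers (45 emit) and builds a rounding offset of 0.5 / 10^precision from two IEEE-754 single-precision bit patterns, $3F000000 (0.5) and $41200000 (10.0). The C sketch below follows those steps and, as an assumption about the lines not shown, then prints the integer part and precision fractional digits. format_fixed and bits_to_float are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret 32 bits as an IEEE-754 single, like pushing $3F000000. */
static float bits_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Print r with `precision` fractional digits, rounded half up.
   Assumes |r| < 2^32 and a buffer of at least 16 bytes. */
static char *format_fixed(float r, int precision, char *buf, size_t size) {
    assert(size >= 16);
    size_t pos = 0;
    if (r < 0.0f) {                        /* dup f0< if 45 emit fabs then */
        buf[pos++] = '-';
        r = -r;
    }
    float half = bits_to_float(0x3F000000u);            /* 0.5 */
    for (int i = 0; i < precision; i++)
        half /= bits_to_float(0x41200000u);             /* / 10.0 */
    r += half;                             /* rounding offset */
    unsigned long ipart = (unsigned long)r;
    pos += (size_t)snprintf(buf + pos, size - pos, "%lu.", ipart);
    float frac = r - (float)ipart;
    for (int i = 0; i < precision && pos + 1 < size; i++) {
        frac *= 10.0f;
        int digit = (int)frac;
        if (digit > 9) digit = 9;          /* guard against float rounding */
        buf[pos++] = (char)('0' + digit);
        frac -= (float)digit;
    }
    buf[pos] = '\0';
    return buf;
}
```

Adding the offset before splitting off the digits is what turns truncation into rounding: 0.9996 with three digits becomes 1.000 rather than 0.999.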
Line: 279 to 291
Performance Estimation
Changed:
< < STM32WB @ 32 MHz
> > All measurements and calculations are based on the Cortex-M4F MCU STM32WB55 @ 32 MHz.
A simple test program to estimate the execution time of fsin and fsqrt:
: test ( -- n ) \ test 1000 times sin return n in ms
  osKernelGetTickCount cr
Line: 296 to 310
;
Changed:
< < With fsin it takes about 7 ms, without about 1 ms. Therefore a fsin word takes about 6 us.
> > With fsin, 1000 iterations take about 7 ms; without it, about 1 ms. Therefore one fsin call takes about 6 us.
fsqrt also takes about 2 ms for 1000 iterations. Therefore one fsqrt call takes about 1 us or less.
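The per-call estimates come from subtracting the loop overhead and dividing by the iteration count. The sketch below spells this out; per_call_us and elapsed_ticks are hypothetical helpers. elapsed_ticks also shows why `osKernelGetTickCount swap -` stays correct even if the tick counter wraps around during the measurement: osKernelGetTickCount returns a 32-bit value in CMSIS-RTOS2, and unsigned subtraction works modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks between two 32-bit tick counts; unsigned subtraction
   wraps modulo 2^32, so a counter overflow in between is harmless. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t end) {
    return end - start;
}

/* Cost of one call in microseconds: (loop time with the call minus loop
   time without it) in ms, times 1000, divided by the iteration count. */
static double per_call_us(double ms_with, double ms_without, int iterations) {
    assert(iterations > 0);
    return (ms_with - ms_without) * 1000.0 / iterations;
}
```

With the measured numbers, per_call_us(7.0, 1.0, 1000) gives 6 us for fsin, and the same formula applied to the fixed-point sqrt test (323 ms and 6 ms) gives 317 us.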
Basic operations like f/ are defined as inline. First, check fsin and f/ with the built-in disassembler:
Line: 336 to 352
vdiv.f32 s0, s0, s1   14
vmov tos, s0          1
bx lr
Added:
> > Cycles 18
About 20 cycles (625 ns @ 32 MHz) for a division, 10 cycles (about 300 ns) for a multiplication, and 5 cycles (about 150 ns) for an addition or subtraction. vsqrt.f32 takes 14 cycles.
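The nanosecond figures are cycle counts divided by the 32 MHz core clock. A small sketch, with cycles_to_ns as a hypothetical helper, makes the conversion explicit; the exact values for 10 and 5 cycles are 312.5 ns and 156.25 ns, so those figures in the text are rounded.

```c
#include <assert.h>

/* Execution time in nanoseconds for a given cycle count and core clock. */
static double cycles_to_ns(unsigned cycles, double clock_hz) {
    assert(clock_hz > 0.0);
    return cycles * 1e9 / clock_hz;
}
```

For instance, the 18 cycles counted for f/ come to 562.5 ns at 32 MHz.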
Added:
> > include /fsr/fixpt-math-lib.fs  ok.
: test-fix ( -- n ) \ test 1000 times fixed-point sin return n in ms
  osKernelGetTickCount cr
\ pi 2e f* 1000e f/ \ 2*pi/1000
  360,0 1000,0 x/ cr
  1000 0 do
\   2dup i 0 swap x* 2drop
\   2dup i 0 swap x* sin 2drop
    2dup i 0 swap x* sqrt 2drop
\   i . 2dup i 0 swap x* sin x. cr
\   i . 2dup i 0 swap x* sqrt x. cr
\   i . 2dup i 0 swap x* sin hex. hex. cr
  loop
  2drop
  osKernelGetTickCount swap -
;
test-fix . 323
Changed:
< < About 20 cycles (625 ns @ 32 MHz) for a division, 10 (300 ns) for multiplication, and 5 (150 ns) for +/-.
> > With sqrt, 1000 iterations take about 323 ms; without it, about 6 ms. Therefore one fixed-point sqrt takes about 317 us, while with the FPU it takes less than 1 us. A simple fixed-point multiplication takes about 6 us (FPU: 300 ns).
Only addition and subtraction take comparable time:
see d+
080008B6: CF07  ldmia r7 { r0 r1 r2 }  1
080008B8: 1812  adds r2 r2 r0          1
080008BA: 414E  adcs r6 r1             1
080008BC: 3F04  subs r7 #4             1
080008BE: 603A  str r2 [ r7 #0 ]       1
080008C0: 4770  bx lr
Cycles 5
Added:
> > @ -----------------------------------------------------------------------------
Wortbirne Flag_foldable_2|Flag_inline, "f+"
f_add:
@ ( r1 r2 -- r3 ) Add r1 to r2 giving the sum r3.
@ -----------------------------------------------------------------------------
vmov s1, tos 1
drop 1
vmov s0, tos 1
vadd.f32 s0, s1 1
vmov tos, s0 1
bx lr
Cycles 5
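Both listings are short because a double-cell addition needs only an add of the low cells and an add-with-carry of the high cells (the adds/adcs pair in d+). The C sketch below models that carry propagation with two 32-bit halves; dcell and dplus are hypothetical names, and the example values assume the s31.32 interpretation (low cell = fraction, high cell = integer part).

```c
#include <assert.h>
#include <stdint.h>

/* A double cell as two 32-bit halves, like two stack cells. */
typedef struct {
    uint32_t lo;   /* low cell (fraction in s31.32) */
    uint32_t hi;   /* high cell (integer part in s31.32) */
} dcell;

/* d+ the way a 32-bit core does it: add the low halves (adds), then add
   the high halves plus the carry out of the low addition (adcs). */
static dcell dplus(dcell a, dcell b) {
    dcell r;
    r.lo = a.lo + b.lo;                  /* wraps modulo 2^32 */
    uint32_t carry = r.lo < a.lo;        /* 1 if the low addition wrapped */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

For example, 0.5 + 0.75 overflows the low cells (0x80000000 + 0xC0000000), and the carry moves into the high cell to give 1.25.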
Conclusion
As long as you only do elementary arithmetic, fixed-point and floating-point have comparable execution times, although fixed-point multiplication and division are about an order of magnitude slower. For more elaborate calculations (trigonometric and exponential functions), fixed-point is at least two orders of magnitude slower. If time is not an issue in either development or execution, you can easily do without the FPU.