[r6rs-discuss] [Formal] NaN is not a real number.
I am posting this as an individual member of the Scheme
community. I am not speaking for the R6RS editors, and
this message should not be confused with the editors'
eventual formal response.
John Cowan wrote:
> This argument is quite sound theoretically, but will have unfortunate
> performance consequences.
Indeed. In addition to the inefficiency cited by John,
the proposal would destroy the closure property of the
fl procedures on which flow analysis depends. The need
for this closure property was explained in SRFI 77, but
here is a concrete example, taken from the inner loops
of an FFT:
(let loop4 ((wr 1.0) (wi 0.0) (m 0))
  (define (flvector-ref v i)
    (real->flonum (bytes-ieee-double-native-ref v i)))
  (define (flvector-set! v i x)
    (bytes-ieee-double-native-set! v i x))
  (if (< m mmax)
      (begin
        (let loop5 ((i m))
          (if (< i n)
              (let* ((j (+ i mmax))
                     (tempr (fl-
                             (fl* wr (flvector-ref data j))
                             (fl* wi (flvector-ref data (+ j 1)))))
                     (tempi (fl+
                             (fl* wr (flvector-ref data (+ j 1)))
                             (fl* wi (flvector-ref data j)))))
                (flvector-set! data j
                               (fl- (flvector-ref data i) tempr))
                (flvector-set! data (+ j 1)
                               (fl- (flvector-ref data (+ i 1)) tempi))
                (flvector-set! data i
                               (fl+ (flvector-ref data i) tempr))
                (flvector-set! data (+ i 1)
                               (fl+ (flvector-ref data (+ i 1)) tempi))
                (loop5 (+ j mmax)))
              (loop4 (fl+ (fl- (fl* wr wpr) (fl* wi wpi)) wr)
                     (fl+ (fl+ (fl* wi wpr) (fl* wr wpi)) wi)
                     (+ m 2)))))))
With the semantics described in the 5.91 draft, it is
easy for a compiler to generate efficient machine code
for the above. By efficient, I mean that none of the
inexact reals will be heap-allocated, and each of the 18
calls to fl+, fl-, and fl* will be compiled into
a single machine instruction.
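To make the closure property concrete, here is a small sketch.
It assumes the flonum procedure names of the final R6RS
(rnrs arithmetic flonums) library (fl/, flnan?, flonum?), which
may differ in detail from the 5.91 draft; the names nan and
closure-holds? are made up for illustration. It shows that a
NaN produced by an fl operation is still a flonum, which is
what lets a compiler feed the result of one fl call straight
into the next without a check:

```scheme
(import (rnrs))

;; A NaN produced by ordinary flonum arithmetic.
(define nan (fl/ 0.0 0.0))

;; Closure: the result of an fl operation is always a flonum,
;; even when that result is a NaN, so it is always a valid
;; argument to the next fl operation.
(define closure-holds?
  (and (flnan? nan)
       (flonum? nan)
       (flonum? (fl+ nan 1.0))
       (flonum? (fl* nan 0.0))))
```

If closure-holds? could be false, flow analysis could no longer
conclude that the arguments of the nested fl calls above are
flonums.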
If the proposed change were adopted, so NaNs would no
longer be considered real, then all intermediate values
would probably have to be heap-allocated, and all 18
calls would have to check both their arguments and their
results. On the SPARC, each call would require about 27
instructions (as estimated from a similar path for generic
arithmetic in Larceny), and there would be more load and
control dependencies, so the inner loops would probably
run at least 30 times as slow.
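As a rough illustration of the extra work, here is a sketch of
what a single checked addition might look like at the source
level. Everything in it is hypothetical: proposed-flonum?,
non-real-result, and checked-fl+ are names invented for this
example, the procedures come from the final R6RS (rnrs) library,
and the proposal does not say what should be done with a NaN
result, so the slow path below is only a placeholder.

```scheme
(import (rnrs))

;; Hypothetical: under the proposal a NaN is not a real, so it
;; would not count as a valid flonum argument or result.
(define (proposed-flonum? x)
  (and (flonum? x) (not (flnan? x))))

;; Hypothetical placeholder for whatever slow path would handle
;; a result that is no longer a real. Here it returns z unchanged.
(define (non-real-result z) z)

;; Two argument checks and one result check wrapped around what
;; would otherwise be a single add instruction.
(define (checked-fl+ x y)
  (unless (proposed-flonum? x)
    (assertion-violation 'checked-fl+ "argument is not a real flonum" x))
  (unless (proposed-flonum? y)
    (assertion-violation 'checked-fl+ "argument is not a real flonum" y))
  (let ((z (fl+ x y)))
    (if (flnan? z)
        (non-real-result z)
        z)))
```

Each of the 18 calls in the loop would need some equivalent of
these checks, because the compiler could no longer prove that
the result of the previous call is an acceptable argument.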
Interpreters seldom perform any flow analysis, so their
implementations of the fl+, fl-, and fl* procedures will
probably perform full checking in all cases, which means
that the proposed change would have negligible effect on
the performance of interpreted systems.
In other words, the true cost of the proposed change
cannot be estimated by benchmarking interpreters.
Will
Received on Fri Sep 22 2006 - 13:43:56 UTC