[R6RS] Timeline for R6RS SRFIs
Marc Feeley
feeley
Fri Jun 3 11:34:48 EDT 2005
On 3-Jun-05, at 10:41 AM, Manuel Serrano wrote:
> Marc wrote,
>
>
>>> As for port creation procedures, no new procedures are required. A
>>> port (as created by open-input-file, etc) would be viewed as a stream
>>> of octets and read-u8 and write-u8 would impose a character encoding
>>> on that stream of octets (either in an implementation defined way or
>>>
>>
>> Sorry, I meant read-char and write-char impose a character encoding on
>> the octet stream.
>>
> Could you elaborate on that. I don't really understand what you
> mean. Sorry.
>
> --
> Manuel
>
What I mean is that all R6RS ports (at this point this means ports
attached to files) are conceptually a stream of octets. The stream
of octets can be read with the procedure read-u8. Now the procedure
read-char can be implemented in terms of read-u8 like this:
  (define (read-char . other) ; implements latin1 encoding
    (let ((port (if (null? other) (current-input-port) (car other))))
      (let ((n (read-u8 port)))
        (if (eof-object? n)
            n
            (integer->char n)))))
or like this:
  (define (read-char . other) ; implements utf8 encoding
    (let ((port (if (null? other) (current-input-port) (car other))))
      (let ((a (read-u8 port)))
        (if (eof-object? a)
            a
            (cond ((<= a #x7f) ; 1 octet: 0xxxxxxx
                   (integer->char a))
                  ((<= a #xbf) ; stray continuation octet
                   (error "invalid utf8 encoding"))
                  ((<= a #xdf) ; 2 octets: 110xxxxx 10xxxxxx
                   (let ((b (read-u8 port)))
                     (if (or (eof-object? b) (< b #x80) (> b #xbf))
                         (error "invalid utf8 encoding")
                         (integer->char (+ (* 64 (modulo a 32))
                                           (modulo b 64))))))
                  ...etc)))))
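The output side can be layered on write-u8 in the same way. What follows is only a minimal sketch of that idea, not part of the proposal's text: the name utf8-write-char is hypothetical (chosen so the standard write-char is not shadowed), and the code assumes an R7RS-style (scheme base) that provides write-u8 and bytevector ports. Falling back to (current-output-port) only makes sense under the "every port is a stream of octets" model described above; in R7RS that port is textual, so a binary port should be passed explicitly there.

```scheme
(import (scheme base))

;; Hypothetical helper: encode one character as UTF-8 using write-u8.
(define (utf8-write-char c . other)
  (let ((port (if (null? other) (current-output-port) (car other)))
        (n (char->integer c)))
    (cond ((<= n #x7f)                  ; 1 octet:  0xxxxxxx
           (write-u8 n port))
          ((<= n #x7ff)                 ; 2 octets: 110xxxxx 10xxxxxx
           (write-u8 (+ #xc0 (quotient n 64)) port)
           (write-u8 (+ #x80 (modulo n 64)) port))
          ((<= n #xffff)                ; 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
           (write-u8 (+ #xe0 (quotient n 4096)) port)
           (write-u8 (+ #x80 (modulo (quotient n 64) 64)) port)
           (write-u8 (+ #x80 (modulo n 64)) port))
          (else                         ; 4 octets: 11110xxx + 3 continuations
           (write-u8 (+ #xf0 (quotient n 262144)) port)
           (write-u8 (+ #x80 (modulo (quotient n 4096) 64)) port)
           (write-u8 (+ #x80 (modulo (quotient n 64) 64)) port)
           (write-u8 (+ #x80 (modulo n 64)) port)))))

;; Usage: collect the octets in an in-memory binary port.
(define out (open-output-bytevector))
(utf8-write-char #\A out)      ; U+0041, one octet
(utf8-write-char #\xe9 out)    ; U+00E9, two octets
(get-output-bytevector out)    ; => the bytevector 65 195 169
```

A latin1 writer would be the one-line analogue, (write-u8 (char->integer c) port), with an error for characters above #xff.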
The encoding of characters which read-char and write-char use could be:

  0) implementation defined
  1) implementation defined but the same for all file ports
  2) specified by R6RS, for example utf8 (and thus the same on all
     R6RS Scheme implementations)
  3) optionally specified in the call to open-input-file,
     with-input-from-file, etc.,
     for example: (open-input-file '(path: "foo" char-encoding: utf8))
     [if not specified it would be like one of the other options]

My preference is for option 3 with a default to option 1, but if that
is controversial I can accept any of the other options (after all
even option 0 conforms to R5RS).
What I specifically don't want is extending the set of port creation
procedures with names that indicate the character encoding or the
fact that it is for binary I/O, e.g. open-binary-input-file,
open-utf8-input-file, etc. This is the wrong way to generalize the port
creation procedures (how would the names be extended to indicate the
end-of-line encoding? the buffering? etc). In fact if we agree on
option 3, I would suggest adding a "direction" setting and an
"open-file" procedure so that:
  (open-input-file "foo")  = (open-file '(path: "foo" direction: input))
  (open-output-file "foo") = (open-file '(path: "foo" direction: output))
But we should keep the open-input-file and open-output-file
procedures for backward-compatibility, and because the direction is a
fundamental setting of a port that you always need to specify.
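To make the open-file equivalences above concrete, here is a minimal sketch of how such a procedure could dispatch on the settings list. Everything in it is an assumption for illustration: setting-ref is a hypothetical helper, the error messages are invented, and the code relies on path: and direction: being ordinary symbols (as in R7RS) or self-evaluating keywords (as in some implementations), so that eq? can compare them.

```scheme
(import (scheme base) (scheme file))

;; Hypothetical helper: look up KEY in a flat settings list such as
;; (path: "foo" direction: input); return DEFAULT if KEY is absent.
(define (setting-ref settings key default)
  (cond ((or (null? settings) (null? (cdr settings))) default)
        ((eq? (car settings) key) (cadr settings))
        (else (setting-ref (cddr settings) key default))))

;; Hypothetical open-file: the direction must always be given, so a
;; missing or unknown direction is reported as an error.
(define (open-file settings)
  (let ((path (setting-ref settings 'path: #f))
        (direction (setting-ref settings 'direction: #f)))
    (cond ((not (string? path))
           (error "open-file: missing path: setting" settings))
          ((eq? direction 'input) (open-input-file path))
          ((eq? direction 'output) (open-output-file path))
          (else
           (error "open-file: direction: must be input or output"
                  direction)))))

;; An optional setting falls back to a default when it is not specified:
(setting-ref '(path: "foo" char-encoding: utf8) 'char-encoding: 'default)
;; => utf8
(setting-ref '(path: "foo") 'char-encoding: 'default)
;; => default
```

Further settings (end-of-line encoding, buffering, and so on) would be read with additional setting-ref calls, without introducing new procedure names.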
Marc