[R6RS] comments on proposed bytes SRFI
Michael Sperber
sperber at informatik.uni-tuebingen.de
Fri Jun 16 13:32:44 EDT 2006
Many thanks for going over this!
William D Clinger <will at ccs.neu.edu> writes:
> [various suggestions, all good]
Done, hopefully. New copy appended for archival purposes.
> I can't tell whether bytes->u8-list and u8-list->bytes
> were misnamed or mis-specified.
Misspecified.
> Perhaps bytes->s8-list and s8-list->bytes should be added, although
> there isn't much need of them given the bytes->sint-list and
> sint-list->bytes. Perhaps bytes->u8-list and its
> inverse should just be flushed.
I did it because I expected those two to be the most common case by
far. I can't say I have a strong opinion, either on complementing
them with s8 operations, or flushing them. Do you?
--
Cheers =8-} Mike
Peace, international understanding, and blah blah in general
Title
Bytes Objects
Authors
Michael Sperber
Abstract
This library defines a set of procedures for creating, accessing, and
manipulating byte-addressed blocks of binary data, in short, bytes objects. The
library provides access primitives for fixed-length integers of arbitrary size,
with specified endianness, and a choice of unsigned and two's complement
representations.
This library is a variation of SRFI 74. Compared to SRFI 74, this library uses
a different terminology: what SRFI 74 calls blob this library calls bytes.
Rationale
Many applications must deal with blocks of binary data by accessing them in
various ways---extracting signed or unsigned numbers of various sizes. Such an
application can use octet vectors as in SRFI 66 or any of the other types of
homogeneous vectors in SRFI 4, but each of these allows retrieving binary data
of only one type.
This is awkward in many situations, because an application might access
different kinds of entities from a single binary block. Even for uniform
blocks, the disjointness of the various vector data types in SRFI 4 means that,
say, an I/O API needs to provide an army of procedures for each of them in
order to provide efficient access to binary data.
Therefore, this library provides a single type for blocks of binary data with
multiple ways to access that data. It deals only with integers in various sizes
with specified endianness, because these are the most frequent applications.
Dealing with other kinds of binary data, such as floating-point numbers or
variable-size integers, would be a natural extension, but is left for a
separate library.
Specification
General remarks
Bytes objects are objects of a disjoint type. Conceptually, a bytes object
represents a sequence of 8-bit bytes.
The length of a bytes object is the number of bytes it contains. This number is
fixed. A valid index into a bytes object is an exact, non-negative integer. The
first byte of a bytes object has index 0, the last byte has an index one less
than the length of the bytes object.
Generally, the access procedures come in different flavors according to the
size of the represented integer, and the endianness of the representation. The
procedures also distinguish signed and unsigned representations. The signed
representations all use two's complement.
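As a rough illustration of these flavors, the following Python sketch reads the
same two bytes under each combination of endianness and signedness. Python's
bytearray and int.from_bytes are used here only as a stand-in for bytes objects
and the access procedures; they are an assumption for illustration and not part
of this library.

```python
# A bytearray stands in for a bytes object (an assumption for illustration).
data = bytearray([0xFF, 0x01])

# Read the same two bytes under each combination of endianness and signedness.
readings = {
    (order, signed): int.from_bytes(data, order, signed=signed)
    for order in ("big", "little")
    for signed in (False, True)
}

for (order, signed), value in readings.items():
    kind = "signed" if signed else "unsigned"
    print(order, kind, value)
```

Only the big-endian signed reading is negative here, because only in that byte
order does the byte 0xFF end up in the most significant position.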
Interface
(endianness big) (syntax)
(endianness little) (syntax)
(endianness big) and (endianness little) evaluate to the symbols big and
little, respectively. These symbols represent an endianness, and whenever
one of the procedures operating on bytes objects accepts an endianness as
an argument, that argument must be one of these symbols. If the operand
to endianness is anything other than big or little, an expansion-time error
is signalled.
(native-endianness)
This procedure returns the endianness of the underlying machine
architecture, either (endianness big) or (endianness little).
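For comparison, a minimal Python sketch of the same query. The function name
native_endianness is hypothetical and merely mirrors the procedure above;
sys.byteorder is Python's own report of the machine's byte order.

```python
import struct
import sys

def native_endianness():
    """Return 'big' or 'little' for the machine running this code."""
    return sys.byteorder

# Cross-check: pack the number 1 in native byte order and see where the 1 lands.
packed = struct.pack("=H", 1)
first_byte_is_low = packed[0] == 1

print(native_endianness(), first_byte_is_low)
```

On a little-endian machine the least significant byte comes first, so
first_byte_is_low is true exactly when the reported endianness is little.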
(bytes? obj)
Returns #t if obj is a bytes object, otherwise returns #f.
(make-bytes k)
Returns a newly allocated bytes object of k bytes, all of them 0.
(bytes-length bytes)
Returns the number of bytes in bytes as an exact integer.
(bytes-u8-ref bytes k)
(bytes-s8-ref bytes k)
K must be a valid index of bytes.
Bytes-u8-ref returns the byte at index k of bytes.
Bytes-s8-ref returns the exact integer corresponding to the two's
complement representation at index k of bytes.
(bytes-u8-set! bytes k octet)
(bytes-s8-set! bytes k byte)
K must be a valid index of bytes.
Bytes-u8-set! stores octet in element k of bytes.
Octet must be an exact integer in the interval {0, ..., 255}.
Byte must be an exact integer in the interval {-128, ..., 127}.
Bytes-s8-set! stores the two's complement representation of byte in element
k of bytes.
Both procedures return the unspecified value.
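The relation between the unsigned and the two's complement view of a single
byte can be sketched in Python as follows. The helper names s8_ref and s8_set
are hypothetical, and a bytearray stands in for a bytes object; this is a model
of the behavior described above, not the library's implementation.

```python
def s8_ref(data, k):
    """Interpret the byte at index k as a two's complement integer."""
    octet = data[k]
    return octet - 256 if octet >= 128 else octet

def s8_set(data, k, byte):
    """Store the two's complement representation of byte at index k."""
    if not -128 <= byte <= 127:
        raise ValueError("byte must be in {-128, ..., 127}")
    data[k] = byte % 256  # for example, -1 is stored as the octet 255

data = bytearray(2)   # two bytes, both 0
data[0] = 200         # unsigned store of an octet
s8_set(data, 1, -1)   # signed store

print(data[0], s8_ref(data, 0))
print(data[1], s8_ref(data, 1))
```

The same stored byte yields different integers depending on which accessor
reads it: octets of 128 and above map to negative numbers in the signed view.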
(bytes-uint-ref bytes k endianness size)
(bytes-sint-ref bytes k endianness size)
(bytes-uint-set! bytes k n endianness size)
(bytes-sint-set! bytes k n endianness size)
Size must be a positive exact integer. K must be a valid index of bytes; so
must the indices {k, ..., k + size - 1}. Endianness must be an endianness
object.
Bytes-uint-ref retrieves the exact integer corresponding to the unsigned
representation of size size and specified by endianness at indices {k, ...,
k + size - 1}.
Bytes-sint-ref retrieves the exact integer corresponding to the two's
complement representation of size size and specified by endianness at
indices {k, ..., k + size - 1}.
For bytes-uint-set!, n must be an exact integer in the interval
[0, (256^size)-1]. Bytes-uint-set! stores the unsigned representation of
size size and specified by endianness into the bytes object at indices
{k, ..., k + size - 1}.
For bytes-sint-set!, n must be an exact integer in the interval
[-(256^size)/2, ((256^size)/2)-1]. Bytes-sint-set! stores the two's
complement representation of size size and specified by endianness into the
bytes object at indices {k, ..., k + size - 1}.
The ...-set! procedures return the unspecified value.
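A minimal Python sketch of arbitrary-size access, assuming a bytearray as a
stand-in for a bytes object. The names check_range, uint_ref, sint_ref, and
sint_set are hypothetical; int.from_bytes and int.to_bytes do the actual
conversion, and to_bytes raises OverflowError when n does not fit in size
bytes.

```python
def check_range(data, k, size):
    """Raise IndexError unless indices k .. k + size - 1 are all valid."""
    if size <= 0 or k < 0 or k + size > len(data):
        raise IndexError("indices k .. k + size - 1 must be valid")

def uint_ref(data, k, endianness, size):
    check_range(data, k, size)
    return int.from_bytes(data[k:k + size], endianness, signed=False)

def sint_ref(data, k, endianness, size):
    check_range(data, k, size)
    return int.from_bytes(data[k:k + size], endianness, signed=True)

def sint_set(data, k, n, endianness, size):
    check_range(data, k, size)
    # The slice and the replacement have equal length, so data keeps its size.
    data[k:k + size] = n.to_bytes(size, endianness, signed=True)

data = bytearray(4)
sint_set(data, 1, -2, "big", 3)      # three-byte store at indices 1..3

print(list(data))
print(uint_ref(data, 1, "big", 3))
print(sint_ref(data, 1, "big", 3))
```

Reading the stored bytes back with the unsigned accessor gives a large positive
number, while the signed accessor with the same endianness recovers -2.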
(bytes-u16-ref bytes k endianness)
(bytes-s16-ref bytes k endianness)
(bytes-u16-native-ref bytes k)
(bytes-s16-native-ref bytes k)
(bytes-u16-set! bytes k n endianness)
(bytes-s16-set! bytes k n endianness)
(bytes-u16-native-set! bytes k n)
(bytes-s16-native-set! bytes k n)
K must be a valid index of bytes; so must the index k + 1. Endianness must
be an endianness object.
These retrieve and set two-byte representations of numbers at indices k and
k + 1, according to the endianness specified by endianness. The procedures
with u16 in their names deal with the unsigned representation, those with
s16 with the two's complement representation.
The procedures with native in their names employ the native endianness, and
only work at aligned indices: k must be a multiple of 2. It is an error to
use them at non-aligned indices.
The ...-set! procedures return the unspecified value.
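The difference between the explicit-endianness and the native variants can be
sketched in Python with the struct module. The names u16_ref and u16_native_ref
are hypothetical, and raising ValueError for a misaligned index is an
assumption: the text above says only that such use is an error.

```python
import struct
import sys

def u16_ref(data, k, endianness):
    """Read an unsigned 16-bit integer at indices k and k + 1."""
    fmt = ">H" if endianness == "big" else "<H"
    return struct.unpack_from(fmt, data, k)[0]

def u16_native_ref(data, k):
    """Read in native byte order; k must be a multiple of 2."""
    if k % 2 != 0:
        raise ValueError("native access requires k to be a multiple of 2")
    return u16_ref(data, k, sys.byteorder)

data = bytearray([0x12, 0x34, 0x56, 0x78])

print(u16_ref(data, 0, "big"))       # bytes 0x12 0x34, most significant first
print(u16_ref(data, 1, "little"))    # unaligned access is fine here
print(u16_native_ref(data, 2))       # depends on the machine's byte order
```

The explicit-endianness accessor works at any valid index, whereas the native
one trades that flexibility for the chance of a direct, aligned memory access.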
(bytes-u32-ref bytes k endianness)
(bytes-s32-ref bytes k endianness)
(bytes-u32-native-ref bytes k)
(bytes-s32-native-ref bytes k)
(bytes-u32-set! bytes k n endianness)
(bytes-s32-set! bytes k n endianness)
(bytes-u32-native-set! bytes k n)
(bytes-s32-native-set! bytes k n)
K must be a valid index of bytes; so must the indices {k, ..., k + 3}.
Endianness must be an endianness object.
These retrieve and set four-byte representations of numbers at indices {k,
..., k + 3}, according to the endianness specified by endianness. The
procedures with u32 in their names deal with the unsigned representation,
those with s32 with the two's complement representation.
The procedures with native in their names employ the native endianness, and
only work at aligned indices: k must be a multiple of 4. It is an error to
use them at non-aligned indices.
The ...-set! procedures return the unspecified value.
(bytes-u64-ref bytes k endianness)
(bytes-s64-ref bytes k endianness)
(bytes-u64-native-ref bytes k)
(bytes-s64-native-ref bytes k)
(bytes-u64-set! bytes k n endianness)
(bytes-s64-set! bytes k n endianness)
(bytes-u64-native-set! bytes k n)
(bytes-s64-native-set! bytes k n)
K must be a valid index of bytes; so must the indices {k, ..., k + 7}.
Endianness must be an endianness object.
These retrieve and set eight-byte representations of numbers at indices {k,
..., k + 7}, according to the endianness specified by endianness. The
procedures with u64 in their names deal with the unsigned representation,
those with s64 with the two's complement representation.
The procedures with native in their names employ the native endianness, and
only work at aligned indices: k must be a multiple of 8. It is an error to
use them at non-aligned indices.
The ...-set! procedures return the unspecified value.
(bytes=? bytes-1 bytes-2)
Returns #t if bytes-1 and bytes-2 are equal---that is, if they have the
same length and equal bytes at all valid indices. It returns #f otherwise.
(bytes-copy! source source-start target target-start n)
Copies data from bytes source to bytes target. Source-start, target-start,
and n must be non-negative exact integers that satisfy
0 <= source-start <= source-start + n <= (bytes-length source)
0 <= target-start <= target-start + n <= (bytes-length target)
This copies the bytes from source at indices [source-start, source-start +
n) to consecutive indices in target starting at target-start.
This must work even if the memory regions for the source and the target
overlap, i.e., the bytes at the target location after the copy must be
equal to the bytes at the source location before the copy.
This returns the unspecified value.
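The overlap requirement can be illustrated with a small Python sketch. The
function name bytes_copy_into is hypothetical and a bytearray stands in for a
bytes object; taking a snapshot of the source range is just one simple way to
meet the requirement, not necessarily how an implementation would do it.

```python
def bytes_copy_into(source, source_start, target, target_start, n):
    """Copy n bytes from source to target, tolerating overlapping regions."""
    if not 0 <= source_start <= source_start + n <= len(source):
        raise IndexError("source range out of bounds")
    if not 0 <= target_start <= target_start + n <= len(target):
        raise IndexError("target range out of bounds")
    # Snapshot the source range first, so that writing into an overlapping
    # target region cannot clobber bytes that have not been copied yet.
    chunk = bytes(source[source_start:source_start + n])
    target[target_start:target_start + n] = chunk

data = bytearray([1, 2, 3, 4, 5])
bytes_copy_into(data, 0, data, 2, 3)   # source 0..2 overlaps target 2..4

print(list(data))
```

A naive front-to-back byte loop would overwrite index 2 before reading it and
end with 1, 2, 1, 2, 1; the specified behavior preserves the original 1, 2, 3
at the target location.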
(bytes-copy bytes)
Returns a newly allocated copy of bytes object bytes.
(bytes->u8-list bytes)
(u8-list->bytes list)
Bytes->u8-list returns a newly allocated list of the bytes of bytes in the
same order.
U8-list->bytes returns a newly allocated bytes object whose elements are
the elements of list list, which must all be octets, in the same order.
Analogous to list->vector.
(bytes->uint-list bytes endianness size)
(bytes->sint-list bytes endianness size)
(uint-list->bytes list endianness size)
(sint-list->bytes list endianness size)
Size must be a positive exact integer. Endianness must be an endianness
object.
These convert between lists of integers and their consecutive
representations according to size and endianness in bytes objects in the
same way as bytes->u8-list, bytes->s8-list, u8-list->bytes, and
s8-list->bytes do for one-byte representations.
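A minimal Python sketch of these list conversions, with a bytearray standing in
for a bytes object. The names bytes_to_int_list and int_list_to_bytes are
hypothetical, as is the signed flag that folds the uint and sint variants into
one function. Rejecting a length that is not a multiple of size is an
assumption; the text above does not say what happens in that case.

```python
def bytes_to_int_list(data, endianness, size, signed=False):
    """Split data into size-byte groups and decode each as an integer."""
    if len(data) % size != 0:
        raise ValueError("length must be a multiple of size")
    return [int.from_bytes(data[i:i + size], endianness, signed=signed)
            for i in range(0, len(data), size)]

def int_list_to_bytes(values, endianness, size, signed=False):
    """Encode each integer in size bytes and concatenate the results."""
    out = bytearray()
    for n in values:
        out += n.to_bytes(size, endianness, signed=signed)
    return out

data = bytearray([0x00, 0x01, 0xFF, 0xFE])

print(bytes_to_int_list(data, "big", 2))
print(bytes_to_int_list(data, "big", 2, signed=True))
print(bytes_to_int_list(data, "little", 2))
```

With size 1 the unsigned conversion reduces to the one-byte list case, and
int_list_to_bytes inverts bytes_to_int_list when given the same endianness,
size, and signedness.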
Reference Implementation
This reference implementation makes use of SRFI 23 (Error reporting mechanism),
SRFI 26 (Notation for Specializing Parameters without Currying), SRFI 60
(Integers as Bits), and SRFI 66 (Octet Vectors).
Examples
The test suite doubles as a source of examples.
References
* SRFI 4 (Homogeneous numeric vector datatypes)
* SRFI 56 (Binary I/O)
* SRFI 66 (Octet Vectors)
* SRFI 74 (Octet-Addressed Binary Blocks)