[R6RS] Bytes objects
Michael Sperber
sperber at informatik.uni-tuebingen.de
Thu May 11 11:19:26 EDT 2006
I've changed the "blob" SRFI into a draft for "bytes objects." It's
in Subversion in draft/bytes; a text version is attached for archival
purposes. Please comment.
--
Cheers =8-} Mike
Peace, international understanding, and all that blah blah
Title
Bytes Objects
Authors
Michael Sperber
Abstract
This library defines a set of procedures for creating, accessing, and
manipulating byte-addressed blocks of binary data, in short, bytes
objects. The library provides access primitives for fixed-length integers
of arbitrary size, with specified endianness, and a choice of unsigned and
two's complement representations.
This library is a variation of SRFI 74. Compared to SRFI 74, this library
uses a different terminology: what SRFI 74 calls blob this library calls
bytes.
Rationale
Many applications must deal with blocks of binary data by accessing them
in various ways, for example by extracting signed or unsigned numbers of
various sizes. Such an application can use octet vectors as in SRFI 66 or
any of the other types of homogeneous vectors in SRFI 4, but each of these
allows retrieving the binary data as only one type.
This is awkward in many situations, because an application might access
different kinds of entities from a single binary block. Even for uniform
blocks, the disjointness of the various vector data types in SRFI 4 means
that, say, an I/O API needs to provide an army of procedures for each of
them in order to provide efficient access to binary data.
Therefore, this library provides a single type for blocks of binary data
with multiple ways to access that data. It deals only with integers in
various sizes with specified endianness, because these are the most
frequent applications. Dealing with other kinds of binary data, such as
floating-point numbers or variable-size integers, would be a natural
extension, but is left for a separate library.
Specification
General remarks
Bytes objects are objects of a disjoint type. Conceptually, a bytes object
represents a sequence of bytes.
The length of a bytes object is the number of bytes it contains. This
number is fixed. A valid index into a bytes object is an exact,
non-negative integer. The first byte of a bytes object has index 0, the
last byte has an index one less than the length of the bytes object.
Generally, the access procedures come in different flavors according to
the size of the represented integer, and the endianness of the
representation. The procedures also distinguish signed and unsigned
representations. The signed representations all use two's complement.
For procedures that have no "natural" return value, this description often
uses the sentence
The return values are unspecified.
This means that the number of return values and the return values are
unspecified. However, the number of return values is such that it is
accepted by a continuation created by begin. Specifically, on Scheme
implementations where continuations created by begin accept an arbitrary
number of arguments (this includes most implementations), it is suggested
that the procedure return zero return values.
Interface
(endianness big) (syntax)
(endianness little) (syntax)
(endianness big) and (endianness little) evaluate to the symbols
big and little, respectively. These symbols represent an
endianness, and whenever one of the procedures operating on bytes
objects accepts an endianness as an argument, that argument must
be one of these symbols. If the operand to endianness is
anything other than big or little, an expansion-time error is
signalled.
(native-endianness)
This procedure returns the endianness of the underlying machine
architecture, either (endianness big) or (endianness little).
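To make the two byte orders concrete, here is a small model in Python (chosen only because it runs on a stock interpreter; it is not the draft's Scheme API). The names BIG, LITTLE, and native_endianness are hypothetical stand-ins for (endianness big), (endianness little), and native-endianness.

```python
import sys

# Hypothetical stand-ins for the symbols that (endianness big) and
# (endianness little) evaluate to.
BIG = "big"
LITTLE = "little"

def native_endianness():
    """Model of native-endianness: the byte order of the host machine."""
    return sys.byteorder  # either "big" or "little"

value = 0x1234
big_layout = value.to_bytes(2, BIG)        # most significant byte first
little_layout = value.to_bytes(2, LITTLE)  # least significant byte first

print(list(big_layout))     # [18, 52], i.e. 0x12 0x34
print(list(little_layout))  # [52, 18], i.e. 0x34 0x12
print(native_endianness() in (BIG, LITTLE))  # True
```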
(bytes? obj)
Returns #t if obj is a bytes object, otherwise returns #f.
(make-bytes k)
Returns a newly allocated bytes object of k bytes, all of them 0.
(bytes-length bytes)
Returns the number of bytes in bytes as an exact integer.
(bytes-u8-ref bytes k)
(bytes-s8-ref bytes k)
K must be a valid index of bytes.
Bytes-u8-ref returns the byte at index k of bytes.
Bytes-s8-ref returns the exact integer corresponding to the two's
complement representation at index k of bytes.
(bytes-u8-set! bytes k octet)
(bytes-s8-set! bytes k byte)
K must be a valid index of bytes.
Bytes-u8-set! stores octet in element k of bytes.
Byte must be an exact integer in the interval {-128, ..., 127}.
Bytes-s8-set! stores the two's complement representation of byte
in element k of bytes.
The return values are unspecified.
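The relationship between the unsigned and the two's complement view of a single byte can be sketched as follows. This is a Python model of the semantics, not the draft's API: a bytearray stands in for a bytes object, the function names are hypothetical transliterations, and the 0..255 range check on octet is an assumption the draft does not spell out.

```python
def bytes_u8_ref(b, k):
    """Model of bytes-u8-ref: the byte at index k, as an integer in 0..255."""
    return b[k]

def bytes_s8_ref(b, k):
    """Model of bytes-s8-ref: the same byte read as two's complement."""
    octet = b[k]
    return octet - 256 if octet >= 128 else octet

def bytes_u8_set(b, k, octet):
    """Model of bytes-u8-set! (the range check is an assumption)."""
    if not 0 <= octet <= 255:
        raise ValueError("octet must be in 0..255")
    b[k] = octet

def bytes_s8_set(b, k, byte):
    """Model of bytes-s8-set!: store the two's complement encoding of byte."""
    if not -128 <= byte <= 127:
        raise ValueError("byte must be in -128..127")
    b[k] = byte & 0xFF  # e.g. -1 is stored as 255

# Note: k is assumed to be a valid (non-negative) index; the sketch does
# not reject Python's negative indices.
block = bytearray(4)  # like (make-bytes 4): four zero bytes
bytes_u8_set(block, 0, 255)
bytes_s8_set(block, 1, -1)
bytes_s8_set(block, 2, -128)
bytes_u8_set(block, 3, 127)

print(list(block))                                  # [255, 255, 128, 127]
print([bytes_s8_ref(block, k) for k in range(4)])   # [-1, -1, -128, 127]
```

The same stored byte 255 reads back as 255 through the unsigned accessor and as -1 through the signed one; only the interpretation differs.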
(bytes-uint-ref size endianness bytes k)
(bytes-sint-ref size endianness bytes k)
(bytes-uint-set! size endianness bytes k n)
(bytes-sint-set! size endianness bytes k n)
Size must be a positive exact integer. K must be a valid index of
bytes; so must the indices {k, ..., k + size - 1}. Endianness must
be an endianness object.
Bytes-uint-ref retrieves the exact integer corresponding to the
unsigned representation of size size and specified by endianness
at indices {k, ..., k + size - 1}.
Bytes-sint-ref retrieves the exact integer corresponding to the
two's complement representation of size size and specified by
endianness at indices {k, ..., k + size - 1}.
For bytes-uint-set!, n must be an exact integer in the interval
[0, (256^size)-1]. Bytes-uint-set! stores the unsigned
representation of size size and specified by endianness into the
bytes object at indices {k, ..., k + size - 1}.
For bytes-sint-set!, n must be an exact integer in the interval
[-(256^size)/2, (256^size)/2 - 1]. Bytes-sint-set! stores the
two's complement representation of size size and specified by
endianness into the bytes object at indices
{k, ..., k + size - 1}.
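A minimal Python model of the arbitrary-size accessors may help here. It assumes a bytearray as the bytes object and uses hypothetical transliterated names; raising IndexError or OverflowError on bad arguments is a choice of this sketch, since the draft only states the preconditions.

```python
def _check_span(b, k, size):
    # All of the indices k .. k+size-1 must be valid.
    if size <= 0 or k < 0 or k + size > len(b):
        raise IndexError("indices k .. k+size-1 must all be valid")

def bytes_uint_ref(size, endianness, b, k):
    """Model of bytes-uint-ref: unsigned integer of `size` bytes at k."""
    _check_span(b, k, size)
    return int.from_bytes(b[k:k + size], endianness, signed=False)

def bytes_sint_ref(size, endianness, b, k):
    """Model of bytes-sint-ref: two's complement integer of `size` bytes."""
    _check_span(b, k, size)
    return int.from_bytes(b[k:k + size], endianness, signed=True)

def bytes_uint_set(size, endianness, b, k, n):
    """Model of bytes-uint-set!; to_bytes rejects n outside 0 .. 256^size - 1."""
    _check_span(b, k, size)
    b[k:k + size] = n.to_bytes(size, endianness, signed=False)

def bytes_sint_set(size, endianness, b, k, n):
    """Model of bytes-sint-set!; to_bytes rejects n outside the signed range."""
    _check_span(b, k, size)
    b[k:k + size] = n.to_bytes(size, endianness, signed=True)

block = bytearray(8)
bytes_uint_set(3, "big", block, 1, 0x010203)   # bytes 1..3 become 01 02 03
bytes_sint_set(3, "little", block, 4, -2)      # bytes 4..6 become FE FF FF

print(list(block))                             # [0, 1, 2, 3, 254, 255, 255, 0]
print(bytes_uint_ref(3, "little", block, 1))   # 197121, i.e. 0x030201
print(bytes_sint_ref(3, "little", block, 4))   # -2
print(bytes_uint_ref(3, "little", block, 4))   # 16777214, i.e. 0xFFFFFE
```

Reading the three bytes at index 1 with the opposite endianness yields a different number, and reading the bytes at index 4 as unsigned rather than signed yields 256^3 - 2 instead of -2.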
(bytes-u16-ref endianness bytes k)
(bytes-s16-ref endianness bytes k)
(bytes-u16-native-ref bytes k)
(bytes-s16-native-ref bytes k)
(bytes-u16-set! endianness bytes k n)
(bytes-s16-set! endianness bytes k n)
(bytes-u16-native-set! bytes k n)
(bytes-s16-native-set! bytes k n)
K must be a valid index of bytes; so must the index k+1.
Endianness must be an endianness object.
These retrieve and set two-byte representations of numbers at
indices k and k+1, according to the endianness specified by
endianness. The procedures with u16 in their names deal with the
unsigned representation, those with s16 with the two's complement
representation.
The procedures with native in their names employ the native
endianness, and only work at aligned indices: k must be a multiple
of 2. It is an error to use them at non-aligned indices.
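The difference between the endianness-taking and the native 16-bit accessors can be modelled roughly as below. This is a Python sketch with hypothetical names; the draft only says that unaligned native access "is an error", so raising ValueError is an assumption of the sketch, not specified behaviour.

```python
import struct
import sys

def bytes_u16_ref(endianness, b, k):
    """Model of bytes-u16-ref: unsigned 16-bit value at indices k and k+1."""
    fmt = ">H" if endianness == "big" else "<H"
    return struct.unpack_from(fmt, b, k)[0]

def bytes_s16_ref(endianness, b, k):
    """Model of bytes-s16-ref: two's complement 16-bit value at k and k+1."""
    fmt = ">h" if endianness == "big" else "<h"
    return struct.unpack_from(fmt, b, k)[0]

def bytes_u16_native_ref(b, k):
    """Model of bytes-u16-native-ref: host byte order, aligned indices only."""
    if k % 2 != 0:
        # Assumption: the sketch signals the error eagerly.
        raise ValueError("native access requires k to be a multiple of 2")
    return bytes_u16_ref(sys.byteorder, b, k)

block = bytearray([0x12, 0x34, 0xFF, 0xFE])

print(bytes_u16_ref("big", block, 0))     # 4660, i.e. 0x1234
print(bytes_u16_ref("little", block, 0))  # 13330, i.e. 0x3412
print(bytes_s16_ref("big", block, 2))     # -2, i.e. 0xFFFE as two's complement
print(bytes_u16_ref("big", block, 1))     # 13567, i.e. 0x34FF (unaligned is fine here)
```

Only the native variants carry the alignment restriction; the accessors that take an explicit endianness may be used at any valid index.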
(bytes-u32-ref endianness bytes k)
(bytes-s32-ref endianness bytes k)
(bytes-u32-native-ref bytes k)
(bytes-s32-native-ref bytes k)
(bytes-u32-set! endianness bytes k n)
(bytes-s32-set! endianness bytes k n)
(bytes-u32-native-set! bytes k n)
(bytes-s32-native-set! bytes k n)
K must be a valid index of bytes; so must the indices {k, ..., k+3}.
Endianness must be an endianness object.
These retrieve and set four-byte representations of numbers at
indices {k, ..., k+3}, according to the endianness specified by
endianness. The procedures with u32 in their names deal with the
unsigned representation, those with s32 with the two's complement
representation.
The procedures with native in their names employ the native
endianness, and only work at aligned indices: k must be a multiple
of 4. It is an error to use them at non-aligned indices.
(bytes-u64-ref endianness bytes k)
(bytes-s64-ref endianness bytes k)
(bytes-u64-native-ref bytes k)
(bytes-s64-native-ref bytes k)
(bytes-u64-set! endianness bytes k n)
(bytes-s64-set! endianness bytes k n)
(bytes-u64-native-set! bytes k n)
(bytes-s64-native-set! bytes k n)
K must be a valid index of bytes; so must the indices {k, ..., k+7}.
Endianness must be an endianness object.
These retrieve and set eight-byte representations of numbers at
indices {k, ..., k+7}, according to the endianness specified by
endianness. The procedures with u64 in their names deal with the
unsigned representation, those with s64 with the two's complement
representation.
The procedures with native in their names employ the native
endianness, and only work at aligned indices: k must be a multiple
of 8. It is an error to use them at non-aligned indices.
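The fixed-width accessors are intended to be mixed on a single bytes object. The following Python sketch illustrates that idea with the standard struct module; the record layout (a 16-bit tag, a signed 16-bit field, a 32-bit field, and a signed 64-bit field, all big-endian) is purely hypothetical and does not come from the draft.

```python
import struct

block = bytearray(16)  # like (make-bytes 16)

# Each call corresponds roughly to one of the -set! procedures with
# (endianness big): u16 at 0, s16 at 2, u32 at 4, s64 at 8.
struct.pack_into(">H", block, 0, 0xCAFE)
struct.pack_into(">h", block, 2, -2)
struct.pack_into(">I", block, 4, 0xDEADBEEF)
struct.pack_into(">q", block, 8, -1)

# Read the four fields back, one width at a time.
tag = struct.unpack_from(">H", block, 0)[0]
delta = struct.unpack_from(">h", block, 2)[0]
word = struct.unpack_from(">I", block, 4)[0]
big_signed = struct.unpack_from(">q", block, 8)[0]

# The same eight bytes viewed as unsigned, as a u64 accessor would see them.
big_unsigned = struct.unpack_from(">Q", block, 8)[0]

print(hex(tag), delta, hex(word), big_signed)  # 0xcafe -2 0xdeadbeef -1
print(big_unsigned == 2**64 - 1)               # True
```

With a family of disjoint homogeneous vector types, each of these fields would need a separate vector or manual byte assembly; with one block type they are just different views at different offsets.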
(bytes=? bytes-1 bytes-2)
Returns #t if bytes-1 and bytes-2 are equal---that is, if they
have the same length and equal bytes at all valid indices.
(bytes-copy! source source-start target target-start n)
Copies data from bytes source to bytes target. Source-start,
target-start, and n must be non-negative exact integers that
satisfy
0 <= source-start <= source-start + n <= (bytes-length source)
0 <= target-start <= target-start + n <= (bytes-length target)
This copies the bytes from source at indices [source-start,
source-start + n) to consecutive indices in target starting at
target-start.
This must work even if the memory regions for the source and the
target overlap, i.e., the bytes at the target location after the
copy must be equal to the bytes at the source location before the
copy.
The return values are unspecified.
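The overlap requirement is the subtle part of bytes-copy!. The Python sketch below contrasts a model that satisfies it with a naive front-to-back loop that does not. Both function names are hypothetical, and snapshotting the source range first is just one simple way to meet the requirement, not necessarily how an implementation would do it.

```python
def bytes_copy_into(source, source_start, target, target_start, n):
    """Model of bytes-copy!: correct even when the two regions overlap."""
    if not (0 <= source_start <= source_start + n <= len(source)):
        raise ValueError("source range out of bounds")
    if not (0 <= target_start <= target_start + n <= len(target)):
        raise ValueError("target range out of bounds")
    # Take an immutable snapshot of the source bytes before writing anything.
    snapshot = bytes(source[source_start:source_start + n])
    target[target_start:target_start + n] = snapshot

def naive_copy_into(source, source_start, target, target_start, n):
    """Front-to-back loop: wrong when the target overlaps the source's tail."""
    for i in range(n):
        target[target_start + i] = source[source_start + i]

good = bytearray([1, 2, 3, 4, 5, 0, 0, 0])
bytes_copy_into(good, 0, good, 2, 5)
print(list(good))  # [1, 2, 1, 2, 3, 4, 5, 0]

bad = bytearray([1, 2, 3, 4, 5, 0, 0, 0])
naive_copy_into(bad, 0, bad, 2, 5)
print(list(bad))   # [1, 2, 1, 2, 1, 2, 1, 0]
```

In the naive version the loop overwrites bytes 2 through 4 before it has read them, so the target ends up with a repeating 1, 2 pattern instead of the original source bytes.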
(bytes-copy bytes)
Returns a newly allocated copy of bytes object bytes.
(bytes->u8-list bytes)
(u8-list->bytes list)
Bytes->u8-list returns a newly allocated list of the bytes of bytes
in the same order.
U8-list->bytes returns a newly allocated bytes object whose
elements are the elements of list, which must all be bytes (exact
integers in the interval {0, ..., 255}), in the same order. This is
analogous to list->vector.
(bytes->uint-list size endianness bytes)
(bytes->sint-list size endianness bytes)
(uint-list->bytes size endianness list)
(sint-list->bytes size endianness list)
Size must be a positive exact integer. Endianness must be an
endianness object.
These convert between lists of integers and their consecutive
representations according to size and endianness in bytes objects
in the same way as bytes->u8-list and u8-list->bytes do for
one-byte representations.
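The list conversions can be modelled in Python roughly as follows. The function names are hypothetical transliterations, and the requirement that the length of the bytes object be a multiple of size is an assumption of this sketch; the draft does not say what happens otherwise.

```python
def _to_int_list(size, endianness, b, signed):
    # Assumption: the length must divide evenly into size-byte groups.
    if size <= 0 or len(b) % size != 0:
        raise ValueError("length must be a multiple of a positive size")
    return [int.from_bytes(b[i:i + size], endianness, signed=signed)
            for i in range(0, len(b), size)]

def _from_int_list(size, endianness, lst, signed):
    out = bytearray()
    for n in lst:
        out += n.to_bytes(size, endianness, signed=signed)
    return out

def bytes_to_uint_list(size, endianness, b):
    """Model of bytes->uint-list."""
    return _to_int_list(size, endianness, b, signed=False)

def bytes_to_sint_list(size, endianness, b):
    """Model of bytes->sint-list."""
    return _to_int_list(size, endianness, b, signed=True)

def uint_list_to_bytes(size, endianness, lst):
    """Model of uint-list->bytes."""
    return _from_int_list(size, endianness, lst, signed=False)

def sint_list_to_bytes(size, endianness, lst):
    """Model of sint-list->bytes."""
    return _from_int_list(size, endianness, lst, signed=True)

packed = uint_list_to_bytes(2, "big", [1, 0x0203, 65535])
print(list(packed))                             # [0, 1, 2, 3, 255, 255]
print(bytes_to_sint_list(2, "big", packed))     # [1, 515, -1]
print(bytes_to_uint_list(2, "little", packed))  # [256, 770, 65535]
```

With size 1 the unsigned conversions reduce to the plain byte-list case, which is the sense in which they generalize bytes->u8-list and u8-list->bytes.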
Reference Implementation
This reference implementation makes use of SRFI 23 (Error reporting
mechanism), SRFI 26 (Notation for Specializing Parameters without
Currying), SRFI 60 (Integers as Bits), and SRFI 66 (Octet Vectors).
Examples
The test suite doubles as a source of examples.
References
* SRFI 4 (Homogeneous numeric vector datatypes)
* SRFI 56 (Binary I/O)
* SRFI 66 (Octet Vectors)
* SRFI 74 (Octet-Addressed Binary Blocks)