[R6RS] comments on proposed bytes SRFI

Fri Jun 16 13:32:44 EDT 2006

Many thanks for going over this!

William D Clinger <will at ccs.neu.edu> writes:

> [various suggestions, all good]

Done, hopefully.  New copy appended for archival purposes.

> I can't tell whether bytes->u8-list and u8-list->bytes
> were misnamed or mis-specified.

Misspecified.

> Perhaps bytes->s8-list and s8-list->bytes should be added, although
> there isn't much need of them given the bytes->sint-list and
> sint-list->bytes.  Perhaps bytes->u8-list and its
> inverse should just be flushed.

I did it because I expected those two to be the most common case by
far.  I can't say I have a strong opinion, either on complementing
them with s8 operations, or flushing them.  Do you?

-- 
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla

Title

Bytes Objects

Authors

Michael Sperber

Abstract

This library defines a set of procedures for creating, accessing, and
manipulating byte-addressed blocks of binary data, in short, bytes objects. The
library provides access primitives for fixed-length integers of arbitrary size,
with specified endianness, and a choice of unsigned and two's complement
representations.

This library is a variation of SRFI 74. Compared to SRFI 74, this library uses
a different terminology: what SRFI 74 calls blob this library calls bytes.

Rationale

Many applications must deal with blocks of binary data by accessing them in
various ways---extracting signed or unsigned numbers of various sizes. Such an
application can use octet vectors as in SRFI 66 or any of the other types of
homogeneous vectors in SRFI 4, but each of these allows retrieving binary data
of only one type.

This is awkward in many situations, because an application might access
different kinds of entities from a single binary block. Even for uniform
blocks, the disjointness of the various vector data types in SRFI 4 means that,
say, an I/O API needs to provide an army of procedures for each of them in
order to provide efficient access to binary data.

Therefore, this library provides a single type for blocks of binary data with
multiple ways to access that data. It deals only with integers in various sizes
with specified endianness, because these are the most frequent applications.
Dealing with other kinds of binary data, such as floating-point numbers or
variable-size integers, would be a natural extension, but is left for a
separate library.

Specification

General remarks

Bytes objects are objects of a disjoint type. Conceptually, a bytes object
represents a sequence of 8-bit bytes.

The length of a bytes object is the number of bytes it contains. This number is
fixed. A valid index into a bytes object is an exact, non-negative integer less
than the length of the bytes object. The first byte of a bytes object has index
0, the last byte has an index one less than the length of the bytes object.

Generally, the access procedures come in different flavors according to the
size of the represented integer, and the endianness of the representation. The
procedures also distinguish signed and unsigned representations. The signed
representations all use two's complement.

Interface

(endianness big) (syntax)
(endianness little) (syntax)

    (endianness big) and (endianness little) evaluate to the symbols big and
    little, respectively. These symbols represent an endianness, and whenever
    one of the procedures operating on bytes objects accepts an endianness as
    an argument, that argument must be one of these symbols. If the operand
    to endianness is anything other than big or little, an expansion-time error
    is signalled.

(native-endianness)

    This procedure returns the endianness of the underlying machine
    architecture, either (endianness big) or (endianness little).

(bytes? obj)

    Returns #t if obj is a bytes object, otherwise returns #f.

(make-bytes k)

    Returns a newly allocated bytes object of k bytes, all of them 0.

(bytes-length bytes)

    Returns the number of bytes in bytes as an exact integer.

(bytes-u8-ref bytes k)
(bytes-s8-ref bytes k)

    K must be a valid index of bytes.

    Bytes-u8-ref returns the byte at index k of bytes.

    Bytes-s8-ref returns the exact integer corresponding to the two's
    complement representation at index k of bytes.

(bytes-u8-set! bytes k octet)
(bytes-s8-set! bytes k byte)

    K must be a valid index of bytes.

    Octet must be an exact integer in the interval {0, ..., 255}.
    Bytes-u8-set! stores octet in element k of bytes.

    Byte must be an exact integer in the interval {-128, ..., 127}.
    Bytes-s8-set! stores the two's complement representation of byte in element
    k of bytes.

    Both procedures return the unspecified value.
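
    The relationship between the unsigned and the two's complement view of a
    single byte can be sketched in Python. This is only a model of the
    semantics described above: the function names are hypothetical stand-ins,
    a bytearray plays the role of a bytes object, and raising an exception on
    an out-of-range argument is an assumption, not something the text requires.

```python
# Hypothetical Python model of the one-byte accessors (not the library's API).

def u8_ref(b: bytearray, k: int) -> int:
    """Model of bytes-u8-ref: the byte at index k as an integer in 0..255."""
    return b[k]

def s8_ref(b: bytearray, k: int) -> int:
    """Model of bytes-s8-ref: read the same byte as two's complement."""
    octet = b[k]
    return octet - 256 if octet >= 128 else octet

def u8_set(b: bytearray, k: int, octet: int) -> None:
    """Model of bytes-u8-set!: octet must lie in 0..255."""
    if not 0 <= octet <= 255:
        raise ValueError("octet out of range")  # assumed error behaviour
    b[k] = octet

def s8_set(b: bytearray, k: int, byte: int) -> None:
    """Model of bytes-s8-set!: byte must lie in -128..127."""
    if not -128 <= byte <= 127:
        raise ValueError("byte out of range")  # assumed error behaviour
    b[k] = byte & 0xFF  # two's complement encoding in 8 bits

buf = bytearray(2)   # plays the role of (make-bytes 2): two zero bytes
s8_set(buf, 0, -1)   # stored as the octet 255
u8_set(buf, 1, 128)  # reads back as -128 through the signed accessor
```

    The same stored octet therefore yields two different integers depending on
    which accessor reads it.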

(bytes-uint-ref bytes k endianness size)
(bytes-sint-ref bytes k endianness size)
(bytes-uint-set! bytes k n endianness size)
(bytes-sint-set! bytes k n endianness size)

    Size must be a positive exact integer. K must be a valid index of bytes; so
    must the indices {k, ..., k + size - 1}. Endianness must be an endianness
    object.

    Bytes-uint-ref retrieves the exact integer corresponding to the unsigned
    representation of size size and specified by endianness at indices {k, ...,
    k + size - 1}.

    Bytes-sint-ref retrieves the exact integer corresponding to the two's
    complement representation of size size and specified by endianness at
    indices {k, ..., k + size - 1}.

    For bytes-uint-set!, n must be an exact integer in the interval [0,
    (256^size)-1]. Bytes-uint-set! stores the unsigned representation of n, of
    size size and specified by endianness, into the bytes object at indices {k,
    ..., k + size - 1}.

    For bytes-sint-set!, n must be an exact integer in the interval
    [-(256^size)/2, ((256^size)/2)-1]. Bytes-sint-set! stores the two's
    complement representation of n, of size size and specified by endianness,
    into the bytes object at indices {k, ..., k + size - 1}.

    The ...-set! procedures return the unspecified value.
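
    As a rough illustration of the arbitrary-size accessors, the following
    Python sketch models them with int.from_bytes and int.to_bytes. The
    function names are hypothetical, a bytearray stands in for a bytes object,
    and the strings "big" and "little" stand in for the endianness symbols.
    Range checking is delegated to int.to_bytes, which raises OverflowError
    when n does not fit in size bytes; the explicit index check is an
    assumption about error behaviour.

```python
# Hypothetical Python model of bytes-uint-ref and friends (not the library's API).

def _check(b: bytearray, k: int, size: int) -> None:
    # All of the indices k .. k+size-1 must be valid.
    if size <= 0 or k < 0 or k + size > len(b):
        raise IndexError("indices out of range")  # assumed error behaviour

def uint_ref(b: bytearray, k: int, endianness: str, size: int) -> int:
    _check(b, k, size)
    return int.from_bytes(b[k:k + size], endianness, signed=False)

def sint_ref(b: bytearray, k: int, endianness: str, size: int) -> int:
    _check(b, k, size)
    return int.from_bytes(b[k:k + size], endianness, signed=True)

def uint_set(b: bytearray, k: int, n: int, endianness: str, size: int) -> None:
    _check(b, k, size)
    b[k:k + size] = n.to_bytes(size, endianness, signed=False)

def sint_set(b: bytearray, k: int, n: int, endianness: str, size: int) -> None:
    _check(b, k, size)
    b[k:k + size] = n.to_bytes(size, endianness, signed=True)

buf = bytearray(4)
uint_set(buf, 0, 0x010203, "big", 3)  # buf is now 01 02 03 00
sint_set(buf, 1, -2, "little", 2)     # buf is now 01 FE FF 00
```

    Reading indices 1 and 2 back with sint_ref and "little" gives -2, while
    uint_ref with "big" over the same two bytes gives #xFEFF: the stored bytes
    are identical, only the interpretation differs.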

(bytes-u16-ref bytes k endianness)
(bytes-s16-ref bytes k endianness)
(bytes-u16-native-ref bytes k)
(bytes-s16-native-ref bytes k)
(bytes-u16-set! bytes k n endianness)
(bytes-s16-set! bytes k n endianness)
(bytes-u16-native-set! bytes k n)
(bytes-s16-native-set! bytes k n)

    K must be a valid index of bytes; so must the index k+1. Endianness must
    be an endianness object.

    These retrieve and set two-byte representations of numbers at indices k and
    k+1, according to the endianness specified by endianness. The procedures
    with u16 in their names deal with the unsigned representation, those with
    s16 with the two's complement representation.

    The procedures with native in their names employ the native endianness, and
    only work at aligned indices: k must be a multiple of 2. It is an error to
    use them at non-aligned indices.

    The ...-set! procedures return the unspecified value.
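
    The difference between the explicit-endianness and the native procedures
    can be modelled in Python as below. The names are hypothetical,
    sys.byteorder stands in for (native-endianness), and raising ValueError at
    a non-aligned index is just one possible reading of "it is an error".

```python
import sys

# Hypothetical Python model of the 16-bit accessors (not the library's API).

def native_endianness() -> str:
    """Stand-in for (native-endianness): 'big' or 'little'."""
    return sys.byteorder

def u16_ref(b: bytearray, k: int, endianness: str) -> int:
    if k < 0 or k + 2 > len(b):
        raise IndexError("indices out of range")  # assumed error behaviour
    return int.from_bytes(b[k:k + 2], endianness, signed=False)

def u16_native_ref(b: bytearray, k: int) -> int:
    # The native variants only work at aligned indices: k must be a multiple of 2.
    if k % 2 != 0:
        raise ValueError("non-aligned index")  # assumed error behaviour
    return u16_ref(b, k, native_endianness())

buf = bytearray([0x12, 0x34, 0x56, 0x78])
big = u16_ref(buf, 0, "big")        # #x1234
little = u16_ref(buf, 1, "little")  # #x5634, unaligned access is allowed here
native = u16_native_ref(buf, 2)     # #x5678 or #x7856, depending on the machine
```

    Only the native variant carries the alignment restriction; the variants
    that take an explicit endianness accept any valid index.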

(bytes-u32-ref bytes k endianness)
(bytes-s32-ref bytes k endianness)
(bytes-u32-native-ref bytes k)
(bytes-s32-native-ref bytes k)
(bytes-u32-set! bytes k n endianness)
(bytes-s32-set! bytes k n endianness)
(bytes-u32-native-set! bytes k n)
(bytes-s32-native-set! bytes k n)

    K must be a valid index of bytes; so must the indices {k, ..., k+3}.
    Endianness must be an endianness object.

    These retrieve and set four-byte representations of numbers at indices {k,
    ..., k+3}, according to the endianness specified by endianness. The
    procedures with u32 in their names deal with the unsigned representation,
    those with s32 with the two's complement representation.

    The procedures with native in their names employ the native endianness, and
    only work at aligned indices: k must be a multiple of 4. It is an error to
    use them at non-aligned indices.

    The ...-set! procedures return the unspecified value.

(bytes-u64-ref bytes k endianness)
(bytes-s64-ref bytes k endianness)
(bytes-u64-native-ref bytes k)
(bytes-s64-native-ref bytes k)
(bytes-u64-set! bytes k n endianness)
(bytes-s64-set! bytes k n endianness)
(bytes-u64-native-set! bytes k n)
(bytes-s64-native-set! bytes k n)

    K must be a valid index of bytes; so must the indices {k, ..., k+7}.
    Endianness must be an endianness object.

    These retrieve and set eight-byte representations of numbers at indices {k,
    ..., k+7}, according to the endianness specified by endianness. The
    procedures with u64 in their names deal with the unsigned representation,
    those with s64 with the two's complement representation.

    The procedures with native in their names employ the native endianness, and
    only work at aligned indices: k must be a multiple of 8. It is an error to
    use them at non-aligned indices.

    The ...-set! procedures return the unspecified value.

(bytes=? bytes-1 bytes-2)

    Returns #t if bytes-1 and bytes-2 are equal---that is, if they have the
    same length and equal bytes at all valid indices. It returns #f otherwise.

(bytes-copy! source source-start target target-start n)

    Copies data from bytes source to bytes target. Source-start, target-start,
    and n must be non-negative exact integers that satisfy

    0 <= source-start <= source-start + n <= (bytes-length source)

    0 <= target-start <= target-start + n <= (bytes-length target)

    This copies the bytes from source at indices [source-start, source-start +
    n) to consecutive indices in target starting at target-start.

    This must work even if the memory regions for the source and the target
    overlap, i.e., the bytes at the target location after the copy must be
    equal to the bytes at the source location before the copy.

    This returns the unspecified value.
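
    The overlap requirement is the subtle part of bytes-copy!. A minimal
    Python sketch, with a hypothetical function name and a bytearray standing
    in for a bytes object, satisfies it by taking a snapshot of the source
    range before writing. Raising IndexError when the constraints are violated
    is an assumption.

```python
# Hypothetical Python model of bytes-copy! (not the library's API).

def bytes_copy_into(source: bytearray, source_start: int,
                    target: bytearray, target_start: int, n: int) -> None:
    if not (0 <= source_start <= source_start + n <= len(source)):
        raise IndexError("bad source range")  # assumed error behaviour
    if not (0 <= target_start <= target_start + n <= len(target)):
        raise IndexError("bad target range")  # assumed error behaviour
    # Snapshot the source bytes first, so that the result is correct even
    # when source and target are the same object and the ranges overlap.
    chunk = bytes(source[source_start:source_start + n])
    target[target_start:target_start + n] = chunk

buf = bytearray(b"abcdef")
bytes_copy_into(buf, 0, buf, 2, 4)  # overlapping ranges within one object
```

    After the call buf holds "ababcd". A naive loop that copies one byte at a
    time from low to high indices would instead produce "ababab", because it
    would overwrite source bytes before reading them.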

(bytes-copy bytes)

    Returns a newly allocated copy of bytes object bytes.

(bytes->u8-list bytes)
(u8-list->bytes list)

    Bytes->u8-list returns a newly allocated list of the bytes of bytes in the
    same order.

    U8-list->bytes returns a newly allocated bytes object whose elements are
    the elements of list list, which must all be octets, in the same order.
    This is analogous to list->vector.

(bytes->uint-list bytes endianness size)
(bytes->sint-list bytes endianness size)
(uint-list->bytes list endianness size)
(sint-list->bytes list endianness size)

    Size must be a positive exact integer. Endianness must be an endianness
    object.

    These convert between lists of integers and their consecutive
    representations according to size and endianness in bytes objects in the
    same way as bytes->u8-list and u8-list->bytes do for one-byte
    representations.
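
    A Python sketch of these conversions follows. The names are hypothetical,
    and the requirement that the length of the bytes object be a multiple of
    size is an assumption of this sketch: the text above does not say what
    happens otherwise.

```python
# Hypothetical Python model of the list conversions (not the library's API).

def bytes_to_int_list(b: bytearray, endianness: str, size: int,
                      signed: bool) -> list:
    if size <= 0 or len(b) % size != 0:
        raise ValueError("length must be a multiple of size")  # assumption
    return [int.from_bytes(b[i:i + size], endianness, signed=signed)
            for i in range(0, len(b), size)]

def int_list_to_bytes(lst: list, endianness: str, size: int,
                      signed: bool) -> bytearray:
    out = bytearray()
    for n in lst:
        # to_bytes raises OverflowError if n does not fit in size bytes.
        out += n.to_bytes(size, endianness, signed=signed)
    return out

data = bytearray([1, 2, 3, 4])
as_little = bytes_to_int_list(data, "little", 2, signed=False)  # [513, 1027]
as_big = bytes_to_int_list(data, "big", 2, signed=False)        # [258, 772]
packed = int_list_to_bytes([-1, 1], "big", 2, signed=True)      # FF FF 00 01
```

    With signed=False the two functions model bytes->uint-list and
    uint-list->bytes; with signed=True they model the sint variants.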

Reference Implementation

This reference implementation makes use of SRFI 23 (Error reporting mechanism),
SRFI 26 (Notation for Specializing Parameters without Currying), SRFI 60
(Integers as Bits), and SRFI 66 (Octet Vectors).

Examples

The test suite doubles as a source of examples.

References

  * SRFI 4 (Homogeneous numeric vector datatypes)
  * SRFI 56 (Binary I/O)
  * SRFI 66 (Octet Vectors)
  * SRFI 74 (Octet-Addressed Binary Blocks)