[R6RS] Bytes objects

Thu May 11 11:19:26 EDT 2006

I've changed the "blob" SRFI into a draft for "bytes objects."  It's
in Subversion in draft/bytes; a text version is attached for archival
purposes.  Please comment.

-- 
Cheers =8-} Mike
Peace, international understanding, and all that blah blah

                                     Title

   Bytes Objects

                                    Authors

   Michael Sperber

                                    Abstract

   This library defines a set of procedures for creating, accessing, and
   manipulating byte-addressed blocks of binary data, in short, bytes
   objects. The library provides access primitives for fixed-length integers
   of arbitrary size, with specified endianness, and a choice of unsigned and
   two's complement representations.

   This library is a variation of SRFI 74. Compared to SRFI 74, this library
   uses a different terminology: what SRFI 74 calls blob this library calls
   bytes.

                                   Rationale

   Many applications must deal with blocks of binary data by accessing them
   in various ways, for example by extracting signed or unsigned numbers of
   various sizes. Such an application can use octet vectors as in SRFI 66 or
   any of the other types of homogeneous vectors in SRFI 4, but each of these
   allows retrieving the binary data as only one type.

   This is awkward in many situations, because an application might access
   different kinds of entities from a single binary block. Even for uniform
   blocks, the disjointness of the various vector data types in SRFI 4 means
   that, say, an I/O API needs to provide an army of procedures for each of
   them in order to provide efficient access to binary data.

   Therefore, this library provides a single type for blocks of binary data
   with multiple ways to access that data. It deals only with integers in
   various sizes with specified endianness, because these are the most
   frequent applications. Dealing with other kinds of binary data, such as
   floating-point numbers or variable-size integers, would be a natural
   extension, but is left for a separate library.

                                 Specification

General remarks

   Bytes objects are objects of a disjoint type. Conceptually, a bytes object
   represents a sequence of bytes.

   The length of a bytes object is the number of bytes it contains. This
   number is fixed. A valid index into a bytes object is an exact,
   non-negative integer less than the length of the bytes object. The first
   byte of a bytes object has index 0; the last byte has an index one less
   than the length of the bytes object.

   Generally, the access procedures come in different flavors according to
   the size of the represented integer, and the endianness of the
   representation. The procedures also distinguish signed and unsigned
   representations. The signed representations all use two's complement.

   For procedures that have no "natural" return value, this description often
   uses the sentence

   The return values are unspecified.

   This means that the number of return values and the return values
   themselves are unspecified. However, the number of return values is such
   that it is accepted by a continuation created by begin. Specifically, on
   Scheme implementations where continuations created by begin accept an
   arbitrary number of arguments (this includes most implementations), it is
   suggested that the procedure return zero return values.

Interface

   (endianness big) (syntax)

   (endianness little) (syntax)

           (endianness big) and (endianness little) evaluate to the symbols
           big and little, respectively. These symbols represent an
           endianness, and whenever one of the procedures operating on bytes
           objects accepts an endianness as an argument, that argument must
           be one of these symbols. If the operand to endianness is
           anything other than big or little, an expansion-time error is
           signalled.

   (native-endianness)

           This procedure returns the endianness of the underlying machine
           architecture, either (endianness big) or (endianness little).
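
           One way the endianness form might be provided is as a
           syntax-rules macro with big and little as literals. The sketch
           below is an assumption for illustration and is not taken from the
           reference implementation. Any other operand matches no rule, so
           the expander rejects it.

```scheme
;; Sketch only: a possible definition of the endianness form.
(define-syntax endianness
  (syntax-rules (big little)
    ((_ big) 'big)
    ((_ little) 'little)))

(endianness big)                   ; => big
(endianness little)                ; => little
;; (endianness middle) is rejected during expansion: no rule matches.

;; Endianness values are ordinary symbols, so eq? can compare them.
(eq? (endianness big) 'big)        ; => #t
```

           The result of (native-endianness) can be compared against either
           form in the same way, for instance to choose a code path that
           depends on the host byte order.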

   (bytes? obj)

           Returns #t if obj is a bytes object, otherwise returns #f.

   (make-bytes k)

           Returns a newly allocated bytes object of k bytes, all of them 0.

   (bytes-length bytes)

           Returns the number of bytes in bytes as an exact integer.

   (bytes-u8-ref bytes k)

   (bytes-s8-ref bytes k)

           K must be a valid index of bytes.

           Bytes-u8-ref returns the byte at index k of bytes.

           Bytes-s8-ref returns the exact integer corresponding to the two's
           complement representation at index k of bytes.

   (bytes-u8-set! bytes k octet)

   (bytes-s8-set! bytes k byte)

           K must be a valid index of bytes.

           Octet must be an exact integer in the interval {0, ..., 255}.
           Bytes-u8-set! stores octet in element k of bytes.

           Byte must be an exact integer in the interval {-128, ..., 127}.
           Bytes-s8-set! stores the two's complement representation of byte
           in element k of bytes.

           The return values are unspecified.
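
           The relationship between the u8 and s8 views of a single byte can
           be sketched in portable Scheme. The helper names octet->s8 and
           s8->octet are hypothetical and are not part of this library; they
           only model the two's complement mapping described above.

```scheme
;; Sketch only: octet->s8 and s8->octet are hypothetical helper names.

;; Interpret an octet (0..255) as a two's complement byte (-128..127).
(define (octet->s8 octet)
  (if (> octet 127) (- octet 256) octet))

;; Encode a byte (-128..127) as the octet that represents it.
(define (s8->octet byte)
  (if (< byte 0) (+ byte 256) byte))

(octet->s8 255)    ; => -1
(octet->s8 127)    ; => 127
(s8->octet -128)   ; => 128
```

           Under this reading, after (bytes-s8-set! b 0 -1) on a bytes
           object b, (bytes-u8-ref b 0) would return 255 and
           (bytes-s8-ref b 0) would return -1.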

   (bytes-uint-ref size endianness bytes k)

   (bytes-sint-ref size endianness bytes k)

   (bytes-uint-set! size endianness bytes k n)

   (bytes-sint-set! size endianness bytes k n)

           Size must be a positive exact integer. K must be a valid index of
           bytes; so must the indices {k, ..., k + size - 1}. Endianness must
           be an endianness object.

           Bytes-uint-ref retrieves the exact integer corresponding to the
           unsigned representation of size size and specified by endianness
           at indices {k, ..., k + size - 1}.

           Bytes-sint-ref retrieves the exact integer corresponding to the
           two's complement representation of size size and specified by
           endianness at indices {k, ..., k + size - 1}.

           For bytes-uint-set!, n must be an exact integer in the interval
           [0, (256^size)-1]. Bytes-uint-set! stores the unsigned
           representation of size size and specified by endianness into the
           bytes object at indices {k, ..., k + size - 1}.

           For bytes-sint-set!, n must be an exact integer in the interval
           [-(256^size)/2, ((256^size)/2)-1]. Bytes-sint-set! stores the
           two's complement representation of size size and specified by
           endianness into the bytes object at indices
           {k, ..., k + size - 1}.
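
           How size and endianness determine the resulting integer can be
           sketched in portable Scheme. The helpers octets->uint and
           octets->sint are hypothetical; they operate on a plain list of
           octets rather than on a bytes object, and are meant only to
           illustrate the arithmetic, not to describe the reference
           implementation.

```scheme
;; Sketch only: octets->uint and octets->sint are hypothetical helpers
;; that work on a list of octets instead of a bytes object.

;; Combine the octets, most significant first, into an unsigned integer.
(define (octets->uint endianness octets)
  (let loop ((rest (if (eq? endianness 'big) octets (reverse octets)))
             (acc 0))
    (if (null? rest)
        acc
        (loop (cdr rest) (+ (* acc 256) (car rest))))))

;; Reinterpret the unsigned value as two's complement of the same size.
(define (octets->sint endianness octets)
  (let ((u (octets->uint endianness octets))
        (limit (expt 256 (length octets))))
    (if (>= u (quotient limit 2)) (- u limit) u)))

(octets->uint 'big    '(1 2 3))         ; => 66051  (#x010203)
(octets->uint 'little '(1 2 3))         ; => 197121 (#x030201)
(octets->sint 'big    '(255 255 254))   ; => -2
```

           With the procedures of this library, (bytes-uint-ref 3
           (endianness big) b 0) on a bytes object b whose first three bytes
           are 1, 2, and 3 would accordingly return 66051.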

   (bytes-u16-ref endianness bytes k)

   (bytes-s16-ref endianness bytes k)

   (bytes-u16-native-ref bytes k)

   (bytes-s16-native-ref bytes k)

   (bytes-u16-set! endianness bytes k n)

   (bytes-s16-set! endianness bytes k n)

   (bytes-u16-native-set! bytes k n)

   (bytes-s16-native-set! bytes k n)

           K must be a valid index of bytes; so must the index k + 1.
           Endianness must be an endianness object.

           These retrieve and set two-byte representations of numbers at
           indices k and k+1, according to the endianness specified by
           endianness. The procedures with u16 in their names deal with the
           unsigned representation, those with s16 with the two's complement
           representation.

           The procedures with native in their names employ the native
           endianness, and only work at aligned indices: k must be a multiple
           of 2. It is an error to use them at non-aligned indices.
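
           A short usage sketch for the two-byte accessors follows. It
           assumes the procedures of this library are bound, for example by
           loading the reference implementation; the variable name b is
           illustrative.

```scheme
;; Assumes the bytes procedures described in this draft are available.
(define b (make-bytes 4))                    ; four bytes, all 0

(bytes-u16-set! (endianness big) b 0 #x1234)
(bytes-u8-ref b 0)                           ; => 18 (#x12)
(bytes-u8-ref b 1)                           ; => 52 (#x34)
(bytes-u16-ref (endianness little) b 0)      ; => 13330 (#x3412)

(bytes-s16-set! (endianness little) b 2 -2)
(bytes-u8-ref b 2)                           ; => 254 (#xFE)
(bytes-u16-ref (endianness little) b 2)      ; => 65534
(bytes-s16-ref (endianness big) b 2)         ; => -257 (#xFEFF)

;; The native variants take no endianness argument. The result depends
;; on the host: 4660 on a big-endian machine, 13330 on a little-endian one.
(bytes-u16-native-ref b 0)
;; (bytes-u16-native-ref b 1) is an error: 1 is not a multiple of 2.
```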

   (bytes-u32-ref endianness bytes k)

   (bytes-s32-ref endianness bytes k)

   (bytes-u32-native-ref bytes k)

   (bytes-s32-native-ref bytes k)

   (bytes-u32-set! endianness bytes k n)

   (bytes-s32-set! endianness bytes k n)

   (bytes-u32-native-set! bytes k n)

   (bytes-s32-native-set! bytes k n)

           K must be a valid index of bytes; so must the indices
           {k, ..., k + 3}. Endianness must be an endianness object.

           These retrieve and set four-byte representations of numbers at
           indices {k, ..., k + 3}, according to the endianness specified by
           endianness. The procedures with u32 in their names deal with the
           unsigned representation, those with s32 with the two's complement
           representation.

           The procedures with native in their names employ the native
           endianness, and only work at aligned indices: k must be a multiple
           of 4. It is an error to use them at non-aligned indices.

   (bytes-u64-ref endianness bytes k)

   (bytes-s64-ref endianness bytes k)

   (bytes-u64-native-ref bytes k)

   (bytes-s64-native-ref bytes k)

   (bytes-u64-set! endianness bytes k n)

   (bytes-s64-set! endianness bytes k n)

   (bytes-u64-native-set! bytes k n)

   (bytes-s64-native-set! bytes k n)

           K must be a valid index of bytes; so must the indices
           {k, ..., k + 7}. Endianness must be an endianness object.

           These retrieve and set eight-byte representations of numbers at
           indices {k, ..., k + 7}, according to the endianness specified by
           endianness. The procedures with u64 in their names deal with the
           unsigned representation, those with s64 with the two's complement
           representation.

           The procedures with native in their names employ the native
           endianness, and only work at aligned indices: k must be a multiple
           of 8. It is an error to use them at non-aligned indices.
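
           The fixed-size accessors can be mixed freely on one bytes object.
           The sketch below assumes the procedures of this library are
           bound; the 16-byte record layout and the name rec are
           hypothetical and serve only as an illustration.

```scheme
;; Assumes the bytes procedures described in this draft are available.
;; Hypothetical record layout, for illustration only:
;;   bytes 0-3    u32, big-endian
;;   bytes 4-7    s32, big-endian
;;   bytes 8-15   u64, big-endian
(define rec (make-bytes 16))

(bytes-u32-set! (endianness big) rec 0 #xDEADBEEF)
(bytes-s32-set! (endianness big) rec 4 -1)
(bytes-u64-set! (endianness big) rec 8 1)

(bytes-u8-ref rec 0)                         ; => 222 (#xDE)
(bytes-u32-ref (endianness big) rec 4)       ; => 4294967295
(bytes-u8-ref rec 15)                        ; => 1
(bytes-u64-ref (endianness little) rec 8)    ; => 72057594037927936 (2^56)

;; Index 8 is a multiple of 8, so the native u64 accessors may be used
;; there; index 4 would be an error for them.
```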

   (bytes=? bytes-1 bytes-2)

           Returns #t if bytes-1 and bytes-2 are equal, that is, if they
           have the same length and equal bytes at all valid indices.
           Otherwise returns #f.

   (bytes-copy! source source-start target target-start n)

           Copies data from bytes source to bytes target. Source-start,
           target-start, and n must be non-negative exact integers that
           satisfy

           0 <= source-start <= source-start + n <= (bytes-length source)

           0 <= target-start <= target-start + n <= (bytes-length target)

           This copies the bytes from source at indices [source-start,
           source-start + n) to consecutive indices in target starting at
           target-start.

           This must work even if the memory regions for the source and the
           target overlap, i.e., the bytes at the target location after the
           copy must be equal to the bytes at the source location before the
           copy.

           The return values are unspecified.
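
           One common way to satisfy the overlap requirement is to choose
           the copy direction from the relative positions of the two
           regions. The sketch below models a bytes object as a Scheme
           vector of octets; octets-copy! is a hypothetical stand-in and not
           necessarily how the reference implementation works.

```scheme
;; Sketch only: octets-copy! is a hypothetical stand-in that models a
;; bytes object as a vector of octets.
(define (octets-copy! source source-start target target-start n)
  (if (< source-start target-start)
      ;; The target region starts further right: copy from the last
      ;; element down, so that overlapping source elements are read
      ;; before they are overwritten.
      (do ((i (- n 1) (- i 1)))
          ((< i 0))
        (vector-set! target (+ target-start i)
                     (vector-ref source (+ source-start i))))
      ;; Otherwise copy from the first element up.
      (do ((i 0 (+ i 1)))
          ((= i n))
        (vector-set! target (+ target-start i)
                     (vector-ref source (+ source-start i))))))

(define v (vector 1 2 3 4 5))
(octets-copy! v 0 v 2 3)
v   ; => #(1 2 1 2 3)
```

           A naive front-to-back loop would produce #(1 2 1 2 1) here,
           because the element at index 2 is overwritten before it is read.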

   (bytes-copy bytes)

           Returns a newly allocated copy of bytes object bytes.

   (bytes->u8-list bytes)

   (u8-list->bytes list)

           Bytes->u8-list returns a newly allocated list of the bytes of bytes
           in the same order.

           U8-list->bytes returns a newly allocated bytes object whose
           elements are the elements of list, which must all be bytes, in
           the same order. This is analogous to list->vector.

   (bytes->uint-list size endianness bytes)

   (bytes->sint-list size endianness bytes)

   (uint-list->bytes size endianness list)

   (sint-list->bytes size endianness list)

           Size must be a positive exact integer. Endianness must be an
           endianness object.

           These convert between lists of integers and their consecutive
           representations according to size and endianness in bytes objects
           in the same way as bytes->u8-list and u8-list->bytes do for
           one-byte representations.
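
           A brief usage sketch of the list conversions follows. It assumes
           the procedures of this library are bound; the variable name b is
           illustrative.

```scheme
;; Assumes the bytes procedures described in this draft are available.
(define b (u8-list->bytes '(0 1 255 254)))

(bytes->u8-list b)                           ; => (0 1 255 254)
(bytes->uint-list 2 (endianness big) b)      ; => (1 65534)
(bytes->sint-list 2 (endianness big) b)      ; => (1 -2)
(bytes->uint-list 2 (endianness little) b)   ; => (256 65279)

;; Converting back yields a bytes object with the same contents.
(bytes=? b (uint-list->bytes 2 (endianness big) '(1 65534)))   ; => #t
```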

                            Reference Implementation

   This reference implementation makes use of SRFI 23 (Error reporting
   mechanism), SRFI 26 (Notation for Specializing Parameters without
   Currying), SRFI 60 (Integers as Bits), and SRFI 66 (Octet Vectors).

                                    Examples

   The test suite doubles as a source of examples.

                                   References

     * SRFI 4 (Homogeneous numeric vector datatypes)
     * SRFI 56 (Binary I/O)
     * SRFI 66 (Octet Vectors)
     * SRFI 74 (Octet-Addressed Binary Blocks)