[r6rs-discuss] [Formal] modified interface to list-sort and vector-sort

From: Daniel Villeneuve <daniel_villeneuve>
Date: Sat Feb 24 19:16:11 2007

Aubrey Jaffer wrote:
> | Date: Tue, 20 Feb 2007 20:55:40 -0500
> | From: Daniel Villeneuve <daniel_villeneuve_at_sympatico.ca>
> |
> | PART A: prototype for comparison functions
> |
> | The comparison function provided to list-sort and
> | vector-sort should return a 3-way result when comparing two
> | objects a and b:
> | - negative integer: a < b
> | - 0: a = b
> | - positive integer: a > b
> |
> | This avoids comparing items twice, which can be costly in
> | some cases (e.g., comparing vectors of numbers).
>
> Merge-sort does not compare pairs of items twice. Any sorting
> algorithm which did could be improved by removing one of the
> comparisons [because (<= a b) is equivalent to (not (< b a))]. 3-way
> comparisons do not improve sorting performance.

As kindly suggested by Aubrey Jaffer and Felix Klock, I've read the
following discussions about 3-way comparisons:

http://srfi.schemers.org/srfi-32/mail-archive/msg00023.html
http://srfi.schemers.org/srfi-95/mail-archive/msg00002.html

It seems that there is some doubt about improving merge-sort using
3-way comparisons.

At the end of this article is an algorithm that I claim has the
following specs:

- bottom-up merge sort (non-recursive)
- O(n log n) worst case
- list-oriented
- stable
- Omega(n) in the best case

and which takes advantage of 3-way comparisons in a few
places.

The challenge is to get it both stable and Omega(n) in the
best case. The idea is to take advantage of sorted
subsequences as long as stability is not lost.

Since my implementation is in C, I provide the algorithm as
that language. This is _not_ the code I use and I did not
compile it. However, I've derived it line-by-line from the
real thing, so there should be no missing ideas.

History: there is a similar merge-sort implementation in GNU
glibc's C++ list sort (on my Linux system:
/usr/include/c++/3.4.3/bits/list.tcc), which is stable but
does not attempt to exploit altready sorted subsequences.
I've used variants of this algorithm for a long time, and
I've modified it only recently in order to get it both
Omega(n) in the best case and stable (when wanting to
implement a good "R6RS/3-way cmp" list-sort algorithm).
Before, I had two flavors: one stable (not looking for
sorted subsequences, similar to glibc's) and one best-case
but unstable.

I'll be happy to discuss the details should this raise any
interest.

/*****************************************************************************/

/* upper bound on merge-sort recursion */
#define STACK_SIZE (sizeof(void *) * CHAR_BIT)

typedef /* some data type */ V;
typedef struct X X;
struct X
{
  X *next; /* singly-linked list, NULL terminated */
  V val;
};

/* 3-way comparison:
   v1 < v2 : negative
   v1 = v2 : 0
   v1 > v2 : positive */
typedef int (*CMP)(V v1, V v2);

/* Stable merge of already-sorted lists new and old (if an old and a new
   item compare equal, the old item precedes the new one in the result). */
X *merge(X *new, X *old, CMP cmp)
{
  /* well-known, does not take advantage of 3-way comparisons */
}

X *sort(X *head, CMP cmp)
{
  if( !head ) return head; /* empty list */

  X *stack[STACK_SIZE] = { 0 };
  X **sp; /* stack pointer */
  X *new; /* sublist being built */
  X *tmp;

  int te; /* 3-way result */

  do {
    /* Extract a sorted sublist, new, from the first elements of head, and
       locate it's stack slot using sp. In a pure merge-sort, stack[i]
       contains either the empty list or a sorted list of 2^i items. In
       this implementation, stack[i] can contain between 2^i and 2^{i+1}-1
       items. */

    new = head, head = head->next; /* pop */

    if( !head ) sp = stack; /* new = 1-element list */
    else {
      tmp = head, head = head->next; /* pop */
      te = cmp(tmp, new);
      if( te < 0 ) { X *x = tmp; tmp = new; new = x; } /* swap */
      new->next = tmp;
      sp = stack + 1;

      if( !head ) tmp->next = NULL; /* new = 2-element list */
      else {
        /* Try to grow new by taking advantage of a possibly ascending or
           descending sorted subsequence starting at head. */

        X *last = tmp;
        int n = 2, lim = 4;

        if( !te )
          /* new and tmp were equal: try to get as many equal items
             as possible, extending to the right (stability) */
          do {
            if( ++n >= lim ) {
              if( *sp ) break; /* stability stopping criterion */
              ++sp, lim <<= 1;
            }
            tmp = head, head = head->next; /* pop */
            te = cmp(tmp, last);
            if( te >= 0 ) {
              last->next = tmp, last = tmp; /* extend to the right */
              if( !te ) continue; /* keep looking for equal items */
            }
            else {
              tmp->next = new, new = tmp; /* extend to the left */
            }
            break; /* te < 0 or te > 0 */
          } while( head );

        if( te > 0 ) {
          /* try to extend a non-decreasing subsequence to the right */
          while( head ) {
            if( ++n >= lim && *sp ) break; /* stability stopping criterion */
            te = cmp(head, last);
            if( te < 0 ) break;
            if( n >= lim ) ++sp, lim <<= 1; /* side-effect on sp only if pop */
            tmp = head, head = head->next; /* pop */
            last->next = tmp, last = tmp; /* extend to the right */
          }
        }
        else if( te < 0 ) {
          X *eq = new; /* last item of an "equal prefix" sublist of new */
          while( head ) {
            /* try to extend a non-increasing subsequence to the left */
            if( ++n >= lim && *sp ) break; /* stability stopping criterion */
            te = cmp(head, new);
            if( te > 0 ) break;
            if( n >= lim ) ++sp, lim <<= 1; /* side-effect on sp only if pop */
            tmp = head, head = head->next; /* pop */
            if( !te )
              /* extend the "equal prefix" in the "middle" */
              tmp->next = eq->next, eq->next = tmp, eq = tmp;
            else
              tmp->next = new, eq = new = tmp; /* extend to the left */
          }
        }

        last->next = NULL;
      }
    }

    /* merge nearly-equal-length lists, moving the result forward */
    for(; *sp; *sp++ = NULL)
      new = merge(new, *sp, cmp);
    *sp = new;

  } while( head );

  /* cleanup */

  for(sp = stack; !*sp; ++sp); /* must find a non-empty slot */
  new = *sp;

  while( ++sp < stack + STACK_SIZE )
    if( *sp )
      new = merge(new, *sp, cmp);

  return new;
}

/*****************************************************************************/

--
Daniel
Received on Sat Feb 24 2007 - 19:15:35 UTC

This archive was generated by hypermail 2.3.0 : Wed Oct 23 2024 - 09:15:01 UTC