[R6RS] Transcoding with and without buffering
Michael Sperber
sperber at informatik.uni-tuebingen.de
Mon Jul 24 13:13:32 EDT 2006
I've hacked up a little C program to measure the difference between
transcoding in bulk and transcoding byte-by-byte using the POSIX iconv
library. I expect that calling iconv, ICU, or some other external
library will be a popular and reasonable strategy for implementing
transcoding. The program has two functions: test1 feeds the contents
of a test file to iconv in a single call, and test2 feeds it one byte
at a time (both transcode from UTF-8 to UTF-8).
On an 867 MHz PowerBook G4, using the iconv shipped with
the latest Mac OS X, I get:
Michael-Sperbers-Computer[266] ll UTF-8-demo.txt
-rw-r--r-- 1 sperber PUstaff 14056 Dec 9 2004 UTF-8-demo.txt
With buffering (test1: the whole file in one iconv call, 10,000 times):
Michael-Sperbers-Computer[263] time ./a.out UTF-8-demo.txt
11.347u 0.998s 0:12.90 95.5% 0+0k 0+2io 0pf+0w
Without buffering (test2: byte-by-byte, 10,000 times):
Michael-Sperbers-Computer[265] time ./a.out UTF-8-demo.txt
76.096u 2.209s 1:20.51 97.2% 0+0k 0+2io 0pf+0w
Of course, the test program isn't entirely representative of the
transcoding machinery in a realistic I/O library, but it does show
that feeding iconv one byte at a time carries substantial overhead:
about 6.7 times the user CPU time of the bulk version here.
--
Cheers =8-} Mike
Peace, international understanding, and all that blah blah
-------------- next part --------------
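#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE 65536

/* With buffering: transcode the whole input in a single iconv() call. */
static void
test1(const char* in, size_t n)
{
  char* inp = (char*)in;  /* POSIX iconv() takes char**, not const char** */
  char* buf = (char*)malloc(BUF_SIZE);
  char* out = buf;
  size_t n_out = BUF_SIZE;
  iconv_t ic = iconv_open("UTF-8", "UTF-8");
  if (ic == (iconv_t)-1)
    {
      perror("iconv_open");
      exit(EXIT_FAILURE);
    }
  size_t result = iconv(ic, &inp, &n, &out, &n_out);
  (void)result;
  iconv_close(ic);
  free(buf);
  /* printf("n=%zu n_out=%zu result=%zu\n", n, n_out, result); */
}

/* Without buffering: offer iconv() one byte at a time, growing the chunk
   only while iconv() reports an incomplete multibyte sequence (EINVAL). */
static void
test2(const char* in, size_t n)
{
  char* inp = (char*)in;
  char* in_end = inp + n;
  char* buf = (char*)malloc(BUF_SIZE);
  char* out = buf;
  size_t n_out = BUF_SIZE;  /* set once: out advances, so the space left shrinks */
  iconv_t ic = iconv_open("UTF-8", "UTF-8");
  if (ic == (iconv_t)-1)
    {
      perror("iconv_open");
      exit(EXIT_FAILURE);
    }
  while (inp < in_end)
    {
      size_t in_count = 1;
      size_t result;
      for (;;)
        {
          result = iconv(ic, &inp, &in_count, &out, &n_out);
          /* printf("in=%p in_count=%zu n_out=%zu result=%zu\n",
                    (void*)inp, in_count, n_out, result); */
          /* On EINVAL nothing was consumed: retry with one more byte,
             but never read past the end of the input. */
          if (result == (size_t)-1 && errno == EINVAL
              && inp + in_count < in_end)
            ++in_count;
          else
            break;
        }
      if (result == (size_t)-1)
        {
          perror("iconv");  /* invalid or truncated input: stop here */
          break;
        }
    }
  iconv_close(ic);
  free(buf);
}

int
main(int argc, char* argv[])
{
  if (argc < 2)
    {
      fprintf(stderr, "usage: %s FILE\n", argv[0]);
      return EXIT_FAILURE;
    }
  FILE* f = fopen(argv[1], "rb");
  if (f == NULL)
    {
      perror(argv[1]);
      return EXIT_FAILURE;
    }
  char* in = (char*)malloc(BUF_SIZE);
  size_t n = fread(in, 1, BUF_SIZE, f);  /* only the first 64 KiB are used */
  fclose(f);
  /* Time one variant at a time: swap the comments to measure test1. */
  for (int i = 0; i < 10000; ++i)
    {
      /* test1(in, n); */
      test2(in, n);
    }
  free(in);
  return 0;
}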