Willy Denoyette [MVP] wrote:
ThunderMusic said:
Ben Voigt said:
r norman said:
On Thu, 23 Aug 2007 15:36:18 -0400, "ThunderMusic" wrote:
Hi,
The subject says it all... I want to use a byte[] and use it as a byte* so I
can increment the pointer to iterate through it.
What is the fastest way of doing so in C#?
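[For reference, this is the basic fixed/unsafe pattern the question is asking about — a minimal sketch; the Sum function and values are illustrative, not from the thread:]

```csharp
using System;

class FixedPointerDemo
{
    // Sum the bytes of an array through a raw pointer.
    // Requires compiling with /unsafe.
    static unsafe int Sum(byte[] data)
    {
        if (data == null || data.Length == 0) return 0;
        int total = 0;
        // "fixed" pins the array so the GC cannot move it
        // while the pointer is live.
        fixed (byte* p = data)
        {
            byte* cur = p;
            byte* end = p + data.Length;
            while (cur < end)
                total += *cur++;   // pointer arithmetic, as in C/C++
        }
        return total;
    }

    static void Main()
    {
        byte[] bytes = { 1, 2, 3, 4 };
        Console.WriteLine(Sum(bytes));  // 10
    }
}
```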
I have recently started with C#, converting C++ applications that must
communicate with external equipment using protocols with rigidly fixed
byte sequences. Initially I was terribly frustrated abandoning my
precious pointer manipulation and access to byte arrays. But I
developed a bunch of utilities to create managed code byte[] arrays
from pieces and extract appropriate values from them and can now avoid
all unmanaged code. Look into Encoding.Convert for converting between
ASCII byte sequences and Unicode, MemoryStream.Write and ToArray along
with BinaryWriter.Write for creating byte[] data and the BitConverter
set of methods to extract values from byte[].
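[A sketch of those APIs on a hypothetical fixed-layout packet — the field layout here is invented for illustration, and it assumes the usual little-endian platform, where BitConverter's platform byte order matches BinaryWriter's always-little-endian output:]

```csharp
using System;
using System.IO;
using System.Text;

class PacketDemo
{
    // Build a fixed-layout byte sequence: a ushort length field,
    // an int value, then a 2-byte ASCII tag.
    static byte[] BuildPacket(ushort len, int value, string tag)
    {
        var ms = new MemoryStream();
        var bw = new BinaryWriter(ms);   // always writes little-endian
        bw.Write(len);                   // 2 bytes
        bw.Write(value);                 // 4 bytes
        bw.Write(Encoding.ASCII.GetBytes(tag));
        return ms.ToArray();
    }

    static void Main()
    {
        byte[] packet = BuildPacket(6, 123456, "OK");

        // Extract the values back out without any pointer arithmetic.
        ushort len = BitConverter.ToUInt16(packet, 0);
        int value = BitConverter.ToInt32(packet, 2);
        string tag = Encoding.ASCII.GetString(packet, 6, 2);
        Console.WriteLine("{0} {1} {2}", len, value, tag);  // 6 123456 OK
    }
}
```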
I keep telling myself that staying entirely within managed code is
worth all that effort, but I am still early in the process. Some old
timers here can probably fill you in on much more detail, including
the virtues of doing so.
Staying in managed code may be worth the effort, converting to C# is not.
The C++ compiler still makes better code than the JIT can.
hi,
That's an idea... I'll look into it... It's likely that some methods would
be better suited to C++, because I'm after a bit of extra performance for
key features in my app. I never thought about managed C++, but it sounds
like a good idea... I just don't want to build a C++ library for only one
method though...
Thanks a lot
ThunderMusic
Managed C# is just as fast as C++/CLI in this case: the C++/CLI compiler
(C++/CLI to IL) doesn't "unroll" the inner loop (as illustrated by Ben), and
at present no managed compiler performs loop unrolling. The JIT compiler
(common to all managed languages) doesn't unroll either.
Choosing the right algorithm is the key to better performance, not the choice
of (managed) language.
Just to give you an idea, I optimized the algorithm in the following
function using C# with pointers (fixed/unsafe = non-verifiable code).
private const int BASE = 65521; /* largest prime smaller than 65536 */
private const int NMAX = 5552;

// Using an unroll factor of 16; compile with /o+ (and /unsafe).
private static uint Adler(byte[] databytes)
{
    // Guard against null and empty input: fixing &databytes[0]
    // would throw on a zero-length array.
    if (databytes == null || databytes.Length == 0) return 1;
    uint adler = 1;
    uint s1 = adler & 0xffff;
    uint s2 = (adler >> 16) & 0xffff;
    uint k;
    uint len = (uint)databytes.Length;
    unsafe
    {
        fixed (byte* fixeddataPtr = &databytes[0])
        {
            byte* ptr = fixeddataPtr;
            while (len > 0)
            {
                k = len < NMAX ? len : NMAX;
                len -= k;
                while (k >= 16)
                {
                    s1 += ptr[0];  s2 += s1;
                    s1 += ptr[1];  s2 += s1;
                    s1 += ptr[2];  s2 += s1;
                    s1 += ptr[3];  s2 += s1;
                    s1 += ptr[4];  s2 += s1;
                    s1 += ptr[5];  s2 += s1;
                    s1 += ptr[6];  s2 += s1;
                    s1 += ptr[7];  s2 += s1;
                    s1 += ptr[8];  s2 += s1;
                    s1 += ptr[9];  s2 += s1;
                    s1 += ptr[10]; s2 += s1;
                    s1 += ptr[11]; s2 += s1;
                    s1 += ptr[12]; s2 += s1;
                    s1 += ptr[13]; s2 += s1;
                    s1 += ptr[14]; s2 += s1;
                    s1 += ptr[15]; s2 += s1;
                    ptr += 16;
                    k -= 16;
                }
                if (k != 0)
                {
                    int i = -1;
                    do
                    {
                        s1 += ptr[++i];
                        s2 += s1;
                    } while (--k > 0);
                }
                s1 %= BASE;
                s2 %= BASE;
            }
        }
    }
    return (s2 << 16) | s1;
}
Calling this function 4000 times on my box, using a byte array of 27500
random values, takes ~ 63 msecs.
The C++/CLI code (optimized build /O2) posted by Ben takes 95 milliseconds
to run the same test.
The same algorithm in C# using array indexing (no unsafe constructs) takes
105 msecs, only about 10% slower than the C++/CLI code.
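[A timing harness along these lines can reproduce that kind of measurement — a sketch only: the Stopwatch loop, the fixed seed, and the plain non-unrolled Adler-32 below are illustrative stand-ins to keep it self-contained, so absolute numbers will differ; substitute the optimized functions from this post to compare them.]

```csharp
using System;
using System.Diagnostics;
using System.Text;

class AdlerHarness
{
    const uint BASE = 65521;  // largest prime below 65536

    // Plain, non-unrolled Adler-32; used here only so the harness
    // compiles on its own.
    static uint Adler(byte[] ba)
    {
        uint s1 = 1, s2 = 0;
        for (int i = 0; i < ba.Length; i++)
        {
            s1 = (s1 + ba[i]) % BASE;
            s2 = (s2 + s1) % BASE;
        }
        return (s2 << 16) | s1;
    }

    static void Main()
    {
        var rnd = new Random(12345);   // fixed seed: repeatable input data
        byte[] data = new byte[27500];
        rnd.NextBytes(data);

        uint checksum = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 4000; i++)
            checksum = Adler(data);
        sw.Stop();
        Console.WriteLine("{0} msecs, checksum = 0x{1:X8}",
                          sw.ElapsedMilliseconds, checksum);
    }
}
```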
private static uint AdlerSafe(byte[] ba)
{
    if (ba == null) return 1;
    uint adler = 1;
    uint s1 = adler & 0xffff;
    uint s2 = (adler >> 16) & 0xffff;
    uint k;
    uint len = (uint)ba.Length;
    int offset = 0;
    ArraySegment<byte> ptrSegm = new ArraySegment<byte>();
    while (len > 0)
    {
        k = len < NMAX ? len : NMAX;
        len -= k;
        while (k >= 16)
        {
            ptrSegm = new ArraySegment<byte>(ba, offset, 16);
            int i = ptrSegm.Offset - 1;
            s1 += ptrSegm.Array[++i]; s2 += s1;
            s1 += ptrSegm.Array[++i]; s2 += s1;
            s1 += ptrSegm.Array[++i]; s2 += s1;
            s1 += ptrSegm.Array[++i]; s2 += s1;
            s1 += ptrSegm.Array[++i]; s2 += s1;
            s1 += ptrSegm.Array[++i]; s2 += s1;
            s1 += ptrSegm.Array[++i]; s2 += s1;
            s1 += ptrSegm.Array[++i]; s2 += s1;
            s1 += ptrSegm.Array[++i]; s2 += s1;
            s1 += ptrSegm.Array[++i]; s2 += s1;
            s1 += ptrSegm.Array[++i]; s2 += s1;
            s1 += ptrSegm.Array[++i]; s2 += s1;
            s1 += ptrSegm.Array[++i]; s2 += s1;
            s1 += ptrSegm.Array[++i]; s2 += s1;
            s1 += ptrSegm.Array[++i]; s2 += s1;
            s1 += ptrSegm.Array[++i]; s2 += s1;
            offset += 16;
            k -= 16;
        }
        if (k != 0)
        {
            ptrSegm = new ArraySegment<byte>(ba, offset, (int)k);
            do
            {
                s1 += ptrSegm.Array[offset++];
                s2 += s1;
            } while (--k > 0);
        }
        s1 %= BASE;
        s2 %= BASE;
    }
    return (s2 << 16) | s1;
}
See, unsafe code is the fastest, but verifiable C# may be fast enough.
Either way, I don't like this kind of hand optimizing (nor do I like
"unsafe" C# code); it should be up to the compilers to take care of this,
but unfortunately at present they don't.
Willy.