c# is a good way to learn c

  • Thread starter: Montrose...
Olaf said:
Very nice. :-)

But the big question now is: on modern processors, is yours faster?
I do know that simple instructions that are used more often tend to be
faster than instructions that have a single opcode but are not used much.
Another thing is that modern processors execute instructions out of order.
So fewer instructions does not necessarily mean faster performance, but it
could, and in your solution it might.

The only way to determine what is faster is to actually measure it.
And then again, it might be processor dependent.
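
A rough sketch of what "actually measure it" could look like in portable C++. The names `Timing` and `time_sum` are made up for this illustration and appear nowhere in the thread. The caller passes in the function to be timed, and the results are summed so the optimizer cannot simply throw the calls away.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper type: elapsed wall-clock time plus the checksum.
struct Timing {
    double seconds;
    std::uint64_t sum;
};

// Calls fn(0), fn(1), ..., fn(count - 1), sums the results and times the loop.
template <typename Fn>
Timing time_sum(Fn fn, std::uint32_t count) {
    assert(count > 0 && "time at least one call");
    std::uint64_t sum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint32_t j = 0; j < count; ++j) {
        sum += fn(j);  // keep the result alive so the call is not optimized out
    }
    const auto stop = std::chrono::steady_clock::now();
    return Timing{std::chrono::duration<double>(stop - start).count(), sum};
}
```

Run each candidate several times and compare the sums first; a timing is only meaningful if both candidates compute the same thing.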

btw., what's wrong with the BSWAP instruction?

Stefan
 
btw., what's wrong with the BSWAP instruction?

And the winneerr isssss!!!!!!!!! STEFAAAAAANNNN!!!!!

He is right: since the 80486 there exists one assembler instruction that does
that. :-)

<snip>
The bswap instruction, available only on 80486 (yes, 486) and later
processors, converts between 32 bit little endian and big endian values.
This instruction accepts only a single 32 bit register operand. It swaps the
first byte with the fourth and the second byte with the third. The syntax
for the instruction is
bswap reg32
where reg32 is an 80486 32 bit general purpose register.
</snip>
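
For readers without a 486 manual at hand, the byte shuffle described in the snippet can be sketched in plain C++. This is only an illustration of the "first with fourth, second with third" rule, not how the instruction is implemented; the function names are made up here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>

// Reverses the four bytes of a 32 bit value, the same net effect that
// the snippet describes for bswap.
inline std::uint32_t bswap32_bytes(std::uint32_t value) {
    unsigned char b[4];
    std::memcpy(b, &value, sizeof b);   // view the value as four bytes
    std::swap(b[0], b[3]);              // first byte <-> fourth byte
    std::swap(b[1], b[2]);              // second byte <-> third byte
    std::memcpy(&value, b, sizeof b);
    return value;
}

// Small self check: swapping twice must give back the original value.
inline void bswap32_bytes_self_check() {
    assert(bswap32_bytes(0x01234567u) == 0x67452301u);
    assert(bswap32_bytes(bswap32_bytes(0xDEADBEEFu)) == 0xDEADBEEFu);
}
```

Because the bytes are reversed in memory, the result is the same on little endian and big endian hosts.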
 
This does the trick in VC++ 2003

int iswapped=0;
int iOriginal=0x01234567;
__asm {
mov eax,iOriginal
bswap eax
mov iswapped,eax
}

Which translates to:

mov dword ptr [iswapped],0 // int iswapped=0;
mov dword ptr [iOriginal],1234567h // int iOriginal=0x01234567;
// and now the __asm part
mov eax,dword ptr [iOriginal] // mov eax,iOriginal
bswap eax
mov dword ptr [iswapped],eax // mov iswapped,eax

And with an optimizing compiler, "mov eax,dword ptr [iOriginal]" and "mov dword
ptr [iswapped],eax" might have been stripped away, since the variables would
probably already be in eax.

So for those that thought that Assembler died, it is still there, absorbed
into the C++ language.
 
Hi Stefan, You asked: << what's wrong with the BSWAP instruction ? >>

Variations of my Big_First_32() illustrate serious problems
with MS_CPP, not just with Sequence_Points,
but, even more seriously, with how its /Og switch, Global optimizations,
breaks perfectly legal code.

Spooky would say that BSWAP doesn't work on the PowerPC, or something.

MS_CPP has _byteswap_ulong() which is just a BSWAP instruction:

mov ecx,eax
bswap ecx
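
A hedged sketch of how one might wrap this so the same call also builds outside MS_CPP. `_byteswap_ulong` is the MSVC intrinsic named above and `__builtin_bswap32` is the GCC/Clang builtin; the wrapper names `swap32` and `swap32_portable` are invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h>  // declares _byteswap_ulong
#endif

// Plain shift-and-mask fallback for compilers without a byte swap intrinsic.
constexpr std::uint32_t swap32_portable(std::uint32_t v) {
    return (v << 24) | ((v & 0x0000FF00u) << 8) |
           ((v >> 8) & 0x0000FF00u) | (v >> 24);
}

// Uses the compiler's intrinsic where one is known, else the fallback.
inline std::uint32_t swap32(std::uint32_t v) {
#if defined(_MSC_VER)
    return _byteswap_ulong(v);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(v);
#else
    return swap32_portable(v);
#endif
}

// Both paths must agree, whichever one the compiler picked.
inline void swap32_self_check() {
    assert(swap32(0x01234567u) == 0x67452301u);
    assert(swap32(0x01234567u) == swap32_portable(0x01234567u));
}
```

Whether the intrinsic actually compiles down to a single BSWAP depends on the compiler and the target, so the disassembly is still worth a look.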

I have a cheap_ass Celeron with only a 128 KB L2 cache.
After double_clicking Kelsey.EXE three times in a row, for good measure,
these are the times that I'm getting:
.00445 Seconds, Sum 2147244176424960, _byteswap_ulong( 0 - 999,999 ).
.00608 Seconds, Sum 2147244176424960, Swap_32( 0 - 999,999 ).
.00593 Seconds, Sum 2147244176424960, Big_First_32( 0 - 999,999 ).
.01049 Seconds, Sum 2147244176424960, htonl( 0 - 999,999 ).

http://www.Cotse.NET/users/jeffrelf/Kelsey.EXE
http://www.Cotse.NET/users/jeffrelf/Kelsey.CPP
http://www.Cotse.NET/users/jeffrelf/Kelsey.VCPROJ

#pragma warning( disable: 4244 )
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <StdLib.H>
#include <stdio.h>
#include <IO.H>
#include <Winsock2.H>
#pragma comment( lib, "Ws2_32.LIB")

#define Loop( N ) int J = - 1, LLL = N ; while ( ++ J < LLL )
#define Tics ( QueryPerformanceCounter( ( Quad * ) & _Tics ), _Tics )
#define Secs ( _Secs = Tics / Secnd_Dub )

typedef char * int_8_P ;
typedef unsigned char uint_8 ; typedef uint_8 * uint_8_P ;
typedef unsigned __int32 uint_32 ;
typedef LARGE_INTEGER Quad ;

double Secnd_Dub, _Secs, Mark ; __int64 _Tics, Secnd ;

uint_32 Big_First_32 ( int & X ) { uint_8_P B = ( uint_8_P ) & X ;
return * B << 24 | B [ 1 ] << 16 | B [ 2 ] << 8 | B [ 3 ]; }

uint_32 Swap_32 ( int X ) {
return uint_8( X ) << 24 | uint_8( X >> 8 ) << 16
| uint_8( X >> 16 ) << 8 | uint_8( X >> 24 ); }

int main() {
QueryPerformanceFrequency( ( Quad * ) & Secnd ); Secnd_Dub = Secnd ;
FILE * fp = fopen( "AA.TXT", "w" );
const int Times = 1000 * 1000 ;

__int64 X = 0 ; Mark = Secs ;
{ Loop( Times ) X += _byteswap_ulong( J ); } double Dur = Secs - Mark ;

__int64 X2 = 0 ; Mark = Secs ;
{ Loop( Times ) X2 += Swap_32( J ); } double Dur2 = Secs - Mark ;

__int64 X3 = 0 ; Mark = Secs ;
{ Loop( Times ) X3 += Big_First_32( J ); } double Dur3 = Secs - Mark ;

__int64 X4 = 0 ; Mark = Secs ;
Loop( Times ) X4 += htonl( J ); double Dur4 = Secs - Mark ;

char SecStr [ 99 ] ;
sprintf( SecStr, "%1.5f" , Dur );
fprintf( fp, "%s Seconds, Sum %I64d, _byteswap_ulong( 0 - 999,999 ).\n"
, SecStr + ( * SecStr == '0' ), X );

sprintf( SecStr, "%1.5f" , Dur2 );
fprintf( fp, "%s Seconds, Sum %I64d, Swap_32( 0 - 999,999 ).\n"
, SecStr + ( * SecStr == '0' ), X2 );

sprintf( SecStr, "%1.5f" , Dur3 );
fprintf( fp, "%s Seconds, Sum %I64d, Big_First_32( 0 - 999,999 ).\n"
, SecStr + ( * SecStr == '0' ), X3 );

sprintf( SecStr, "%1.5f" , Dur4 );
fprintf( fp, "%s Seconds, Sum %I64d, htonl( 0 - 999,999 ).\n"
, SecStr + ( * SecStr == '0' ), X4 ); fclose( fp ); }
 
Hi Olaf, You wrote: << This does the trick in VC++ 2003
int iswapped=0; int iOriginal=0x01234567;
__asm {
mov eax,iOriginal
bswap eax
mov iswapped,eax }

Which translates to:

mov dword ptr [iswapped],0 // int iswapped=0;
mov dword ptr [iOriginal],1234567h // int iOriginal=0x01234567;
// and now the __asm part
mov eax,dword ptr [iOriginal] // mov eax,iOriginal
bswap eax
mov dword ptr [iswapped],eax // mov iswapped,eax

And in an optimizing compiler: mov eax,dword ptr [iOriginal]
and mov dword ptr [iswapped],eax might have been stripped away
since the variables would probably already exist in eax.

So for those that thought that Assembler died,
it is still there, absorbed into the C++ language. >>

Too bad you don't have MS_CPP_Pro,
You might look for a so_called used or OEM copy on eBay.

I don't trust /Og, wouldn't use it for serious products, too fickle,
but it is fun to play with.
 
Hi Olaf, Re: * ++ P << 16 | * ++ P << 8 and how | doesn't, but should,
provide a so_called Sequence_Point in C_99, gcc or MS_CPP_7_1,

You told me: << Yes it should have worked in my opinion too.
But this is life, nothing is perfect. >>

I think MicroSoft won't implement extra Sequence_Points
for fear of becoming too incompatible with the other compilers.
Which is a real shame, as, no doubt, it's the cause of many a mysterious bug.

I think MS should just add a bunch of additional Sequence_Points
and tell the others to either catch up or go fish.

Speaking of compiler faults,
I say C# is wrong to not support printf(), #define, #include, etc.
Not because I care about old code, but because I prefer those operatives.

C++/C#'s String/cout/STL can be handy, no doubt,
but I still prefer doing my own memory management, dynamic lists, etc.
As my HTM_TXT.CPP demonstrates, LoopTo() is just pure flexibility:

#define LoopTo( StopCond ) \
while ( Ch && ( Ch = ( uchar ) * ++ P ) \
&& ! ( Ch2 = ( uchar ) P [ 1 ], StopCond ) )

You recounted: <<
I also discovered one time that, if you use events in a C++ class,
VC++ 2002 [ throws ] an access violation if the class happens
not to have a constructor defined and implemented in the header file.
So, for a week, I struggled with that event thing,
and, funny enough, the examples worked but mine failed.
[ Looking at the assembly code, I saw that ]
the variables of those events [ weren't getting ] initialized.
Clearly a bug in the C++ compiler. Now my code works fine,
since I now put a complete constructor in the header file. >>

Interesting, I wouldn't have thought to check the disassembly like that,
I'll try that the next time I have a hard_to_find bug ( i.e. soon ).

Re: How you use assembly these days, You wrote: <<
In my case it is 15 years old knowledge.
I only use it to look at the generated code to find bugs in my code
and to optimize my functions to speed up without resorting to assembler.
Or to learn a new language,
because I can compare it to something I already know.

And now I do this with IL assembler generated by .NET. >>

The IL assembler sounds cool to me.

You added: << One thing I discovered is that properties are not optimized,
not inlined, so that could explain why some C# code could be slower.
But then again if I look at my C++ code from VC++ 2003 Standard,
none of my properties get inlined either. And this explains
why my C++ code and C# code are almost the same speed on the same computer
and the same OS and compiled with the same compiler environment.
I hope that The VS 2005 gets a better optimizer for that. >>

I have my MS_CPP_Pro inline stuff, even when debugging,
it hasn't been a problem for me.

Re: x86 assembly, you told me: << In the case of Intel like processors,
the ax, eax register gets specialized for processing things.
It is another name for accumulator. >> ... <<
any operation is done with that register, so a lot of code
is copying registers to the eax register and then moving it back.

Another thing to know is that you cannot access the upper word part of eax.
( or the ebx, ecx, edx )
Only the lower word part [ can ] be split into a high byte and a low byte.

eax is the 32 bit register
ax is the same as LOWORD(eax)
and al would be like LOBYTE(ax)
and ah would be like HIBYTE(ax)

So to get the HIWORD(eax), you must >> 16 to [ access ] the ax part.
Then you can access it.

Typically eax is used for calculating things.
ebx is used as index pointer
ecx is typically used as counter
edx is typically used as destination index pointer.
But in the case of an optimizer compiler
you might lose that relationship. >>
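
The eax / ax / al / ah split described above can be mimicked in C++ with shifts and masks. This is only an analogy to show which bits each name covers; the helper names are made up for this sketch and are not the Windows LOWORD / LOBYTE macros themselves.

```cpp
#include <cassert>
#include <cstdint>

// ax: the low 16 bits of eax.
constexpr std::uint16_t low_word(std::uint32_t eax) {
    return static_cast<std::uint16_t>(eax & 0xFFFFu);
}

// The upper 16 bits have no register name of their own, so shift them down.
constexpr std::uint16_t high_word(std::uint32_t eax) {
    return static_cast<std::uint16_t>(eax >> 16);
}

// al: the low byte of ax.
constexpr std::uint8_t low_byte(std::uint16_t ax) {
    return static_cast<std::uint8_t>(ax & 0xFFu);
}

// ah: the high byte of ax.
constexpr std::uint8_t high_byte(std::uint16_t ax) {
    return static_cast<std::uint8_t>(ax >> 8);
}

inline void register_parts_self_check() {
    const std::uint32_t eax = 0x12345678u;
    assert(low_word(eax) == 0x5678u);             // ax
    assert(high_word(eax) == 0x1234u);            // eax >> 16
    assert(low_byte(low_word(eax)) == 0x78u);     // al
    assert(high_byte(low_word(eax)) == 0x56u);    // ah
}
```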

What other registers are there, and how are they restricted ?

You added: << Another thing: something like this xor ecx,ecx
is actually saying set ecx to zero.
This notation is only two bytes and superfast
compared to loading it with an actual value. >>

That much I knew already.

Re: mov ecx, dword ptr [ esp + 18h ]

You concluded: <<
Yes a local variable located at 18h positions from your return address. :-)
And if you get something like this mov ecx, dword ptr [ esp - 18h ]
then it is some parameter passed on from outside your function. >>

That's interesting, - is a parameter, thanks.
 
Interesting, I wouldn't have thought to check the disassembly like that,
I'll try that the next time I have a hard_to_find bug ( i.e. soon ).
Most of the time I use it when I have complicated one-liners, to find out
what instruction in that part is causing the problem.
And now I do this with IL assembler generated by .NET. >>

The IL assembler sounds cool to me.
It is some kind of OOP version of assembler.
But they don't use registers. They use the stack, it is up to the optimizer
to use the most efficient way of register usage.

IL looks something like this (pseudo code):

//x=A+B-C
Push A
Push B
Add
Push C
Sub

A little bit like HP calculators would do too.
As far as I see, the generated IL code in C# does not seem to be optimized.
But the JIT might do that optimizing instead, so when I find time I am going
to check into this.
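
To make the stack idea concrete, here is a toy evaluator for the pseudo code above, written in C++. It is a sketch of a generic stack machine and says nothing about how the real .NET IL or its JIT work; `Op`, `Instr`, `run` and `eval_a_plus_b_minus_c` are names invented for the example.

```cpp
#include <cassert>
#include <vector>

enum class Op { Push, Add, Sub };

struct Instr {
    Op op;
    int value;  // only used by Push
};

// Executes the program on an operand stack and returns the single result.
inline int run(const std::vector<Instr>& program) {
    std::vector<int> stack;
    for (const Instr& in : program) {
        if (in.op == Op::Push) {
            stack.push_back(in.value);
            continue;
        }
        assert(stack.size() >= 2);  // Add and Sub pop two operands
        const int rhs = stack.back(); stack.pop_back();
        const int lhs = stack.back(); stack.pop_back();
        stack.push_back(in.op == Op::Add ? lhs + rhs : lhs - rhs);
    }
    assert(stack.size() == 1);
    return stack.back();
}

// x = A + B - C, spelled as: Push A, Push B, Add, Push C, Sub
inline int eval_a_plus_b_minus_c(int a, int b, int c) {
    return run({{Op::Push, a}, {Op::Push, b}, {Op::Add, 0},
                {Op::Push, c}, {Op::Sub, 0}});
}
```

No registers appear anywhere in the program; mapping the stack slots onto registers is left to whatever translates it to machine code.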
Typically eax is used for calculating things.
ebx is used as index pointer
ecx is typically used as counter
edx is typically used as destination index pointer.
But in the case of an optimizer compiler
you might lose that relationship. >>

What other registers are there, and how are they restricted ?
This is getting too big for the time that I have to explain all this.
You concluded: <<
Yes a local variable located at 18h positions from your return address. :-)
And if you get something like this mov ecx, dword ptr [ esp - 18h ]
then it is some parameter passed on from outside your function. >>

That's interesting, - is a parameter, thanks.
Most of the time eax, ebx, ecx and edx are used for the first 4 parameters
that fit in a 32 bit number (like int, short, byte, and pointers/references),
and then they start using parameters pushed on the stack.
Most of the time a result is returned in eax.

But if you use a different calling convention other rules might apply.
 
Hi Olaf, Re: Push A Push B Add Push C Sub,

I know Forth fairly well, including PostScript.

You told me: << Most of the time eax, ebx, ecx and edx
are used for the first 4 parameters that fit in a 32 bit number,
( like int, short, byte, and pointers/references )
and then they start using parameters pushed on the stack.
Most of the time a result is returned in the eax. >>

Cool, that's good to know, thanks.
 
In comp.os.linux.advocacy, Jeff_Relf wrote:
Hi Spooky, Re: Your mod of the code I showed, You told me: <<
return * P << 24 | * ++ P << 16 | * ++ P << 8 | * ++ P ; }
does work correctly in GCC. >>

But that's a quirk, of course.
The solution would be to modify the MS_CPP and gcc compilers themselves,
as well as the C_2006 standard, to introduce Many more sequence points.

For example,
Each of the following should have its left_to_right'ness guaranteed,
complete with so_called sequence_points in the obvious places:
1. char Buffer [] = { func(), /* Seq_Pnt */ func(), /* Seq_Pnt */ func() };
2. func( func(), /* Seq_Pnt */ func(), /* Seq_Pnt */ func() );
3. return * ++ P << 16 | /* Seq_Pnt */ * ++ P << 8 ;

At the very Very least, a compiler warning should be thrown !
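
Until a compiler offers that, the order can be forced by hand. In C and C++ the | operator does not order the evaluation of its operands, so changing P more than once inside one such expression is undefined behavior; splitting the expression into statements is one way around it. A sketch, with the function taking a byte pointer (an assumption made here so that the result does not depend on the host's byte order):

```cpp
#include <cassert>
#include <cstdint>

// Reads four bytes, most significant first. Each statement ends before the
// next one starts, so every ++p is complete before p is used again.
inline std::uint32_t big_first_32(const unsigned char* p) {
    assert(p != nullptr);
    std::uint32_t result = static_cast<std::uint32_t>(*p) << 24;
    result |= static_cast<std::uint32_t>(*++p) << 16;
    result |= static_cast<std::uint32_t>(*++p) << 8;
    result |= static_cast<std::uint32_t>(*++p);
    return result;
}
```

The indexed form B [ 1 ], B [ 2 ], B [ 3 ] used in Kelsey.CPP avoids the problem the same way, because it never modifies the pointer at all.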

Re: How I liked the disassembly of the working C++ code I showed,

You replied: << Feh. Try this code for pretty.
This is in MASM/Intel syntax:

mov edx, eax ; edx=0x84838281
shr eax, 16 ; eax=0x00008483
xchg eax, edx ; edx=0x00008483,eax=0x84838281
xchg al, ah ; eax=0x84838182
shl eax, 16 ; eax=0x81820000
xchg dl, dh ; edx=0x00008384
or eax, edx ; eax=0x81828384

There's probably better methods but I've already reduced it to 7
instruction lines by simply coding it by hand. Works like a champ. >>

Wow, I'm impressed ! So much for my claim that you don't do assembly anymore.

I don't. This is a hobby. :-) But I've used it in the past,
in the 8088 PC era (the systems were a *lot* slower then),
and I've had to debug the occasional routine using nothing but.
You added: << I could reduce it to 3 if there's a method to exchange
the 16-bit register ax with the high 16 bits of eax. However,
the obvious choices xchg eax,ax and xchg eah,ax both fell flat. >>

Wow again, You just taught me some x86 assembly:
1. ax is a short.
2. al is ax's low byte.
3. ah is ax's high byte.
4. eax is a long.

Olaf taught me that this is moving an int off the stack to ecx

mov ecx, dword ptr [ esp + 18h ]

Hmm...that's a simple one, at that:

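push ebp
mov ebp, esp
mov ax, word ptr [ebp+8] ; low word of the argument
xchg al, ah ; swap its two bytes
shl eax, 16 ; move it up into the high word
mov ax, word ptr [ebp+10] ; high word of the argument
xchg al, ah ; swap its two bytes
pop ebp
ret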

9 lines.

It's not quite a fair comparison since we're defining a whole
function here, as opposed to merely loading eax; my earlier
attempt, for instance, would have to be rewritten

push ebp
mov ebp, esp
push edx
mov eax, dword ptr [ebp+8] ; load the argument
mov edx, eax
shr eax, 16
xchg eax, edx
xchg al, ah
shl eax, 16
xchg dl, dh
or eax, edx
pop edx
pop ebp
ret

14 lines.

(Your example(s) would require the 4 extra lines (push, mov, pop, ret)
as well.)
You concluded: <<
I don't know if GCC is up to producing this quality of code, or not.
It would take quite some doing -- one could call it grokking
-- the subtleties of the machine architecture. >>

It'd be easier to just inline the assembly.
To time it, you could add it to something like my Kelsey.CPP
http://www.Cotse.NET/users/jeffrelf/Kelsey.CPP
http://www.Cotse.NET/users/jeffrelf/Kelsey.VCPROJ

Inline assembly is a VC-specific syntax. GCC does it differently.
But, as I keep repeating, readability is my only goal, not speed.
I say the following code is more readable,
...and it Should be legal and very optimizable... but it's neither.

uint_32 Big_First_32 ( int X ) { uint_8_P P = ( uint_8_P ) & X ;
return * P << 24 | * ++ P << 16 | * ++ P << 8 | * ++ P ; }

Optimizable? With more sequencepoints? The more sequencepoints,
the smaller the code bunches between sequencepoints that the
compiler can optimize.
 
In comp.os.linux.advocacy, Olaf Baeyens wrote:
And the winneerr isssss!!!!!!!!! STEFAAAAAANNNN!!!!!

He is right: since the 80486 there exists one assembler instruction that does
that. :-)

I knew it. :-) There just had to be a better way.

Kewl.
 
Try this code for pretty. This is in MASM/Intel syntax:

mov edx, eax ; edx=0x84838281
shr eax, 16 ; eax=0x00008483
xchg eax, edx ; edx=0x00008483,eax=0x84838281
xchg al, ah ; eax=0x84838182
shl eax, 16 ; eax=0x81820000
xchg dl, dh ; edx=0x00008384
or eax, edx ; eax=0x81828384

The usual way to do this on processors that don't specifically have an
instruction to do it is with three rotates:

ror ax, 8
ror eax, 16
ror ax, 8

The 486 and above have an instruction for this, though, so it is probably best
to use that:

bswap eax
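
For anyone who wants to convince themselves that the three rotates really reverse all four bytes, here is a C++ model of the sequence. It is only a simulation of the register arithmetic, with helper names made up for the purpose, not something meant to compete with bswap.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 16 bit value right by n bits (models "ror ax, n").
inline std::uint16_t ror16(std::uint16_t v, unsigned n) {
    assert(n > 0 && n < 16);
    return static_cast<std::uint16_t>((v >> n) | (v << (16 - n)));
}

// Rotate a 32 bit value right by n bits (models "ror eax, n").
inline std::uint32_t ror32(std::uint32_t v, unsigned n) {
    assert(n > 0 && n < 32);
    return (v >> n) | (v << (32 - n));
}

// Writing ax replaces only the low 16 bits of eax.
inline std::uint32_t with_low_word(std::uint32_t eax, std::uint16_t ax) {
    return (eax & 0xFFFF0000u) | ax;
}

inline std::uint32_t swap_by_rotates(std::uint32_t eax) {
    eax = with_low_word(eax, ror16(static_cast<std::uint16_t>(eax), 8));  // ror ax, 8
    eax = ror32(eax, 16);                                                 // ror eax, 16
    eax = with_low_word(eax, ror16(static_cast<std::uint16_t>(eax), 8));  // ror ax, 8
    return eax;
}
```

With the value from the earlier post, 0x84838281 goes to 0x84838182, then 0x81828483, then 0x81828384.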
 
Hi Spooky, Re: How I think the following code is more readable,
and Should be legal and very optimizable... but it's neither: <<

uint_32 Big_First_32 ( int X ) { uint_8_P P = ( uint_8_P ) & X ;
return * P << 24 | * ++ P << 16 | * ++ P << 8 | * ++ P ; } >>

You told me: << Optimizable ? With more sequencepoints ?
The more sequencepoints, the smaller the code bunches
between sequencepoints that the compiler can optimize. >>

More Sequence_Points would be more predictable/readable,
it's absurd to not have three Sequence_Points here: a() + b() * c(),
....really Really absurd... and not so much as a warning is thrown.

Although speed is not nearly as important of an issue,
it's possible to optimize a series of Sequence_Points, of course.
 
I am surprised that there are still developers who say they prefer C/C++ over
C#, even, or especially, if they are writing a Windows application from
scratch. If you are developing Windows software in a competitive environment,
ignoring .NET is simply not an option. People who write their applications
with, say, C++ and STL and not C# or VB.NET must be programming for the sake
of programming itself and not to create solutions. Why would you write code
that somebody else has already written for you, tested and QA'd and nicely
integrated into the runtime? Beats me.
 
I'm surprised that there are still developers who say they prefer C/C++ over
C#, even, or especially, if they are writing a Windows application from
scratch. If you are developing Windows software in a competitive environment,
ignoring .NET is simply not an option. People who write their applications
with, say, C++ and STL and not C# or VB.NET must be hobbyists or are
programming for the sake of programming itself and not to create solutions.
Why would you write code that somebody else has already written for you,
tested and QA'd and nicely integrated into the runtime? Beats me.
 