String / Character Conversion Question

anon

How can the following two statements be the same?

(1) using System.l1001

(2) using System.Globalization;

That is, how does the compiler know these two representations, at the using
declaration level, are the same?

Thanks.
 
anon said:
How can the following two statements be the same?

(1) using System.l1001

(2) using System.Globalization;

That is, how does the compiler know these two representations, at the using
declaration level, are the same?

It doesn't, as far as I know. Do you have an example which leads you to
suspect that they are the same?
 
anon said:
It's from an Obfuscation example. I want to know how this is possible in
the first place.

Unless the obfuscator is also obfuscating the system libraries, I don't
see how those two *are* going to be equivalent.

Of course, the obfuscator could have decided to change all namespaces
to System.XXX in order to confuse people - in which case it's worked!
 
anon said:
This is where I saw it. So I looked and I saw they started to change the
names at the system level and then I said, how did they manage to do this.

http://www.semdesigns.com/Products/Obfuscators/CSharpObfuscationExample.html

They basically can't, as far as I can tell - not without also providing
an obfuscated version of the system libraries themselves. If they've
done that, then that's a different matter. Not that creating obfuscated
*source* code is a particularly useful thing anyway, IMO. They've said
how clever they are for removing line breaks, converting text using
\uxxxx format etc - all of which is pointless when, for the most part,
obfuscation is used for *binaries*, not source: just compiling and then
decompiling would remove all the protection there apart from the name
changes.
 
Do you think these guys know something we don't know? At least for changing
the names to system classes?

But in regards to obfuscating source code, let's say if you did compile it
and then decompiled it.

The decompiled version would look like, or be similar to, the source code,
which would be obfuscated anyway.

The name changes would be, at least to me, a considerable hurdle to
figuring out or using the source code in the first place. You would have
only a glimmer of an idea of what the source code did, as you could not
read the names anyway.
 
anon said:
This is where I saw it. So I looked and I saw they started to change the
names at the system level and then I said, how did they manage to do this.

http://www.semdesigns.com/Products/Obfuscators/CSharpObfuscationExample.html

It's hard to know what they're doing in the example, but there are a
couple of comments that I think I can make:

1) using statements (of the namespace sort) do not get compiled down to
IL - they're used by the C# compiler to make dealing with nested
namespace names more convenient. However, the emitted IL always deals
with the full typenames, so I'm not sure how a decompiler would generate
using statements, except by noting which nested namespaces are used in
the entire assembly and emitting a using statement for some or all of
those. In this case, the using statements would not magically appear in
the same order in the decompiled source as they were in the original
source as they seem to do in the example.
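
To illustrate point 1 with a minimal sketch (the IL shown in the comment is
approximate, from memory, not compiler output):

```csharp
// A using directive is purely a compile-time convenience for the C# compiler.
using System.Text;

class Demo
{
    static void Main()
    {
        StringBuilder sb = new StringBuilder("hi");
        // In the emitted IL there is no trace of the using directive;
        // the instruction references the fully qualified type, roughly:
        //   newobj instance void [mscorlib]System.Text.StringBuilder::.ctor(string)
    }
}
```

So a decompiler has to reinvent the using directives from the fully
qualified names it finds in the metadata.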

2) More importantly, there are no types from the System.Globalization
namespace being used in the example. If no types from that namespace
are used, then the decompiler would be free to put pretty much any
namespace it wanted in place of "using System.Globalization;". However,
I'm at a loss as to why it would decide to do anything with
System.Globalization if no types from that namespace are used.

Looking at the example some more, I notice that the following namespaces
get renamed:

antlr
antlr.collections
System.Globalization
System.IO


These seem to correspond to the namespaces that don't actually get used.
That would explain why they can rename those namespaces, but it
doesn't explain why they'd have anything at all for those namespaces in
the decompiled code. Maybe I'm mistaken in my point #1, but I don't
think so.


 
mikeb said:
It's hard to know what they're doing in the example, but there are a
couple of comments that I think I can make:

1) using statements (of the namespace sort) do not get compiled down to
IL - they're used by the C# compiler to make dealing with nested
namespace names more convenient. However, the emitted IL always deals
with the full typenames, so I'm not sure how a decompiler would generate
using statements, except by noting which nested namespaces are used in
the entire assembly and emitting a using statement for some or all of
those. In this case, the using statements would not magically appear in
the same order in the decompiled source as they were in the original
source as they seem to do in the example.

2) More importantly, there are no types from the System.Globalization
namespace being used in the example. If no types from that namespace
are used, then the decompiler would be free to put pretty much any
namespace it wanted in place of "using System.Globalization;". However,
I'm at a loss as to why it would decide to do anything with
System.Globalization if no types from that namespace are used.

Looking at the example some more, I notice that the following namespaces
get renamed:

antlr
antlr.collections
System.Globalization
System.IO


These seem to correspond to the namespaces that don't actually get used.
That would explain why they can rename those namespaces, but it doesn't
explain why they'd have anything at all for those namespaces in the
decompiled code. Maybe I'm mistaken in my point #1, but I don't think so.

I see now that the obfuscator is not obfuscating compiled IL, but
obfuscating directly from the source. As noted by Jon Skeet, that's not
a particularly useful thing to do. I guess it gets you a certain level
of obfuscated names in any assembly compiled from the obfuscated source,
but the tricks they pull with \uXXXX-encoded characters won't buy you
a single thing once compiled.

However, it does explain how they can rename the using statements.

It's as simple as this:

The namespaces are not used, so they can rename them however they want.
To get the obfuscated source to compile, they would need to declare the
namespace somewhere for the compiler to match up with, or there would be
a compiler error. There doesn't need to be anything useful in that
namespace, however.
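
As a sketch of that trick (the namespace name is taken from the example;
the single-file layout is my assumption):

```csharp
// Hypothetical reconstruction: an empty declaration of the renamed
// namespace is enough to make the using directive resolve.
using System.l1001;          // compiles, because the namespace is declared below

namespace System.l1001 { }   // nothing useful needs to live in here
```

Namespaces are open in C#, so the obfuscator is free to declare
System.l1001 anywhere in the compilation.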

One thing that I am still curious about is that they rename the
Hashtable class. This causes a compiler error. I believe that this is a
mistake in their obfuscation example. Hashtable should have been
included in the list of public items that should not be renamed.
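
For instance (a hypothetical sketch of the failure mode; the renamed
identifier is made up):

```csharp
using System.Collections;

class Demo
{
    static void Main()
    {
        // If the obfuscator rewrites "Hashtable" to a generated name,
        // the reference no longer resolves against the class library:
        //   var table = new l1234();  // error: the type or namespace
        //                             // name 'l1234' could not be found
        var table = new Hashtable();   // the BCL name must stay intact
        table["key"] = "value";
    }
}
```

Unlike the unused namespaces, Hashtable lives in the system libraries,
so the obfuscator cannot supply a renamed replacement for it.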

....<snip>...
 
anon said:
Do you think these guys know something we don't know? At least for changing
the names to system classes?

But in regards to obfuscating source code, let's say if you did compile it
and then decompiled it.

The decompiled version would look like, or be similar to, the source code,
which would be obfuscated anyway.

Well, the decompiled source code wouldn't have the \uxxxx bits in when
they're not needed, and it would have appropriate line breaks. For
instance, here's a similar program to the one they were talking about,
but done by hand:

using System;class \u0051{static void \u004d\u0061\u0069\u006e()
{\u0043\u006f\u006e\u0073\u006f\u006c\u0065.\u0057rite\u004cine
("\u0048\u0065\u006c\u006c\u006f");}}

(It could be made worse, but that'll do...)

Looks pretty bad, right? Until you run it through a decompiler, which
gives (for the Main method, for instance):

private static void Main()
{
Console.WriteLine("Hello");
}

The only bit of obfuscation left is the class name (Q).
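
For reference, the lexer decodes the \uXXXX escapes before anything else
(each escape is just the hex code of a letter, e.g. \u0051 is 'Q'), so the
compiler sees the ordinary program:

```csharp
// What the compiler sees once the Unicode escapes are decoded.
using System;
class Q
{
    static void Main()
    {
        Console.WriteLine("Hello");
    }
}
```

That is why the escapes survive only in the source: the compiled assembly
stores the decoded names.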
anon said:
The name changes would be, at least to me, a considerable hurdle to
figuring out or using the source code in the first place. You would have
only a glimmer of an idea of what the source code did, as you could not
read the names anyway.

Absolutely. Obfuscators renaming code is absolutely great - it's the
extra "features" they're promoting (like putting everything on one
line, changing some of the text to include unicode escapes, etc) which
make no odds in the long run.
 
see below...


Jon Skeet said:
Well, the decompiled source code wouldn't have the \uxxxx bits in when
they're not needed, and it would have appropriate line breaks. For
instance, here's a similar program to the one they were talking about,
but done by hand:

using System;class \u0051{static void \u004d\u0061\u0069\u006e()
{\u0043\u006f\u006e\u0073\u006f\u006c\u0065.\u0057rite\u004cine
("\u0048\u0065\u006c\u006c\u006f");}}

(It could be made worse, but that'll do...)

Looks pretty bad, right? Until you run it through a decompiler, which
gives (for the Main method, for instance):




When you say run it through a decompiler, is that the very first step?

Or do you mean (1st) compile, then (2nd) decompile?

And which decompiler would you be talking about?
 
anon said:
When you say run it through a decompiler, is that the very first step?

Or do you mean (1st) compile, then (2nd) decompile?

You'd compile first - but typically the only reason for obfuscation is
to protect your binaries anyway. If you don't want people reading your
source code, why give it to them in any form to start with?
And which decompiler would you be talking about?

That was using Reflector, but I believe I'd get the same result with
others.
 