You're missing the point.
All instance data inside a class is stored on the heap, within the
memory allocation for that object. The compiler computes the
instance data layout, just like it would for the local variables in a
procedure, and then references all the instance data relative to the instance
base address. This is very simple to code in assembler (MOV AX,
WORD PTR [BX], for example). Boxing is a runtime feature and is used only
when an object's type is unknown at compile time. When you put a struct
inside a class, the compiler knows the types at compile time, except when you
declare a structure component as a generic "Object".
Here's a simple set of rules for how storage is allocated:
Reference types, including anything derived from "Object": Heap pointer
from either a stack frame, global compiler-allocated storage, or another
object on the heap.
All Value Types, including "Struct": Stored on the stack, in global
compiler-allocated storage, or relative to the base address of an object on
the heap.
Where this gets confusing is when you use a value type as part of the
instance data of a reference type. In this case, the value type is stored
on the heap, but inside the allocated space for the reference type instance.
When you use a reference type as instance data of a reference type, all that
is allocated inside the containing reference type instance is a pointer to
the instance data. This complexity is why C and C++ programs tend to have
memory leaks and why .NET requires a good garbage collector.
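To make the two cases concrete, here is a minimal sketch rather than anything authoritative. S and C mirror the types from the original example; the Node class and the StorageDemo helper are hypothetical names added only for illustration. Copying the struct field copies the data itself, while copying the reference field copies only the pointer.

```csharp
using System;

public struct S { public int i1, i2; }

// Hypothetical reference type, added only for this illustration.
public class Node { public int tag; }

public class C
{
    public S s;     // value type: its two ints live inside the C instance
    public Node n;  // reference type: only a reference lives inside the C instance
}

public static class StorageDemo
{
    // Copy the struct field, then change the original field.
    public static int CopyOfStructField()
    {
        C c = new C();
        c.s.i1 = 1;
        S copy = c.s;   // copies both ints out of the C instance
        c.s.i1 = 2;     // changes the data inside c, not the copy
        return copy.i1; // 1
    }

    // Copy the reference field, then change the object through the original.
    public static int AliasOfReferenceField()
    {
        C c = new C();
        c.n = new Node();
        c.n.tag = 1;
        Node alias = c.n; // copies the reference only
        c.n.tag = 2;      // both references point at the same Node
        return alias.tag; // 2
    }

    public static void Main()
    {
        Console.WriteLine(CopyOfStructField());     // prints 1
        Console.WriteLine(AliasOfReferenceField()); // prints 2
    }
}
```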
Note that I referenced "global compiler allocated storage". This is the
memory allocated by the compiler for globally accessible public variables as
well as static (VB Shared) variables associated with objects. On the x86
platforms, this memory is referenced relative to the "DS" register.
In your original example,
    namespace _111_
    {
        public struct S
        {
            public int i1, i2;
        }

        public class C
        {
            public S s;
        }

        class Program
        {
            static void Main(string[] args)
            {
                C myClass = new C();
                myClass.s.i1 = 999;
                myClass.s.i2 = 888;
                //at this point some memory must be assigned for myClass.s.i1 & myClass.s.i2
                //question: where this memory was taken from? From stack? Or from heap?
            }
        }
    }
The class C is allocated from the heap. Since struct S is a value type, it
is stored entirely in the class. The named components of S are also value
types, so they are stored inside S:
    Offset  Value  Code Reference  Comments
    0000           i1              base of class C, start of Struct S; int i1 is laid down first by the compiler
    0004           i2              int i2
    0008                           next object starts here
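If you want to check those offsets yourself, one rough way is to ask the interop marshaller. This is only a sketch: Marshal.OffsetOf and Marshal.SizeOf report the marshalled layout, which I am assuming matches the in-memory layout for a simple struct of two ints. LayoutDemo is a hypothetical helper name, and the offsets are relative to the start of S, not to the object header the runtime places in front of the instance data.

```csharp
using System;
using System.Runtime.InteropServices;

// Sequential is already the default layout for C# structs; stated here for clarity.
[StructLayout(LayoutKind.Sequential)]
public struct S { public int i1, i2; }

public static class LayoutDemo
{
    // Byte offset of a named field within S, as reported by the marshaller.
    public static int OffsetOf(string field)
    {
        return (int)Marshal.OffsetOf<S>(field);
    }

    // Total size of S in bytes, as reported by the marshaller.
    public static int SizeOfS()
    {
        return Marshal.SizeOf<S>();
    }

    public static void Main()
    {
        Console.WriteLine(OffsetOf("i1")); // prints 0
        Console.WriteLine(OffsetOf("i2")); // prints 4
        Console.WriteLine(SizeOfS());      // prints 8
    }
}
```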
In code:
    // myClass = allocate(C)
    MOV  EAX, 8                  // C is 8 bytes long - GC_ALLOCATE will return the offset in EAX as well
    CALL GC_ALLOCATE             // Have the garbage collector allocate 8 bytes of storage
                                 // The GC will add object overhead, but these will be negative
                                 // offsets from the returned address.
    MOV  [ESP], EAX              // The variable myClass is on the stack at offset 0 relative to
                                 // the current stack frame BP
    MOV  EAX, myClass            // myClass's address is actually stored on the stack; this
                                 // instruction can be optimized out in this case
    MOV  DWORD PTR [EAX], 999    // myClass.s.i1 = 999
    MOV  DWORD PTR [EAX]+4, 888  // myClass.s.i2 = 888
First - I don't guarantee the syntax (I haven't written in x86 assembler in
several years), but this is close enough for discussion.
Second - GC_ALLOCATE returns an offset into the heap. This is a "magic"
number that the memory management subsystem handles for you. It is relative
to the value in register DS, which is set at application startup by the
memory manager. As you should be able to see from the assembler code, there
is no "boxing" of the variables. Boxing would show up as a comparatively
expensive runtime call that allocates a separate heap object and copies the
value into it.
All Reference types are allocated in a similar method to the above example.
Value types are allocated by the compiler and don't require the call to
GC_ALLOCATE at run time.
Note the magic occurring in "GC_ALLOCATE" - it allocates memory from the
heap and returns an offset into the heap that is then used by later code for
reference to the memory. The actual object size will be 8 bytes plus the
garbage collector management buffer. The GC management buffer will be at
negative offsets from the returned address, thus making the rest of the
compiler easier to write. If GC_ALLOCATE can't allocate the requested
memory, it calls GC_COLLECT, which compacts the reachable heap memory and
resets the allocation pointer for GC_ALLOCATE, which then tries again. If
GC_ALLOCATE still can't allocate memory, it asks the OS to extend the heap.
The OS returns the new heap size to GC_ALLOCATE. If GC_ALLOCATE still can't
allocate the requested memory, it throws an Out of Memory exception.
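The allocate / collect / extend / fail sequence described above can be sketched as a toy model. This is not how the CLR is actually implemented: ToyHeap, its members, and the "reachable" flag are all hypothetical, only sizes are tracked, and no real memory is touched.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the described steps; every name here is made up for illustration.
public sealed class ToyHeap
{
    private readonly List<int> liveSizes = new List<int>(); // sizes of reachable objects
    private readonly int maxSize; // the most the pretend OS will ever give us
    private int heapSize;         // current heap size in bytes
    private int next;             // allocation pointer (next free offset)

    public int Collections { get; private set; }

    public ToyHeap(int heapSize, int maxSize)
    {
        this.heapSize = heapSize;
        this.maxSize = maxSize;
    }

    // Returns the offset of the new object. Pass reachable: false to model garbage.
    public int Allocate(int size, bool reachable = true)
    {
        if (next + size > heapSize)
            Collect(); // first try: compact and retry
        if (next + size > heapSize)
            heapSize = Math.Min(maxSize, Math.Max(heapSize * 2, next + size)); // ask the "OS" for more
        if (next + size > heapSize)
            throw new OutOfMemoryException(); // still no room

        int offset = next;
        next += size;
        if (reachable) liveSizes.Add(size);
        return offset;
    }

    // Models compaction: reachable objects end up packed at the start of the heap.
    private void Collect()
    {
        Collections++;
        int live = 0;
        foreach (int s in liveSizes) live += s;
        next = live;
    }
}

public static class ToyHeapDemo
{
    public static void Main()
    {
        ToyHeap heap = new ToyHeap(16, 32);
        Console.WriteLine(heap.Allocate(8, reachable: false)); // prints 0
        Console.WriteLine(heap.Allocate(8));                   // prints 8
        Console.WriteLine(heap.Allocate(8));                   // prints 8 (garbage was compacted away)
        Console.WriteLine(heap.Allocate(8));                   // prints 16 (heap was extended)
    }
}
```

In this model a fifth call such as heap.Allocate(16) would throw OutOfMemoryException, because 24 bytes are reachable and the heap cannot grow past 32.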
Mike Ober.
Göran Andersson said:
Jon Skeet [C# MVP] wrote:
OK, I think the same. BUT! In other words myClass.s.i1 is already in
heap, yes? Now - what is boxing? Boxing is creating special packed
version of value-type in heap. In our case i1 ALREADY in heap. So...
object o = myClass.s.i1; //_not_ a boxing here?
???

Yes, it's boxed. You are not storing the myClass.s.i1 variable in the
object, you are storing a copy of the value of the myClass.s.i1 variable.

No, there's no boxing going on there. Boxing is creating a separate
object on the heap for a value type. That's not happening here.

Yes, it is.
myClass.s.i1 is an integer variable. The value copied from that integer
variable cannot be stored in the variable o, as it is an object
reference, so a separate object has to be created on the heap where the
value can be stored, and the reference to that object is stored in the
variable o.
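
The copy behaviour described in that last reply is easy to observe. A small sketch, reusing S and C from the original example; BoxingDemo and its method names are hypothetical and exist only for this illustration.

```csharp
using System;

public struct S { public int i1, i2; }
public class C { public S s; }

public static class BoxingDemo
{
    // Box the field, change the field afterwards, then read the box back.
    public static int BoxedValueAfterChange()
    {
        C myClass = new C();
        myClass.s.i1 = 999;
        object o = myClass.s.i1; // a new heap object receives a copy of 999
        myClass.s.i1 = 1;        // changes the field inside myClass, not the box
        return (int)o;           // unboxes the copy
    }

    // Box the same field twice and compare the two references.
    public static bool TwoBoxesAreDistinct()
    {
        C myClass = new C();
        myClass.s.i1 = 999;
        object a = myClass.s.i1;
        object b = myClass.s.i1;
        return !object.ReferenceEquals(a, b); // each assignment creates its own object
    }

    public static void Main()
    {
        Console.WriteLine(BoxedValueAfterChange()); // prints 999
        Console.WriteLine(TwoBoxesAreDistinct());   // prints True
    }
}
```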