new wchar_t[5]

Isak Dinesen

Being relatively new to C++, I've recently discovered a few things
about the behavior of new with respect to wchar_t, ZeroMemory and
delete. I can't seem to find documentation describing the following
three behaviors:

1. int len = 5; wchar_t *pwch = new wchar_t[len];

Subsequent inspection of pwch always shows that it is allocated space
for 12 WCHARs. (The amount of overallocation varies depending on the
value of len).

2. By default pwch always begins with a string of five 0xcdcd's,
followed by two 0xfdfd's, then four 0xabab's and finally the string is
always terminated by a 0xfeee.

3. If I somehow overwrite any of the 0xfdfd's, 0xabab's or the 0xfeee,
subsequent calls to delete fail and my application terminates
abnormally.

After spending an hour or so exploring, it has become clear that VC
likes to overallocate storage for wchar_t, uses a 0xfdfd, 0xabab,
0xfeee sequence to identify the overallocated bytes so delete can
correctly identify and free the bytes following the null string
terminator.

As a newb, I wanted to use ZeroMemory to zero-out the entire 12 WCHARs
rather than just the WCHAR at position pwch + 4. When that failed, I
read the documentation on new wchar_t and delete, and hunted around
usenet for about 2 hours before attempting to reinvent the wheel and
solve the problem on my own (losing another 2 hours).

Where does one find documentation on the allocation of specific types?

Why are these strings over-allocated? Is that an optimization?

Any replies are much appreciated.
 
Isak Dinesen said:
> Being relatively new to C++, I've recently discovered a few things
> about the behavior of new with respect to wchar_t, ZeroMemory and
> delete. I can't seem to find documentation describing the following
> three behaviors:
>
> 1. int len = 5; wchar_t *pwch = new wchar_t[len];
>
> Subsequent inspection of pwch always shows that it is allocated space
> for 12 WCHARs. (The amount of overallocation varies depending on the
> value of len).

You asked for 5 wchar_t's. You got them. The fact that they live in a
"bigger neighborhood" than what you expected is beside the point. :-)

> 2. By default pwch always begins with a string of five 0xcdcd's,
> followed by two 0xfdfd's, then four 0xabab's and finally the string is
> always terminated by a 0xfeee.

In debug builds, or when running under a debugger, the compiler and runtime
like to instrument your program in ways that make it easier to diagnose user
errors. For example, if you make a common error and write

pwch[5] = blah;

at runtime VS may be able to tell you that you overran a buffer.

> 3. If I somehow overwrite any of the 0xfdfd's, 0xabab's or the 0xfeee,
> subsequent calls to delete fail and my application terminates
> abnormally.

Yes. This shouldn't surprise you. :-) Writing to memory that you didn't
allocate can do just about anything. Virus writers use this technique to
take control of an application or a set of them. Do not do this, ever!

When C was hot, I heard it said that "C sets policy, it does not enforce
practice". In other words, the language gives a developer enough rope to
hang himself. C++ tries to "improve" on that state of affairs in some ways,
but at the end of the day, if you try you can still hurt yourself. Badly.
Some say, for example, that an application developer should never manipulate
a "naked" pointer but rather use a "smart" pointer type. That's a subject
for another day.
> Where does one find documentation on the allocation of specific types?

At some level, you shouldn't be concerned about implementation. If you ask
for an array of N elements of type T you get them in sequential storage
locations. In fact, the behavior you report may be different in a release
build than in a debug build depending on options selected. The behavior can
(and does) change across compiler versions.

To satisfy your curiosity, however, you might want to search the MSDN CD or
http://msdn.microsoft.com for the article "Compiler Security Checks In
Depth" by Brandon Bray (who sometimes posts here) to learn more about VS
code generation in debug builds, etc.

Regards,
Will
 
William gave a terrific answer (and not just because he mentioned my
article). I just wanted to fill in one other thing.

Isak said:
> Why are these strings over-allocated? Is that an optimization?

As William said, it shouldn't matter to you as a user. Nevertheless, it is
nice to know. The reason is that the smallest allocation that malloc or new
can return on x86 is 16 bytes. You'll find that all memory is dealt out in
16-byte chunks (unless you change the memory manager).

The main reason for doing this is to help align memory. When accessing
unaligned memory (e.g., accessing a double on a 4-byte alignment rather
than an 8-byte alignment) the x86 stalls. Also, if the allocation chunks are
too small, the memory manager has more to manage. That tends to lead to
greater memory fragmentation, which over time looks like a memory leak. What
the memory manager does now is empirically the best thing in general for
most programs.

Hope that helps. Cheerio!
 