x64 and BSTR allocation, what has changed?

  • Thread starter: Egbert Nierop (MVP for IIS)
Hi Eberhard!
But you do realize that you must create, manipulate and destroy a BSTR
only in the documented ways, essentially using the Sysxxx functions?

Yes. That's why I do not care about the internals of BSTR...
The preceding length value is an implementation detail,
and therefore we should not make any assumptions -- agreed. But if you
fail to remember that a BSTR is /not/ just a wchar_t array, you'll get
into trouble.

Yes. You are right. For BSTR you should only use the SysXxx functions.

Greetings
Jochen
 
The internals are always good to know, especially for the OP case - if you
need to control allocations ...
 
The fact that BSTRs have a length prefix is fully and publicly documented
(see http://msdn.microsoft.com/library/d...html/323cefbf-836c-4c9d-bcbe-f2663a57d2b5.asp).

This is a very important thing that you need to know and understand if you
ever manage BSTRs yourself. I don't know where you got the idea that this
is an undocumented, secret implementation detail - but it's absolutely not.
If you've done nontrivial code that works with BSTRs, I'm not sure how you
could get by without knowing these sorts of things.

Additionally, even the allocation strategy used for BSTRs by oleaut32 is
also documented (see the above link).
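
To make the documented layout concrete, here is a minimal portable sketch that mimics it without calling oleaut32 at all. FakeBstr, fake_alloc, fake_byte_len and fake_free are hypothetical names invented for this illustration, and malloc is only a stand-in allocator; real code should keep using SysAllocString, SysStringByteLen and SysFreeString.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a BSTR: a pointer to 16-bit characters that is
// preceded in memory by a 4-byte length (in bytes, terminator not counted).
using FakeBstr = char16_t*;

// Allocates [4-byte length][characters][2-byte null] and returns a pointer
// to the first character. This mimics the documented layout only.
FakeBstr fake_alloc(const char16_t* src, std::uint32_t char_count) {
    const std::uint32_t byte_len = char_count * sizeof(char16_t);
    auto* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(std::uint32_t) + byte_len + sizeof(char16_t)));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &byte_len, sizeof byte_len);        // length prefix
    std::memcpy(raw + sizeof byte_len, src, byte_len);   // characters
    std::memset(raw + sizeof byte_len + byte_len, 0, sizeof(char16_t));
    return reinterpret_cast<FakeBstr>(raw + sizeof byte_len);
}

// Reads the length prefix that sits 4 bytes before the string pointer.
std::uint32_t fake_byte_len(FakeBstr s) {
    if (s == nullptr) return 0;
    std::uint32_t byte_len = 0;
    std::memcpy(&byte_len,
                reinterpret_cast<const unsigned char*>(s) - sizeof byte_len,
                sizeof byte_len);
    return byte_len;
}

// Frees the block by stepping back over the length prefix first.
void fake_free(FakeBstr s) {
    if (s != nullptr)
        std::free(reinterpret_cast<unsigned char*>(s) - sizeof(std::uint32_t));
}
```

The length comes from the prefix, not from scanning for a terminator, which is the practical reason a BSTR is not interchangeable with a plain wchar_t array.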
 
Hi Skywing!
The fact that BSTRs have a length prefix is fully and publicly documented
(see http://msdn.microsoft.com/library/d...html/323cefbf-836c-4c9d-bcbe-f2663a57d2b5.asp).

Great!

This is a very important thing that you need to know and understand if you
ever manage BSTRs yourself.

I don't see the need to know the internals of this BSTR...

I don't know where you got the idea that this
is an undocumented, secret implementation detail - but it's absolutely not.

Maybe I was not yet in the situation to handle "nontrivial" BSTR code...


But in general: It is always good to know what the underlying system is
doing...

--
Greetings
Jochen

My blog about Win32 and .NET
http://blog.kalmbachnet.de/
 
Jochen Kalmbach said:
Hi Skywing!

I don't see the need to know the internals of this BSTR...

Who told you that you need to?

When I ask a question, you can just assume that I see a need, and if you
don't know the answer, please refrain from vain comments :)

In fact, I wanted a mechanism that disables BSTR caching without
affecting the whole process or using a sneaky API to disable caching.

Caching strings in a process is bad for the performance of a service.

You don't see a need, so don't waste time on that.
 
Since I started it, let me clarify. The layout is documented
to the extent that the 4 bytes preceding the pointer are
its length (in bytes I might add). It is also documented
that you should use SysAllocString and friends to allocate
BSTRs. What is not documented, however, is how SysAlloc
functions allocate the raw memory. It was common knowledge
that on Win32 the allocated pointer is 4 bytes before the returned
pointer, but this does not make it documented! (It is also public
knowledge that CoTaskMemAlloc is used for the memory
allocation of course.) Apparently, this allowed Microsoft to
change the allocation strategy to reflect the alignment requirement
for Win64 and return a 64-bit aligned pointer from SysAlloc
functions. Of course this breaks the aforementioned public
knowledge (and we extended it obviously...). There's nothing
wrong with knowing undocumented aspects. What's wrong is
to expect that these assumptions will continue to be valid
going forward.

--
=====================================
Alexander Nickolov
Microsoft MVP [VC], MCSD
email: (e-mail address removed)
MVP VC FAQ: http://www.mvps.org/vcfaq
=====================================
 
Egbert Nierop (MVP for IIS) said:
In fact, I wanted a mechanism that disables BSTR caching without
affecting the whole process or using a sneaky API to disable caching.

Caching strings in a process is bad for the performance of a service.

What are the symptoms you're seeing? High contention/cpu usage/
memory usage/something else? Does setting OA_NOCACHE=1 help?

What OS is this?
 
BTW, I'm not suggesting OA_NOCACHE as a permanent workaround.
But it would be useful to know if it helps alleviate whatever symptoms
you are seeing.
 
Pavel Lebedinsky said:
What are the symptoms you're seeing? High contention/cpu usage/
memory usage/something else? Does setting OA_NOCACHE=1 help?

What OS is this?

Any server version of Windows (starting with Windows 2000).

A very simple program showed me that BSTR caching only works within the
same thread. So in IIS, when you use a lot of OLE Automation (COM and
scripting) and each thread creates its own garbage etc., a caching
mechanism is really useless. In terms of scalability, using thread-local
storage is evil :)

And OA_NOCACHE, as I understand it, only helps with a debug DLL, as the
documentation says.

There is also a new API call, available from XP SP1, which disables
caching. But my component should not disable caching in a process that I do
not host.

Thanks anyway for your comments.
 
Alexander Nickolov said:
Since I started it, let me clarify. The layout is documented
to the extent that the 4 bytes preceding the pointer are
its length (in bytes I might add). It is also documented
that you should use SysAllocString and friends to allocate
BSTRs. What is not documented, however, is how SysAlloc
functions allocate the raw memory. It was common knowledge
that on Win32 the allocated pointer is 4 bytes before the returned
pointer, but this does not make it documented! (It is also public
knowledge that CoTaskMemAlloc is used for the memory
allocation of course.) Apparently, this allowed Microsoft to
change the allocation strategy to reflect the alignment requirement
for Win64 and return a 64-bit aligned pointer from SysAlloc
functions. Of course this breaks the aforementioned public
knowledge (and we extended it obviously...). There's nothing
wrong with knowing undocumented aspects. What's wrong is
to expect that these assumptions will continue to be valid
going forward.

I don't quite understand this.
Imagine that the total prefix were still 4 bytes instead of 8; this has
nothing to do with alignment, IMHO.
CoTaskMemAlloc just returns a 64-bit pointer, and on 32-bit Windows a
32-bit pointer. If alignment _plays_ a role, then it is not the SysAlloc*
functions that deal with it, but CoTask*! I dare to bet my shoes
on this...

:)
 
Hi Egbert!
I don't quite understand this.
Imagine that the total prefix were still 4 bytes instead of 8; this
has nothing to do with alignment, IMHO.
CoTaskMemAlloc just returns a 64-bit pointer, and on 32-bit Windows a
32-bit pointer. If alignment _plays_ a role, then it is not the SysAlloc*
functions that deal with it, but CoTask*! I dare to bet my
shoes on this...

The problem is that the allocated pointer will be 8-byte aligned
(CoTaskMemAlloc will make sure of this).

Many functions pass the BSTR pointer on to functions that treat it as a
wchar_t*. If the "prefix" of the BSTR were only 4 bytes, this BSTR pointer
would be misaligned (not 8-byte aligned).

That's IMHO the reason why the BSTR allocation now contains 4 bytes of
unused padding after the start of the allocation unit.
But this behaviour was never documented and might change in the future...

Documented is only:
BSTR-4: Len
BSTR: String

Greetings
Jochen
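
The alignment argument above can be sketched as plain arithmetic. Assuming the underlying allocator returns blocks aligned to at least `align` bytes, and the 4-byte length sits immediately before the string, the smallest header that keeps the string pointer aligned is 4 rounded up to a multiple of `align`. The names header_size and padding_before_length are hypothetical, and this says nothing authoritative about what oleaut32 actually does internally.

```cpp
#include <cstddef>

// Size of the documented length prefix, in bytes.
constexpr std::size_t kLengthPrefix = 4;

// Smallest header (padding + length prefix) that is a multiple of `align`,
// so that base + header stays aligned whenever `base` is aligned.
// `align` must be a power of two.
constexpr std::size_t header_size(std::size_t align) {
    return (kLengthPrefix + align - 1) & ~(align - 1);
}

// Unused bytes between the start of the allocation and the length prefix.
constexpr std::size_t padding_before_length(std::size_t align) {
    return header_size(align) - kLengthPrefix;
}

static_assert(header_size(4) == 4, "4-byte alignment: no padding needed");
static_assert(header_size(8) == 8, "8-byte alignment: header grows to 8");
static_assert(padding_before_length(8) == 4, "4 bytes are left unused");
```

Under these assumptions a 4-byte-aligned string pointer needs no padding, while an 8-byte-aligned one costs 4 extra bytes, which is consistent with the 4-versus-8-byte prefix discussed in this thread.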
 
Jochen Kalmbach said:
Hi Egbert!


The problem is that the allocated pointer will be 8-byte aligned
(CoTaskMemAlloc will make sure of this).

Many functions pass the BSTR pointer on to functions that treat it as a
wchar_t*. If the "prefix" of the BSTR were only 4 bytes, this BSTR pointer
would be misaligned (not 8-byte aligned).

That's IMHO the reason why the BSTR allocation now contains 4 bytes of
unused padding after the start of the allocation unit.

A pointer to anything in 64-bit Windows =is= just a pointer. What it
points to can even be 1 byte!

I think the 4 wasted bytes have to do with future plans, not with alignment.

Another possibility is that some design guys at MS decided that BSTRs should
not be limited by the 2GB / 4GB limit and, after this decision, found
that a lot of interop (COM/RPC) would not work anymore.

Imagine you do COM interop (i.e. RPC interop) with a 32-bit machine. The BSTR
is persisted by value, including the length prefix! An LPWSTR (if
interoped) would not have that problem: a 5 GB string, for instance
(a little bit long, but you get the idea), would just be 0-terminated...

But this behaviour was never documented and might change in the
future...

Documented is only:
BSTR-4: Len
BSTR: String

There are guys who don't dare to bet on technology and guys who do.

For instance, Norton was a programmer who found undocumented IBM PC BIOS
function calls and used them as well. I know some other guys.
For instance, Ethan Winer, an ASM programmer who wrote and nearly rewrote
QuickBasic and PDS. Now he's so rich he can play cello and do what he
wants for the rest of his life.

The ones who bet sometimes get hurt, but they get a much wider horizon.

Anyway, my product has worked for years, and it will now also work for
years on a 64-bit platform.
The next platform with 128-bit pointers :)) , will I still be alive then?
All that time, my product has had the best scalability, and that's what I
prefer...
 
I think you misunderstand alignment. A pointer alignment deals
with the address the pointer points to, whereas data alignment
deals with the address a member of a struct is aligned on. The size
of the struct member dictates the minimum alignment required,
but pointer alignment is fixed - at least the CPU addressing
width, which in case of Win64 is 8 bytes. Since SysAlloc
functions return pointers, they must ensure proper alignment
as per the architecture. Note the alignment has only slight
performance implication on x86 architecture systems, but
misalignment can lead to program crash on other platforms.
Imagine you store a (properly aligned) struct in the buffer
returned by SysAllocStringByteLen and access it after type-
casting. If the pointer is misaligned, so will be the struct members!
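
A small portable sketch of the check implied here: whether a raw buffer address is suitable for viewing as some struct. It uses alignof(T) as the requirement, which is an assumption about how one would test this in standard C++ rather than anything SysAllocStringByteLen promises. The helpers is_aligned and can_hold and the Payload struct are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true if address `p` is a multiple of `align` (a power of two).
inline bool is_aligned(const void* p, std::size_t align) {
    return (reinterpret_cast<std::uintptr_t>(p) & (align - 1)) == 0;
}

// Hypothetical payload someone might store in a byte buffer.
struct Payload {
    double value;       // usually the member with the strictest alignment
    std::int32_t tag;
};

// True if a buffer starting at `p` is suitably aligned to be viewed as T.
template <typename T>
bool can_hold(const void* p) {
    return is_aligned(p, alignof(T));
}
```

For example, with a buffer declared as `alignas(16) unsigned char buf[32]`, `can_hold<Payload>(buf)` is true, while `can_hold<Payload>(buf + 1)` is false because every member of the struct would then sit at a misaligned address.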

--
=====================================
Alexander Nickolov
Microsoft MVP [VC], MCSD
email: (e-mail address removed)
MVP VC FAQ: http://www.mvps.org/vcfaq
=====================================
 
Perhaps I wasn't very clear - struct member alignment is relative,
whereas pointer alignment is absolute. I hope this gets my message
across in clearer terms...

--
=====================================
Alexander Nickolov
Microsoft MVP [VC], MCSD
email: (e-mail address removed)
MVP VC FAQ: http://www.mvps.org/vcfaq
=====================================

 
Alexander Nickolov said:
Perhaps I wasn't very clear - struct member alignment is relative,
whereas pointer alignment is absolute. I hope this carries out
my message in more clear terms...

Thanks for the clarification.

Because you said this, I found out (by using CoGetMalloc) that the
allocation (length) alignment is always 16 bytes, on both 32-bit and
64-bit systems.
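
If the allocator really does round request sizes up to 16 bytes as observed above, the effective size of a request can be estimated with the usual power-of-two rounding. The value 16 is taken from this post's observation, not from a documented guarantee, and round_up is a hypothetical helper.

```cpp
#include <cstddef>

// Rounds `n` up to the next multiple of `granularity` (a power of two).
// The default of 16 reflects the observation in this thread only.
constexpr std::size_t round_up(std::size_t n, std::size_t granularity = 16) {
    return (n + granularity - 1) & ~(granularity - 1);
}

static_assert(round_up(1) == 16, "a 1-byte request occupies one 16-byte unit");
static_assert(round_up(17) == 32, "17 bytes spill into a second unit");
```

So under this assumption, short strings of quite different lengths end up occupying the same amount of heap space.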
 
Yes, that's to make sure you are alignment-compatible with
future processors for quite some time... :) I don't imagine we'll
be porting to 256-bit processors any time soon (these would
require 32-byte alignment).

However, SysAlloc functions are only 4-byte aligned since
they only needed to support 32-bit CPUs. With Win64 their
alignment has to be raised to 8 bytes.

--
=====================================
Alexander Nickolov
Microsoft MVP [VC], MCSD
email: (e-mail address removed)
MVP VC FAQ: http://www.mvps.org/vcfaq
=====================================
 
Egbert Nierop (MVP for IIS) said:
A very simple program showed me that BSTR caching only works within the
same thread. So in IIS, when you use a lot of OLE Automation (COM and
scripting) and each thread creates its own garbage etc., a caching
mechanism is really useless. In terms of scalability, using thread-local
storage is evil :)

I believe the original goal of BSTR caching was to reduce contention
on the process heap by keeping a small number of freed BSTRs
in TLS. It mostly succeeds in this, though the effect can usually be
seen only in artificial benchmarks that allocate tons of BSTRs from
multiple threads.

The real problem with BSTR caching is that it can result in very
inefficient memory usage for certain types of workloads. If you
set OA_NOCACHE=1 and your Process\Private Bytes drop
significantly then you're probably affected by this. Otherwise it
most likely makes no difference.

And OA_NOCACHE, as I understand it, only helps with a debug DLL, as the
documentation says.

On XP and later OA_NOCACHE works with retail oleaut32.dll
(but it still should only be used for debugging purposes).
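
The trade-off described above can be illustrated with a toy per-thread cache of freed blocks. This is only a sketch of the general idea as stated in this post (a small number of freed blocks kept per thread to avoid touching the shared heap); the class ThreadBlockCache, its block size and its cache limit are all hypothetical and do not describe oleaut32's real implementation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical per-thread cache of freed blocks of one fixed size.
class ThreadBlockCache {
public:
    static constexpr std::size_t kBlockSize = 64;  // assumed block size
    static constexpr std::size_t kMaxCached = 4;   // assumed cache limit

    // Reuses a cached block if this thread has one, else goes to the heap.
    static void* allocate() {
        auto& cache = blocks();
        if (!cache.empty()) {
            void* p = cache.back();
            cache.pop_back();
            return p;
        }
        return std::malloc(kBlockSize);
    }

    // Keeps the block for this thread. Memory only goes back to the heap
    // once the cache is full, so other threads cannot reuse cached blocks.
    static void release(void* p) {
        if (p == nullptr) return;
        auto& cache = blocks();
        if (cache.size() < kMaxCached) cache.push_back(p);
        else std::free(p);
    }

    static std::size_t cached_count() { return blocks().size(); }

private:
    // One list per thread. A real implementation would also free the
    // cached blocks at thread exit; this sketch does not.
    static std::vector<void*>& blocks() {
        thread_local std::vector<void*> cache;
        return cache;
    }
};
```

A block released on one thread is handed back to the next allocate() on that same thread without any heap call, which reduces contention. The same property means that memory parked in one thread's cache is invisible to every other thread, which is one way private bytes can stay higher than the workload needs.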
 
Pavel Lebedinsky said:
I believe the original goal of BSTR caching was to reduce contention
on the process heap by keeping a small number of freed BSTRs
in TLS. It mostly succeeds in this, though the effect can usually be
seen only in artificial benchmarks that allocate tons of BSTRs from
multiple threads.

I understand. I've seen flatter performance and memory usage in an HTTP
benchmark. The tested script created and persisted lots of variables.
For some reason, reallocating is faster than allocating/deleting, so my
string heap management is based on reallocation instead of using a cache.

However, this benchmark was done on a PIII 600. I've never rebenched this.

The real problem with BSTR caching is that it can result in very
inefficient memory usage for certain types of workloads. If you
set OA_NOCACHE=1 and your Process\Private Bytes drop
significantly then you're probably affected by this. Otherwise it
most likely makes no difference.

This is true, simply because IIS and ASP are -pure- scripting.

On XP and later OA_NOCACHE works with retail oleaut32.dll
(but it still should only be used for debugging purposes).

A component builder should never rely on this or set it for customers.
 