Here's a Dell story you don't see too often

Rob said:
The problem with browser spoofing like that is that it
lets the web developer get away with his stupidity.
In the long run I think you are far better served by
requiring the developers to do a good job instead of
looking for work-arounds on your end when they screw up.

I even pointed out to my ISP's tech support that it comes back with this
kind of error message on Opera browsers, even with browser spoofing in
place. Their standard answer was that they don't support Opera. But if you
look at their policies, they say they don't support IE or Netscape/Mozilla
either, yet they had no problems making this stuff compatible with those
products. If they don't support anything, then they should offer
equal-opportunity "no support". Even "no support" means quite different
things depending on the product. :-)

Yousuf Khan
 
Now, how could this benefit a Hyperthreading processor over a non-HT one?
Well, in an HT CPU, the benchmark can configure it such that it runs the
applications test-script in the primary HT logical processor, while all of
the synthetic load-generating simulations are executed in the secondary
logical processor. Windows would multitask the applications test script in
the run queue of one logical processor where nothing else would be running,
while the synthetics would contend amongst themselves for attention in the
secondary logical processor. In a non-HT CPU, all of the tasks (whether real
or synthetic) would contend for timeslices within the same Windows' run
queue.
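The pinning described above can be sketched in a few lines. This is a hedged illustration, not the benchmark's actual code (which was never published): it uses Python's os.sched_setaffinity, the Linux analogue of the Windows affinity APIs, and the CPU numbering is hypothetical.

```python
import os

def pin_to_cpu(cpu):
    """Restrict the calling process to one logical processor.
    On an HT system each logical processor has its own run queue,
    so a pinned foreground task never shares timeslices with tasks
    pinned elsewhere."""
    os.sched_setaffinity(0, {cpu})  # pid 0 = the calling process
    return os.sched_getaffinity(0)

available = sorted(os.sched_getaffinity(0))
# Hypothetical benchmark layout: the measured applications script on
# the first logical processor; the synthetic load-generating scripts
# would be pinned to the remaining logical processors.
print(pin_to_cpu(available[0]))
```

On a single-logical-processor machine this is a no-op; the scheme only pays off when there is a second run queue to shunt the synthetic load into.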

Are you speaking from experience about the ability of a Windows application
to allocate one logical processor solely to itself, or is this your
conjecture? Last time I heard (and it was today, from a big manager at a
very relevant company), MSoft has no hardware concept of asymmetrical
multitasking, but I might have misinterpreted the issue...
Could you clarify please?

- aap
 
Everyone who multitasks cares about how much work is being done in the
background.

Not true for everyone. For example, playing music is a background task, but
as long as the stream is delivered isochronously, the amount of work doesn't
matter much and is hard to quantify, only to the extent it affects the
timing of the foreground task. Same for video playback, or CD burning
without buffer underrun. Also, some non-time-critical tasks like backups,
virus checks, or multi-megabyte movie downloads are of little concern
time-wise compared to some foreground time-critical task.

- aap
 
alexi said:
Not true for everyone. For example, playing music is a background task,
but as long as the stream is delivered isochronously, the amount of work
doesn't matter much and is hard to quantify, only to the extent it
affects the timing of the foreground task.

You just gave an example of why background multitasking performance *DOES*
matter, while trying to show the opposite. People care about the playback
quality, they don't want the music to jump or skip while working in the
background. Fortunately, this is not a huge performance hurdle to achieve,
so most people are happy if the machine is simply able to keep up with the
music stream in the background, and if it can do even better than that, they
don't care or even notice.
Same for video playback, or CD burning without buffer underrun.
Also, some non-time-critical tasks like backups, virus checks, or
multi-megabyte movie downloads are of little concern about time as
compared to some foreground time-critical task.

All of those do have some very loose performance criteria too. A movie
download or a tape backup that takes too long will be noticed by people,
eventually. Every one of these has an upper-limit criterion beyond which
people start to worry about what's going on.

Anyways, these are not the sorts of background tasks we were talking about.
The background tasks in this benchmark were entirely fake simulated
workloads whose entire purpose was to slow the processor down from executing
the foreground tasks fast enough. They served no other purpose, and once
the foreground task finished, these tasks also finished.

Yousuf Khan
 
alexi said:
Are you speaking from experience about the ability of a Windows
application to allocate one logical processor solely to itself, or
is this your conjecture?

There is code available from Intel itself which can show you how to
determine how many physical and logical processors there are in a system.
Doable completely in user mode, without OS assistance.
Last time I heard (and it was today, from a big manager at a very
relevant company), MSoft has no hardware concept of asymmetrical
multitasking, but I might have misinterpreted the issue...
Could you clarify please?
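A rough idea of what such detection reports can be sketched portably. This is not Intel's CPUID-based sample code; it is a stand-in that asks the OS for the logical count and, on Linux, derives the physical package count from /proc/cpuinfo.

```python
import os

def count_processors():
    """Return (logical, physical_packages). The logical count comes
    from the OS; the physical count is parsed from /proc/cpuinfo on
    Linux, as a stand-in for the CPUID-based detection in Intel's
    sample code (which needs no OS assistance at all)."""
    logical = os.cpu_count()
    packages = set()
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("physical id"):
                    packages.add(line.split(":")[1].strip())
    except OSError:
        pass  # non-Linux: fall back to the logical count only
    physical = len(packages) or logical
    return logical, physical

print(count_processors())
```

On an HT system the logical count is twice the physical count; on a non-HT system the two are equal.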

What do you mean by asymmetrical multitasking?

Yousuf Khan
 
Yousuf Khan said:
There is code available from Intel itself which can show you how to
determine how many physical and logical processors there are in a system.
Doable completely in user mode, without OS assistence.

I can detect how many processors there are, not a problem. However,
I believe there is a huge problem allocating any specific processor
to a specific task in Windows. Correct me if I am wrong.

What do you mean by asymmetrical multitasking?

Opposite to Symmetrical MT: Ability to assign a process/task to a specific
processor (possibly optimized in a specific way) in Windows environment.
 
Adam Warner said:
Hi alexi,


You're wrong. The technical term is "processor affinity".

<http://www.google.com/search?q="processor+affinity">

Regards,
Adam

Thanks Adam, yes, there is a concept of processor affinity, and apparently
some means to control the task. In this regard, Yousuf is right. However,
this mechanism is related to allocating logical processors, not physical
processors. As far as I remember, enumeration of physical processors (in
the x86 world) is a random process. Different physical processors may have
some asymmetry in the way they are hooked up in the system (different
configurations of HyperTransport links, for example), therefore they may
have different advantages and disadvantages with regard to different
I/O-loaded tasks, while the Windows OS treats processors symmetrically.

Regards,
-aap
 
alexi said:
I can detect how many processors there are, not a problem. However,
I believe there is a huge problem allocating any specific processor
to a specific task in Windows. Correct me if I am wrong.

I am no Windows programmer, but I've glanced over code from Intel that
basically does allow you to change your CPU context to whichever CPU you
like. I didn't pay it too much detailed attention, so I can't give you the
details. I believe it's buried somewhere in here:

http://www.intel.com/technology/hyperthread/
Opposite to Symmetrical MT: Ability to assign a process/task to a
specific processor (possibly optimized in a specific way) in Windows
environment.

Don't know. The Intel examples certainly seem to be able to jump from
processor to processor to processor.

Yousuf Khan
 
alexi said:
Thanks Adam, yes, there is a concept of processor affinity, and
apparently some means to control the task. With this regard, Yousuf
is right. However, this mechanism is related to allocating logical
processors, not physical processors.

The examples in the Intel website certainly made no distinction between
physical or logical processors. In fact, as far as Intel is concerned, all
of the processors are just logical processors. It's just that there would be
two logical processors per physical processor. Intel's Hyperthreading
mechanism allows for up to 256 logical processors per physical processor
(there's an 8-bit field for logical processor IDs).
As far as I remember, enumeration of physical processors (in the x86
world) is a random process. Different physical processors may have some
asymmetry in the way they are hooked up in the system

Nope, not random at all; how processors are enumerated is governed by the
APIC specification. It may have been random prior to the advent of the
APIC, but now there's a specific enumeration order. Part of the spec is
that the secondary logical processors are counted only after all of the
primary logical processors have been counted.
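That ordering can be sketched with a toy model. This is only an illustration of the convention described above (primaries first, then the HT siblings), not the actual APIC ID arithmetic:

```python
def enumeration_order(physical_count, threads_per_core=2):
    """Toy model of the post-APIC enumeration convention: the primary
    logical processor of every physical package is enumerated first,
    and the secondary (HT sibling) logical processors come after all
    of the primaries."""
    order = []
    for thread in range(threads_per_core):
        for package in range(physical_count):
            order.append(f"cpu{package}/thread{thread}")
    return order

print(enumeration_order(2))
# ['cpu0/thread0', 'cpu1/thread0', 'cpu0/thread1', 'cpu1/thread1']
```

So on a dual-Xeon HT box, logical processors 0 and 1 are the two physical packages, and 2 and 3 are their HT siblings.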
(different configuration of Hypertransport links for example),
therefore they may have different advantages and disadvantages with
regard to different I/O-loaded tasks, while Win OS treats processors
symmetrically.

Almost all operating systems have the ability to allocate certain tasks to
certain processors or processor groups. Windows and most other OSes will
simply allocate them round-robin by default, but there are administrative
commands available to set processor affinity. Therefore, if administrative
commands can do it, then other programs should be able to access the same
facilities.

Yousuf Khan
 

It doesn't work with either the latest Mozilla or Mozilla
Firebird. I also talked to someone I know who has a Sparc III
system running Solaris and Linux - he can't view it with any
of his browsers. For me, not using crapware like IE
is my choice - he has no such option at all.

Try this:

http://www.chrispederick.com/work/firefox/useragentswitcher/

Going back to the aforementioned website with the user-agent string set to
IE 6 brought up a login page. None of the links on the page went anywhere
beyond that, other than a link to Intel's website. There's not even an
option for creating a new account, and BugMeNot had no login information for
it. Overall, it appears to be yet another content-free website.

_/_
/ v \ Scott Alfter (remove the obvious to send mail)
(IIGS( http://alfter.us/ Top-posting!
\_^_/ rm -rf /bin/laden >What's the most annoying thing on Usenet?

 

Are you speaking from experience about the ability of a Windows application
to allocate one logical processor solely to itself, or is this your
conjecture? Last time I heard (and it was today, from a big manager at a
very relevant company), MSoft has no hardware concept of asymmetrical
multitasking, but I might have misinterpreted the issue...
Could you clarify please?

I don't think the OP was saying that the benchmark was forcing Windows
itself to run on one processor and apps to run on another. Instead, the
benchmark's main thread was set to run on one processor and its various
background threads were set to run on the other. The SetProcessAffinityMask
system call lets you restrict a process and its subthreads to run on the
processor(s) you specify:

http://msdn.microsoft.com/library/d...en-us/dllproc/base/setprocessaffinitymask.asp

One app I can think of offhand that uses this feature is Prime95...on an MP
system, you can run multiple instances of Prime95, each set to run on a
different processor.
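That multi-instance setup can be sketched with per-process affinity. This is a hedged stand-in, not Prime95's code: it uses Linux's os.sched_setaffinity in place of the SetProcessAffinityMask call named above.

```python
import multiprocessing as mp
import os

def pinned_worker(cpu, results):
    # Each instance restricts itself to one logical processor, the way
    # separate Prime95 instances are each pinned on an MP system.
    os.sched_setaffinity(0, {cpu})
    results.put((cpu, sorted(os.sched_getaffinity(0))))

if __name__ == "__main__":
    cpus = sorted(os.sched_getaffinity(0))
    results = mp.Queue()
    # One instance per logical CPU (capped at two here for brevity).
    workers = [mp.Process(target=pinned_worker, args=(c, results))
               for c in cpus[:2]]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    while not results.empty():
        print(results.get())
```

Each worker ends up alone in its logical processor's run queue, which is exactly the arrangement the benchmark discussion above is about.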

_/_
/ v \ Scott Alfter (remove the obvious to send mail)
(IIGS( http://alfter.us/ Top-posting!
\_^_/ rm -rf /bin/laden >What's the most annoying thing on Usenet?

 
Robert Myers wrote:

[SNIP]
Now the article has my attention. I don't care _that_ much about the
details of what a workstation does (certainly not fifteen percent), but
I don't want my workstation to be "shut down" just because I've got a
few things going.

The phrase "shut down" does stretch credibility somewhat.

If the methodology is not published in enough detail to reproduce
the results then personally I will conclude that they know their
results are bogus.


Cheers,
Rupert
 
Rupert said:
Robert Myers wrote:

[SNIP]
Now the article has my attention. I don't care _that_ much about the
details of what a workstation does (certainly not fifteen percent),
but I don't want my workstation to be "shut down" just because I've
got a few things going.


The phrase "shut down" does stretch credibility somewhat.

Mmmm. Another discussion about "responsiveness" and how to characterize
it? Perhaps you would have been less put off if he had said the system
becomes "unresponsive," whatever that means.

I came across a thread about this very subject on realworldtech.com
(which thread references this thread). That's a lot of buzz about one
sloppy benchmark for a trade rag and some overblown rhetoric to go with it.

As a discussion generator, overdrawn rhetoric works, and, by that
measure, we should probably acknowledge that nothing here has harmed Mr.
Kennedy's career trajectory or Infoworld's readership.
If the methodology is not published in enough detail to reproduce
the results then personally I will conclude that they know their
results are bogus.

This puts us both in the position of speculating about mindset, but I
doubt Mr. Kennedy is as cynical about his results as your comment implies.

I started to type a comment to the realworldtech thread, but stopped
because I realized what a quagmire I was wading into. Benchmarks, like
statistics, are more frequently misused than used in a way that bears
examination.

As it is, though, I'm not quite as dismissive of Mr. Kennedy's little
squib as you are:

<quote>

The Peak CPU Saturation Index, which is calculated from a sampling of
the Processor Queue Length counter as exposed by the Windows Performance
Data Helper libraries, showed that, on average, the Opteron system had
16 percent more waiting threads in its queue -- a clear indication that
the system was in fact CPU-bound and running out of processor bandwidth.

</quote>
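What that index amounts to can be sketched numerically. The sample values below are made up purely to reproduce the quoted 16 percent gap; the actual counter samples were never published, and the "average of samples" formula is a hedged reconstruction of what the article describes.

```python
def peak_cpu_saturation_index(samples):
    """Hedged reconstruction: average the sampled Processor Queue
    Length values (the counter the article says it reads via the
    Windows Performance Data Helper libraries)."""
    return sum(samples) / len(samples)

# Made-up sample values, chosen only to reproduce the quoted 16% gap.
opteron = peak_cpu_saturation_index([6.0, 6.0, 5.0, 6.2])  # 5.8
xeon = peak_cpu_saturation_index([5.0, 5.5, 4.5, 5.0])     # 5.0
print(round((opteron - xeon) / xeon, 2))  # 0.16 -> "16 percent more"
```

Note that a longer queue on average says the system was CPU-bound, but by itself it says nothing about how the queued threads were distributed across run queues, which is the crux of the dispute below.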

That's a remark with content that ostensibly yields insight into what's
actually happening (significantly more denotative than "shut
down")--more than you can expect from the eye-blurring pages of
benchmarks you typically see and have to try to draw some insight from
(because the publisher certainly hasn't provided it).

The details of the benchmark can't be repeated, but the conclusion can
be confirmed or refuted. Design your own bogus benchmark, see what
happens to the Processor Queue Length and publish your results. You
know ahead of time you will have readers. As a freelancer, you could
probably sell it, although probably not for enough to pay for your time.

RM
 
Robert Myers said:
<quote>

The Peak CPU Saturation Index, which is calculated from a sampling of
the Processor Queue Length counter as exposed by the Windows
Performance Data Helper libraries, showed that, on average, the
Opteron system had 16 percent more waiting threads in its queue -- a
clear indication that the system was in fact CPU-bound and running
out of processor bandwidth.

</quote>

That's a remark with content that ostensibly yields insight into
what's actually happening (significantly more denotative than "shut
down")--more than you can expect from the eye-blurring pages of
benchmarks you typically see and have to try to draw some insight from
(because the publisher certainly hasn't provided it).

The details of the benchmark can't be repeated, but the conclusion can
be confirmed or refuted. Design your own bogus benchmark, see what
happens to the Processor Queue Length and publish your results. You
know ahead of time you will have readers. As a freelancer, you could
probably sell it, although probably not for enough to pay for your
time.

The 16% higher average processor wait queue length comes back to my
conjecture that the benchmark has simply shuffled off the unmeasured
synthetic load-generating scripts into a separate logical processor, while
running the measured real-app script in its other logical processor. In a
non-HT processor, all of those threads would have run inside a single
processor queue, but in an HT processor they run inside two queues. So it's
the difference between running one thread in one queue and three threads in
the other queue, vs. running all four threads in one queue.
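The queue arithmetic above can be put in a toy model. This deliberately ignores that two logical processors share one physical core's execution resources; it only illustrates the timeslice-sharing argument, not real HT performance.

```python
def foreground_share(threads_in_queue):
    """Fraction of a logical processor's timeslices the measured
    foreground thread gets under round-robin scheduling within
    its run queue (toy model)."""
    return 1.0 / threads_in_queue

# Non-HT: foreground shares one run queue with three synthetic threads.
print(foreground_share(4))  # 0.25
# HT with affinity: foreground alone in its own run queue, while the
# three synthetics contend among themselves in the other queue.
print(foreground_share(1))  # 1.0
```

In the pinned HT case the foreground thread never waits in line, even though the synthetics are still running; in the single-queue case it waits behind them three slices out of four.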

Yousuf Khan
 
Yousuf said:
The 16% higher average processor wait queue length comes back to my
conjecture that the benchmark has simply shuffled off the unmeasured
synthetic load-generating scripts into a separate logical processor, while
running the measured real-app script in its other logical processor. In a
non-HT processor, all of those threads would have run inside a single
processor queue, but in an HT processor they run inside two queues. So it's
the difference between running one thread in one queue and three threads in
the other queue, vs. running all four threads in one queue.

Easy enough to demonstrate if you have the appropriate hardware. I don't.

RM
 
Yousuf Khan said:
The 16% higher average processor wait queue length comes back to my
conjecture that the benchmark has simply shuffled off the unmeasured
synthetic load-generating scripts into a separate logical processor, while
running the measured real-app script in its other logical processor. In a
non-HT processor, all of those threads would have run inside a single
processor queue, but in an HT processor they run inside two queues. So it's
the difference between running one thread in one queue and three threads in
the other queue, vs. running all four threads in one queue.

Yousuf Khan

Yousuf,

Let me try to understand your assertion. You have two systems. One system
is a dual Opteron; it appears as two logical processors at the OS level.
The other system is a dual Xeon, and it appears as four logical processors
to the OS. You are saying that the "measured real-app script" allocates a
dedicated logical processor to itself, while the other background loads are
running on what is left, right? Moreover, the other assertion was that the
rest of the background tasks run unmeasured, right? In this case, the
benchmark on the dual Opteron system would run the measured app in one
queue, and all other (unmeasured) threads in another queue. I don't see how
it is different from the system of four logical processors, given that the
background performance goes unmeasured as well. It looks like you keep
forgetting that the Opteron system was also a multiprocessor system. Am I
missing something here?

- Alexei
 
Robert Myers wrote:

[SNIP]
<quote>

The Peak CPU Saturation Index, which is calculated from a sampling of
the Processor Queue Length counter as exposed by the Windows Performance
Data Helper libraries, showed that, on average, the Opteron system had
16 percent more waiting threads in its queue -- a clear indication that
the system was in fact CPU-bound and running out of processor bandwidth.

</quote>

That's a remark with content that ostensibly yields insight into what's
actually happening (significantly more denotative than "shut
down")--more than you can expect from the eye-blurring pages of
benchmarks you typically see and have to try to draw some insight from
(because the publisher certainly hasn't provided it).

My Windows NT (and later) internals knowledge is pretty dated because
I haven't had a play with it for some time... *If* that counter is
per-logical-processor, his observation would be consistent, but it would
not really help you gauge system throughput. Nor does it really measure
responsiveness.

Still, if he can make inadequately substantiated assertions about
responsiveness and system throughput, then I can too:

I've used (early) Solaris boxes that had 50+ students banging away
with C++ compilers that remained very responsive... I have even used
a Pentium Pro 200 on WinNT 3.51 that had one CPU maxed out, and yet
it remained responsive although it was a bit slow on the screen
repaint I guess. ;)

This is why I suspect his methodology is broken...

Cheers,
Rupert
 
alexi said:
Let me try to understand your assertion. You have two systems. One
system is (dual Opteron). It appears as two logical processors at the
OS logical level. The other system is dual Xeon, and it appears as 4
logical processors to the OS. You are saying that the "measured
real-app script" allocates a dedicated logical processor to itself,
while the other background loads are running on what is left, right?
More, the other assertion was that the rest of the background tasks run
unmeasured, right? In this case, the benchmark on the dual Opteron
system would run the measured app in one queue, and all other
(unmeasured) threads in another queue. I don't see how it is
different from the system of 4 logical processors, given that the
background performance goes unmeasured as well. It looks like you
keep forgetting that the Opteron system was also a multiprocessor
system. Am I missing something here?

It doesn't matter that the Opteron system was also a multiprocessor system;
it still has half as many run queues to work with as the HT system. If you
group all of the background processes onto the secondary logical processors
through processor affinity, those background processes will only fight it
out for timeslices amongst themselves, leaving the foreground process free
to occupy its own private run queue. It doesn't matter whether there is one
processor, or two, or four, or eight; in each case the SMT system will have
twice as many run queues.

Yousuf Khan
 