X2 vs X4

  • Thread starter: Dave

Dave

If an application does NOT support multiple CPU cores, will it run
slower on a Phenom 2.4 GHz CPU than it would on an X2 2.4 GHz CPU?

I currently have a 2.2 GHz X2 and I want to upgrade it. My motherboard
supports the Phenom X4 but from what I'm reading, software that doesn't
support multiple cores may run slower if I do.

Any advice?
 
Where did you read that software that doesn't support multiple cores may run
slower?

A thread on CraigsList a few days ago. Several people were discussing
performance issues and stated that software that does not support
multiple cores runs slower on a multi-core CPU than on a non-multi-core
CPU. Nobody disagreed with that statement in the thread.
 
Dave said:
A thread on CraigsList a few days ago. Several people were discussing
performance issues and stated that software that does not support
multiple cores runs slower on a multi-core CPU than on a non-multi-core
CPU. Nobody disagreed with that statement in the thread.

I'd not consider craigslist to be a top technical forum.

Given identical clock speeds and voltages, a single-threaded application
will perform equally on a single core or a multi-core box. The multi-core
box will, of course, be able to run multiple copies of the single-threaded
application much faster than the single core box.

When one considers that a typical operating system often has dozens of
processes running other than the "foreground application", an application
on a multicore system _may_ perform better, because the operating system processes
can run on the other core freeing capacity for the application. Now, this
really only holds if the application is using 80% or more of the processor (e.g.
mp3 encoders, video transcoders, numerical analysis applications, etc). Most
graphical applications seldom use significant amounts of processing power.

scott
 
Dave Feustel said:
The effective clock speed of a single core in a multiple core chip is
the chip clock speed divided by the number of cores,

This is incorrect.

All cores run at the same clock speed, which is the 'chip clock speed'. The
processor's power-management features do allow the operating system to ramp down
the voltage and frequency of each core individually so that idle cores run
slower, but the norm is for every core to run at the full chip clock speed,
not a fraction of it.
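
One way to check this yourself on Linux is to look at the per-core "cpu MHz" lines in /proc/cpuinfo. The following is only a rough sketch: it assumes an x86 Linux box where that field is present, and the helper name parse_cpu_mhz is made up for illustration.

```python
import re


def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value reported for each logical CPU, in order."""
    return [
        float(mhz)
        for mhz in re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text,
                              flags=re.MULTILINE)
    ]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(parse_cpu_mhz(f.read()))
    except OSError:
        # Assumption: /proc/cpuinfo only exists on Linux-like systems.
        print("no /proc/cpuinfo on this system")
```

Under load, every core should report roughly the full chip clock; idle cores may report less because of the power management described above.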

So called SMT (aka Hyperthreading) is different, in that the secondary thread is
leveraging otherwise idle execution and load/store resources on a single core.

scott
 
Zootal said:
I don't get this - what can hyperthreading do that a good cpu scheduler
can't do?

Leverage otherwise idle resources in the core. A core typically has
two or more integer ALUs and one or more floating-point units (FPUs). These
allow superscalar behaviour, i.e. multiple instructions can be in flight
at the same time (multiple issue). However, for many instruction streams, not all
of the ALUs and FPUs are used, so a second 'logical' processor (the
hyperthread) can be made available to the operating system to take advantage
of those idle resources.

Note that even with HT/SMT, the operating system sees them as two
distinct cores, even though they aren't really stand-alone cores.

A four physical core processor with SMT will appear to the
operating system as 8 logical cores.
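
As a small illustration of the logical-core count, Python's os.cpu_count() reports the number of logical CPUs the operating system sees, so SMT threads are included. The helper names below are hypothetical and only there for the example.

```python
import os


def expected_logical(physical_cores, threads_per_core):
    """Logical CPUs the OS should see for a given core/SMT layout."""
    return physical_cores * threads_per_core


def logical_cpu_count():
    """Logical CPUs reported by the OS (falls back to 1 if unknown)."""
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("4 cores with 2-way SMT:", expected_logical(4, 2))
    print("this machine reports:", logical_cpu_count())
```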
If I have two virtual cores, I have to have two schedulers running
(one for each virtual cpu), each with their own set of queues and each with
50% cpu time. Is that more efficient than one single scheduler that has 100%
cpu time?

There is only one scheduler in a typical operating system. It schedules
across all logical cores and is typically NUMA and SMT aware in order to
make optimal scheduling decisions. NUMA awareness means scheduling
user threads/tasks on a CPU close to memory. SMT aware schedulers understand
that resources are shared and attempt to schedule related threads (i.e.
threads from the same process/job/task) on the secondary threads.

scott
 
Dave Feustel said:
The person who told me this is Miles R***, a person who sells computers
for a living. If the cores ran at the chip's nominal clock speed, a
four-core chip would perform 4 times faster than a single core chip at
the same clock speed, which they don't. And the power consumption would
be much higher. So I think Miles is correct.

No, this is not correct.

Either you misinterpreted "Miles R***", or he is quite ignorant about
his own product (or both).

-Miles
 
The person who told me this is Miles R***, a person who sells computers
for a living. If the cores ran at the chip's nominal clock speed, a
four-core chip would perform 4 times faster than a single core chip at
the same clock speed, which they don't. And the power consumption would
be much higher. So I think Miles is correct.

The four core chip can only run an application on all four cores if
it's threaded and at least 4 threads have work that can be run
simultaneously. Even in threaded applications this can't always happen
unless the threads are doing something that doesn't depend on others,
say converting a video file where each core can be given a section of
the file to convert.

I can see how he came to the conclusion though if he ran a single
threaded application and it ran four times slower than expected, since
it ran on only one core. Get him to run four of them at the same time
and they should complete in nearly the same time as one providing he
isn't running anything else at that time.

As for power consumption my dual core chip uses 45 watts and the quad
core version uses 95 watts. Taking into account the extra circuitry for
the 4 cores it's about right.
 
Jim Beard said:
Specifically with respect to X2 vs X4, the kernel scheduler will do a
fairly good job of using two CPUs, but rarely does well with more
than two unless the applications are specifically tailored for

Maybe with respect to Windows, but Linux schedulers are O(1) over
large numbers of cores.

Scheduler overhead is pretty much non-existent.

scott
 
Bill said:
You're entitled to your opinion, but as far as "The effective clock
speed of a single core in a multiple core chip is the chip clock
speed divided by the number of cores" is concerned Miles R*** is full
of shit*, and you can tell him I said so.

You need to get to Intel's or AMD's website and do some reading.

Bill
I have an X4 and each core runs at 2.5 GHz by default.
 
Dave Feustel said:
The person who told me this is Miles R***, a person who sells computers
for a living.

"Never trust someone trying to sell you something" comes to mind.
If the cores ran at the chip's nominal clock speed, a
four-core chip would perform 4 times faster than a single core chip at
the same clock speed, which they don't.

Depending on your task, a four-core CPU can deliver reasonably close to
four times the performance of a single-core CPU at the same clock speed.
Unfortunately, few tasks parallelize that well, and even less software
takes full advantage of modern CPUs.
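
How far short of 4x a task falls can be estimated with Amdahl's law, which nobody in this thread named but which fits the point. A minimal sketch, assuming you can guess what fraction of the work is able to run in parallel (the function name is just for illustration):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of a task can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 0.9, 1.0):
        print(f"{fraction:.0%} parallel on 4 cores: "
              f"{amdahl_speedup(fraction, 4):.2f}x")
```

A purely single-threaded program (0% parallel) gets 1x on four cores: no faster, but no slower either. Only a fully parallel task reaches 4x.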

That being said, aside from some shady marketing in the past advertising
dual CPU systems as double the clock speed of one CPU rather than
advertising the actual configuration, each core runs at the full clock
speed advertised.
 
Dave said:
So the 4 core chip cpu should run 4 independent identical tasks (compute
pi to 1 million digits) in essentially the same time that a single core
runs one instance of that task?

Yes
 
Dave Feustel said:
So the 4 core chip cpu should run 4 independent identical tasks (compute
pi to 1 million digits) in essentially the same time that a single core
runs one instance of that task?

More or less, yes. However, in the real world, not all tasks will scale
quite this well as many tasks require not only CPU resources, but also
other resources which may become starved before you load all four cores.

For something that can be done entirely on-chip, you'll get four times
the performance from all four cores of a quad-core 2.4 GHz CPU than from
a single-core version of the same 2.4 GHz CPU.
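
If you want to try this at home, here is a rough sketch that starts several copies of a single-threaded, CPU-bound job at once and times them. The workload is an arbitrary stand-in for the pi calculation, and run_copies is a made-up helper; actual timings will depend on your machine and on what else is running.

```python
import subprocess
import sys
import time

# Deliberately single-threaded, CPU-bound workload (hypothetical stand-in
# for "compute pi to 1 million digits").
WORKLOAD = "print(sum(i * i for i in range(2_000_000)))"


def run_copies(n):
    """Start n independent copies of WORKLOAD at once; return (seconds, outputs)."""
    start = time.perf_counter()
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKLOAD],
                         stdout=subprocess.PIPE, text=True)
        for _ in range(n)
    ]
    # Wait for every copy to finish and collect what it printed.
    outputs = [p.communicate()[0].strip() for p in procs]
    return time.perf_counter() - start, outputs


if __name__ == "__main__":
    for n in (1, 4):
        elapsed, _ = run_copies(n)
        print(f"{n} copies at once: {elapsed:.2f} s")
```

On a quad core with nothing else running, the four-copy time should be close to the one-copy time. On a single core it should be closer to four times as long.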
 
Zootal said:
Are you sure about that? Each cpu has its own set of runqueues. If I have 4
cpus, I have 4 sets of runqueues to manage, and 4 sets of runqueues to
search. The runqueue itself can be searched for the next entry in O(1)
time - this is where the O(1) comes from, because the amount of time it
takes to find the next task in the queue is constant and not dependent on
the number of tasks in the queue.

I would think that the default linux scheduler is O(n) over a large
number of cores, where n = the number of cores.

If you have a runqueue per core, then you simply schedule the next
entry in the queue for each core. O(1). Remember that code is shared by all
processors, and scheduling happens in-context - there is not a
scheduler "thread" or "job" or "task" per se.
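
To make the per-core runqueue argument concrete, here is a toy model, not kernel code. It assumes one plain FIFO queue per core and ignores priorities and load balancing between cores; the Core class is invented for the example.

```python
from collections import deque


class Core:
    """Toy model of one CPU core with its own run queue (hypothetical)."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.runqueue = deque()

    def enqueue(self, task):
        self.runqueue.append(task)  # O(1) append at the tail

    def pick_next(self):
        # O(1): take the head of this core's own queue, or None when idle.
        return self.runqueue.popleft() if self.runqueue else None


if __name__ == "__main__":
    cores = [Core(i) for i in range(4)]
    for n, task in enumerate(["a", "b", "c", "d", "e", "f"]):
        cores[n % 4].enqueue(task)
    # Each core makes its own decision from its own queue.
    for core in cores:
        print(core.core_id, core.pick_next())
```

The loop over cores at the end is only there to print the result. In this model each core calls pick_next() for itself, so the cost of one scheduling decision does not grow with the number of cores or the length of the queue.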

scott
 