Tony said:
No, I was saying the exact opposite.
256-bit wide L2 cache port, 64-bit wide L3
You just said you meant the opposite of what I thought you said, but
then provided numbers to back up what I thought you said. I find the
Xeon to be very strange if it has 256-bit width from L2 and 64-bit width
from L3. That's 32 bytes vs 8 bytes.
Itanium 2 returns data from the L2 256 bits at a time to either the L1D
or the L1I. It fills the L2 256 bits at a time.
P4EE/Xeon chips have something like a 10 cycle L2 latency and about a
40 cycle L3 latency. With Itanium my guess is that the spread is even
wider (i.e. the very small 256K of L2 cache in the Itanium 2 probably has
very low latency while the huge 3-9MB of L3 cache probably has rather
high latency).
Itanium 2 latency is 5 cycles from L2 and 12 cycles from L3. Much
better than Xeon. Xeon has a ratio of 4:1 while Itanium 2 has a ratio
of 2.4:1. Those numbers are for McKinley (the 1GHz version). I believe
Madison (the 1.5GHz version) raised the L3 latency by 2 cycles, giving 5
and 14 (2.8:1). That works out to 3.33ns and 9.33ns for the Itanium 2
at 1.5GHz. I don't know which Xeon speed your numbers are for, so I'll
assume the 3.0GHz Xeon MP with 4M cache, which gives 3.33ns and 13.33ns.
So the L2 caches have the same access time, but
the Itanium 2 is faster to reach its larger cache. I'm curious to see
what the timings will be on the Montecito, which ups the L3 ante to 12MB.
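
For anyone who wants to check the arithmetic, here is a rough sketch of the cycles-to-nanoseconds conversion used above. The cycle counts and clocks are the ones quoted in this thread (the 3.0GHz Xeon clock is my assumption, as noted), and the function and table names are made up for illustration only.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in core cycles to nanoseconds at a given clock.

    One cycle at f GHz lasts 1/f ns, so latency_ns = cycles / f.
    """
    return cycles / clock_ghz


# (name, clock in GHz, L2 latency in cycles, L3 latency in cycles)
# Figures are the ones quoted in the thread; the Xeon clock is an assumption.
CHIPS = [
    ("Itanium 2 (Madison)", 1.5, 5, 14),
    ("Xeon MP (assumed 3.0GHz)", 3.0, 10, 40),
]


def summarize(chips=CHIPS):
    """Return (name, L2 ns, L3 ns, L3:L2 cycle ratio) for each chip."""
    rows = []
    for name, ghz, l2_cycles, l3_cycles in chips:
        rows.append((
            name,
            round(cycles_to_ns(l2_cycles, ghz), 2),
            round(cycles_to_ns(l3_cycles, ghz), 2),
            round(l3_cycles / l2_cycles, 1),
        ))
    return rows


if __name__ == "__main__":
    for name, l2_ns, l3_ns, ratio in summarize():
        print(f"{name}: L2 {l2_ns}ns, L3 {l3_ns}ns, ratio {ratio}:1")
```

The cycle ratio only compares L3 to L2 within one chip; the nanosecond figures are what let you compare across chips running at different clocks.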
Alex