90 nm Athlon 64 die size

  • Thread starter: Grumble

Grumble

This chart on AMD's web site made me curious:
http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_9485_9487^10248,00.html

Process Technology:     90 or 130 nm, SOI technology
Packaging:              754-pin or 939-pin organic micro PGA
Die Size:               144 to 193 mm^2
Number of Transistors:  68.5 to 105.9 million
                        (depending on cache size)

Clawhammer
1 MB L2 cache
105.9 million transistors
193 mm^2 @ 130 nm

OK, so the "193 mm^2" and "105.9 Mtransistors" are accounted for.

**Time for some guesstimates**

512 KB SRAM =~ 25-30 Mtransistors.

Thus, the Newcastle core =~ 76-81 Mtransistors.

Assume a slightly worse transistor density than Clawhammer's, because
we removed dense areas (cache)... say 0.54 Mtransistors/mm^2.

Die size =~ 141-150 mm^2. This is probably where the chart's
"144 mm^2" comes from.

<guess>
Newcastle
512 KB L2 cache
78 million transistors
144 mm^2 @ 130 nm
</guess>
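The guesstimate above boils down to two steps: subtract the removed cache from Clawhammer's count, then divide by an assumed density. A minimal Python sketch, using only the post's own guessed inputs (the function name `newcastle_estimate` is illustrative, not anything published by AMD):

```python
# Sketch of the Newcastle estimate. All inputs are the post's own guesses:
# Clawhammer = 105.9 Mtransistors on 193 mm^2 with 1 MB of L2,
# 512 KB of SRAM ~= 25-30 Mtransistors, and an ASSUMED density of
# 0.54 Mtransistors/mm^2 for the smaller core.
CLAWHAMMER_MTR = 105.9
CLAWHAMMER_MM2 = 193.0
SRAM_512K_MTR = (25.0, 30.0)   # guessed cost of the removed 512 KB
ASSUMED_DENSITY = 0.54         # Mtransistors per mm^2 (assumption)


def newcastle_estimate(sram_mtr, density=ASSUMED_DENSITY):
    """Return (Mtransistors, die area in mm^2) for one SRAM guess."""
    mtr = CLAWHAMMER_MTR - sram_mtr
    return mtr, mtr / density


print(f"Clawhammer density: {CLAWHAMMER_MTR / CLAWHAMMER_MM2:.3f} Mtr/mm^2")
for sram in SRAM_512K_MTR:
    mtr, area = newcastle_estimate(sram)
    print(f"remove {sram:.0f} M -> {mtr:.1f} Mtr, ~{area:.0f} mm^2")

# Working the other way: the chart's 144 mm^2 at the assumed density.
print(f"144 mm^2 -> ~{144 * ASSUMED_DENSITY:.0f} Mtr")
```

Both ends of the 141-150 mm^2 range fall out, and the chart's 144 mm^2 maps back to roughly 78 Mtransistors, which is where the guess above comes from.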

I still have several questions.

What's the die size for Winchester (Newcastle's 90 nm cousin)?
endian.net reports 83 mm^2.

Which core has only 68.5 Mtransistors? 37.4 Mtransistors seems like a lot
for 512 KB of L2 cache, no?

Is Winchester just a die shrink of Newcastle? I heard AMD might
implement SSE3 and minor micro-optimizations (LEA).

I seem to recall AMD once said 100 mm^2 was their sweet spot. If
they can manufacture enough Winchester cores, we might see sub-$100
Athlons again, or am I crazy here?
 
Grumble said:
No one wants to comment? :-)
 
It's at least in the right ballpark. AMD says 84 mm^2.


I can't sort these questions into the context. Winchester and Newcastle would
have pretty much the same transistor count, I would think.

Winchester has a couple of minor optimizations, but no SSE3 implementation in
its current stepping.

Sure. It's just a question of when, not if. :)
K.
 
Sub-$100 Athlon 64 chips? I doubt that. IMO we might see the K8-based Sempron (with the 64-bit
features disabled) priced just under $100
in early '05 after it moves to 90 nm. AMD has no reason to lower the prices on
its Athlon 64 chips by much (especially the lower-priced ones) since Intel doesn't
have any products that compete adequately against them. As 64-bit computing
becomes more popular, I expect AMD to gain much more market share.
 
Klaus said:
Sure. It's just a question of when, not if. :)

Probably not until after Intel starts selling its 64-bit chips at low prices.
Don't hold your breath for that. At around $150 now, the Athlon 64 3000+
is a great bargain.
 
Klaus said:
I can't sort these questions into the context. Winchester and Newcastle
would have pretty much the same transistor count, I would think.

Allow me to rephrase.

Clawhammer had 106 Mtransistors and 1 MB L2 cache. In their chart,
AMD claims that the transistor count for one of their Athlon 64
cores is 68.5 Mtransistors. I'll assume this new core sports only
512 KB L2 cache.

Thus AMD removed 37.4 Mtransistors between Clawhammer and this
new core. I am not a hardware engineer, but I don't think it
takes 37.4 Mtransistors to implement 512 KB of cache.

If indeed Newcastle has ~78 Mtransistors, and it is Winchester that
has ~68.5 Mtransistors, then it would appear that AMD has optimized
the K8's layout at 90 nm in order to minimize the transistor count.
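The argument reduces to two subtractions. A small Python sketch with the numbers quoted in this thread; which core actually has 68.5 Mtransistors is the open question here, so treating it as Winchester is only the post's hypothesis:

```python
# Numbers from the thread. ASSUMPTION: the 68.5 M part is a different core
# from Newcastle (the post's hypothesis is Winchester).
CLAWHAMMER_MTR = 105.9               # 1 MB L2
SMALLEST_CORE_MTR = 68.5             # low end of AMD's chart
NEWCASTLE_GUESS_MTR = (76.0, 81.0)   # range from the first post

# Transistors removed going from Clawhammer to the 68.5 M core.
removed = CLAWHAMMER_MTR - SMALLEST_CORE_MTR
print(f"removed: {removed:.1f} M")

# If Newcastle really is ~76-81 M, this is the gap that a 90 nm
# re-layout would have to explain on its own.
low, high = (n - SMALLEST_CORE_MTR for n in NEWCASTLE_GUESS_MTR)
print(f"unexplained gap: {low:.1f}-{high:.1f} M")
```

A gap of roughly 7.5 to 12.5 Mtransistors is what "optimized the layout" would have to cover under these assumptions.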

Did I make any sense?

Klaus said:
Winchester has a couple of minor optimizations, but no SSE3
implementation in its current stepping.

I had seen a mention of SSE3 in a slide from Kevin McGrath's talk.
http://techreport.com/onearticle.x/6363
http://www.geek.com/news/geeknews/2004Mar/bch20040303024101.htm

New features
o Power reductions
o Speed improvements
o Lower power in Halt, Stopclock states
o SSE3 (Prescott New Instructions)
o Enhanced data prefetch
- Negative stride, improved page crossing
o On-Die Thermal Throttling
o Additional write combining buffers
o Convert LEA -> ADD
o DRAM controller improvements
- DDR400, managing open pages, 2T

Errr, I thought the Athlon 64's memory controller already supports
unbuffered DDR400...
 
Grumble said:
Klaus Fehrle wrote:
Allow me to rephrase.
Clawhammer had 106 Mtransistors and 1 MB L2 cache. In their chart,
AMD claims that the transistor count for one of their Athlon 64
cores is 68.5 Mtransistors. I'll assume this new core sports only
512 KB L2 cache.
Thus AMD removed 37.4 Mtransistors between Clawhammer and this
new core. I am not a hardware engineer, but I don't think it
takes 37.4 Mtransistors to implement 512 KB of cache.

512 KB is 512 x 1024 = 524288 bytes.

There are 8 bits in a byte, or 9 bits per byte with ECC, but we also
need some transistors for the L2 tag, decoders, etc. So let's use
~12 bits per byte.

524288 x 12 = 6291456 bits of storage.

The classical SRAM cell uses 6 transistors per bit.

6291456 x 6 = 37748736.

So the estimate is about 37.7M transistors. That means the overhead
is less than the 3 bits per byte in the estimate, but pretty close.
(Overhead also includes bits used in cache for redundancy)
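The same back-of-the-envelope estimate in a few lines of Python. The ~12 bits per byte and 6 transistors per bit are the assumptions stated above; the second half runs the calculation backwards to get the effective bits per byte implied by the observed 37.4 M difference:

```python
# ASSUMPTIONS (from the post): ~12 bits of storage per data byte
# (8 data + 1 ECC + ~3 for tags, decoders, redundancy) and a classic
# 6-transistor (6T) SRAM cell per bit.
CACHE_BYTES = 512 * 1024
BITS_PER_BYTE = 12
TRANSISTORS_PER_BIT = 6

bits = CACHE_BYTES * BITS_PER_BYTE
estimate = bits * TRANSISTORS_PER_BIT
print(bits, estimate)  # 6291456 37748736

# Invert the model: effective 6T bits per data byte implied by the
# observed 37.4 M gap between Clawhammer and the 68.5 M core.
observed = 37.4e6
effective_bits = observed / (CACHE_BYTES * TRANSISTORS_PER_BIT)
print(f"{effective_bits:.2f} bits per byte")  # 11.89
```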
 
David said:
512 KB is 512 X 1024 = 524288 Bytes.

There are 8 bits in a byte, 9 bits per byte for ECC, but we also
need some transistors for the L2 tag, decoders, etc. So let's use
~12 bits per byte.

I'll give you the extra ECC byte for every 64-bit word.

However, I don't think the transistor count for the structures
you mention increases linearly with cache size.
524288 x 12 = 6291456 bits of storage.

The classical SRAM cell uses 6 transistors per bit.

6291456 x 6 = 37748736.

So the estimate is about 37.7M transistors. That means the overhead
is less than the 3 bits per byte in the estimate, but pretty close.
(Overhead also includes bits used in cache for redundancy)

Your computation would not explain the transistor count for Banias
or Madison 6M.

Banias - 77 Mtransistors - 1 MB L2 cache
Madison - 410 Mtransistors - 6 MB L3 cache

When Intel switched from Willamette to Northwood, they simply added
256 KB non-ECC (??) L2 RAM. The transistor count went up by only 13
Mtransistors. Another data point: Barton has only ~16 more
Mtransistors than Thoroughbred B.
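These counterexamples can be checked by running the same 6T-per-bit model in reverse on each data point. A Python sketch; the transistor counts are the rounded figures quoted above, `bits_per_byte` is a hypothetical helper, and the model ignores differences in cache structure (ports, associativity, power gating) between these designs:

```python
# Same 6T-per-bit model as the SRAM estimate, applied in reverse.
# Transistor counts are the rounded figures quoted in the thread.
TRANSISTORS_PER_BIT = 6
KB = 1024
MB = 1024 * KB


def bits_per_byte(extra_transistors, extra_bytes):
    """Effective 6T bits per data byte implied by a transistor delta."""
    return extra_transistors / (extra_bytes * TRANSISTORS_PER_BIT)


# Cache increments between otherwise similar cores.
increments = {
    "Willamette -> Northwood (+256 KB)": (13e6, 256 * KB),
    "Thoroughbred B -> Barton (+256 KB)": (16e6, 256 * KB),
}
for name, (dt, db) in increments.items():
    print(f"{name}: {bits_per_byte(dt, db):.1f} bits/byte")

# Banias: at 12 bits/byte, 1 MB of 6T SRAM alone would nearly use up 77 M.
banias_cache = 1 * MB * 12 * TRANSISTORS_PER_BIT
print(f"Banias 1 MB at 12 bits/byte: {banias_cache / 1e6:.1f} M of 77 M")
```

Both increments come out well under ~11.9 bits per byte (about 8.3 and 10.2), and at 12 bits per byte Banias's 1 MB alone would account for about 75.5 of its 77 Mtransistors, which is the mismatch being pointed out.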
 
Grumble said:
David Wang wrote:
I'll give you the extra ECC byte for every 64-bit word.
However, I don't think the transistor count for the structures
you mention increases linearly with cache size.

It doesn't matter. As long as it works out to ~11.9 bits per byte
(ECC, tag, decoder and redundancy bits included) at the 512 KB and
1 MB sizes, whether the transistor count scales perfectly linearly
with cache size is not relevant.
Your computation would not explain the transistor count for Banias
or Madison 6M.
Banias - 77 Mtransistors - 1 MB L2 cache
Madison - 410 Mtransistors - 6 MB L3 cache

Direct comparisons of these designs may not be possible because
the cache structures are different, e.g. the Madison count
includes the L2 cache, which, IIRC, has 5 ports, and that port count
would drive up the transistor count per cell.
The Banias L2 has also been designed with extra circuitry to enable
separate array power up/down.

These issues are not present in the Opteron 512K/1M L2 computation.
The cache structure there is the same.
When Intel switched from Willamette to Northwood, they simply added
256 KB non-ECC (??) L2 RAM. The transistor count went up by only 13
Mtransistors. Another data point: Barton has only ~16 more
Mtransistors than Thoroughbred B.

I don't have time to look them up and recompute all the numbers.

The basic numbers are sound, but to get to the numbers you want,
you have to start looking at cache structures, associativity,
ports, etc. Since they're all different for all these processors,
the fact that they are different on a per bit basis should
surprise no one.
 
David said:
The basic numbers are sound, but to get to the numbers you want,
you have to start looking at cache structures, associativity,
ports, etc. Since they're all different for all these processors,
the fact that they are different on a per bit basis should
surprise no one.

I see.

[ Are you RWT's David T. Wang? ]

Do you know if the K7's and the K8's L2 caches are similar? (By K7,
I mean the latest models i.e. Thoroughbred and Barton.) Is the K7's
L2 cache ECC-protected? (I don't think so.)

Here are some numbers to fuel the discussion.
http://anandtech.com/cpuchipsets/showdoc.aspx?i=1685&p=3
http://endian.net/details_compare.asp?ItemNo=2488&ItemNo=2508&ItemNo=3677&ItemNo=2985

===== K7 =====

Palomino
256 KB L2 cache
37.5 Mtransistors
128 mm^2 @ 180 nm

Thoroughbred A
256 KB L2 cache
37.2 Mtransistors
80 mm^2 @ 130 nm

Thoroughbred B
256 KB L2 cache
37.6 Mtransistors
84 mm^2 @ 130 nm

Barton
512 KB L2 cache
53.9 Mtransistors
115 mm^2 @ 130 nm

===== K8 =====

Clawhammer
1 MB L2 cache
105.9 Mtransistors
193 mm^2 @ 130 nm
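For comparison, the transistor density of each core listed above, plus the Barton vs. Thoroughbred B delta. A short Python sketch; the figures are exactly those posted (from the linked pages), not independently checked:

```python
# Figures as posted: (name, L2 KB, Mtransistors, die mm^2, process nm).
cores = [
    ("Palomino",        256,  37.5, 128, 180),
    ("Thoroughbred A",  256,  37.2,  80, 130),
    ("Thoroughbred B",  256,  37.6,  84, 130),
    ("Barton",          512,  53.9, 115, 130),
    ("Clawhammer",     1024, 105.9, 193, 130),
]

# Transistor density in Mtransistors per mm^2.
density = {name: mtr / mm2 for name, _, mtr, mm2, _ in cores}
for name, l2_kb, mtr, mm2, nm in cores:
    print(f"{name:15s} {l2_kb:5d} KB  {density[name]:.3f} Mtr/mm^2 @ {nm} nm")

# Barton vs. Thoroughbred B: same 130 nm process, +256 KB of L2.
_, _, tb_mtr, tb_mm2, _ = cores[2]
_, _, ba_mtr, ba_mm2, _ = cores[3]
print(f"Barton - Thoroughbred B: +{ba_mtr - tb_mtr:.1f} Mtr, +{ba_mm2 - tb_mm2} mm^2")
```

On these numbers Clawhammer is the densest (about 0.55 Mtransistors/mm^2 vs. roughly 0.45-0.47 for the 130 nm K7s), and Barton's extra 256 KB cost about 16 Mtransistors and 31 mm^2.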
 