December 3, 2008

Sam, I'm no hardware guru like you, but is it correct to compare FSB speeds between the i7 and C2D architectures? As far as I've read in articles at AnandTech and other places, the i7 architecture really doesn't have an FSB in the conventional sense. Is it your opinion that the i7 architecture is slow compared to C2D regarding memory bandwidth?
Ulf B

There's nothing revolutionary here at all. This is just a step closer to the holy grail of a "system on a chip." The 45nm process freed up real estate on the CPU die, and Intel has just integrated the Northbridge/FSB function onto the CPU die. It's still working on the same basic principles. Once we get to 32nm, then 28nm, there will be room for dozens of cores too . . . several will be dedicated to video processing. This will be the basis of Intel's Larrabee tech. Lots of huff about this; we'll see how much puff. Nvidia/ATI are trying to get CPU functionality onto their XPU ("GPGPU").

One of the Northbridge functions is to transfer data between the memory and the system bus (aka FSB and Quickpath). The P35's northbridge could only transfer a data bit between these 2 busses on (about) every 12th a/c cycle (alternating current cycle, i.e. one clock cycle). So, if an FSB was running at 400 MHz, every a/c cycle lasted ~3ns (2.5ns, rounded up). 12 x 3ns = 36ns. So, at 400 MHz (400 million cycles per second), the FSB could transfer a data bit to/from the memory's data bus at ~36ns intervals. This is called latency. Now add in the memory's internal processes (CAS, RAS, etc.) and the full latency gets to the 45-80ns range we see with the P35s. This is the number we see with the Everest memory test, etc.

All these factors combine to get a data bit moved between these 2 busses. Individually they are meaningless. Only their combined result - latency - matters.

Anyone can experiment with this latency number. Intel's setting uses (about) every 12th a/c cycle. This is controlled by a setting called "tRD."
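For anyone who wants to plug in their own numbers, here is a minimal Python sketch of the back-of-the-envelope arithmetic above. The function names are made up for illustration, and the memory-internal figure used in the example is only a guess, not a measured value. Note that one cycle at 400 MHz is exactly 2.5ns, so the unrounded bus figure comes out at 30ns rather than the ~36ns you get when rounding each cycle up to 3ns.

```python
def cycle_ns(clock_mhz: float) -> float:
    """Length of one bus clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def bus_latency_ns(clock_mhz: float, trd: int) -> float:
    """Bus-side delay if a transfer can happen only on every trd-th cycle."""
    return trd * cycle_ns(clock_mhz)


def total_latency_ns(clock_mhz: float, trd: int, memory_internal_ns: float) -> float:
    """Bus-side delay plus the memory's own internal delay (CAS, RAS, etc.).

    memory_internal_ns is an assumed lump-sum figure, not a measured one.
    """
    return bus_latency_ns(clock_mhz, trd) + memory_internal_ns


if __name__ == "__main__":
    # 400 MHz FSB with a transfer on every 12th cycle
    print(cycle_ns(400))                  # 2.5
    print(bus_latency_ns(400, 12))        # 30.0
    # 35 ns of memory-internal latency is purely an illustrative guess
    print(total_latency_ns(400, 12, 35))  # 65.0
```

Swap in your own FSB clock and internal-latency guess to see roughly where your board should land.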
This is the "timing, Read Delay" setting (tRD). In this case, the Read event Timing is Delayed to occur on every 12th FSB a/c cycle. Memset (google it and get it) has a "Performance Level" setting that can change tRD in real time. No need to reboot.

As it happens, Intel's "12th" setting is very generous. Most systems will run at a tRD of 7 (or a data bit transfer every 7th cycle of the FSB). At 3ns per cycle, this is a (5 x 3ns =) 15ns latency decrease. One can also trim off another couple of ns by tightening up timings. For instance, each step of CAS is worth 1 or 2ns. Also, each 100 MHz of increased RAM speed will reduce latency by ~1ns. Each 100 MHz of FSB speed will also decrease latency by ~1ns. Even without fancy RAM, most P35s can get well down into the 50s. Try it.

The believers saw it, but the pragmatists could only generously concede that the performance difference between 80ns and even 45ns was only subjective, at best. The system just wasn't able to take advantage of that increased data transfer potential.

Enter Nehalem. The P35's FSB was wasting 11 out of 12 a/c cycles. However, even though Nehalem's onboard system bus/controller is running a slower FSB (133 MHz), it is using more closely spaced Quickpath (i.e., FSB) cycles. Back-of-the-envelope arithmetic would suggest that at a Quickpath speed of 133 MHz, each a/c cycle is ~8ns long. Now add in the memory's internal latency of ~35ns. (Like this??: the old 400 MHz FSB latency of 36ns + X of memory internal latency = the old (normal) latency of ~70ns, so X = ~35ns. Dunno, but sounds 'bout right.) The Tech Report article is seeing ~45-55ns memory latencies. This might suggest that the Nehalem is using a tRD of between 2 and 4. ({8ns x 2} + 35ns = ~50?). They were able to slow down the FSB (oops, I mean Quickpath bus) and maintain the same system bus to/from memory bus performance by simply using more closely spaced a/c cycles. See, not so hard.
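The two estimates above (what a lower tRD buys you on a P35, and what tRD Nehalem might be running) can be sketched the same way. This is only the same rough model turned around, assuming latency = tRD x cycle time + a lump of memory-internal latency. The function names are hypothetical and the 35ns internal figure is the guess from this post, not a datasheet number.

```python
def cycle_ns(clock_mhz: float) -> float:
    """Length of one bus clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def trd_saving_ns(clock_mhz: float, old_trd: int, new_trd: int) -> float:
    """Latency saved by tightening tRD from old_trd to new_trd."""
    return (old_trd - new_trd) * cycle_ns(clock_mhz)


def implied_trd(measured_ns: float, memory_internal_ns: float, clock_mhz: float) -> float:
    """tRD implied by a measured latency, under the rough model
    measured = tRD * cycle + memory_internal (memory_internal is assumed)."""
    return (measured_ns - memory_internal_ns) / cycle_ns(clock_mhz)


if __name__ == "__main__":
    # P35 at 400 MHz, tRD 12 -> 7; exact cycle time is 2.5 ns, so this
    # gives 12.5 ns rather than the 15 ns obtained with 3 ns rounding
    print(trd_saving_ns(400, 12, 7))  # 12.5

    # Nehalem guess: 133 MHz clock, ~35 ns assumed internal latency,
    # ~50 ns measured; comes out very close to 2
    print(round(implied_trd(50, 35, 133), 2))
```

Feed in the low and high ends of the measured range to see how sensitive the implied tRD is to the internal-latency guess.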
But also observe that this is a generous estimate. They are not using every QP cycle. Likely not even every second one. A tRD of 3-5 is more likely. See what needs to happen at a tRD of 3+? It really will take that 1333/1600 speed RAM (i.e., its decreased additive latencies) - just to maintain - the overall system latencies we had with the old P35 boards. It's all a neat, neat trick . . . but it doesn't help us a whit.

So why bother? Remember that chipsets are $20-50 per board. The southbridge is still around, but the northbridge is gone. Did the boards get cheaper? You bet! Cheaper to manufacture. I guess the question should have been: did the boards' price go down? Nope. That $$$ drops straight into Intel's and their mobo partners' pockets as pure profit. That's why. Ya gotta respect these guys. Their genius is - not - just technical.