Broadwell: Intel Core i7-5775C And i5-5675C Review



HBM shouldn't be on the chip itself but sit alongside it as DRAM stacks accessed via an interposer layer. It's a single package, not a single die, meaning binning and QA are already done before the HBM is added, so no rerouting of defective memory blocks is needed. This is what AMD did with Fiji and the Fury cards. In an APU environment you wouldn't want 4GB of 4096-bit 500MHz HBM memory; you would want 1GB of 1024-bit 750~1000MHz (HBM2) memory, or 2GB of 2048-bit memory. It's fairly inexpensive, $20~30 give or take depending on scaling and mass production. This type of setup would dominate the sub-$170~200 USD high-volume budget sector and future integrated devices (consoles). Eventually Intel will be using HBM, which is when things are going to get real. These chips have just 128MB of eDRAM acting as a super fast cache; imagine what they could do with 1~2GB of high-speed graphics memory.
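Back-of-envelope numbers for those configs (a minimal Python sketch; DDR signaling assumed, so transfers per second = 2x clock - illustrative figures only):

# Bandwidth of the suggested APU-oriented HBM configurations.
def bw_gbs(bus_bits, clock_mhz):
    # bits * GT/s / 8 = GB/s; DDR means 2 transfers per clock
    return bus_bits * (2 * clock_mhz) / 1000 / 8

print(bw_gbs(1024, 500))    # one 1024-bit stack @ 500MHz  -> 128.0 GB/s
print(bw_gbs(1024, 1000))   # one 1024-bit stack @ 1GHz    -> 256.0 GB/s
print(bw_gbs(2048, 750))    # two stacks (2048-bit) @ 750MHz -> 384.0 GB/s

Even the single-stack case dwarfs the ~25GB/s of a 128-bit DDR3-1600 setup, which is the whole point for budget APUs.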
 

InvalidError

Titan
Moderator

Intel, like most other compute-oriented firms, is backing Hybrid Memory Cube, which favors memory access concurrency at the expense of power and density, while AMD and Nvidia back High Bandwidth Memory since GPUs are mostly unaffected by memory access latency. Intel will likely end up using HMC instead of HBM in its chips - at a glance, the CrystalWell eDRAM architecture seems to demonstrate all the fundamental HMC building blocks except die stacking.

As far as eDRAM as "super fast cache" goes, it is much slower than the CPU's built-in caches in both access time and bandwidth, and even HMC won't beat those. With 1-2GB of HMC "cache" to manage, the tag RAM that keeps track of which memory rows are cached where would need to be much bigger, and correspondingly slower, than that of the 64-128MB eDRAM. I doubt it would be practical to manage such a large on-package data store (a significant chunk of total system memory) as a conventional cache, so my bet is that it will end up OS-managed as some form of NUMA memory space, where the OS can decide between using the on-package memory as cache or as additional system RAM depending on system memory load.
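A rough tag-RAM sizing sketch in Python shows how fast that storage balloons. Everything here is an assumption for illustration (direct-mapped organization, 64-byte lines, 39-bit physical addresses, 2 state bits), not actual CrystalWell internals:

import math

# Back-of-envelope tag-RAM size for a DRAM-backed cache.
def tag_ram_bytes(cache_bytes, line_bytes=64, phys_bits=39, state_bits=2):
    lines = cache_bytes // line_bytes
    offset_bits = int(math.log2(line_bytes))   # bits selecting byte in line
    index_bits = int(math.log2(lines))         # bits selecting the cache line
    tag_bits = phys_bits - offset_bits - index_bits
    return lines * (tag_bits + state_bits) // 8

for size in (128 * 2**20, 2 * 2**30):          # 128MB eDRAM vs 2GB HMC "cache"
    print(f"{size >> 20:5d} MB cache -> {tag_ram_bytes(size) / 2**20:.1f} MB of tag RAM")

That works out to roughly 3.5MB of tag RAM for the 128MB case versus about 40MB for 2GB - tens of megabytes of fast SRAM just to track the cache, which is why an OS-managed NUMA arrangement looks more plausible.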
 


I was referring to the 128MB eDRAM as a graphics cache for the iGPU accessing system memory. 128MB as a system cache would be almost pointless, much less 1~2GB worth. The effect of cache on performance shrinks with every incremental increase: going from 32KB of on-die cache to 64KB produces less of a performance increase than going from 0 to 32KB, 64KB to 128KB produces even less, 128KB to 256KB even less, and so forth, until we're adding megabytes for 2~4% gains. It just gets harder and harder to keep the cache utilized and relevant to performance the larger you go, as the toy model below illustrates.

Graphics, on the other hand, process such a vast amount of data that you can find large performance increases by going with big caches, as Intel's 128MB cache demonstrates. Technically that L4 cache also applies to system memory access, but there it hardly makes any difference at all; when used for graphics access, the increase is extremely large.
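Here is that diminishing-returns effect as a toy Python model, using the old rule-of-thumb power law (miss rate roughly proportional to 1/sqrt(size)). The 10% starting miss rate is an assumption; real workloads vary widely:

# Diminishing returns from doubling cache size under the sqrt rule of thumb.
base_miss, base_kb = 0.10, 32     # assumed 10% miss rate at 32KB
prev = None
for size_kb in (32, 64, 128, 256, 512, 1024):
    miss = base_miss * (base_kb / size_kb) ** 0.5
    delta = "" if prev is None else f"  (improvement: {prev - miss:.4f})"
    print(f"{size_kb:5d} KB -> miss rate {miss:.4f}{delta}")
    prev = miss

Each doubling buys a smaller absolute improvement than the last.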

I wouldn't expect to see HMC anywhere near an enthusiast computer; it's very expensive for what it provides. It's a technology aimed at supercomputers and other high-end processing environments where latency and concurrency across extremely large memory pools is a big issue. There would be zero benefit for your standard desktop user, or even the power user / gamer, over cheaper technology like HBM. Also, HMC is non-JEDEC and runs the exact same path that Rambus memory did: licensing fees, royalties and competitive restrictions will keep it out of the consumer market. HBM, on the other hand, is JEDEC and thus available to anyone who wants to make / use / sell it. That will ensure the market gets competitive prices and compatible components, which is what drives the consumer market. It will become the standard for video graphics memory in the near future, and I see it becoming the standard for all iGPU devices as well. The funny thing is that the HSA standard already defines ways in which you can manage and utilize that separate memory pool. In the medium term I see programs evolving to use iGPUs as co-processors along with their accompanying high-speed memory. AMD was talking about that for a while; they just had too many other issues to really push it, but I see it happening eventually.
 
^ Didn't know that HMC was a non-JEDEC standard. That'll help HBM become even more mainstream.

On another note, an APU with 4-8GB (minimum) of HBM could eventually eliminate off-package memory like system RAM. A ULP SoC with HBM can perform a lot better than it can with LPDDR4.
 

yankeeDDL

Distinguished


Why don't we comment on Skylake when it comes out?
I was referring to comments like this: "The real star of the show is Iris Pro Graphics 6200, which absolutely destroys anything Intel previously offered for its LGA 1150 interface (not to mention AMD’s best effort to make APUs look good)"

I personally would still recommend the AMD A10-7850K over the i7-5775C for a home user. $350 is a boatload of money: you cannot possibly compare *any* AMD product to the 5775C (or to the i5-5675C, for that matter) on price.
The i7-5775C is an impressive piece of engineering; however, at that price, it is useless. And buying it to use as a CPU while ignoring the iGPU, which is a massive part of its cost, seems unreasonable.

For casual gamers, home users, and even most office work, AMD products offer a better price/performance ratio, and reviews (and comments) like those in this article are, in my view, misleading.

I own a Lenovo Z50 with an A10-7300. At $450 it was a fantastic buy (it could use a better screen and a larger battery...), and it runs most modern games at decent settings, and not-so-new games maxed out. That's the price of the Core i7-5775C alone!
Unless you have very, very, very specific needs for energy efficiency, or extremely CPU-limited tasks, AMD is already a better deal today. That's my view, at least.
 

yankeeDDL

Distinguished


I don't think that you, or anyone else, know the cost of HBM.
Cards with Fury are in the same price range as the previous generation, and Fury is expensive also because the silicon is HUGE.
Put only 1GB of HBM on a reasonably sized die and you *could* get a seriously fast APU at a fraction of the 5775C's price...

That said, time will tell ...
 

logainofhades

Titan
Moderator


HBM, I think, would be useful in APU-based laptops as well.
 

InvalidError

Titan
Moderator

I think HMC has plenty of potential to help with desktop computing: normal memory accesses cost 60+ cycles, while HMC would likely be under 40 cycles like the eDRAM is, which could help quite a bit with reducing the impact of memory access latency on pipeline stalls - especially if NUMA-aware applications can ask the OS to allocate their scratchpad memory on HMC when it fits. I imagine many game engines would benefit from having their most performance-critical data structures hosted on HMC. HMC is also fully capable of acting as IGP memory. And since HMC and HBM construction is very similar, the manufacturing cost difference shouldn't be very large.
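A crude average-memory-access-time comparison built on the latency figures above; the 4-cycle hit time and the miss rates are assumptions, purely for illustration:

# AMAT = hit_time + miss_rate * miss_penalty, with the post's rough
# figures: ~60 cycles out to DRAM vs ~40 cycles to HMC/eDRAM.
def amat(hit_cycles, miss_rate, penalty_cycles):
    return hit_cycles + miss_rate * penalty_cycles

for miss_rate in (0.01, 0.05, 0.10):   # assumed last-level-cache miss rates
    dram = amat(4, miss_rate, 60)      # 4-cycle cache hit assumed
    hmc  = amat(4, miss_rate, 40)
    print(f"LLC miss rate {miss_rate:>4.0%}: DRAM {dram:5.2f} cyc/access, HMC {hmc:5.2f}")

At a 10% miss rate that's 10 vs 8 cycles per access on average - modest, but it compounds on every pipeline stall.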

Who backs HBM? AMD, Nvidia, Hynix.

Who backs HMC? Micron, Intel, Samsung, IBM, ARM, Xilinx and over a hundred others:
http://www.hybridmemorycube.org/about.html
While the bulk of them are research and HPC-oriented, there are also many networking, telecom, video processing, test equipment and other companies in there like Brocade, Juniper, Miranda, Huawei, LeCroy, etc.

HBM might be a JEDEC standard but HMC is more versatile.
 


Isn't Intel's L1 cache pushing 1TB/s now?



Of course I don't know the actual cost. I can say a few things though.

1. HBM is new and the yields are low. Over time it will get better though.

2. The Fury X is a big chip, but it is on a very mature 28nm process, so yields for it are going to be vastly better than for a new tech.

3. Multiple sites have reported that HBM is currently expensive and in low supply, so cost would be higher.

It will get better and eventually we will probably see solutions using HBM or HMC to give performance gains for APUs.



Of course. Anything that cuts power while increasing performance will be. It will depend on the added cost it brings, though. We'll probably have to wait until it becomes more mature and abundant before AMD starts putting it on APUs. Cost vs. gains matters the most.
 

InvalidError

Titan
Moderator

256B/cycle for Haswell x 4GHz = 1024GB/s, yup.


The reason why HMC and HBM split the high-speed logic/IO from the DRAM arrays in the stack is to allow building the DRAM dies on standard DRAM manufacturing processes instead of using a fancy eDRAM process. As such, yields for the DRAM dies should be quite similar to regular DRAM.

I think the main hurdle would be economies of scale: with most ASIC designers looking at HMC, HBM might have trouble gaining traction beyond AMD and Nvidia. If nobody else adopts HBM, there won't be much motivation for other DRAM manufacturers to offer HBM dies/stacks and Hynix will have the whole HBM market for itself.
 


I think you missed the cost argument. The enthusiast space is dominated by costs, otherwise you'd all be buying $5,000 USD graphics cards and $2,000 USD CPUs. DDR memory is dirt cheap, HBM is slightly more expensive (about the same cost as GDDR5), and HMC is significantly more expensive than either of the others. It has lower latency but also lower bandwidth than HBM; its big claim to fame is its ability to scale to handle extremely large I/O processing loads, something that doesn't exist in the enthusiast space. We've had this discussion before: HPC workloads do not exist in the consumer space, and thus products that make sense in HPC do not make sense at the consumer level. An HMC-based memory subsystem would produce near-zero real-world difference over a DDR4 subsystem, and an HBM subsystem would only make sense if there was an iGPU present, and only as an add-on, not as main memory. Though the argument about AIO and fixed-function budget systems using a single HBM memory pool is pretty interesting.

I cannot stress enough how important costing is to pretty much everything; there are no unlimited budgets. Expensive memory subsystems like HMC will not see much, if any, success in the enthusiast market for reasons similar to why RAMBUS didn't: proprietary implementation and poor cost/benefit results.
 

InvalidError

Titan
Moderator

While HMC has the potential to be used in expensive and complex memory configurations, it is also designed to scale down to very small setups: SoCs and other controllers which require only a moderate amount of fast memory may use a single half-width (x8) port to interface with their HMC stack, and in the HMC2 spec even this smallest-footprint interface already provides bandwidth similar to a 128-bit-wide 3GT/s DDR4 interface while using 1/5th as many bumps under the ASIC die: 30GB/s symmetrical (60GB/s aggregate) for single-port x8 HMC2 vs 48GB/s half-duplex for DDR4.
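A quick sanity check of those figures in Python, using only the lane and transfer rates quoted above:

# HMC2 half-width port vs a 128-bit DDR4 interface.
lanes, lane_gbps = 8, 30                     # x8 port, 30Gbps per lane (HMC2)
hmc_one_way = lanes * lane_gbps / 8          # GB/s per direction
print(f"HMC2 x8: {hmc_one_way:.0f} GB/s each way, {2 * hmc_one_way:.0f} GB/s aggregate")

bus_bits, gtps = 128, 3                      # 128-bit DDR4 @ 3GT/s
print(f"DDR4: {bus_bits * gtps / 8:.0f} GB/s half-duplex")

Both results match the 30/60 vs 48 GB/s figures in the post.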

Individual HMC stacks themselves are unlikely to cost more than a few extra bucks compared to HBM, due only to the more complex logic on the stack controller, since everything else is fundamentally identical between the two as far as manufacturing is concerned. Where the "very expensive" configurations come from is when you start tacking HMC dies onto chips/packages, and multiple packages onto cards, with additional memory routing and distributed computing ICs - but those are very expensive custom applications of HMC, not HMC's own cost.

Another advantage of HMC over HBM is that since HMC uses high-speed serial links instead of a massive parallel bus, HMC can omit the silicon interposer HBM requires to route its massive bus. This likely offsets the cost of the more complex HMC stack controller for at least one stack.

Not sure where you got the idea that HMC provides less bandwidth than HBM does. HMC1 interfaces can be four ports wide with 16 symmetrical lanes each, running at up to 15Gbps, for a total of 240GB/s peak aggregate bandwidth per stack (120GB/s symmetrical), while HBM1 is 128GB/s half-duplex per stack. In their second-generation iterations both interfaces double maximum bit rates, so their relative standings remain unchanged, with HMC having the potential to be nearly twice as fast in 50-50 read-write or read-modify-write scenarios.
 
IE,
You really need to look up some specifications. HBM has higher total bandwidth than HMC precisely because of those massively wide buses. At 1024 bits per stack you don't need to run it very fast to get massive bandwidth. In the realm of bandwidth there is absolutely zero room for debate: HBM wins (for the cost). It's 256GB/s per stack, not 128 - someone forgot that it's spec'd for 1GHz with DDR signaling and probably just copied AMD's first implementation of it on Fury.

HMC is a lot more expensive: each stack must include a dedicated memory controller that talks via SerDes to other memory controllers. In this way HMC is more like a networked memory architecture that can be scaled upward into HPC-level workloads. On an enthusiast level it's the functional equivalent of putting a 16Gb FC SAN link between one or two disks and a single host adapter: technically it would be faster than just having two SATA links, but the cost would be so outrageous that only the truly ridiculous would ever try it. Just because it has Intel's name on it doesn't make it superior to all other options. Each implementation has various benefits and constraints that must be taken into account when engineering a solution. For desktop / consumer level solutions HMC makes absolutely zero sense; you might as well take a wad of cash and burn it in a barrel for all the good it will do. It is like trying to use an enterprise SAN / DAS storage implementation on a desktop computer: technically possible but ridiculously inefficient. It doesn't even compete directly with HBM, since HBM is designed for graphics cards and other high-bandwidth / high-latency applications. And just because it has AMD's name on it doesn't make HBM bad; it's a JEDEC standard for a reason.

Using HBM for system memory would most likely be inefficient unless you would have it there anyway due to an iGPU / APU situation, in which case it might be a cost-cutting measure that leaves DRAM entirely out of the picture. Wide I/O is a specialized implementation for low-power mobile devices; it's designed around using the lowest power possible while still providing sufficient performance to operate media devices like phones / tablets / etc. DDR3, and later DDR4, is by far the cheapest, most cost-efficient solution and will most likely continue being the primary system memory solution of choice. Finally, Samsung, Micron and others are currently working on Wide I/O / Wide I/O 2 memory. HBM is just a specialized implementation of Wide I/O in the same way that GDDR5 is a specialized implementation of DDR3. Wide I/O focuses on lower power and lower cost, and can be bolted directly onto a CPU to save precious space. HBM, on the other hand, is designed for extremely high scalable bandwidth and requires a thin interposer layer to act as an interface for its thousands of pins. It's simply too big, too fast and too hot to fit directly onto a CPU die (in comparison to Wide I/O).

Specifications of HBM, JESD235.

http://www.kitguru.net/components/graphic-cards/anton-shilov/sk-hynix-adds-hbm-dram-into-catalogue-set-to-start-mass-production-in-q1-2015/

The HBM JESD235 standard stacks four DRAM dies, each with two 128-bit channels, on a base logic die, which results in a memory device with a 1024-bit interface. Each channel is similar to a standard DDR interface but is completely independent, so each channel within one stack, and even within one die, can operate at a different frequency, feature different timings and so on. Each channel supports 1Gb – 32Gb capacities, features 8 or 16 banks (the 16-bank configuration is used with 4Gb and larger channels) and can operate at 1Gb/s – 2Gb/s data rates (1GHz – 2GHz effective DDR frequency). As a result, each HBM 4Hi stack (4-high stack) package can provide 1GB – 32GB of capacity and 128GB/s – 256GB/s of memory bandwidth. It is expected that the HBM JESD235 standard will evolve to accommodate stacks of eight DRAM dies (HBM 8Hi stack package/topology).

http://www.kitguru.net/components/graphic-cards/anton-shilov/sk-hynix-confirms-mass-production-of-first-gen-hbm-memory/

SK Hynix is shipping HBM to AMD today and plans to support demand for such memory from other customers.

Next year SK Hynix and other memory producers will start to manufacture second-generation HBM that will support up to eight memory stacks, higher DRAM capacities and clock-rates. HBM2 will allow to create 8GB memory chips with 256GB/s bandwidth per chip.
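Working those JESD235 numbers out (straightforward arithmetic on the quoted figures, nothing assumed beyond them):

# Per-stack HBM bandwidth from the quoted JESD235 figures:
# 4 dies x 2 channels x 128 bits = 1024-bit interface, 1-2Gb/s per pin.
bus_bits = 4 * 2 * 128
for pin_gbps in (1, 2):
    print(f"{pin_gbps} Gb/s/pin -> {bus_bits * pin_gbps // 8} GB/s per stack")

That reproduces the 128GB/s – 256GB/s per-stack range in the spec text.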

Here is a good breakdown from ExtremeTech:
[image: DRAMs.jpg - comparison table of DRAM technologies]


This breaks down into the following implementations:

Consumer system memory = DDR3/4
Graphics = HBM
Integrated solutions / APUs / iGPUs = HBM
HPC system memory = HMC
Enterprise micro-server memory = FB-DDR3/4
Enterprise heavy server memory = HMC
Mobile media system memory = Wide I/O

:Edit:

This is an add-on about manufacturers. Since HBM has been a JEDEC standard since 2013, there is no need to license it or enter into tech-sharing agreements, so while AMD and Hynix were pushing and drafting the standard, many others were involved. Samsung is currently experimenting with producing HBM chips; they have some demo products already made. Intel has even been playing around with the idea of using it on consumer products. HBM is the logical successor to GDDR5, while HMC is a completely different architecture of distributed memory that addresses a specific set of problems identified in extremely high-end computing. The two are so different from each other that direct comparisons in an attempt to claim one is "the winner" just go to show how far people will go to participate in their team's perceived football match.
 

InvalidError

Titan
Moderator

Re-check your specs/math.
HBM1: 1024 bits x 1GT/s (500MHz) = 128GB/s
HMC1: 4 ports x 16 lanes x 15Gbps x 2 (symmetrical) / 8 = 240GB/s

HBM2: 1024 bits x 2GT/s (1GHz) = 256GB/s
HMC2: 4 ports x 16 lanes x 30Gbps per lane x 2 (symmetrical) / 8 = 480GB/s

HBM may have 16X as wide a bus, but HMC runs its bus 15X as fast, which cancels out most of HBM's width advantage, and HMC is symmetrical, which gives it a significant lead in peak throughput.
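The same math as a quick Python script, just reproducing the figures above:

# Peak per-stack bandwidth, per the numbers quoted in this post.
def hbm(bus_bits, gtps):              # wide parallel bus, half-duplex
    return bus_bits * gtps / 8        # GB/s
def hmc(ports, lanes, lane_gbps):     # serial lanes, counting both directions
    return ports * lanes * lane_gbps * 2 / 8

print("HBM1:", hbm(1024, 1), "GB/s")      # 128.0
print("HMC1:", hmc(4, 16, 15), "GB/s")    # 240.0
print("HBM2:", hbm(1024, 2), "GB/s")      # 256.0
print("HMC2:", hmc(4, 16, 30), "GB/s")    # 480.0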

As for the cost of putting SERDES on a chip, 10-16Gbps is cheap as chips these days. I doubt Intel would include a full complement of serial lanes on its lower-end CPUs and chipsets if SERDES had a significant impact on costs; the same goes for AMD's and Nvidia's lower-end stuff. If SERDES were significantly more expensive than a wide interface, Intel would not have gone serial for its CrystalWell eDRAM interface either.
 
I am under the impression (likely wrong) that Zen will roll AM3+ and FM2+ together. Whatever it becomes, it will have *APU-like* cores and features, and build upon the initial HSA foundation in Kaveri/Carrizo.

The issue with HBM as a 'last-level cache' would be the interposer and the potential for adding latency beyond the current L3 (or an IOMMU).

No doubt 'bandwidth' is an issue that HBM (or a fancy eDRAM) could potentially address, but under the KISS principle, reduced latency for a Zen SoC APU on DDR4 may be achieved with 3 256-bit 'Fusion Control Links' backed up by 2 (or more) 256-bit 'Radeon Control Links' sniffing a last-level L2... or a 'unified' serial DDR memory.

Did that make sense ?? :lol:

 
^Zen is supposed to be on AM4, which will be a single socket for CPUs and APUs - a smart move.

I haven't seen anything about whether they plan to have APUs only, or CPUs at the top end and APUs in the mid-to-low range like Intel currently has.
 


You didn't read my post in your rush to hammer out a response; that's highly disrespectful. I even provided you with the exact JEDEC specification.


Specifications of HBM, JESD235.

The HBM JESD235 standard stacks four DRAM dies, each with two 128-bit channels, on a base logic die, which results in a memory device with a 1024-bit interface. Each channel is similar to a standard DDR interface but is completely independent, so each channel within one stack, and even within one die, can operate at a different frequency, feature different timings and so on. Each channel supports 1Gb – 32Gb capacities, features 8 or 16 banks (the 16-bank configuration is used with 4Gb and larger channels) and can operate at 1Gb/s – 2Gb/s data rates (1GHz – 2GHz effective DDR frequency). As a result, each HBM 4Hi stack (4-high stack) package can provide 1GB – 32GB of capacity and 128GB/s – 256GB/s of memory bandwidth. It is expected that the HBM JESD235 standard will evolve to accommodate stacks of eight DRAM dies (HBM 8Hi stack package/topology).

It's spec'd at 1GHz (2Gb/s DDR), not 500MHz (1Gb/s), and this spec was published in 2013 by JEDEC. AMD chose to build the first iteration at 500MHz because that's all the Fiji GPU could use; it's just a modified 290 outfitted with a new memory controller, and it's the first iteration of a new technology. We don't claim DDR3 is limited to 666MHz even though 666MHz exists in the specification. Nvidia's next GPU will be using the full 1GHz in the JEDEC standard for 1TB/s of bandwidth on a 4096-bit interface. They may go higher, but that seems a likely target. The JEDEC standard mentions nothing about HBM 1/2/3/4/5/6/7; those are manufacturer iteration numbers, not standards.

I also mentioned cost for a reason: both these technologies let you expand the bandwidth by adding more chips, and it's cheaper to add a few more HBM stacks than a few more HMC stacks. HMC's advantage is that you can keep adding more and more, while HBM is likely going to struggle to go past eight stacks. Anyhow, you're very wrong about HMC's costs vs capabilities and have yet to respond to most of my points, just cherry-picking a few lines to quickly type out, which is itself incredibly disrespectful. It's the functional equivalent of putting in a 10GbE SAN to connect a few disks to your desktop, and then arguing it's the future because it might be theoretically faster than using SATA.

Anyhow, we are starting to stray outside the scope of this topic. New technologies are judged by their capabilities and costs, not by the name and logo of the company creating them.
 


You won't want to use HBM for main memory / cache if possible; it's targeted at extremely high-bandwidth / high-latency applications like GPUs. It's the evolution of GDDR5. The technology slated to replace DDR would be Wide I/O, which is the cousin of HBM - same technology but a different target, like DDR vs GDDR. Of course we also have DDR4, and in the meantime it's going to be the memory of choice for consumer markets because of how cheap it is.

HMC was purpose-built for high-powered / high-performance computing, stuff like supercomputers or big-iron enterprise kit. DRAM has a big issue when you start scaling up into the 256GB+ range: you start needing extension cards and very complex memory designs because of how far the signal must travel and the ridiculous number of memory channels you need to use. An Intel Xeon server CPU usually has four channels, so a quad-socket system needs sixteen memory channels, with each channel needing four to eight DRAM sticks. That quickly gets messy and expensive, and pulls a lot of power to keep them all refreshed. Go beyond four sockets and it gets astronomical to manage all that, not to mention trying to keep your cache sane. HMC solves this by equipping every DRAM stack with its own networked memory controller. It's designed such that each HMC stack can either link directly with the main CPU memory controller or provide a pass-through link to another HMC stack further away. In this manner you can chain HMC stacks and use multiplexer chips and other technologies to expand your memory array to fill whatever need you have.

You could use it on a desktop, but it would provide absolutely no benefit over DDR4 while still being considerably more expensive. You could try using it on a graphics card, but it would be crushed in both cost and performance by HBM, which is purpose-built for that role. Go into high-end computing and it easily beats all other implementations. There is also discussion of tossing it into high-end core routing devices, because with IPv6 you need a ridiculously large amount of low-latency local memory. Distribution-layer devices would still use cheap DRAM, though.

The only place I see for HBM inside a consumer device would be some amount on the CPU package to act as graphics memory for the iGPU - think 2~4GB on some APU / Intel CPU. It wouldn't be used for system memory or CPU cache but as dedicated graphics memory. And if the whole system is operating under some sort of HSA regime, then an application could use it as memory for specialized vector workloads (physics / geometry / etc.) being tasked to the iGPU. The only real way you'd have HBM for system RAM would be some sort of highly integrated AIO / kiosk / SoC device where they removed the DRAM slots and interface entirely as a cost-cutting measure. It would be lower performing due to the high latencies, but I think at that point cost and power utilization become the driving factors, not absolute performance.
 

InvalidError

Titan
Moderator

Right now, there is a handful of OEM prebuilt systems with Broadwell chips in them. It looks like there isn't enough supply for units to reach retail shelves - at least not as regular stock items from major retailers. There are occasionally sellers on Amazon and eBay who claim to have them in stock.

Since Broadwell was originally intended to be BGA/embedded-only and has suffered multiple delays from ongoing 14nm production issues, I would not be surprised if Intel canceled plans to ramp Broadwell-D production in favor of not delaying Skylake any further once the 14nm issues are finally sorted out.

Broadwell-D may end up being little more than a paper launch with a relatively insignificant number of units shipped over its commercial lifespan. The way Intel only introduced two desktop SKUs - one unlocked model at the very top of the i5 and i7 ranges respectively - screams that they never intended to produce or sell very many units in the first place.
 

logainofhades

Titan
Moderator


Not seen a single one for sale. With Skylake supposedly coming out around the first week of August, I do not expect to see them either. This was, in all fairness, a paper launch.
 