News Intel's CEO says Moore's Law is slowing to a three-year cadence, but it's not dead yet

Status
Not open for further replies.
Thanks for sharing. I am aware of specialized applications of multi-valued logic. I was talking about general-purpose computing.

My concerns remain, unless you can point to where they claim these multivalued logic circuits can ultimately be synthesized at sizes close enough to binary ones as to retain a higher net density.

Where I see immediate benefits from competitive multivalued logic is in natively implementing support for PAM4 signalling, as in the case of PCIe 6.0. In that case, it needn't even scale as small, so long as it can at least provide an energy benefit over binary logic. In fact, I believe it's primarily in communication circuits where multi-valued logic is found today.
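
To make the PAM4 point concrete, here is a minimal Python sketch of carrying two bits per 4-level symbol. The Gray-coded level mapping and the function names are assumptions for illustration, not taken from any spec or from the linked papers.

```python
# Illustrative only: pack pairs of bits into 4-level symbols, PAM4-style.
# The Gray-coded mapping below is an assumption made for this sketch.
GRAY_LEVELS = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
LEVEL_TO_BITS = {level: bits for bits, level in GRAY_LEVELS.items()}


def bits_to_symbols(bits):
    """Map an even-length sequence of bits to 4-level symbols (0..3)."""
    if len(bits) % 2:
        raise ValueError("need an even number of bits")
    return [GRAY_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def symbols_to_bits(symbols):
    """Recover the original bit sequence from 4-level symbols."""
    out = []
    for symbol in symbols:
        out.extend(LEVEL_TO_BITS[symbol])
    return out


data = [1, 0, 0, 1, 1, 1, 0, 0]
symbols = bits_to_symbols(data)
print(symbols, symbols_to_bits(symbols) == data)
```

Half as many symbols cross the wire for the same number of bits, which is the whole appeal for signalling. It says nothing about whether the gates that produce those levels can be made as small as binary ones.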

A 4 state transistor can emulate 2 binary transistors, so it could be used to implement 2 CPUs in the same atoms, running 2 different programs. It would not demand designing new software.
Okay, that's an interesting idea.

SIMD hardware could double. L1 Cache could double.

Chip size could be reduced quadratically. Yields would increase exponentially.
Again, I'm not seeing where it says these multi-valued logic gates can be shrunk to the same size as binary ones, or how they even scale with respect to the number of states, on leading nodes.

You do have a good point about RAM density. That could potentially follow the path of NAND, in storing more bits per cell. However, as long as the rest of the processor is implemented in binary, we're probably just talking about L3 or L4 cache and DRAM. Don't forget that multi-bit NAND comes at a performance penalty - particularly during the write phase.
 
Yes, but this is not a description of the average home user. I think the cost of ever-smaller nodes will make production unbearably expensive, not too many years from now. Moreover, prices will not be able to "ripen" to a bearable level even after maximum refinement.
I was talking about hypothetical layered chips with tens of trillions of transistors. If there's no economy of scale from stacking 1,000 dies on top of each other, it would be at least as expensive as making each layer individually.

https://forums.tomshardware.com/thr...sts-as-complexity-rises.3831573/post-23165635

If you do the math on the incoming $30,000 N2 wafers, while it can be painful, it probably doesn't make consumer products unbearably expensive to make, especially for smaller dies around 100-250 mm^2. The low end of that range is large smartphone chips or AMD's Mendocino, larger mainstream APUs or Navi 23/33 are around 200-250 mm^2.

There's also a common sense solution in the short term: move the L3 cache, perhaps all of it, onto a cheaper node like N6 and 3D stack it. There's no sense making 50% of your die SRAM on an absurdly expensive node, if the 3D stacking is mature and cheap enough.

If we end up at $100,000 wafers for the N1 node or whatever and halt all scaling there, I think the price can eventually ripen if there are years to slow down and work on optimizing EUV and multiple patterning. Then there are alternate approaches to look at like 3DSoC and carbon nanotubes made on cheaper nodes. Absurdly expensive wafers will spur more development of alternatives.
 
you do the math on the incoming $30,000 N2 wafers, while it can be painful, it probably doesn't make consumer products unbearably expensive to make, especially for smaller dies around 100-250 mm^2. The low end of that range
Now imagine something like only 40% of the chips cut from the wafer being usable.
 
Now imagine something like only 40% of the chips cut from the wafer being usable.
Another $60 to BOM cost for a 225mm^2 chip, compared to 85% yield. Maybe that turns into +$85-100 for the consumer. Bad for budget chips and even the mid-range, but not the end of the world.
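
For anyone who wants to redo this kind of arithmetic, here is a rough Python sketch. It assumes a 300 mm wafer, the $30,000 wafer price mentioned earlier in the thread, and a common gross-dies-per-wafer approximation. The function names are made up for illustration. The gap it reports depends heavily on those inputs, so treat the output as a ballpark rather than a check on the $60 figure.

```python
import math


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    d = wafer_diameter_mm
    return int(math.pi * d**2 / (4 * die_area_mm2)
               - math.pi * d / math.sqrt(2 * die_area_mm2))


def cost_per_good_die(wafer_cost, die_area_mm2, yield_fraction):
    """Wafer cost spread over only the dies that actually work."""
    return wafer_cost / (gross_dies_per_wafer(die_area_mm2) * yield_fraction)


# Assumed inputs: $30,000 wafer, 225 mm^2 die, 40% vs. 85% yield.
low = cost_per_good_die(30_000, 225, 0.40)
high = cost_per_good_die(30_000, 225, 0.85)
print(f"40% yield: ${low:.0f}  85% yield: ${high:.0f}  gap: ${low - high:.0f}")
```

Whatever the absolute numbers, the cost per good die scales with 1/yield, so going from 85% to 40% yield roughly doubles it.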

AMD's adoption of chiplets for most of RX 7000 has readied them for this type of scenario. An RX 7900 XTX is only using 306mm^2 of N5 for the graphics die.
 
I didn't say that the technology to make ALUs is already available. I'm saying it is the only way to preserve Moore's law: to do more with the same atoms. There is no other way.

One process "node" should have 4 states. The next, double it. Each one should double it to keep with Moore's law.

The limit is the quantization of energy/voltage.
 
Moore's law will hit a hard stop. Electron, photon, whatever you use in a gate can only go down to a certain size. The hard limit is one atom, which is 2-3 angstroms. There are 10 angstroms per nanometer, and TSMC sits at 3 nanometers right now (a 30 angstrom node). (I know this is a bit of liar's poker, due to actual gate size vs. advertised node.) Moore's law was about doubling density, not performance. So we only have 15 angstroms, 7.5 and theoretically 3.75 left as density doublings. If we are actually at something more like 100 angstroms, this still only gives 50, 25, 12.5 and 6.25. I am highly reluctant to believe that single-atom gates are possible at scale, so that really leaves us a maximum of two to four more doubling cycles. Moore's law isn't dead (at least in the sense of doubling transistors every two years), but it is done when it hits the hard wall of the atom, regardless of whether an electron or a photon is in the gate. So even if we push it back to 3-year cycles, we are somewhere between 6 and 12 years from hitting the hard wall where silicon node shrinks end. It will be really expensive to hit those last couple of nodes, perhaps to the point of being economically unviable, and I think we will see some half or quarter steps.
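
The runway arithmetic above can be sketched in a few lines of Python. The function names and the `linear_factor` parameter are made up for illustration. The default of 0.5 simply follows the halving used in this post.

```python
def shrink_steps(start_angstrom, steps, linear_factor=0.5):
    """Feature sizes after each step, shrinking by linear_factor per step."""
    sizes = []
    size = start_angstrom
    for _ in range(steps):
        size *= linear_factor
        sizes.append(size)
    return sizes


def years_left(steps, cadence_years=3):
    """Years of runway for a given number of steps at a fixed cadence."""
    return steps * cadence_years


print(shrink_steps(30, 3))          # starting from a nominal 30 angstrom node
print(shrink_steps(100, 4))         # starting from ~100 angstrom real features
print(years_left(2), years_left(4))
```

One caveat on the assumption: if density is taken to scale with area, a doubling only needs a linear shrink of about 0.71x (1/sqrt(2)) per step rather than 0.5x. Passing `linear_factor=0.71` would stretch the runway accordingly.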

This doesn't mean that performance will stop increasing, just that die shrinks on silicon stop. My gut is that silicon itself will be moved away from, at least on certain performance cores. Pick your material: graphene, Re6Se8Cl2, or any other of a half dozen promising materials that could end up in a chiplet core placed as a tile alongside silicon. Think of putting ultra-performance, high-clock cores in the mix. Graphene alone has been theorized to be able to clock at the terahertz level, with working setups in the lab at the 100 GHz level. So rather than size, we will see clock speed coming back to the forefront. That isn't writing off quantum cores and/or massive parallelism. Silicon itself doesn't go away, as there is too much infrastructure in place in the chip industry; it is just no longer the performance medium somewhere in the next decade.
 
I was talking about hypothetical layered chips with tens of trillions of transistors. If there's no economy of scale from stacking 1,000 dies on top of each other, it would be at least as expensive as making each layer individually.

https://forums.tomshardware.com/thr...sts-as-complexity-rises.3831573/post-23165635

If you do the math on the incoming $30,000 N2 wafers, while it can be painful, it probably doesn't make consumer products unbearably expensive to make, especially for smaller dies around 100-250 mm^2. The low end of that range is large smartphone chips or AMD's Mendocino, larger mainstream APUs or Navi 23/33 are around 200-250 mm^2.

There's also a common sense solution in the short term: move the L3 cache, perhaps all of it, onto a cheaper node like N6 and 3D stack it. There's no sense making 50% of your die SRAM on an absurdly expensive node, if the 3D stacking is mature and cheap enough.

If we end up at $100,000 wafers for the N1 node or whatever and halt all scaling there, I think the price can eventually ripen if there are years to slow down and work on optimizing EUV and multiple patterning. Then there are alternate approaches to look at like 3DSoC and carbon nanotubes made on cheaper nodes. Absurdly expensive wafers will spur more development of alternatives.
That isn't to mention the issue with heat buildup when you stack. There has been some interesting work from UCLA on solid state thermal transistors that could possibly offset some of this, but you are still going to have to cool the individual layers in the sandwich, especially if you are stacking more than two or three. This isn't to mention the complexity of atomic level heat sinks being added to the design.
 
Multivalued transistors are not a thought experiment
Multivalued transistors demonstrated on wafer-scale
High-performance multivalued logic circuits based on optically tunable antiambipolar transistors†

A 4 state transistor can emulate 2 binary transistors, so it could be used to implement 2 CPUs in the same atoms, running 2 different programs. It would not demand designing new software.

SIMD hardware could double. L1 Cache could double.

Chip size could be reduced quadratically. Yields would increase exponentially.
While chip size could be reduced quadratically, this still overlooks the Moore's law discussion. Moore was observing transistor size and the process node development cycle in 1965. Even if we move to a different type of transistor, the density doesn't change. Throughput potentially does, but the number of transistors per area stays the same whether they are binary or not. Moore's observation also held to a point about cost per transistor, and for that reason alone it has already been dead since about 2012, with "minimum component costs" increasing per node at the rate they are, rather than remaining relatively flat compared to the doubling of density as Moore was observing. If you actually read his 1965 piece in "Electronics" magazine, he felt that the observation was good for at least ten years. This has been more of a goal for Intel to maintain than anything else. I think a multi-state transistor could, at least in theory, make for a huge single-generation jump in computing power per transistor. But its success would depend heavily on how the transistor design clocks vs. its binary counterpart (in the real world), and it would have no effect on transistor density or Moore's law as it sits.
 
I was talking about hypothetical layered chips with tens of trillions of transistors. If there's no economy of scale from stacking 1,000 dies on top of each other, it would be at least as expensive as making each layer individually.
There are a few big issues I see with the idea of simply adding more layers to existing wafers.
  1. Increased production time & costs, due to more steps, more masks, etc.
  2. Drastically decreased yield. This isn't NAND, where the FTL can simply remap around any defects. A defect in logic might compromise just one core, or maybe even the entire die.
  3. Scaling the number of layers both increases the areal density of heat-generating elements and reduces the ability of that heat to be removed.

So, it looks to me like logic pretty much needs to remain planar, both for economic and performance reasons. Maybe there could be a slight increase in the number of logic layers, but it's likely to deliver incremental benefits, rather than anything game-changing.

This is just my opinion, as an outsider. I could be wrong, but I wonder why we wouldn't have already seen scaling of logic layers if it really does make so much sense?

There's also a common sense solution in the short term: move the L3 cache, perhaps all of it, onto a cheaper node like N6 and 3D stack it. There's no sense making 50% of your die SRAM on an absurdly expensive node, if the 3D stacking is mature and cheap enough.
As I already mentioned, AMD made a point about how they had to stack SRAM on SRAM. Putting SRAM on logic would've been problematic for thermal reasons - and their X3D models already have enough of those, as is.

I think the price can eventually ripen if there are years to slow down and work on optimizing EUV and multiple patterning. Then there are alternate approaches to look at like 3DSoC and carbon nanotubes made on cheaper nodes. Absurdly expensive wafers will spur more development of alternatives.
Yes, we need innovations and these are now as likely to be motivated by cost savings as performance. If you look at the multiplicity of new nodes, we can already see a lot of optimization going on (i.e. for cost, frequency, power, etc.):

[Image: wikichip_tsmc_logic_node_q2_2022-1.png]

Source: https://fuse.wikichip.org/news/7048/n3e-replaces-n3-comes-in-many-flavors/
 
Throughput potentially does, but the number of transistors stays the same per area if they are binary or not.
Perhaps this is a bit simplistic, but my understanding is that it's not the transistors that are fundamentally changing, but rather how you use them. In multi-valued logic, the complexity of the logic gates should increase (thereby reducing density). Furthermore, it might be the case that multi-valued logic actually calls for higher-performance (i.e. lower-impedance, lower-noise) transistors that are necessarily larger than what you need for binary logic (though maybe not enough larger to nullify the benefits). No matter what, I'm pretty sure it's not a free lunch.

I'm pretty sure there are good reasons multi-valued logic hasn't been pursued, outside of the niches where it's currently being used.
 
Then they could reduce SRAM on the base to something like 4-8 MB, and layer it up.
Yes, I see progress, through regression. It would be a lot of fun to stack 10+ layers of SRAM to reach the capacities used in real CPUs today, with all the wiring problems for data and for powering the cells in the added layers. That's even if we forget about the already mentioned thermal problems, which we are barely able to cope with even with two layers (one base + one 3D V-Cache). I don't even want to think about the additional issues with ensuring coherence.
 
Then they could reduce SRAM on the base to something like 4-8 MB, and layer it up.
What might be interesting is to make multi-layer SRAM dies. Then, you could get multiple layers' worth of cells, but with the manufacturing simplicity of having only a 2-high stack. This could also avoid the cost and complexity of the base die having to incorporate multiple logic layers.

Ideally, you could burn out fuses to disable bad lines of SRAM, which could enable even greater scaling. The big winner would be AI, which is a huge bandwidth hog and generally loves SRAM*. Would be good for GPUs, also.

* Cerebras' WSE-2 has 40 GiB of SRAM!
 
If I'm not wrong, it's normal single-layer SRAM. The capacity is there just because the chip area is tremendously large.
Yes, I know. I was just using it as an example of AI wanting lots of SRAM.

As for the amount, I think it's still large in relation to the amount of logic, but I haven't tried to confirm that. There are other AI accelerators which have devoted more than half their die area to SRAM, but I didn't cite that one because I don't have a reference at hand and don't remember whose it was.
 
Yes, I know. I was just using it as an example of AI wanting lots of SRAM.

As for the amount, I think it's still large in relation to the amount of logic, but I haven't tried to confirm that. There are other AI accelerators which have devoted more than half their die area to SRAM, but I didn't cite that one because I don't have a reference at hand and don't remember whose it was.
If you make the effort to get the average amount of cache per number of cores... At 850000 cores in one chip, there is a negligible amount of cache left for one core. It's good that it's allocated dynamically, depending on the needs, and that probably helps in most use cases.
 
If you make the effort to get the average amount of cache per number of cores... At 850000 cores in one chip, there is a negligible amount of cache left for one core.
Ugh. You didn't make enough effort, as those are more like GPU cores than CPU cores.

Compute the same ratio for a GPU and you'll find it has even less cache per "core"!
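
For reference, the per-core ratio being argued about is easy to compute. This Python sketch uses only the two figures quoted in this thread (40 GiB of SRAM, 850,000 cores). The function name is made up for illustration.

```python
SRAM_BYTES = 40 * 2**30   # 40 GiB, the figure quoted in this thread
CORES = 850_000           # core count quoted in this thread


def sram_per_core_kib(sram_bytes, cores):
    """Average on-chip SRAM per core, in KiB."""
    return sram_bytes / cores / 1024


per_core = sram_per_core_kib(SRAM_BYTES, CORES)
print(f"{per_core:.1f} KiB of SRAM per core")
```

To compare against a GPU, plug its total cache and "core" (shader ALU) count from a spec sheet into the same function.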
 
Ugh. You didn't make enough effort, as those are more like GPU cores than CPU cores. Compute the same ratio for a GPU and it has even less cache!
In fact, that's like defining the cores in AMD CDNA chips as GPU cores. Perhaps those are even closer to a GPU than the ones in the Cerebras chip. To the extent that each chip does the calculations, that's correct. To the extent that there is specialization for a certain type of computing that is not graphical, the definition is no longer appropriate. In fact, Cerebras does not claim that the cores in its chip are of a GPU design. In its advertising description, it makes a comparison with the performance of tens to hundreds of GPUs. This is a performance comparison, not a description of the chip's architecture.
A single CS-2 typically delivers the wall-clock compute performance of many tens to hundreds of graphics processing units ...
 
Perhaps this is a bit simplistic, but my understanding is that it's not the transistors that are fundamentally changing, but rather how you use them. In multi-valued logic, the complexity of the logic gates should increase (thereby reducing density). Furthermore, it might be the case that multi-valued logic actually calls for higher-performance (i.e. lower-impedance, lower-noise) transistors that are necessarily larger than what you need for binary logic (though maybe not enough larger to nullify the benefits). No matter what, I'm pretty sure it's not a free lunch.

I'm pretty sure there are good reasons multi-valued logic hasn't been pursued, outside of the niches where it's currently being used.
I briefly read over the article posted earlier on this. One thing I can see pretty quickly is that if the gate is using multiple voltage steps, you are going to get the worst of the voltage level tradeoffs. The lower voltage level will reduce the ability to push clock and the higher voltage level will make more heat. I have a feeling you are going to get a pretty rough performance-to-density tradeoff if you are trying to emulate binary transistors, which have a directly proportional voltage-to-clock relationship. That isn't to say that using it as a true multiple-state logic gate wouldn't allow for massive parallelism opportunities, but it would be subject to the same limitations that electronic analog computer designs face (especially when trying to mix them with binary digital computers).
 
Parallelism should increase at log2( num_states ). If voltage increases proportionately to the number of states, it's probably not going to be a big win.
I think it would depend a lot on how much of a voltage step you need to differentiate logical states in the final product and how many states you are trying for. 2 volts bad, .1 volts maybe not so much, but regardless state count would be inversely proportional to max clock cycle speed.
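
A quick Python sketch of the tradeoff being discussed here: bits per device grow as log2 of the state count, while the gap between adjacent levels shrinks if the levels are spread evenly over a fixed swing. The 1.0 V swing and the even spacing are assumptions for illustration, not figures from the linked papers.

```python
import math


def bits_per_device(num_states):
    """Information carried per device: log2 of the number of states."""
    return math.log2(num_states)


def level_spacing(voltage_swing, num_states):
    """Gap between adjacent levels if spread evenly across the swing."""
    return voltage_swing / (num_states - 1)


# Hypothetical 1.0 V swing, chosen only to make the trend visible.
for states in (2, 4, 8, 16):
    print(states, bits_per_device(states), round(level_spacing(1.0, states), 3))
```

Each doubling of the state count adds only one bit, but it roughly halves the margin between levels unless the swing grows with it.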
 
The lower voltage level will reduce the ability to push clock and the higher voltage level will make more heat.
On the other side, if you have 1 transistor doing the work of 2, it produces half the heat.
Also, a multivalued transistor can replace multiple layers of binary transistors, so it can do more work in fewer clocks.

Besides, high frequencies have diminishing returns due to heat, so reducing the clock speed also costs less in performance.
it would be subject to the same limitations that electronic analog computer designs face (especially when trying to mix them with binary digital computers).
Most of those difficulties come from analog computers not being digital.
Multivalued transistors are digital. They are not equivalent to analog computers.
 