News Thunderbird packs up to 6,144 CPU cores into a single AI accelerator and scales up to 360,000 cores — InspireSemi's RISC-V 'supercomputer-cluster-o...

Status
Not open for further replies.

bit_user

Titan
Ambassador
The article said:
InspireSemi's Thunderbird 'supercomputer-cluster-on-a-chip' packs 1,536 RISC-V cores designed specifically for high-performance computing
The downside of this approach is that all of those cores need to fetch instructions from memory and then spend energy decoding those instructions. Furthermore, to the extent they're executing the same kernels, all of that instruction cache the cores have with basically the same contents is also wasted area.

That's why GPUs use wide SIMD, with up to 128x 32-bit = 4096-bit for Nvidia's HPC/AI GPUs (though my info could be out of date). In AMD's case, I think MI200 and later are using 64x 64-bit, for what's effectively 4096-bit SIMD. MI100 and GCN were 64x 32-bit.
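
As a quick check on the widths quoted above, here is a minimal sketch. The helper name `simd_bits` is made up for illustration, and the lane counts are simply the figures from this post, which may be out of date.

```python
def simd_bits(lanes: int, bits_per_lane: int) -> int:
    """Total SIMD width in bits for a given lane count and lane width."""
    return lanes * bits_per_lane


# Lane configurations as stated in the post (not independently verified).
configs = {
    "Nvidia HPC/AI (128 x 32-bit)": (128, 32),
    "AMD MI200 and later (64 x 64-bit)": (64, 64),
    "AMD MI100 / GCN (64 x 32-bit)": (64, 32),
}

widths = {name: simd_bits(*cfg) for name, cfg in configs.items()}

for name, width in widths.items():
    print(f"{name}: {width} bits")
```

The point of the wide vector is amortization: one instruction fetch and decode drives every lane, whereas a chip built from many small scalar cores pays that cost once per core.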

The article said:
InspireSemi says that its solution offers up to 24 FP64 TFLOPS at 50 GFLOPS/W
Okay, but AI doesn't use fp64.
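
For scale, the figures quoted from the article can be turned into a few derived numbers. This is only arithmetic on the quoted values, and it assumes the 24 TFLOPS figure applies to the 6,144-core configuration, which the quote does not state explicitly.

```python
# Figures quoted from the article; everything derived below is simple
# arithmetic, not a vendor specification.
PEAK_FP64_TFLOPS = 24.0          # "up to 24 FP64 TFLOPS"
EFFICIENCY_GFLOPS_PER_W = 50.0   # "at 50 GFLOPS/W"
CORES_PER_CHIP = 1536            # "packs 1,536 RISC-V cores"
CORES_PER_ACCELERATOR = 6144     # headline: "up to 6,144 CPU cores"
MAX_CORES = 360_000              # headline: "scales up to 360,000 cores"

# Power implied by peak throughput divided by efficiency.
implied_power_w = PEAK_FP64_TFLOPS * 1000.0 / EFFICIENCY_GFLOPS_PER_W

# How many 1,536-core chips make up one 6,144-core accelerator.
chips_per_accelerator = CORES_PER_ACCELERATOR // CORES_PER_CHIP

# Assumption: the 24 TFLOPS figure is for the 6,144-core configuration.
gflops_per_core = PEAK_FP64_TFLOPS * 1000.0 / CORES_PER_ACCELERATOR

# Number of 6,144-core accelerators needed to reach the headline maximum.
accelerators_at_max_scale = MAX_CORES / CORES_PER_ACCELERATOR

print(f"implied power: {implied_power_w:.0f} W")
print(f"chips per accelerator: {chips_per_accelerator}")
print(f"fp64 GFLOPS per core: {gflops_per_core:.2f}")
print(f"accelerators at max scale: {accelerators_at_max_scale:.1f}")
```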

This thing is seemingly hearkening back to the approach taken by Intel's failed Xeon Phi, except that they presumably get some benefit from RISC-V having less baggage than x86. Good luck to them, but I remain skeptical.

Their best case scenario might be if Nvidia, AMD, and Intel all turn their backs on the HPC market to focus on AI. Then, their general-purpose programmability + heavy fp64 horsepower might actually count for something.
 

KnightShadey

Reputable
Sep 16, 2020
131
76
4,670
Okay, but AI doesn't use fp64.

Yes, it does, only it's mainly edge cases, not where all the focus is. Nor is it the sole focus of these designs, just an option.
Which is where anyone trying to compete with AMD, Google, & nV should be looking to make their mark/money.... where they aren't. 🤔

This thing is seemingly hearkening back to the approach taken by Intel's failed Xeon Phi, except that they presumably get some benefit from RISC-V having less baggage than x86. Good luck to them, but I remain skeptical.

No, it's almost an exact copy of Tenstorrent's design (just likely similar to what their next generation would be core/cluster wise), which makes sense timeframe wise. They are another group that focuses on the Edge case scenario to succeed, with a bunch of former AMD, Apple, intel, & nVidia folks at the helm (sometimes all in the same person😉).

If you're unfamiliar with them, Anand did a great interview with the founder/ceo Ljubisa Bajic & CTO Jim Keller (both former AMD colleagues). Jim's a big name in chip design;
https://www.anandtech.com/show/1670...storrent-ceo-ljubisa-bajic-and-cto-jim-keller

and THG did an article on the architecture in the past.

https://www.tomshardware.com/news/t...h-performance-risc-v-cpus-and-ai-accelerators

SemiAnalysis also has a good breakdown of benefits, and where they are focusing on the fringe applications (yet already have large orders from LG etc.)

https://www.semianalysis.com/p/tenstorrent-wormhole-analysis-a-scale



 

bit_user

Titan
Ambassador
Yes, it does, only it's mainly edge cases, not where all the focus is.
Source?

Nor is it the sole focus of these designs, just an option.
How is it not a focus, when they literally compared their fp64/W to Nvidia's A100? That sure sounds like they were making it a focus!

Furthermore, fp64 is key to many of their specified use cases:

[Image not preserved: InspireSemi's specified use cases]


Which is where anyone trying to compete with AMD, Google, & nV should be looking to make their mark/money, where they aren't. 🤔
As mentioned in the article, Nvidia's Hopper is no slouch on fp64! AMD's MI250X briefly took a lead on fp64, before Nvidia's H100 recaptured the crown. It's telling that they compared Thunderbird to the previous generation (A100), at a time when Nvidia is on the verge of launching their next generation (Blackwell). That shows InspireSemi is well aware they're not competitive on fp64.

No, it's almost an exact copy of Tenstorrent's design (just likely similar to what their next generation would be core/cluster wise),
LOL, no. Their design has an order of magnitude more RISC-V cores than Tenstorrent has Tensix cores! That can only be true if their cores are much narrower. I'll bet they're also lacking directly addressed, software-managed SRAM like Tenstorrent and Cerebras use, which puts them at a substantial efficiency disadvantage, because cache is a lot less energy-efficient than software-managed SRAM.

They are another group that focuses on the Edge case scenario to succeed,
Exactly which edge case scenarios is Tenstorrent focusing on?

If you're unfamiliar with them, Anand did a great interview with the founder/ceo Ljubisa Bajic & CTO Jim Keller
I read it. The whole thing. Both interviews he did with Jim, around that time, actually.

yet already have large orders from LG etc.)
I heard about the LG deal, like a year ago. Any newer news?

I read a lot of nice stuff about Tenstorrent. I hope they are successful, though I also hope they don't backtrack on their promises to open source their software and tools.
 

KnightShadey

Reputable
Sep 16, 2020
131
76
4,670

See above post.
No one said preferred or exclusive, but again, also NOT 'not used' as you claim.🧐

However, to sum up, like yesterday.... You Disagree.
M'Kay, Thanks! 😎🤙

Like yesterday, you're probably running at maybe about 50% on your assumptions/claims, but that may be generous. 🤨
 

bit_user

Titan
Ambassador
See above post.
No one said preferred or exclusive, but again, also NOT 'not used' as you claim.🧐
Again, please provide a source to support the assertion that fp64 is used in AI ...or drop the point.

However, to sum up, like yesterday.... You Disagree.
M'Kay, Thanks! 😎🤙
I kinda feel like you're trying to diminish my position by casting me as the dissenter, when mine was the original post.

Also, I asked you some questions and to support some of your claims, which (for the record) you've not done. That's fine, but it should be noted.

Like yesterday, you're probably running at maybe about 50% on your assumptions/claims, but that may be generous. 🤨
This is not a constructive statement. You're free to take issue with anything I've said, but to cast aspersions without specifics is a pretty low-brow tactic.
 

KnightShadey

Reputable
Sep 16, 2020
131
76
4,670
I kinda feel like you're trying to diminish my position by casting me as the dissenter, when mine was the original post.

Casting you as the dissenter? But darling, YOU ARE the original dissenter...
"Okay, but AI doesn't use fp64." 🧐

Which you never provided exhaustive supporting evidence for. Yet feel you can compel others to do so? 🤨

So to be clear, and to further summarize your position: You Disagree with Tom's, Tenstorrent, InspireSemi, and myself.
Okay, Thanks !
🤠🤙
 

bit_user

Titan
Ambassador
Casting you as the dissenter? But darling, YOU ARE the original dissenter...
"Okay, but AI doesn't use fp64." 🧐
Oh, am I? That statement wasn't disagreeing with anything - it was merely stating a relevant fact. There was no actual point in contention, until you contradicted it.

Which you never provided exhaustive supporting evidence for.
As you'll know, one can't prove a negative. However, let's look at some examples of AI accelerators lacking fp64.
  • Tenstorrent - "On Grayskull this is a vector of 64 19 bit values while on Wormhole this is a vector of 32 32 bit values."
  • Cerebras - "The CE’s instruction set supports FP32, FP16, and INT16 data types"
  • Habana/Intel's Gaudi - "The TPC core natively supports the following data types: FP32, BF16, INT32, INT16, INT8, UINT32, UINT16 and UINT8."
  • Movidius/Intel's NPU (featured in Meteor Lake) - "Each NCE contains two VLIW programmable DSPs that supports nearly all data-types ranging between INT4 to FP32."
  • AMD's XDNA NPU (based on Xilinx Versal cores) - "Enhanced DSP Engines provide support for new operations and data types, including single and half-precision floating point and complex 18x18 operations."
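
To make the pattern in the list above explicit, here is a small sketch that tabulates only the element widths named in those quotes and checks whether any reaches 64 bits. The dictionary is a hand transcription of the quotes, not a complete datasheet for any of these parts, and the XDNA entry leaves out the "complex 18x18" operations because the quote gives no element width for them.

```python
# Element widths (in bits) taken only from the quotes in the list above.
quoted_widths = {
    "Tenstorrent Grayskull": [19],
    "Tenstorrent Wormhole": [32],
    "Cerebras CE": [32, 16, 16],                       # FP32, FP16, INT16
    "Habana/Intel Gaudi TPC": [32, 16, 32, 16, 8, 32, 16, 8],
    "Movidius/Intel NPU": [4, 32],                     # "INT4 to FP32" endpoints
    "AMD XDNA": [32, 16],                              # single and half precision
}

# Widest element each quote mentions.
widest = {name: max(bits) for name, bits in quoted_widths.items()}
has_64bit = any(width >= 64 for width in widest.values())

# Vector sizes implied by the Tenstorrent quote (lanes x bits per lane).
grayskull_vector_bits = 64 * 19
wormhole_vector_bits = 32 * 32

for name, width in widest.items():
    print(f"{name}: widest quoted element = {width} bits")
print(f"any 64-bit element quoted: {has_64bit}")
```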

Not to mention that Nvidia's inferencing-oriented L4 and L40 datacenter GPUs implement fp64 at just 1:64 relative to their vector fp32 support (plus no fp64 tensor support). That's almost certainly there as a vestige of their client GPUs' fp64 scalar support, which is needed for the odd graphics task like matrix inversion.

I don't know about you, but I'd expect fp64 support to be a heck of a lot more prevalent in so many purpose-built AI accelerators and AI-optimized GPUs, if it were at all relevant for AI. Instead, what we see is that even training is mostly using just 16-bits or less!
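
For a sense of what is given up at each precision, the sketch below measures the smallest representable step above 1.0 for IEEE fp16, fp32, and fp64 using only Python's standard `struct` module. It says nothing about whether a given AI workload needs fp64; it only illustrates the resolution gap between the formats. The helper names are made up for illustration, and bf16 is omitted because `struct` has no format code for it.

```python
import struct


def round_to(fmt: str, x: float) -> float:
    """Round a Python float (fp64) to the given struct format and back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def smallest_step_above_one(fmt: str) -> float:
    """Smallest power of two eps such that 1 + eps is distinct from 1."""
    eps = 1.0
    # Keep halving while half of the current step still survives rounding.
    while round_to(fmt, 1.0 + eps / 2) != 1.0:
        eps /= 2
    return eps


steps = {
    "fp16": smallest_step_above_one("<e"),  # IEEE half precision
    "fp32": smallest_step_above_one("<f"),  # IEEE single precision
    "fp64": smallest_step_above_one("<d"),  # IEEE double precision
}

for name, eps in steps.items():
    print(f"{name}: smallest step above 1.0 = {eps!r}")
```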

Yet feel you can compel others to do so? 🤨
Hey, you're the one who volunteered the oddly specific claim:

"Yes, it does, only it's mainly edge cases"

That seems to suggest specific knowledge of these "edge cases" where it's needed. If you don't actually know what some of those "edge cases" are, then how are you so sure it's needed for them?

So to be clear, and to further summarize your position: You Disagree with Tom's, Tenstorrent, InspireSemi, and myself.
Okay, Thanks !
🤠🤙
As I said, I'm not in disagreement with them over this. In your zeal to score internet points, it seems you didn't take the time to digest what the article relayed about InspireSemi's strategy:

"This is a major milestone for our company and an exciting time to be bringing this versatile accelerated computing solution to market," said Alex Gray, Founder, CTO, and President of InspireSemi. "Thunderbird accelerates many critical applications in important industries that other approaches do not,

So, they are taking a decidedly generalist approach, much more like Xeon Phi than Tenstorrent's chips. This is underscored by their point that:

"this processor can be programmed like a regular RISC-V CPU and supports a variety of workloads, such as AI, HPC, graph analytics, blockchain, and other compute-intensive applications. As a result, InspireSemi's customers will not have to use proprietary tools or software stacks like Nvidia's CUDA."

Above, I linked to Tenstorrent's TT-Metallium SDK. Writing compute kernels to run on their hardware requires specialized code, APIs, and tools, which is quite contrary to InspireSemi's pitch.
 

KnightShadey

Reputable
Sep 16, 2020
131
76
4,670
Oh, am I? That statement wasn't disagreeing with anything - it was merely stating a relevant fact. There was no actual point in contention, until you contradicted it.

No, it's not a fact, and again, you obviously didn't bother reading about where it can be useful, as instructed, even though it has been pointed out to you before. You also didn't bother to at least look at InspireSemi's own material which specifically references it towards the end. 🧐

As you'll know, one can't prove a negative. However, let's look at some examples of AI accelerators lacking fp64.

Let's not, and focus on the fact that pointing to the typical use does not prove that it is the ONLY way.

AGAIN, if you even bothered to follow the provided links, or even go to the end of InspireSemi's own material (instead of lazily cutting & pasting Tom's brief slice), you would know the very specific areas where they think they can leverage their HPC-centric design towards AI with the use of higher precision. This very thing has been brought to your attention before, and you ignore it.

Even in your previous discussion with CmdrShprd the other day, you show that if people provide you evidence you dismiss it for any reason you can think of, including questioning the validity of the people involved or the examples provided.

Which is why I pointed out that it was for those exceptions to the rule, which you flew by in your default mode. You did that 2 days ago as well, with equally arrogant ignorance of the discussion to cut & paste your default assumptions as if they were fact, YET AGAIN, which were then shown to be incorrect, and Once Again, MISSING THE POINT. 🙃

That seems to suggest specific knowledge of these "edge cases" where it's needed. If you don't actually know what some of those "edge cases" are, then how are you so sure it's needed for them?

I have specific knowledge of those Edge cases, and have even mentioned them to you before, because AGAIN I bothered to read and understand the material in their presentation as well as that provided to you.

it seems you didn't take the time to digest what the article relayed about InspireSemi's strategy:

So, they are taking a decidedly generalist approach, much more like Xeon Phi than Tenstorrent's chips.

No, it is clear you didn't even understand that the reason they have that generalist approach is because folks like AMD & nV have very specialized designs that squeeze every last ounce of performance out of their much more advanced silicon for very specific targeted tasks. So InspireSemi specifically mention how they can use that jack-of-all-trades thinking and apply it to those edge cases where they can make their mark/money, and not worry about AMD & nVidia beating them because they can pack more transistors in their 3/5 nm process vs Inspire's use of TSMC 12nm. They are specifically mentioning the very small boundary scenarios that they can exploit with their design, even in AI, because they can go where AMD & nVidia ain't, and where those two can't simply follow with their money and TSMC relationship advantages.
Again, Exploiting the Edge Cases! 🧐

As before, YOU COMPLETELY MISSED THE POINT. 🙄
 

sitehostplus

Honorable
Jan 6, 2018
394
160
10,870
I think we should wait to see how it works in the real world before judging anything.

All that alleged power, there has to be a downside we can't see until it's out in the wild actually doing stuff.
 

KnightShadey

Reputable
Sep 16, 2020
131
76
4,670
As I said, I'm not in disagreement with them over this. In your zeal to score internet points

Dude, you're the one trying to score internet points, as I will quote : "(for the record) ... it should be noted." I mean who even talks like that unless they are solely concerned about scoring points ? 🤨

So to summarize YOU: Your assumptions are unquestionable facts, everyone else must be wrong and have ulterior motives if you disagree. 🤡
M'kay, Thanks ! 🥸🤙
 

KnightShadey

Reputable
Sep 16, 2020
131
76
4,670
I think we should wait to see how it works in the real world before judging anything.

All that alleged power, there has to be a downside we can't see until it's out in the wild actually doing stuff.
Definitely, at this point from their discussion it's only reached the tape-out stage, or at best first silicon. They mention they expect to have first customer shipments by the end of Q2 / beginning of Q3.

The biggest downside right up-front is scalability and low-precision performance. It's not for repeating the same task over a warehouse of cards; it's about being able to do far more tasks quickly enough in different enough scenarios, not being the quickest at the same tasks. The PCIe restrictions currently limit their performance as they go beyond the card, and even more so beyond the rack, etc.

They do mention that when they can move to TSMC 5nm in their version 2, they will be addressing both of those challenges with more low-precision processing and improved communication (in addition to PCIe6)
 

bit_user

Titan
Ambassador
you obviously didn't bother reading about where it can be useful, as instructed, even though it has been pointed out to you before. You also didn't bother to at least look at InspireSemi's own material which specifically references it towards the end. 🧐
Please state where they say that. The only relevant bit I can find in their press release is:

"delivering >6,000 CPU cores that support double precision math (64-bit floating point, aka FP64) required for many important HPC applications."

Source: https://inspiresemi.com/inspiresemi-announces-tapeout-of-its-thunderbird-accelerated-computing-chip/

Nothing about using fp64 for AI.

AGAIN, if you even bothered to follow the provided links,
All you did was link to a bunch of Tenstorrent stuff. Their hardware doesn't support fp64, as I already showed.

or even go to the end of InspireSemi's own material (instead of lazily cutting & pasting Tom's brief slice) you would know the very specific areas they think they can leverage their HPC-centric design towards AI with the use of higher precision.
Please provide a link and an exact quote, because I'm looking at their website and not seeing such an assertion.

This very thing has been brought to your attention before, and you ignore.
Where?

Even in your previous discussion with CmdrShprd the other day, you show that if people provide you evidence you dismiss it for any reason you can think of,
LOL, no. That was a case of him not knowing what he was talking about and throwing a lot of stuff at the wall. In the course of doing that, he inserted several quotes without sources and blamed me for not somehow knowing where he quoted it from. All of that was beside the point, because it was a tangent off of a tangent off of a tangent. Worse, it was actually a point he was arguing with no one, because either by choice or by accident he ended up arguing a detail he thought he could prove, instead of anything relevant to the original or secondary discussions. I pointed out that it was irrelevant and he left in a huff over the fact that I didn't share his godly opinion of his source, even though those claims (which I didn't dispute, BTW) were inconsequential to the original point of contention. I think he was just feigning indignation as a face-saving move, because he recognized he'd argued himself out onto a limb.

The funny thing is that I even proposed a simple experiment he could try that would illuminate the discussion, but he opted to ignore the one thing he could do that would actually move the discussion forward.

It's a really bad idea to wander into the middle of something and take it out of context, because your understanding was off the mark.

Which is why I pointed out that it was for those exceptions to the rule,
But which exceptions? You seem to know of some, so you ought to be able to provide examples. If you don't actually know of any, then maybe you're wrong.

You did that 2 days ago as well, with equally arrogant ignorance of the discussion to cut & paste your default assumptions as if they were fact,
Quote + Link?

Also, if you have issues with one of my posts, you should reply in-thread. Bringing up posts about different topics, in a different thread is not only whataboutism, it's unproductive and off-topic.

I bothered to read and understand the material in their presentation as well as that provided to you.
Well, you keep mentioning this presentation, but you provide no link? That's not very helpful.

it is clear you didn't even understand that the reason they have that generalist approach is because folks like AMD & nV have very specialized designs that squeeze every last ounce of performance out of their much more advanced silicon for very specific targeted tasks,
Ironically, that directly contradicts what these guys are saying:

I think there's some truth in both. CPUs are more general than GPUs, which are more general than the Sohu chip that Etched built. With each increase in generality, you sacrifice performance. That's one reason I'm concerned InspireSemi's Thunderbird will be noncompetitive and effectively DoA. The only things that might save it are the huge pricing increases and availability problems in going the Nvidia or AMD routes.

YOU COMPLETELY MISSED THE POINT. 🙄
I never missed the point that they're trying to build a more general accelerator. My original post discussed some of the tradeoffs they made and questioned whether such an approach could compete with the big GPUs, so long as any of them continued to play in both the AI and HPC markets.
 

bit_user

Titan
Ambassador
I think we should wait to see how it works in the real world before judging anything.
It might never get that far. We'll see.

All that alleged power, there has to be a downside we can't see until it's out in the wild actually doing stuff.
What I go back to is MooreThreads' S80 so grossly underperforming its specifications. Not sure if that was just due to bottlenecks in the architecture or maybe chip bugs, but I don't buy the excuse that it was 100% due to immature "drivers" (although those were also clearly a big factor).

Every big chip has hardware bugs, BTW. Most can be worked around with software or microcode, but sometimes the workarounds sacrifice too much performance.
 

bit_user

Titan
Ambassador
I mean who even talks like that unless they are solely concerned about scoring points ? 🤨
Let's just focus on facts and specific claims under contention.

So to summarize YOU: Your assumptions are unquestionable facts, everyone else must be wrong and have ulterior motives if you disagree. 🤡
M'kay, Thanks ! 🥸🤙
Trolling is against the forum rules, as are ad hominem attacks. If we're not going to focus on the facts of the matter, I predict this exchange will be short-lived.
 

KnightShadey

Reputable
Sep 16, 2020
131
76
4,670
Please state where they say that. The only relevant bit I can find in their press release is:..

... because I'm looking at their website and not seeing such an assertion.

I didn't say it was in their press release, and I'm certainly not going to be your spirit guide. 😒

Here's a hint: perhaps you should actually read their material, and especially their discussions of their solution and its applications, instead of regurgitating your truisms.
I'm certainly not going to make it easier for you, but if you do a proper investigation it is specifically addressed.... and yes, it has been brought to your attention before.
LOL, no. That was a case of him not knowing what he was talking about.... I pointed out that it was irrelevant... and he left in a huff over the fact that I didn't share his godly opinion of his source....

It's a really bad idea to wander into the middle of something and take it out of context, because your understanding was off the mark.

No, I use it as the example because it specifically addresses your manner of interacting with others in the forum you disagree with. The part about "his godly opinion" is very rich coming from you, especially one whose haughty criticism not only extends to others, but even the companies discussed here in this very thread, as if they know nothing of their own value/utility compared to all knowing BU. 😇🧖🏻‍♂️

But which exceptions? You seem to know of some, so you ought to be able to provide examples. If you don't actually know of any, then maybe you're wrong.

That could be true in other situations , and I freely admit there are times where I am wrong, possibly many such times.
However, I get the distinct feeling that that kind of self-awareness never occurs to you. 🤨
Like I said, I do know of examples, and they are clearly mentioned by InspireSemi, and if it were ANYONE else but you, or if your path was enlightenment and not keeping score, I would freely provide guidance.
That you can't find them yourself tickles me to no end; because now I realize that that probably bothers you as much or more than being contradicted, especially in your goal of keeping score in the forum. 🤣
 