News NASA's old supercomputers are causing mission delays — one has 18,000 CPUs but only 48 GPUs, highlighting need for updated compute

Status
Not open for further replies.
Mar 6, 2024
11
0
10
I know a fair bit about FPGAs and GPU programming. What I asked for was data. If you can't provide any recent data, then there's nothing more to discuss on the matter.


Not exactly what you're talking about, but a related example involving GPU processing of compressed data is how they support compressed textures!


Depends on the size of the tables, if using table-based compression. There are some decompression benchmarks where AMD's X3D processors smoke anything else, just because they happen to be able to fit the whole tables in their L3 cache. FPGAs don't usually have that much on-die memory.


You seem so sure of this, but you can't cite a single HPC paper or reference that indicates decompression is a bottleneck?
1) Regarding decompression: what I first stated was a memory wall. Decompression isn't itself the bottleneck. It's the memory latency and bandwidth.

Being able to do computation on compressed data is a huge win in overcoming the memory wall.
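
As a rough illustration of what "computation on compressed data" can mean, here is a minimal Python sketch using run-length encoding, one narrow case where a sum can be taken directly on the compressed form. The helper names (`rle_encode`, `rle_sum`) and the sample data are made up for this example and are not from any NASA code or library.

```python
def rle_encode(values):
    """Run-length encode a list into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]


def rle_sum(runs):
    """Sum the original data by touching only the compressed runs."""
    return sum(v * n for v, n in runs)


# Hypothetical sample data: long runs compress to just a few pairs.
samples = [0] * 1000 + [5] * 200 + [7] * 3
runs = rle_encode(samples)
compressed_total = rle_sum(runs)

print(len(samples), "values stored as", len(runs), "runs")
print("sum from compressed form:", compressed_total)
```

The sum reads a handful of pairs instead of every original value, which is the kind of memory-traffic saving being described. Most compression schemes do not allow this.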

I suspect the interesting middle comments on compression speeds and on textures are related to transforms and not computations.

We can transform data (compress it), but like a one-way cryptographic hash, the result is rarely computable.

In fact, we might say compressed data is like a one-way hash. It can be reversed only if you know the algorithm, but any compute applied to it will irreversibly lose information. In the same way, any computation on a hash irreversibly destroys the data: you can't hash 2 and 2, add the results, and get a value equal to the hash of 4, for instance.
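
The hash half of that analogy can be checked with a short Python sketch. It uses SHA-256 from the standard `hashlib` module. Hashing the decimal text of each number and adding the digests modulo 2^256 are choices made for this illustration only; the function name `sha256_int` is hypothetical.

```python
import hashlib


def sha256_int(n: int) -> int:
    """Hash the decimal text of n with SHA-256 and return the digest as an integer."""
    digest = hashlib.sha256(str(n).encode("ascii")).digest()
    return int.from_bytes(digest, "big")


MODULUS = 2 ** 256  # keep the sum inside the 256-bit digest range

sum_of_hashes = (sha256_int(2) + sha256_int(2)) % MODULUS
hash_of_sum = sha256_int(4)

# Addition does not carry through the hash.
hashes_add_up = sum_of_hashes == hash_of_sum
print("hash(2) + hash(2) == hash(4)?", hashes_add_up)
```

This only demonstrates the hash side. Whether a given compression format behaves the same way depends on the format.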


Anyway, when it comes to data about FPGAs, I'm simply confused about what exactly you're asking for. I wouldn't say there's a benchmark comparison or something.

So rather I'm logically articulating where and why GPUs are strong, and where and why FPGAs are strong. Or even other types of accelerators.

And then hypothesizing based on what I know about programming orbital computations why we would choose one accelerator over another.

I'd want a circuit that can best parallelize the data. But it won't be a bunch of vector transforms...

It won't be a bunch of independent variables all needing computation acted on them.

As stated before, orbits especially are very iterative. Iteration 1 must complete before iteration 2 begins.
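
To make the iteration dependency concrete, here is a minimal Python sketch of a two-body orbit propagator using a semi-implicit (symplectic) Euler step. The normalized units, step size, and function names are assumptions for illustration; real mission software uses far more elaborate force models and integrators.

```python
import math

MU = 1.0  # gravitational parameter in normalized units (assumption)


def step(state, dt):
    """Advance (x, y, vx, vy) by one time step of length dt."""
    x, y, vx, vy = state
    r3 = (x * x + y * y) ** 1.5
    # Update velocity from the acceleration at the current position...
    vx -= MU * x / r3 * dt
    vy -= MU * y / r3 * dt
    # ...then update position from the new velocity.
    x += vx * dt
    y += vy * dt
    return (x, y, vx, vy)


def propagate(state, dt, n_steps):
    """Each step consumes the previous step's output, so the loop is serial."""
    for _ in range(n_steps):
        state = step(state, dt)
    return state


start = (1.0, 0.0, 0.0, 1.0)  # circular orbit of radius 1 in these units
end = propagate(start, 0.001, 1000)
radius = math.hypot(end[0], end[1])
print("radius after 1000 steps:", radius)
```

The loop in `propagate` cannot be split across time steps because step k+1 needs the state produced by step k. Parallelism is still available across independent trajectories, just not along a single one.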
 
Mar 6, 2024
11
0
10
@bit_user

Previously I had a document on Pleiades and FPGA accelerators but I'm buried in over 140 tabs at the moment. Recovering that document which I used to make that claim is not a priority.

However what I did run across was an interesting statement by a NASA official of why they DO NOT want GPUs.


While he mostly talks about "legacy code," he also says a lot about how they just don't see much performance increase from GPU acceleration.

This lends support to my statement, which is born out of the logic of which principal applications the HEC program seems to service: mostly these iterative, data-dependent computations.

Also, the K40s are mentioned in this document, but elsewhere, in something I'll probably stumble across again, it says there are very few of those. There are many more FPGAs.

I often get the feeling there's this... GPU cult in the computing world. It acts as if FPGAs are nowhere to be found. They are everywhere and common.

Accelerators are hard things to pinpoint in HPCs. It's easier to talk only about the nodes. Nodes are just the "core" but not the interconnect, accelerators, or any of the other pieces of the supercomputer.

It's just the rack so to speak.

I'm frustrated I didn't keep open the tab I had about the Pleiades FPGA. I didn't realize what a gem I had found until I had to rediscover it a second time.
 

George³

Prominent
Oct 1, 2022
228
124
760
Source? Bribing a government official is a felony. You should have evidence of these allegations, or else don't make them.
Of course, it is not a statement, but an opinion. There can be no proof if it is a verbal agreement between the persons. But there is no other valid reason why anyone would insist on such an expense.
I'm curious as to why you insist on a source, when to have one, someone must have already taken the trouble to investigate on the ground and release the results of that investigation if they were able to obtain it. But the topic is too new, it is only now being discussed and there may not yet be any suspicion in the competent authorities. On the other hand, I have the freedom to suspect, based on many cases, the spending of budget money, which did not lead to the results desired by society, but which enriched the companies to which they were allocated.
 
Mar 6, 2024
11
0
10
@bit_user

Sorry if I seem all over the place but I'm not. I simply am not overly focused on the exact design of any one HEC computer.

Instead I've been trying to explain what I think is intuitive. That NASA architects don't think they need GPUs.

I have reinforced this belief further with the research paper here, on page 80:

https://www.google.com/url?sa=t&sou...wQFnoECBcQAQ&usg=AOvVaw1u_5My2AT-c7U0TDoCl_TW

You can read about how the two principal paradigms, compute-centric or data-centric, have huge impacts on the performance of any HPC.

I would then say, from the previous link I provided, where Thigpen explained they see better performance from network and interconnect improvements than from the addition of GPUs, that NASA favors a data-centric architecture.

I would add that this architecture makes sense for non-parallelized compute, as I've been stressing.

I might spend some time looking for the FPGAs in Pleiades, but I'm about to go to the jacuzzi, so I might drop off here.
 

bit_user

Polypheme
Ambassador
Of course, it is not a statement, but an opinion. There can be no proof if it is a verbal agreement between the persons.
An opinion is something like "software XYZ is well-designed". Alleging bribery is not an opinion. Whether bribery occurred is a factual matter.

But there is no other valid reason why anyone would insist on such an expense.
Nearly all of the Top 500 HPC installations have some amount of GPU compute. It would be weird if they didn't suggest equipping more GPU compute power!

I'm curious as to why you insist on a source,
Because you shouldn't spread allegations of conspiracy with zero evidence.

This falls in the category of a wild accusation, since what they cited (and it wasn't the main focus of the report, but a passing mention) is in line with standard practice. Furthermore, with GPU makers selling every HPC-grade GPU they can make, they wouldn't even have the incentive to take the risk.

Your claims make no sense and have no merit. There is nothing to discuss, here.
 

George³

Prominent
Oct 1, 2022
228
124
760
You seem unable to distinguish an opinion from a statement. I have decided not to visit this forum anymore so as not to argue with the wall.
 

bit_user

Polypheme
Ambassador
You seem unable to distinguish an opinion from a statement.
An opinion would be: "I don't believe NASA needs (more) GPU compute capacity. Therefore, I'm concerned about them saying they need more."

That's a legitimate point of view and leaves open the possibility that either they have needs you're not aware of (seems likely?) or that there is indeed some other explanation.
 

TechLurker

Reputable
Feb 6, 2020
176
103
4,760
Personally, I think it'd be kind of neat if they could at least update to EPYC systems, since they seem to be running CPU-heavy tasks. Maybe also work out deals with AMD's semi-custom solutions for a purpose-built system, like some of the supercomputers AMD has designed for clients. AMD also has Xilinx as part of their portfolio, so if certain specialty chips are required, AMD has that to integrate.
 
  • Like
Reactions: bit_user

shady28

Distinguished
Jan 29, 2007
432
305
19,090
I'd like an article on Space X's supercomputer infrastructure. It would make an interesting comparison.

This was my immediate thought. I seriously doubt that compute infrastructure is NASA's main problem. And I'd bet they already have an order of magnitude or more compute infrastructure performance compared to SpaceX. They just don't have the talent to use it. It's another Eloi and Morlock story. The Eloi are at NASA, the Morlocks moved to SpaceX.
 
The problem of a lack of talent in high-tech jobs is typical in all levels of government.

Even mediocre tech talent can get great salaries in the private sector making a government job unappealing.

Add in typical .gov bureaucracy, Federal purchasing rules, a rapidly changing field, and likely political lobbying, and it's amazing anything is accomplished.

I'd like an article on Space X's supercomputer infrastructure. It would make an interesting comparison.
The problem of lack of talent in high-tech jobs is universal, not just in the federal government. I work in the field as a system admin for a defense contractor, and I'll tell you even my colleagues are all mediocre (to say nothing of those I interact with outside of my company). Shockingly mediocre and disinterested in their jobs, for the most part.

The tech field has very few people working in it who are actually interested in it.
 

Steve Nord_

Prominent
Nov 7, 2022
56
7
535
NASA has been extremely screwed by congressional budgeting for years due to earmarking. Somewhere around half their budget is already spoken for due to mandatory programs. Back when Shelby was still a senator we had the building that was being worked on for the engines for the Constellation program that he forced to be completed despite the program being canceled and NASA not being able to use it for the engines for SLS. This sort of thing has dogged the agency for decades which is partly how we ended up with no shuttle replacement even in true planning before the program ended.

The ridiculous bureaucracy has undoubtedly caused a huge amount of problems getting/retaining talent. Who really wants to work on programs that are pointless, bad ideas or have a high likelihood of being retasked? The problem with computing here really seems like a situation of knowing there's no money to fix it, so they keep moving along.
Boy, that FPGA core AI for making the admin game pay has some answering to do.
 

icycool

Distinguished
Mar 2, 2013
8
0
18,510
IDNeon has some good points.

However the main takeaway from the article is that NASA is a mismanaged mess. Some might argue underfunded. Some might say a bloated bureaucracy. Stephen Baxter has some unkind words to offer.
SpaceX has done many things in a relatively short time that NASA has somehow never managed in decades.
 
Mar 17, 2024
1
1
10
I'm sure NASA realizes that these systems require upgrading, but acknowledging that won't provide the funds necessary to do so. Only Congress can do that.
 
  • Like
Reactions: bit_user

bit_user

Polypheme
Ambassador
SpaceX has done many things in a relatively short time that NASA has somehow never managed in decades.
NASA knows this and has specifically been encouraging development of the private sector, in this area. As has been pointed out, some of NASA's mismanagement extends all the way into how Congress treats it.

I'll bet most at NASA would prefer it focus just on space science. As far as I know, they have completely gotten out of the satellite-launch business and are even encouraging the private sector to get involved in deep space communications!

BTW, one notable thing no private sector operator has yet done is successfully land on another orbital body. They almost managed to land on the moon (with NASA's support), but not yet... Meanwhile, we should acknowledge NASA's extraordinary success rate at this extremely difficult task. I doubt non-engineers appreciate just how hard it is to build something so complex that has to work perfectly, the one and only time you try it! Just look at how many tries it takes SpaceX just to get a new rocket working reliably, and rockets are the "easy" part of an interplanetary mission!

I'm sure NASA realizes that these systems require upgrading, but acknowledging that won't provide the funds necessary to do so. Only Congress can do that
Yup. That's probably the main reason such reports are produced.
 
  • Like
Reactions: thestryker
SpaceX has done many things in a relatively short time that NASA has somehow never managed in decades.
Get back to me when SpaceX is required by their funding to adapt 4 engines designed to be reused, which cost over $140 million each, on each non-reusable rocket. That's just the most obvious example that leapt to mind; there are many more. It's absolutely impossible to get things done when the non-science/engineer folks want funding for businesses in their state and control the purse.

This is why NASA has been very successful with the rover program (minus that imperial/metric kerfuffle): the programs are run in-house rather than contracting out the whole thing.
 
  • Like
Reactions: bit_user

bit_user

Polypheme
Ambassador
This is why NASA has been very successful with the rover program (minus that imperial/metric kerfuffle) the programs are run in house rather than contracting the whole thing.
It's not just rovers, though. The James Webb Space Telescope is mind-bogglingly complex, considering not only the array of sensors, mirrors and optics, but also the whole unfolding and calibration procedure and its many-layered heat shield to block out the sun. If they hit a roadblock anywhere along the way, JWST would be dead in the water. It's too far out for any human mission to service it!

I wonder if we'll someday see even larger mirrors on a telescope capable of automatically replacing segments which receive damage from errant grains of rock and dust. JWST is already degrading a little faster than expected.

Tying this back to the article's subject matter, if you want to successfully land stuff on other bodies with atmospheres, and have robots capable of semi-autonomous operation when they get there: simulation, simulation, simulation! And running aerodynamics & robotics simulations requires a lot of compute resources.
 
  • Like
Reactions: thestryker