News: Linus Torvalds says RISC-V will make the same mistakes as Arm and x86

"from the hardware that you really have no idea how the hardware works"
In my experience that's just always the truth for a software developer, especially the ones that don't write the low-level stuff. If a SW dev writes Verilog/VHDL, the whole code is full of state machines; writing raw signal code with adders/multipliers/delays is something their mind can't seem to handle.
You also notice this with the newest people graduating with a degree in electronics: everybody is becoming more high-level minded, and knowledge of low-level things is getting lost.
 
  • Like
Reactions: mtrantalainen
"from the hardware that you really have no idea how the hardware works"
In my experience that's just always the truth for a software developer, especially the ones that don't write the low-level stuff. If a SW dev writes Verilog/VHDL, the whole code is full of state machines; writing raw signal code with adders/multipliers/delays is something their mind can't seem to handle.
You also notice this with the newest people graduating with a degree in electronics: everybody is becoming more high-level minded, and knowledge of low-level things is getting lost.
That's because they are totally different skills; hardware and software are very different, and a pure software engineer should not be writing Verilog or VHDL.

There is a very good reason that software developers can't write good HDL: they lack the background knowledge needed. A pure software developer has likely never learned digital logic, especially if they use a high-level programming language and never need to deal with bits directly. Unfortunately, a lot of software engineers think that if it is code then they can do it, but HDL isn't just learning another programming language. It is also mainly software developers who don't properly understand hardware themselves who create all the high-level synthesis languages that practically no one who knows what they are doing actually uses.

Software developers are not worse than hardware developers, as your comment seems to imply; they are just different skill sets with different background knowledge, for completing different tasks that happen to be interconnected.
 
  • Like
Reactions: mtrantalainen
a lot of software engineers think that if it is code then they can do it, but HDL isn't just learning another programming language.
I've definitely seen the reverse problem, where hardware engineers believe all of the hard problems are solved in the hardware domain and think software is trivial. These folks usually make a big mess. Not to say there aren't some hardware engineers who are competent software developers, but there's definitely an overconfidence problem with others.
 
  • Like
Reactions: mtrantalainen
A programmer may be writing at a high level, but if he doesn't know how the cache works, or whether a CPU register will be pushed to memory and pulled back into the CPU, the difference in performance is enormous.
The difference is noticeable only with very CPU-optimized code. But nowadays, who needs it?
99% of software development is done with high-level languages and frameworks with many abstraction layers.
Optimization is needed only partially, by kernel and driver developers.
 
  • Like
Reactions: Nikolay Mihaylov
The difference is noticeable only with very CPU-optimized code. But nowadays, who needs it?
99% of software development is done with high-level languages and frameworks with many abstraction layers.
Optimization is needed only partially, by kernel and driver developers.
My take is somewhat different. I'd say the problem of needing to understand the hardware is largely addressed by optimized libraries, languages, and compilers. If you need an optimized datastructure, chances of achieving better than the ones in standard libraries and language runtimes with your own hand-coded version are slim, and mostly only to the extent that you can tailor your implementation to the constraints of your specific use case.

I've definitely seen people get so carried away with loop-level code optimizations that they end up optimizing a bad datastructure with poor scalability. The wins you can get from clever code optimizations are usually just a few X, at best, while the wins from switching to scalable algorithms and datastructures can be orders of magnitude. Modern languages and libraries tend to be designed to make it easy to write code that scales well. We're finally exiting the dark ages of plain C code.
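To put a rough number on that, here's a toy Python sketch (my own made-up example, with arbitrary sizes): the same membership test against a list and against a set. The loop-level code is identical; only the data structure changes.

    # Toy comparison: the same membership test, first O(n) per probe, then O(1) average.
    import random
    import timeit

    items_list = list(range(100_000))
    items_set = set(items_list)
    probes = [random.randrange(200_000) for _ in range(1_000)]  # roughly half will be misses

    def scan_list():
        return sum(1 for p in probes if p in items_list)   # linear scan per probe

    def scan_set():
        return sum(1 for p in probes if p in items_set)    # hash lookup per probe

    print("list:", timeit.timeit(scan_list, number=1))
    print("set: ", timeit.timeit(scan_set, number=1))

The gap is typically hundreds of times and grows with the input size, which is the "orders of magnitude" kind of win; tightening the inner loop of scan_list would only ever buy a small constant factor.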
 
A programmer may be writing at a high level, but if he doesn't know how the cache works, or whether a CPU register will be pushed to memory and pulled back into the CPU, the difference in performance is enormous.
By the time you are writing java/python/c#/javascript this stuff is abstracted away 3 layers below you. And that's almost all of the software being written today. If you are thinking about how the cache or registers work at this level, you are wasting valuable time.
 
  • Like
Reactions: NinoPino
If you are thinking about how the cache or registers work at this level, you are wasting valuable time.
I'm not going to say it's never an issue, but a lot of these details are accounted for in best-practices guides you can find, which say things like "prefer placing variables on the stack instead of the heap", "avoid false sharing", etc.

A few years ago, there was an interesting "debate" between a Windows game programmer who was porting a game to Linux (for Google Stadia) and Linus Torvalds. The game programmer was complaining that the spinlock code that worked well on Windows performed badly on Linux. Linus had an epic reply, but it basically boiled down to the game code trying to outsmart the operating system and why that's a really bad idea.

TL;DR: going too low-level can hurt you!
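For anyone curious, the shape of that argument fits in a few lines of Python (a toy of my own, nothing to do with the actual game or kernel code): spinning on a lock burns a core while you wait, while a plain blocking acquire parks the thread and lets the OS decide who runs.

    # Toy contrast: busy-waiting on a lock vs. blocking on it.
    import threading
    import time

    lock = threading.Lock()

    def spin_acquire():
        while not lock.acquire(blocking=False):   # busy-wait: burns CPU the lock holder could use
            pass

    def block_acquire():
        lock.acquire()                            # parks in the OS until the lock is free

    def holder():
        with lock:
            time.sleep(0.5)                       # pretend to do work while holding the lock

    t = threading.Thread(target=holder)
    t.start()
    time.sleep(0.1)                               # give the holder time to grab the lock

    start = time.perf_counter()
    block_acquire()                               # swap in spin_acquire() and watch CPU usage
    lock.release()
    print("waited", round(time.perf_counter() - start, 3), "s")
    t.join()

Both versions wait about the same wall-clock time in this toy; the difference is what the waiting costs, which is roughly what Linus was getting at.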
 
Last edited:
  • Like
Reactions: NinoPino
Direct ISA (and its assemblers, etc.) vs. low-level vs. high-level just depends on your work and task. A skillful programmer can use any approach freely and adapt to the needs of the work at hand; however, such programmers end up working at the low level (or directly with the ISA) more than 90% of the time, since most programmers have only basic Python 101 knowledge and little more at best. High-level languages are inherently easier and less time-consuming to learn and use, so it is natural that the workforce demographic leans toward them.

Any relatively sizeable project has both low-level backend and high-level frontend programmers, since that is by far the best way to use the workforce currently available, and there is nothing wrong with that, except that it keeps raising the energy and hardware performance requirements for a stable, smooth runtime in a reasonable timeframe. This creates critically serious, currently unsolvable issues for the environment that keep worsening.

Outside of that, it has no other inherent issue, but personally I think that one is serious, and something the whole computing field should acknowledge and handle as it keeps increasing its share of the world's energy consumption.

New problems for a field require new approaches. A push for high-level languages (and their users) and tools to acknowledge more low-level concerns, together with hardware and low-level tools acknowledging how the high-level code above them uses them in performance-relevant tasks, are both key to handling this issue.
 
Last edited:
It's good to see many of the RISC-V implementers following SiFive's lead and using formal proof systems to guarantee everything. Then of course you imagine hardware gals meeting that with a separable set of slow-running gate-based dope charge, homodyne, spinor, regenerative synchronous network and greying logics to drop into semiconductor and metal; maybe that undoes enough of the logic it is fed to leak state, remap memory, and run initializers that fight type constructors. I think they'll log the warnings, if not make a graph universe of what they mean. Ah, the warning that AVX578 is suboptimal for odd-bitcount enjoyers... meh.
 
The difference is noticeable only with very CPU-optimized code. But nowadays, who needs it?
99% of software development is done with high-level languages and frameworks with many abstraction layers.
And that's why a new computer feels slower and less responsive than a primitive pre-Windows PC running DOS in the '90s.
 
  • Like
Reactions: NinoPino and Nyara
By the time you are writing java/python/c#/javascript this stuff is abstracted away 3 layers below you. And that's almost all of the software being written today. If you are thinking about how the cache or registers work at this level, you are wasting valuable time.
No. Go to Stack Exchange and look at questions about slow code written in Python, and you will frequently see solutions that run 100x or even 1000x faster.
Just reading a matrix by rows versus by columns makes an enormous difference in cache behavior, even in Python.
If a function call needs to fetch data from 10 different places, it is orders of magnitude slower than the same function getting all data from a single parameter.
Many complex functions can be reduced to a small number of binary operations.
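A minimal timing sketch of the rows-vs-columns point (numpy, with sizes picked arbitrarily by me):

    # In a C-ordered (row-major) array, row slices are contiguous in memory
    # and column slices are strided, so the second loop is much less cache-friendly.
    import numpy as np
    import timeit

    a = np.random.rand(4000, 4000)   # C-ordered by default

    def by_rows():
        return sum(a[i, :].sum() for i in range(a.shape[0]))

    def by_cols():
        return sum(a[:, j].sum() for j in range(a.shape[1]))

    print("rows:", timeit.timeit(by_rows, number=3))
    print("cols:", timeit.timeit(by_cols, number=3))

The gap is nowhere near 1000x in this particular toy, but it is plainly visible, and with pure-Python nested loops or larger working sets it only gets worse.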
 
Last edited:
  • Like
Reactions: NinoPino
And that's why a new computer feels slower and less responsive than a primitive pre-Windows PC running DOS in the '90s.
Yes and no. Features get layered on that didn't exist, back in those days. You can't directly compare today's software with back then.

One thing that happened in the 1990's was an unprecedented rise in single-threaded processing speeds. So, a program might be developed for a 386, but when you run it on a Pentium, it's suddenly like 10x as fast. If it was usably fast on the former machine, it's instantly responsive on the latter.
 
Just reading a matrix by rows versus by columns makes an enormous difference in cache behavior, even in Python.
That's a 10x difference, at most. And if you just transpose it first, then numpy should have an optimized transpose.
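To illustrate the transpose remark with a quick numpy sketch (my own example): a.T is just a view with swapped strides, so it costs essentially nothing, and if you really need contiguous access in the other order, one explicit copy pays the layout cost once instead of on every pass.

    import numpy as np

    a = np.random.rand(4000, 4000)

    t_view = a.T                          # O(1): same buffer, strides swapped, no copy
    print(t_view.base is a)               # True: it's a view of a

    t_copy = np.ascontiguousarray(a.T)    # one full copy, laid out row-major again
    print(t_copy.flags['C_CONTIGUOUS'])   # True: the former columns are now contiguous rows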

Yes, Python has some performance minefields. I would not hold it up as a good example of a modern language. It was never intended to do the heavy lifting, but merely to be a scripting language.

When I'm talking about modern languages and runtimes, I mean things more like recent versions of C++ and Rust.
 
Yes and no. Features get layered on that didn't exist, back in those days. You can't directly compare today's software with back then.

One thing that happened in the 1990's was an unprecedented rise in single-threaded processing speeds. So, a program might be developed for a 386, but when you run it on a Pentium, it's suddenly like 10x as fast. If it was usably fast on the former machine, it's instantly responsive on the latter.
There is a lot of software still doing the same tasks as 20 years ago, or with just a few extra features that do not warrant the change in runtime efficiency, and it can be compared. The change in resource and runtime efficiency today is several orders of magnitude for the worse. Every Android app is 90% APIs calling each other these days.

That's a 10x difference, at most. And if you just transpose it first, then numpy should have an optimized transpose.

Yes, Python has some performance minefields. I would not hold it up as a good example of a modern language. It was never intended to do the heavy lifting, but merely to be a scripting language.

When I'm talking about modern languages and runtimes, I mean things more like recent versions of C++ and Rust.
I agree that trying to outsmart the compiler with modern C++ or Rust is a bad idea. Both languages in their current form are already very machine-conscious, and most of what doesn't map directly to the machine is there for portability. So before reaching for Rust's unsafe mode or abstraction-free C++, make sure your task strictly requires it and that it will not compromise debugging or teamwork.

However, it is still relevant to know at a basic level how the compiler and the hardware work, so you can avoid performance pitfalls and cross-platform bugs.
 
My take is somewhat different. I'd say the problem of needing to understand the hardware is largely addressed by optimized libraries, languages, and compilers. If you need an optimized datastructure, chances of achieving better than the ones in standard libraries and language runtimes with your own hand-coded version are slim, and mostly only to the extent that you can tailor your implementation to the constraints of your specific use case.

I've definitely seen people get so carried away with loop-level code optimizations that they end up optimizing a bad datastructure with poor scalability. The wins you can get from clever code optimizations are usually just a few X, at best, while the wins from switching to scalable algorithms and datastructures can be orders of magnitude. Modern languages and libraries tend to be designed to make it easy to write code that scales well. We're finally exiting the dark ages of plain C code.
Agree on all, but the "dark ages" of C code were born from the hardware of the time. No mentally healthy programmer would have chosen C over Java (or whatever you like) to be productive.
The factor you do not consider in what you wrote is that, with the extremely powerful hardware we have today, nobody cares to choose the optimal way to code from a performance perspective; developers always choose the easiest (and often quickest) way to solve a problem. The result may be ten times slower, but who cares? Running in a browser constrained by network latency and bandwidth, or on a machine doing 10000 MIPS, the difference may be hardly noticeable.
 
  • Like
Reactions: vijosef
Agree on all, but the "dark ages" of C code were born from the hardware of the time.
Well, C was a big step forward, at the time. Computers of the era didn't have enough memory or fast enough CPUs to compile modern languages like Rust. One reason C remained so popular is that its runtime doesn't do stuff without you knowing about it. There's no garbage collector, no hidden function calls, no hidden heap allocation, etc. That made it good for when you wanted tight control over what's happening or when you have very little headroom.

The factor you do not consider in what you wrote is that, with the extremely powerful hardware we have today, nobody cares to choose the optimal way to code from a performance perspective;
It depends, of course. These days, programmers have more luxury not to care much about performance, but it's still quite possible to stumble into a pitfall or do something boneheaded. However, I think game programmers spend a lot of time on code optimization and obviously the AI frameworks are optimized to the gills. There are plenty of niches where people do still spend considerable time & effort on code optimization.

developers always choose the easiest (and often quickest) way to solve a problem. The result may be ten times slower, but who cares? Running in a browser constrained by network latency and bandwidth, or on a machine doing 10000 MIPS, the difference may be hardly noticeable.
I've used Electron apps, too. And sat there scratching my head at why Adobe Acrobat (PDF reader) somehow always seems a little sluggish, no matter how fast my PC, not to mention MS Office apps.

I think a lot of that is due to successive rounds of architects who looked at security, localization, portability, GUI-skinning, and other requirements and decided the easiest way to address the requirement du jour was by adding yet another layer. By the time you reach the poor application programmer, there's not always much they can do about it.

Now, I don't mean to suggest that modern software stacks couldn't be optimized. I'm sure there's a lot of room for improvement, if the incentive were there, without many compromises on the requirements. It's just a combination of factors (deadline pressure, laziness, ignorance, powerlessness) that conspired to create the situation we're in, not only that your average programmer doesn't know much about hardware.

Speaking of which, have you ever heard the phrase "knowing just enough to be dangerous"? Having a bit of hardware knowledge can potentially lead one down the path of premature optimization, which can make a big mess, without yielding much (if any) improvement.

When it comes to code optimization, always measure first! That way, you can make sure you're solving a real problem, and then quantify the effect of your changes to be certain you made a worthwhile (and positive!) change.

Beyond that, you can find guidelines and tips for writing efficient code. When it's easy and doesn't increase complexity to take the more efficient option, it should be a developer's default choice.
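In Python terms, "measure first" can be as small as this (a generic sketch; point it at whatever function you actually suspect):

    # Profile before optimizing, so you know where the time actually goes.
    import cProfile
    import pstats

    def suspect_function():                    # stand-in for the code you think is slow
        return sorted(str(i) for i in range(200_000))

    profiler = cProfile.Profile()
    profiler.enable()
    suspect_function()
    profiler.disable()

    pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)   # top 10 offenders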
 
Last edited:
  • Like
Reactions: NinoPino
No. Go to Stack Exchange and look at questions about slow code written in Python, and you will frequently see solutions that run 100x or even 1000x faster.
Just reading a matrix by rows versus by columns makes an enormous difference in cache behavior, even in Python.
If a function call needs to fetch data from 10 different places, it is orders of magnitude slower than the same function getting all data from a single parameter.
Many complex functions can be reduced to a small number of binary operations.
That type of software is a small percentage of the total.
The majority of development is done on business software, databases, UIs, web, mobile, and so on.
 
Torvalds is wrong here, probably also because he doesn't understand anything about chip design (yes, OK, I only co-developed an 8-bit MCU whose main core has just 2,300 transistors but is still Turing-complete).

All modern and powerful CPUs are internally VLIW CPUs, which is RISC scaled up to XXL, so to speak. Yes, even Intel and AMD work that way. You have an amd64 CPU? Then it is internally more RISC than CISC, thanks to VLIW translation.

Out-of-order is only possible with reorderable VLIW code. All hardwired CPUs are basically in-order and correspondingly inefficient.

Internally, the platform opcodes are translated into micro-ops; it is almost completely irrelevant which architecture is translated into which other architecture. IDT showed 25 years ago that it was possible to run x86, MIPS, and later even ARM code on their C6 CPUs purely by swapping the microcode (which in this case also translated the opcodes) on the same CPU, and with high performance!

In this respect, a manufacturer of decent x86/amd64/ppc/arm CPUs could simply take its VLIW core and adapt the translator for RISC-V. This is really extremely trivial, as it is practically only a matter of 1:n opcode mappings, with n normally between 1 and 8.

Patents are most likely to interfere. Anyone who acquires a patent for register renaming, for example, normally does so within a narrow framework and cannot simply use it on any platform.
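The 1:n mapping idea can be caricatured in a few lines of Python (a purely invented toy table, not any real ISA's decode logic):

    # Toy "translator": each architectural instruction expands into 1..n micro-ops.
    # Opcodes and micro-ops here are made up purely for illustration.
    DECODE_TABLE = {
        "ADD  r1, r2, r3": ["uop_add r1, r2, r3"],
        "LOAD r1, [r2+8]": ["uop_agen t0, r2, 8", "uop_load r1, t0"],
        "PUSH r1":         ["uop_sub  sp, sp, 8", "uop_store r1, sp"],
    }

    def translate(program):
        """Flatten architectural instructions into their micro-op sequences."""
        return [uop for insn in program for uop in DECODE_TABLE[insn]]

    for uop in translate(["LOAD r1, [r2+8]", "ADD  r1, r2, r3", "PUSH r1"]):
        print(uop)

Retargeting to another ISA then amounts to swapping the table; the real complications (flags, memory ordering, privileged state) live outside this toy.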
 
Torvalds is wrong here, probably also because he doesn't understand anything about chip design (yes, OK, I only co-developed an 8-bit MCU whose main core has just 2,300 transistors but is still Turing-complete).

All modern and powerful CPUs are internally VLIW CPUs, which is RISC scaled up to XXL, so to speak.
I don't want to debate your claims, other than to point out that Torvalds worked at a company called Transmeta, for a couple years. Look it up.

As for his comments, I took them to apply primarily to system-level architecture stuff - not so much microarchitecture. Building a fast core is one thing, but if you really want it to scale well and efficiently handle server-oriented workloads, that layers on an additional degree of complexity, with further challenges that need to be mastered.

a manufacturer of decent x86/amd64/ppc/arm CPUs could simply take its VLIW core and adapt the translator for RISC-V.
Point of interest: Nvidia allegedly did this for ARM ISAs with their Project Denver, which managed to live for two generations before apparently being snuffed. IIRC, it didn't perform that spectacularly, on general-purpose code.
 
However, I think game programmers spend a lot of time on code optimization and obviously the AI frameworks are optimized to the gills. There are plenty of niches where people do still spend considerable time & effort on code optimization.
Game engine developers spend some time optimizing, and AMD/Nvidia/Intel help with new features and drivers as well, but the rest of the game programming environment today rarely optimizes at all; the focus is on allowing the art team and direction to manage things relatively easily themselves, and companies will hire another artist over an extra programmer most of the time.

As for AI, using AI is employing a method of data analysis that is essentially brute-forcing data in the least optimized way possible to get an outcome, and the algorithms themselves are still a work in progress in terms of optimization (since performance is the biggest bottleneck at the moment).

Avoiding AI by actually knowing how to do something accurately will always be several orders of magnitude more efficient. AI is essentially brute-forcing through knowledge gaps when accuracy is not important.

AI by itself is the next step of high-level computing in a way, and that is not negative by itself; it is just that it should be used more for research than for running everything with it at runtime.

Our current hardware is not advanced enough to run it at runtime the way we are doing; we are brute-forcing scarce energy resources into it, when a human brain is several orders of magnitude more energy efficient for the same tasks; we are literal quantum computers.

At least AI is actually useful, unlike throwing away all that energy on inefficient cryptocurrency models. But the fact that our economic models do not scale the value of energy and natural resources to their real weight is extremely concerning; we are destroying the planet because of that.

And this is the real root of why programming disregards optimization: it only becomes a concern when the cost of hardware gets too inaccessible for the expected consumers. Energy not being free is also partially a limit, but at 10-30 times cheaper than it definitively should be, it allows a lot of headroom, to the point that hardware thermals are often the harder limit instead.
 
Last edited:
AI by itself is the next step of high-level computing in a way, and that is not negative by itself; it is just that it should be used more for research than for running everything with it at runtime.

Our current hardware is not advanced enough to run it at runtime the way we are doing; we are brute-forcing scarce energy resources into it, when a human brain is several orders of magnitude more energy efficient for the same tasks;
"AI" has two distinct advantages over humans. First, you can afford to spend like 10x as much money training a model to do something that you currently employ humans to do, because that trained model can be instantiated an arbitrarily large number of times, once its training is complete. You could theoretically take that one trained model and use it to replace 100 or 1000 human knowledge workers (assuming if it were nearly as good). You can't do that with humans. It takes about 2 decades and lots of money to raise and educate each human worker.

The other "cool trick" AI can do is spend all of its computation on just the task at hand. It doesn't need to eat, breath, focus and scan eyeballs over a screen, read and interpret the text, then translate what it wants to do into muscle movements to type them out on a keyboard and mouse. Instead, AI can basically just ingest and produce textual data, natively. People tend to miss this, when they're comparing our brain size & complexity to AI models. I don't know how much, but certainly most of our brain is just overhead, when it comes to knowledge work.

Furthermore, the amount of truly productive hours you can spend in a day is limited, while AI can run nearly 24/7. That further compensates for a difference in cost & capability.

the fact that our economic models do not scale the value of energy and natural resources to their real weight is extremely concerning; we are destroying the planet because of that.
Producing & sustaining so many humans is (so far) even more resource-intensive than AI!
; )
 
And Transmeta made some mistakes. The biggest were taking VC money and selling out to Intel (who subsequently destroyed their tech and all emails).