News SiFive Readies RISC-V Desktop PC For Devs, New CPUs with Vector Extensions

other than being open source, what are the benefits of RISC vs x86/ARM/etc...

i know in the past it was a better optimized instruction set, but i thought that was pretty much incorporated into x86.

can anyone enlighten me?
 
other than being open source, what are the benefits of RISC vs x86/ARM/etc...

ARM is a RISC design. Not so long ago ARM stood for Acorn RISC Machine, but now it is just ARM (a trademarked name).

Now on the topic of RISC vs CISC: the R in RISC stands for Reduced (as opposed to Complex in CISC), meaning that the number of instructions that can be executed (the instruction set) is significantly lower in RISC than in CISC. RISC instructions are also (usually) fixed length, compared to variable length in CISC. This makes the circuitry needed to execute the instruction set significantly simpler (in terms of transistor budget), which results in fewer traces, less power needed, better efficiency, etc.
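
To make the fixed-length vs variable-length point concrete, here is a minimal Python sketch. The variable-length encoding is made up for illustration (the low 2 bits of an instruction's first byte give its length); it is not how x86 or any real ISA encodes lengths. The point is only that with fixed-width instructions every start offset is known in advance, while with variable-width ones each instruction has to be partly decoded before the next one can even be located.

```python
def fixed_width_boundaries(code, width=4):
    """Start offsets when every instruction is `width` bytes long."""
    # Every offset is known up front, so a decoder could examine many
    # instructions in parallel without looking at their contents.
    return list(range(0, len(code), width))


def variable_width_boundaries(code):
    """Start offsets for a hypothetical encoding: the low 2 bits of the
    first byte of each instruction give its length minus one (1-4 bytes)."""
    offsets = []
    pos = 0
    while pos < len(code):
        offsets.append(pos)
        length = (code[pos] & 0b11) + 1  # this byte must be decoded first
        pos += length                    # only now is the next start known
    return offsets


# Eight arbitrary bytes, interpreted under both schemes.
program = bytes([0x00, 0x01, 0xAA, 0x03, 0xBB, 0xCC, 0xDD, 0x00])
print(fixed_width_boundaries(program))     # [0, 4]
print(variable_width_boundaries(program))  # [0, 1, 3, 7]
```

The sequential dependency in the second function is the part that costs extra decode hardware when it has to be done several instructions per cycle.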

The chip needed to run RISC is significantly smaller (in terms of transistor budget) than one for CISC, and needs far less power to operate. The catch is that, because the number of instructions is limited, it has to take more steps (cycles) than CISC to perform a complex calculation. For example, if a particular computation needs only 3 cycles on CISC, the same computation may need 15-25 cycles on RISC, because the RISC machine has to build that complex computation out of several simple ones.

In general, if the load is 70:30 (simple:complex) computation, a RISC machine at the same clock speed is 30% slower than a CISC machine but needs much less power to operate. This is why RISC is widely adopted in portable/mobile devices and devices that do not need complex computation. This of course assumes that everything else is equal: no optimization, no specific circuits, etc.
 
Yet to see any performance figures from these new (alternative) CPUs. Can they even compete with the current crop of CPUs, in terms of performance (to be useable)? Another issue is development and software ecosystem. 🤔
 
what type of workload would be considered simple vs complex? the article speaks of AVX type loads being added. wonder what else would be considered complex vs simple calculations.

seems like some sort of hybrid chip (which are starting to see the light of day) would be the eventual way to go. whatever load is present could be sent to the right cores for the job. with the non-optimized cores taking overload work if they are not already doing something else. but that's just a wild guess on my part.
 
The difference between CISC and RISC is actually the structure of the instruction itself. 😉

Typically a single RISC instruction consists of several simple sub-instructions (micro operations) that the CPU core can execute directly. The efficiency of RISC is that one single "instruction" can do multiple integer and arithmetic operations together. That is why RISC can execute more operations than CISC per single instruction cycle or per clock cycle... 🤠

On the other hand, a single CISC instruction needs to be translated into sub-instructions (micro operations) that can be executed within the CPU core. Usually one CISC instruction can only do a single integer or arithmetic operation. The addition of much more complex (multiple operations per instruction) SIMD-style and vector instructions helps alleviate some of these shortcomings. AVX2 and AVX512 are the latest additions... 🤓

AVX-style instructions are typically CISC-type and need to be translated.
 
other than being open source, what are the benefits of RISC vs x86/ARM/etc...
Careful - RISC and RISC-V are very different things.

RISC is a CPU design approach first popularized in the 1980s, while RISC-V is a RISC-like CPU "instruction set architecture" (ISA) which was developed by an academic + industry consortium and introduced in 2010.

RISC-V is not open source, as misstated by the article. Rather, it is open and royalty-free (meaning anyone can design their own RISC-V CPU without any sort of license). By contrast, ARM is open and non-free (meaning anyone can implement it, but you have to pay ARM for the privilege), and x86 is proprietary (meaning Intel will not let any newcomers start making x86 CPUs).

The main advantage of RISC-V is simply that it's newer and better-matched to current semiconductor manufacturing capabilities and toolchains. x86 is a creature of its day, and keeps getting pumped up with botox and plastic surgery to appear fun and exciting, but at a phenomenal cost in terms of engineering resources and power-efficiency. In the end, it's fighting a losing battle.
 
Typically a single RISC instruction consists of several simple sub-instructions (micro operations) that the CPU core can execute directly. The efficiency of RISC is that one single "instruction" can do multiple integer and arithmetic operations together. That is why RISC can execute more operations than CISC per single instruction cycle or per clock cycle... 🤠

On the other hand, a single CISC instruction needs to be translated into sub-instructions (micro operations) that can be executed within the CPU core. Usually one CISC instruction can only do a single integer or arithmetic operation.
I think you've got that backwards. The R in RISC is for "Reduced", while the "C" in CISC is for "Complex".

A typical example would be how a CISC instruction can combine the instruction's core function (let's say dividing one number by another) with address arithmetic and a load and/or store, in case a source or destination was in memory instead of a register.

Classical RISC instructions would operate register-to-register. If you wanted a load or store, that was a separate instruction. And if you needed to do any address arithmetic (such as adding a constant offset), that would be another instruction - just a normal addition instruction. The idea was that by making the instructions simpler, they could be implemented more efficiently and you saved silicon budget for optimizations like pipelining or superscalar execution. They would also tend to clock higher. The theory was that the net effect would be higher throughput, in spite of the need to execute more instructions to do the same work.
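
That trade-off is usually summed up by the classic processor performance equation: execution time = instruction count x cycles per instruction / clock frequency. A rough Python sketch, with numbers that are entirely made up to illustrate the argument (they are not measurements of any real CPU):

```python
def execution_time_ns(instructions, cycles_per_instruction, clock_ghz):
    """Classic performance equation.

    A clock of N GHz delivers N cycles per nanosecond, so
    instructions * CPI / GHz comes out in nanoseconds.
    """
    return instructions * cycles_per_instruction / clock_ghz


# Hypothetical workload: the RISC-style machine executes 30% more
# instructions, but each one takes fewer cycles and the clock is faster.
cisc_ns = execution_time_ns(instructions=1_000_000,
                            cycles_per_instruction=4.0, clock_ghz=1.0)
risc_ns = execution_time_ns(instructions=1_300_000,
                            cycles_per_instruction=1.5, clock_ghz=1.25)

print(f"CISC-style: {cisc_ns:,.0f} ns")  # CISC-style: 4,000,000 ns
print(f"RISC-style: {risc_ns:,.0f} ns")  # RISC-style: 1,560,000 ns
```

Whether the RISC side actually wins depends entirely on the three factors you plug in, which is why the debate was never settled by theory alone.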
 
Yet to see any performance figures from these new (alternative) CPUs.
Here are some benchmarks from their earlier generation, compared with the ARM-based Nvidia Tegra TX2. It's not pretty, but I think the point was more to have a functional software development platform than a performance-competitive core:

https://www.phoronix.com/scan.php?page=news_item&px=SiFive-RISC-V-Initial-Benchmark

Can they even compete with the current crop of CPUs, in terms of performance (to be useable)?
Performance will take a while, and needs someone with extremely deep pockets to invest BIG in a high-performance core. Just look at how long it's taken ARM to reach a competitive stance vs. x86!

FWIW, I think Alibaba's RISC-V design is a fairly wide superscalar core.


Another issue is development and software ecosystem. 🤔
Right now, Linux seems well-supported. And with Linux support, you can build most open source software for it. So, for embedded applications and certain cloud uses, it could be reaching viability.
 
The main advantage of RISC-V is simply that it's newer and better-matched to current semiconductor manufacturing capabilities and toolchains. x86 is a creature of its day, and keeps getting pumped up with botox and plastic surgery to appear fun and exciting, but at a phenomenal cost in terms of engineering resources and power-efficiency. In the end, it's fighting a losing battle.

that makes sense. a new way of looking at chip design vs band-aid updates to keep an old platform current. never thought of it that way. but like ipv4 vs 6 even newer, better optimized and designed for modern uses is not enough most of the time to get us to move forward. too much invested in the old tech to move happily. it would take a good leap in performance past x86 to get us to move from it for sure. that much i know is true.
 
it would take a good leap in performance past x86 to get us to move from it for sure. that much i know is true.
Well, perhaps you've heard that Apple is ditching x86 for their own ARM designs? ARM is another RISC-like CPU instruction set architecture (ISA). So, it's very possible.

To fully appreciate how much efficiency x86 is leaving on the table, check this out:
N1.png

(Source: https://www.anandtech.com/show/15967/nuvia-phoenix-targets-50-st-performance-over-zen-2-for-only-33-power )

Even Intel's 10 nm Ice Lake CPU can't touch the ARM chips in efficiency, and only just matches Apple's A13 in performance.

As far as I know, there's no fundamental reason why someone couldn't squeeze nearly comparable levels of performance and efficiency out of a RISC-V design, but it will take much time and money to get there.
 
very interesting. learned something new today. i knew they were very energy efficient (hence the mobile love for them) but did not realize they were approaching such performance abilities.

here's to hoping they get the funding they need to make it better. would be nice to get some more competition in the desktop cpu world.

i recall reading about apple dumping x86 but figured it was mostly a desire to keep all tech in house if possible to avoid anyone else being able to mess with their ability to function. think everyone in the tech industry has butted up against copyright issues in one way or another. to be totally self sustained for hardware is a great way to stay out of the crossfire of copyright wars.
 
I think you've got that backwards. The R in RISC is for "Reduced", while the "C" in CISC is for "Complex".

A typical example would be how a CISC instruction can combine the instruction's core function (let's say dividing one number by another) with address arithmetic and a load and/or store, in case a source or destination was in memory instead of a register.

Classical RISC instructions would operate register-to-register. If you wanted a load or store, that was a separate instruction. And if you needed to do any address arithmetic (such as adding a constant offset), that would be another instruction - just a normal addition instruction. The idea was that by making the instructions simpler, they could be implemented more efficiently and you saved silicon budget for optimizations like pipelining or superscalar execution. They would also tend to clock higher. The theory was that the net effect would be higher throughput, in spite of the need to execute more instructions to do the same work.
That is incorrect. RISC is not defined by register-to-register operations, as those can be done with CISC as well (with MOVs). Address generation in CISC is also handled automatically. 😛

The actual reason it's "Reduced" is that those sub-instructions (micro operations) are only a few bits wide (meaning there can be as few as 8 to 16 primitive instructions, just adequate for internal CPU execution) within the main instruction body itself. RISC can use this very small set of sub-instructions (micro operations) in various combinations to generate a single CISC-like operation or to generate multiple operations (as mentioned). 😉

On the other hand, CISC usually has more instructions, with opcodes ranging from 8 bits wide (256 instructions) to 32 bits wide (up to 4 billion instructions). Hence "Complex", as the whole large instruction set can consist of highly complex CPU instructions (including memory block operations, SIMD, vector, etc.). And each of these CISC instructions will be translated into RISC-like micro-operations internally within the CPU core. 🤓

This means a simple memory operation with one single CISC instruction will use several sub-instructions within a single RISC instruction, just for example... 🤖
 
i recall reading about apple dumping x86 but figured it was mostly a desire to keep all tech in house if possible to avoid anyone else being able to mess with their ability to function.
Funny thing is that they implemented an ARM chip, which relies on patents and standards set by ARM. With Nvidia now owning ARM, Apple is forced to deal with Nvidia - a supplier they'd long-ago banned from their products.

So, if Apple hadn't invested so much in ARM, by this point, perhaps they'd delay the transition and go for RISC-V to be truly independent of anyone. However, I think it's too late for that, and they now are going to have to see how things play out with Nvidia.
 
That is incorrect. RISC is not defined by register-to-register operations, as those can be done with CISC as well (with MOVs).
I never said a CISC CPU couldn't use all-register operands, just that a RISC CPU would (for instance) have that restriction.

From Wikipedia:

The RISC designs, on the other hand, included only a single flavour of any particular instruction, the ADD, for instance, would always use registers for all operands.

The actual reason it's "Reduced" is that those sub-instructions (micro operations) are only a few bits wide (meaning there can be as few as 8 to 16 primitive instructions, just adequate for internal CPU execution) within the main instruction body itself. RISC can use this very small set of sub-instructions (micro operations) in various combinations to generate a single CISC-like operation or to generate multiple operations (as mentioned). 😉
Can you cite any references on that? Again, it sounds like you're mixing up CISC and RISC, with a dash of VLIW thrown in there. VLIW is the one where they pack multiple instructions together.
 
Can you cite any references on that? Again, it sounds like you're mixing up CISC and RISC, with a dash of VLIW thrown in there. VLIW is the one where they pack multiple instructions together.
Have you seen what RISC instructions look like in bits? In most RISC implementations, a single instruction has groups of bits (that "few bits wide"), with each group being a sub-instruction assigned to a certain micro operation within the CPU core execution unit itself. This very minimal number of instructions per group is how the term "reduced" came about. These sub-instructions are executed directly in the CPU core (no translation). This was the original design and structure of RISC. 😉

But as CPU technology progressed, with the need for more complex arithmetic, memory and floating point operations to enable more performance and computing features, this minimal set of sub-instructions has been expanded quite a lot (to the point that it's now closer to CISC, with some translation required for highly complex operations, particularly those involving multiple simultaneous data operations such as matrix calculations). 🤓

VLIW consists of a sequence of multiple instructions that are fused together and executed as a single operation (for increased parallelism); each instruction can be either RISC or CISC type (or even a mixture of SIMD and vector instructions). 🤠
 
Long time since I posted but it seems it's needed again.

Firstly there is no such thing as CISC, there is RISC and not-RISC, or specifically designs that follow the RISC philosophy and designs that don't. The RISC design philosophy is simply that all instructions are the exact same size and take exactly one cycle to execute. Those two concepts, when taken together, enable much simpler pipelining and scheduling. Now not everyone followed those requirements exactly, but the idea is generally followed by every design these days.

About x86, there hasn't been a pure x86 CPU designed in decades now. Intel and AMD CPUs are internally RISC CPUs that use an instruction decoder to convert x86 instructions into Load/Store RISC instructions, schedule those instructions on internal ALU/AGU/MMU units, then return the result to the CPU stack for the next instruction to use. Intel's current uArch has 4 ALUs per core along with a bunch of other special purpose units.

https://en.wikichip.org/wiki/intel/microarchitectures/coffee_lake

In the RISC world an execution instruction can only go register to register; we need to load (read) our values into registers from memory and store (write) the output back to memory. x86 on the other hand allows memory addresses to be used as a source for many instructions, requiring several cycles to execute the instruction.

ADD AX C800H (add the value at memory address C800H to the AX register and store the result in AX)
MOV 4000H AX (copy the result to memory address 4000H)

that would translate into
LOAD R1 C800H
ADD R2 R1 (R2 stands in for AX)
STORE R2 4000H

This can also be combined as,
ADD 4000H C800H (add the value at C800H to the value at 4000H and store at 4000H)

that's
LOAD R1 4000H
LOAD R2 C800H
ADD R1 R2
STORE R1 4000H
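
As a rough illustration of the load/store idea, here is a toy Python interpreter for the memory-to-memory ADD described above (add the value at C800H to the value at 4000H and store the result at 4000H). The opcode names follow the pseudo-assembly in this post; the machine itself, its register names and the starting memory values are made up for the example and do not correspond to any real ISA.

```python
def run(program, registers, memory):
    """Execute (opcode, operand, operand) tuples on a toy load/store
    machine. Only LOAD and STORE are allowed to touch memory."""
    for op, a, b in program:
        if op == "LOAD":      # LOAD reg, addr  -> reg = memory[addr]
            registers[a] = memory[b]
        elif op == "ADD":     # ADD dst, src    -> dst = dst + src
            registers[a] += registers[b]
        elif op == "STORE":   # STORE reg, addr -> memory[addr] = reg
            memory[b] = registers[a]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return registers, memory


memory = {0x4000: 10, 0xC800: 32}   # made-up starting values
registers = {"R1": 0, "R2": 0}

program = [
    ("LOAD", "R1", 0x4000),
    ("LOAD", "R2", 0xC800),
    ("ADD", "R1", "R2"),
    ("STORE", "R1", 0x4000),
]

run(program, registers, memory)
print(memory[0x4000])  # 42
```

Note that the ADD branch never indexes `memory`, which is the whole restriction in a nutshell.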

Back when memory was super expensive saving space by having large multi-purpose instructions was thought to be important. These instructions would take up less space in memory and require less machine code to execute. When memory got much cheaper the RISC design philosophy came about as a way to reduce the complexity of the instruction scheduler on the CPU.

Agner has a list of how long each instruction type takes on Intel and AMD architectures.

https://www.agner.org/optimize/instruction_tables.pdf

We can see that most register-to-register instructions unsurprisingly take 1 CPU cycle to execute, while those involving memory addresses take a few more, as they first need to be broken into separate read and write uOps.
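
A toy sketch of that "breaking into uOps" step, in Python. This is only an illustration of the idea: the `crack` function, the bracket notation for memory operands and the temporary registers T0/T1 are all hypothetical, and real Intel/AMD decoders are far more involved (and real x86 does not allow two memory operands on one ADD).

```python
def crack(dst, src):
    """Split a two-operand ADD into load/store-style micro-ops.

    Operands are register names like "AX" or memory references written
    as "[C800H]". T0 and T1 are hypothetical internal temporaries.
    """
    def is_mem(operand):
        return operand.startswith("[") and operand.endswith("]")

    uops = []
    a, b = dst, src
    if is_mem(dst):                      # read-modify-write destination
        uops.append(("LOAD", "T0", dst))
        a = "T0"
    if is_mem(src):                      # memory source needs a load first
        uops.append(("LOAD", "T1", src))
        b = "T1"
    uops.append(("ADD", a, b))           # the actual arithmetic
    if is_mem(dst):
        uops.append(("STORE", a, dst))   # write the result back
    return uops


print(len(crack("AX", "BX")))            # 1
print(len(crack("AX", "[C800H]")))       # 2
print(len(crack("[4000H]", "[C800H]")))  # 4
```

The register-to-register form stays a single micro-op, while each memory operand adds a load and, for a memory destination, a store on top.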
 
Firstly there is no such thing as CISC, there is RISC and not-RISC, or specifically designs that follow the RISC philosophy and designs that don't. The RISC design philosophy is simply that all instructions are the exact same size and take exactly one cycle to execute. Those two concepts, when taken together, enable much simpler pipelining and scheduling. Now not everyone followed those requirements exactly, but the idea is generally followed by every design these days.
Because each set of sub-instructions is assigned to a micro operation directly within the CPU execution unit. Depending on the design of the RISC CPU core architecture (especially the super wide ones with multiple ALUs), multiple sub-instructions allow several micro operations to be executed simultaneously within either a CPU execution cycle or one clock cycle. This is why RISC is highly efficient per clock or per CPU execution cycle. That is not always the case, though; it still depends on the possible combinations of those sub-instructions and whether there are dependencies (some cannot be executed in the same cycle but require the next one). 🤓

About x86, there hasn't been a pure x86 CPU designed in decades now. Intel and AMD CPUs are internally RISC CPUs that use an instruction decoder to convert x86 instructions into Load/Store RISC instructions, schedule those instructions on internal ALU/AGU/MMU units, then return the result to the CPU stack for the next instruction to use. Intel's current uArch has 4 ALUs per core along with a bunch of other special purpose units.
Already mentioned this earlier
On the other hand, CISC usually has more instructions, with opcodes ranging from 8 bits wide (256 instructions) to 32 bits wide (up to 4 billion instructions). Hence "Complex", as the whole large instruction set can consist of highly complex CPU instructions (including memory block operations, SIMD, vector, etc.). And each of these CISC instructions will be translated into RISC-like micro-operations internally within the CPU core.
😉

In the RISC world an execution instruction can only go register to register; we need to load (read) our values into registers from memory and store (write) the output back to memory. x86 on the other hand allows memory addresses to be used as a source for many instructions, requiring several cycles to execute the instruction.
Depending on the RISC implementation, memory operations can be executed within a single RISC instruction (using multiple sub-instructions within the instruction itself). As already mentioned earlier..
The actual reason it's "Reduced" is that those sub-instructions (micro operations) are only a few bits wide (meaning there can be as few as 8 to 16 primitive instructions, just adequate for internal CPU execution) within the main instruction body itself. RISC can use this very small set of sub-instructions (micro operations) in various combinations to generate a single CISC-like operation or to generate multiple operations (as mentioned). 😉
And also..
This means a simple memory operation with one single CISC instruction will use several sub-instructions within a single RISC instruction, just for example...

When it comes to SIMD and AVX instructions, imagine the (large) number of complex combinations of MOV, MULT and ADD sub-instructions required in traditional RISC (some of which cannot always be executed simultaneously). That also uses up lots of instruction cycles. Hence these SIMD and AVX operations have to follow the CISC route (where all the required micro operations are already preset for the operation). 🤠
 
Funny thing is that they implemented an ARM chip, which relies on patents and standards set by ARM. With Nvidia now owning ARM, Apple is forced to deal with Nvidia - a supplier they'd long-ago banned from their products.

So, if Apple hadn't invested so much in ARM, by this point, perhaps they'd delay the transition and go for RISC-V to be truly independent of anyone. However, I think it's too late for that, and they now are going to have to see how things play out with Nvidia.

Ah, but Nvidia doesn't own ARM yet, and hopefully they never will. Also, that's why RISC-V is seeing such a surge in development. Nvidia hasn't exactly been the best partner to work with or be bought by, and their history speaks for itself.