News Loongson Technology Develops Its Own CPU Instruction Set Architecture

Admin · Apr 16, 2021

LoongArch to power next-generation Loongson CPUs.

setx · Apr 16, 2021

Using RISC-V would be a nice move.
Inventing their own ISA is only going to waste software development efforts.

Pyrostemplar · Apr 16, 2021

However, only time will tell whether the company can actually develop a competitive ecosystem for its LoongArch.

As if it mattered in China. The Party just needs to make it mandatory.

hotaru.hino · Apr 16, 2021

setx said:
Using RISC-V would be a nice move.
Inventing their own ISA is only going to waste software development efforts.

It's probably based on some existing ISA anyway, and they're just claiming it's some "brand new tech that will put the West to shame".

Also, it's China. They're willing to go through the trouble of shifting map coordinates around so satellite imagery doesn't match up with road maps for "national security".

shady28 · Apr 16, 2021

"...it had developed its own CPU instruction set architecture (ISA), Loongson Architecture (LoongArch) that has nothing to do with architectures designed outside of China. "

And we're supposed to believe that, why?

InvalidError · Apr 16, 2021

shady28 said:
And we're supposed to believe that, why?

Putting together your own ISA is easy enough; the really tedious part is figuring out which instructions are most useful and performance-critical, most of which is achievable by profiling software on existing architectures. The rest is mainly a matter of coming up with your own names, an efficient operation encoding scheme and operand packing formats, then plugging all of that into a cross-compiler such as a customized GCC to spit out native binaries.

An ISA is just the instruction set with supporting architectural features such as the registers and flags instructions can access, it isn't tied to any particular implementation in software emulators, re-compilers, FPGA implementations, silicon, etc. Anyone can write a minimalist one on a napkin if they wanted to, that isn't the hard part.
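To make the "napkin ISA" point concrete, here is a minimal sketch of one in Python. Everything in it is hypothetical and invented for illustration: the four opcodes (`LI`, `ADD`, `BNEZ`, `HALT`), the four registers and the tuple encoding are assumptions, not anything taken from LoongArch or MIPS. The ISA is only the set of definitions; the interpreter is just one possible implementation of it.

```python
# Hypothetical four-instruction ISA, invented purely for illustration.
# Architectural state: four registers r0..r3 and a program counter.

def run(program, max_steps=1000):
    """Interpret a list of (opcode, a, b) tuples and return the registers."""
    regs = [0, 0, 0, 0]
    pc = 0
    steps = 0
    while pc < len(program) and steps < max_steps:
        op, a, b = program[pc]
        pc += 1
        steps += 1
        if op == "LI":        # load immediate: regs[a] = b
            regs[a] = b
        elif op == "ADD":     # regs[a] = regs[a] + regs[b]
            regs[a] += regs[b]
        elif op == "BNEZ":    # jump to instruction index b if regs[a] != 0
            if regs[a] != 0:
                pc = b
        elif op == "HALT":    # stop execution
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs

# Sum 5 + 4 + 3 + 2 + 1 into r0, counting r1 down to zero.
program = [
    ("LI", 0, 0),     # r0 = 0 (accumulator)
    ("LI", 1, 5),     # r1 = 5 (loop counter)
    ("LI", 2, -1),    # r2 = -1 (decrement)
    ("ADD", 0, 1),    # r0 += r1
    ("ADD", 1, 2),    # r1 += r2, i.e. r1 -= 1
    ("BNEZ", 1, 3),   # loop back to index 3 while r1 != 0
    ("HALT", 0, 0),
]
result = run(program)
print(result)
```

The same definitions could just as well be implemented in an FPGA or in silicon; nothing in the instruction list itself depends on the interpreter above.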

shady28 · Apr 18, 2021

InvalidError said:
Putting together your own ISA is easy enough; the really tedious part is figuring out which instructions are most useful and performance-critical, most of which is achievable by profiling software on existing architectures. The rest is mainly a matter of coming up with your own names, an efficient operation encoding scheme and operand packing formats, then plugging all of that into a cross-compiler such as a customized GCC to spit out native binaries.

An ISA is just the instruction set with supporting architectural features such as the registers and flags instructions can access, it isn't tied to any particular implementation in software emulators, re-compilers, FPGA implementations, silicon, etc. Anyone can write a minimalist one on a napkin if they wanted to, that isn't the hard part.

You just spewed a bunch of garbage, you clearly don't know what an ISA is in this context.

Meanwhile "LoongArch" is turning out to be a modified MIPS64 ISA where they've merely renamed some of the instructions.

https://www.realworldtech.com/forum/?threadid=201588&curpostid=201593

By: Ariadne Conill (ariadne.delete@this.dereferenced.org), April 16, 2021 10:53 pm

"
I did some digging into this, as I maintain the MIPS64 port in Alpine, and was planning to target Loongson at some point.

From what I can tell, LoongArch is just a fork of the MIPS ISA, in a similar way to how MIPS32r5 and MIPS64r6 are backward-incompatible forks of the MIPS ISA. I conclude this based on the fact that LoongArch offers the same extensions that MIPS CPUs do, just with slightly different names, as can be seen in this translated press release. For example, what MIPS CPUs call the Virtualization Extension (VZ), LoongArch calls the LoongArch Virtualization Extension (LVZ). Another example is that the MIPS SIMD instructions (MSA) are renamed to LoongArch Vector Extension (LSX).

Specifically, I believe LoongArch to be a fork of MIPS64r6. "

InvalidError · Apr 18, 2021

shady28 said:
You just spewed a bunch of garbage, you clearly don't know what an ISA is in this context.

There is no 'in this context' here; an ISA is, by definition, just a set of definitions. If China wants to rip off the open-source MIPS ISA and spare itself the trouble of putting its own together from scratch, more power to them.

ginthegit · Apr 19, 2021

shady28 said:
You just spewed a bunch of garbage, you clearly don't know what an ISA is in this context.

Meanwhile "LoongArch" is turning out to be a modified MIPS64 ISA where they've merely renamed some of the instructions.

https://www.realworldtech.com/forum/?threadid=201588&curpostid=201593

By: Ariadne Conill (ariadne.delete@this.dereferenced.org), April 16, 2021 10:53 pm

"
I did some digging into this, as I maintain the MIPS64 port in Alpine, and was planning to target Loongson at some point.

From what I can tell, LoongArch is just a fork of the MIPS ISA, in a similar way to how MIPS32r5 and MIPS64r6 are backward-incompatible forks of the MIPS ISA. I conclude this based on the fact that LoongArch offers the same extensions that MIPS CPUs do, just with slightly different names, as can be seen in this translated press release. For example, what MIPS CPUs call the Virtualization Extension (VZ), LoongArch calls the LoongArch Virtualization Extension (LVZ). Another example is that the MIPS SIMD instructions (MSA) are renamed to LoongArch Vector Extension (LSX).

Specifically, I believe LoongArch to be a fork of MIPS64r6. "

From what I am reading, I think Titan's understanding of the ISA is actually accurate, considering that x86 and x64, along with the RISC architectures, are essentially the same thing, just with different register access and optimised coding for the CISC.
As an engineer who can write Assembler code directly (albeit simple code), I find that the old ISA is weak and slow and needs optimisation. For example, the MOV instruction is largely pointless, as it is just a simple instruction to load a register or offload a register to some other entry, and is largely for structural and internal coding. Because MOV is required and written so many times within each instruction (for any high level language like C and Java), it can be incorporated (as is done with CISC instructions) and written out of the coding, as it is implied whenever any other function like ADD or an IF statement is used, and it can always be assumed that the working register is AX. It means that a new ISA can completely disregard it as a call, and each instruction is hardwired.
I have thought for a long time that there are many long-awaited changes to the ISA that would make computers quicker and more efficient (in the same way that DirectX 12 did compared to all its predecessors).

setx · Apr 19, 2021

InvalidError's position on the ISA is correct: it's trivial to create a new ISA. But why does almost no one do that? Because an ISA without software for it is completely useless.

It looks like they want to maintain some compatibility with MIPS with more extensibility:

Instruction encoding is totally different than MIPS (more instruction formats) but (almost) all MIPS instructions can be translated to exactly one LoongArch instruction.

So nothing outstanding from a technical standpoint: not a breakthrough, but also not reinventing the wheel.

ginthegit · Apr 20, 2021

setx said:
InvalidError's position on the ISA is correct: it's trivial to create a new ISA. But why does almost no one do that? Because an ISA without software for it is completely useless.

It looks like they want to maintain some compatibility with MIPS with more extensibility:

So nothing outstanding from a technical standpoint: not a breakthrough, but also not reinventing the wheel.

It is a shame, really. Making a new ISA would be an excellent idea to reduce the ability of hackers, and also reduce calls and latency. If a new internal registry system were to be built, an optimised ISA could be a game changer for speed and process. It is the whole idea of the advantages of CISC over RISC, so why not?
Compatibility with other ISAs is not really necessary if you create a compiler that can optimise the new code to be translated from the old code.
It's something I don't understand. To get old programs to work on different platforms, you just force the source code through a compiler dedicated to the chip and the ISA it understands. So getting Windows 10 to work on ARM is as simple as optimising the code through an ARM compiler. The only drawback is the speed at which it runs.

hotaru.hino · Apr 20, 2021

ginthegit said:
It is a shame really, Making a new ISA would be an excellent idea to reduce the ability of hackers

Security through obscurity will only get you so far. I mean for kicks, imagine if China did make a new ISA, but required a security clearance to even work with it.

and also reduce calls and latency. If a new internal registry system were to be built, an optimised ISA could be a game changer for speed and process. It is the whole idea of the advantages of CISC over RISC, so why not?

Most modern processors take the best of both worlds in implementation. The only thing at this point that we could improve is trying to find ways to make software as linear as possible. But if you get that far and you have a large set of data points, we have a better option to run it on: a GPU.

ginthegit said:
It's something I don't understand. To get old programs to work on different platforms, you just force the source code through a compiler dedicated to the chip and the ISA it understands. So getting Windows 10 to work on ARM is as simple as optimising the code through an ARM compiler. The only drawback is the speed at which it runs.

Yes and no. Everything above the kernel level space is likely abstracted enough that you can compile it for a different target and be fine. It's everything at the kernel level that can't simply be compiled for another target as-is. Especially if it's poking at quirks of the ISA or in some cases, system requirements. For example, you can't load a bog standard amd64 build of Linux onto a PS4 despite the PS4 having an x64 processor. Why? Because the amd64 build of Linux still adheres to the IBM PC standard and expects things that the PS4 doesn't have because the PS4 wasn't designed to be IBM PC compliant.

InvalidError · Apr 20, 2021

ginthegit said:
an optimised ISA could be a game changer for Speed and process.

The biggest bottleneck to most CPUs today is conditional jumps, not the hardware architecture or instruction set. There is very little that the ISA can do about typical software having conditional branches every 10-20 instructions, that's why CPUs focused on single-thread performance have ridiculously deep out-of-order superscalar speculative execution to brute-force their way through code.

Making software faster is mostly in software developers' court: they have to reduce the number of speculative branches and make those that cannot be eliminated more predictable. There isn't a whole lot more that hardware can do besides brute-force harder by adding more of everything.
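As a rough illustration of why predictability matters, the sketch below simulates a single two-bit saturating counter, a classic textbook branch predictor, on two branch patterns. It is a toy model under stated assumptions: real predictors are far more elaborate, and the function name, the starting counter state and both patterns are made up for this example.

```python
import random

def predictor_accuracy(outcomes):
    """Fraction of branches one 2-bit saturating counter predicts correctly."""
    counter = 1  # states 0-1 predict "not taken", states 2-3 predict "taken"
    hits = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken == taken:
            hits += 1
        # Nudge the counter toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return hits / len(outcomes)

# A loop-closing branch: taken nine times, then falls through, repeated.
loop_branch = ([True] * 9 + [False]) * 100

# A data-dependent branch on effectively random data (fixed seed for repeatability).
rng = random.Random(0)
data_dependent_branch = [rng.random() < 0.5 for _ in range(1000)]

loop_accuracy = predictor_accuracy(loop_branch)
random_accuracy = predictor_accuracy(data_dependent_branch)
print(f"loop branch:           {loop_accuracy:.1%}")
print(f"data-dependent branch: {random_accuracy:.1%}")
```

The loop-style branch is predicted correctly roughly nine times out of ten, while the coin-flip branch hovers around half, which is the kind of branch worth restructuring or eliminating in software.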

ginthegit · Apr 21, 2021

InvalidError said:
The biggest bottleneck to most CPUs today is conditional jumps, not the hardware architecture or instruction set. There is very little that the ISA can do about typical software having conditional branches every 10-20 instructions, that's why CPUs focused on single-thread performance have ridiculously deep out-of-order superscalar speculative execution to brute-force their way through code.

Making software faster is mostly in software developers' court, got to reduce the amount of speculative branches and make those that cannot be eliminated more predictable. There isn't a whole lot more that hardware can do besides brute-force harder by adding more of everything.

Conditional jumps through IF or FOR statements would easily benefit from direct access to a new Jump register. The line being read and loaded up to the ISA hardware can automatically do the searches for the new jump address while the comparison is being made, so that the Jump address is ready or almost ready by the time that the comparison in the conditional jump is made. That new jump space, file or Class would then be quickly accessed using matrix codes to access the RAM or disk space required. The simple fact is that the 8086 IBM ISA dealt with a simple varied set of instructions that allowed the user to do anything they wanted to if they could figure out the coding for it. The General purpose registers were developed because of coding like Bubblesort that would need temp stores that could allow quicker access, but still had to have user control to directly load Data onto the registers. A new ISA can figuratively be slightly more complex, in that an IF command automatically loads the operands into new General purpose registers (reducing the MOV instructions and making conditional statements less intense by parallelising the workload directly). As most games with user control are mostly controlled by conditional commands, changes like this to the instruction and ISA would not require code change at all.

And like AI, I have little patience for the superscalar speculative pipelines (being as I am an Engineer who specialises in both coding and Digital construction), and find that they are an unnecessary workaround to the solution above. The pipelines are only as good as the designer designed them to be, and can be a bonus for video and audio (as they just continually move through the same instructions over and over again; the only difference is the interpretation of the data). When Pipelines do fail by interruption or fault, they fail dramatically and actually reduce the benefit of the pipelining, which in games is most of the time (except for procedural calls and rendering of video or sound etc). Out of Order execution is a bust for Pipelining and a waste of Transistors that could be better used for optimising registers.

ginthegit · Apr 21, 2021

"Yes and no. Everything above the kernel level space is likely abstracted enough that you can compile it for a different target and be fine. It's everything at the kernel level that can't simply be compiled for another target as-is. Especially if it's poking at quirks of the ISA or in some cases, system requirements. For example, you can't load a bog standard amd64 build of Linux onto a PS4 despite the PS4 having an x64 processor. Why? Because the amd64 build of Linux still adheres to the IBM PC standard and expects things that the PS4 doesn't have because the PS4 wasn't designed to be IBM PC compliant."

Have you ever messed with the Kernel? It works as a sorting layer that is like indexing for an OS. For code outside an OS, which the actual execution code in assembly is, it is irrelevant. The Kernel is considered high level or mid level code that the software and hardware use to allow the OS to both monitor and keep track of the progress and timing of operations. The old DOS games, like "Syndicate Wars", had no need for Kernels, and only the malpractice of Bill Gates, which he was litigated for, made an OS a prerequisite to run most games (as its Kernel routed graphics calls through a HAL controlled by the Kernel).

Note that until Windows Vista, previous versions of Windows could use the Kernel or have direct access to system resources (so you could effectively manually control the addressing of your hardware without Kernel interruption, and the Kernel would not interrupt it, as your interrupts would have access directly to the IRQs in the hardware). Then Bill decided that the Kernel would protect direct access to the resources and force everything to go through the Kernel (for Security, apparently), even though now, because of this, anyone that hacks your Kernel can track everything you do on Windows at all levels.

Surprisingly, Linux took that same approach, but as a superuser you can bypass its total control. And from time to time, updates can change your Kernel to a completely different format, to prevent hackers from learning and attacking the Kernel, which is oh so much easier in Windows.

To prove my point, look at the old game Syndicate Wars from 30 years ago. It had its own Renderer (made by Software programmers that understood hardware), it had complex 3D graphics with a twistable screen, it wasn't limited to 32bit or 64bit memory addressing but to what the Programmers could decide based on the Hardware detected, and it had all of its own Kernel level driven control layer that worked seamlessly without all the problems that we face in modern games. People were inventive back then, and they proved that you didn't need generalised APIs or Kernels if you knew how to do it yourself. Now lazy engineers and coders have ruined innovation, and in Intel's case that shows with its failure to get 10nm running in a cost effective way.

So if you are only thinking of the CPU having access to the accumulator as the control register, you are limiting the way in which the coding can be optimised. This also leads to this 'In order' and 'Out of Order' only execution argument. My proposal would be to make more registers and have parallel loading registers that directly deal with the Conditional statements, dealing with the argument and both resolutions at the same time (providing they are not nested), reducing countless MOV calls; rather than handing off these conditional calls to the Cache, they could have direct register access open to them.
New ISAs could also be made to deal with parallel computing better, rather than creating Kernels and other abstraction layers to sort out threads and execution.

Compilers: I see you don't really understand them. Compilers are made to avoid messing with the Kernel (apart from giving the essential information to it), but why should that be? If you make a complete new ISA on an architecture, China could choose to have its compilers not only compile programs, but also compile Kernels in a modular way. Large Cache on Processors could hold the Kernels rather than having them offloaded onto the main RAM, which is massively slower.

As always, the problem is complexity and originality. AMD has competently proved that its R&D guys have this in spades, per person more than Intel, but Intel has the money and drives back innovation (like it did with SSEa and 3DNow! etc etc), and holds back innovation due to money.

The fact we have 4 different ISAs already means that making a new one is not difficult, rather than having to boost the same crap 8086 design and brute forcing it with speed and code to make things faster, when simple innovation and properly written compilers could offset that problem.
The Cell processor was an example of this. It could have been a game changer, but the software engineers were too lazy and locked their function into set tasks, which not only wasted all the potential, but made obsolete an optimised approach to Parallel computation with a single thread using a multi core approach.

The limits are to the eye of the beholder and he who holds the intellectual property. And Intel is one of the biggest culprits of holding this back.

And yes, I do have enough knowledge to create my own CPU core using my knowledge of Digital, but like they did with the Cell processor, even if I was to give it to them for free, the lazy programmers would need to understand hardware (which most don't) to want to adopt its use. Hence they love Kernels and APIs, because it means they don't ever have to understand the hardware.

InvalidError · Apr 21, 2021

ginthegit said:
Conditional jumps through IF or FOR statements would easily benefit from direct access to a new Jump register. The Line being read and loaded up to the ISA hardware can automatically do the searches for the new jump address while the comparison is being made, so that the Jump address is ready or almost ready by the time that the comparison in the conditional jump is made.

Looks like you have very little idea of how modern CPUs work.

Speculative execution is when a CPU guesses which branch will be (not) taken, executes based on that assumption and discards the work that it did on speculation if the guess turned out to be wrong. Modern CPUs can be 200+ instructions deep into speculative branches. Your "jump address register" is unnecessary since the jump address is already encoded in the jump instruction itself.

Out-of-order execution is when a CPU looks ahead of the current instruction pointer or speculative branch (up to 352 instructions ahead for Intel's Sunny Cove) to figure out the most effective execution order based on instruction dependencies and latencies, then re-arrange them to maximize execution resources usage. That's how modern CPUs can still manage to average 3+ instructions per clock despite branches. It also means 200+ instructions at various stages of execution potentially getting thrown out when a prediction is wrong.
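A heavily simplified sketch of that reordering idea, to show where the gain comes from. The model is an assumption made for illustration: one instruction issues per cycle, every destination register is written exactly once (standing in for register renaming), and the five-instruction program and its latencies are invented.

```python
def schedule(program, out_of_order):
    """Return the cycle at which the last result becomes available.

    Each instruction is (dest, sources, latency). One instruction may issue
    per cycle, and only once all of its sources are ready.
    """
    # Results of not-yet-issued instructions are not available yet.
    ready_at = {dest: float("inf") for dest, _, _ in program}
    pending = list(program)   # un-issued instructions, in program order
    cycle = 0
    finish = 0
    while pending:
        # In-order: only the oldest instruction may issue.
        # Out-of-order: the oldest *ready* instruction may issue.
        candidates = pending if out_of_order else pending[:1]
        for instr in candidates:
            dest, sources, latency = instr
            # Registers never written by the program are inputs, ready at cycle 0.
            if all(ready_at.get(src, 0) <= cycle for src in sources):
                ready_at[dest] = cycle + latency
                finish = max(finish, cycle + latency)
                pending.remove(instr)
                break
        cycle += 1
    return finish

program = [
    ("a", ["x"], 4),        # slow operation, e.g. a load
    ("b", ["a"], 1),        # must wait for a
    ("c", ["y"], 4),        # independent slow operation
    ("d", ["c"], 1),        # must wait for c
    ("e", ["b", "d"], 1),   # combines both chains
]
in_order_cycles = schedule(program, out_of_order=False)
out_of_order_cycles = schedule(program, out_of_order=True)
print(in_order_cycles, out_of_order_cycles)
```

In this toy case the in-order run stalls behind each slow instruction, while the out-of-order run overlaps the two independent chains and finishes sooner with exactly the same instructions.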

The problem with jumps is that prediction will never be 100% reliable and to fill that 200+ instructions deep re-order buffer, the CPU will end up 10+ layers deep into predictions.
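The arithmetic behind that can be sketched in a few lines. The per-branch accuracy figures are illustrative assumptions, as is treating each prediction as independent; the window size and branch spacing are the rough numbers already quoted in this thread.

```python
def branches_in_flight(window_size, instructions_per_branch):
    """Roughly how many predicted branches sit inside the re-order window."""
    return window_size // instructions_per_branch

def chance_all_correct(accuracy, depth):
    """Probability that `depth` independent predictions are all right."""
    return accuracy ** depth

# 200-instruction window, one conditional branch every 10 to 20 instructions.
shallow = branches_in_flight(200, 20)
deep = branches_in_flight(200, 10)

for accuracy in (0.95, 0.99):   # assumed per-branch accuracies
    for depth in (shallow, deep):
        p = chance_all_correct(accuracy, depth)
        print(f"accuracy {accuracy:.0%}, {depth} branches deep: {p:.1%}")
```

Even a predictor that is right 95% of the time per branch is only right about 60% of the time across ten consecutive predictions, which is why the speculated work deep in the window gets thrown away so often.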

hotaru.hino · Apr 21, 2021

I'm just going to say this: 🤦‍♀️

ginthegit · Apr 23, 2021

InvalidError said:
Looks like you have very little idea of how modern CPUs work.

Speculative execution is when a CPU guesses which branch will be (not) taken, executes based on that assumption and discards the work that it did on speculation if the guess turned out to be wrong. Modern CPUs can be 200+ instructions deep into speculative branches. Your "jump address register" is unnecessary since the jump address is already encoded in the jump instruction itself.

Out-of-order execution is when a CPU looks ahead of the current instruction pointer or speculative branch (up to 352 instructions ahead for Intel's Sunny Cove) to figure out the most effective execution order based on instruction dependencies and latencies, then re-arrange them to maximize execution resources usage. That's how modern CPUs can still manage to average 3+ instructions per clock despite branches. It also means 200+ instructions at various stages of execution potentially getting thrown out when a prediction is wrong.

The problem with jumps is that prediction will never be 100% reliable and to fill that 200+ instructions deep re-order buffer, the CPU will end up 10+ layers deep into predictions.

Pffft, and again you are just quoting what other people have said. Do you really want me to explain how easy the design of a CPU is from Digital logic gates using State logic and Next state diagrams, lots of Shift registers and Tri-state buffers.... No! I doubt you understand it at all. I can design my own CPU from scratch using Advanced Sequential logic techniques with D-type latches etc etc, which is exactly what I teach my students once every year.
The in order and out of order execution actually dates back to what we know as Synchronous and Asynchronous execution. The merits of Asynchronous computing are theoretically much better than Synchronous, as they would mean that computers could work more like the Human Brain (hence AI is now a big thing, to try to learn the algorithms that lead to asynchronous thought, because they can't learn it directly from Neural Networks or Fuzzy logic Theory (which I also teach)). So synchronous it stays, as I think that even the Best AI chips will never break the secret of how human brains work.

The fact that the M1 RISC chips, which avoid complex ISA and Large pipelines, are now proving that they can possibly beat and excel over CISC is also another proof that CISC and Pipelining is a wasteful and deliciously poor decision to make.

Now the fact that you say the Pipeline predictive branches are already encoding the MOV codes into the instruction line also proves my point of changing the ISA. Of those 200+ predictive choices made by the AI style branch, 25-40% of them will be MOV instructions followed by a register (wasting even more cycles and having to gather the address to put into the Cache or the MEMORY (big slow down)), supporting my idea for a change in the ISA. I could easily design an alternative system, involving no more than 50 logic gates, that could change the IF statement into an automatic address gatherer: when it picks the Address from the Memory, it could place that address directly into a register, meaning that the next operand would not need to access the compare, load it to the BX and then AX, make the comparison, access the memory stack to find where to put the result, MOV it there and then have to write the address code into the Library area of the memory for Quick access. Each FOR or IF statement has to use 8+ MOVs to deal with both data and addressing, and on a 64bit addressing system, it means more wasted time waiting for the Shift registers to fill, empty and reload data (which works like RAM CAS, RAS and TAS latency). You simply don't have the knowledge to fight this argument.

The fact that the ISA stays the same is because, like with Moore's law, Industrial standards are the preferred (and lazy) choice. Back when 3DNow! was introduced, a few developers decided to make their compilers code for it and got a sizable boost, but Intel stifled the competition with Money and forced itself to be the Industrial standard with SSE and SSE2 etc, trying to edge AMD out of the market by standards.

The ISA could be seriously upgraded, as too could the registry set, but I guess you just don't get it!

This isn't about predictive coding, it's about the ISA and the advantages of a possible change to a relatively old architecture from IBM that is 50+ years old.

InvalidError · Apr 23, 2021

ginthegit said:
This isn't about predictive coding, it's about the ISA and the advantages of a possible change to a relatively old architecture from IBM that is 50+ years old.

You should go read some architectural overviews of ARM CPU architectures. The main reason ARM CPUs are getting faster is because the core designs are picking up more of those same pipelining, out-of-order, superscalar, speculative, etc. tricks. Also, Intel did not invent most of those tricks, it got them from DEC which came up with them ~30 years ago for its Alpha CPUs. Since true RISC CPUs need more instructions to perform the same work, DEC needed those tricks long ahead of its time to get a competitive advantage.

ginthegit said:
This isn't about predictive coding, it's about the ISA and the advantages of a possible change to a relatively old architecture from IBM that is 50+ years old.

IBM did not create x86; it was just Intel's first major client for x86. x86 isn't 50+ years old either, since Intel's 8086, the 16-bit chip after which the instruction set is named, was launched in 1978, 43 years ago.

ginthegit · Apr 23, 2021

InvalidError said:
You should go read some architectural overviews of ARM CPU architectures. The main reason ARM CPUs are getting faster is because the core designs are picking up more of those same pipelining, out-of-order, superscalar, speculative, etc. tricks. Also, Intel did not invent most of those tricks, it got them from DEC which came up with them ~30 years ago for its Alpha CPUs. Since true RISC CPUs need more instructions to perform the same work, DEC needed those tricks long ahead of its time to get a competitive advantage.

IBM did not create x86, it was just Intel's first major client for x86. x86 isn't 50+ years old either since Intel's 8086, the 16bits chip after which the instruction set is named, was launched in 1978, 43 years ago.

The Question here was who created the ISA? IBM made Intel what it is today by creating the interchangeable port compatibility and took away from both Intel and Zilog what they accomplished by the ISA they generated. Intel indeed came out with the core, but it was being designed well before its manufacture date, with other parties involved. Intel was the company that benefitted by making the first powerful PC, but the ISA was around long before (even used by the Computer used in WW2). So yes, I made an incorrect use of naming, but in the context of this argument, the 8086 ISA was around long before the 8086 in various iterations.
x86 was only a concept that came around because of iterative additions and the naming applied to it. AMD made the x64, but this is taking away from the argument of the ISA. The ISA is well over 50 years old, and we still use it as a de facto Instruction set that could be easily upgraded to a better system. But standards made by Intel, AMD etc are holding back innovation.

InvalidError said:
You should go read some architectural overviews of ARM CPU architectures. The main reason ARM CPUs are getting faster is because the core designs are picking up more of those same pipelining, out-of-order, superscalar, speculative, etc. tricks. Also, Intel did not invent most of those tricks, it got them from DEC which came up with them ~30 years ago for its Alpha CPUs. Since true RISC CPUs need more instructions to perform the same work, DEC needed those tricks long ahead of its time to get a competitive advantage.

Pipelining is only useful in ARM because the Chips are mostly used for Multimedia or general use where specific Pipelines or Architecture has been added to the chip to run certain functions. Remember, Pipelining is a term that can describe Code that is offloaded to a dedicated chip to natively handle that particular line of code.
Pipelining is useful when watching Video online, for example, when you can see the bar at the bottom of the page loading up much faster than you are watching it, as it has already been decoded and is waiting to be watched. But for those browsing through videos quickly, Pipelining is a bust and wastes processing power and memory trying to load up what the user essentially will not watch/listen to. So yes, RISC has its advantages using Pipelining, because other dedicated hardware for acceleration will always be used this way, and that is effectively the way that pipelining works.
Pipelining in CISC works a little differently than in RISC, as CISC adds to the ISA and has either microcode or Hardware to accelerate the process, and these are what add the PIPELINE effect. MMX, for example, was the equivalent of RISC and its Dedicated Hardware, albeit the core of the CISC directly interprets and handles the code, sending it to each ISA area, whereas RISC sends the code to the appropriate hardware module that decodes it independently. RISC, when it has these Hardware extensions, by definition has extra ISA codes that activate these Multimedia Hardware extensions (which is arguably an ISA addition even if it is a DMA call, as ISA are in reality DMA calls to sub circuits to open the BUS to receive it). So RISC becomes CISC in a way when it has extra Hardware modules added to it that need a DMA call to activate; it just isn't literally added to the ISA of the RISC, but the compiler and the Kernel do it instead.
I understand what Pipelines are, and as RISC keeps integrating more and more Hardware modules like AI, Graphics, Multimedia etc etc etc, you are essentially adding Pipelines that can work way ahead of a user. This is always going to be an advantage in tasks that keep using the same decode and send-to commands, like Music and Video and even games to an extent.

InvalidError · Apr 23, 2021

ginthegit said:
Pipelining is useful when watching Video online for example

Hm, no. That has absolutely nothing to do with pipelining. Go (re-)take compsci 101.

hotaru.hino · Apr 23, 2021

ginthegit said:
The fact that the M1 RISC chips, which avoid complex ISA and Large pipelines, are now proving that they can possibly beat and excel over CISC is also another proof that CISC and Pipelining is a wasteful and deliciously poor decision to make.

I wanted to poke at something real quick, but I'll probably stop responding afterwards.

Although nobody has gotten into any real detail as of late, the number of pipeline stages on Apple's SoCs has been estimated to be 12 for the A6 (https://www.anandtech.com/show/6330/the-iphone-5-review/8) and 16 for the A9 (https://en.wikipedia.org/wiki/Apple_A9). According to Wikichip, Kaby Lake has 14-19 stages (https://en.wikichip.org/wiki/intel/microarchitectures/kaby_lake).
The core difference between a RISC and a CISC system nowadays is simply the addressing modes available in the opcodes. RISC can only do data modification using registers or immediate values. CISC (or at least x86) allows for data modification using memory locations. However, internally a CISC instruction using memory locations can easily be translated to a series of load/store instructions in RISC, and the CPU does the same amount of work anyway.
- Of note, ARM has a variable length instruction set, the THUMB-2 instruction set, for precisely the same reason why CISC has a variable length instruction set: code density.
- Also in order to be ARMv8 compatible, the CPU has to support AArch64, ARM32, and THUMB-2. (https://armkeil.blob.core.windows.n...d-multimedia/ARMv8_InstructionSetOverview.pdf)

I really don't see any advantage or disadvantage between ARM and x86 and as I've parroted around before, the ISA in modern CPU design is largely irrelevant. What matters is how you implement the CPU. If the ISA is the only thing that matters, then even MediaTek's ARM SoCs should smoke the living heck out of Intel's CPUs clock for clock.
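
To make the load/store point above concrete, here is a minimal Python sketch of a toy machine. Everything in it is made up for illustration (the `expand_memory_add` and `run` names, the tuple format, the `tmp` scratch register); it is not how any real x86 decoder is written, only a picture of one memory-operand add turning into a load, a register add, and a store.

```python
def expand_memory_add(addr, src_reg):
    """Expand a hypothetical CISC-style 'add [addr], src_reg' into
    three RISC-style steps: load, register add, store."""
    return [
        ("load", "tmp", addr),             # tmp <- memory[addr]
        ("add", "tmp", "tmp", src_reg),    # tmp <- tmp + src_reg
        ("store", addr, "tmp"),            # memory[addr] <- tmp
    ]


def run(steps, memory, regs):
    """Execute the toy load/add/store steps against dict-based state."""
    for step in steps:
        kind = step[0]
        if kind == "load":
            _, dst, addr = step
            regs[dst] = memory[addr]
        elif kind == "add":
            _, dst, a, b = step
            regs[dst] = regs[a] + regs[b]
        elif kind == "store":
            _, addr, src = step
            memory[addr] = regs[src]
        else:
            raise ValueError(f"unknown step: {kind}")


# Toy state: one memory word and one register.
memory = {0x10: 5}
regs = {"r1": 7}

steps = expand_memory_add(0x10, "r1")
run(steps, memory, regs)
print(len(steps), memory[0x10])  # prints "3 12"
```

Either way the machine performs one read, one addition, and one write; the only difference is whether the programmer or the decoder spells out the three steps.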

ginthegit · Apr 23, 2021

InvalidError said:
Hm, no. That has absolutely nothing to do with pipelining. Go (re-)take compsci 101.

LOL

https://db.inf.uni-tuebingen.de/staticfiles/teaching/ss09/dbcpu/dbms-cpu-2.pdf

Try to keep up, mate. You don't seem to get that pipelining isn't actually having hardware lines stacked one after the other, but a concept of keeping the processor running to maximise the throughput of the SINGLE processor.
The above link explains pipeline concept 101, and explains how the pipeline process continually uses the same processor over and over again, storing the data in a special register and then moving through the same hardware to fill the next area, rinse and repeat...

Now, if you are ascribing it to the graphics pipeline process, well, this is slightly different.

The whole process of pipelining is actually hampering the process of linking multiple cores together, because the algorithms to do this become infinitely more complex.

Pipelining hits are good, but pipelining misses are bad. And the slides will explain why, 101-style!
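
For anyone following along, the hit/miss trade-off can be put into rough numbers with a small Python sketch. This is a textbook idealisation, not a model of any real CPU: it assumes a scalar in-order pipeline where every stage takes one cycle, and it assumes each miss (for example a branch misprediction) costs a flush of `n_stages - 1` cycles unless told otherwise. The function names and parameters are invented for the example.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through every stage before the next starts."""
    return n_instructions * n_stages


def pipeline_cycles(n_instructions, n_stages, n_misses=0, flush_penalty=None):
    """Idealised pipelined cycle count.

    The first instruction needs n_stages cycles to fill the pipeline,
    every later instruction completes one cycle after the previous one,
    and each miss adds flush_penalty wasted cycles (assumed to be
    n_stages - 1 when not given).
    """
    if n_instructions == 0:
        return 0
    if flush_penalty is None:
        flush_penalty = n_stages - 1
    return n_stages + (n_instructions - 1) + n_misses * flush_penalty


print(unpipelined_cycles(100, 5))          # prints 500
print(pipeline_cycles(100, 5))             # prints 104
print(pipeline_cycles(100, 5, n_misses=10))  # prints 144
```

Under these assumptions, even ten misses in a hundred instructions leave the pipelined case well ahead of the unpipelined one, and a deeper pipeline raises the cost of each miss.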

ginthegit · Apr 23, 2021

hotaru.hino said:
I wanted to poke at something real quick, but I'll probably stop responding afterwards.

Although nobody has gotten into any real detail as of late, the number of pipeline stages on Apple's SoCs has been estimated to be 12 for the A6 (https://www.anandtech.com/show/6330/the-iphone-5-review/8) and 16 for the A9 (https://en.wikipedia.org/wiki/Apple_A9). According to Wikichip, Kaby Lake has 14-19 stages (https://en.wikichip.org/wiki/intel/microarchitectures/kaby_lake).

The core difference between a RISC and a CISC system nowadays is simply the addressing modes available in the opcodes. RISC can only do data modification using registers or immediate values. CISC (or at least x86) allows for data modification using memory locations. However, internally a CISC instruction using memory locations can easily be translated to a series of load/store instructions in RISC, and the CPU does the same amount of work anyway.

Of note, ARM has a variable length instruction set, the THUMB-2 instruction set, for precisely the same reason why CISC has a variable length instruction set: code density.

Also in order to be ARMv8 compatible, the CPU has to support AArch64, ARM32, and THUMB-2. (https://armkeil.blob.core.windows.n...d-multimedia/ARMv8_InstructionSetOverview.pdf)

I really don't see any advantage or disadvantage between ARM and x86 and as I've parroted around before, the ISA in modern CPU design is largely irrelevant. What matters is how you implement the CPU. If the ISA is the only thing that matters, then even MediaTek's ARM SoCs should smoke the living heck out of Intel's CPUs clock for clock.

No no, I welcome what you put, because it is proving my point. ARM and RISC in general were meant to live under the concept (by definition) of continuing to use only the simple ISA to achieve the same result that CISC uses additional ISA for. That means RISC is, by definition, not supposed to have extensions like MMX or SSE, as these are part of CISC. Instead, RISC is mounting hardware modules on dies and using DMA to access these modules directly, and can thus offload pipelining to specific modules, after which the core waits for the expected response so it can synchronise with the offloaded data. On RISC, pipelines can completely avoid the main CPU, but do not have to.

But what the other guy keeps on not understanding is that pipelining is actually slowing down the response times of CPUs, because the CPU constantly runs at high speed until the designated memory is full, and only then can it go back. This adds levels of complexity that mean a multicore approach cannot get involved without causing latency that cancels out the time it would otherwise save. Theoretically, you can use a 2nd and 3rd core to pipeline while core 0 stays open for IRQ requests without having to wait.

But your point has only strengthened mine, and weakened the Titan's.

InvalidError · Apr 23, 2021

hotaru.hino said:
I really don't see any advantage or disadvantage between ARM and x86

Exactly. Once x86 adopted instruction decoders that output RISC-like micro-ops to simplify the rest of the execution pipeline, RISC and CISC became effectively indistinguishable. Even RISC CPUs use micro-ops to handle instructions that don't have a 1:1 relationship with execution resources.
