Thanks for dropping in! Please feel welcome to share details and clear up any misconceptions. I hope you'll find a receptive audience.
Regarding the GPU, can you at least tell us:
- Will it be multi-core or just a proof-of-concept single-core design?
- Will it feature SIMD? How wide?
- What about SMT?
- VLIW?
- Will it have a classical cache hierarchy or just directly-addressed local memory?
- What about external memory? Will there be any? What type? Are you designing the memory controller or licensing it?
- Will it have a display controller? If so, what standards will it support? Again, license or build?
- Same questions for PCIe.
- What sort of special-function hardware will it have? ROPs? Texture engines?
- Will this be targeted at deployment on an FPGA, or designed for fabrication as an ASIC?
That's all I've got right now. Feel free to answer as many or as few as you're comfortable speaking about.
: )
Thanks for the questions, didn't expect a response so quickly!
You can follow the GPU project here; it will be open source:
https://github.com/adam-maj/tiny-gpu
So far I haven't added any of my work except for the file structure (which will likely change).
The aim of the project is to make a very simple, readable GPU with a flat (zero-depth) file structure and no more than 10-20 files, so it's very straightforward to understand. I'll then use it in a blog post to help people understand the core concepts of GPUs without all the complexity of a full implementation.
Given that, my current design has:
- 12-instruction ISA
- Multi-core (I think that's kinda critical)
- SIMD/SIMT - I'm currently thinking just 2-4 cores, each with a warp size of 4 threads.
- Each core will have its own warp scheduler/queue, fetcher/decoder, 1 ALU per thread, 1 LSU per thread, and a branch unit
- Haven't decided yet on shared/cache memory, but it seems important. Currently I'm using external DRAM, with the LSU handling the asynchronous memory accesses, and a register file for each thread in each core.
- No display controller/memory controller, as it will mainly be designed for simulation. I suspect TinyTapeout cells won't be able to hold the actual design, so I'll use the GDS visualizer + simulation for this project, and maybe make a much smaller version for TT06 or use a different design.
- No VLIW, no SMT. In general, the way I'm thinking about it: anything that is CORE to the GPU architecture model I'll include; useful optimizations (even ones that are effectively core to modern GPUs in practice) I'll leave out of my design and talk about in the blog post instead.
- No PCIe
- No special-function hardware - technically, you can think of it as a GPGPU. I've written some kernels with my ISA (matadd/matmul for now, and 2 more planned)
- Not targeted for FPGA, just for ASIC
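To make the SIMT part of the list above a bit more concrete, here is a rough Python sketch of a matadd kernel running on 2 cores with one 4-thread warp each. It is only an illustration under assumptions, not the tiny-gpu ISA or code: the opcode names (`TID`, `CONST`, `ADD`, `LDR`, `STR`), the named registers, and the memory layout are all made up for this example, and there is no branching, scheduling, or memory latency.

```python
WARP_SIZE = 4  # threads per warp, as in the design list above

# Hypothetical mini-ISA (NOT the real 12-instruction ISA):
#   TID rd            rd <- global thread index
#   CONST rd, imm     rd <- imm
#   ADD rd, rs1, rs2  rd <- rs1 + rs2
#   LDR rd, rs        rd <- memory[rs]
#   STR rs1, rs2      memory[rs1] <- rs2
MATADD = [
    ("TID", "i"),
    ("CONST", "base_b", 8),    # assumed layout: A at 0..7, B at 8..15
    ("CONST", "base_c", 16),   # C at 16..23
    ("LDR", "a", "i"),
    ("ADD", "addr_b", "base_b", "i"),
    ("LDR", "b", "addr_b"),
    ("ADD", "sum", "a", "b"),
    ("ADD", "addr_c", "base_c", "i"),
    ("STR", "addr_c", "sum"),
]


def run_warp(program, memory, core_id, warp_size=WARP_SIZE):
    """Run one warp in lockstep: each instruction is fetched once per warp,
    then applied by every thread to its own private register file."""
    regs = [{} for _ in range(warp_size)]      # one register file per thread
    for op, *args in program:                  # single fetch/decode per warp
        for lane, r in enumerate(regs):        # one ALU/LSU lane per thread
            if op == "TID":
                r[args[0]] = core_id * warp_size + lane
            elif op == "CONST":
                r[args[0]] = args[1]
            elif op == "ADD":
                r[args[0]] = r[args[1]] + r[args[2]]
            elif op == "LDR":
                r[args[0]] = memory[r[args[1]]]
            elif op == "STR":
                memory[r[args[0]]] = r[args[1]]
            else:
                raise ValueError(f"unknown opcode: {op}")


def run_matadd(num_cores=2):
    """Add two 8-element vectors, with each core handling 4 elements."""
    a = list(range(8))
    b = [10 * x for x in range(8)]
    memory = a + b + [0] * 8
    for core_id in range(num_cores):           # cores modeled sequentially
        run_warp(MATADD, memory, core_id)
    return memory[16:24]


if __name__ == "__main__":
    print(run_matadd())
```

The point of the sketch is the split between per-warp and per-thread state: the instruction stream is shared by the whole warp, while registers (and therefore addresses and data) are private to each thread.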
Hope this answers your questions. Regardless of the skepticism, I'm just happy people are interested enough to post their thoughts/skepticism/questions.