How much would Core 2 Quad Q9300 at 2.5GHz bottleneck a GTX960 4GB for learning deep learning?

edervishaj

May 19, 2014
I have a desktop with a Core 2 Quad Q9300, mobo GA-P43-ES3G rev1.0. I want to take my first steps into deep learning and do not want to spend too much now for new mobo/CPU/RAM. How much would my current CPU bottleneck a GTX 960 specifically for Deep Learning?
 

A little, but not too much. The best thing to do would be to get a quad-core 3GHz Xeon (it's cheap, and you have to mod it to fit your motherboard, so you will learn something interesting).
 
Probably not much, as it's mostly GPU work and only one CPU thread. The Q9300 is hurt by the limited memory bandwidth and latency of the FSB, so you'll want to raise the FSB to 400MHz, which would give you 3GHz, the same as a Q9650 or E5450. It has only half as much cache, but apparently that doesn't matter.

You'll want to set the RAM at 1:1 for 800MHz and if you can, 4-4-4-12 with static tREAD Value at 8 to reduce latency. This may require some extra voltage to the Vcore, MCH and DRAM. You do have at least 4GB, right?
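The clock arithmetic behind that suggestion can be sketched as follows. This is only an illustration: the 7.5x multiplier is inferred from the Q9300's stock 2.5GHz at a 333MHz FSB, and the function names are made up for this example.

```python
# Rough sketch of the FSB overclock arithmetic (illustrative names only).
Q9300_MULTIPLIER = 7.5  # assumption: 2500 MHz stock / 333.33 MHz FSB


def core_clock_mhz(fsb_mhz, multiplier=Q9300_MULTIPLIER):
    """CPU core clock is the FSB base clock times the CPU multiplier."""
    return fsb_mhz * multiplier


def ddr2_rate_mts(fsb_mhz, dram_to_fsb_ratio=1.0):
    """Effective DDR2 transfer rate in MT/s; DDR moves data twice per clock."""
    return 2 * fsb_mhz * dram_to_fsb_ratio


for fsb in (333.33, 400.0):
    print(f"FSB {fsb:.0f} MHz -> core {core_clock_mhz(fsb):.0f} MHz, "
          f"RAM at 1:1 = DDR2-{ddr2_rate_mts(fsb):.0f}")
```

At a 1:1 ratio the RAM clock equals the FSB clock, which is why a 400MHz FSB lines up with DDR2-800 modules.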
 


How much of a bottleneck would you expect? I have yet to buy the GTX 960, and I want to save money for now so that I can build a powerful rig for serious deep learning later.
Any other GPU you could suggest maybe? (Nvidia only)

 


Why Nvidia only? For budget gaming you could get sick performance with an R9 280X or an R9 290X! I am an Nvidia fanboy, but I have to admit these cards have good price/performance.

If you want an Nvidia card, take a GTX 960/950ti, but do not take a GTX 970: your CPU would bottleneck it, and it has only 3.5GB+512MB of VRAM (the Nvidia engineers messed up a bit when making that card...).
 


It is because I am only interested in deep learning, not gaming. And I am restricted by the various deep learning frameworks, which only support NVIDIA's cuDNN.
 

edervishaj,

I'm not certain as to the particular application intended for deep learning, but unless you will be relying on a remotely accessed (cloud / API) program, the systems required to run neural nets are in general high-end workstations. Here is a suggested system from an article, "Building a Deep Learning (Dream) Machine":

Chassis: Carbide Air 540 High Airflow ATX Cube
Motherboard: Asus X99-E WS workstation class motherboard with 4-way PCI-E Gen3 x16 support
RAM: 64GB DDR4 Kingston 2133Mhz (8x8GB)
CPU: Intel(Haswell-e) Core i7 5930K (6 Core 3.5GHz)
GPUs: 3 x NVIDIA GTX TITAN-X 12GB
HDD: 3 X 3TB WD Red in RAID5 configuration
SSD: 2 X 500GB SSD Samsung EVO 850
PSU: Corsair AX1500i (1500Watt) 80 Plus Titanium (94% energy efficiency)
Cooling: Custom (soft piped) Water Cooling for both the CPU and GPUs: a refilling hole drilled in the top of the chassis, and transparent reservoir in the front

That's quite an expensive system: the i7-5930K costs $638 US, an Asus X99-E WS costs $609, the RAM about $500, 2x Samsung 850 EVO 500GB cost $350, 3x WD Red 3TB = $570, and 3x Titan X 12GB are $3,600 (= about $6,300), and complete with OS, about $6,700 total.

The emphasis in this kind of system is on CPU single-thread performance in association with many CUDA cores as co-processors.

CPU: Compare the performance of the i7-5930K with a Q9300:

___ i7-5930K: (6-core @ 3.5 /3.7Ghz): Average Passmark CPU Mark = 13636 / Single Thread Mark = 2089

___ Core2 Quad Q9300: 4-core @ 2.5GHz = 3180 / 1064

GTX 960 Throttling: In Performance Test baselines, the G3D Mark on a Q9300 / GTX 960 system ranges from 4534 to 5109. When overclocked to 3.7GHz, the Q9300 / GTX 960 can reach a Passmark G3D Mark as high as 5133, but the Passmark average for the GTX 960 = 5848. From these results, it appears that the Q9300 cannot take full advantage of the GTX 960.
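To put those scores in proportion, here is a small sketch that expresses each Q9300 result as a fraction of the GTX 960 average. The scores are the ones quoted above; the helper name is just for illustration, and G3D Mark is a graphics benchmark, so this says little about deep learning throughput.

```python
# Q9300 + GTX 960 G3D Mark results as a share of the card's Passmark average.
GTX960_AVERAGE_G3D = 5848

q9300_results = {
    "baseline, low": 4534,
    "baseline, high": 5109,
    "overclocked, best": 5133,
}


def fraction_of_average(score, average=GTX960_AVERAGE_G3D):
    """Return the score as a fraction of the reference average."""
    return score / average


for label, score in q9300_results.items():
    print(f"{label}: {score} = {fraction_of_average(score):.1%} of average")
```

On these figures the Q9300 systems land at roughly 78% to 88% of the average GTX 960 score.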

The article cited mentions the necessity of wide memory bandwidth and 40 PCIe lanes, to which I would add hyperthreading, which the Q9300 lacks.

> In my view, upgrading the LGA775 platform with a faster CPU, GPU, and drives cannot give a positive cost / performance result. The system would be faster, but not in proportion to the expenditure and effort.

It is possible to replace the GA-P43-ES3G motherboard with a good LGA1150 ATX board supporting Xeon E3 v3 CPUs, with 2x PCIe 3.0 x16 slots, for example the ASRock H87WSA-DL:

ATX Server Motherboard LGA 1150 Intel H87 DDR3 1600/1333 > $110

Add to that a used Xeon E3-1200 v3 series CPU, for example:

Intel Xeon E3-1231V3 Quad-Core (BX80646E31231V3) Processor "Clean pull" > $180

The Xeon E3-1231 v3 has a Passmark CPU Mark of 9630 and a very good Single Thread Mark of 2171, higher than the i7-5930K's and in the top tier.

However, a better cost / performance solution is possible with less expense and effort, and it will have better future upgrade potential. You might consider selling the current system, without the GTX 960. eBay completed sales listings show a Dell XPS 420 / Q9300 selling for $150 and a Dell Studio 650 / Q9300 for $60, so your system would probably be worth somewhere between those two figures, say about $100. Consider applying the sale amount to, for example:

HP Z420 Workstation Intel Xeon E5-1620 @ 3.60GHz 16GB RAM 500GB HDD 54859MA > Sold for $119.95 (12.8.16)

Another good candidate is the Dell Precision T3600, of which many were sold with the E5-1620.

The E5-1620 is a 4-core CPU @ 3.6 / 3.8GHz with hyperthreading; its Passmark average CPU Mark = 9091 and Single Thread Mark = 1932. The E5-1620 system uses DDR3-1600 (up to 64GB), has USB 3.0, and the disk system is SATA III 6Gb/s.

The highest Passmark Disk Mark for a Q9300 system is 7511, from a RAID 0 of 2x Samsung 850 Evo 500GB, and the top memory score for 16GB is 1238. The Samsung 850 Evo 500GB on an E5-1620 system scores 9018, and the top memory score for 16GB is 2898.
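For a quick side-by-side, the Passmark figures quoted in this post can be turned into ratios against the Q9300. This is only a sketch using the averages cited above; the dictionary layout and function name are illustrative.

```python
# Passmark averages quoted in this post: (CPU Mark, Single Thread Mark).
PASSMARK = {
    "Core 2 Quad Q9300": (3180, 1064),
    "Core i7-5930K": (13636, 2089),
    "Xeon E3-1231 v3": (9630, 2171),
    "Xeon E5-1620": (9091, 1932),
}


def speedup_over(baseline, candidate, scores=PASSMARK):
    """Return (multi-thread ratio, single-thread ratio) of candidate vs baseline."""
    base_cpu, base_single = scores[baseline]
    cand_cpu, cand_single = scores[candidate]
    return cand_cpu / base_cpu, cand_single / base_single


for name in PASSMARK:
    if name == "Core 2 Quad Q9300":
        continue
    multi, single = speedup_over("Core 2 Quad Q9300", name)
    print(f"{name}: {multi:.2f}x CPU Mark, {single:.2f}x single thread")
```

On these numbers the E5-1620 comes out at roughly 2.9x the Q9300's CPU Mark and about 1.8x its single-thread score.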

Add to that system a good 250GB SSD and 1 or 2TB HD and eventually add RAM to 32GB total.

The E5-1620 system can take up to an 8-core CPU, run fast SSDs, and hold up to 64GB of RAM instead of 16GB (I'd go to the maximum eventually). All but one of its slots are PCIe instead of PCI, and it has SATA III 6Gb/s disk controllers and USB 3.0 for peripherals. Best of all, it's not an expensive upgrade: if your system is worth $100 and the replacement is $120, the net cost is $20. Even if your system is worth $70 and the new one is $200, it's still very worthwhile.
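The cost reasoning is simple subtraction; a tiny sketch with the two scenarios mentioned above (the resale values are the guesses from this post, not quotes):

```python
# Net out-of-pocket cost of swapping systems: replacement price minus resale value.
def net_upgrade_cost(replacement_price, resale_value):
    """Return what the swap costs after selling the old system."""
    return replacement_price - resale_value


scenarios = {
    "optimistic": (120, 100),   # replacement $120, old system sells for $100
    "pessimistic": (200, 70),   # replacement $200, old system sells for $70
}

for label, (price, resale) in scenarios.items():
    print(f"{label}: net cost ${net_upgrade_cost(price, resale)}")
```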

BambiBoom

CAD / 3D Modeling / Graphic Design:

HP z420 (2015) (Rev 3) > Xeon E5-1660 v2 (6-core @ 3.7 / 4.0GHz) / 32GB DDR3 -1866 ECC RAM / Quadro K4200 (4GB) / Samsung SM951 M.2 256GB AHCI + Intel 730 480GB (9SSDSC2BP480G4R5) + Western Digital Black WD1003FZEX 1TB> M-Audio 192 sound card > 600W PSU> > Windows 7 Professional 64-bit > Logitech z2300 2.1 speakers > 2X Dell Ultrasharp U2715H (2560 X 1440)
[ Passmark Rating = 5581 > CPU= 14046 / 2D= 838 / 3D= 4694 / Mem= 2777 / Disk= 11559] [6.12.16]

Analysis / Simulation / Rendering:

HP z620 (2012) (Rev 3) 2X Xeon E5-2690 (8-core @ 2.9 / 3.8GHz) / 64GB DDR3-1600 ECC reg) / Quadro K2200 (4GB) + Tesla M2090 (6GB) / HP Z Turbo Drive (256GB) + Samsung 850 Evo 250GB + Seagate Constellation ES.3 (1TB) / Creative Sound Blaster X-Fi Titanium PCIe sound card / 800W / Windows 7 Professional 64-bit > Logitech z313 2.1 speakers > HP 2711x (27" 1920 X 1080)
[ Passmark System Rating= 5675 / CPU= 22625 / 2D= 815 / 3D = 3580 / Mem = 2522 / Disk = 12640 ] 9.25.16
[ Cinebench R15: OpenGL= 119.23 fps / CPU = 2209 cb / Single core 130 cb / MP Ratio 16.84x] 10.31.16





 
Yeah, I was first thinking of buying a 1060 6GB as per Tim Dettmers' article, but I suspect that the bottleneck would be bigger. Also, I think the GTX 960 is cheaper (I'm even considering a used one, since I just want to get started and try out beginner to mid-level models). So with the FSB pushed to 400MHz, how much of the GTX 960's processing power would you expect me to lose?
 




These "cheapest options" are the best options, because a core 2 quad would definitely bottleneck a 1060.
 

Why yes it would, for gaming. Note the hardware guide I linked to in post 3 only suggests a 2GHz dual-core CPU for one GPU, because all the heavy lifting is done in the GPU. The bigger concern is keeping the GPU fed from RAM and disk, so an SSD would help a lot too.
 


I'm sorry, but the link to the HP Z420 is not working, if it was a link. Where can I find that system for about the same price as on the date specified?
I know that big/very deep models need a $5k+ workstation. What I am looking for is something to start with, possibly just upgrading the GPU of the system you suggested, to work through my master's thesis. But I am still new to the field.
 


Yes, I have 4GB at 1:1, 333MHz. I am not very familiar with overclocking :/. I managed to buy a GTX 770 4GB. I have attached the CPU-Z details for my desktop (sorry for the very long post). How do I go about setting the 4GB to 1:1 at 400MHz?

CPU-Z TXT Report
-------------------------------------------------------------------------

Binaries
-------------------------------------------------------------------------

CPU-Z version 1.78.3.x64

Processors
-------------------------------------------------------------------------

Number of processors 1
Number of threads 4

APICs
-------------------------------------------------------------------------

Processor 0
-- Core 0
-- Thread 0 0
-- Core 1
-- Thread 0 1
-- Core 2
-- Thread 0 2
-- Core 3
-- Thread 0 3

Timers
-------------------------------------------------------------------------

ACPI timer 3.580 MHz
Perf timer 2.441 MHz
Sys timer 1.000 KHz


Processors Information
-------------------------------------------------------------------------

Processor 1 ID = 0
Number of cores 4 (max 4)
Number of threads 4 (max 4)
Name Intel Core 2 Quad Q9300
Codename Yorkfield
Specification Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz
Package (platform ID) Socket 775 LGA (0x4)
CPUID 6.7.7
Extended CPUID 6.17
Core Stepping M1
Technology 45 nm
TDP Limit 95.0 Watts
Core Speed 1999.7 MHz
Multiplier x Bus Speed 6.0 x 333.3 MHz
Rated Bus speed 1333.2 MHz
Stock frequency 2500 MHz
Instructions sets MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, EM64T, VT-x
L1 Data cache 4 x 32 KBytes, 8-way set associative, 64-byte line size
L1 Instruction cache 4 x 32 KBytes, 8-way set associative, 64-byte line size
L2 cache 2 x 3072 KBytes, 12-way set associative, 64-byte line size
Max CPUID level 0000000Ah
Max CPUID ext. level 80000008h
Cache descriptor Level 1, D, 32 KB, 1 thread(s)
Cache descriptor Level 1, I, 32 KB, 1 thread(s)
Cache descriptor Level 2, U, 3 MB, 2 thread(s)
FID/VID Control yes
FID range 6.0x - 7.5x
Max VID 1.238 V



Temperature 0 58 degC (136 degF) (Core #0)
Temperature 1 58 degC (136 degF) (Core #1)
Temperature 2 48 degC (118 degF) (Core #2)
Temperature 3 46 degC (114 degF) (Core #3)
Clock Speed 0 1999.75 MHz (Core #0)
Clock Speed 1 1999.75 MHz (Core #1)
Clock Speed 2 1999.75 MHz (Core #2)
Clock Speed 3 1999.75 MHz (Core #3)



Chipset
-------------------------------------------------------------------------

Northbridge Intel P45/P43/G45/G43 rev. A3
Southbridge Intel 82801JR (ICH10R) rev. 00
Graphic Interface PCI-Express
PCI-E Link Width x16
PCI-E Max Link Width x16
Memory Type DDR2
Memory Size 4 GBytes
Channels Single
Memory Frequency 333.3 MHz (1:1)
CAS# latency (CL) 5.0
RAS# to CAS# delay (tRCD) 5
RAS# Precharge (tRP) 5
Cycle Time (tRAS) 15
Row Refresh Cycle Time (tRFC) 44
Command Rate (CR) 2T
Host Bridge 0x2E20

MCHBAR I/O Base address 0x0FED14000
MCHBAR I/O Size 4096

Memory SPD
-------------------------------------------------------------------------

DIMM # 1
SMBus address 0x52
Memory type DDR2
Module format Regular UDIMM
Manufacturer (ID) Kingston (7F98000000000000000000)
Size 2048 MBytes
Max bandwidth PC2-6400 (400 MHz)
Part number
Serial number 23226378
Manufacturing date Week 22/Year 11
Number of banks 8
Data width 64 bits
Correction None
Nominal Voltage 1.80 Volts
EPP no
XMP no
AMP no
JEDEC timings table CL-tRCD-tRP-tRAS-tRC @ frequency
JEDEC #1 4.0-4-4-12-16 @ 266 MHz
JEDEC #2 5.0-5-5-15-20 @ 333 MHz
JEDEC #3 6.0-6-6-18-24 @ 400 MHz

DIMM # 2
SMBus address 0x53
Memory type DDR2
Module format Regular UDIMM
Manufacturer (ID) Apacer Technology (7F7A000000000000000000)
Size 2048 MBytes
Max bandwidth PC2-6400 (400 MHz)
Part number 78.A1GA0.C04
Serial number 02309143
Manufacturing date Week 14/Year 09
Number of banks 8
Data width 64 bits
Correction None
Nominal Voltage 1.80 Volts
EPP no
XMP no
AMP no
JEDEC timings table CL-tRCD-tRP-tRAS-tRC @ frequency
JEDEC #1 3.0-3-3-9-12 @ 200 MHz
JEDEC #2 4.0-4-4-12-16 @ 266 MHz
JEDEC #3 5.0-5-5-18-23 @ 400 MHz

DIMM # 1



Monitoring
-------------------------------------------------------------------------

Mainboard Model P43-ES3G (0x000001F6 - 0x00A449B4)

LPCIO
-------------------------------------------------------------------------

LPCIO Vendor ITE
LPCIO Model IT8718
LPCIO Vendor ID 0x90
LPCIO Chip ID 0x8718
LPCIO Revision ID 0x5
Config Mode I/O address 0x2E
Config Mode LDN 0x4
Config Mode registers
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 00
10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20 87 18 05 00 00 01 3F 00 01 88 01 00 01 00 00 00
30 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60 02 90 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70 00 02 00 00 04 04 00 00 00 00 00 00 00 00 00 00
Register space LPC, base address = 0x0290


Hardware Monitors
-------------------------------------------------------------------------

Hardware monitor ITE IT87
Voltage 0 1.12 Volts [0x46] (CPU VCORE)
Voltage 1 1.90 Volts [0x77] (DDR)
Voltage 2 3.31 Volts [0xCF] (+3.3V)
Voltage 3 4.95 Volts [0xB8] (+5V)
Voltage 7 12.42 Volts [0xC2] (+12V)
Voltage 8 3.12 Volts [0xC3] (VBAT)
Temperature 0 39 degC (102 degF) [0x27] (System)
Temperature 1 30 degC (86 degF) [0x1E] (CPU)
Fan 0 2464 RPM [0x112] (FANIN0)
Fan PWM 0 0 pc [0x0] (FANPWM0)
Fan PWM 1 0 pc [0x0] (FANPWM1)
Fan PWM 2 0 pc [0x0] (FANPWM2)
Register space LPC, base address = 0x0290

00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00 11 10 70 00 FF FF 00 37 FF 87 54 09 07 12 FF FF
10 FF FF FF 76 D7 FD FC FD 01 FF FF FF FF FF FF FF
20 46 77 CC B8 09 FF FF C1 C3 27 1E FE 80 19 FF FF
30 FF 00 FF 00 FF 00 FF 00 FF 00 FF 00 FF 00 FF 00
40 7F 7F 7F 7F 7F 7F 5F 74 2D 40 9C 22 FF FF FF FF
50 9F 2A 7F 7F 7F 50 F8 F8 90 F8 05 12 60 00 00 00
60 00 14 41 23 90 03 FF FF 00 14 41 23 90 03 FF FF
70 00 14 41 23 90 03 FF FF FF FF FF FF FF FF FF FF
80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00
90 FF 00 00 00 FF 00 00 00 FF FF FF FF FF FF FF FF
A0 00 00 00 00 00 00 00 FF FF FF FF FF FF FF FF FF
B0 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
C0 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
D0 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
E0 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
F0 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF

Hardware monitor NVIDIA I/O
Clock Speed 0 594.00 MHz [0x252] (Graphics)
Clock Speed 1 399.60 MHz [0x252] (Memory)
Clock Speed 2 1512.00 MHz [0x252] (Processor)

Hardware monitor NVIDIA NVAPI
Temperature 0 54 degC (129 degF) [0x36] (GPU)
Fan PWM 0 100 pc [0x64] (FANPWMIN0)

 
Your RAM is already at 1:1, you just want to make sure it stays that way when you do any adjustments. This is assuming you have 400MHz DDR2-800 PC-6400 RAM.

In Gigabyte BIOSes, under MIT set "CPU Host Clock Control" to Enabled, then CPU Host Frequency (Mhz) to 400. System Memory Multiplier should be changed to the lowest setting of 2.0 to set 1:1. CPU vCore should be set at this point to + one tenth of a volt higher than the default shown under "PC Health Status" (a bit on the high side but a good starting point), and DRAM voltage to whatever it says on the sticker or package of the RAM (keep in mind that default voltage for DDR2 is 1.8v and "Auto" tends to set it at 2.0v at 400MHz)

CPU speed apparently doesn't matter much (and Tim Dettmers even suggested underclocking to 2GHz) but Core 2 really benefits a lot from faster FSB and as low latency as you can get on the memory. Unless your RAM is factory rated at 4-4-4 it may take some tinkering with the voltage to get there (or it may never make it), and I can tell you that all 40-series chipsets have oddly terrible default values for Static tRead Value only when a 45nm chip is installed! You should be able to set that at 9 with 5-5-5 RAM and at 8 for 4-4-4 to produce a latency below 70ns as measured by MaxxMem.
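To see why the FSB and timings matter, here is a back-of-the-envelope sketch. It assumes the standard Core 2 front-side bus (64 bits wide, four transfers per clock) and only converts CAS cycles to nanoseconds; the roughly 70ns figure measured by MaxxMem includes much more than CAS latency, so these numbers are not directly comparable to it. Function names are illustrative.

```python
# Back-of-the-envelope FSB bandwidth and CAS latency figures.
def fsb_bandwidth_gbs(fsb_mhz, transfers_per_clock=4, bus_bytes=8):
    """Peak FSB bandwidth in GB/s (quad-pumped, 64-bit bus assumed)."""
    return fsb_mhz * transfers_per_clock * bus_bytes / 1000


def cas_latency_ns(cl_cycles, mem_clock_mhz):
    """Convert CAS latency from memory clock cycles to nanoseconds."""
    return cl_cycles * 1000 / mem_clock_mhz


for fsb in (333.33, 400.0):
    print(f"FSB {fsb:.0f} MHz: about {fsb_bandwidth_gbs(fsb):.1f} GB/s peak")

for cl, clock in ((5, 333.33), (5, 400.0), (4, 400.0)):
    print(f"CL{cl} at {clock:.0f} MHz: {cas_latency_ns(cl, clock):.1f} ns")
```

Going from CL5 at 333MHz to CL4 at 400MHz cuts the CAS portion from about 15ns to 10ns, while peak FSB bandwidth rises from about 10.7 to 12.8 GB/s.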
 