[SOLVED!] FSB clock frequency unstable during gpu usage, Radeon HD7870

Max73

Distinguished
Jul 11, 2012
10
0
18,510
Hello everybody, first of all thanks for reading this long post.

My system has
ASUS M2A-VM HDMI
2 DDR2 SDRAM of 2GB each
1 floppy drive
Sapphire Radeon HD7870
AMD Phenom II, X4 910e
PSU: Nexus NX-5000 R3, 530W, max combined power = 511.5 W, 41Amps on 1 12V line.
Windows 7

http://s12.postimage.org/9kw7ykhsd/CPU_Z_Capture.jpg

I came across OCCT, a program for monitoring temperatures, after which I bought a new cooler for my new CPU, and I noticed very soon that the FSB frequency (the so-called front-side bus frequency) is unstable. Not by a few percent: the nominal 200 MHz drops down to 80 MHz and can reach 600 MHz!
Now, since my processor has a 13x multiplier, chances are it will crash when this instability shows up. It happens about once a day, and each crash also causes a RAID 1 failure and a subsequent rebuild that takes 3 hours!
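
To put those numbers in perspective, here is a minimal sketch of what the swings would mean for the core clock, assuming the core clock is simply FSB clock times multiplier. The function name is only for illustration.

```python
# Core clock = reference (FSB) clock x CPU multiplier.
MULTIPLIER = 13  # default multiplier of the Phenom II X4 910e


def core_clock_mhz(fsb_mhz, multiplier=MULTIPLIER):
    """Return the CPU core clock in MHz for a given FSB clock in MHz."""
    return fsb_mhz * multiplier


# Lowest dip, nominal value and highest spike reported by OCCT.
for fsb in (80, 200, 600):
    print(f"FSB {fsb:>3} MHz -> core {core_clock_mhz(fsb):>4} MHz")
```

If the readings are real, the spikes would push the CPU far beyond its rated 2.6 GHz, which would fit with the crashes.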

This is the FSB frequency with the stock cooler under heavy load (combined CPU and GPU stress from World Community Grid, running in the background via BOINC Manager):
http://s22.postimage.org/7pzh6sj5d/2013_03_02_21h22_Frequency_Bus.png


With the new cooler, temperatures improved by 10 °C, reaching the low 60s at full load, but the FSB is still very unstable and I still experience crashes.
This was the FSB frequency under CPU stress alone (OCCT CPU stress):

http://s11.postimage.org/flk9ku64z/CPU_stress_Alone_2013_03_10_13h34_Frequency_Bus.png

http://s15.postimage.org/9of5g2gpn/CPU_stress_Alone_Vcore_2013_03_10_13h34_Voltage.png


And this is with only the GPU stressed (OCCT GPU stress):

http://s16.postimage.org/856w8z6k5/GPU_Alone_FSB_2013_03_13_20h43_Frequency_Bus.png

http://s16.postimage.org/ifnjwthit/GPU_Alone_Vcore_2013_03_13_20h43_Voltage_CPU_VCO.png

With combined stress it gets even worse. I tried reducing the GPU clock and lowering the CPU multiplier from 13 to 12, but to no avail. I also tried raising the Vcore from the auto setting of 1.175 V to 1.25 V, yet there was no improvement. Does anybody have an idea what is going on? Thanks in advance for any tips!






 

ShadyHamster

Distinguished
It could be a CPU/motherboard compatibility issue; it looks like that motherboard is AM2?
Have you tried updating the BIOS at all?
Disable the auto-overclocking feature, if that is even available.
Have you manually set the FSB to 200?

An 800 MHz HT link seems rather low, even for an AM2 board. Is that the default HT link clock? (It's been so long since I've seen an AM2 PC that I can't quite remember the default clocks.)
 

First, turn off any OC software. It is overclocking the FSB.
Second, disable "Spread Spectrum" in the BIOS. This setting varies the FSB speed to reduce EMI.
Then disable AMD "Cool'n'Quiet", SpeedStep, etc. in the BIOS.

 

Max73



Thanks to everybody for the tips! The HT default is 1000 MHz, but I reduced it for safety. I had some success using AMD OverDrive to lower the GPU power limit by 10% (although HWiNFO64 now shows 100% GPU usage), underclocking the CPU from 13x to 12x, and reducing the memory from 800 MHz to 533 MHz. The FSB frequency is still unstable (80 MHz to 275 MHz oscillations about once a minute :( ), but there have been no more crashes in 3 days of continuous CPU+GPU crunching (BOINC Manager, World Community Grid). I believe this is because, during the frequency peaks of up to 300 MHz, the memory (Kingston 2x2GB DDR2 SDRAM PC6400, 800 MHz) now runs at 533*275/200 < 800, which is still in spec, and the processor goes from 200x13 (default) to 275x12 (FSB peak, but with the multiplier one step lower), which is still a huge 3.3 GHz compared to the original 2.6 GHz of the Phenom II X4 910e.
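
For anyone who wants to redo the arithmetic, here is a small sketch. It assumes that the memory clock scales linearly with the FSB, which is the model I used above, not something I verified on the board.

```python
NOMINAL_FSB_MHZ = 200


def scaled_clock(nominal_clock, fsb_mhz, nominal_fsb_mhz=NOMINAL_FSB_MHZ):
    """Scale a clock derived from the FSB linearly with the FSB frequency."""
    return nominal_clock * fsb_mhz / nominal_fsb_mhz


peak_fsb_mhz = 275                                # observed FSB peak
memory_at_peak = scaled_clock(533, peak_fsb_mhz)  # memory set to 533 in the BIOS
cpu_at_peak_mhz = peak_fsb_mhz * 12               # multiplier lowered to 12x

print(f"memory at peak: {memory_at_peak:.1f} (rated 800)")
print(f"CPU at peak:    {cpu_at_peak_mhz} MHz (stock 2600 MHz)")
```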

This is the log of the last 2 days (HWiNFO running in the background during crunching):
http://s23.postimage.org/ddcsdlb1n/Capture.jpg

The BIOS is the latest available for my ASUS M2A-VM HDMI (v. 5001), which should make the board compatible with AM3 processors like the Phenom II; as far as I understand, the board then treats it as an AM2+ processor.

The PSU (Nexus NX-5000 R3) is rated at 530 W, which should be plenty according to some power calculators I found on the net. Yet although this PSU can deliver 41 A on a single 12 V line, the FSB is only stable as long as I keep the GPU idle. On the other hand, with an older CPU (Athlon X2 BE-2400) stability was no issue! So, is it really a PSU issue, or am I leaking current somewhere? In total I have:

ASUS M2A-VM HDMI
2x2GB DDR2 SDRAM
2 Seagate Constellation ES HDs of 1TB each in RAID 1
1 floppy drive
Sapphire Radeon HD 7870
Phenom II X4 910e Deneb CPU (which is 65 W on paper)

According to http://www.extreme.outervision.com/PSUEngine, I need 360 W. Is 530 W then not enough for this combination?
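
As a rough sanity check on the power budget, here is a sketch using only the figures from the PSU label and the calculator. It says nothing about transient behaviour or about how the load is split over the rails.

```python
RAIL_VOLTAGE_V = 12.0
RAIL_CURRENT_A = 41.0      # rated current on the single 12 V line
MAX_COMBINED_W = 511.5     # max combined power from the PSU label
ESTIMATED_LOAD_W = 360.0   # figure returned by the online PSU calculator

rail_power_w = RAIL_VOLTAGE_V * RAIL_CURRENT_A
headroom_w = MAX_COMBINED_W - ESTIMATED_LOAD_W
load_fraction = ESTIMATED_LOAD_W / MAX_COMBINED_W

print(f"12 V rail capacity: {rail_power_w:.0f} W")
print(f"headroom:           {headroom_w:.1f} W")
print(f"estimated load:     {load_fraction:.0%} of max combined power")
```

On paper the estimated load leaves a comfortable margin, so a plain shortage of watts does not look like the obvious explanation.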

I also wonder where the 200 MHz FSB clock is produced. Is it a PLL, a divider, or both on the motherboard chipset? The chipset temperature is never above 34 °C, which doesn't seem too high. Why does it behave this way? Could it be insufficient power? Can I check this? Voltage spikes, if any, are not picked up by the sensors either.

200 MHz is the minimum FSB I can set in the BIOS, and it can only be set manually. I also disabled Cool'n'Quiet and all that stuff, including spread spectrum, two weeks ago already, but it did not help. Moreover, spread spectrum would change the frequency by a MHz or less, not 50 MHz, so this could not be it!
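
A quick sketch of that comparison follows. The spread percentages of 0.25% and 0.5% are assumptions (values commonly quoted for spread-spectrum clocking), not figures from my board or its clock generator datasheet.

```python
NOMINAL_FSB_MHZ = 200


def spread_mhz(nominal_mhz, spread_percent):
    """Frequency deviation in MHz for a given spread-spectrum percentage."""
    return nominal_mhz * spread_percent / 100


def deviation_percent(observed_mhz, nominal_mhz=NOMINAL_FSB_MHZ):
    """Deviation of an observed frequency from nominal, in percent."""
    return (observed_mhz - nominal_mhz) * 100 / nominal_mhz


for percent in (0.25, 0.5):  # assumed typical spread-spectrum settings
    print(f"{percent}% spread -> {spread_mhz(NOMINAL_FSB_MHZ, percent)} MHz")

for observed in (80, 275):   # dips and peaks seen in the logs
    print(f"{observed} MHz -> {deviation_percent(observed):+.1f}% from nominal")
```

Under these assumptions the observed swings are roughly two orders of magnitude larger than anything spread spectrum would produce.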

There is no overclocking software running in the background that I know of, except AMD OverDrive, which came with the Catalyst drivers and which, as I said, I used to underpower the GPU, but only in these last few days. HWiNFO alone also shows the instability in its logs, when OCCT was not running.
Do I really need a new mobo/PSU, or to build a new PC from scratch? Or is there a way to stabilize the FSB frequency? Thanks a lot for any advice!
 

Max73

I have an update: the source of the instability is somehow linked to the core voltage of the GPU.

Side by side:
FSB frequency | GPU Vin0
(Note that Vin0 is read from OCCT, used in idle no-stress mode. Vin0 reports the target VDDC, not the actual VDDC sensor, which, as I found out later, can be read with GPU-Z.)

http://s7.postimg.org/z3grc1vd7/UPDATED_FSB_DEPEND_ON_GPU_VOLTAGE.jpg
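
The same side-by-side comparison could be done numerically on logged samples. This is only a sketch: the thresholds are my assumptions and the sample values are made up for illustration, they are not taken from the logs.

```python
TOLERANCE_MHZ = 5        # assumed threshold for calling a sample an excursion
GPU_LOAD_VOLTAGE = 1.0   # assumed cut-off between idle (0.82 V) and load (1.22 V)


def excursions_by_gpu_state(samples, nominal_fsb_mhz=200):
    """Count FSB excursions separately for idle and loaded GPU samples.

    samples: iterable of (fsb_mhz, gpu_core_voltage) pairs logged together.
    Returns a dict with the excursion counts for "idle" and "load".
    """
    counts = {"idle": 0, "load": 0}
    for fsb_mhz, gpu_core_voltage in samples:
        if abs(fsb_mhz - nominal_fsb_mhz) > TOLERANCE_MHZ:
            state = "load" if gpu_core_voltage >= GPU_LOAD_VOLTAGE else "idle"
            counts[state] += 1
    return counts


# Made-up samples for illustration only.
example = [(200, 0.82), (201, 0.82), (80, 1.22), (200, 1.22), (275, 1.22)]
print(excursions_by_gpu_state(example))
```

If all excursions land in the "load" bucket on real logs, that would back up what the plot suggests.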

So the culprit is somehow related to the GPU voltage going from 0.82 V up to 1.22 V when the GPU is under stress. In those phases the PLL generating the 232 MHz FSB loses lock (this is also the case if I use the default 200 MHz setting). But why?? Underpowering the GPU by 30% helped reduce the FSB fluctuations, but the real question remains why this happens at all.

I managed to control the GPU voltage with TriXX, after disabling AMD OverDrive. However, the Vin0 shown by OCCT (which is a control register value rather than a VDDC sensor!) does not change, whereas the VDDC shown by GPU-Z follows the setting. Raising or lowering this voltage has had little success. The GPU-Z logs show that the VDDC current is around 40 A under stress and peaks at 100 A about once every 10 minutes.
I think this HD 7870 on my mobo looks like a recipe for problems! :heink:

I am not sure whether it is relevant here, but for completeness I add the CPU Vcore anyway: it is around 1.18 V to 1.22 V, with the BIOS setting at 1.100 V +100 mV and with all power savings, spread spectrum and Cool'n'Quiet turned off. LLC and Add Turbo Voltage are unfortunately not available with the ASUS M2A-VM BIOS 5001.

http://s22.postimg.org/elqbvbuqp/2013_03_22_19h19_Voltage_CPU_VCORE.png


Thanks in advance for any input.
 

Max73

Yet another update.

This time I again repeated the WCG computations to stress the GPU and used the GPU-Z logs for VDDC voltage and current, with two settings in TriXX:
VDDC = 1.050
VDDC = 1.300

I noticed the following:
With VDDC = 1.300, the VDDC current is around 40 A, with 2 spikes of 100 A in 20 minutes. The default VDDC = 1.219 shows similar behaviour. With VDDC = 1.050, the current halves and there are no spikes.
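
Translated into GPU core power with P = V x I, as a rough sketch: it takes "halves" literally as 20 A and ignores regulator losses, so treat the results as ballpark figures only.

```python
def core_power_w(voltage_v, current_a):
    """GPU core power in watts from core voltage and core current."""
    return voltage_v * current_a


cases = {
    "1.300 V, steady (40 A)": (1.300, 40),
    "1.300 V, spike (100 A)": (1.300, 100),
    "1.050 V, steady (20 A)": (1.050, 20),  # "halved" current taken as 20 A
}
for label, (volts, amps) in cases.items():
    print(f"{label}: {core_power_w(volts, amps):.1f} W")
```

The 100 A spikes would more than double the core power compared to the steady load at the same voltage.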

In both cases the FSB clock is unstable during the periods when VDDC is high, and shows no spikes when VDDC is low (0.82 V, no GPU stress).

Here are the images:
http://s22.postimg.org/cu4knt0mp/GPU_LOGS_1300_FSB.jpg
http://s14.postimg.org/uzce5i0k1/GPU_LOGS_1050_FSB.jpg

I could not detect any special event coinciding with the FSB spikes, which all occur under GPU stress. Any ideas? I do know that my mobo is PCIe 2.0 whereas the Sapphire HD 7870 is PCIe 3.0 (3.0 is in theory backward compatible with 2.0), but I wonder whether that is the actual problem.

So, the BIG Q: is this a faulty Sapphire card, or is it an incompatibility issue?
 

Max73

I finally found a solution that worked, and in a place I had not been checking at all!

SOLUTION:
===========
I used setFSB (setfsb_2_2_134_98) to change the PCIe clock to 90.1 MHz instead of the standard 100 MHz,
and that did the trick! After this, all FSB frequency spikes disappeared,
and I could put the GPU back to its original default settings for both clock and voltage.

Unfortunately this has to be done at each startup, because the BIOS has no control over the PCIe clock!
===========

This is a screen capture from setFSB:
http://s8.postimg.org/cbw2auhg5/set_FSB_Capture.jpg

The short slider (the lower one) is set to 90.1 MHz for PCI Express; the upper one
was set to 235 MHz from the BIOS, not from the setFSB GUI.

Before trying this, I had in fact seen that an FSB of 235 MHz gave somewhat less dangerous and less frequent frequency spikes during GPU usage. I had chosen it in combination with GPU underclocking (900 MHz instead of 1050 MHz) and undervolting (1050 mV instead of 1218 mV, which had eliminated the 100 A GPU current spikes), giving an FSB frequency in the range of 200-280 MHz, with 1 spike per minute.
The CPU multiplier had been reduced from 13x to 12x, and the memory from 800 to 533, in order to handle the highest FSB frequency of 280 MHz.
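
The reason for dropping the memory setting can be sketched as a headroom calculation: how high can the FSB go before the memory exceeds its 800 rating? As before, this assumes the memory clock scales linearly with the FSB, relative to a 200 MHz nominal.

```python
NOMINAL_FSB_MHZ = 200
MEMORY_RATING = 800  # Kingston DDR2 PC6400


def max_fsb_mhz(memory_setting, rating=MEMORY_RATING, nominal=NOMINAL_FSB_MHZ):
    """Highest FSB (MHz) at which the scaled memory clock stays within its rating."""
    return rating * nominal / memory_setting


for setting in (800, 533):
    print(f"memory set to {setting}: FSB may reach {max_fsb_mhz(setting):.1f} MHz")
```

With the memory at 800 there is no headroom at all above 200 MHz, whereas at 533 the 280 MHz peaks still fit.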

After changing PCI Express to 90 MHz, no spikes have shown up in the last 24 hours, even with the GPU at its default settings.
I had come across setFSB while looking for a way to set the PLL registers of the clock generator chip (ICS951462) with the aid of its datasheet. I hoped to fix the FSB stability by changing the selection of the frequency source, which I thought could be the reason for the PLL losing lock. That, however, did not help.

I have no idea why the problem is linked to the PCIe frequency. Possibly generating the 100 MHz somehow puts the ICS chip under stress (how?), or the 100 MHz somehow couples back into the IC itself as crosstalk, e.g. via the supply lines or the ground plane. I think the only direct link between PCIe and the clock generator chip is the PCIe clock signal itself, and possibly some control signals back to the ICS; this may rule out the hypothesis that the large current consumption of the GPU changes the voltage on the ICS chip pins... but I have no clue! Any idea about what is really going on is welcome!

Anyway, the system is now stable (FSB @ 235 MHz, stable at last!), as shown in this graph over 20 min:
http://s23.postimg.org/he1mfs5nf/GPU_LOGS_1218_FSB.jpg

whereas these were the logs before the fix:
http://s17.postimg.org/c4xsamw6n/GPU_LOGS_1300_FSB.jpg
http://s7.postimg.org/gcpuyz67f/GPU_LOGS_1050_FSB.jpg

I hope this can help someone else trying to put a new-generation Radeon board onto a 5-year-old motherboard! :D