Suddenly, memory errors, hundreds of them!

riffwraith

Distinguished
Aug 8, 2008
14
0
18,510
About two months ago, I put together a new computer, and everything was fine for about a month and a week.
After that, I started getting crashes (random reboots) and errors in orthos and memtest (about 80 errors or so, and always in test 4. Memory addresses were on both sticks), whereas previously I had been able to run both with no problems for 10+ hours each.

I noticed that orthos errored much faster when running large fft/blend rather than small, so I figured it was a ram issue. I rotated the ram, which didn't do anything, and then switched them to the other two ram slots, which seemed fine. I ran memtest and orthos for 12 hours again each, and everything was fine...for about a week.
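A rough way to see why large FFT runs point at RAM rather than the CPU is to compare the FFT's working set with the CPU cache. The sketch below is only a back-of-the-envelope estimate: it assumes 8 bytes per FFT element (double precision) and the E7200's 3 MB L2 cache, and it ignores any extra working data a real stress tester keeps around. The function names are hypothetical, and the FFT lengths are illustrative.

```python
# Back-of-the-envelope estimate: does an FFT of a given length fit in L2 cache?
# Assumptions (not from the stress tester itself): 8 bytes per element,
# and a 3 MB L2 cache as on the Core 2 Duo E7200.
BYTES_PER_ELEMENT = 8
L2_CACHE_BYTES = 3 * 1024 * 1024


def fft_footprint_bytes(fft_length_k: int) -> int:
    """Approximate data size of an FFT whose length is given in K (1024) elements."""
    return fft_length_k * 1024 * BYTES_PER_ELEMENT


def fits_in_cache(fft_length_k: int, cache_bytes: int = L2_CACHE_BYTES) -> bool:
    """True if the FFT data alone would fit in the given cache size."""
    return fft_footprint_bytes(fft_length_k) <= cache_bytes


if __name__ == "__main__":
    for length_k in (8, 1024, 3584):
        size_mib = fft_footprint_bytes(length_k) / (1024 * 1024)
        where = "cache" if fits_in_cache(length_k) else "main memory"
        print(f"{length_k}K FFT: about {size_mib:.2f} MiB, mostly exercises {where}")
```

Under these assumptions a small FFT stays inside the cache and mostly stresses the CPU, while a large FFT spills into main memory, so errors that show up faster on large FFT are consistent with a RAM or memory controller problem.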

This morning I woke up and noticed that windows had rebooted and was at the login screen. Upon logging in, I got an error about a corrupted file and that I'd have to run chkdsk. I ran orthos on large fft, and it immediately rebooted. I went into memtest, and it started spitting out a lot of errors. Not just in test 4, but also in the other 3 tests before it. About 1500 errors before even finishing test 4.

Is it possible that the problem is with the cpu or motherboard, and not the ram? I find it very odd that it runs for weeks at a time, and then suddenly I have issues. Later this week I'm going to try testing out the ram on a friend's machine, and his ram on mine, so hopefully that will confirm a ram error.

My hardware:
Intel Core 2 Duo e7200 (overclocked, or stock, I still get ram errors)
Gigabyte EP35-DS3L
2GB (2X1GB) DDR2 800 Crucial Ballistix, being run at Crucial's specified settings.

If it is a ram issue, will I be able to RMA it? I no longer have the packaging or receipts, since I sent them in for a rebate, which I'm currently waiting on.

Thanks.
 

auscanzukus

Distinguished
Jul 26, 2008
953
0
18,990
Post your settings:

CPU Clock Ratio
CPU Host Frequency
PCI Express Frequency
C.I.A. 2

System Memory Multiplier
Performance Enhance
CAS Latency Time
Dram RAS# to CAS# Delay
Dram RAS# Precharge Delay
Precharge Delay (tRAS)

DDR2 OverVoltage Control
PCI-E OverVoltage Control
FSB OverVoltage Control
(G)MCH OverVoltage Control
CPU Voltage Control

Crucial may rma your ram, but they can't guarantee overclockability. That means you can rma it 100 times and if your o/c is unstable, you'll have the same problems.

You need to stabilize your overclock. Let's start with bios default & memtest overnight. If stock fails, then yes, it may be hardware.
 

riffwraith

Development:
Computer rebooted today while it was idling, and now windows will not boot, except into safe mode. This is indicative of corrupted system files, right?

I reset everything to stock and am now running memtest. I think I'm going to run it for 20 hours, reinstall windows xp, and run orthos for 20 hours.

If everything works then, I'm going to try a modest overclock.
 

riffwraith

Memtest ran for 20 hours with no problems.
I tried repairing my windows install, but it still refused to boot, so I'm now in the process of a format and fresh install.

Fingers crossed.
 

auscanzukus

What's the exact error and which cpu? i.e.

Test 1, 4000 Lucas-Lehmer iterations of M19922945 using 1024K FFT length.
Torture Test ran 0 minutes 3 seconds - 0 errors, 0 warnings.

You have to quit all other programs including file-sharing, antivirus, etc. except for cpu temp monitor. Unplug from internet (auto update). Stop all scheduled tasks.

Is the pc plugged in a surge suppressor?

By stock, do you mean loading bios default or setting the cpu/ram settings manually?
 

riffwraith

I didn't have anything installed. Hm. Autoupdate though.
I also realized that it was on bios defaults, which gave incorrect settings for this ram, so I went in and manually changed the ram voltage and timings to the rated ones.
My pc is plugged into a surge protector.

Currently running small fft test.

Here's the error:

1:Test 1, 1000 Lucas-Lehmer iterations of M62390271 using 3584K FFT length.
1:FATAL ERROR: Rounding was 0.4951171875, expected less than 0.5
1:Hardware failure detected, consult stress.txt file.
1:Torture Test ran 19 hours, 19 minutes 10 seconds - 1 errors, 0 warnings.
1:Execution halted.

2:Torture Test ran 19 hours, 19 minutes 10 seconds - 0 errors, 0 warnings.
2:Execution halted.
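For context on what "Lucas-Lehmer iterations" are: the test repeatedly squares a number modulo a Mersenne number M_p = 2^p - 1. The sketch below shows the plain integer form of that iteration. It is not how the stress tester is implemented: as far as I understand it, the real program does the squaring with floating-point FFTs and checks that each result rounds cleanly to an integer, which is where the rounding message in the log comes from. The function name is hypothetical.

```python
# Minimal integer-arithmetic sketch of the Lucas-Lehmer primality test.
# Assumption: this is for illustration only; the stress tester uses FFT-based
# multiplication with a round-off check instead of exact big integers.
def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1          # the Mersenne number M_p
    s = 4                     # standard starting value
    for _ in range(p - 2):    # p - 2 squaring steps
        s = (s * s - 2) % m
    return s == 0


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        verdict = "prime" if lucas_lehmer(p) else "composite"
        print(f"M{p} = {(1 << p) - 1} is {verdict}")
```

Because every step depends on the previous one, a single flipped bit anywhere in the data changes the final result, which is why this kind of calculation is useful for catching marginal hardware.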
 

auscanzukus

Autoupdate can interfere with stress testing cuz it will download updates and install them.

Don't touch anything in bios, or else it can change the outcome of a stress test. Auto/default doesn't mean manual. Rated specs don't always work, e.g. the spec says 4-4-4-12, but 5-4-4-12 may be more stable. Go back to bios & load default.
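To put numbers on the 4-4-4-12 vs 5-4-4-12 comparison, the timings can be converted from clock cycles to nanoseconds. This is a small sketch, assuming DDR2-800 (800 MT/s, so a 400 MHz memory clock and 2.5 ns per cycle); the function name is hypothetical.

```python
# Convert DDR memory timings from clock cycles to nanoseconds.
# Assumption: DDR transfers twice per clock, so the clock is half the data rate.
def timings_to_ns(data_rate_mt_s: float, timings: list[int]) -> list[float]:
    """Return each timing in nanoseconds for the given data rate in MT/s."""
    clock_mhz = data_rate_mt_s / 2
    cycle_ns = 1000 / clock_mhz
    return [t * cycle_ns for t in timings]


if __name__ == "__main__":
    for timings in ([4, 4, 4, 12], [5, 4, 4, 12]):
        label = "-".join(str(t) for t in timings)
        print(label, "->", timings_to_ns(800, timings), "ns")
```

Going from CAS 4 to CAS 5 at DDR2-800 only relaxes the CAS latency from 10 ns to 12.5 ns, which is a small performance cost if it buys stability.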

Then d/l this, burn as image to a cd/dvd, boot it up, go to CPU Tests, Mersenne Prime Test, let it run for 24 hours. Make sure there's no sudden change to room temp. Gradual change is ok. Leave the door open.

http://www.ultimatebootcd.com/

If stock fails, rma the mobo & ram.
 

riffwraith

I just got done running a small fft test (at 800 mhz, 4-4-4-12) for slightly over 29 hours with no errors.
I'm pretty much home free, right?
Or should I still run the mersenne prime test?
 

riffwraith

Everything was fine until Monday night. It hard booted, and now I periodically get errors in firefox, corrupted disk messages, and occasionally it doesn't boot the first time (gets to the windows xp screen with the bar and reboots). Memtest shows errors within seconds.

This is such a mess.
I'm just going to buy some new ram, and if that doesn't work, new mobo, and if that still doesn't work...a new cpu.
Hopefully I can stop at the ram, but I strongly suspect it's something more serious than that.