May 20, 2020
13
0
10
Hello

I just built a new PC for work. I do motion graphics in Adobe After Effects and also play single-player games from time to time. My idle temps are around 40C to 50C. I ran a stress test in OCCT v5.5 with the Large data set, and with Thread Count and Instruction Set on Auto.

For the first 6-7 minutes, CPU temps were around 75-80C, and then from 7 to 15 minutes, they were around 82C to 88C. They did go up to 92C for 1 sec once or twice but then came down to 85ish.

I am using an NZXT H710i case with all stock fans: 3x120mm intake fans at the front and 1x140mm fan exhausting hot air.
I am using the Corsair H150i Pro AIO cooler, mounted on top with stock fans in push configuration.

I am very new to PC building and worried about these temps. I haven't overclocked the CPU. Some people say temps should stay under 80C, and others say the 9900K is a hot CPU so it's fine.

Can you guys tell me whether these temps are fine or whether I should be worried? What can I do to improve them?


i9-9900K
4x16GB Corsair Vengeance RGB Pro
GTX 1060 6GB MSI Gaming X
Gigabyte Z390 Ultra
 
Hi

The radiator was mounted on top as an exhaust, in a push configuration. The front fans were bringing air into the case.

But yesterday, after a bit of research online, I changed the configuration and moved the radiator to the front.

OLD: View: https://imgur.com/BChHYXS

NEW: View: https://imgur.com/c4nHxoZ

ERROR: View: https://imgur.com/HFkweVY


But now when I try to stress test with OCCT, it starts showing errors. It doesn't say what the error is.

I only changed the positions, nothing else. The CPU is at stock. RAM is on XMP Profile 1 at 3200MHz.

But in an almost 10-minute stress test, temps were better than last time. They were mostly below 80C; they spiked to 80+ but came down instantly. There's not much difference in idle temps: they were mostly 45 to 50C previously and are now at 40 to 45C.

Can you guide me on what to do, or recommend other software that can tell me what the error is?
 
I'd power off, switch off the PSU or unplug it, pull all the memory and the graphics card, double check that all the connections going to the board are solidly seated, then reseat the memory and graphics card. Do not forget to reconnect the PCIe auxiliary power to the graphics card afterwards. Power back on and see how it does.

To be clear, once you eliminate any errors you are seeing, forget OCCT for now. Download and run Prime95. Choose the Small FFT setting and disable AVX2, and then AVX once it un-greys after disabling AVX2. Run that for 15 minutes or until you see temps exceed 85°C. Usually we'd say 80°C is the "safe" temp for most Intel CPUs, but most people simply can't do that on the 9900K. If you can keep it below 85°C running Small FFT, then you will likely never see those temps any other time unless you are running games or applications that use AVX instructions. If you do, then you can configure an AVX offset in the BIOS.
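
If you log your temps during the run, the 85°C guideline above can be checked with something like the sketch below. This is only an illustration in Python: the function name `summarize_temps`, the sample readings, and the idea of one reading per fixed interval are all assumptions, and the numbers would come from whatever monitoring tool you use.

```python
from statistics import mean

def summarize_temps(samples_c, limit_c=85.0):
    """Summarize CPU temperatures (in °C) logged during a stress test.

    samples_c: readings taken at a fixed interval (assumed).
    limit_c:   stop threshold; 85 °C is the figure suggested for the 9900K.
    """
    if not samples_c:
        raise ValueError("no temperature samples given")
    over = [t for t in samples_c if t > limit_c]
    return {
        "max_c": max(samples_c),
        "mean_c": round(mean(samples_c), 1),
        "readings_over_limit": len(over),
        "stop_test": len(over) > 0,
    }

# Hypothetical readings, similar to the numbers reported in this thread.
print(summarize_temps([76, 79, 82, 84, 88, 85, 83]))
```

A single reading above the limit sets `stop_test`, which matches the "run for 15 minutes or until temps exceed 85°C" rule; a brief spike may or may not matter to you in practice.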
 
Hi

So I did what you recommended. I pulled the RAM and GPU and checked all the cables. Then I put the RAM and GPU back in and started the PC again.

I ran the OCCT test again to check whether the error was still there. For almost 10 minutes everything was fine, and temps were good, averaging 68 to 72C. But after 10 minutes it started showing errors again, so I stopped the test.

View: https://imgur.com/k3GCpZL


Should I do the Prime95 test now, or should I work on eliminating the OCCT error first? And what more should I do to eliminate those errors?

Thank you very much for all your help.

UPDATE: I am using a Corsair TM650W 80+ Gold power supply. Due to COVID-19, import/export has stopped, so the RMx 850W or 750W was not available anywhere in my country. I don't know whether this is relevant, but I wanted to let you know.
Also, my room temperature is around 30-35C.
 
Probably memory related with four DIMMs installed.

What is your exact memory kit model?

Is that one kit that came with four DIMMs, or two kits that each came with TWO DIMMs? Try removing the two DIMMs that are in the A1 and B1 slots. Those are the one closest to the CPU and the one two slots over from that one. That leaves only the DIMMs in the second and fourth slots over from the CPU. Then try your tests again. Also, make sure that XMP is enabled in the BIOS for the memory.
 
Hello

You were right. It could be the RAM.

I am using
1) VENGEANCE® RGB PRO 32GB (2 x 16GB) DDR4 DRAM 3200MHz C16 Kit
2) VENGEANCE® RGB PRO 32GB (2 x 16GB) DDR4 DRAM 3200MHz C16 AMD Ryzen Kit


I removed the Ryzen kit, left the Intel kit in the 2nd and 4th slots, and ran the OCCT test for 16 minutes. There were no errors this time.

As I mentioned earlier, due to COVID I couldn't find the Intel kit in 64GB, and the Ryzen kit was also compatible with the Intel platform, so I just got one of those.

But before changing the radiator position, I ran the OCCT test for 15 minutes and it didn't show any errors. Now it is suddenly showing errors.

Now the problem is that I can't even return these :(

What should I do? Keep using these? What do you suggest?

About temps: they are great now (I suppose). In the 16-minute OCCT test they were mostly below 70C, and for some time between 70 and 75C.


Before your last reply I ran the AIDA64 Extreme test for almost 20 minutes. It ran smoothly and temps were below 80C most of the time.

Do you think these temps are good? And should I do the Prime95 test now?


THANK YOU SO MUCH FOR YOUR GUIDANCE <3
 
With only the Intel kit installed, go into the BIOS, make sure XMP is enabled, then RAISE the DRAM voltage by .020v

So that means it should be set at 1.370v instead of 1.35v. Save settings and shut down. Install the other two DIMMs and then power on. Test using Memtest86. You will want to make your bootable memtest media BEFORE you do any of this other stuff.

Memtest86


Go to the Passmark software website and download the USB Memtest86 free version. You can do the optical disk version too if for some reason you cannot use a bootable USB flash drive.

Create bootable media using the downloaded Memtest86 (NOT Memtest86+, that is a different, older version and is outdated). Once you have done that, go into your BIOS and configure the system to boot to the USB drive that contains the Memtest86 USB media or the optical drive if using that option.


Create a bootable USB Flash drive:

1. Download the Windows MemTest86 USB image.

2. Right click on the downloaded file and select the "Extract to Here" option. This places the USB image and imaging tool into the current folder.

3. Run the included imageUSB tool, it should already have the image file selected and you just need to choose which connected USB drive to turn into a bootable drive. Note that this will erase all data on the drive.



No memory should ever fail to pass Memtest86 when it is at the default configuration that the system sets it at when you start out or do a clear CMOS by removing the CMOS battery for five minutes.

Best method for testing memory is to first run four passes of Memtest86, all 11 tests, WITH the memory at the default configuration. This should be done BEFORE setting the memory to the XMP profile settings. The paid version has 13 tests but the free version only has tests 1-10 and test 13. So run full passes of all 11 tests. Be sure to download the latest version of Memtest86. Memtest86+ has not been updated in MANY years. It is NO-WISE as good as regular Memtest86 from Passmark software.

If there are ANY errors, at all, then the memory configuration is not stable. Bumping the DRAM voltage up slightly may resolve that OR you may need to make adjustments to the primary timings. There are very few secondary or tertiary timings that should be altered. I can tell you about those if you are trying to tighten your memory timings.

If you cannot pass Memtest86 with the memory at the XMP configuration settings, then I would recommend restoring the memory to the default JEDEC SPD of 1333/2133MHz (depending on your platform and memory type) with everything left on the auto/default configuration and running Memtest86 over again. If it completes the four full passes without error, you can try again with the XMP settings, but first try bumping the DRAM voltage up once again by whatever small increment the motherboard will allow. If it passes, great, move on to the Prime95 testing.

If it still fails, try once again bumping the voltage if you are still within the maximum allowable voltage for your memory type and test again. If it still fails, you are likely going to need more advanced help with configuring your primary timings and should return the memory to the default configuration until you can sort it out.

If the memory will not pass Memtest86 for four passes when it IS at the stock default non-XMP configuration, even after a minor bump in voltage, then there is likely something physically wrong with one or more of the memory modules and I'd recommend running Memtest on each individual module, separately, to determine which module is causing the issue. If you find a single module that is faulty you should contact the seller or the memory manufacturer and have them replace the memory as a SET. Memory comes matched for a reason as I made clear earlier and if you let them replace only one module rather than the entire set you are back to using unmatched memory which is an open door for problems with incompatible memory.

Be aware that you SHOULD run Memtest86 to test the memory at the default, non-XMP, non-custom profile settings BEFORE ever making any changes to the memory configuration so that you will know if the problem is a setting or is a physical problem with the memory.
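
If it helps, the flow above can be sketched as a small decision function. This is only one reading of the steps in this post, written in Python; the function name `next_memtest_step` and its three arguments are made up for illustration and have nothing to do with Memtest86 itself. It also simplifies one detail: it does not model the minor voltage bump at default settings before concluding a module is faulty.

```python
def next_memtest_step(passes_at_default, passes_at_xmp, voltage_headroom_left):
    """Return the next action from the Memtest86 procedure described above.

    passes_at_default:     True if four passes at default (non-XMP) settings
                           finished with zero errors.
    passes_at_xmp:         True/False for the XMP result, or None if XMP has
                           not been tested yet.
    voltage_headroom_left: True if DRAM voltage can still be raised within
                           the maximum allowed for the memory type.
    """
    if not passes_at_default:
        # Failing at stock settings points to a physical problem.
        return "test each module separately; replace a faulty kit as a SET"
    if passes_at_xmp is None:
        return "enable XMP and run four passes of Memtest86"
    if passes_at_xmp:
        return "memory is stable; move on to Prime95"
    if voltage_headroom_left:
        return "raise DRAM voltage one small step and retest at XMP"
    return "return to default settings and get help with primary timings"

# Example: passes at default, fails at XMP, voltage can still go up.
print(next_memtest_step(True, False, True))
```

The order of the checks mirrors the advice: default settings are judged first, because an error there cannot be blamed on XMP.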
 
UPDATE:

So I ran Memtest86 on the Intel kit with XMP turned off. I didn't know it was going to take 6 hours, but anyway, no errors were found.

Then I installed the Ryzen kit so that all 4 DIMMs were in their slots, started the PC, and ran the OCCT test for 20 minutes. Temps were mostly below 80C, and no errors were found this time. (The XMP profile was disabled.)

Then I turned on the 3200MHz XMP profile and ran the OCCT test again for 20 minutes, with the same results. No errors were found, and temps were a little higher than in the last test but still mostly under 80C.

So if there are no RAM errors showing now, should I continue to the Prime95 test? Or do I still have to run Memtest86 with all 4 DIMMs? If so, should I run it with XMP off or on?


Thanks man for all your help.
 
So I tried to follow your exact instructions after testing the Intel kit. But when I got into the BIOS, I saw that the DRAM voltage is at 1.2v, set to Auto.

View: https://imgur.com/YWBJPjv


So should I change it to 1.35v first, or to 1.37v as you advised? Can you please guide me here?
 
Well, the spec calls for 1.35v. The 1.2v comes from the SPD default. I would manually set 1.35v per the specifications.

Darkbreeze is already guiding you in troubleshooting and I do not want to butt in. Yet sharing experience is what the forums are for.

I was able to bring my 8700K down to below 85C at 100% AVX2 load and below 75C at 100% SSE/SSE2 load after changing the thermal paste under the CPU lid and undervolting it by 0.1v with a dynamic voltage offset (it is now running overclocked at x50 with an AVX offset of 3).


Since delidding voids the warranty, in your case I would first try undervolting with Intel XTU and testing for stability and throttling under heavy AVX2 load. Also, check the gap between the CPU core temperature and the one reported by the water cooler after it stabilizes; it might give you a clue about the heat transfer efficiency.
 
Well, I live in a third-world country, so delidding is really not an option for me. It's not available in my country. I will have to rely on a good cooling solution only.

About the RAM, I'll try adjusting the DRAM voltage manually after researching it. I'm new to this and I don't want to mess anything up.
 
Yes, set the XMP profile value with ONE set of DIMMs installed. Test that in Memtest86 for one pass. If there are no errors, go back into the BIOS, bump the DRAM voltage up from 1.35v to 1.37v, save settings, and shut down. Install the other set of DIMMs and power on. Make sure XMP is still enabled for both sets, that it will POST, and that the BIOS is showing the correct speed for the full amount of memory. Then test again in Memtest86 with all four DIMMs at the XMP profile speed with the bump in DRAM voltage, to see if it is stable with all four DIMMs installed.

That is my advice for now. If you can't get all four DIMMs to pass four passes of Memtest86, then bump the DRAM voltage up another notch. Technically you could go all the way to 1.45v on the DRAM voltage, but if you cannot get it stable with all four DIMMs by the time you get to 1.39v, then it's unlikely you are going to. The likely reason is that there are simply too many differences between the two memory kits for the motherboard to accommodate the disparity.
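
To spell out the arithmetic, here is a minimal Python sketch of the voltages to try in order. The 0.02v step and the 1.39v give-up point come from the advice above; the function name `dram_voltage_steps` is made up, and the real increment depends on what your motherboard's BIOS allows, so treat the step size as an assumption.

```python
from decimal import Decimal

def dram_voltage_steps(xmp_voltage="1.35", step="0.02", give_up_at="1.39"):
    """List the DRAM voltages to try, in order, above the XMP value.

    Decimal keeps 1.35 + 0.02 exactly 1.37 instead of a float approximation.
    The step size is an assumption; boards differ in the increments they offer.
    """
    v = Decimal(xmp_voltage)
    inc = Decimal(step)
    stop = Decimal(give_up_at)
    steps = []
    while v + inc <= stop:
        v += inc
        steps.append(str(v))
    return steps

print(dram_voltage_steps())  # ['1.37', '1.39']
```

Passing `give_up_at="1.45"` lists every step up to the ceiling mentioned above, though the point of the advice is that going past 1.39v is unlikely to help.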
 
So I ran Memtest86 with XMP off and it passes the test.

But when I turn XMP on, it shows errors on Test #7 (Moving inversions, 32-bit pattern).
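
For anyone wondering what that test does: a moving-inversions test fills memory with a pattern, walks up through the addresses checking each word and writing its complement, then walks back down checking the complement. The Python sketch below is a simplified illustration of that general idea on a plain list standing in for RAM. It is not PassMark's implementation, and `moving_inversions` and `StuckBitMemory` are made-up names.

```python
MASK = 0xFFFFFFFF  # work with 32-bit words

def moving_inversions(memory, pattern=0x00000001):
    """Run one simplified moving-inversions pass over a list of 32-bit words.

    Returns the indexes where a read did not match what was written.
    """
    n = len(memory)
    errors = []

    def expected(i):
        # Rotate the pattern left by one bit per successive address.
        s = i % 32
        return ((pattern << s) | (pattern >> (32 - s))) & MASK

    # 1. Fill memory with the shifting pattern.
    for i in range(n):
        memory[i] = expected(i)

    # 2. Walk upward: verify each word, then write its complement.
    for i in range(n):
        if memory[i] != expected(i):
            errors.append(i)
        memory[i] = ~expected(i) & MASK

    # 3. Walk downward: verify the complement, then restore the pattern.
    for i in reversed(range(n)):
        if memory[i] != (~expected(i) & MASK):
            errors.append(i)
        memory[i] = expected(i)

    return errors

class StuckBitMemory(list):
    """Hypothetical faulty RAM: bit 3 of word 5 always reads back as 0."""
    def __getitem__(self, i):
        value = super().__getitem__(i)
        return value & ~(1 << 3) if i == 5 else value

print(moving_inversions([0] * 64))                  # healthy: no errors
print(moving_inversions(StuckBitMemory([0] * 64)))  # faulty word is reported
```

The stuck bit goes unnoticed while the pattern has a 0 in that position and is only caught once the complement is written, which is why the test writes both.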

I tried bumping the DRAM voltage up to 1.37v and then to 1.39v, but it didn't help.
Then I tried running the RAM at 3000MHz and then at 2666MHz, but I still got the same error.

What are my options now?

It passes the test with XMP turned off, so that means the RAM is physically fine.

What else can I try?

Thank you.
 
Try increasing the VCCIO voltage in the BIOS to 1.1v and the VCCSA (System Agent) voltage to 1.2v. Then save BIOS settings and retest with Memtest86. If it still doesn't work, then you're out of luck, and you'll need to find a 64GB kit that is compatible with your board AND comes as a single tested kit.
 
Okay, tell me one thing. Suppose I keep using these DIMMs as they are, errors and all. What will happen in the long term? I haven't had any blue screens or crashes yet. So if this problem isn't solved and I keep using them, what could happen? And how long could it last until something happens?

I know you can't tell me an exact time, but any idea or guess would help.
 
https://www.enterprisestorageforum....silent-data-corruption-the-backup-killer.html

https://www.christian-engelmann.info/publications/fiala12detection2.pdf

https://pdfs.semanticscholar.org/96d0/660ac270f002261309e7c78491ccbfecdf8b.pdf

Overall, the biggest problem with instability is bit flipping and silent data corruption. Unlike other forms of corruption and instability, it is not detectable until it has cumulatively corrupted something to the point where the system begins having problems: crashes, errors, shutdowns, etc.

It's the entire reason ECC (error-correcting code) memory exists. Even relatively stable memory configurations with non-ECC memory can and do experience this, but having an unstable configuration because of compatibility problems, overclocking, lack of voltage, or whatever the reason is, just makes it much worse.
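
To show what "error correcting" means in practice, here is a toy Python sketch of a Hamming(7,4) code, which stores 4 data bits with 3 check bits and can locate and fix any single flipped bit. Real ECC DIMMs use wider codes over 64-bit words, so this is only an illustration of the principle. The function names are made up.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Layout (1-based positions): p1 p2 d1 p3 d2 d3 d4
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Return (data_bits, error_position); position 0 means no error found."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3  # syndrome spells out the flipped position
    if pos:
        c[pos - 1] ^= 1         # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], pos

data = [1, 0, 1, 1]
word = hamming74_encode(data)
word[5] ^= 1                    # simulate a single bit flip in memory
fixed, where = hamming74_decode(word)
print(fixed == data, where)     # True 6
```

Without the check bits, the flipped word would simply be read back as different data with nothing to flag it, which is the silent corruption described above.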
 