Testing thermal compliance
Once you get to the Windows desktop, the first thing you will want to do is open HWinfo (Sensors only option), CoreTemp or Ryzen Master and take a look at what your core and package temperatures are doing. At idle your core temps should be below 40°C in the majority of cases, preferably in the low to mid 30s. On some newer very high core count models, or if you are using the stock cooler (in which case you shouldn't be overclocking anyhow), idle temps might not be below the 40°C threshold.
If you are not overclocking and are ONLY testing the thermal compliance of the stock configuration, then don't be TOO concerned by a high idle temperature UNLESS you also have a high load temperature that is outside of spec. Idle temperature WILL be affected by the ambient temperature in the room, so if you are in a very warm region and have no air conditioning going you may have an idle temp that is a bit closer to 40°C. In cooler rooms or regions it will likely sit in the low 30s. Be aware that unless you have excessively high idle temps, say, above 40°C, your actual idle temps are practically irrelevant. Cooler idle temps are not indicative of much of anything specific.
Very HIGH idle temps however DO indicate that there is likely a problem, such as an incorrectly installed CPU cooler heatsink, too high a CPU core voltage or some other cooling or voltage related issue. If you are using one of those other utilities I warned about in the beginning of this tutorial, it may also be that the utility is reporting falsely. In that case, go get HWinfo or CoreTemp and check again.
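If you like to script your sanity checks, the idle guidance above boils down to something like the following. This is only a rough sketch: the function name and the wording of the verdicts are my own, the 40°C line is just the rule of thumb from this section, and you have to supply the room temperature yourself.

```python
def classify_idle_temp(idle_c: float, ambient_c: float) -> str:
    """Rough idle-temperature triage based on the rule of thumb above."""
    if idle_c < ambient_c:
        # A reading below room temperature is not plausible on ordinary
        # air or liquid cooling, so suspect the monitoring utility.
        return "implausible reading: check again with HWinfo or CoreTemp"
    if idle_c > 40.0:
        # Excessively high idle temps point at cooling or voltage problems.
        return "high: check cooler mounting and CPU core voltage"
    return "ok"


print(classify_idle_temp(33.0, ambient_c=22.0))
print(classify_idle_temp(46.0, ambient_c=22.0))
```

Remember that a stock cooler or a very high core count chip may legitimately idle a little above 40°C, so treat the "high" verdict as a prompt to look at load temperatures, not as a failure on its own.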
If idle temps seem fine, then leave your monitoring application open and run Prime95 (Either version 26.6 or the latest version with AVX/AVX2 disabled).
Choose the Small FFT option (NOT "Smallest FFT") and allow it to run for fifteen minutes. If you are using the latest version of Prime95 (version 29.8 or newer) then you NEED to disable the AVX and AVX2 options in the main options window. Disable AVX2 first; once you do, the option to disable AVX becomes available. If at any point your core or package temperatures exceed 80°C on Intel or AMD Ryzen platforms, click the “Test” menu at the top of the Prime95 window and select “Stop” or “Exit”. Do not simply click the "X" in the top right corner, as that will NOT stop the stress test. It will only minimize it to the tray.
You MUST click Stop or Exit from the drop down TEST menu at the top left of the window to stop the stress test.
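If your monitoring utility can log sensor readings to a CSV file, you can also check a finished run after the fact instead of watching the screen for fifteen minutes. The sketch below assumes a log with one temperature column; the column name "CPU Package" and the sample data are made up for illustration, so use whatever header your own log actually contains.

```python
import csv
import io

LIMIT_C = 80.0  # ceiling used in this guide for Intel and AMD Ryzen


def first_over_limit(csv_text: str, column: str, limit_c: float = LIMIT_C):
    """Return (row_number, temperature) for the first sample above limit_c.

    Returns None if no sample in the log exceeds the limit.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        temp = float(row[column])
        if temp > limit_c:
            return row_number, temp
    return None


# Hypothetical log excerpt; real logs will have many more columns.
sample_log = """Time,CPU Package
12:00:00,71.0
12:00:02,79.5
12:00:04,81.0
"""

print(first_over_limit(sample_log, "CPU Package"))  # (3, 81.0)
```

This does not replace watching temps live during the test, since you still need to stop Prime95 yourself if the limit is crossed.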
If you have an older AMD system that is pre-Ryzen, then measuring thermals is going to be a little different. If your AMD system IS a Ryzen based system, then testing will be the same as for Intel based systems.
On pre-Ryzen AMD systems, core temperature readings are not, by any definition, accurate or reliable. On the low end of the scale the thermal sensor readings have long been laughable, sometimes showing temps that are well below ambient, which of course is not possible without some kind of Peltier cooler or active refrigeration. At the other end of the thermal range it’s not much better.
This is because AMD did not design these sensors to be read in the same way that Intel's are. AMD uses a method known as distance to Tjmax, with Tjmax being, in this case, the temperature at which AMD has determined bad stuff will start happening, such as thermal throttling, shut downs and damage.
For this reason when you are testing thermal compliance, or just monitoring for general purposes, you need to be aware of this difference and purposely either use applications designed for use with AMD processors or make some settings changes in other utilities that will allow you to see distance to Tjmax rather than estimated core temps.
There are a couple of ways you can do this. First off, HWinfo generally has the appropriate fields, which are labeled as Distance to Tjmax. I feel like the better choice though is either CoreTemp or AMD OverDrive for monitoring Distance to Tjmax on AMD platforms. In CoreTemp you will need to open the Options menu, click on Settings, and on the Advanced tab check the box next to “Show distance to Tjmax in temperature fields” in order to change from the default and likely inaccurate core temperature display.
AMD OverDrive shows Distance to Tjmax by default, and I don’t think there is any other way to monitor CPU thermals in that utility anyhow. Either of these is probably a good choice, but it’s also worth checking what CoreTemp or AMD OverDrive shows against the Distance to Tjmax reading in HWinfo. If the readings are pretty close to the same, just use HWinfo, as its sensors display offers a lot of other information that is not available in the other two.
Regarding the actual Distance to Tjmax sensor readings, what you do NOT want to see is anything closer than 10°C Distance to Tjmax, ESPECIALLY if you are only in the first phase of your overclock configuration and have only made minor changes to the CPU multiplier and voltage at this point. If it drops below ten degrees to Tjmax you are getting very close to your thermal ceiling and need to revisit either your cooling solution or voltage settings.
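Keep in mind that Distance to Tjmax counts DOWN as the chip gets hotter, so the number you care about is the smallest one seen during the run. A minimal sketch of that check, assuming you have jotted down or exported a list of Distance to Tjmax readings (the function names are mine, not from any utility):

```python
MIN_MARGIN_C = 10.0  # minimum Distance to Tjmax recommended in this guide


def worst_margin(distances_c):
    """Smallest Distance to Tjmax seen during a run (smaller means hotter)."""
    return min(distances_c)


def margin_ok(distances_c, min_margin_c: float = MIN_MARGIN_C) -> bool:
    """True if the run never came closer than min_margin_c to Tjmax."""
    return worst_margin(distances_c) >= min_margin_c


readings = [35.0, 22.0, 14.5, 12.0]  # made-up samples from a 15 minute run
print(worst_margin(readings), margin_ok(readings))  # 12.0 True
```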
If you can run the Prime95 Small FFT torture test (version 26.6, or 29.8 and newer with AVX/AVX2 disabled) for 15 minutes without exceeding 80°C on Intel/AMD Ryzen, or without dropping below a 10°C thermal margin (AKA Distance to Tjmax) on pre-Ryzen AMD, then you are within specification for thermal tolerance. Again, ONLY use "Small FFT" for our purposes, NOT "Smallest FFT".
It's important to note that when stopping or exiting Prime95 you MUST use the drop down "Test" menu and choose either "Stop" or "Exit". Simply clicking the X in the top right corner like you would for most programs will not stop the test, and will leave it running in the system tray.
If you are very close to the edge however, this may be a warning sign that you don’t have much overclocking headroom since we’ve only barely set our multiplier to what is basically the all core equivalent of the default single core Turbo frequency (Speed). IF that is the case, you will want to either be very careful going forward or stop and think about upgrading your CPU cooler and perhaps looking at whether your case and case fan situation is really sufficient for what you are trying to do.
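To put a rough number on "very close to the edge" for Intel and Ryzen, you could treat anything within a few degrees of the 80°C ceiling as a warning rather than a clean pass. The 5°C warning band below is purely my own assumption for illustration, so pick whatever margin you are comfortable with.

```python
def thermal_verdict(peak_c: float, limit_c: float = 80.0,
                    edge_band_c: float = 5.0) -> str:
    """Classify the peak load temperature from the 15 minute Small FFT run.

    edge_band_c is a hypothetical warning band, not a figure from this guide.
    """
    if peak_c > limit_c:
        return "fail"
    if peak_c > limit_c - edge_band_c:
        return "pass, little headroom"
    return "pass"


for peak in (68.0, 78.0, 83.0):
    print(peak, thermal_verdict(peak))
```

A "pass, little headroom" result is the situation described above: think about the cooler and case airflow before pushing the multiplier any further.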
Stability testing
So, if you passed the thermal compliance phase, the next step will be to test stability. I cannot overstress the importance of not cutting corners when it comes to stress testing. Do not listen to naysayers who try to tell you that if you simply run this or that for 15 minutes, or an hour, or can pass a specific benchmark without errors, that your system is stable. Do not listen to people who say that if it is only a gaming system then stability isn’t important so long as it doesn’t crash. That advice is unreliable.
It IS important, no matter WHAT you do on the system. Unstable CPU or memory configurations can thoroughly degrade an operating system, game files or other parts of your file system to the point of eventually making them unusable. Instability is also probably not the best thing for the hardware itself.
Do the tests. Do them for the length of time they should be done for and do not cut corners even though it is tempting to do. You will only be hurting yourself in the long run.
Open Realbench and run a 1 hour stress test to begin with. Choose the Stress test option by clicking on the Stress test button. Choose the one hour option. Set the memory option to approximately half of your total installed memory. We are not worried about testing memory right now. If you have more than 16GB of memory, choose the up to 16GB option. If you have 16GB of memory, choose the up to 8GB option. If you have 8GB or less, choose the up to 4GB option.
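The memory option rule above is simple enough to write down as a lookup. This is just a sketch of the rule as stated here; the function name is mine, and the handling of in-between sizes such as 12GB (which the text does not cover) is an assumption on my part.

```python
def realbench_memory_option_gb(installed_gb: int) -> int:
    """Pick the Realbench "up to N GB" option for a given amount of RAM."""
    if installed_gb > 16:
        return 16
    if installed_gb == 16:
        return 8
    # 8GB or less uses the 4GB option. Sizes between 8GB and 16GB are not
    # covered in the guide; staying on the 4GB option is an assumption.
    return 4


for ram in (32, 16, 8):
    print(f"{ram}GB installed -> up to {realbench_memory_option_gb(ram)}GB")
```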
If you pass the 1 hour stress test and plan to try increasing your overclock a bit higher, then you can start again just as you did in the beginning but bump the CPU core frequency up by another 100-200MHz. If it will POST and boot into Windows, repeat the thermal test and the stress test.
If it will not POST and boot into Windows, or if you get errors or bluescreens at any point, then you will need to bump up your CPU core voltage a bit and try again. We went over that in the beginning so that should be self explanatory at this point.
If you were not able to pass the one hour stress test, then you will also want to go back into the BIOS and bump the voltage up a small amount. By small amount, I mean whatever minimal increment the BIOS will allow you to adjust it upwards in. If the voltage was at 1.32v and it did not pass, or would not POST, or there were errors or bluescreens, or the screen went black and restarted (**), then try bumping the CPU core voltage up to 1.325v. If it was at 1.3v try 1.31v. Etc.
(**Assuming it did not do so because of a low quality power supply. Very important to have a high quality power supply if you are going to be overclocking. Watts are not the only consideration. A unit with good, clean power that has low ripple and electrical noise is very important in order for the motherboard and voltage regulators to remain stable and not overheat as well.)
Every time you make a change in the BIOS to increase the CPU core voltage, YOU MUST RUN the thermal tests again to verify you are still within tolerance.
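If you keep notes on each attempt, the "one minimal increment at a time" idea looks like this in code. It is only a sketch: the 0.005v default step is an example, since the real step size is whatever your BIOS allows, and Decimal is used so that values like 1.325 are not mangled by floating point rounding.

```python
from decimal import Decimal


def next_voltage(current_v: str, step_v: str = "0.005") -> Decimal:
    """Return the next CPU core voltage to try, one increment higher.

    step_v is board dependent; 0.005 is only an example value.
    """
    return Decimal(current_v) + Decimal(step_v)


print(next_voltage("1.32"))         # 1.325
print(next_voltage("1.3", "0.01"))  # 1.31
```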
If however you passed the one hour stress test with no errors or problems of any kind and do not wish to raise your overclock any further, or if at any time you get to the point where you are happy with the speed you have achieved, then go ahead and run the Realbench stress test again, except this time run it for a full 8 hours.
If it passes that, then close Realbench and open Prime95 again. Choose the Blend test and run that for 8 hours. If it passes both of these tests and is still within thermal compliance, your system is probably about as stable as can be expected under almost any circumstances and you can call it a day. It’s worth periodically checking the maximum thermal readings in your monitoring software, which you should leave running alongside any stress tests, just to make sure that you don’t exceed thermal limits while testing.
If you remained below the thermal ceiling when you ran the Small FFT Prime95 torture test though, you should not have any issues with thermals on either of these other tests anyway.
If you wish to take the stability testing one step further IN ADDITION to having passed both the Realbench 8 hr test and the Prime95 Blend mode 8 hr test, you can run Prime95 Small FFT for 24 hours and if you pass that there is little else you can do to assure that your system is stable in regard to your CPU overclock settings.
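Since these runs tie the machine up for a long time, it helps to lay out the final validation schedule before you start. The sketch below just restates the stages described above as data; the Stage type and the helper name are my own.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    tool: str
    mode: str
    hours: int
    optional: bool = False


# Final validation stages as described in this section.
FINAL_VALIDATION = [
    Stage("Realbench", "Stress test", 8),
    Stage("Prime95", "Blend", 8),
    Stage("Prime95", "Small FFT", 24, optional=True),
]


def total_hours(stages, include_optional: bool = False) -> int:
    """Total run time, with or without the optional extra stage."""
    return sum(s.hours for s in stages if include_optional or not s.optional)


print(total_hours(FINAL_VALIDATION))                         # 16
print(total_hours(FINAL_VALIDATION, include_optional=True))  # 40
```

In other words, plan on the system being unavailable for roughly 16 hours, or 40 if you add the 24 hour Small FFT run.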
At this point you can move on to using your system normally again. Or, if you wish to push things a little further to see how much more you can squeeze out of it, you can simply start the whole process over again, moving up incrementally from where you left off. It is terribly important that you always perform the thermal and stability tests after any changes so you don't end up creating tremendous problems for yourself later on or inadvertently damaging your hardware with an overclock that is beyond what your cooling system, motherboard and CPU are capable of sustaining.
If you have successfully achieved the overclock you were hoping for, then congratulations and at this point you can reconfigure your memory XMP settings or continue on to either tightening your memory timings or overclocking your memory, if you plan to do so.
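For anyone who thinks better in code, here is the retest loop for a single multiplier written out as a sketch. Nothing here talks to real hardware: thermals_ok and stable are stand-ins for you actually running the Small FFT thermal test and the Realbench stress test, and the step size, attempt limit and example numbers are all assumptions for illustration.

```python
from decimal import Decimal
from typing import Callable, Optional

Check = Callable[[int, Decimal], bool]


def tune(multiplier: int, voltage: Decimal, thermals_ok: Check, stable: Check,
         step_v: Decimal = Decimal("0.005"),
         max_attempts: int = 20) -> Optional[Decimal]:
    """Raise voltage in minimal steps until both tests pass.

    Returns the passing voltage, or None if thermals fail first or the
    attempt limit runs out (time to drop the multiplier or fix cooling).
    """
    for _ in range(max_attempts):
        # The thermal test is repeated after every voltage change.
        if not thermals_ok(multiplier, voltage):
            return None
        if stable(multiplier, voltage):
            return voltage
        voltage += step_v
    return None


# Pretend chip: stable from 1.31v upward, too hot above 1.35v.
result = tune(
    47, Decimal("1.30"),
    thermals_ok=lambda m, v: v <= Decimal("1.35"),
    stable=lambda m, v: v >= Decimal("1.31"),
)
print(result)  # 1.310
```

The point of the structure is the ordering: thermals are always checked before stability, and every voltage bump sends you back to the thermal test.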
Quick and dirty overview of overclocking validation procedure.
Set CPU multiplier and voltage at desired settings in BIOS. Do not use presets or automatic utilities. These will overcompensate on core and other voltages. It is much better to configure most core settings manually, and leave anything left over on auto until a later point in time, if you wish to come back and tweak settings such as cache (Uncore) frequency, System agent voltage, VCCIO (Internal memory controller) and memory speeds or timings (RAM) AFTER the CPU overclock is fully stable.
Save BIOS settings (as a new BIOS profile if your BIOS supports multiple profiles) and exit the BIOS.
Boot into the Windows desktop environment. Download and install Prime95 version 26.6.
Download and install either HWinfo or CoreTemp.
Open HWinfo and run "Sensors only" or open CoreTemp.
Run Prime95, either version 26.6 OR the latest version WITH the AVX and AVX2 options disabled in the settings menu that pops up when you start up Prime95, and choose the "Small FFT" test option. Run this for 15 minutes while monitoring your core/package temperatures to verify that you do not exceed the thermal specifications of your CPU.
(This should be considered to be 80°C for most generations of Intel processor and for current Ryzen CPUs. For older AMD FX and Phenom series, use a thermal monitor that has an option for "Distance to TJmax" and make sure the reading does NOT drop below 10°C. Anything that is MORE than 10°C distance to TJmax is within the allowed thermal envelope.)
If your CPU passes the thermal compliance test, move on to stability.
Download and install Realbench. Run Realbench and choose the Stress test option. Choose a value from the available memory (RAM) options that is equal to approximately half of your installed memory capacity. If you have 16GB, choose 8GB. If you have 8GB, choose 4GB, etc. Click start and allow the stability test to run for 8 hours. Do not plan to use the system for ANYTHING else while it is running. It will run realistic AVX and handbrake workloads and if it passes 8 hours of testing it is probably about as stable as you can reasonably expect.
If you wish to check stability further you can run 12-24 hours of Prime95 Blend mode or Small FFT.
You do not need to simultaneously run HWinfo or CoreTemp while running Realbench as you should have already performed the thermal compliance test PLUS Realbench will show current CPU temperatures while it is running.
If you run the additional stability test using Prime95 Blend/Small FFT modes for 12-24 hours, you will WANT to also run HWinfo alongside it. Monitor HWinfo periodically to verify that no cores/threads are showing less than 100% usage. If any are, then that worker has errored out and the test should be stopped.
If you find there are errors on ANY of the stability tests including Realbench or Prime95, or any other stress testing utility, you need to make a change in the BIOS. This could be either dropping the multiplier to a lower factor or increasing the voltage while leaving the multiplier the same. If you change voltage or multiplier at ANY time, you need to start over again at the beginning and verify thermal compliance again.
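The per-thread usage check mentioned for the long Prime95 runs can be sketched as a simple filter over a list of usage readings, however you collect them. The small tolerance is my own assumption, since real readings tend to flicker slightly under 100% even when every worker is healthy.

```python
def stalled_threads(usage_percent, tolerance: float = 2.0):
    """Indices of logical CPUs whose load has dropped below full usage.

    tolerance is a hypothetical allowance for normal reading jitter.
    """
    return [i for i, u in enumerate(usage_percent) if u < 100.0 - tolerance]


# Made-up snapshot: thread 3 has dropped out, so its worker likely errored.
snapshot = [100.0, 99.5, 100.0, 3.1, 100.0, 100.0, 99.8, 100.0]
print(stalled_threads(snapshot))  # [3]
```

Any non-empty result means the test should be stopped and the run treated as a failure.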