Question: Weird thing that happened after upgrading BIOS to v2503?

Oct 12, 2024
3
0
10
The system (as built by the vendor, about 1.5 years ago):

Motherboard: Asus ROG Maximus Z790 Hero
BIOS: one of the early versions, which only allowed 64 GB of RAM (as far as I can tell)
GPU: NVIDIA GeForce RTX 4070 Ti
CPU: 13th Gen Intel i9-13900K, running with ASUS AI overclocking at 5.8 GHz.
ASUS ROG liquid cooling
RAM: 64 GB, 4x16 GB PNY DDR5 6200 PC5-49600 (at 4800 MHz, as set by the vendor)
System drive C: RAID 1, 2x1 TB PNY CS 1031 SSD (one in M2.1 socket, one in M2.2 socket)
Data HDD D: RAID 1, 2x2 TB Western Digital
Data HDD E: RAID 1, 2x2 TB Western Digital
OS: Windows 11
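
A side note on the RAM labels above: as far as I understand, the PC5 number is the transfer rate multiplied by the 8-byte (64-bit) data width of a DIMM, i.e. the theoretical peak bandwidth per module. A minimal Python sketch of that arithmetic, assuming the "4800 MHz" figure is the effective transfer rate in MT/s (the function name is made up for illustration, and real-world throughput will be lower):

```python
def peak_bandwidth_mb_s(transfer_rate_mt_s: int, bus_width_bytes: int = 8) -> int:
    """Theoretical peak bandwidth of one DIMM in MB/s (hypothetical helper).

    Assumes a 64-bit (8-byte) data path per module.
    """
    return transfer_rate_mt_s * bus_width_bytes


rated = peak_bandwidth_mb_s(6200)          # rated speed of the PNY modules
as_configured = peak_bandwidth_mb_s(4800)  # speed set by the vendor

print(f"Rated:         {rated} MB/s")
print(f"As configured: {as_configured} MB/s")
print(f"Fraction of rated peak: {as_configured / rated:.0%}")
```

So at the vendor's setting the modules were running at roughly 77% of their rated peak.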

It was running fine and booting reasonably fast (15-20 sec), with CPU temperatures not exceeding 78-89 deg. C even at high CPU load.

About 10 months ago I started running pretty big jobs in NASTRAN (90-100% CPU load, scratch files in terabytes, run time 8-10 hrs) - and hit the wall with RAM and disk space.
Following that, I rebuilt the comp a bit:

  • updated BIOS to version 18xx (which allowed 128 GB of RAM)
  • removed the 1 TB PNY stick from M2.2 socket on the motherboard
  • installed 2xSamsung 990 Pro 4 TB SSDs in M2.2 and M2.3 sockets (RAID 0) on the motherboard (to be used as scratch)
  • installed the Hyper Card with 2 M2 slots in PCIEX16(G4) slot and enabled Dual Channel in BIOS
  • installed 2xSamsung 990 Pro 1 TB SSDs on Hyper Card in RAID1 configuration (to be used for system)
  • transferred the Windows system to the Hyper Card using EaseUS software
  • formatted the PNY stick in the M2.1 slot to use as additional data storage
  • installed 128 GB RAM - 4x32 GB G.Skill Trident Z5, DDR5 6400 (not a matched set, but stable at 5600 MHz).
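
For reference, the usable capacity of the two new arrays in the list above can be worked out with a quick sketch. This is plain Python for illustration only: the function name is hypothetical, sizes are nominal (marketing) terabytes, and it only covers simple RAID 0 and RAID 1.

```python
def usable_capacity_tb(level, drive_sizes_tb):
    """Nominal usable capacity of a simple RAID 0 or RAID 1 array (hypothetical helper)."""
    if len(drive_sizes_tb) < 2:
        raise ValueError("need at least two drives")
    smallest = min(drive_sizes_tb)
    if level == 0:
        # Striping: every member contributes, limited by the smallest drive.
        return smallest * len(drive_sizes_tb)
    if level == 1:
        # Mirroring: all members hold the same data.
        return smallest
    raise ValueError("only RAID 0 and RAID 1 are handled in this sketch")


scratch = usable_capacity_tb(0, [4, 4])  # 2x 4 TB 990 Pro, RAID 0
system = usable_capacity_tb(1, [1, 1])   # 2x 1 TB 990 Pro, RAID 1

print(f"Scratch (RAID 0): {scratch} TB usable, no redundancy")
print(f"System  (RAID 1): {system} TB usable, survives one drive failure")
```

The trade-off is the usual one: the RAID 0 scratch array gives all 8 TB but is lost if either drive fails, while the RAID 1 system array gives only 1 TB but keeps a full copy on each stick.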

It was running fine for about 6 months with pretty heavy CPU loads, CPU temperatures still at about 80-85 deg.C.
Then - occasional NASTRAN failure (exit without leaving any messages - which is unusual). Then - blue screen now and then. I also noticed that the CPU temperatures were hitting 95-98 deg.C. The comp was becoming unstable...which I suspect might have been caused by the instability issue with 13900K and 14900K.

So - a couple of days ago I decided to update the BIOS to v2503, using USB Flashback, because I suspected the instability issue with my i9-13900K.
After the update the comp would start only in Safe mode, and only after 5-10 minutes, with the error messages "GPU header corruption" and "Intel Rapid Storage technology degraded".
Back to BIOS - GPU header corruption cleared after "Boot Sector Recovery Policy" set to "Auto".

AND HERE IS THE WEIRD THING:
-the system RAID 1 now shows 1x PNY 1 TB stick (which was empty) and 1x 1 TB Samsung (the Samsung with the system on it)
-the other 1 TB Samsung stick (initially a system stick in RAID1 with the other 1 TB Samsung) is now showing as a separate SSD in BIOS
-after booting the system (another 5-10 minutes) the PNY stick shows in Disk Management as "Offline"

Back to BIOS:
-checking the Dual Channel setting for the Hyper card - it was back to the factory default after the BIOS update to v2503. This is most likely what messed up the RAID1 configuration on the 2x 1 TB Samsungs (system drive on the Hyper card). OK, I set it back to Dual Channel, and removed both SSDs from the fake RAID1 (1x PNY and 1x Samsung).
-trying to restore RAID1 on the system SSDs (2x 1 TB Samsung 990 Pro) - it does not work
-checking the properties of the SSDs (1 TB PNY and 2x Samsung) in BIOS - the Samsung stick which was kicked out of RAID1 is listed as a single SSD, but in its properties it still has some link to RAID1 (???). OK, I removed the RAID structure from that Samsung stick.
-after this - both 1 TB Samsung sticks gave no problems - RAID1 configuration on the system was restored
-the system boots normally - although it takes about 1-1.5 minutes.
-the 1 TB PNY stick is brought back online

All is back to normal (fingers crossed...)

Can anyone explain what happened? How is it possible that the RAID1 configuration (after the BIOS update) was changed from 2x 1 TB Samsung to 1x 1 TB PNY + 1x 1 TB Samsung?
I understand that the Dual Channel setting for the Hyper card is not enabled at factory defaults after a BIOS update - which broke RAID1 on the system drive - but kicking one Samsung stick out of RAID1 and replacing it with the empty PNY stick - I don't understand this. Particularly since the 1 TB Samsung stick which was kicked out of RAID1 still had some link to RAID1.

I am not terribly smart or knowledgeable when it comes to BIOS - and most certainly I just hack my way through problems - but maybe someone can shed some light on the issue (and possibly indicate what I might have messed up).
 

Lutfij

Titan
Moderator
Welcome to the forums, newcomer!

There seems to be one more BIOS update pending, which is 2603;
I'd try that and see if it helps your predicament. I'm curious to learn how much of a speed boost you got by running Samsung 990 Pro's in RAID0.

"How is it possible that the RAID1 configuration (after BIOS update) was changed from 2x 1 TB Samsung to 1x 1 TB PNY + 1x 1 TB Samsung?"
It's possible that there was a corruption in your BIOS. We usually advise users to clear their CMOS after flashing their BIOS to the latest version, so that settings from the previous BIOS don't carry over.
 
Oct 12, 2024
Thank you for the response. My initial post was created after all the actions described in the post, and after the first successful boot. The second boot, however, wasn't as good: the comp appears to boot normally, but the wired keyboard and the wired mouse are not working.
Tried everything - switching to different USB ports, restarting the comp etc. - no cigar.
They only work in BIOS (lucky me) - so at least I could revert to the previous version of BIOS - 2301. But the issue (keyboard and mouse) still persists. Both have their lights on - but they simply don't respond.
Since they work in BIOS - I am guessing the USB ports are OK after all.

After reverting to 2301 - of course the settings for the Hyper card went back to a single M2 slot, which destroyed RAID1 configuration on the Hyper card (system drive). After restoring RAID1 on the Hyper card - the message is that RAID1 is "Rebuilding". I am not even sure that it is actually "Rebuilding" - since in BIOS there is no progress shown.

I am not sure the mouse/keyboard problem has anything to do with the system - please correct me if I am wrong.
I saved the BIOS 2301 current settings in a file, but I guess I can't attach it here...
I am so pissed off that I am seriously considering throwing away both the motherboard and the CPU (both 18 months old and with a pretty price tag) and buying new ones (although both are still on warranty).

Oh, one more thing is happening: after boot - a little window pops up with a message "Access violation at address 04969FA5 in module dip4.dll. Read of address 00000000".
This is probably ASUS vs Corsair link issue...but again, I have no idea how to fix it since I have no access to HDDs...Hopefully RAID1 will rebuild OK in a day or so, but
 
Oct 12, 2024
"I'm curious to learn how much of a speed boost you got by running Samsung 990 Pro's in RAID0."

Regarding the 990 Pro in RAID0 - hard to say, as I made many changes.
But having 20-80 million read/write operations going to a SATA HDD slows down the analysis for sure.
Let's say this:
With default NASTRAN configuration (RAM, CPUs etc) the job was taking 10 hrs.
Changing the NASTRAN configuration (increasing the RAM available and enabling all 32 logical CPUs) shortened the job to 4 hrs.
Increasing the RAM from 64 to 128 GB and assigning the scratch area to the Samsung 990 Pros further reduced the analysis time to about 40-70 minutes.
Not bad :)
The 990 Pro are FAST...to say the least.
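
To put rough numbers on those run times, here is a small Python sketch of the speed-up arithmetic. It uses only the figures quoted above; the helper name is made up, and it cannot separate the effect of the extra RAM from the effect of the SSD scratch, since both changed at once.

```python
def speedup(before_min: float, after_min: float) -> float:
    """Speed-up factor between two run times given in minutes (hypothetical helper)."""
    if before_min <= 0 or after_min <= 0:
        raise ValueError("run times must be positive")
    return before_min / after_min


default_run = 10 * 60             # default NASTRAN configuration: 10 hrs
tuned_run = 4 * 60                # more RAM assigned, all 32 CPUs enabled: 4 hrs
final_best, final_worst = 40, 70  # 128 GB RAM + 990 Pro scratch: 40-70 min

print(f"NASTRAN settings alone: {speedup(default_run, tuned_run):.1f}x")
print(f"Overall, slowest run:   {speedup(default_run, final_worst):.1f}x")
print(f"Overall, fastest run:   {speedup(default_run, final_best):.1f}x")
```

That works out to 2.5x from the NASTRAN settings alone, and somewhere between about 8.6x and 15x overall compared with the original 10-hour run.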