Question Can undervolting laptop CPU/iGPU cause dGPU death?

Jun 17, 2021
So, right to the point: my Lenovo E550 laptop worked fine for a day after I undervolted the CPU core, cache and iGPU by -101.6 mV using ThrottleStop. Things looked nice: temperatures were about 10 degrees lower than the usual highs and War Thunder ran smoothly without throttling. General usage was unproblematic for a day. But then, while playing War Thunder, I switched to the desktop and the system suddenly froze and restarted with a few artifacts, which I expected (did something happen when switching between the dGPU and iGPU?). It booted back into Windows, but only for a short time before the laptop went to a BSOD (THREAD_STUCK_IN_DEVICE_DRIVER).

So I reverted the voltages to default in safe mode and uninstalled ThrottleStop. But the problem persisted and I could not log into Windows, stopped by the same BSODs I mentioned. I figured out that the problem went away once I disabled the dGPU (AMD R7 M265) in safe mode; the laptop then ran normally on integrated graphics without any crashes. So my question is: is it possible that I somehow fried my dedicated GPU or its VRAM? Can I test it somehow to be sure? I know log files would help you, but I need some guidance on that. What I've tried so far:

  • reinstalling various drivers for the dedicated graphics; none of them had any impact, always the same crashes after installation
  • clean reinstall Windows 10 booting from usb, no impact
  • BIOS default settings and CMOS battery reset, no impact
  • ran SFC and DISM scripts, no impact
  • CHKDSK, no impact
Lenovo ThinkPad E550; model: 20DF5004NXS
CPU: i5-5200U CPU 2.20GHz; 2.19 GHz
iGPU: Intel HD5500
dGPU: AMD r7 M265
RAM: DDR3L; 2 sticks - 16,0 GB
OS: Win 10 64-bit
BIOS: 1.35 (J5ET64WW)


I have a feeling it somehow went bad because of unnecessarily drastic iGPU undervolting. I'm sorry if I was not clear; I will add info if needed. Thank you for your time.
 

thomas4204

Reputable
Dec 24, 2018
It shouldn't have damaged anything, and if it was holding onto the undervolt, the CMOS reset would have cleared it. Sounds to me like your laptop doesn't like the dGPU. What model is the dGPU?
 
Jun 17, 2021
Thanks for the reply. This is the dGPU: https://www.techpowerup.com/gpu-specs/radeon-r7-m265.c2484 It's just really weird to me that it happened the day right after undervolting. Also, HWiNFO is reporting 100+ billion errors on the VRAM. I've read that it's OK to have a few thousand errors, but this is probably not right. Once I'm in Windows I'm able to turn on the dGPU for about 20 seconds before it crashes, so I can record a HWiNFO log file up to the crash.
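For anyone who wants to look at such a log, here is a minimal Python sketch of how the error counter could be pulled out of a CSV export. It assumes the log is plain comma-separated text with one header row, and that the counter sits in a column whose name contains "Memory Errors"; that column name and the sample rows are made up for illustration and are not taken from the actual log.

```python
import csv
import io

def summarize_error_column(csv_text, name_fragment="Memory Errors"):
    """Return (column_name, first_value, last_value) for the first column
    whose header contains name_fragment (case-insensitive)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    matches = [i for i, name in enumerate(header)
               if name_fragment.lower() in name.lower()]
    if not matches:
        raise ValueError(f"no column containing {name_fragment!r}")
    col = matches[0]

    values = []
    for row in reader:
        if len(row) <= col:
            continue  # short or empty row
        try:
            values.append(float(row[col]))
        except ValueError:
            continue  # non-numeric cell, e.g. a repeated header row
    if not values:
        raise ValueError("column has no numeric samples")
    return header[col], values[0], values[-1]

# Made-up sample rows; the column name is an assumption, not real log data.
sample = (
    "Date,Time,GPU Memory Errors []\n"
    "17.6.2021,19:40:01,0\n"
    "17.6.2021,19:40:03,1500\n"
    "17.6.2021,19:40:05,98000\n"
)

name, first, last = summarize_error_column(sample)
print(f"{name}: {first:.0f} -> {last:.0f} (grew by {last - first:.0f})")
```

A counter that keeps climbing during the 20 seconds the dGPU is enabled would be worth posting together with the raw log; a single huge value that never changes could also just be a sensor readout glitch.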
 
Jun 17, 2021
Hi, I'm posting the results from a minidump which I ran through WinDbg. Just leaving it here for possible future inspection. This error code basically repeats every time I try to reinstall any drivers (OEM or non-OEM): the moment they are installed, or even before they finish installing, the same kind of BSOD appears. I'm toying with the thought that the Windows 10 version might be a possible cause. I will experiment with it later, since I have some school work to do, possibly going down to version 1803. Hopefully someone can shed light on the problem, since I'm quite a tech noob:


Microsoft (R) Windows Debugger Version 10.0.21349.1004 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Users\yo_whats_up\Desktop\dumpfiles\062221-27093-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available


************* Path validation summary **************
Response                         Time (ms)     Location
Deferred                                       srv*
Symbol search path is: srv*
Executable search path is:
Windows 10 Kernel Version 19041 MP (4 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS Personal
Edition build lab: 19041.1.amd64fre.vb_release.191206-1406
Machine Name:
Kernel base = 0xfffff8073fc00000 PsLoadedModuleList = 0xfffff8074082a290
Debug session time: Tue Jun 22 19:45:03.288 2021 (UTC + 2:00)
System Uptime: 0 days 0:01:06.031
Loading Kernel Symbols
...............................................................
................................................................
..................................
Loading User Symbols
Loading unloaded module list
..........
For analysis of this file, run !analyze -v
*** WARNING: Unable to verify timestamp for dxgkrnl.sys
nt!KeBugCheckEx:
fffff8073fff6b90 48894c2408 mov qword ptr [rsp+8],rcx ss:0018:ffff820f552bf140=00000000000000ea
2: kd> !analyze -v
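*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************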

THREAD_STUCK_IN_DEVICE_DRIVER_M (100000ea)
The device driver is spinning in an infinite loop, most likely waiting for
hardware to become idle. This usually indicates problem with the hardware
itself or with the device driver programming the hardware incorrectly.
If the kernel debugger is connected and running when watchdog detects a
timeout condition then DbgBreakPoint() will be called instead of KeBugCheckEx()
and detailed message including BugCheck arguments will be printed to the
debugger. This way we can identify an offending thread, set breakpoints in it,
and hit go to return to the spinning code to debug it further. Because
KeBugCheckEx() is not called the .BugCheck directive will not return BugCheck
information in this case. The arguments are already printed out to the kernel
debugger. You can also retrieve them from a global variable via
"dd watchdog!g_WdBugCheckData l5" (use dq on NT64).
On MP machines it is possible to hit a timeout when the spinning thread is
interrupted by hardware interrupt and ISR or DPC routine is running at the time
of the BugCheck (this is because the timeout's work item can be delivered and
handled on the second CPU and the same time). If this is the case you will have
to look deeper at the offending thread's stack (e.g. using dds) to determine
spinning code which caused the timeout to occur.
Arguments:
Arg1: ffff89872170d040, Pointer to a stuck thread object. Do .thread then kb on it to find
the hung location.
Arg2: 0000000000000000, Pointer to a DEFERRED_WATCHDOG object.
Arg3: 0000000000000000, Pointer to offending driver name.
Arg4: 0000000000000000, Number of times "intercepted" BugCheck 0xEA was hit (see notes).

Debugging Details:
------------------

*** WARNING: Unable to verify checksum for win32k.sys

KEY_VALUES_STRING: 1

Key : Analysis.CPU.mSec
Value: 8734

Key : Analysis.DebugAnalysisManager
Value: Create

Key : Analysis.Elapsed.mSec
Value: 193072

Key : Analysis.Init.CPU.mSec
Value: 843

Key : Analysis.Init.Elapsed.mSec
Value: 61457

Key : Analysis.Memory.CommitPeak.Mb
Value: 80

Key : WER.OS.Branch
Value: vb_release

Key : WER.OS.Timestamp
Value: 2019-12-06T14:06:00Z

Key : WER.OS.Version
Value: 10.0.19041.1


BUGCHECK_CODE: ea

BUGCHECK_P1: ffff89872170d040

BUGCHECK_P2: 0

BUGCHECK_P3: 0

BUGCHECK_P4: 0

FAULTING_THREAD: ffff89872170d040

BLACKBOXBSD: 1 (!blackboxbsd)


BLACKBOXNTFS: 1 (!blackboxntfs)


BLACKBOXPNP: 1 (!blackboxpnp)


CUSTOMER_CRASH_COUNT: 1

PROCESS_NAME: System

STACK_TEXT:
ffff820f552bf138 fffff8074784333d : 00000000000000ea ffff89872170d040 0000000000000000 0000000000000000 : nt!KeBugCheckEx
ffff820f552bf140 00000000000000ea : ffff89872170d040 0000000000000000 0000000000000000 0000000000000000 : dxgkrnl+0x4333d
ffff820f552bf148 ffff89872170d040 : 0000000000000000 0000000000000000 0000000000000000 fffff807478431e0 : 0xea
ffff820f552bf150 0000000000000000 : 0000000000000000 0000000000000000 fffff807478431e0 0000000000000028 : 0xffff8987`2170d040


STACK_COMMAND: .thread 0xffff89872170d040 ; kb

SYMBOL_NAME: dxgkrnl+4333d

MODULE_NAME: dxgkrnl

IMAGE_NAME: dxgkrnl.sys

FAILURE_BUCKET_ID: 0xEA_IMAGE_dxgkrnl.sys

OS_VERSION: 10.0.19041.1

BUILDLAB_STR: vb_release

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

FAILURE_ID_HASH: {ea458ad2-d5ab-aa6c-7a11-54653c70dfb8}

Followup: MachineOwner
---------
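
If several dumps need comparing, the handful of fields that matter can be pulled out of the `!analyze -v` text with a short script. This is only a sketch in Python: the function name and the choice of fields are my own, and the sample text is copied from the output above.

```python
import re

def parse_analyze_fields(text, keys=("BUGCHECK_CODE", "MODULE_NAME",
                                     "IMAGE_NAME", "FAILURE_BUCKET_ID")):
    """Collect selected 'KEY: value' lines from pasted !analyze -v output."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"^\s*([A-Z0-9_]+):\s*(\S.*)$", line)
        if m and m.group(1) in keys:
            fields[m.group(1)] = m.group(2).strip()
    return fields

# Lines copied from the dump analysis above.
sample = """\
BUGCHECK_CODE: ea

MODULE_NAME: dxgkrnl

IMAGE_NAME: dxgkrnl.sys

FAILURE_BUCKET_ID: 0xEA_IMAGE_dxgkrnl.sys
"""

fields = parse_analyze_fields(sample)
for key, value in fields.items():
    print(f"{key:18} {value}")
```

Note that dxgkrnl.sys is Microsoft's DirectX graphics kernel, so this bucket only says the graphics stack hit the watchdog timeout. With Arg3 (offending driver name) at zero, the minidump by itself does not say whether the AMD driver or the hardware is to blame.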