News: Intel's next-gen Arrow Lake CPUs might come without hyperthreaded cores — leak points to 24 CPU cores, DDR5-6400 support, and a new 800-series chipset

Tin, no VMs.
That would mean you would have to disable HT on a host and migrate the VM to said host to find out whether HT gave you such a speedup. That said, the VM will still have a certain number of vCPUs allocated to it. A VM with 8 vCPUs on an HT-enabled host would use 4 cores and 8 total threads; on a host without HT enabled, you would get 8 full cores. The only way 4c/8t can be as fast as 8c/8t in threaded applications is if the instructions are small enough that half or less of the resources on a core are being used. At that point you would, in theory, get a 100% increase in performance from HT. That just doesn't happen.

What is probably happening is that by disabling HT on a host, you have halved the number of vCPUs available in the virtualized environment. Therefore your 8 vCPU VM now has to wait for 8 physical cores to be available before any processing can be done, whereas on a host with HT it only had to wait for 4 physical cores. Most likely your hosts are over-provisioned on CPU resources (assigning out 10 vCPUs while only having 8 physical CPUs), which is common practice; about 25% over-provisioning is standard. Since there aren't enough physical resources available, the VM has to wait for resources to free up.

For example, say you have 16 physical cores and the VM needs 8 of them. The VM MUST wait until it can gain access to 8 cores before it gets CPU time. Now say you have two 2 vCPU VMs on the same host also waiting for resources, and 6 cores come available. They will take 4 of those cores (each only needs 2) and leave your 8 vCPU VM still waiting for CPU time. If you have HT enabled, those 6 physical cores are now viewed as 12 vCPUs by the hypervisor, which means you can run both 2 vCPU VMs AND the 8 vCPU VM at the same time. So the "speed up" you are seeing from HT is due to CPU Ready % being lower, since there are 2 threads per core, rather than HT itself being responsible in conjunction with your code.
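If it helps, here's a toy sketch of that co-scheduling effect. It assumes strict gang scheduling and made-up VM sizes; real hypervisors use relaxed co-scheduling, but the CPU Ready effect is the same in spirit:

```python
# Toy model of strict gang scheduling: a VM runs only when it can grab
# all of its vCPUs at once. Purely illustrative, not how any particular
# hypervisor actually works.

def runnable(vms, free_threads):
    """Greedily pick VMs (in order) that fit in the free thread budget."""
    running = []
    for name, vcpus in vms:
        if vcpus <= free_threads:
            running.append(name)
            free_threads -= vcpus
    return running

waiting = [("vm_a", 2), ("vm_b", 2), ("vm_c", 8)]  # hypothetical VMs

# 6 physical cores free, HT off: 6 schedulable threads.
print(runnable(waiting, 6))   # ['vm_a', 'vm_b'] -- the 8 vCPU VM waits

# Same 6 cores with HT on: the hypervisor sees 12 threads.
print(runnable(waiting, 12))  # ['vm_a', 'vm_b', 'vm_c'] -- all run at once
```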
 
Let me restate myself; perhaps it was misunderstood:

SQL Server running on tin. No VMs. HT gave improvements of up to 80%, with no cases of performance loss for the workloads that we ran, which were data warehouse and ODS loads.
 
SQL Server running on tin. No VMs. HT gave improvements of up to 80%, with no cases of performance loss for the workloads that we ran.
By tin you mean bare metal and/or a physical appliance? Even so, your experience is the exception and not the rule. In most cases I have seen, the speedup from HT is about 30%, as HT allows more of the execution units on a physical core to be used. In order to get an 80% speedup, your execution units would need to be only half used.
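As a back-of-the-envelope check of that claim (a deliberately crude model where the SMT gain is capped by how idle a core's execution resources are; it ignores cache and front-end contention entirely):

```python
# Crude SMT model: one thread uses fraction u of a core's execution
# resources, so two sibling threads scale by min(2, 1/u) at best.
def smt_gain(u):
    return min(2.0, 1.0 / u) - 1.0  # fractional speedup from HT

print(f"{smt_gain(0.77):.0%}")   # ~30% -- the typical case cited above
print(f"{smt_gain(0.555):.0%}")  # ~80% -- needs execution units barely half busy
```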
 
SQL Server running on tin. No VMs. HT gave improvements of up to 80%
What kind of CPU?

When running with HT disabled, is it possible the server was configured to use more threads than there were CPU cores available for it to use (i.e. not being used by anything else)? That's definitely a way I could see it get a disproportionate speedup from using HT - if threads were getting starved out. However, it'd be an artificial case, because the solution would be just to decrease the number of threads and suddenly your non-HT performance would improve.
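A quick illustration of that starvation effect, using an idealized round-robin model with equal-length CPU-bound tasks (and treating a hardware thread as a full core, which flatters HT, but the mechanism is the point):

```python
import math

# Idealized model: n equal CPU-bound threads time-slicing on c cores,
# each needing 1 unit of CPU time. Wall time grows in whole "waves" of
# ceil(n / c), so running more threads than cores just stretches the run.
def wall_time(threads, cores):
    return math.ceil(threads / cores)

print(wall_time(24, 12))  # 2 -- HT off, threads starved, run takes 2x
print(wall_time(24, 24))  # 1 -- HT on exposes 24 hardware threads
print(wall_time(12, 12))  # 1 -- or simply run fewer threads, same win
```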
 
I agree that our use case was the exception.

It wasn't an appliance. They were two servers, exactly the same config. It was initially SQL Server 2008, eventually upgraded all the way to 2017. I don't recall the exact CPU model, but they were dual-CPU Xeons, 6C/12T per CPU. We followed Microsoft's recommended settings for things like MAXDOP (the maximum degree of parallelism) for the servers, experimented with our own (because, of course, "we knew better" lol), and then discovered that Microsoft knew this particular aspect of their own product better than we did, since their settings for our server setup worked best. (I guess MS and Intel really did earn the "Wintel" label, the way SQL Server works so well with Intel CPUs. ;-)
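For reference, Microsoft's classic MAXDOP rule of thumb from that era boiled down to roughly this (my paraphrase from memory, so check the actual docs for your version before trusting the exact cutoffs):

```python
# Rough paraphrase of Microsoft's old MAXDOP guidance for SQL Server:
# cap parallelism at the number of logical processors in one NUMA node,
# and never go above 8.
def recommended_maxdop(logical_cpus_per_numa_node: int) -> int:
    return min(logical_cpus_per_numa_node, 8)

# Dual-socket box, 6C/12T per CPU, one NUMA node per socket (as above):
print(recommended_maxdop(12))  # 8
```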

Our bottleneck was always the storage in our daily runs. (Our workloads were bursty: typically batch loads in the morning, then staggered batch processing scheduled every few hours on weekdays, and then the system got hammered at month-end and year-end.)
 
It wasn't an appliance. They were two servers, exactly the same config. It was initially SQL Server 2008, eventually upgraded all the way to 2017.
That is a physical appliance.

When running with HT disabled, is it possible the server was configured to use more threads than there were CPU cores available for it to use (i.e. not being used by anything else)? That's definitely a way I could see it get a disproportionate speedup from using HT - if threads were getting starved out. However, it'd be an artificial case, because the solution would be just to decrease the number of threads and suddenly your non-HT performance would improve.
they were dual CPU Xeons, 6C12T per CPU
That is certainly a possibility with only 12c/24t. Where I work we do some cloud hosting, and I've seen Server 2008 R2 VMs running Oracle with 28 vCPUs. It is very possible that your workload needed more CPU grunt than 12 cores alone could provide, hence the large increase in performance from HT. If you had had 24 real cores, your speedup would probably have been 100%.

Our bottleneck was always the storage in our daily runs
To be honest, unless you have an in-RAM DB, the storage layer will always be your bottleneck. DBs can get around some of the storage bottlenecks with caching and predictive reads; however, if you make a call to a table it didn't expect, then it still has to go to storage.
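What's being described is essentially a read-through cache; here's a minimal sketch of the pattern (the storage function is just a stand-in for a real page read):

```python
# Minimal read-through cache: hot tables are answered from memory,
# while an unexpected table forces the slow trip to storage.
cache = {}

def read_from_storage(table):
    # Placeholder for an actual page read from disk or SAN.
    return f"rows of {table}"

def read_table(table):
    if table in cache:
        return cache[table]          # fast path: already cached in RAM
    data = read_from_storage(table)  # slow path: go to storage
    cache[table] = data              # keep it warm for next time
    return data
```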
 
That is a physical appliance.
Nope. Servers ordered from our provider to our specifications. Initially it was supposed to be used only for our DW and reporting, but then over the years its role expanded to internal file share, FTP server, one VM for two years, and some other (very!) small applications because other teams "didn't have budget." We also upgraded the machine's RAM once and the storage twice.

An appliance is a single-purpose sealed black box from a vendor that you're not allowed to touch. We almost bought a Microsoft Parallel Data Warehouse appliance, but exco didn't approve the funding.

To be honest unless you have an in RAM DB the storage layer will always be your bottleneck. DBs can get around some of the storage bottlenecks with caching and predictive calls, however, if you make a call to a table it didn't expect then it still has to go to storage.
I know.

If you want some fun, check out Thomas Grohser's presentation from the EightKB conference "Scaling SQL Server beyond 2 CPUs." He goes into a _wonderful_ amount of detail regarding SQL Server and CPUs. <3

https://eightkb.online/previous/

 
They were two servers, exactly the same config. It was initially SQL Server 2008, eventually upgraded all the way to 2017. I don't recall the exact CPU model, but they were dual CPU Xeons, 6C12T per CPU.
So HT was disabled on only one? If so, did you ever check that they indeed performed identically when HT was either enabled or disabled on both?

Our bottleneck was always the storage in our daily runs
In this case, I wonder if simply increasing the number of threads in your non-HT case could have improved performance. If the threads were blocking on I/O, then having more could increase the effective queue depth, resulting in higher IOPS - thereby netting you more performance. In that case, having more threads scheduled on the HT machine could improve performance beyond what HT itself is doing.

All I'm saying is that it would need further investigation to know if the real differentiator was HT. There's certainly a chance that the IPC in executing your queries was exceptionally low, but I wouldn't expect so.
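To make the queue depth point concrete, a sketch along these lines would show it (the file path, offsets, and worker counts are hypothetical; the idea is just that blocking readers keep more I/O in flight):

```python
import concurrent.futures as cf

# Sketch: with blocking reads, the number of requests in flight (the
# effective queue depth) roughly tracks the number of worker threads,
# so more threads can yield more IOPS even with no extra CPU work done.
def read_block(path, offset, size=8192):
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)

def scan(path, offsets, workers):
    with cf.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda off: read_block(path, off), offsets))

# Hypothetical experiment: time scan(...) with workers=12 vs workers=24
# against a disk array and watch IOPS climb with the deeper queue,
# up to whatever the array can actually sustain.
```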
 
If the threads were blocking on I/O, then having more could increase the effective queue depth, resulting in higher IOPS - thereby netting you more performance.
I'm assuming they were using local disk (the age makes me think 10k HDDs) instead of a SAN. 24 10k disks will only get you about 3,600 IOPS, with a max of about 6 GB/s sequential reads. If they were on a SAN, it would have been 8Gb Fibre Channel at first, so that would have really killed your throughput.
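The math behind those numbers, for anyone curious (the per-drive figures are rough rules of thumb for 10k RPM disks, not measurements):

```python
# Ballpark aggregate performance of a 24-drive 10k RPM array.
drives = 24
random_iops_per_drive = 150   # ~150 random IOPS per 10k HDD
seq_mb_per_drive = 250        # ~250 MB/s sequential per drive

print(drives * random_iops_per_drive)            # 3600 IOPS
print(drives * seq_mb_per_drive / 1000, "GB/s")  # 6.0 GB/s sequential
```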
 
All I'm saying is that it would need further investigation to know if the real differentiator was HT. There's certainly a chance that the IPC in executing your queries was exceptionally low, but I wouldn't expect so.
Considering the likely hardware era, it'd most likely be somewhere between Nehalem and Ivy Bridge (maybe Haswell). Boosting and TDP worked differently then, so there was likely no clock speed difference between HT on and off like there would be now. That certainly wouldn't account for all of the difference, but it helps.
 