Recently a friend was working on one of his clients SQL Servers and he ran into an interesting problem. The hardware in question was a HP DL 580 with four chips, each with 10 cores, with hyper threading enabled. This should have presented to SQL Server as 80 total logical cores. The problem was that SQL Server was only seeing 40 cores. The server in this case was Windows Server 2008 R2 and the SQL Server was SQL Server 2008.
If you are familiar with SQL Server 2012 you may be thinking that this is done licensing limitation, but you would be wrong. The problem here is a NUMA problem.
The reason that the problem comes up (which I’ll cover before giving you the solution) because of the way NUMA works on large systems. We all know that we have NUMA nodes, which on todays servers are basically one physical CPU socket per NUMA node. NUMA nodes are put into groups when there are lots of logical processors. The thing is that a single NUMA group can only contain 64 logical processors. Looking back at our HP DL 580, we have 40 cores with hyper threading, which is 80 logical processors. That means that we need to have two NUMA groups.
This is no problem on Windows 2008 R2 as it supports NUMA groups (Windows 2008 and below do not). However the problem is that SQL Server 2008 doesn’t support NUMA groups, so it can only see the logical processors that are in NUMA group 0 (you can have up to 4 NUMA groups which are numbered from 0 to 3).
Because of this the SQL Server was only able to see 40 cores, and those 40 cores were the physical and logical cores from CPUs 0 and 1. We could see this in the errorlog file because it only showed CPUs 0 and 1. Why doesn’t SQL Server just use the physical cores from all the processors and ignore the hyperthreaded cores? Well that’s because it has no idea that CPUs 2 and 3 exist because it can’t see them over in NUMA Group 1.
When running the workload on this machine, 1/2 of the physical CPU power just isn’t being used.
Why do we are about this? Well the 1st problem is that accessing the memory that is attached to CPUs 2 and 3 is going to be expensive as that memory is in another NUMA node than the CPU that’s doing the work. That’ll slow things down with all the cross NUMA node requests. The other problem here is that under a heavy CPU workload the SQL Server will be using 20 real CPUs and 20 virtual CPUs. It would be much better to have access to all the physical CPU cores.
The solution here was quite simple once we realized what the problem was. Disable hyper threading. Now SQL Server can still only see 40 logical processors, but it’s getting all 40 physical cores on the server. This means that cross NUMA node memory access should be mostly gone and we’ve got all the CPU power that we paid for available to us.
As our servers get larger and larger we’ll have more and more cases of older versions of SQL Server not being able to see all the CPU power, and this is why. The number of logical processors that SQL Server could see really depends on the physical server config and how many cores each physical processor has. The basic idea behind the problem is that not all the cores are showing to the SQL Server.
Denny