TempDB files don’t exist/are not reachable anymore, what to do?

TempDB is one of four(well, actually five with ‘mssqlsystemresource’ but that’s not the point of this blog post) system databases of SQL Server that holds temporary user and system objects for various operations, such as temp tables or sorting operations.

It is a crucial item on the startup checklist for SQL Database engine service. At every restart, tempdb is recreated and starts empty to serve its role. Any problem that you may face during this creation operation will cause the database service fail to start.

The Scenario

Let’s imagine a scenario where there are 2 tempdb data files defined, one of them is on E drive and the other is on F drive. What happens if we lose F drive? It won’t crash the instance but you will start seeing alerts in SQL Agent logs such as

“The operating system returned the error ‘2(The system cannot find the file specified.)’ while attempting ‘GetVolumePathName’ on ‘F:\temp2.ndf’ at ‘storagedmv.cpp'(443).”

Clearly, SQL Server is trying to access the defined tempdb data file but it doesn’t exist anymore, so it fails.

The first idea that comes to mind is to remove those file(s) and we should be okay. But you can’t just remove them, because SQL Server can’t remove a file if it is not empty.

USE [tempdb]
GO
ALTER DATABASE [tempdb]  REMOVE FILE [temp2]
GO

Msg 5042, Level 16, State 1, Line 3
The file ‘temp2’ cannot be removed because it is not empty.

Then how about emptying the file first and then remove it? No, it will not work also because we can’t access the file anymore.

USE [tempdb]
GO
DBCC SHRINKFILE (N'temp2', EMPTYFILE)
GO

Msg 823, Level 24, State 2, Line 3
The operating system returned error 21(The device is not ready.) to SQL Server during a read at offset 0x000001f3402000 in file ‘F:\temp2.ndf’. Additional messages in the SQL Server error log and operating system error log may provide more detail. This is a severe system-level error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

The fix

We have to tell SQL Server that we lost these files and during next startup, it should create them on another available disk. How do we do that? By giving a new path for tempdb database files using ALTER DATABASE command.

USE [tempdb]
GO
ALTER DATABASE [tempdb] MODIFY FILE 
(name = temp2, filename = 'E:\temp2.ndf')--New available path
GO

The file “temp2” has been modified in the system catalog. The new path will be used the next time the database is started.

As can be seen in the result output, SQL updated its system catalog and will try to create this data file on given network path.

We can also verify this change by querying master_files dmv

SELECT * FROM sys.master_files WHERE database_id=2

As long as there is enough space on that drive, SQL should start without any problem and create tempdb files as configured.

If there is not enough space, then it will create tempdb files based on model database configuration. This will also be logged in error log file and sql agent log as

Message
MODIFY FILE encountered operating system error 112(There is not enough space on the disk.) while attempting to expand the physical file 'E:\temp2.ndf'.

SQL Server without tempdb?

One day you may want / need to start SQL Server without tempdb, maybe because tempdb drive is corrupt or not reachable and SQL doesn’t start successfully. Then you can start database engine service in a minimal configuration mode with the -f command-line parameter, either from SQL Configuration Manager tool by setting Service Properties or use the command-line

SQLServr.exe -f

Alternatively, you can use Trace Flag 3608 which only starts master database

SQLServr.exe -T3608

After bringing the service up, you can connect to it using SQLCMD command line tool and change the paths of tempdb files to a suitable folder. Now we can stop the database service, restart it and tempdb will be automatically created on new path without any issue.

Microsoft Announced a Vulnerability(CVE 2021 1636) and Released Security Fix

2 days ago, Microsoft released a Security Fix for ALL SQL Server versions to mitigate risks about a privilege escalation vulnerability that leverages Extended Events.

According to Microsoft website:

How can an attacker exploit this vulnerability? An authenticated attacker can send data over a network to an affected SQL Server when configured to run an Extended Event session.
https://msrc.microsoft.com/update-guide/en-US/vulnerability/CVE-2021-1636

Due to obvious reasons, they didn’t share a lot of information about the vulnerability yet. Make sure to plan this in your SQL Server environment and patch all SQL Servers as soon as possible!

Download Links:

SQL Server 2019	CU8	https://www.microsoft.com/en-us/download/details.aspx?id=102617
SQL Server 2019	RTM	https://www.microsoft.com/en-us/download/details.aspx?id=100442
SQL Server 2017	CU22	https://www.microsoft.com/en-us/download/details.aspx?id=102619
SQL Server 2017	RTM	https://www.microsoft.com/en-us/download/details.aspx?id=102620
SQL Server 2016	SP2 CU15	https://www.microsoft.com/en-us/download/details.aspx?id=102621
SQL Server 2016	SP2	https://www.microsoft.com/en-us/download/details.aspx?id=102622
SQL Server 2014	SP3 CU4	https://www.microsoft.com/en-us/download/details.aspx?id=102623
SQL Server 2014	SP3	https://www.microsoft.com/en-us/download/details.aspx?id=102624
SQL Server 2012	SP4	https://www.microsoft.com/en-us/download/details.aspx?id=102625

Make sure to download correct patch file

Upgrading SQL Server 2019 from CTP to RTM version: Is it even possible?

We have been waiting so long for this moment. SQL Server 2019 RTM is finally out! Maybe Microsoft should announced it as SQL Server 2020 when taking into consideration that only one month left to reach new year. Anyway, better late than never.

As I stated, It took so long that we did not only refresh Microsoft downloads website and waited with open arms, but also we wanted to do something in the meantime. That’s why, we followed early versions and installed SQL Server 2019 CTP 3.0 Evaluation Edition to get things going while waiting for RTM. I told the business partners that this is an Evaluation Edition which will ends in 6 months, took the sign-off and started waiting for RTM release.

Now that the time has come, I am asking this question:

Is it possible to upgrade SQL Server 2019 CTP 3.0 version to RTM version?

Looks like it is not.

I read Release Notes for SQL Server 2019 , there is no information about upgrade from CTP or RC version. Then, I kept searching on the internet and came across this documentation page which clearly states that they only support upgrades from Developer Edition and Release Candidates with a special condition on being Early Adopter Program member: Supported Version and Edition Upgrades for SQL Server 2019

I decided to give it a shot and launched Installation Wizard. First I tried “Upgrade from a previous version” option.

Unfortunately, that didn’t work due to the fact that installer gets information about already installed CTP version and forces you to create a new named instance which is not our desire. So it is a dead end.

Then It occurred to me that this is also an Edition Upgrade from Enterprise Evaluation to Enterprise. At this point, I have nothing to loose. I took backup of databases, server objects and accepted the fact that I have to uninstall CTP and start with a clean install of RTM.Thus I decided to try Edition Upgrade also.

Obviously, this didn’t work too, Nevertheless, They may say I have failed but No one can say I didn’t try.

I removed everything that has ‘2019’ in it from Programs and Features, restarted server and finally installed brand new SQL Server 2019 RTM Enterprise Edition.

During installation, I’ve noticed two new tabs in “Database Engine Configuration” section. They are MaxDOP and Memory as you can see below.

Good to have Recommended Memory Adjustments in the wizard, that means I don’t need the piece of code in my installation script where I set these values manually.

Memory settings are crucial for database engine. Thus, Microsoft would like to make sure that you know what you are doing. If you don’t tick the box at the bottom, GUI gives an error:

The rest is just usual installation wizard steps.

To sum up, it is not possible to upgrade from CTP or RC versions to RTM directly. You have to uninstall everything first and then install a fresh one.

I understand why this is not supported from Microsoft but I would expect some unofficial documentation or blogs about this upgrade but looks like no one even bothered trying or documenting.

Always On Availability Groups with Multi-Subnet Failover Cluster

So you decided to create Always On Availability Groups with Multi-Subnet Failover Cluster which gives you the opportunity to failover across different data centers that you have in different regions or continents. Lately, we created this scenario for our monitoring tool with 2 different Subnets. Along the way, we have faced some issues and fixed them. I prepared this post to give you a quick summary about our experiences.

First of all, Multi Subnet Failover Cluster needs to be created and configured by Windows or Domain Admins.

Command:

Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools

New-Cluster -Name MS-SQL-FC -Node SCOMDB1, SCOMDB2 -StaticAddress x.x.x.x, y.y.y.y

In this example:

Cluster Name Object(CNO) : MS-SQL-FC

Nodes: SCOMDB1(Primary),SCOMDB2(Secondary)

After Multi-Subnet Failover Cluster ready, Always On feature needs to be enabled on both SQL Services. After this feature enabled, a service restart is required. If you enable this feature before cluster is fully operational and ready you will get an error when you try to create AG such as:

Msg 35220, Level 16, State 1, Line 92
Could not process the operation. Always On Availability Groups replica manager is waiting for the host computer to start a Windows Server Failover Clustering (WSFC) cluster and join it. Either the local computer is not a cluster node, or the local cluster node is not online. If the computer is a cluster node, wait for it to join the cluster. If the computer is not a cluster node, add the computer to a WSFC cluster. Then, retry the operation.
Disconnecting connection from HOSTNAME…

Now, we can continue to Always On Availability Group setup. The tricky part here is not using New Availability Group Wizard but using TSQL! There are 3 reasons:

1- You can not create an empty Availability Group with Wizard. Most of the time you need to setup Availability Group prior to application setup. In that case, people (even some DBAs with 15 years background) create dummy databases just to create AG, that breakes my heart in two. You can create empty AG with TSQL.

2- When you create an Availability Group with Wizard and add a database with backup & restore seeding method, the user who launced the Wizard will be the owner of :

Endpoint
Availability Group
Database

Guess what is going to happen when that individual leaves the company? Thus use T-SQL Scripts and specify an account with EXECUTE AS LOGIN command! It could be sa or a dedicated svc account for this purpose. In my example I will be using sa account.

3- It’s fun, fast and more secure.

Below script should be executed on both SCOMDB1 and SCOMDB2 to create Endpoint for AG communication and Extended Events to monitor AG:

USE [master]
GO
EXECUTE AS LOGIN=’sa’
CREATE ENDPOINT [Hadr_endpoint]
AS TCP (LISTENER_PORT = 5022)
FOR DATA_MIRRORING (ROLE = ALL, ENCRYPTION = REQUIRED ALGORITHM AES)
GO

IF (SELECT state FROM sys.endpoints WHERE name = N’Hadr_endpoint’) <> 0
BEGIN
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED
END

USE [master]
GO
EXECUTE AS LOGIN=’sa’
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO [sa]
GO

IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name=’AlwaysOn_health’)
BEGIN
ALTER EVENT SESSION [AlwaysOn_health] ON SERVER WITH (STARTUP_STATE=ON);
END
IF NOT EXISTS(SELECT * FROM sys.dm_xe_sessions WHERE name=’AlwaysOn_health’)
BEGIN
ALTER EVENT SESSION [AlwaysOn_health] ON SERVER STATE=START;
END
GO

Create Always On Availability Group on SCOMDB1:(Make sure to edit Failover Mode, Availability Mode and other options according to your business needs)

EXECUTE AS LOGIN=’sa’

CREATE AVAILABILITY GROUP [SCOMBAG]
WITH (AUTOMATED_BACKUP_PREFERENCE = PRIMARY,
BASIC,
DB_FAILOVER = ON,
DTC_SUPPORT = NONE)
FOR
REPLICA ON N’SCOMDB1′ WITH (ENDPOINT_URL = N’TCP://SCOMDB1.FQDN:5022′, FAILOVER_MODE = AUTOMATIC, AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, SEEDING_MODE = AUTOMATIC, SECONDARY_ROLE(ALLOW_CONNECTIONS = NO)),
N’SCOMDB2′ WITH (ENDPOINT_URL = N’TCP://SCOMDB1.FQDN:5022′, FAILOVER_MODE = AUTOMATIC, AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, SEEDING_MODE = AUTOMATIC, SECONDARY_ROLE(ALLOW_CONNECTIONS = NO));

Create Always On Listener on SCOMDB1

EXECUTE AS LOGIN=’sa’

ALTER AVAILABILITY GROUP [SCOMBAG]
ADD LISTENER N’SCOMBAG-L’ (
WITH IP
((N’10.x.x.z’, N’255.255.255.0′),
(N’10.y.y.z’, N’255.255.255.0′)
)
, PORT=1433);

Join Secondary Host to AG

EXECUTE AS LOGIN=’sa’
ALTER AVAILABILITY GROUP [SCOMBAG] JOIN;

EXECUTE AS LOGIN=’sa’
ALTER AVAILABILITY GROUP [SCOMBAG] GRANT CREATE ANY DATABASE;

Now, You can create your database(s) and add them to Availability Group. There are two prerequisites for adding a database to Availability Group:

Database should be in FULL recovery mode
At least one full backup should be performed

After a database meets these requirements, you can prepare database on secondary with backup & restore method by executing below script in SQLCMD mode

–Prepare database on secondary with backup & restore method

:Connect SCOMDB1
EXECUTE AS LOGIN = ‘sa’;
BACKUP DATABASE [DBNAME] TO DISK = N’DBNAME_AG.bak’
:Connect SCOMDB2
EXECUTE AS LOGIN = ‘sa’;
RESTORE DATABASE [DBNAME] FROM DISK = N’DBNAME_AG.bak’ WITH NORECOVERY
:Connect SCOMDB1
EXECUTE AS LOGIN = ‘sa’;
BACKUP LOG [DBNAME] TO DISK = N’DBNAME_AG_log.bak’
:Connect SCOMDB2
EXECUTE AS LOGIN = ‘sa’;
RESTORE LOG [DBNAME] FROM DISK = N’DBNAME_AG_log.bak’ WITH NORECOVERY

Now, Database is ready and you can add it to Availability Group
–CONNECT TO PRIMARY
:Connect SCOMDB1
EXECUTE AS LOGIN = ‘sa’;

ALTER AVAILABILITY GROUP [SCOMBAG] ADD DATABASE OperationsManager;

–CONNECT TO SECONDARY
:Connect SCOMDB2

EXECUTE AS LOGIN = ‘sa’;
ALTER DATABASE [OperationsManager] SET HADR AVAILABILITY GROUP = [SCOMBAG];

–Check the dashboard.

The thing about multiple DNS Records

When we create the cluster, CNO creates DNS records under its name on DNS server for each subnet on the failover cluster such as:

MS-SQL-FC 10.x.x.x

MS-SQL-FC 10.y.y.y

When we create Always On Listener from SQL Server,RegisterAllProvidersIP setting will be set to 1 automatically. As a result, Listener Cluster Resource creates DNS records for each subnet ip it has been assigned such as:

SCOMBAG-L 10.x.x.z

SCOMBAG-L 10.y.y.z

This is an intended behaviour by design. If you add “MultiSubnetFailover=true” clause into connection string and your connector supports this feature(you can check here) database connector gets two DNS records from DNS server and then tries to connect both IP addresses in parallel (theoratically) and connects to right one. Thus, downtime will be minimized in case of a failover.

However, there may be such cases like older database clients doesn’t support MultiSubnetFailover option or your application doesn’t support this feature like our monitoring tool. Then, clients will get timeout errors for %50 percent of their requests inevitably since half of the connections would use wrong DNS record.

To solve this, you have to set RegisterAllProvidersIP to 0, then only the online subnet’s IP will be registered with DNS. You also need to set HostRecordsTTL parameter to reduce downtime. By default it is 1200 seconds, which means that every 20 minutes Local DNS Cache will be updated and application server will learn online IP. Common best practice is setting it to 600 so Local DNS cache will be updated every 5 minutes. Thus, in case of a failover, downtime will be only 5 minutes. We set it to 60 seconds in our case. You can set this parameters for Cluster Name and Listener Network Name through Powershell as:

Get-ClusterResource “Cluster Name“ | Set-ClusterParameter -Name HostRecordTTL -Value 120

Get-ClusterResource “Cluster Name” | Set-ClusterParameter -Name RegisterAllProvidersIP -Value 0

Get-ClusterResource “SCOMBAG-L” | Set-ClusterParameter -Name HostRecordTTL -Value 120

Get-ClusterResource “SCOMBAG-L” | Set-ClusterParameter -Name RegisterAllProvidersIP -Value 0

The listener name resource has to be taken offline and back online for the above changes to take effect. If you do these parameter settings in setup phase, then you are a wise man. If you find out after you installed the environment and clients started to complain about timeout errors, then taking Listener name resource offline will also take the availability group offline since it is dependent on Listener name resource. To prevent this, you should remove the dependency with Powershell or by the Windows Failover Cluster Manager utility.

Windows Failover Cluster Manager:

Open Properties of Availability Group Role
Click Dependencies Tab and Delete Listener Name Resource

Powershell:

Remove-ClusterResourceDependency -Resource SCOMBAG -Provider SCOMBAG-L

Now you have to take listener name resource offline and back online.

Stop-ClusterResource SCOMBAG-L

Start-ClusterResource SCOMBAG-L

Force DNS Update

Get-ClusterResource SCOMBAG-L | Update-ClusterNetworkNameResource

Re-add dependency

Windows Failover Cluster Manager:

Open Properties of Availability Group Role
Click Dependencies Tab and Click Resource dropbox
Select Listener Name Resource and Apply

Powershell:

Add-ClusterResourceDependency -Resource SCOMBAG -Provider SCOMBAG-L

Reconfigure Listener Port configuration from SSMS:

ALTER AVAILABILITY GROUP [SCOMBAG] MODIFY LISTENER ‘SCOMBAG-L’ (PORT=1433);

Error: 18456, Severity: 14, State: 5 Login failed for a member of AD group

This is officially my first blog post. I was planning to write a blog for the last 2 years but couldn’t start until now, so here we go.

Last Friday, we ran into a weird login failed error message for a login who is a member of an AD group with sysadmin rights on a new installed SQL Server. I tried conventional methods to find root cause and fix it but I have never experienced this case before. So, I decided to write a blog post about it as my first post!

One of my colleague reported that he can not login into a newly installed Always On Secondary SQL server(let’s call it SQL2) from another database host(let’s call it Host1). I checked error log immediately and saw this errors:

This is a typical login failure which a dba sees a lot, however the problem is he is a member of an AD group(let’s call it [AD\Group1]) and [AD\Group1] exists on SQL2 with sysadmin privileges! Furthermore, I am also a member of [AD\Group1] and I can connect to SQL2 as other members. Also, [AD\Group1] is exists on SQL1, primary replica of this Availability Group, and he connected that server without any problem. Things were getting interesting.

First, I created his domain account directly, he was able to connect both SQL1 and SQL2. This way, we eliminated the possibility that his account is not able to login. But why he couldn’t connect through an privileged AD group? There was something going on with AD group and his membership. I used xp_logininfo procedure to retrieve information about [AD\Group1] members and I saw him as a member, so why the hell SQL does not allow him to login and says there is no such login?

I patched both servers with latest SP and CU, didn’t help.

We removed his account from AD group and added him again, didn’t help.

We removed [AD\Group1] from SQL2 and created again with sysadmin role, didn’t help.

Even though I am embarrassed to tell this, I restarted SQL service, of course didn’t help.

I further searched this issue on internet, also didn’t help.

Then, I installed SSMS on SQL2 host and asked him to try to connect directly from database host, this time it worked. Now, I am even more confused.

So, We have an AD group with sysadmin role which contains 2 members in it. One of them can connect to both SQL1 and SQL2 from anywhere(Host1, local workstation, SQL2 host and so on) without problem, other member can connect to SQL1 without a problem. He can also connect to SQL2 as a member of [AD\Group1] from his workstation and SQL2 host. However, he can’t connect to SQL2 from Host1.

I realized that this needs more careful and detailed examination. I went back to basics and read about AD groups, kerberos authentication and SQL Server authentication-authorization. Then, my colleague told me that he removed himself from [AD\Group1] one week ago and re-added 1 day before this incident occurred. That moment I realized that his kerberos ticket was different on AD than the one cached in his user profile on Host1 server. I asked him to log off from Host1 and log on again, only then he could connect to SQL2 from Host1.

To sum up, If you encounter a similar login failure make sure that domain user is logged off and logged in again!