In my last post, I mentioned that I started the process of upgrading Stack Overflow to SQL Server 2019. This week I tackled our first production servers and after upgrading, we hit a small issue aka a gotcha because we were using an old system view. Below is a recap of what I encountered.
A Little Background
The servers I upgraded were the three SQL Servers that run Stack Overflow for Teams. For those not familiar with our setup, we use availability groups (AGs) across all of our main SQL Servers. This allows us to use the primary server for read/writes, and use the secondaries for a lot of the read-only traffic. Since we utilize both the primary and secondaries for read purposes, logins need to be the same across all of the servers in the AGs. In order to keep the logins in sync, we have a job that periodically runs on each server and creates new logins using dynamic SQL.
Since I previously upgraded our servers to SQL Server 2017, I knew what to expect as I moved through the cluster. Once I started with the secondary servers the databases would not be accessible until the failover was complete. After upgrading a secondary, I would see this if I tried to access a database:
This also meant that any of the jobs touching the databases would fail, so I disabled SQL Agent jobs to minimize noisy alerts to my email. After the upgrade and failover, I’d enable all the jobs and be on my merry way.
The upgrade of both secondary servers went totally as planned, without any issues. The evening rolled around and despite Nick Craver’s popcorn, the failover was successful, and we finally had SQL Server 2019 running in production.
I upgraded the last server in the cluster (the former primary), and did some other clean-up tasks, including enabling all the SQL Agent jobs, before calling it a night. Once the jobs were enabled I started getting emails that one was failing — the Login Replication job.
Something’s broken - time to investigate.
The Failing Job
I pulled up the history on the job and saw the error:
Executed as user: <username>. Invalid value given for parameter PASSWORD. Specify a valid parameter value. [SQLSTATE 42000] (Error 15021). The step failed.
Sigh ok, something is really broken because this was working before we failed over.
The code for the login replication basically does the following via a cursor (yeah, I know, but it works…normally):
- Select from the primary via
OPENQUERYto query the logins and passwords
varbinarypassword to a
- Create a string to be executed, i.e. dynamic SQL that runs a
I have trimmed the code because it’s long, but the key parts query the login and password.
The value of
sid are then used with
Finally, we concatenate them into a SQL string that we execute:
When I ran the
SELECT statement, I noticed the issue — the value of
null which obviously was wrong, and since we were passing a
null as the password to
sp_hexadecimal the job was failing.
Great, so now what?
We couldn’t just stop replicating the logins across all servers, due to our usage of the secondaries, and we needed to do this automatically on a regular interval. We had to figure out a solution. Thankfully the fix was easy.
After seeing that the
null value of the password was coming from
sys.logins, and since we were already querying
sys.sql_logins we could easily replace:
sys.sql_logins (h/t to Nick Craver). With this one change to the query, the new
SELECT statement was:
Once it was updated, the job successfully ran, and our logins were replicating again without issue. Now, I really called it a night.
Ok, but what was the gotcha?
Remember I said that this was working fine on SQL Server 2017?
When we started looking for solutions, I looked up the Microsoft Docs for
sys.syslogins and right at the top of the doc it says:
This SQL Server 2000 system table is included as a view for backward compatibility. We recommend that you use the current SQL Server system views instead. To find the equivalent system view or views, see Mapping System Tables to System Views (Transact-SQL). This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
The doc shows the description and value of
Hmm, that’s weird because the
password column contains an actual value in SQL Server 2017. Something obviously changed. Yes, we were risky because we were using an old view, but the important thing is the value in the
password column changed, and it bit us.
In SQL Server 2017, there is still a value for
sys.syslogins, but in SQL Server 2019 it is now
If this was mentioned somewhere, I missed it. Thankfully, this wasn’t a critical job otherwise it could have been more problematic. We don’t have the Login Replication job running in development, so we didn’t hit the issue until we moved to production.
Technically it’s on us because we were using an old system view and didn’t check for changes, but keep in mind if you’re using the
sys.syslogins view anywhere, and rely on the
password, you’ll need to make a code update before moving to SQL Server 2019.
If you’ve upgraded to SQL Server 2019, have you hit any of these issues yet?