If you follow me on Twitter, then you might have noticed that I’ve been fighting a lot of fires lately. Between high CPU (several times), blocking queries, a slow failover, and deadlocks, there have been a ton of things that needed attention. Not all of these issues are interesting to people, but one of them might be…the deadlocks. I’m going to go into detail about which deadlocks we hit, why we hit them, and how we resolved them.
I mentioned this in a few previous posts, but for those who may have missed it or forgotten, here’s a quick refresher - we use Always On Availability Groups at Stack Overflow on all of our main production servers running the network of public Q&A sites, Jobs, and Stack Overflow for Teams. It’s a great way to implement disaster recovery for a SQL Server environment. Always On Availability Groups can support up to nine availability replicas, and while we don’t use anywhere near that many replicas in each of our clusters, we do have 2 replicas per cluster (3 servers total), with the replicas being used as readable secondaries.
In my previous post, I listed out the tools I use with SQL Server. Some of the tools are SQL scripts that need to be deployed to each server. If you have 1-2 SQL Servers, manually deploying scripts might not be bad, but ideally you don’t want to manually deploy anything, so I wrote a little script that allows me to install the SQL scripts in my toolbox to any environment.
I get asked a lot about the tools I use at Stack Overflow to monitor and work with our SQL Servers. I figured it might be helpful to others to make the list public. I’ll also do my best to keep it updated as things change. The current list of both free and 3rd party paid tools is below:
Free Tools
Opserver - Stack Exchange’s Monitoring System - I pretty much live in our instances of Opserver because it gives me a one-stop shop to see the health of all of our SQL Servers
Initially, I wasn’t sure whether to write about this migration project, but when I randomly asked if people would be interested, the response was overwhelming. This was a long, kind of boring, very repetitive, and at times incredibly frustrating project, but I learned a lot, and maybe someone else will learn from this too. There may be far better ways to move this amount of data. In the path I went down, there was a huge amount of juggling that had to take place (I’ll explain that later).