Trey Teter

Software Engineer


Sandbox Service Upgrade

When Q2 started the SDK, we needed a way to quickly spin up online banking environments for external users. Prior to the SDK, there wasn’t much need for these environments to be accessible outside the Q2 network, nor was there a requirement to provision them rapidly. Typically, development teams would set them up for internal use, or they were configured and deployed by the Production and Staging teams for banking customers.

Initial Online Banking Setup

The team decided the fastest solution was to create a server cluster in AWS to host all online banking services, along with RDS instances to manage the databases. This setup served us well for many years. Aside from routine updates, the architecture remained largely unchanged—until recently.

Why We Needed a Change

At the time of implementation, AWS RDS for SQL Server limited each instance to 30 databases, and our licensing allowed 10 servers per license type. With two license types, our theoretical maximum was 600 databases (30 × 10 × 2). AWS later raised the limit to 100 databases per instance, which would have lifted our cap to 2,000 databases, but only on more powerful (and more expensive) instance types.

By the end of last year, we had reached 580 databases. We faced a decision: scale up to the larger RDS instances or design a new solution using EC2 servers.
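The capacity numbers above reduce to simple multiplication. The short sketch below reproduces them. The per-instance limits, server count, license types, and the 580-database figure come from this post. The function and variable names are illustrative only and are not part of any tooling we actually use.

```python
# Capacity arithmetic for the RDS for SQL Server setup described above.
# Inputs come from the post; names are hypothetical, for illustration only.

def max_databases(dbs_per_instance: int, servers_per_license: int, license_types: int) -> int:
    """Theoretical ceiling: per-instance limit x servers per license x license types."""
    return dbs_per_instance * servers_per_license * license_types

SERVERS_PER_LICENSE = 10
LICENSE_TYPES = 2

# Ceiling under the original 30-databases-per-instance limit.
original_cap = max_databases(30, SERVERS_PER_LICENSE, LICENSE_TYPES)
# Ceiling if every server moved to the larger 100-database instance types.
upgraded_cap = max_databases(100, SERVERS_PER_LICENSE, LICENSE_TYPES)

in_use = 580                       # databases in use by the end of last year
headroom = original_cap - in_use   # databases left before hitting the ceiling
utilization = in_use / original_cap

if __name__ == "__main__":
    print(f"Original ceiling: {original_cap} databases")
    print(f"Ceiling on larger instances: {upgraded_cap} databases")
    print(f"In use: {in_use} ({utilization:.1%}), headroom: {headroom}")
```

With only 20 databases of headroom (about 96.7% of the original ceiling in use), the choice between larger RDS instances and a different architecture could not be put off much longer.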

Why We Chose EC2

After team discussions, we opted for EC2 servers. This matched how our Staging and Production environments are hosted, and EC2 would be more cost-effective than RDS. Given the efficiency of Microsoft SQL Server, we were confident we could consolidate most databases onto a single EC2 instance. With the plan in place, it was time to execute.

Implementation

By May 12th, all migration scripts were written and tested. Maintenance was scheduled, and we were ready to upgrade. However, transferring 1.8 terabytes of data took significantly longer than expected. After an overnight effort, the system appeared stable by morning.

What Caused the Issues

We migrated databases from 19 over-provisioned servers to a single, high-performance machine. Given the efficiency of SQL Server and Online Banking, this should have worked seamlessly.

However, restarting all environments inadvertently upgraded Online Banking to a version incompatible with the databases. We quickly rolled back the environments, and everything seemed fine until overnight processes pushed the server to 100% load.

To resolve this, we doubled the core count and quadrupled the memory. After one final restart of all environments, the system has remained stable and healthy.