Business continuity with Azure SQL Database

To protect your business from data loss, SQL Database automatically creates

  • Full database backups weekly
  • Differential database backups every 12 hours
  • Transaction log backups every 5 – 10 minutes

The backups are stored in RA-GRS storage blobs for 7 days Basic service tier, 14 days for Standard and 35 days for Premium service tiers. It’s also replicated to a paired data centre for protection against a data centre outage.

Microsoft provides Azure SQL Database with 99.99% availability SLA on all its service tiers and 99.995% SLA for the business-critical or premium tiers in the regions that support availability zones. For the latest updates check the below link.

https://azure.microsoft.com/en-gb/support/legal/sla/sql-database/v1_4/

There are two high-availability architectural models that are used in Azure SQL Database:

  • Standard availability model
  • Premium availability model

Standard availability model (Basic, Standard, and General Purpose service tier availability)

Standard availability model separates compute and storage. It relies on high availability and reliability of the remote storage tier (Azure Premium Storage Disks). This architecture targets budget-oriented business applications that can tolerate some performance degradation during maintenance activities.

This model has two layers:

  1. A stateless compute layer (Active compute nodes):
  • It runs the sqlservr.exe process.
  • Contains only transient and cached data, such as TempDB, model databases on the attached SSD, and plan cache, buffer pool, and column store pool in memory.
  • Operated by Azure Service Fabric that
    • initializes sqlservr.exe,
    • controls health of the node and
    • performs failover if necessary.
  • A stateful data layer (Azure Storage accounts):
  • The database files (.mdf/.ldf) are stored in remote Azure Blob storage its a Page Blob and file path is an HTTPS URL.
  • This Azure blob storage is set with LRS so 3 copies are made available with thing the same data centre for redundancy feature.
  • The data file are preserved even if SQL Server process crashes.
  • The TempDB database is not using Azure Premium Storage, it is located on the local SSD storage.

During the database engine or the operating system is upgraded, or a failure is detected, Azure Service Fabric will move the stateless SQL Server process to another stateless compute node so the data in Azure Blob storage is not affected by the move, and the data/log files are attached to the newly initialized SQL Server process. This process guarantees 99.99% availability, but a heavy workload may experience some performance degradation during the transition since the new SQL Server instance starts with cold cache.

Premium availability model (Premium and Business Critical service tier availability)

  • This model integrates both computing resources (SQL Server Database Engine process) and storage (locally attached SSD) on a single node and it replicates both computing and storage to additional nodes creating a three to four-node cluster.
  • It’s used for mission-critical applications with high IO performance, high transaction rate and guarantees minimal performance impact to your workload during maintenance activities.
  • The database files (.mdf/.ldf) are placed on the attached SSD storage to provide very low latency IO to your workload.
  • The cluster has a single primary replica (SQL Server process) for read-write workloads, and up to three secondary replicas (compute and storage) containing copies of data. An extra feature on this the premium availability model is the ability to redirect read-only SQL connections to one of the secondary replicas with additional compute capacity at no extra charge. This feature is called Read Scale-Out.

Zone redundant configuration

Premium availability model, by default the cluster of nodes database is created on the same data center. We can place different replicas to different availability zones in the same region to eliminate a single point of failure. As shown below, the control ring is duplicated across multiple zones as three gateway rings (GW) and the routing is controlled by Azure Traffic Manager (ATM). Having the database replicas in different datacentres increases the network latency and the commit time, hence it slightly impacts the performance of some OLTP workloads.


Posted

in

by

Tags: