Top 10 advances to availability since MySQL 5.5

Recently I found myself talking to a company that was a big user of MySQL, but just hadn’t followed all of the latest developments so closely. When the conversation went to availability I was actually not as prepared as I would have liked, and forgot about some of the (quite large) improvements that have been made to MySQL recently.
So here is me trying to write it all down for the next time I’m in the same situation. Note that I am using the definition of Continuous availability, which means any service disruptions that impact the business not just reduced redundancy.

#1 – InnoDB as default

When: MySQL 5.5
Why:
* Row locks instead of table-locks means that there is less contention and sudden stalls as applications become more loaded (i.e. performance degrades much better).
* InnoDB also features Multiversion concurrency control, which means that queries that read data do not have to set locks – further increasing concurrency and reducing potential stalls.
* If MySQL crashes, InnoDB is able to perform crash recovery very quickly and come back online (it just replays through its log file). For MyISAM crash recovery takes much longer, as the whole table must be examined.

#2 – Crash safe Replication

When: MySQL 5.6
Why:
* Replication slaves are crash safe. This means when a slave crashes you no longer need to re-image its data from the master or another slave. The slave can resume from where it was.
* This results is less time without HA, and reduced operations team overhead.

#3 – Semi-sync replication

When: MySQL 5.5 (will get even better performance in MySQL 5.7).
Why:
* Semi-sync is an option where you can make sure at least one slave has a copy of the data.
* This means less chance of lost transactions if a master fails.

#4 – Improved Group-commit

When: MySQL 5.6
Why:
* This means that transactions being committed near the same time in InnoDB will merge together and write to transaction logs as one operation.
* This is important because it makes making MySQL run in durable mode not result in as big of a performance drop.
* With the addition of changes like this (and fast SSDs) I recommend durability to a lot of users.

#5 – Replication with GTIDs

When: MySQL 5.6
Why:
* Previously it was very difficult when you had a master fail with many slaves. You could not simply pick a new master from one of the slaves and reposition the other slaves under it.
* With GITDs failover and maintenance operations that change the topology are much easier.

#6 – Online DDL

When: MySQL 5.6
Why:
* A large number of DDL operations (such as adding indexes) now do not block other queries reading or writing to the table. The list of which are blocking writes (‘allows concurrent DML’) is also well documented in the manual.
* Syntax is also provided so that if the query can not run without locking, it will refuse to execute. For example: ALTER TABLE my_table ADD INDEX a (a), LOCK=NONE;.

#7 – Improved InnoDB crash recovery

When: MySQL 5.5
Why:
* It was discovered that InnoDB crash recovery time was taking a long time on new servers with large amounts of RAM.
* The performance was greatly improved with a new crash recovery algorithm in MySQL 5.1 (InnoDB plugin edition only) and MySQL 5.5 (by default).
* This is now a very serious issue if you are running an older version of MySQL on some of the hardware available today.

#8 – Improved adaptive flushing

When: MySQL 5.5 (improved in MySQL 5.6)
Why:
* For performance InnoDB delays writes by first writing to a transaction log in the foreground, and then writing to destined page locations in the background.
* In earlier versions of MySQL it is possible that the log files can ‘fill up’ and have no free space available for new modifications. This can result in sudden stalls as the server quickly frees up and makes log space available.
* In MySQL 5.5 an adaptive flushing algorithm is introduced (default: on) to flush pages more aggressively as free space is running low. This feature results in more consistent and stable response times for all queries.

#9 – LRU not victim to side workloads

When: Introduced MySQL 5.5, MySQL 5.6 (by default)
Why:
* When MySQL needs to perform sudden IO (such as in the case of a mysqldump) it is possible that the ‘good’ contents of the cache could be unloaded from memory.
* In MySQL 5.5, the cache algorithm was changed from a classic LRU to a young sublist (default: 63%) and old sublist (default: 37%) so that these expensive side-load queries could have a fixed resource limit applied to them.
* The configuration variable innodb_old_blocks_time was introduced to specify the minimum time in milliseconds that a page must be in the old sublist before it can be promoted to the young sublist. In 5.6 it defaults to 1000.

#10 – MySQL Utilities

When: Independent Release
Why:
* MySQL now releases an official set of MySQL Utilities that provide command-line utilities for maintaining and administering MySQL servers.
* Notably mysqlfailover provides a scripted way to failover when using Replication with GTIDs. MySQL Fabric (Not yet GA; Labs Release Only) provides sharding with High-Availability groups.