The MySQL 5.7 Optimizer Challenge

In the MySQL team, we have been working really hard on refactoring the optimizer and improving the cost model. The hacks of storage engines lying to the optimizer are being rolled back, and your chances of getting an optimal query plan should now be much higher than in prior releases of MySQL.

The optimizer team has also made cost constants configurable on both a per-server and a per-storage-engine basis, and we are confident that the default InnoDB engine will always work “as well as MyISAM” (which has a natural advantage, in that the optimizer was originally largely built around it).
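As a rough sketch of what adjusting these constants can look like: in released versions of MySQL 5.7 the constants live in the mysql.server_cost and mysql.engine_cost tables and are reloaded with FLUSH OPTIMIZER_COSTS. The value 2.0 below is purely illustrative and not a recommendation.

```sql
-- Inspect the current constants. A NULL cost_value means the built-in default is used.
SELECT cost_name, cost_value FROM mysql.server_cost;
SELECT engine_name, device_type, cost_name, cost_value FROM mysql.engine_cost;

-- Illustrative only: make disk block reads look more expensive for InnoDB,
-- leaving other storage engines on the default.
INSERT INTO mysql.engine_cost (engine_name, device_type, cost_name, cost_value, comment)
VALUES ('InnoDB', 0, 'io_block_read_cost', 2.0, 'illustrative value only');

-- Ask the server to re-read the cost tables. Only sessions started
-- afterwards pick up the new values.
FLUSH OPTIMIZER_COSTS;
```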

Today, I want to issue a challenge:

Find an example where the optimizer picks the wrong execution plan for InnoDB tables but is correct for MyISAM. If you can demonstrate a reproducible testcase, I have a polo with MySQL 5.7 Community Contributor on it waiting for you.

The supplies of this special edition polo are limited, but I will ship it to you anywhere in the world 🙂

The MySQL 5.7 Community Contributor Polo, as modeled by Daniël van Eeden. I’m the guy on the left.
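
For anyone taking up the challenge, a reproducible test case could start from something like the sketch below: two tables with identical definitions and data that differ only in storage engine, followed by EXPLAIN on the same query. The table names, columns and data are hypothetical. A real test case would need enough data for the plans to diverge.

```sql
-- Hypothetical schema: the same definition in both engines.
CREATE TABLE t_innodb (
  id INT NOT NULL PRIMARY KEY,
  a  INT NOT NULL,
  b  INT NOT NULL,
  KEY idx_a (a),
  KEY idx_b (b)
) ENGINE=InnoDB;

CREATE TABLE t_myisam LIKE t_innodb;
ALTER TABLE t_myisam ENGINE=MyISAM;

-- Load identical (placeholder) data into both tables.
INSERT INTO t_innodb VALUES (1, 1, 10), (2, 1, 20), (3, 2, 10), (4, 3, 30);
INSERT INTO t_myisam SELECT * FROM t_innodb;

-- Refresh index statistics so both plans are based on current data.
ANALYZE TABLE t_innodb, t_myisam;

-- Compare the chosen plans for the same query.
EXPLAIN SELECT * FROM t_innodb WHERE a = 1 AND b = 10;
EXPLAIN SELECT * FROM t_myisam WHERE a = 1 AND b = 10;
```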

MySQL 5.7.8 – Now featuring super_read_only and disabled_storage_engines

I wanted to highlight two new features that are making their way into MySQL 5.7 via the not-yet-released 5.7.8-rc2:

  • A new system variable super_read_only allows a more strict definition of ‘read-only’ which also applies to super users.
  • A new disabled_storage_engines setting offers a way to prevent an enumerated list of storage engines from being used. For example, a DBA may wish to enforce an InnoDB-only policy to simplify common operations such as backups, but it’s possible MyISAM may sneak back in via new code-deployments. This setting allows more active enforcement.
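
A minimal sketch of how the two settings might be used, based on how they are documented for released versions of MySQL 5.7 (behavior in the release candidate may differ). The table name is hypothetical.

```sql
-- disabled_storage_engines is set at server startup, for example in my.cnf:
--   [mysqld]
--   disabled_storage_engines="MyISAM"
-- With that in place, creating a table in a disabled engine is expected to fail:
CREATE TABLE legacy_table (id INT NOT NULL PRIMARY KEY) ENGINE=MyISAM;

-- super_read_only can be changed at runtime by a user with the SUPER privilege.
-- Enabling it also enables read_only.
SET GLOBAL super_read_only = ON;
SELECT @@global.read_only, @@global.super_read_only;
```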

These features are the fruits of our engineering team meeting with our users at Percona Live this year. Thank you to Percona for once again hosting a great conference, and in particular thank you to @isamlambert (and the GitHub Engineering team), @John_Cesario, @denshikarasu & Rob Wultsch for specifically requesting these two features 🙂

Proposal to deprecate INSERT and REPLACE alternative syntax

In the MySQL team we are currently considering a proposal to deprecate a number of alternative syntax uses with the INSERT and REPLACE commands. To provide examples:

CREATE TABLE `city` (
  `Name` char(35) NOT NULL DEFAULT '',
  `CountryCode` char(3) NOT NULL DEFAULT '',
  `District` char(20) NOT NULL DEFAULT '',
  `Population` int(11) NOT NULL DEFAULT '0',
  KEY `CountryCode` (`CountryCode`),
  CONSTRAINT `city_ibfk_1` FOREIGN KEY (`CountryCode`) REFERENCES `Country` (`Code`)
);
INSERT INTO city SET
 Name='NewCity', CountryCode='CAN',District='MyDistrict',Population=1234;
INSERT INTO city (Name,CountryCode,District,Population) VALUE
 ('NewCity2', 'CAN', 'MyDistrict', 1234);
INSERT city (Name,CountryCode,District,Population) VALUES
 ('NewCity3', 'CAN', 'MyDistrict', 1234);
REPLACE INTO city (Name,CountryCode,District,Population) VALUE
 ('NewCity4', 'CAN', 'MyDistrict', 1234);
REPLACE city (Name,CountryCode,District,Population) VALUES
 ('NewCity5', 'CAN', 'MyDistrict', 1234);

To summarize these queries:

  • INSERT using the SET syntax.
  • INSERT and REPLACE using the keyword VALUE instead of VALUES.
  • INSERT and REPLACE without the keyword INTO.
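
For comparison, here is a sketch of the same rows written using only the forms that would stay preferred under this proposal: an explicit column list, the INTO keyword, and VALUES. It assumes the city table from the examples above.

```sql
-- Instead of INSERT ... SET col=value, ...
INSERT INTO city (Name, CountryCode, District, Population)
VALUES ('NewCity', 'CAN', 'MyDistrict', 1234);

-- Instead of VALUE, or of omitting INTO
INSERT INTO city (Name, CountryCode, District, Population)
VALUES ('NewCity2', 'CAN', 'MyDistrict', 1234),
       ('NewCity3', 'CAN', 'MyDistrict', 1234);

REPLACE INTO city (Name, CountryCode, District, Population)
VALUES ('NewCity4', 'CAN', 'MyDistrict', 1234),
       ('NewCity5', 'CAN', 'MyDistrict', 1234);
```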

Our rationale for this proposal is as follows:

  • Having several near-identical ways of completing the same task makes training, documentation and support more difficult. To explain in more detail: our manual has always tried to document every option that the server will accept, but when there is no functional difference between the options, this makes the content more verbose and clumsy to read.

    MySQL usage becomes cleaner by stating which usage is explicitly preferred, even if the old syntax remains supported for the legacy use-case.

  • The syntax is non-standard and complicates our parser. While it may take some time before we are able to remove these options, by starting the deprecation cycle now we can provide application authors with as much notice as possible.

Our proposed plan is to deprecate the syntax starting with MySQL 5.7. We will assess feedback from our users before targeting a version for syntax removal.

Will you be affected by this change? Please leave a comment, or get in touch! We’d love to hear from you.

Proposal to deprecate MySQL INTEGER display width and ZEROFILL

In the MySQL team we are currently discussing if we should deprecate the integer display width in numeric types. For example:

CREATE TABLE my_table (
 id INT(11) NOT NULL PRIMARY KEY auto_increment
);

The (11) does not affect the storage size of the data type, which for an INT will always be 4 bytes, nor the range of values it can hold. It only affects the display width.
Our rationale for proposing its deprecation is that it is a common source of confusion among users.
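
To illustrate the confusion, here is a small sketch with a hypothetical table: two INT columns declared with different display widths accept exactly the same range of values.

```sql
-- Hypothetical table: the same type, different display widths.
CREATE TABLE width_demo (
  narrow INT(1),
  wide   INT(11)
);

-- The largest signed INT value fits in both columns, because the display
-- width does not limit storage or range.
INSERT INTO width_demo VALUES (2147483647, 2147483647);

SELECT narrow, wide FROM width_demo;  -- both columns return 2147483647
```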

We are also discussing deprecating the non-standard ZEROFILL type attribute, which is the only modern consumer of this display width metadata. For example:

CREATE TABLE my_table (
 id INT(11) ZEROFILL NOT NULL PRIMARY KEY auto_increment
);
INSERT INTO my_table VALUES (1);
mysql> SELECT * FROM my_table;
+-------------+
| id          |
+-------------+
| 00000000001 |
+-------------+
1 row in set (0.00 sec)

StackOverflow has a good example of how ZEROFILL is useful:

[..] In Germany, we have 5 digit zipcodes. However, those codes may start with a Zero, so 80337 is a valid zipcode for Munich, 01067 is a zipcode of Berlin.

As you see, any German citizen expects the zipcodes to be displayed as a 5 digit code, so 1067 looks strange.


The same applies to any numeric values that require leading zeros, such as some phone numbers.

Upgrade Paths

There are two possible upgrade paths to migrate away from ZEROFILL.

Option #1 – Move to CHAR/VARCHAR

This option is the most transparent for applications, and changes the data type to be a string instead of numeric. For example:

CREATE TABLE my_zip_codes (
 id INT NOT NULL PRIMARY KEY auto_increment,
 zip_code INT(5) ZEROFILL
);
INSERT INTO my_zip_codes (zip_code) VALUES ('01234'), ('54321'), ('00123'), ('98765');
mysql> select * from my_zip_codes;
+----+----------+
| id | zip_code |
+----+----------+
|  1 |    01234 |
|  2 |    54321 |
|  3 |    00123 |
|  4 |    98765 |
+----+----------+
4 rows in set (0.00 sec)
ALTER TABLE my_zip_codes CHANGE zip_code zip_code CHAR(5);

With CHAR(5) in a single-byte character set, the storage requirement is only one byte higher than that of a 4-byte INT. For other kinds of data (such as phone numbers requiring leading zeros) it might be slightly more efficient to store the value as an integer.

Option #2 – Format integers at a different layer

This option retains the storage efficiency of an integer, but moves the presentation into the application. For example:

CREATE TABLE my_zip_codes (
 id INT NOT NULL PRIMARY KEY auto_increment,
 zip_code INT(5) ZEROFILL
);
INSERT INTO my_zip_codes (zip_code) VALUES ('01234'), ('54321'), ('00123'), ('98765');
ALTER TABLE my_zip_codes CHANGE zip_code zip_code INT;
mysql> select * from my_zip_codes;
+----+----------+
| id | zip_code |
+----+----------+
|  1 |     1234 |
|  2 |    54321 |
|  3 |      123 |
|  4 |    98765 |
+----+----------+
4 rows in set (0.00 sec)

It will also technically be possible to retrofit this into legacy applications that require ZEROFILL presentation in the results returned from MySQL. This can be done with a query rewrite plugin that modifies SELECT statements to add padding:

mysql> SELECT id, LPAD(zip_code, 5, '0') as zip_code FROM my_zip_codes;
+----+----------+
| id | zip_code |
+----+----------+
|  1 | 01234    |
|  2 | 54321    |
|  3 | 00123    |
|  4 | 98765    |
+----+----------+
4 rows in set (0.01 sec)
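
As a sketch of what such a rewrite rule might look like, assuming the Rewriter plugin that ships with MySQL 5.7 has been installed (via its install_rewriter.sql script) and that the table lives in a schema named test (a hypothetical name):

```sql
-- Rewrite one specific legacy statement so that it returns padded values.
-- The pattern only matches statements of exactly this form; a legacy
-- "SELECT *" query would need its own rule.
INSERT INTO query_rewrite.rewrite_rules (pattern, pattern_database, replacement)
VALUES (
  'SELECT id, zip_code FROM my_zip_codes',
  'test',
  'SELECT id, LPAD(zip_code, 5, ''0'') AS zip_code FROM my_zip_codes'
);

-- Load the new rule into the plugin.
CALL query_rewrite.flush_rewrite_rules();

-- Rules that failed to load have an explanation in the message column.
SELECT pattern, enabled, message FROM query_rewrite.rewrite_rules;
```

This keeps the application's SQL unchanged, at the cost of maintaining one rule per statement shape.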


We are seeking feedback from the community in response to this proposal. If you have found the existing behavior confusing, or will be affected by the removal of ZEROFILL, please leave a comment or get in touch! We would love to hear from you.