We are considering enabling the SQL mode ONLY_FULL_GROUP_BY
by default in MySQL 5.7. Here is a quick example to demonstrate how this change will impact you:
Sample Data:
CREATE TABLE invoice_line_items (id INT NOT NULL PRIMARY KEY auto_increment, invoice_id INT NOT NULL, description varchar(100) ); INSERT INTO invoice_line_items VALUES (NULL, 1, 'New socks'), (NULL, 1, 'A hat'), (NULL, 2, 'Shoes'), (NULL, 2, 'T shirt'), (NULL, 3, 'Tie');
Default behaviour in MySQL 5.6 and below:
mysql> SELECT id, invoice_id, description FROM invoice_line_items GROUP BY invoice_id; +----+------------+-------------+ | id | invoice_id | description | +----+------------+-------------+ | 1 | 1 | New socks | | 3 | 2 | Shoes | | 5 | 3 | Tie | +----+------------+-------------+ 3 rows in set (0.00 sec)
Proposed default behaviour in MySQL 5.7:
mysql> SELECT id, invoice_id, description FROM invoice_line_items GROUP BY invoice_id; ERROR 1055 (42000): 'test.invoice_line_items.id' isn't in GROUP BY
That is to say that in 5.7 this statement will produce an error.
Notice that columns id
and description
are not actually unique? Roland Bouman has a blog post from 2007 that describes what MySQL is doing here, and how it is different in behaviour to other databases.
To summarize: MySQL is picking one of the possible values for id
and description
and the query executes in a non deterministic way. Here’s some more information to prove this:
mysql> SELECT id, invoice_id, description, max(description) FROM invoice_line_items GROUP BY invoice_id; +----+------------+-------------+------------------+ | id | invoice_id | description | max(description) | +----+------------+-------------+------------------+ | 1 | 1 | New socks | New socks | | 3 | 2 | Shoes | T shirt | | 5 | 3 | Tie | Tie | +----+------------+-------------+------------------+ 3 rows in set (0.00 sec) mysql> SELECT GROUP_CONCAT(id), invoice_id, GROUP_CONCAT(description) FROM invoice_line_items GROUP BY invoice_id; +------------------+------------+---------------------------+ | GROUP_CONCAT(id) | invoice_id | GROUP_CONCAT(description) | +------------------+------------+---------------------------+ | 1,2 | 1 | New socks,A hat | | 3,4 | 2 | Shoes,T shirt | | 5 | 3 | Tie | +------------------+------------+---------------------------+ 3 rows in set (0.00 sec)
What is problematic about this behaviour is that it can hide bugs in application code. To highlight two blog posts:
- Ike Walker @ Flite explains why he only uses full group by.
- Keyur Govande @ Etsy strongly recommends STRICT_ALL_TABLES and ONLY_FULL_GROUP_BY to avoid data corruption.
The Proposal
We are proposing to change ONLY_FULL_GROUP_BY
to be enabled by default as of MySQL 5.7. The downside in doing this, is that some users upgrading will have statements return errors. These users will be left with two options:
- Set
sql_mode
in their my.cnf (or my.ini) file explicitly, or as part of their application code since sql_mode is configurable per session. - Modify the offending
GROUP BY
statements to be deterministic.
On the second point, we are also proposing introducing an ANY_VALUE()
SQL function to allow statements that are ‘acceptable as non-deterministic’ to run easy retrofit in this less permissive configuration. For example:
mysql> SELECT ANY_VALUE(id) AS id, invoice_id, ANY_VALUE(description) AS description FROM invoice_line_items GROUP BY invoice_id; +------+------------+-------------+ | id | invoice_id | description | +------+------------+-------------+ | 1 | 1 | New socks | | 3 | 2 | Shoes | | 5 | 3 | Tie | +------+------------+-------------+ 3 rows in set (0.01 sec)
Why I personally like this proposal, is that I compare it to buying an undercooked burger in a North American restaurant. We are proposing to switch our policy from “we may serve it to you without you having realized you ordered it” to “sign a waiver, then we will serve it”.
i.e. We are not removing options, but want you to have to opt-in to what can be an unsafe choice, rather than opt-out.
Fans of standard SQL will rejoice at compatibility with SQL-2003, a standard defined after this behavior was first implemented.
Conclusion
As with other changes that have been proposed for MySQL 5.7, we are soliciting your feedback!
- Have you accidentally returned incorrect data, and could have benefited from having
ONLY_FULL_GROUP_BY
enabled? - Do you maintain an application that relies on the non deterministic behaviour?
- Do you agree or disagree that this change is the better default for users?
Please leave a comment, or get in touch!
Update: This proposal has gone ahead. MySQL 5.7 DMR5 has ONLY_FULL_GROUP_BY enabled by default!
Pingback: MySQL, GROUP BY, Select lists, and Standard SQL | SQL And Its Sequels()