As part of the ongoing effort to refactor MySQL and make it better, we are defining the behaviour of certain features and in some cases reimplimenting them in a more maintainable fashion.
I have previously blogged on:
* A proposal to simplify SQL_MODE options.
* The Federated Storage Engine.
* The Query Cache.
* The deprecation of EXPLAIN PARTITIONS and EXPLAIN EXTENDED.
* The deprecation of NULL synonym \N.
Today what I want to write about, is the IGNORE
option in ALTER TABLE
. From the MySQL manual:
“IGNORE is a MySQL extension to standard SQL. It controls how ALTER
TABLE works if there are duplicates on unique keys in the new table
or if warnings occur when strict mode is enabled. If IGNORE is not
specified, the copy is aborted and rolled back if duplicate-key
errors occur. If IGNORE is specified, only the first row is used of
rows with duplicates on a unique key. The other conflicting rows are
deleted. Incorrect values are truncated to the closest matching
acceptable value.”
This creates several issues for the MySQL server team:
IGNORE
could remove rows from a parent table when using a foreign key relationship.IGNORE
makes it impossible to use InnoDB Online DDL for several operations, for example adding aPRIMARY KEY
orUNIQUE INDEX
.IGNORE
has some strange side-effects for replication. For example: DDL is always replicated via statement-based replication, and since SQL does not imply ordering, it’s not clear which rows will be deleted as part of the ignore step. I also see cross-version replication problematic if future MySQL versions were to introduce more strictness, since a slave may de-duplicate more rows.
The most common case
We believe that the most common use case for IGNORE
is to be able to add a UNIQUE INDEX
on a table which currently has duplicate values present. i.e.
ALTER IGNORE TABLE users ADD UNIQUE INDEX (emailaddress);
In this scenario, a novice user manages to avoid auditing each entry in the users table, and simply lets MySQL pick a row to be kept, with all duplicates automatically removed.
There are two other ways to be able to do that:
Hand removal
Using the same an example as above, return a list of email addresses and PRIMARY KEY
values for records that conflict:
SELECT GROUP_CONCAT(id), emailaddress, count(*) as count FROM users
GROUP BY emailaddress HAVING count >= 2;
/* delete or merge duplicate from above query */
ALTER TABLE users ADD UNIQUE INDEX (emailaddress);
Note: This method will be the fastest way, since when not using IGNORE
, MySQL is able to use InnoDB’s Online DDL.
New table + INSERT IGNORE
While this method looks very similar, internally it’s semantics are quite different:
CREATE TABLE users_new LIKE users;
ALTER TABLE users ADD UNIQUE INDEX (emailaddress);
INSERT IGNORE INTO users_new SELECT * FROM users;
DROP TABLE users;
RENAME TABLE users_new TO users;
By creating a table first, the MySQL server will not have to manage rows in a foreign key relationship. The rows will also be re-sent to the slave using row-based replication, so issue (3) I mentioned above does not come into play.
Conclusion
We are looking for feedback on how you use this feature.
Is there another common use-case we are not accounting for?
Would it be acceptable to add a feature to MySQL Workbench to assist in preparing a table for adding a unique index (similar to hand removal)?
Help shape the future of MySQL. Please leave a comment or contact me.