As part of the ongoing effort to refactor MySQL and make it better, we are defining the behaviour of certain features and in some cases reimplimenting them in a more maintainable fashion.

I have previously blogged on:
* A proposal to simplify SQL_MODE options.
* The Federated Storage Engine.
* The Query Cache.
* The deprecation of EXPLAIN PARTITIONS and EXPLAIN EXTENDED.
* The deprecation of NULL synonym \N.

Today what I want to write about, is the IGNORE option in ALTER TABLE. From the MySQL manual:

“IGNORE is a MySQL extension to standard SQL. It controls how ALTER
TABLE works if there are duplicates on unique keys in the new table
or if warnings occur when strict mode is enabled. If IGNORE is not
specified, the copy is aborted and rolled back if duplicate-key
errors occur. If IGNORE is specified, only the first row is used of
rows with duplicates on a unique key. The other conflicting rows are
deleted. Incorrect values are truncated to the closest matching
acceptable value.”

This creates several issues for the MySQL server team:

IGNORE could remove rows from a parent table when using a foreign key relationship.
IGNORE makes it impossible to use InnoDB Online DDL for several operations, for example adding a PRIMARY KEY or UNIQUE INDEX.
IGNORE has some strange side-effects for replication. For example: DDL is always replicated via statement-based replication, and since SQL does not imply ordering, it’s not clear which rows will be deleted as part of the ignore step. I also see cross-version replication problematic if future MySQL versions were to introduce more strictness, since a slave may de-duplicate more rows.

The most common case

We believe that the most common use case for IGNORE is to be able to add a UNIQUE INDEX on a table which currently has duplicate values present. i.e.

ALTER IGNORE TABLE users ADD UNIQUE INDEX (emailaddress);

In this scenario, a novice user manages to avoid auditing each entry in the users table, and simply lets MySQL pick a row to be kept, with all duplicates automatically removed.

There are two other ways to be able to do that:

Hand removal

Using the same an example as above, return a list of email addresses and PRIMARY KEY values for records that conflict:

SELECT GROUP_CONCAT(id), emailaddress, count(*) as count FROM users 
GROUP BY emailaddress HAVING count >= 2;

/* delete or merge duplicate from above query */

ALTER TABLE users ADD UNIQUE INDEX (emailaddress);

Note: This method will be the fastest way, since when not using IGNORE, MySQL is able to use InnoDB’s Online DDL.

New table + INSERT IGNORE

While this method looks very similar, internally it’s semantics are quite different:

CREATE TABLE users_new LIKE users;
ALTER TABLE users ADD UNIQUE INDEX (emailaddress);
INSERT IGNORE INTO users_new SELECT * FROM users;
DROP TABLE users;
RENAME TABLE users_new TO users;

By creating a table first, the MySQL server will not have to manage rows in a foreign key relationship. The rows will also be re-sent to the slave using row-based replication, so issue (3) I mentioned above does not come into play.

Conclusion

We are looking for feedback on how you use this feature.

Is there another common use-case we are not accounting for?

Would it be acceptable to add a feature to MySQL Workbench to assist in preparing a table for adding a unique index (similar to hand removal)?

Help shape the future of MySQL. Please leave a comment or contact me.

The future of ALTER IGNORE TABLE syntax

The most common case

Hand removal

New table + INSERT IGNORE

Conclusion

Published by

morgo