Chicks Bit Brick: Deleting duplicate rows in mysql

Thursday, July 26, 2007

Deleting duplicate rows in mysql

I am posting this as a public service to myself on how to delete duplicate rows from a mysql table.
In my case I have a unique column id on each row. And I have an indexed column, call it match, that I use to determine duplication.
In short so I can find it next time what I do is

delete t2 from table1 as t1, table2 as t2 where t1.match = t2.match and t2.id > t1.id;

This works and takes about 2-4 minutes on a table with about 280K rows and 35K duplicates.

I've only tested this with isam tables, I know this can be done other ways and can be gleaned from the mysql documentation, but this way, when I can remember it, seems simplest.

4 comments:

tofindit said...: Hi, Google search brought me here. Could you please change to table as t1, table as t2---we have only one table here. Also, this keeps the oldest of the duplicate data. If we want to keep the newest row, the condition can be changed to t2.id < t1.id (hope I'm not missing something)

Thanks for sharing!; October 9, 2008 at 9:27 PM
Mark Gullett said...: Good stuff. It avoids the use of temporary tables that many people like to use.; April 17, 2009 at 11:47 PM
ksugam said...: good post but does not work for me. I have 56k rows with 300 duplicates.
When i run this query, it keeps on processing for 70-80 mins and i finally have to shutdown my server.

Any idea why this is happening?; June 19, 2009 at 7:06 AM
Curmudgeon said...: Would be nice if you would actually explain your code for us non-experts.; October 13, 2009 at 7:07 AM

Chicks Bit Brick

Thursday, July 26, 2007

Deleting duplicate rows in mysql

4 comments:

Blog Archive

About Me