MySQL - Find rows that have 3 out of 5 fields in common - how to speed up query?
The query below works great but is slow to execute: it takes around 30 seconds on a table of approximately 7,500 rows. How can I speed it up?
The goal is to find "almost duplicate" rows in the same table: when 3 out of 5 fields match, we have a hit.
    SELECT originalTable.id, originalTable.lastname, originalTable.firstname,
           originalTable.address, originalTable.city, originalTable.email
    FROM address AS originalTable, address AS compareTable
    WHERE
      # Not the same record
      originalTable.id != compareTable.id AND
      # At least 3 of these 5 should match
      (originalTable.firstname = compareTable.firstname) +
      (originalTable.lastname = compareTable.lastname) +
      (originalTable.address = compareTable.address AND originalTable.address != '') +
      (originalTable.city = compareTable.city AND originalTable.city != '') +
      (originalTable.email = compareTable.email AND originalTable.email != '') >= 3
    GROUP BY originalTable.id
    ORDER BY originalTable.lastname ASC, originalTable.firstname ASC, originalTable.city ASC

Thanks for any optimization tips.
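The match count in the query relies on MySQL evaluating each comparison to 1 (true) or 0 (false), so the five parenthesized comparisons can simply be added up. A minimal sketch of that behavior, using made-up literal values rather than real table data:

```sql
-- Each parenthesized comparison yields 1 or 0; the sum is the number of matches.
-- The literals here are hypothetical sample values.
SELECT ('Ann' = 'Ann')
     + ('Lee' = 'Li')
     + ('x@example.com' = 'x@example.com') AS matches;
-- matches = 2

-- A comparison against NULL yields NULL, and NULL plus anything is NULL,
-- so a row with a NULL field never satisfies ">= 3".
SELECT (NULL = 'x') + 1 + 1 + 1 AS matches;
-- matches = NULL
```

If any of the five columns can hold NULL, it may be worth wrapping them in COALESCE(column, '') so one missing field does not discard the other matches.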
A Cartesian product is needed here, so I came up with the following solution:
    CREATE TABLE address_dups (INDEX (is_duplicate)) ENGINE = MEMORY
    SELECT
      # Alias both ids so the new table gets distinct column names
      originalTable.id AS original_id,
      compareTable.id AS duplicate_id,
      ((originalTable.firstname = compareTable.firstname) +
       (originalTable.lastname = compareTable.lastname) +
       (originalTable.address = compareTable.address AND originalTable.address != '') +
       (originalTable.city = compareTable.city AND originalTable.city != '') +
       (originalTable.email = compareTable.email AND originalTable.email != '') >= 3) AS is_duplicate
    FROM address AS originalTable, address AS compareTable
    WHERE originalTable.id != compareTable.id;

    SELECT * FROM address_dups WHERE is_duplicate = 1;

This also gives you each row's ID together with the ID of its fuzzy duplicate.
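To see which pairs the self-join flags, here is a small worked example. It is only a sketch: the table name address and its five fields come from the question, but the column types, the short aliases o and c, and the three sample rows are assumptions made up for illustration.

```sql
-- Hypothetical schema; only the column names are taken from the question.
CREATE TABLE address (
  id        INT PRIMARY KEY,
  firstname VARCHAR(50)  NOT NULL DEFAULT '',
  lastname  VARCHAR(50)  NOT NULL DEFAULT '',
  address   VARCHAR(100) NOT NULL DEFAULT '',
  city      VARCHAR(50)  NOT NULL DEFAULT '',
  email     VARCHAR(100) NOT NULL DEFAULT ''
);

-- Made-up rows: 1 and 2 share lastname, address and city; 3 matches nothing.
INSERT INTO address VALUES
  (1, 'John', 'Smith', '1 Main St', 'Springfield', 'john@example.com'),
  (2, 'Jon',  'Smith', '1 Main St', 'Springfield', ''),
  (3, 'Mary', 'Jones', '',          '',            'mary@example.com');

-- Pair every row with every other row and keep pairs with at least 3 matches.
SELECT o.id AS original_id, c.id AS duplicate_id
FROM address AS o, address AS c
WHERE o.id != c.id
  AND (o.firstname = c.firstname)
    + (o.lastname = c.lastname)
    + (o.address = c.address AND o.address != '')
    + (o.city = c.city AND o.city != '')
    + (o.email = c.email AND o.email != '') >= 3;
-- Returns the pairs (1, 2) and (2, 1); row 3 does not appear.
```

Each near-duplicate pair shows up twice, once in each direction. Row 3 and row 2 both have empty fields, but the != '' guards keep empty strings from counting as matches.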