During teaching a SQL class I was asked how to delete duplicated records from a table without having a primary key. Since they used Advantage Database Server, the answer was simple: Use the virtual column ‚ROWID‘ as primary key.
For simplicity, let’s start with a table containing only two columns.

CREATE TABLE customer( PreName CHAR(30), LastName CHAR(30));

Then add a couple of records. As you can see, Peter Parker and Homer Simpson are duplicated, and Bruce Wayne even tripled.

INSERT INTO customer VALUES
  ('Peter','Parker'),   ('Peter','Parker'),
  ('Robin','Hood'),
  ('Homer', 'Simpson'), ('Homer', 'Simpson'),
  ('Bruce','Wayne'),    ('Bruce','Wayne'),    ('Bruce','Wayne');

SELECT PreName, LastName FROM customer ORDER BY PreName, LastName;

You can get the number of duplicates using a COUNT aggregation on the two fields.

SELECT PreName, LastName, COUNT(*) FROM customer
GROUP BY PreName, LastName

Most tutorials now use a primary key column to identify the duplicates for deletion. In our case, we use the virtual ROWID. Let’s take a look at this column. ROWID returns an 18 characters string containing a Base64 encoded unique identifier of a row within a table.

SELECT ROWID, PreName, LastName FROM customer
ROWID PreName LastName
BKkxSOAAAA/aAAAAAB Peter Parker
BKkxSOAAAA/aAAAAAC Peter Parker
BKkxSOAAAA/aAAAAAD Robin Hood

The first 8 characters of the string represent the database identifier, the middle 8 characters the table identifier, and the last 8 characters the encoded physical record number.
Let’s select only DISTINCT records having the lowest ROWID.

SELECT MIN(ROWID), PreName, LastName FROM customer
GROUP BY PreName, LastName

And then delete all records, that ar not part of our selection.

DELETE FROM customer WHERE ROWID NOT IN
(SELECT MIN(ROWID) FROM customer GROUP BY PreName, LastName)

As you can see, this is quite simple. If you need more columns to identify your duplicates, just add them to the GROUP BY clause.
BTW: you can even GROUP BY a function like SOUNDEX or a self written KOELNER_PHONETIC (for German names, see here).

DELETE FROM customer WHERE ROWID NOT IN
(SELECT MIN(ROWID) FROM customer GROUP BY SOUNDEX(TRIM(COALESCE(PreName,''))+TRIM(COALESCE(LastName,''))))
Delete Duplicated Records without having a Primary Key
Markiert in:     

Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert