SQL Remove Duplicate Records Directory
Directory .ZIPThis section contains two example SQL source members suitable for use with RUNSQLSTM. These were developed at the emergency request via e-mail of someone I've never met and I thought others might find them useful... even if they don't fit well with the overall intent of this web site.
The first SQL source member (FINDDUPES) builds four views over an existing table (or DDS physical file). The second SQL source member (KILLDUPES) uses those views to delete duplicate records from the original table (or DDS physical file). Both members use example library, file and field names that must be changed in order to work with actual data.
SQL such as these might be used because you have a file that you are trying to create a UNIQUE logical file over and the CRTLF fails because duplicate keys are found; so, you need to locate and delete the duplicates. You might also have added records by accident or faulty program logic and they need to be taken back out without having a journal or backup available. Running FINDDUPES will give you basic views needed for locating duplicate (or triplicate, etc.) records. Of course, you can build additional views based on these views and/or the original table if you simply need to display or print more info than these views allow. Eventually, you can run KILLDUPES and get rid of the duplicates.
The two members could be consolidated into one, but I believe it's certainly better to keep them separate. This way, FINDDUPES can be run, modified and run again as many times as necessary to get you comfortable that the correct records will be accessed. Further, to handle record keys that are triplicated or worse, KILLDUPES can be run as many times as needed to catch them all without needing to run the base CREATE VIEWs over and over. Once the base views exist, DB2/400 will update them automatically as deletions occur, leaving only rows that are still duplicated (or worse).
The SQL relies entirely on the RRN() function returning the physical record numbers of each record. Assuming such a function exists, these statements should work pretty much the same for any SQL database, not just DB2/400. However, this has ONLY been tested on DB2/400 V4R3. If you take it anywhere else, I'd like to know the result. Please let me know what happens and I'll try to keep this document updated with successes or failures.
Please read the @GENERAL DISCLAIMER document for any items you should be aware of if you download any of these items.
SQLs
FINDDUPES -- Find Duplicate Key Records:
This SQL source member contains a series of DROP VIEW and CREATE VIEW statements. The created views build over a table (or DDS physical file) and each other to expose the physical record numbers (RRN()) of the records in the table.After GROUPing selected key fields, we get a COUNT() of the records within each group. Any group HAVING COUNT(*)>1 is a group indicating duplicate records. This provides us with the key fields we're interested in. By using this GROUPing view along with the original table, we build a second view that includes the selected key fields and the associated RRN()s for each record in a group of duplicates. By using the MAX() function in a third view over the record number column of the second view, we find which record number is highest within a group of duplicated records -- assuming arrival-sequence these will be the last records added to the groups. (By using MIN(), we'd get the first or original records.) The fourth view simply presents the MAX(RRN()) column as an actual column, making it easy to access in the DELETE statement in KILLDUPES. Records are deleted from this fourth view, but SQL knows to actually delete the records in the underlying table.
The provided source uses the example library name MYLIB. The assumed original table name is MyFile. The source assumes three selected key fields in the original table: OrgKFld1, OrgKFld2 and OrgKFld3. If less than three key fields exist, then use only as many as are needed to provide uniqueness; likewise, if more than three are needed to determine the UNIQUE level you want, add as many as you need.
The DROP VIEWs are only useful when FINDDUPES is run more than once. The first time FINDDUPES runs, the DROP VIEWs will fail resulting in a RUNSQLSTM ERRLVL(30) error. Therefore, use ERRLVL(30) when FINDDUPES is run at least until the views are all created. Also be careful with the RUNSQLSTM COMMIT() parameter. If you are working with a DDS physical file that isn't journalled, you might use COMMIT(*NONE). Other COMMIT() values depend on your environment and it's up to you to decide what's appropriate.
Naturally, your best course is to create a duplicate of your original table and run these statements over the duplicate. If the result satisfies you, either point the statements at your original table and run them again or copy the remaining records from the duplicate table back into the original.
KILLDUPES -- Delete located Duplicate Key Records:
This SQL source member DELETEs rows from the final view created by FINDDUPES. Because MAX(RRN()) is exposed in this view as a column rather than a function, the view is perfectly acceptable for DELETE. Also, the column makes it simple to use the MAX(RRN()) value as the basis for the subselect that identifies which record numbers we're interested in.KILLDUPES can be run as many times as necessary to delete multiple occurrences of selected keys, not just duplicates. When it reports that zero rows were deleted, no duplicates of the original selected keys exist. The created views can also be used to determine if duplicates still exist; e.g., when view DUPV contains no rows, there are no duplicates of the selected keys.