
Düber Instructions

Düber uses a wizard to help you quickly deduplicate your list. To start the wizard, click a cell in your list and then click Dedupe on the Data menu.

There are four simple steps, described in the sections below.

After Step 4, Düber copies the results onto a new worksheet and leaves your original list unchanged. For example, the results of deduplicating a list on worksheet 'Sheet1' are copied to 'Sheet1 results'.

If you are removing duplicates and want to keep the discarded records, these are also copied to a new worksheet, for example 'Sheet1 discards'.

1-Click Deduplication

Düber remembers your deduplication settings for each unique list you process. This means that if you deduplicate the same data repeatedly, you can simply click Finish: the wizard closes and processing begins using your remembered settings. This is called 1-Click Deduplication.

Step 1

Start by selecting the fields that make up the unique key in your list. You must select at least one field.

To select or clear a field, click it. You can also use the buttons below the list.

Records have a header row

Select this option if your list has a header row; otherwise, clear it.

When you have defined the unique key, click Next to continue to Step 2. If you are happy with the current settings, click Finish to exit the wizard and start deduplicating.

Step 2

Select the matching options and the operation.

Matching options determine how records will be compared to see if they are duplicates. With no matching options selected an exact match will be performed. An exact match means that the unique key values of two records must be identical to be flagged as a duplicate.
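To illustrate exact matching on a unique key, the Python sketch below groups records by the tuple of their key-field values. It is not Düber's implementation: the record layout (a list of dicts) and the function name `group_exact` are hypothetical.

```python
from collections import defaultdict


def group_exact(records, key_fields):
    """Group records whose unique-key values are identical.

    Returns (unique, duplicate_sets): records whose key occurs once,
    and lists of records sharing a key, each in original list order.
    """
    groups = defaultdict(list)  # key tuple -> records with that key
    for record in records:
        key = tuple(record[field] for field in key_fields)
        groups[key].append(record)
    unique = [g[0] for g in groups.values() if len(g) == 1]
    duplicate_sets = [g for g in groups.values() if len(g) > 1]
    return unique, duplicate_sets


records = [
    {"name": "Duber", "city": "London"},
    {"name": "duber", "city": "London"},
    {"name": "Duber", "city": "London"},
]
unique, dupes = group_exact(records, ["name", "city"])
print(len(unique), len(dupes))  # 1 1
```

Because no matching options are applied, 'duber' is not treated as a duplicate of 'Duber'.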

To match records that are not exactly identical, choose one or more of the matching options. This is called 'fuzzy matching'.

Ignore Case

Upper and lower case letters are treated as identical. 'DUBER' would match 'duber' or 'Duber'.

Trim Whitespace

Removes leading and trailing spaces around each field. ' Duber ' would match 'Duber'.

Remove accents from characters

Treats accented characters the same as non-accented. 'Düber' would match 'Duber'.
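One common way to remove accents, shown here as a Python sketch rather than Düber's own method, is to decompose each character with Unicode NFKD normalization and drop the combining marks. The function name `remove_accents` is hypothetical. Characters with no decomposition, such as 'ø' or 'ß', are left unchanged by this approach.

```python
import unicodedata


def remove_accents(text):
    """Decompose characters (NFKD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_accents("Düber"))  # Duber
```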

Ignore characters that aren't letters or digits

Compares only the letters and digits in each field. 'duber-help@lazyslug.com' would match 'duberhelplazyslugcom'.
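A minimal Python sketch of this option, assuming "letters or digits" means what Python's Unicode-aware `str.isalnum` accepts (Düber's definition may differ). The function name `letters_and_digits` is hypothetical.

```python
def letters_and_digits(text):
    """Keep only the characters that are letters or digits."""
    return "".join(ch for ch in text if ch.isalnum())


a = letters_and_digits("duber-help@lazyslug.com")
print(a)  # duberhelplazyslugcom
print(a == letters_and_digits("duberhelplazyslugcom"))  # True
```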

These four options can be used in any combination without affecting performance. Matching with these options is known as 'fast fuzzy matching'.
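One plausible reason these options cost so little, assumed here rather than confirmed for Düber, is that each can be applied once per field value to build a normalized key, after which records are still grouped by exact comparison of those keys in a hash table. The Python sketch below takes that approach; `normalize` and `group_fast_fuzzy` are hypothetical names.

```python
import unicodedata
from collections import defaultdict


def normalize(value, ignore_case=False, trim=False,
              remove_accents=False, alnum_only=False):
    """Apply the selected fast fuzzy matching options to one field value."""
    if trim:
        value = value.strip(" ")  # leading and trailing spaces only
    if remove_accents:
        value = "".join(ch for ch in unicodedata.normalize("NFKD", value)
                        if not unicodedata.combining(ch))
    if alnum_only:
        value = "".join(ch for ch in value if ch.isalnum())
    if ignore_case:
        value = value.casefold()
    return value


def group_fast_fuzzy(records, key_fields, **options):
    """Return duplicate sets: records whose normalized keys are identical."""
    groups = defaultdict(list)  # normalized key tuple -> records
    for record in records:
        key = tuple(normalize(record[f], **options) for f in key_fields)
        groups[key].append(record)
    return [g for g in groups.values() if len(g) > 1]


records = [{"name": " Düber "}, {"name": "DUBER"}, {"name": "Doober"}]
sets = group_fast_fuzzy(records, ["name"], ignore_case=True, trim=True,
                        remove_accents=True)
print(sets)  # [[{'name': ' Düber '}, {'name': 'DUBER'}]]
```

Each value is normalized once and grouping is a single pass, so the work grows roughly linearly with the size of the list. 'Doober' is not caught by these options; that needs full fuzzy matching.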

Use full fuzzy matching

Turns on full fuzzy matching. Fuzzy matching identifies records that are close to one another but not exactly the same; for example: 'Duber' and 'Doober'. This is a powerful feature that allows you to remove duplicates in data that contains typos, spelling mistakes and so on.

Use the slider to set how similar two records must be before they are considered a match.
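Düber does not document how it scores similarity. As a stand-in, the Python sketch below uses the standard library's `difflib.SequenceMatcher.ratio`, scaled to 0-100, and treats two values as a match when the score reaches the slider threshold. The functions `similarity` and `is_fuzzy_match` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score (difflib's ratio, as a stand-in)."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def is_fuzzy_match(a, b, threshold=75):
    """True if the two values are at least `threshold` percent similar."""
    return similarity(a, b) >= threshold


print(round(similarity("Duber", "Doober")))          # 73
print(is_fuzzy_match("Duber", "Doober"))             # False
print(is_fuzzy_match("Duber", "Doober", threshold=70))  # True
```

With this stand-in measure, 'Duber' and 'Doober' score about 73%, so they would match only with the slider at 73% or below; Düber's own measure may score the pair differently.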

Single field threshold

If your unique key has more than one field, you can also require each field compared between two records to meet a threshold of its own. For example, you might set the overall fuzzy match threshold to 75% but require each individual field to be at least a 50% match.
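The Python sketch below shows why a single field threshold can matter. It assumes, without confirmation from Düber, that the overall score is the mean of the per-field scores, and it reuses `difflib` as a stand-in similarity measure. `field_score` and `records_match` are hypothetical names.

```python
from difflib import SequenceMatcher


def field_score(a, b):
    """0-100 similarity for one field (difflib ratio as a stand-in)."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def records_match(r1, r2, key_fields, overall=75, single_field=50):
    """Match if every field passes on its own and the mean passes overall.

    Assumption: the overall score is the mean of the per-field scores.
    """
    scores = [field_score(r1[f], r2[f]) for f in key_fields]
    if min(scores) < single_field:  # one badly mismatched field fails the pair
        return False
    return sum(scores) / len(scores) >= overall


fields = ["first", "last", "city", "country"]
r1 = {"first": "Ann", "last": "Duber", "city": "London", "country": "UK"}
r2 = {"first": "Ann", "last": "Duber", "city": "Paris", "country": "UK"}
print(records_match(r1, r2, fields))                  # False
print(records_match(r1, r2, fields, single_field=0))  # True
```

Here three fields match exactly and the city does not match at all, so the mean is exactly 75%. Without the single field threshold the records would be flagged as duplicates.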

Operation

The results of matching are a set of unique records and potentially many sets of duplicate records. You must specify an operation to perform on the duplicate sets.

Remove duplicates

Removes duplicate records from each set. By default, the first record in each duplicate set is retained and the others are discarded.
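A minimal Python sketch of the default Remove duplicates behaviour, assuming each duplicate set is given as a list of positions in the original list. The function name `remove_duplicates` is hypothetical; the `discards` list corresponds to what Save discarded records would copy to a separate worksheet.

```python
def remove_duplicates(records, duplicate_sets):
    """Keep the first record of each duplicate set and discard the rest.

    duplicate_sets holds lists of indices into records, in list order.
    Returns (results, discards), both in the original list order.
    """
    discard_idx = {i for dup_set in duplicate_sets for i in dup_set[1:]}
    results = [r for i, r in enumerate(records) if i not in discard_idx]
    discards = [r for i, r in enumerate(records) if i in discard_idx]
    return results, discards


records = ["Duber", "Doober", "Lazyslug", "duber"]
results, discards = remove_duplicates(records, [[0, 1, 3]])
print(results)   # ['Duber', 'Lazyslug']
print(discards)  # ['Doober', 'duber']
```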

Save discarded records

Records that are discarded as duplicates are copied onto a separate worksheet.

Fix duplicates

Keeps all of the records and copies the unique key values from the master record in each duplicate set to the other records in that set. The default master record is the first one in each duplicate set.
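The Python sketch below illustrates Fix duplicates under the default master choice: every record is kept, and only the unique key fields of the non-master records are overwritten. The dict-based record layout and the function name `fix_duplicates` are hypothetical.

```python
def fix_duplicates(records, duplicate_sets, key_fields):
    """Copy the master's key values to every other record in its set.

    The master is the first index in each set. All records are kept;
    a new list is returned and the input is left unchanged.
    """
    fixed = [dict(r) for r in records]  # shallow copies of each record
    for dup_set in duplicate_sets:
        master = fixed[dup_set[0]]
        for i in dup_set[1:]:
            for field in key_fields:
                fixed[i][field] = master[field]
    return fixed


records = [
    {"name": "Duber", "phone": "123"},
    {"name": "Doober", "phone": "456"},
]
print(fix_duplicates(records, [[0, 1]], ["name"]))
# [{'name': 'Duber', 'phone': '123'}, {'name': 'Duber', 'phone': '456'}]
```

Fields outside the unique key, such as `phone` here, keep their original values.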

Click Next to continue or click Finish to deduplicate using the current settings. You can also click Back to change the unique key.

Step 3

Automatic result selection

Controls how Düber selects a record to be retained in each duplicate set (if the operation is Remove Duplicates) or to be made the group master (if the operation is Fix Duplicates).

By default, the first record in each duplicate set is chosen. You can override this default using the options in this step.

Result options

You can set the following additional options.

When no duplicates found

This determines what happens if no duplicate records are found in your list.

Result sheet column widths

This allows you to control the width of the columns on the results and discards sheets.

Create audit sheet

An audit sheet will be created showing you which records were duplicates. The audit for deduplicating a list on 'Sheet1' will be on a worksheet called 'Sheet1 audit'.

Confirmation

These options determine when you will be prompted to confirm an action.

Ask for confirmation if processing cancelled

Fuzzy matching large lists can take some time. If this option is enabled and you click Cancel, you will be asked to confirm before processing stops.

Ask to continue with partial results if fuzzy match cancelled

If you cancel during fuzzy matching, you will be given the option to continue to the next step with the duplicates found so far.

Ask for confirmation before overwriting results

If a result sheet already exists then you will be asked for confirmation before overwriting it with new results.

Click Next to continue or click Finish to exit the wizard and deduplicate with the current settings. You can also click Back to change other settings.

Step 4 (remove duplicates)

Review and amend the results.

Your list has been deduplicated using your chosen settings and a number of duplicate sets have been found. A duplicate set contains records that were judged to be duplicates. This dialog lets you review and amend the results.

Automatic record selection

These buttons let you select records in every duplicate set. Records that are selected are kept and those that are not are discarded.

Display

Select which fields of each record are to be displayed.

Duplicate Record Set

Displays a list of records in a duplicate set. You can switch between duplicate sets by using the scroll bar.

Change which records are selected by clicking them or by using the Select All, Clear All and Invert buttons.

Click Finish to save the results or Back to change the settings.

Click Options to change how the records are displayed.

Step 4 (fix duplicates)

Review and amend the results.

Your list has been deduplicated using your chosen settings and a number of duplicate sets have been found. A duplicate set contains records that were judged to be duplicates. This dialog lets you review and amend the results.

Automatic record selection

These buttons let you select the master record in every duplicate set. All records are kept; the master's unique key values are copied to the other records in its set.

Display

Select which fields of each record are to be displayed.

Duplicate Record Set

Displays a list of records in a duplicate set. You can switch between duplicate sets by using the scroll bar.

Change the master record by selecting it and then checking the 'Is group master?' checkbox. Master records are indicated with an asterisk (*).

If you decide that this duplicate set contains more than one group of records, you can split the set into multiple groups. To move a record to a new group, select it and then use the group drop-down.

Click Finish to save the results or Back to change the settings.

Click Options to change how the records are displayed.

Display Options

Result display options.

Display custom fields

Select the fields you want in the duplicate set display when 'Custom fields' is selected.

Show match percentage

Add a match percentage column to the duplicate set display. This is selected by default if fuzzy matches were made.

Show record number

Show the position where each record was located in the original list. Note that this is not the worksheet row number: the first record in your list will be 1, the second 2 and so on.
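For a list that starts in row 1 with a header row, record 1 sits in worksheet row 2. The hypothetical helper below converts a record number to a worksheet row; `first_row` and `has_header` are assumptions about where your list sits, not Düber settings.

```python
def record_number_to_row(record_number, first_row=1, has_header=True):
    """Worksheet row holding a record, given where the list starts."""
    first_record_row = first_row + (1 if has_header else 0)
    return first_record_row + record_number - 1


print(record_number_to_row(1))                    # 2
print(record_number_to_row(1, has_header=False))  # 1
print(record_number_to_row(3, first_row=5))       # 8
```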

Only show sets with fuzzy matches

Choose whether to display all duplicate sets or only those containing fuzzy matches. This is selected by default if fuzzy matches were made.