Deduping part II

I am back working on data selections and improving the health of the mailing list again today. I'm finding relatively few duplicates, and the ones that I do find broadly come down to one of four causes:

  • Two or more people who genuinely share the same address (can be either home or place of work)
  • Spelling mistakes in the name and/or address
  • Differences in the name provided, e.g. using different initials, or giving a middle name in one record and not the other
  • Differences in the address provided (or its layout), for example giving a house name in one record and a house number in another
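
As a rough illustration of how spelling and layout differences like these can be caught, here is a minimal Python sketch. It is not the code I actually use: the field names ("name", "address") and the 0.85 threshold are assumptions made for the example, and the comparison relies only on the standard library's difflib.

```python
import re
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Return a 0..1 similarity ratio between two normalised strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def likely_duplicates(rec_a, rec_b, threshold=0.85):
    """Flag two records as probable duplicates when both the name and the
    address are close matches. The threshold is an assumed starting point
    and would need tuning against real data."""
    name_score = similarity(rec_a["name"], rec_b["name"])
    addr_score = similarity(rec_a["address"], rec_b["address"])
    return name_score >= threshold and addr_score >= threshold
```

A check like this copes with small spelling mistakes and punctuation, but it will not spot a house name versus a house number, and it cannot tell two genuinely different people at one address from one person recorded twice, so anything it flags is still only a candidate.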

It's a tricky one: where we have two different customers at the same workplace we want to send two mailing pieces, but on the other hand we want to save money by not sending more than one piece to family members at the same address. Other data, like phone numbers and email addresses, does give some clues as to whether two records are the same customer or members of the same family group, and I'm using this to help suppress records and choose just the best record to mail.
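
The decision described above can be sketched roughly as follows. This is a simplified illustration rather than my real logic: the "address_type" field, the other field names, and the idea of "best record" meaning "most fields filled in" are all assumptions for the example.

```python
def digits_only(phone):
    """Strip spaces and punctuation so phone numbers compare consistently."""
    return "".join(ch for ch in (phone or "") if ch.isdigit())


def clean_email(email):
    return (email or "").strip().lower()


def shares_contact(a, b):
    """True when two records share a non-empty email address or phone number."""
    email_a, email_b = clean_email(a.get("email")), clean_email(b.get("email"))
    phone_a, phone_b = digits_only(a.get("phone")), digits_only(b.get("phone"))
    same_email = bool(email_a) and email_a == email_b
    same_phone = bool(phone_a) and phone_a == phone_b
    return same_email or same_phone


def completeness(rec):
    """Count populated fields; used here as a stand-in for 'best record'."""
    return sum(1 for f in ("name", "address", "email", "phone") if rec.get(f))


def records_to_mail(a, b):
    """Decide what to mail for two records already known to share an address.

    Workplace with no shared contact details: treat as two customers, mail both.
    Otherwise (same household, or probably the same person): mail one record.
    """
    if a.get("address_type") == "work" and not shares_contact(a, b):
        return [a, b]
    return [max((a, b), key=completeness)]
```

In practice the record that is not chosen would be suppressed rather than deleted, so it can still be reviewed or reinstated later.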

I will hand over an output of the data to our customer services team so they can manually check some of the records and correct them where possible, particularly where the difference is likely to be a spelling mistake, or the same customer with the data in a different format.

Not the most exciting work, but it's worth it if it saves money off the bottom line 🙂 and the coding and logic are a bit of a challenge.