The DataSlave Name Split Function

Post by ian » Mon Jan 27, 2014 5:03 pm

A common problem is that the full name of a customer or prospect is held in a single field. Take a look at the sample list of contact names shown below:

  • Dr Ian Lawrence Manning PhD
  • Miss Jane Manning
  • Dr De La Clusa
  • John Smith
  • Ms Rosie Parks PhD
  • Ms Jane Rosie Johnson
  • Miss Van Smith MD
  • Dr Rosie Molly Van Percie
  • Prof John Smith FRSC
To use this data, for example in a marketing campaign, you need to split each name into its constituent parts.

This is required because the address at the top of a letter may start with: Dr Ian Manning

but the greeting should be: Dear Dr Manning or Dear Ian
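To make the point concrete, here is a minimal Python sketch of how the separate parts feed the address line and the greeting. The function names, the `informal` flag and the "Dear Sir or Madam" fallback are my own assumptions for illustration; they are not part of DataSlave.

```python
def address_line(title: str, first_name: str, last_name: str) -> str:
    """Build the line used at the top of a letter, skipping empty parts."""
    return " ".join(part for part in (title, first_name, last_name) if part)


def greeting(title: str, first_name: str, last_name: str, informal: bool = False) -> str:
    """Build the salutation from the split parts.

    Assumption: prefer "Dear <first name>" only when asked for and available,
    otherwise "Dear <title> <last name>", then fall back step by step.
    """
    if informal and first_name:
        return f"Dear {first_name}"
    if title and last_name:
        return f"Dear {title} {last_name}"
    if first_name:
        return f"Dear {first_name}"
    return "Dear Sir or Madam"  # hypothetical fallback when nothing usable is left


print(address_line("Dr", "Ian", "Manning"))
print(greeting("Dr", "Ian", "Manning"))
print(greeting("Dr", "Ian", "Manning", informal=True))
```

None of this is possible while "Dr Ian Lawrence Manning PhD" sits in one field, which is why the split has to come first.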

A human can split these contact names easily. The logic may seem obvious, but it is worth spending a few moments on why we humans can do this so quickly. The steps we take are as follows:

  • See if the first field is a title. We compare it with our mental list of allowed values.
  • See if the last field is a set of letters like MD, PhD, BA and so on. These are called Post Nominal Letters. We compare the field with a mental list of allowed values and also make some intelligent guesses. For example, PGeo is a valid Post Nominal Letter. You are unlikely to have known that, but you would see that it makes a strange surname and so assume it is a Post Nominal Letter.
  • Next, split out the surname, for example by recognising De La Clusa as a surname. More accurately, we recognise De La as a valid start to a multi-word surname.
  • Decide that if two names remain, they are the first name and the last name, in that order.
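The "mental list plus intelligent guess" step for Post Nominal Letters can be sketched in Python as follows. This is only an illustration: the list contents and the guessing rule (a short token with two or more capital letters) are my assumptions, not the rule DataSlave uses.

```python
# Assumed sample list; a real list would be much longer and user-maintained.
KNOWN_POST_NOMINALS = {"md", "phd", "ba", "frsc"}


def looks_like_post_nominal(token: str) -> bool:
    """Return True if the token is a known or a plausible Post Nominal Letter."""
    bare = token.replace(".", "")  # treat "Ph.D." the same as "PhD"
    if bare.lower() in KNOWN_POST_NOMINALS:
        return True
    # Guess (assumption): ordinary surnames have one capital letter, so a
    # short alphabetic token with two or more capitals is probably not one.
    capitals = sum(1 for ch in bare if ch.isupper())
    return bare.isalpha() and len(bare) <= 4 and capitals >= 2


for token in ("PhD", "PGeo", "Manning", "McKay"):
    print(token, looks_like_post_nominal(token))
```

A guess like this will misread some short mixed-case surnames, so it is best treated as a fallback behind the maintained list rather than a replacement for it.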
The DataSlave Name Split function follows the same logic used by a human when faced with this problem. The key features are:

  • Provide a list of allowed Titles, Family Name Prefixes (e.g. De La, Van, De) and Post Nominal Letters, together with a tool to maintain these lists.
  • Provide a configuration tool to specify how many name fields, and which name fields, may be present in the data and, if one or more are missing, the order of priority in which they appear.
  • For each row, extract any value matching the allowed Titles.
  • For each row, extract any value matching the allowed Post Nominal Letters.
  • For each row, extract any Family Name Prefixes.
  • Divide the remaining fields into Last Name, First Name and Second Name(s).
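As a rough illustration of the per-row steps above, here is a minimal Python sketch. It is not DataSlave's implementation: the `SplitName` type, the `split_name` function and the three small lists are hypothetical, and the field priority is fixed (last name first, then first name, then second names) rather than configurable.

```python
from dataclasses import dataclass

# Assumed sample lists; in practice these would be maintained by the user.
TITLES = {"dr", "mr", "mrs", "miss", "ms", "prof"}
POST_NOMINALS = {"phd", "md", "ba", "frsc", "pgeo"}
SURNAME_PREFIXES = {("de", "la"), ("van",), ("de",)}


@dataclass
class SplitName:
    title: str = ""
    first_name: str = ""
    second_names: str = ""
    last_name: str = ""
    post_nominals: str = ""


def _key(token: str) -> str:
    """Normalise a token for list lookups: drop full stops, ignore case."""
    return token.replace(".", "").lower()


def split_name(full_name: str) -> SplitName:
    tokens = full_name.split()
    result = SplitName()

    # Extract a leading title if it matches the allowed list.
    if tokens and _key(tokens[0]) in TITLES:
        result.title = tokens.pop(0)

    # Extract trailing Post Nominal Letters, leaving at least one token.
    letters = []
    while len(tokens) > 1 and _key(tokens[-1]) in POST_NOMINALS:
        letters.insert(0, tokens.pop())
    result.post_nominals = " ".join(letters)

    if not tokens:
        return result

    # The final token is the surname; attach the longest matching
    # Family Name Prefix found immediately before it, if any.
    surname = [tokens.pop()]
    longest = max(len(prefix) for prefix in SURNAME_PREFIXES)
    for size in range(min(longest, len(tokens)), 0, -1):
        if tuple(_key(t) for t in tokens[-size:]) in SURNAME_PREFIXES:
            surname = tokens[-size:] + surname
            del tokens[-size:]
            break
    result.last_name = " ".join(surname)

    # Whatever remains is the first name followed by any second names.
    if tokens:
        result.first_name = tokens[0]
        result.second_names = " ".join(tokens[1:])
    return result


for name in ("Dr Ian Lawrence Manning PhD", "Dr De La Clusa", "Miss Van Smith MD"):
    print(split_name(name))
```

Note that in this sketch "Miss Van Smith MD" is read as a surname of Van Smith with no first name. Whether Van is a prefix or a first name is exactly the kind of ambiguity that the configurable priority order is there to settle.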
Data import should not be a concern. Moving the data is made simple by first-class tools like DataSlave. These tools, often called ETL tools, are simple to use: they let you review the data quality, resolve issues with the data and then quickly move the data into your new system.

If you have a project under consideration please talk to us and we will help you plan the data migration (data import and data export). We have a depth of experience available for you to access.