Posted in June 2012

When dealing with Strings…

Strings are general, and strings are tricky. Regular expressions are a useful tool for string manipulation, but they also create problems of their own. Writing your own rules requires extreme care. Below are some notes I gathered from dealing with strings (specifically people data: names, addresses, etc.) at large scale.

1. Have you considered punctuation?

Apostrophes appear in people’s last names, street names, and many other proper names. So do dashes, forward slashes, and backslashes. Does your RegEx match these? Should your output normalize them?
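To make the two questions concrete, here is a minimal Python sketch. It is only one possible policy, not a complete name validator: the names NAIVE_RE, NAME_RE, normalize_punct and is_name are made up for this illustration, and mapping every apostrophe and dash variant to a plain ASCII character is an assumption you may not want for your data. Slashes, which turn up more in addresses, are left out here.

```python
import re

# Assumed normalization policy: fold typographic variants into ASCII.
_PUNCT_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (the "curly" apostrophe)
    "\u2010": "-",  # Unicode hyphen
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})

# A naive rule that only allows ASCII letters.
NAIVE_RE = re.compile(r"[A-Za-z]+")

# Runs of letters, optionally joined by a single apostrophe or hyphen.
# [^\W\d_] means "a word character that is not a digit or underscore",
# which in Python 3 covers non-ASCII letters as well.
NAME_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def normalize_punct(text: str) -> str:
    """Replace apostrophe and dash variants with their ASCII forms."""
    return text.translate(_PUNCT_MAP)


def is_name(text: str) -> bool:
    """Return True if the normalized text looks like a single name token."""
    return NAME_RE.fullmatch(normalize_punct(text)) is not None


if __name__ == "__main__":
    for candidate in ["Smith", "O\u2019Brien", "Smith-Jones", "Smith--Jones"]:
        naive = NAIVE_RE.fullmatch(candidate) is not None
        print(candidate, "naive:", naive, "punctuation-aware:", is_name(candidate))
```

The naive pattern rejects O’Brien and Smith-Jones outright, while the punctuation-aware one accepts them and still rejects a doubled dash. Whether the stored value should keep the original character or the normalized one is a separate decision that depends on what the output is used for.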