Regular Expressions
AKA Regex
September 8, 2014
According to Wikipedia, a regular expression (abbreviated regex or regexp, and sometimes called a rational expression) is a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. "find and replace"-like operations.
Regular expressions can be used to represent alternatives in strings, such as locating the same word spelled two different ways in a text editor. For example, the regular expression seriali[sz]e matches both "serialise" and "serialize". A "match" is the piece of text, or sequence of bytes or characters, that the regex processing software found the pattern to correspond to.
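To make the idea of a "match" concrete, here is a small sketch using Python's standard re module (the sample sentence is made up for illustration). The [sz] part is a character class that accepts exactly one character, either "s" or "z", and the match object records both the matched text and where it was found.

```python
import re

# [sz] is a character class: it matches exactly one character, "s" or "z".
pattern = re.compile(r"seriali[sz]e")

text = "British authors serialise novels; American authors serialize them."

# findall returns every matching piece of text, left to right.
print(pattern.findall(text))  # ['serialise', 'serialize']

# search returns the first match; the match object knows its text and position.
m = pattern.search(text)
print(m.group(), m.start(), m.end())  # serialise 16 25
```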
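Ruby and JavaScript both build regex support into the language itself, with regex literals such as /seriali[sz]e/, whereas other popular languages like Java, Python, and C++ provide it through standard library modules (java.util.regex, re, and <regex> respectively).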
A Common Example of a Regex
\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b is a more complex pattern that describes an email address. Because it lists only uppercase letters, it is meant to be used with case-insensitive matching turned on. You can use a pattern like this to search through a text file for email addresses, to verify whether a particular string looks like an email address, or to check that an address typed into a form field is formatted correctly and raise an error if it is not.
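Here is a minimal sketch of the two uses described above, searching a block of text and checking a single form field, using Python's re module. The sample text and the helper name field_looks_like_email are hypothetical; the pattern is compiled with re.IGNORECASE so that lowercase addresses match the [A-Z] classes.

```python
import re

# The email pattern from the text. It lists only uppercase letters,
# so it is compiled with re.IGNORECASE to accept lowercase addresses too.
EMAIL = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)

text = "Contact jane.doe@example.com or sales@shop.co.uk for details."

# Search through a block of text for every address-like substring.
found = EMAIL.findall(text)
print(found)  # ['jane.doe@example.com', 'sales@shop.co.uk']

# Hypothetical form-field check: the whole value must match, not just part of it.
def field_looks_like_email(value: str) -> bool:
    return EMAIL.fullmatch(value) is not None

print(field_looks_like_email("jane.doe@example.com"))  # True
print(field_looks_like_email("jane.doe@example"))      # False
```

Using fullmatch for the form field matters: search would accept a value like "junk jane.doe@example.com junk" because part of it matches.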
Breaking Down the Email Regex
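Although regex syntax can be intimidating, the site Regexbuddy.com gives a brilliant breakdown of what the various parts of a regular expression mean. Take our email example. The breakdown below is for a slight variant of it: the first part uses the possessive quantifier ++ instead of +, and the top-level domain allows 2 to 6 letters instead of 2 to 4. The following breakdown describes what each piece of the regex is doing: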
\b[A-Z0-9._%+-]++@[A-Z0-9.-]+\.[A-Z]{2,6}\b
\b : Assert position at a word boundary
[A-Z0-9._%+-] : Match a single character out of the list: one of the characters "._%+-", or in the range between A and Z, or in the range between 0 and 9
++ : Between one and unlimited times, as many times as possible, without giving back (possessive)
@ : Match the character "@" literally
[A-Z0-9.-] : Match a single character out of the list: one of the characters ".-", or in the range between A and Z, or in the range between 0 and 9
+ : Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\. : Match the character "." literally
[A-Z] : Match a single character in the range between A and Z
{2,6} : Between 2 and 6 times, as many times as possible, giving back as needed (greedy)
\b : Assert position at a word boundary
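The same breakdown can be written directly into the pattern using Python's re.VERBOSE flag, which ignores whitespace and "#" comments outside character classes. This is a sketch under one stated assumption: it uses a plain greedy + where the breakdown has the possessive ++, because Python's re module only accepts possessive quantifiers from Python 3.11 on. For this pattern the two give the same matches, since the character class before "@" cannot itself contain "@". The second pattern shows how the {2,4} and {2,6} variants differ on a six-letter domain.

```python
import re

# One line per piece of the breakdown; comments after "#" are ignored.
# Assumption: greedy "+" instead of possessive "++" (Python < 3.11 compatibility).
EMAIL_VERBOSE = re.compile(
    r"""
    \b               # assert position at a word boundary
    [A-Z0-9._%+-]+   # one or more of A-Z, 0-9, or one of . _ % + -
    @                # the character "@" literally
    [A-Z0-9.-]+      # one or more of A-Z, 0-9, "." or "-" (greedy)
    \.               # the character "." literally
    [A-Z]{2,6}       # between 2 and 6 letters (greedy)
    \b               # assert position at a word boundary
    """,
    re.VERBOSE | re.IGNORECASE,
)

# The 2-to-4-letter version from the first example, for comparison.
EMAIL_SHORT_TLD = re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)

print(bool(EMAIL_VERBOSE.fullmatch("info@example.museum")))  # True
print(EMAIL_SHORT_TLD.search("info@example.museum"))         # None
print(EMAIL_VERBOSE.search("info@example.technology"))       # None (10 letters)
```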
Why use regex?
Using regex can reduce your development time and keep your code short. A single line of code can check whether a string looks like an email address, versus the many lines a hand-written, character-by-character check can take. Regex is not automatically faster at run time, though; speed depends on the pattern and the regex engine.
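To make the one-line-versus-many-lines comparison concrete, here is a sketch of the same email check written both ways. Both function names are hypothetical, and the regex side uses the {2,6} variant with re.IGNORECASE plus re.ASCII, so that [A-Z] covers only the ASCII letters, exactly like the hand-written version. The two functions are meant to accept the same strings.

```python
import re
import string

# The regex version: one line of logic.
def looks_like_email_regex(text: str) -> bool:
    return re.fullmatch(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}", text, re.IGNORECASE | re.ASCII) is not None

# Character sets mirroring the regex's character classes.
LOCAL_CHARS = set(string.ascii_letters + string.digits + "._%+-")
HOST_CHARS = set(string.ascii_letters + string.digits + ".-")

# The hand-written version of the same rules.
def looks_like_email_manual(text: str) -> bool:
    # Exactly one "@", since neither side of the pattern allows "@".
    if text.count("@") != 1:
        return False
    local, domain = text.split("@")
    if not local or not all(c in LOCAL_CHARS for c in local):
        return False
    # The top-level domain has no ".", so it is whatever follows the last ".".
    if "." not in domain:
        return False
    host, tld = domain.rsplit(".", 1)
    if not host or not all(c in HOST_CHARS for c in host):
        return False
    return 2 <= len(tld) <= 6 and all(c in string.ascii_letters for c in tld)

for s in ["jane.doe@example.com", "not-an-email", "a@b.c", "user+tag@mail.example.co.uk"]:
    print(s, looks_like_email_regex(s), looks_like_email_manual(s))
```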
Regexes are ideal for searching, text processing, and validating data. For example, if you have a search box where users type their queries, you would want to account for common misspellings. For a word such as separate, the single regex s[ae]p[ae]r[ae]te covers the correct spelling and its common misspellings, such as "seperate".
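A short sketch of that search-box idea in Python follows. It is an illustration under two assumptions not in the original pattern: \b word boundaries are added so that a longer word like "separately" is not counted, and re.IGNORECASE is used so that "Separate" also matches. Since each [ae] accepts either letter, the pattern covers 2 × 2 × 2 = 8 spellings, which the last lines check.

```python
import re
from itertools import product

# Each [ae] accepts either letter; \b keeps the match to whole words (added assumption).
SEPARATE = re.compile(r"\bs[ae]p[ae]r[ae]te\b", re.IGNORECASE)

query = "Keep the seperate files apart from the Separate folder and the separately zipped copy."
print(SEPARATE.findall(query))  # ['seperate', 'Separate']

# Generate every spelling the pattern allows: 2 choices in 3 positions.
spellings = ["s{}p{}r{}te".format(*letters) for letters in product("ae", repeat=3)]
print(len(spellings))                                 # 8
print(all(SEPARATE.fullmatch(s) for s in spellings))  # True
```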
References:
http://en.wikipedia.org/wiki/Regular_expression
http://www.regular-expressions.info/tutorial.html
http://www.regexbuddy.com/regex.html