Duncan’s blog

September 11, 2012

Regex to find 6 repeating characters

I wanted to check our codebase for where we had six repeating characters in HTML and CSS for colours, e.g. #FFFFFF for white.  Here’s the regular expression for it, basically blogged here so I can remember where to find it later:

([a-fA-F0-9])\1{5}

So [a-fA-F0-9] looks for a single character that is one of the letters A – F (only need to go up to F because we’re dealing with hexadecimal numbers), or the digits 0 – 9.

The ( ) parentheses around that turn it into a capturing group.

The \1 is then a backreference to whatever that group matched, and the {5} says to match it exactly 5 more times.

So it’ll find a single character that is then repeated five more times, i.e. six identical characters in a row.
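As a quick way to try the pattern outside an editor, here’s a minimal sketch using Python’s re module (my choice for illustration, and the sample CSS string is made up). Note the backreference is case sensitive, so something like FFffFF wouldn’t match.

```python
import re

# Pattern from the post: one hex digit, then the same character five more times.
SIX_REPEATS = re.compile(r"([a-fA-F0-9])\1{5}")

# Hypothetical sample CSS, invented for this illustration.
sample_css = "a { color: #FFFFFF; background: #000000; border-color: #FF0000; }"

# Collect the full text of every match (group 0 is the whole match).
matches = [m.group(0) for m in SIX_REPEATS.finditer(sample_css)]
print(matches)  # ['FFFFFF', '000000']
```

#FF0000 is not picked up here because it never has six identical characters in a row.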

What I was doing this for was to replace all six character codes (that repeat) for colours with three character codes (because #FFF is equivalent to #FFFFFF). So to then replace it, I used the following (in Eclipse):

Find: ([a-fA-F0-9])\1{5}
Replace with: \1\1\1
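
The same find and replace can be sketched in Python, assuming you’d rather script it than do it by hand in an editor. The function name shorten_repeated_hex is just something I made up for the example.

```python
import re

# Same pattern as the Find box above.
SIX_REPEATS = re.compile(r"([a-fA-F0-9])\1{5}")


def shorten_repeated_hex(text: str) -> str:
    """Replace six identical hex digits with three of the same digit."""
    return SIX_REPEATS.sub(r"\1\1\1", text)


print(shorten_repeated_hex("color: #FFFFFF; background: #000000;"))
# prints: color: #FFF; background: #000;
```

The pattern doesn’t look for the leading #, so it would also shorten any other run of six identical hex digits it comes across. It’s worth eyeballing the matches before replacing everything.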

We can also modify the regular expression so it catches colours like #FF0000 or #33FF99, where we have three pairs of identical digits. These can also be shortened, to #F00 or #3F9.

This modified regular expression covers those too. In this case we’re looking three times for a single character that is immediately repeated once, with each backreference pointing at its own group (\1, \2, \3). We then replace with just the three different single characters, ignoring the repeats:

Find: ([a-fA-F0-9])\1([a-fA-F0-9])\2([a-fA-F0-9])\3
Replace with: \1\2\3
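
Here’s a small Python sketch of the three-pairs version, where each pair refers back to its own group (\1, \2 and \3). The function name and the sample colours are only there for illustration.

```python
import re

# Three pairs of identical hex digits; each backreference points at its own group.
THREE_PAIRS = re.compile(r"([a-fA-F0-9])\1([a-fA-F0-9])\2([a-fA-F0-9])\3")


def shorten_paired_hex(text: str) -> str:
    """Collapse each doubled hex digit to a single digit, e.g. FF0000 -> F00."""
    return THREE_PAIRS.sub(r"\1\2\3", text)


for colour in ["#FF0000", "#33FF99", "#FFFFFF", "#12AB34"]:
    print(colour, "->", shorten_paired_hex(colour))
# #FF0000 -> #F00
# #33FF99 -> #3F9
# #FFFFFF -> #FFF
# #12AB34 -> #12AB34
```

#12AB34 is left alone because none of its digits are doubled, so it has no three character equivalent.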

March 12, 2012

ColdFusion regular expression backreferences and numeric strings

Suppose you have a string that starts with a number, e.g. “25% discount this weekend”, and you’re inserting it into the replacement string of a regular expression. If a backreference such as \1 comes immediately before that “25%…”, ColdFusion reads it as \125 instead of \1 followed by 25.

For instance:

<cfset myString = "25% Discount this weekend">

<cfset originalString = "Look out for our special offers, including XXX and much more">

<cfset newString = reReplace(originalString, "(.*)XXX(.*)", "\1#myString#\2")>

<cfoutput>#newString#</cfoutput>

This just outputs:
% Discount this weekend and much more
as it thinks that you’re trying to do \125 instead of \1 followed by “25%…”

To get around this, you can use \E to separate the backreference from the rest of the string.
\E is normally used to mark the end of an uppercase or lowercase block started by \U or \L. However, in this case it simply tells the regular expression engine where the backreference ends.

<cfset newString = reReplace(originalString, "(.*)XXX(.*)", "\1\E#myString#\2")>

<cfoutput>#newString#</cfoutput>

This time it correctly outputs:
Look out for our special offers, including 25% Discount this weekend and much more
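
The same sort of ambiguity isn’t unique to ColdFusion. Purely as an analogy (this is Python’s re module, not CFML, and the variable names simply mirror the ones above), Python doesn’t read \125 in a replacement string as group 1 followed by 25 either. Its way of marking where the group number ends is the \g<1> form:

```python
import re

my_string = "25% Discount this weekend"
original_string = "Look out for our special offers, including XXX and much more"

# \g<1> marks the end of the group number explicitly, so the "25" that
# follows is treated as literal text rather than as more digits.
replacement = r"\g<1>" + my_string + r"\2"

new_string = re.sub(r"(.*)XXX(.*)", replacement, original_string)
print(new_string)
# prints: Look out for our special offers, including 25% Discount this weekend and much more
```

This only works as written because my_string contains no backslashes. Anything user-supplied would need escaping before going into a replacement string.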

October 30, 2008

UK Postcode validation

While looking at some old code at work that was doing postcode validation, I thought there must be a better way to do it with a regular expression. By chance I came across this page, UK Government Data Standards Catalogue, which documents the standard for Post Codes. They also have this XML document which basically gives the regex required.

I was going to submit it to the CFLib, but discovered there was already an IsZIPUK function which ostensibly did the same thing, dating back to about 2001! However, its regular expression didn’t validate all postcodes correctly. So I submitted this as an update to that function, which has now been published.

The regular expression’s a bit of a mess, and I’m no regex master, so if anyone has any tips on how to simplify it, let me know.
