Duncan's blog

March 12, 2012

ColdFusion regular expression backreferences and numeric strings

Suppose you have a string that starts with a number, e.g. “25% discount this weekend”, and you want to insert it into the replacement string of a regular expression. If a backreference such as \1 sits immediately before that “25%…”, ColdFusion reads the digits together and treats it as \125 instead of \1.

For instance:

<cfset myString = "25% Discount this weekend">

<cfset originalString = "Look out for our special offers, including XXX and much more">

<cfset newString = reReplace(originalString, "(.*)XXX(.*)", "\1#myString#\2")>

<cfoutput>#newString#</cfoutput>

This just outputs:
% Discount this weekend and much more
because ColdFusion thinks you’re asking for \125 rather than \1 followed by “25%…”.
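
To see why this happens, it can help to build the replacement string on its own and print it before it reaches reReplace(). This is only a small sketch, and the variable name replacement is made up for illustration. ColdFusion substitutes #myString# first, so the regular expression engine never sees a boundary between the backreference and the digits that follow it:

```cfml
<cfset myString = "25% Discount this weekend">

<!--- Build the replacement string separately (hypothetical variable name) --->
<cfset replacement = "\1#myString#\2">

<!--- Displays the interpolated text: \125% Discount this weekend\2 --->
<cfoutput>#replacement#</cfoutput>
```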

To get around this, you can use \E to separate the backreference from the rest of the string.
\E normally marks the end of an uppercase or lowercase block started by \U or \L. In this case, however, it simply tells the regular expression engine where the backreference ends.

<cfset newString = reReplace(originalString, "(.*)XXX(.*)", "\1\E#myString#\2")>

<cfoutput>#newString#</cfoutput>

This time it correctly outputs:
Look out for our special offers, including 25% Discount this weekend and much more

3 Comments

  1. Thanks for this, I hit this yesterday.

    Do you have any idea if the underlying Java has this same issue?

    Comment by Dave Merrill, August 31, 2012 @ 1:54 pm

  2. No idea! It might just be a quirk of how CF handles dynamic variables in the regex replacement string.

    Comment by duncan, August 31, 2012 @ 2:22 pm

  3. G’day Duncan (just found this blog of yours!)
    Nice regex trick you’re documenting here.

    @Dave:
    CF doesn’t have its own regex processor; it just hands the work off to Jakarta ORO (a long-dead Apache project) under the hood (http://jakarta.apache.org/oro/). This is a Java implementation of Perl-compatible regular expressions. Java’s own java.util.regex patterns are slightly different. As to why Macromedia chose to use ORO instead of Java’s own regex support, I can only presume it was because ORO was closer to the regex implementation that the pre-Java versions of CF (<= CF5) used, and MM didn't want to cause backwards compat issues. Ultimately this has proven to be quite annoying because CF is now stuck with this dead regex implementation, which is falling a bit behind where regexes are now. Fortunately Java regexes are easy enough to use in CF (google "Ben Nadel regex" for thorough coverage).

    I've been trying to convince Adobe to upgrade CF's regex engine… it wouldn't hurt if the ticket got some more votes… https://bugbase.adobe.com/index.cfm?event=bug&id=3037998 😉


    Adam

    Comment by Adam Cameron, September 27, 2012 @ 10:32 pm

