Duncan's blog

August 21, 2008

Cartesian join versus ListAppend

Filed under: Coldfusion — duncan @ 7:29 pm
Tags: , , ,

Over on Ray Camden’s blog there was a recent discussion about the best way to “create a list of all possible combinations” if you have two lists you want to join together. e.g. you have a list of colours and a list of sizes, and you want to get all possible combinations of colours and sizes.

Ray used a simple nested loop with ListAppend to generate a new list. I thought maybe it would be even simpler just to do a cartesian join on the options.

Gary Funk rightly pointed out this wasn’t necessarily a great idea:

Should I mention that it is bad form to use a cartesian join in production. If you have 100 items in table A and 100 items in table B, you just returned 10,000 records. I’ve never met a DBA that that allows a cartesian join, except in testing.

I’d read a great post recently on the Aliaspooryorik blog comparing string concatenation methods for performance. ListAppend was always the slowest method. Remembering that, I thought there’s no way Ray’s method could be any better than a cartesian join for Gary’s theoretical 10000 rows.

Here’s the code I used to test that (using the exact same methodology as Aliaspooryorik).

<!--- try a long list: --->
<cfset colorList = "">
<cfset sizeList = "">
<cfloop index="i" from="1" to="100">
	<cfset colorList = ListAppend(colorList, "color#i#")>
	<cfset sizeList = ListAppend(colorList, "size#i#")>

<!--- ListAppend --->
<cfset startTime = getTickCount()>

<cfset comboList = "">
<cfloop index="c" list="#colorList#">
	<cfloop index="s" list="#sizeList#">
		<cfset comboList = listAppend(comboList, c & " " & s)>

<cfset endTime = getTickCount()>
	ListAppend : #endTime-startTime#ms<br>

<!--- use a query --->
<cfset startTime = getTickCount()>

<cfset Colours = QueryNew("colour")>
<cfset QueryAddRow(Colours, ListLen(colorList))>
<cfloop index="i" from="1" to="#ListLen(colorList)#">
	<cfset QuerySetCell(Colours, "colour", ListGetAt(colorList, i))>

<cfset Sizes = QueryNew("size")>
<cfset QueryAddRow(Sizes, ListLen(sizeList))>
<cfloop index="i" from="1" to="#ListLen(sizeList)#">
	<cfset QuerySetCell(Sizes, "size", ListGetAt(sizeList, i))>

<cfset endTime = getTickCount()>
	Generate Queries : #endTime-startTime#ms<br>

<!--- Query of Queries --->
<cfset startTime = getTickCount()>

<cfquery name="CartesianJoin" dbtype="query">
	SELECT Colour, Size
	FROM Colours, Sizes

<cfset endTime = getTickCount()>
	Query of Queries : #endTime-startTime#ms<br>

Of course, I’m not using a real query to the database in this example, just creating it with the QueryNew function. I split the time out from creating my two original query objects, as normally you wouldn’t do this, you’d just run the second query. So overall the execution time for my method is probably a bit off what it might be in production. However the real eye opener is the slowness of ListAppend. I suspect if Ray amended his code to use ArrayAppend, it would be very quick over large numbers of rows.

The time I got back first time I ran this (on CFMX 7) was
ListAppend : 30703ms
Query of Queries : 172ms
And this was typical of timing for subsequent runs. Your mileage may vary.


Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: