Web Stats


Funnel Web Analyzer is an excellent web log analyzer by Quest that we've been using since before Quest bought it. It used to be fairly expensive, but they decided to stop development on it, and at that point released it for free. It runs on Windows, OSX, Linux, and Solaris. It hasn't been updated in a while, officially, but you can pick up an updated settings file that will support more recent browsers and search engines (and modify it yourself for anything you want to track that's missing).

As I said, we've been using it in my office for years, but in the last couple of months it started failing on me in a weird way. It would just silently crash. No message. Nada. Just one minute it's there, the next it's gone. Part of that I tracked down to some settings in Windows system preferences.. but there was no good indication of *why* it crashed. I knew it wasn't because of the volume of data passing through it because it handled years at a time previously without complaint. Very odd.

Long story short, back in late January I switched our sites from having log files that roll at X megs (50 in some places, 500 in others), to log files that roll once per hour. That was the problem — either funnel web itself or the process that unzips .gz files cannot handle more than a few thousand log files at once.

So, how to fix it? Not sure if I picked the best possible way, but it seems to have done the trick. Here's what I did:

  1. Changed all the servers to log by day instead of hour, so I don't have to deal with this again in a few months
  2. uncompressed the files into a subfolder
  3. combined the log files into a monthly file using DOS — copy 2008-www1-ex0801*.log 2008-www1-ex08.log
  4. If the resultant log files are over 1.2 gigs (the max TextPad will open), run the freeware TextSplitter tool to split them into chunks of  around 900 megs each
  5. Open the first file in TextPad, copy the log file header lines to another document.
  6. regular expression replacement to strip out all the header lines: ^\#.*\n
  7. Tools > Sort > From 1 Length 20 Ascending Case Insensitive (in case the merge put some of the files in in the wrong order)
  8. Check for any lines from the wrong month, drag those out into a new document if they exist
  9. Paste in the header at the beginning, update the date/time stamp there to match the first record if needed
  10. Save and move to the next file. Rinse. Repeat. Be sure to drag/paste any extra hours of data you might have excised from earlier months.

At the end of this, instead of having around 7000 files for one 2-server cluster, I had around 20-30 depending on the site. Going forward of course, this number will be larger since I have it set to 1 file per day. But at any rate, I was then able to easily run my reports and can remain confident that the scheduler will begin to do its job properly again.

Hope that helps someone out there. :)

I keep this in a file on my desktop.. may as well put it here, too… maybe it'll help someone someday.

CRLF:
cfset crlf = #chr(13)#&#chr(10)#

Variable Variables:
cfset varvar = "MyVarName"
cfset "#varvar#" = "whatever"
#evaluate(varvar)# outputs "whatever", #varvar# outputs "MyVarName"

Existence/Type Test:
StructKeyExists() IsDefined() (isDefined fails on complex vars with[], use dot notation) IsArray() IsStruct() IsQuery() IsSimpleValue() IsNumeric() IsDate() IsBinary() IsBinary()

CFHTTP Connection Failure
Gzip bug — add this:
cfhttpparam type="Header" name="Accept-Encoding" value="deflate;q=0″
cfhttpparam type="Header" name="TE" value="deflate;q=0″
Also check for redirect="yes|no" — sometimes redirect needs to be set to no

Query Variables:
query_name.currentRow, query_name.columnList, query_name.RecordCount, result_name.sql

Query of Queries:
cfquery dbtype="query" name="foo" — select * from query_name

CFDirectory Performance:
listinfo="name" significantly improves performance, if you don't need other columns

CFQueryParam:
Null="#YesNoFormat(NOT LEN(MyVar))#" will make the value NULL instead of empty
List="yes" will internally CFParam all list values so you can do WHERE ID IN (cfqueryparam… list="yes") and get a valid list format. Add separator = '|' instead of delimiter='|' — for some reason this tag is different than all the other ones.
CF_SQL_TIMESTAMP — datetime, smalldatetime (use CreateODBCDateTime() for this)
CF_SQL_CHAR — char, nchar, unique identifier
CF_SQL_VARCHAR — varchar, nvarchar, sysname
CF_SQL_LONGVARCHAR — text, ntext
CF_SQL_NUMERIC — numeric (scale='2′ for 2 decimal places)
CF_SQL_INTEGER — int
CF_SQL_SMALLINT — smallint
CF_SQL_TINYINT — tinyint
CF_SQL_BIT — bit
CF_SQL_FLOAT — float
CF_SQL_DECIMAL — decimal, money, smallmoney (scale='2′ for 2 decimal places)
CF_SQL_BINARY — binary, timestamp

Merge Log Files (Not CF, but useful!)
copy *.log merged.log

CF Regex:
rereplace(foo, "[[:space:]]+", " ", "all") — turns all whitespace into single space
rereplace(foo, ".*?", "", "all") — strips all HTML tags (probably fails if they span multiple lines)

(updated July 16)