Navigation | Funnel Web Disappearing/Crashing Fix

Funnel Web Disappearing/Crashing Fix

Funnel Web Analyzer is an excellent web log analyzer by Quest that we've been using since before Quest bought it. It used to be fairly expensive, but they decided to stop development on it, and at that point released it for free. It runs on Windows, OSX, Linux, and Solaris. It hasn't been updated in a while, officially, but you can pick up an updated settings file that will support more recent browsers and search engines (and modify it yourself for anything you want to track that's missing).

As I said, we've been using it in my office for years, but in the last couple of months it started failing on me in a weird way. It would just silently crash. No message. Nada. Just one minute it's there, the next it's gone. Part of that I tracked down to some settings in Windows system preferences.. but there was no good indication of *why* it crashed. I knew it wasn't because of the volume of data passing through it because it handled years at a time previously without complaint. Very odd.

Long story short, back in late January I switched our sites from having log files that roll at X megs (50 in some places, 500 in others), to log files that roll once per hour. That was the problem — either funnel web itself or the process that unzips .gz files cannot handle more than a few thousand log files at once.

So, how to fix it? Not sure if I picked the best possible way, but it seems to have done the trick. Here's what I did:

  1. Changed all the servers to log by day instead of hour, so I don't have to deal with this again in a few months
  2. uncompressed the files into a subfolder
  3. combined the log files into a monthly file using DOS — copy 2008-www1-ex0801*.log 2008-www1-ex08.log
  4. If the resultant log files are over 1.2 gigs (the max TextPad will open), run the freeware TextSplitter tool to split them into chunks of  around 900 megs each
  5. Open the first file in TextPad, copy the log file header lines to another document.
  6. regular expression replacement to strip out all the header lines: ^\#.*\n
  7. Tools > Sort > From 1 Length 20 Ascending Case Insensitive (in case the merge put some of the files in in the wrong order)
  8. Check for any lines from the wrong month, drag those out into a new document if they exist
  9. Paste in the header at the beginning, update the date/time stamp there to match the first record if needed
  10. Save and move to the next file. Rinse. Repeat. Be sure to drag/paste any extra hours of data you might have excised from earlier months.

At the end of this, instead of having around 7000 files for one 2-server cluster, I had around 20-30 depending on the site. Going forward of course, this number will be larger since I have it set to 1 file per day. But at any rate, I was then able to easily run my reports and can remain confident that the scheduler will begin to do its job properly again.

Hope that helps someone out there. :)

Filed by JC at July 14th, 2008 under Web Stats

Leave a comment