Common Flaws Interpreting Web Statistics in various packages
Lots of webstats packages have potential for displaying anomalies in their reports. I believe that statistical analyser packages should NEVER filter the data you see in your reports - that should be your job! (with their help, naturally.) Any analysis software ought to put you in control over what you're seeing in your reports, making it clear exactly what it is that you are seeing and giving you the facility to filter out anything that might give you anomalies. However, sadly, in my experience this is almost never the case - you are presented with data that has been pre-filtered without any real recourse for understanding how it has been filtered, what is included and excluded etc.
I describe all of the below as "flaws", but really it's up to you to decide whether they do constitute flaws. Remember that some stats packages will also attempt to reduce the effects of these flaws, either by giving you separate reports or by simply excluding data which they deem open to flaws.
Flaws
In my mind, the most important flaw that can be introduced into your stats data is crawler/robotic visits. These robotic visitors can inflate your visitor numbers by upwards of 50%, and I've seen statistics in which page view numbers were inflated by over 80%! If your current strategy (marketing or otherwise) is based solely on these numbers, then you are in serious danger if you are not taking robotic visitors to your website into account.
A recent site that I looked at appeared to show, through a particular stats package, that it had 3,000 visitors per month and over 9,000 page views. However, after deeper inspection using a separate stats package, it was immediately obvious that 1,200 of the visitors were a combination of Google, Yahoo Slurp and MSNBot. More depressing still, 6,300 of the page views came from these robots (and a few others), leaving fewer than 3,000 page views by actual people!
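To make that concrete, here is a minimal sketch of how robot hits could be separated from human ones by looking at the user-agent string. The marker list, the function names and the (user agent, path) hit format are my own assumptions for illustration; real crawler lists are far longer, and some robots do not identify themselves at all.

```python
# Hypothetical, deliberately short list of substrings found in crawler user agents.
BOT_MARKERS = ("googlebot", "slurp", "msnbot")


def is_robot(user_agent):
    """Return True if the user-agent string contains a known crawler marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def split_page_views(hits):
    """hits: iterable of (user_agent, path) tuples. Returns (human, robot) counts."""
    human = robot = 0
    for user_agent, _path in hits:
        if is_robot(user_agent):
            robot += 1
        else:
            human += 1
    return human, robot


# The arithmetic from the example site above, using the rounded figures quoted.
total_views = 9000
robot_views = 6300
human_views = total_views - robot_views   # 2700
robot_share = robot_views / total_views   # 0.7, i.e. 70% of page views
```

On the figures above, only around 30% of the reported page views came from people, which is why I would want the robot report kept separate from the headline numbers.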
Referer data is also open to abuse. Some stats programs, in particular some of the simpler ones, for some reason will show your own website as a referer. Apart from being completely useless as data, it can actually harm your ability to analyse your true referers. However, two more important and more common referer problems are:
- Referers from hot-linking websites
Sites hot-linking elements from your website will appear in your stats data as referers, making you believe that you are getting visitors from a source that simply isn't sending visitors your way. They are, instead, using up your bandwidth.
- Referer spam
You'll be well aware of what email spam is, and referer spam is essentially the same thing - unwanted advertising. These spammers use robotic web crawlers to visit your website and fool the stats by claiming that they arrived by clicking a link on the site they are attempting to advertise. You, being the curious soul that you clearly are, will investigate this when you see it in your stats, thus visiting the site they are advertising. Job done for the spammer.
The other potential flaw introduced by people hot-linking is that each fetch of an element from your website may be classified as a visit by your stats program, even if there is actually no page view.
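The self-referer and hot-linking cases can be sketched as a simple classification of each logged request. This is only a rough illustration: the host names, the extension list and the function name are assumptions of mine, and it does nothing about referer spam, which looks like a perfectly normal external referer in the logs.

```python
from urllib.parse import urlsplit

# Hypothetical values: replace with the host names your own site answers to.
OWN_HOSTS = {"example.com", "www.example.com"}
# File types treated as page "elements" rather than pages (an assumption).
ASSET_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")


def classify_referer(referer, requested_path):
    """Label a request as 'none', 'self', 'hotlink' or 'external'."""
    if not referer or referer == "-":
        return "none"          # no referer was sent or logged
    host = (urlsplit(referer).hostname or "").lower()
    if host in OWN_HOSTS:
        return "self"          # your own pages; useless as referer data
    if requested_path.lower().endswith(ASSET_EXTENSIONS):
        return "hotlink"       # an element fetched from a page on another site
    return "external"          # a page request arriving from another site
```

Only the "external" requests are worth counting as referers; the "hotlink" ones are better reported separately so that they count as neither referers nor visits.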
Many stats programs allow you to exclude your own IP address from their analysis. This enables the program to leave out your own visits and page views - important if you're interested in ensuring that your statistics are not inflated by your own visits! However, there are also a lot of stats programs which don't give you this facility (or hide it away so you don't even know it's there!).
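If your package has no such option, the same effect can be had by filtering the log before it is analysed. The sketch below assumes each hit is a tuple whose first item is the visitor's IP address; the addresses shown are documentation placeholders, not real ones, and the function names are my own.

```python
import ipaddress

# Hypothetical addresses: a single home IP and an office range.
OWN_NETWORKS = [
    ipaddress.ip_network("203.0.113.7/32"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_own_visit(ip):
    """Return True if the IP address falls inside one of your own networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in OWN_NETWORKS)


def exclude_own_visits(hits):
    """hits: list of (ip, path) tuples. Returns the hits that are not your own."""
    return [hit for hit in hits if not is_own_visit(hit[0])]
```

Bear in mind that this only works if your own IP address is fixed; with a dynamic IP the list would need updating every time the address changes.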
The last easily identifiable flaw is one which is open to debate - and it is images.google.com. As you are probably aware, Google also indexes images, and users can search solely for images. Some stats packages do not differentiate between images.google.com and www.google.com, so numerous referers from the image search engine may be misreported as ordinary search visits. Generally I think it's fair to say that an image searcher is not going to be as valuable as a normal searcher (he/she will, after all, be looking for an image and not your website itself). Add to this that all stats packages will log the search terms entered in images.google.com, and your keyword considerations based on incoming search terms could potentially be very flawed indeed.
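Telling the two apart only needs the host name of the referer. The following is a minimal sketch under a couple of assumptions of mine: that the search terms travel in a "q" query parameter, and that only the .com hosts matter (country-specific Google domains are ignored here). The function name is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

WEB_SEARCH_HOSTS = {"google.com", "www.google.com"}
IMAGE_SEARCH_HOSTS = {"images.google.com"}


def classify_google_referer(referer):
    """Return (kind, search_terms) for a referer URL.

    kind is 'web-search', 'image-search' or 'other'; search_terms is the
    decoded value of the 'q' parameter, or None if there is none.
    """
    parts = urlsplit(referer)
    host = (parts.hostname or "").lower()
    if host in IMAGE_SEARCH_HOSTS:
        kind = "image-search"
    elif host in WEB_SEARCH_HOSTS:
        kind = "web-search"
    else:
        return ("other", None)
    terms = parse_qs(parts.query).get("q", [None])[0]
    return (kind, terms)
```

Keeping the two kinds in separate keyword reports means that image searchers no longer skew the terms you base your keyword decisions on.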
Unavoidable Flaws
There are of course some flaws which are a completely unavoidable part of analysing data which is gathered by robotic means.
As previously discussed, most analyzer programs will use the IP address of the visitor to ascertain visitor sessions. However, some visitors to your website will have dynamic IPs, which are subject to change each time the user accesses the internet. Therefore, your stats program will not be able to flag a visitor with a dynamic IP as a return/repeat visitor. Whilst this is a sad reality, it is not a real problem, since that particular statistic is not on the higher scales of usefulness to you. However, there is a significant problem with AOL users in the same vein. AOL users (specifically dial-up) not only obtain a dynamic IP address each time they log in; their IP address is subject to change on each separate page load! This can significantly increase the number of visitors your stats program thinks it sees in your logs (but of course won't affect your page view data, other than avg. pages/visit).
As a result of perceived security risks and a good dose of paranoia, some browsers are now not transmitting as much information as they used to. In particular, referer data is not always transmitted by everyone's browser. Although this is a limited issue for now, it may become more of one in the future.
On a related note, and with particular reference to the Integrated Tracking method, many people are now either disabling JavaScript in their web browsers, or using internet security packages (e.g. Norton) that ask the visitor's permission before executing JavaScript on your website. If you are using the Integrated Tracking method and JavaScript is disabled, you will not receive any information about that visitor's visit to your site.
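The effect of a changing IP address is easy to see in a toy example. The sketch below counts every distinct IP address as one visitor, which is a simplification of what real packages do (they usually add session time-outs), and the addresses and function name are invented for illustration.

```python
def summarise(hits):
    """hits: list of (ip, path) tuples.

    Returns (visitors, page_views, pages_per_visit), treating each
    distinct IP address as one visitor.
    """
    visitors = len({ip for ip, _path in hits})
    page_views = len(hits)
    pages_per_visit = page_views / visitors if visitors else 0.0
    return visitors, page_views, pages_per_visit


# One person viewing three pages from a single, stable IP address.
stable_ip = [
    ("192.0.2.10", "/index.htm"),
    ("192.0.2.10", "/about.htm"),
    ("192.0.2.10", "/contact.htm"),
]

# The same person, but with the IP address changing on every page load.
changing_ip = [
    ("192.0.2.10", "/index.htm"),
    ("192.0.2.11", "/about.htm"),
    ("192.0.2.12", "/contact.htm"),
]

print(summarise(stable_ip))    # (1, 3, 3.0)
print(summarise(changing_ip))  # (3, 3, 1.0)
```

The page view count is the same in both cases, but the visitor count triples and the pages/visit figure drops from 3 to 1.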
My final "unavoidable" flaw would be your visitors "caching" data. Caching means that if a visitor has previously been on your website, instead of the visitor's browser going to the server your website is hosted on and requesting the latest version of the page (e.g. index.htm), the browser recognises that it has already seen that page and displays its stored copy. Caching can also happen at an ISP level, meaning that the visitor's ISP could hold a cache of the website and send the visitor that instead of going to your server. Either way, the request never reaches your server, so the page view does not appear in your logs.