Spoofing Popularity: A Warning to Webmasters

In which Phil tries to warn you of the dangers of over-valuing Website-traffic Stats.

A friend who runs a local history website in a rural area of England surprised me by saying that he values a single letter of interest or appreciation more highly than any amount of increased web traffic on his site. He gave me a telling example. He once transcribed a five-hundred-year-old bill for repairing a bridge in the locality. He admitted that it was probably of no interest or importance to anyone but a small group of historians. Yet it became consistently one of the most popular pages on his site. Many webmasters would then have filled their sites with lots of other transcriptions of five-hundred-year-old repair bills, in the hope of increasing their traffic. Not he. He was curious, and so he investigated. After ‘googling’ around for a while he discovered that near the bridge there was a car park that the misguided local authority had provided for people who came out to enjoy the countryside. It was being used as a meeting place for people who wished to engage in unusual or bizarre sexual practices in cars or hedges. His page was being picked up by various automated crawlers and by hopeful ‘googlers’ looking for a partner.

I know of many other examples where ‘hits’ and ‘visits’ to websites have been misinterpreted. My friend the historian couldn’t care less about traffic, as he does the work for pleasure, but on a commercial site this sort of misreading could be hard on the wallet.

As part of my job, I test websites with simulated traffic to see how they stand up, and to iron out the problems before they are made public. I use tools that would horrify the average Webmaster. One can simulate the user agent and the source IP address. It is easy to simulate normal traffic: send GET and POST HTTP requests, do logins and XML-based transactions, and talk FTP, POP3, and SMTP. The part I enjoy most is spoofing names and addresses, card details and so on, or providing messages that seem to have come from a real person. I use only SQL Server functions and procedures, with the bare minimum of command-line utilities. I mention all this not to boast, but to point out that it is part of the toolkit of a lot of IT people.
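To make the point about the user agent concrete, here is a minimal sketch in Python (not the SQL Server toolkit described above). It starts a throwaway web server on the local machine, sends it a few requests carrying invented User-Agent strings, and returns what the server recorded. The names `simulate_visits` and `RecordingHandler`, the path `/bridge-bill`, and the agent strings are all made up for illustration, and the sketch makes no attempt to spoof a source IP address.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RecordingHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site's logging: notes each User-Agent."""

    def do_GET(self):
        # The server can only record what the client chose to send.
        self.server.seen_agents.append(self.headers.get("User-Agent", ""))
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the default access log off stderr


def simulate_visits(agents):
    """Send one local request per agent string; return what the server saw."""
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)  # port 0: any free port
    server.seen_agents = []
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler keeps the requests on this machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        for agent in agents:
            request = urllib.request.Request(
                f"http://127.0.0.1:{port}/bridge-bill",
                headers={"User-Agent": agent},  # whatever we care to claim
            )
            with opener.open(request, timeout=5) as response:
                response.read()
    finally:
        server.shutdown()
        server.server_close()
    return server.seen_agents
```

Calling `simulate_visits(["Mozilla/5.0 (pretend desktop browser)", "PretendPhone/1.0"])` hands back exactly those two strings: the server has no independent way of knowing what really made the request.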

It is easy to be malicious with this sort of tool. Have you noticed the sponsored links on Google? These are ‘pay per click’. I shudder when commercial concerns sign up to this sort of deal, just as I do when marketing firms offer to charge by the amount of increased traffic to a site. Every time an automaton clicks on one of these links, someone is charged for it. In the next world, perhaps, where there is no malice or competitive drive, this would make perfect sense. In this world, the only deal that makes commercial sense is to pay by the number of people who first used the sponsored link, and then went on to purchase something. In the meantime, it is a chilling idea that someone might set an automaton to click on the links of their rivals in business. How could one possibly tell?
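The difference between the two kinds of deal is simple arithmetic, sketched here in Python. The function names, the prices and the click counts are all invented for illustration; they are not real advertising rates. Amounts are in pence so that the sums stay exact.

```python
def pay_per_click_bill(human_clicks, bot_clicks, pence_per_click):
    """Every click is billed, whoever or whatever made it."""
    return (human_clicks + bot_clicks) * pence_per_click


def pay_per_sale_bill(sales, pence_per_sale):
    """Only clicks that end in a purchase are billed."""
    return sales * pence_per_sale


# Hypothetical month: 1,000 real visitors click the link and 20 of them buy.
honest_month = pay_per_click_bill(1000, 0, 50)       # 50,000 pence
# Same month, but a rival's automaton adds 9,000 clicks that never buy.
attacked_month = pay_per_click_bill(1000, 9000, 50)  # 500,000 pence
# Paying per sale, the automaton's clicks cost nothing either way.
per_sale_month = pay_per_sale_bill(20, 2500)         # 50,000 pence
```

With these made-up figures the pay-per-click bill grows tenfold under attack, while the per-sale bill does not move, because an automaton that never buys never triggers a charge.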

I recently had to advise a client who were completely transfixed by the idea of hits and visits as measures of the performance and quality of their website. I implored them to take a realistic and cynical approach. They were about to sign a contract with a ‘Web Marketing’ firm under which they would pay fees in proportion to the upturn in traffic to their site. This smelt bad to me. It smelt so bad that I promised them I’d do it for free. I rushed home, got out my toolkit and let ’em have it with both barrels. When I returned, they were walking on air and were delighted with whatever it was I’d done, even though I was purple in the face and shouting “It’s spoofed! It’s spoofed!”. It was only after considerable sober talking that the truth sank in: that their rock-solid measure of the site’s performance was a quagmire. This was culture shock.

Unfortunately, it is not only the angels who have the ability to spoof web traffic. Those on the dark side share the technology. One ingenious fraud that has taken in several IT websites in the States starts with plagiarism and then gets worse. Initially, someone, usually a lecturer in an IT department of (for some reason) an Asian university, copies out something written by an expert. In the case of SQL databases, it tends to be taken from Joe Celko or Ken Henderson, with only slight cosmetic changes. Joe Celko has written so much he wouldn’t notice and would just think it was someone agreeing with him (I just plagiarised that from Ken Henderson). Recently, they have become confident enough to lift stuff straight out of MSDN. They then offer it to one of the commercial fee-paying websites. It is quite easy to spot them. The surprise is how popular they seem to be. The number of visits they get is quite amazing. The Webmasters therefore love them and buy more from the same source.

Number of visits? Hmmmm… My thoughts go back to my trusty traffic-spoofing toolkit. Setting this up to produce visits in any website statistics, and fool all but the cleverest stats packages, would be the work of an idle moment. I could even generate the various appreciative comments that they get in the forums. This is not because of the artificial intelligence of my programs, but the natural stupidity of some of the real contributors to the forums. If it is done right it is very difficult to prove, but I’d just warn all webmasters to treat website statistics with a lot of caution and not to draw too many conclusions from them.
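As a rough illustration of why a visit count is so easy to inflate, here is a sketch in Python of a deliberately naive ‘unique visitors’ statistic. It assumes, purely for the example, that a stats package counts one visitor per distinct pair of claimed address and user agent; real packages vary and some are cleverer. The record layout, the function names and the addresses are all hypothetical.

```python
from itertools import cycle


def count_unique_visitors(records):
    """Naive statistic: one 'visitor' per distinct (ip, user_agent) pair."""
    return len({(r["ip"], r["user_agent"]) for r in records})


def spoofed_records(n, agents):
    """n log entries from a single operator (n up to 65,536), each carrying
    a different made-up address and a rotating made-up user agent."""
    agent_cycle = cycle(agents)
    return [
        {"ip": f"10.0.{i // 256}.{i % 256}", "user_agent": next(agent_cycle)}
        for i in range(n)
    ]


# Three genuine requests from two real readers (one of them came back).
genuine = [
    {"ip": "192.0.2.10", "user_agent": "BrowserA"},
    {"ip": "192.0.2.10", "user_agent": "BrowserA"},
    {"ip": "192.0.2.77", "user_agent": "BrowserB"},
]
# One idle moment with a spoofing tool.
fake = spoofed_records(500, ["BrowserA", "BrowserB", "BrowserC"])

genuine_count = count_unique_visitors(genuine)        # 2
total_count = count_unique_visitors(genuine + fake)   # 502
```

Under that assumption, one operator turns two readers into 502 ‘visitors’, and nothing in the count itself distinguishes the real ones from the invented ones.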