How does junk email affect your web metrics?
Junk emailers don't have to break into a web server to make it do their dirty work. They merely have to subvert its email system. Most sites have contact or enquiry forms, and most of these take the information submitted and convert it into an email message.
It is remarkably easy to trick such a system into sending out junk email. A common technique is to place complete messages in a form's comment box. The text placed in the comment box includes commands to the email system. The email system has to read the comment in order to convert it into an email message, and when it does so, these commands trick it into sending out the junk email. Sendmail, the most popular email system used to process forms, is especially vulnerable to this.
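One way to blunt this trick is to keep visitor-supplied text out of the message headers and to refuse any header field that contains a line break. The Python sketch below is an illustration only, not a description of how any particular form handler works: the function name build_enquiry_email, the example.com addresses, and the field names are all hypothetical, and it assumes the handler builds its message with Python's standard email library instead of passing raw text to the mail program.

```python
from email.message import EmailMessage


def build_enquiry_email(sender_name, sender_email, comment):
    """Build an enquiry email from form fields (hypothetical field names)."""
    # A line break in a header field is the classic way to smuggle extra
    # headers (for example additional recipients) into the message.
    for value in (sender_name, sender_email):
        if "\r" in value or "\n" in value:
            raise ValueError("line break in a header field")

    msg = EmailMessage()
    msg["From"] = "webform@example.com"      # fixed by the site, never by the visitor
    msg["To"] = "enquiries@example.com"      # fixed by the site, never by the visitor
    msg["Reply-To"] = sender_email
    msg["Subject"] = f"Website enquiry from {sender_name}"
    # The comment goes into the body only, so header-like lines inside it
    # are treated as plain text and not as instructions.
    msg.set_content(comment)
    return msg


if __name__ == "__main__":
    junk = "Hello\nBcc: victim@example.org\nBuy now!"
    message = build_enquiry_email("Alice", "alice@example.com", junk)
    print(message["To"])    # recipient chosen by the site
    print(message["Bcc"])   # None: the comment did not create a header
```

The design choice here is simply that recipients are fixed by the site and the comment never touches the headers, so a "complete message" pasted into the comment box arrives as ordinary text.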
A related trick is to use software to sweep the web for forms, and then read each form's HTML code to find the location of the program which processes it. Once the junk emailers have the location, they start probing this program to see if they can trick it into sending email directly, so that they no longer need to work through the original form. If they can do this, they can send out a great deal more email than is possible via the form. With most forms, a junk emailer is unlikely to be able to send more than 20 to 30 junk emails at a time. With direct access to the mail program, they can work in batches of hundreds or even thousands.
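To see how little effort it takes to find a form's processing program, it can help to audit your own pages the same way. The sketch below uses only Python's standard library to list the handler locations a page exposes. The names FormActionFinder and find_form_handlers, and the example.com URLs, are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionFinder(HTMLParser):
    """Collect the 'action' URL of every <form> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # A form without an action posts back to the page itself.
            action = dict(attrs).get("action") or ""
            self.actions.append(urljoin(self.base_url, action))


def find_form_handlers(html, base_url):
    """Return the absolute URLs that the forms in 'html' submit to."""
    finder = FormActionFinder(base_url)
    finder.feed(html)
    return finder.actions


if __name__ == "__main__":
    page = '<form method="post" action="/cgi-bin/contact"><textarea name="comment"></textarea></form>'
    print(find_form_handlers(page, "https://www.example.com/contact.html"))
```

Anything this audit lists is visible to anyone who can read the page, so the handler itself has to be able to withstand direct requests.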
Of course, once technicians realized this was happening, they started building defenses against it. First they put checks into the forms to make sure they were being used by a legitimate browser such as Internet Explorer, and not by junk email software such as Spamalot. Next, they put checks into the form processing systems on the web server to ensure the data was coming to them from the appropriate forms, and not somewhere else. Thus a secure site will only accept form data from its own form pages, used by legitimate web browsers, and contains checks to ensure this. You might want to ask your techies if they have these checks in place. Most sites do not.
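The two checks described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a recommended defense: the function name looks_like_own_form, the ALLOWED_FORM_PAGES set, the "Mozilla/" token test, and the example.com page are all hypothetical, and the headers are assumed to arrive as a plain dictionary with the capitalization shown.

```python
from urllib.parse import urlsplit

# Hypothetical list of (host, path) pairs for the site's own form pages.
ALLOWED_FORM_PAGES = {("www.example.com", "/contact.html")}

# Mainstream browsers send a User-Agent that contains this token.
BROWSER_TOKENS = ("Mozilla/",)


def looks_like_own_form(headers):
    """Return True if the request claims to come from a browser on our own form page."""
    user_agent = headers.get("User-Agent", "")
    if not any(token in user_agent for token in BROWSER_TOKENS):
        return False

    referer = urlsplit(headers.get("Referer", ""))
    return (referer.hostname, referer.path) in ALLOWED_FORM_PAGES


if __name__ == "__main__":
    request_headers = {
        "User-Agent": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
        "Referer": "https://www.example.com/contact.html",
    }
    print(looks_like_own_form(request_headers))
```

Both values are supplied by whatever software makes the request, so a check like this only filters out junk software that does not bother to fill them in.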
No prizes for guessing what the junk emailers did to get around this. The obvious response is to write software that imitates a legitimate browser and runs your forms. It could even talk directly to your form processing software and simply pretend to be your form inside a legitimate browser.
In fact, junk emailers didn't even have to write new software to accomplish this: multiple products already exist that imitate browsers, and many are even free. For testing purposes, people have developed systems that operate on websites just like browsers: reading pages, clicking links, running JavaScript. One, Selenium, was even one of the five finalists for Developer.com's Open Source Tool of the Year in 2006. Another major product is Watij, a Java API. These are legitimate products, but they illustrate how easy it is to develop software that imitates browsers. You could even use Microsoft's Internet Explorer developer kit to reprogram Internet Explorer itself.
Because junk email systems like this need to imitate browsers fully, they can activate page-based tracking mechanisms. In most cases this will result in them being recorded as visitors in your web reports. If you are using log file analysis, then they will show in your records as just another visitor.
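The log file case can be illustrated with a simplified visitor count. The sketch below assumes the server writes the widely used "combined" log format and treats each distinct pair of IP address and User-Agent as one visitor, which is a deliberate simplification of what real analysis packages do. The function name count_visitors and the sample log lines are invented for illustration.

```python
import re

# Combined log format: IP, identity, user, [time], "request", status, size,
# "referer", "user agent". Only the IP and the user agent are captured.
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"$'
)


def count_visitors(lines):
    """Count distinct (IP, User-Agent) pairs, a crude stand-in for 'visitors'."""
    visitors = set()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            visitors.add((match.group(1), match.group(2)))
    return len(visitors)


if __name__ == "__main__":
    sample = [
        '198.51.100.7 - - [10/Oct/2006:13:55:36 +0000] "GET /contact.html HTTP/1.1" 200 2326 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
        '203.0.113.5 - - [10/Oct/2006:13:56:01 +0000] "GET /contact.html HTTP/1.1" 200 2326 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
        '203.0.113.5 - - [10/Oct/2006:13:56:02 +0000] "POST /cgi-bin/contact HTTP/1.1" 200 512 "https://www.example.com/contact.html" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    ]
    print(count_visitors(sample))
```

Nothing in these lines distinguishes a person from software that presents itself as a browser, which is why such traffic is counted along with everyone else.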
