SPAM Archive

This directory contains all the spam that I have received since early 1998. I have used various "bait" addresses, such as <bait@em.ca>, to trick email address harvesters into putting them on spam lists. The archives have been (re)compressed with p7zip, which produces files about half the size of tar+bzip2 and even smaller than I was able to achieve with RAR on Linux.

This archive is provided for researching the behavior of spammers and developing new spam management techniques. Permission is hereby granted to use this archive without restriction. If you publish any research or software based on this archive, I would appreciate a reference to this archive in that work, but it is not required. I would also like to see any such work, and may link to it here.

If you have any comments, please e-mail me at bruce@untroubled.org.

NOTE: Most of the messages in this archive contain forged headers in one form or another. The fact that a message claims to have come from a particular address does not mean it actually originated there. The only way to determine where a message originated is to study its Received: headers carefully, and even then much of that information cannot be trusted.
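
As a rough illustration of that kind of study, here is a minimal Python sketch using only the standard `email` module (it is not part of this archive). Each relay prepends its own Received: header, so the topmost headers are the ones added by servers closest to you, and only those added by servers you control can be trusted. The sketch assumes the common "from HOST (... [IP]) by HOST; DATE" header shape, which real messages often deviate from; the function names and the host name passed as `my_host` are hypothetical.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Assumption: Received: headers follow the common
# "from HOST (... [IP]) by HOST ...; DATE" shape. Real headers vary a lot.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def claimed_sender(raw):
    """The From: address, which is only what the sender claims."""
    msg = message_from_string(raw)
    return parseaddr(msg.get("From", ""))[1]


def received_chain(raw):
    """Received: headers, unfolded, topmost (most recently added) first."""
    msg = message_from_string(raw)
    return [" ".join(str(h).split()) for h in msg.get_all("Received", [])]


def connecting_ip(raw, my_host):
    """IP address of the client that handed the message to my_host, or None.

    Headers below the one added by my_host were written by other machines
    and may be forged, so the search stops at the first trusted hop.
    """
    for hop in received_chain(raw):
        # Split "from ... by HOST ...; DATE" into its from- and by-parts.
        from_part, sep, by_part = hop.partition(" by ")
        by_fields = by_part.split(";")[0].split()
        if sep and by_fields and by_fields[0].lower() == my_host.lower():
            match = BRACKETED_IPV4.search(from_part)
            return match.group(1) if match else None
    return None
```

Pass the name of your own receiving mail server as `my_host`. If your own setup relays mail through several internal hosts, you would instead look for the lowest header written by any of them, since that is the last point where the data is still under your control.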

This archive was used in the following reports or sites:

Notes

The number of messages in the archive for 2007 is lower than for 2006 or 2008. One of my spam traps was a wildcard address. During 2006, this address started receiving ever larger amounts of spam, which made the mail hard to process effectively. Since it was all duplicates of spam I was already receiving at other addresses, I disabled the wildcard address. By 2008, the amount of spam to the other addresses had risen back to the same levels.

File Naming

Several people have asked about the file names within the archives. Over time I have used a variety of methods to move messages from my mailboxes into the archive, so the names follow several formats:

  1. [timestamp].[PID].txt
    This is how qmail produces maildir filenames, with the hostname stripped off.
  2. [timestamp].[PID]_[serial].txt
    This is how Mutt saves files, where [PID] is Mutt's PID, and [serial] is a number to make the filename unique.
  3. [timestamp].[PID]_[serial].[hostname]
    This is the same as #2, but something didn't strip off the hostname.
  4. [timestamp].M[microseconds]P[PID].txt
    This is used by some newer maildir scripts to generate more unique filenames by including the microsecond portion in the timestamp.
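
The four formats can be told apart mechanically. Below is a minimal Python sketch (not a tool distributed with the archive) that classifies a file name by its number in the list above and decodes its fields. It assumes that [timestamp] is Unix epoch seconds and that the PID, serial, and microsecond fields are decimal numbers; the function name and sample file names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# One regex per naming format listed above, tried in order. Format 2 is
# checked before format 3 so that a ".txt" suffix is not taken for a hostname.
# Assumption: [timestamp] is Unix epoch seconds; the other fields are decimal.
FORMATS = [
    (1, re.compile(r"^(?P<ts>\d+)\.(?P<pid>\d+)\.txt$")),
    (2, re.compile(r"^(?P<ts>\d+)\.(?P<pid>\d+)_(?P<serial>\d+)\.txt$")),
    (3, re.compile(r"^(?P<ts>\d+)\.(?P<pid>\d+)_(?P<serial>\d+)\.(?P<host>.+)$")),
    (4, re.compile(r"^(?P<ts>\d+)\.M(?P<usec>\d+)P(?P<pid>\d+)\.txt$")),
]


def parse_name(name):
    """Return (format_number, fields) for an archive file name, or None."""
    for number, pattern in FORMATS:
        match = pattern.match(name)
        if match is None:
            continue
        fields = match.groupdict()
        # Decode the timestamp, adding the microsecond part for format 4.
        when = datetime.fromtimestamp(int(fields["ts"]), tz=timezone.utc)
        when += timedelta(microseconds=int(fields.get("usec", 0)))
        fields["when"] = when
        return number, fields
    return None
```

For example, `parse_name("1041379200.12345_3.txt")` reports format 2 with serial `"3"`. Trying the patterns in a fixed order keeps each regex simple, at the cost of relying on that order to separate formats 2 and 3.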

Index of spam


Name                        Modification Time    Size
-----------------------------------------------------
1997-1998-headers.tar.bz2   1998-03-26 11:15      67k
1997-1998-spam-headers.bz2  1998-03-26 10:48      66k
1998.7z                     2005-09-05 13:33     754k
1999.7z                     2005-09-05 13:33     898k
2000.7z                     2005-09-05 13:34     1.5M
2001.7z                     2005-09-05 13:39     5.8M
2002.7z                     2005-09-05 17:04    11.2M
2003.7z                     2005-09-07 12:57    28.9M
2004.7z                     2005-09-07 12:57    53.6M
2005.7z                     2006-06-19 12:30    37.4M
2006.7z                     2007-08-06 12:25   103.5M
2007.7z                     2008-01-01 15:15    87.1M
2008.7z                     2009-01-01 14:32   129.4M
2009.7z                     2010-01-01 11:01   137.0M
2010.7z                     2011-01-01 22:17   201.9M
2011.7z                     2012-01-02 11:32    95.2M
2012.7z                     2013-01-02 10:57   145.6M
2013.7z                     2014-01-01 13:15   132.5M
2014-01.7z                  2014-02-01 06:22     9.2M
2014-02.7z                  2014-03-01 06:11     7.9M
2014-03.7z                  2014-04-01 06:15    12.1M
2014-04.7z                  2014-05-01 06:25    12.4M
2014-05.7z                  2014-06-01 06:20    13.2M
2014-06.7z                  2014-07-01 06:17    13.7M
2014-07.7z                  2014-08-01 06:17    12.0M
2014-08.7z                  2014-08-28 05:31     9.8M
attachments                 2014-08-28 05:31        -
