This directory contains all the spam that I have received since early 1998. I have employed various "bait" addresses, such as <bait@em.ca>, to trick email address harvesters into putting them on spam lists. The archives have been (re)compressed with p7zip, which produces the smallest file size of any tool I have tried.
This archive is provided for the purposes of researching the behavior of spammers and developing new spam management techniques. Permission is hereby granted to use this archive without restriction. If you publish any research or software based on this archive, I would appreciate a reference to this archive in said work, but it is not required. I would also like to see any such work, and may link it here.
If you have any comments, please e-mail me at bruce@untroubled.org.
NOTE: Most of the messages in this archive contain forged headers in one form or another. The fact that a message claims to have come from one particular address or another does not mean it actually originated at that address. The only way to determine where a message originated is to do a careful study of the Received: headers, and even then much of the information cannot be trusted.
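As a starting point for that kind of study, the following is a minimal sketch of pulling the Received: headers out of a single message with Python's standard `email` module. The sample message, its host names, and the `received_chain` helper are made up for illustration and are not taken from the archive. Each relay prepends its own Received: line, so the first entry is the hop closest to the recipient; entries further down were written by earlier hosts and are the ones most likely to be forged.

```python
from email import message_from_string

# Hypothetical sample message; the hosts and addresses are placeholders.
SAMPLE = (
    "Received: from relay.example.net (relay.example.net [192.0.2.10])\n"
    "\tby mx.example.org with SMTP; Mon, 1 Jan 2024 05:55:00 +0000\n"
    "Received: from unknown (HELO forged.example.com) (198.51.100.7)\n"
    "\tby relay.example.net with SMTP; Mon, 1 Jan 2024 05:54:58 +0000\n"
    "From: someone@forged.example.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def received_chain(raw_message):
    """Return the Received: headers in the order they appear (newest hop first)."""
    msg = message_from_string(raw_message)
    hops = msg.get_all("Received", [])
    # Folded headers span several lines; collapse them to single-space text.
    return [" ".join(hop.split()) for hop in hops]


if __name__ == "__main__":
    for number, hop in enumerate(received_chain(SAMPLE), start=1):
        print(number, hop)
```

This only lists the headers. Deciding which hops to believe still requires comparing each "by" host against the "from" host of the line above it, and trusting only the lines added by servers you control.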
WARNING: I have recently had issues with certain clients refetching these files hundreds of times per day. If I discover you abusing this service, I will block your IP. The files in this directory are updated once per day, so there is no point in fetching them more often than that.
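For anyone mirroring these files automatically, one way to stay within that limit is to check the age of the local copy before downloading again. The sketch below is only an illustration: the `needs_refresh` helper and the `ONE_DAY` constant are hypothetical names, and the actual download step is left out.

```python
import os
import time

ONE_DAY = 24 * 60 * 60  # seconds; the files here change at most once per day


def needs_refresh(local_path, max_age=ONE_DAY, now=None):
    """Return True if the local copy is missing or at least max_age seconds old."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(local_path)
    except OSError:
        # No local copy yet (or it is unreadable), so a fetch is justified.
        return True
    return (now - mtime) >= max_age


if __name__ == "__main__":
    path = "2024-12.7z"
    if needs_refresh(path):
        print("local copy is stale or missing; fetch it once")
    else:
        print("local copy is less than a day old; skip the download")
```

The archives for past years rarely change, so a once-per-day check is a conservative upper bound for those files.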
This archive was used in the following reports or sites:
The number of messages in the archive for 2007 is lower than for 2006 or 2008. One of the spam traps I had in place was a wildcard address. During 2006, this address started receiving increasingly large amounts of spam, making it hard to process the mail effectively. Since it was all duplicates of other spam I received, I disabled the wildcard address. By 2008, the amount of spam to the other addresses had increased back to the same levels.
At the beginning of April 2018, I let the bruce-guenter.dyndns.org domain name expire. Most of the spam I received came through that domain name. As such, this archive may not be of much use going forward.
Several people have asked about the file names within the archives. There have been a variety of methods used to move messages from my mailboxes into the archive over time. The formats are:
Name | Modification Time | Size | |
---|---|---|---|
1997-1998-headers.tar.bz2 | 1998-03-26 11:15 | 67k | |
1997-1998-spam-headers.bz2 | 1998-03-26 10:48 | 66k | |
1998.7z | 2005-09-05 13:33 | 754k | |
1999.7z | 2005-09-05 13:33 | 898k | |
2000.7z | 2005-09-05 13:34 | 1.5M | |
2001.7z | 2005-09-05 13:39 | 5.8M | |
2002.7z | 2005-09-05 17:04 | 11.2M | |
2003.7z | 2005-09-07 12:57 | 28.9M | |
2004.7z | 2005-09-07 12:57 | 53.6M | |
2005.7z | 2006-06-19 12:30 | 37.4M | |
2006.7z | 2007-08-06 12:25 | 103.5M | |
2007.7z | 2008-01-01 15:15 | 87.1M | |
2008.7z | 2009-01-01 14:32 | 129.4M | |
2009.7z | 2010-01-01 11:01 | 137.0M | |
2010.7z | 2011-01-01 22:17 | 201.9M | |
2011.7z | 2012-01-02 11:32 | 95.2M | |
2012.7z | 2013-01-02 10:57 | 145.6M | |
2013.7z | 2014-01-01 13:15 | 132.5M | |
2014.7z | 2015-01-02 09:04 | 136.4M | |
2015.7z | 2016-05-30 12:40 | 148.4M | |
2016.7z | 2017-01-21 17:24 | 136.1M | |
2017.7z | 2018-03-16 16:53 | 259.5M | |
2018.7z | 2019-01-25 16:39 | 19.0M | |
2019.7z | 2020-07-16 17:15 | 19.8M | |
2020.7z | 2021-09-22 09:26 | 35.1M | |
2021.7z | 2022-06-19 19:59 | 21.5M | |
2022.7z | 2023-01-07 20:22 | 18.6M | |
2023-01.7z | 2023-01-31 05:30 | 2.1M | |
2023-02.7z | 2023-03-05 15:02 | 2.3M | |
2023-03.7z | 2023-03-31 05:30 | 8.1M | |
2023-04.7z | 2023-04-30 05:30 | 3.9M | |
2023-05.7z | 2023-05-31 05:30 | 2.4M | |
2023-06.7z | 2023-06-30 05:30 | 1.7M | |
2023-07.7z | 2023-07-31 05:30 | 3.3M | |
2023-08.7z | 2023-09-03 15:57 | 1.3M | |
2023-09.7z | 2023-09-29 05:30 | 1.7M | |
2023-10.7z | 2023-11-01 05:55 | 1.6M | |
2023-11.7z | 2023-12-01 05:55 | 2.1M | |
2023-12.7z | 2024-01-01 05:55 | 654k | |
2024-01.7z | 2024-02-01 05:55 | 3.1M | |
2024-02.7z | 2024-03-01 05:55 | 2.4M | |
2024-03.7z | 2024-04-01 05:55 | 667k | |
2024-04.7z | 2024-05-01 05:55 | 1.6M | |
2024-05.7z | 2024-06-01 05:55 | 2.5M | |
2024-06.7z | 2024-07-01 05:55 | 748k | |
2024-07.7z | 2024-08-01 05:55 | 847k | |
2024-08.7z | 2024-09-01 05:55 | 840k | |
2024-09.7z | 2024-10-01 05:55 | 909k | |
2024-10.7z | 2024-11-01 05:55 | 1.2M | |
2024-11.7z | 2024-12-01 05:55 | 1.1M | |
2024-12.7z | 2024-12-21 05:30 | 777k | |
attachments | 2024-12-21 05:30 | - | |