
In-depth analysis of comment spam

Before we get to methods for fighting comment spam, it is very important to understand how spammers and spambots work, what methods they use, and what types of comments they post.

This analysis is based on the comment spam received on www.dev4press.com. The Dev4Press website is a network of over 20 websites, but comments are allowed only on the main website, which includes the blog. Before we go on, you might want to check out the previous article: Comment spam: how does it work?

Comments Settings

WordPress comment settings are configured so that anyone can comment (with or without a user account), comment authors must have a previously approved comment (so, basically, all comments from new authors are moderated), and comments are held for moderation if they contain 3 or more links. There are no keywords in the moderation list or the blacklist.
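To make these rules concrete, here is a simplified Python model of the moderation decision described above. It is not WordPress's own code: the `should_hold` function, its parameters, and the regular expression that counts `<a href` tags are assumptions for illustration, and real WordPress checks more things (moderation keys, the blacklist, and so on).

```python
import re

# Assumed threshold: hold comments with 3 or more links (this site's setting).
MAX_LINKS = 3

# Hypothetical link pattern: counts HTML anchor tags that have an href attribute.
LINK_RE = re.compile(r"<a\s[^>]*href", re.IGNORECASE)


def should_hold(content: str, author_has_approved_comment: bool) -> bool:
    """Return True if a comment should go to the moderation queue.

    A simplified model of the settings described above, not the real
    WordPress implementation.
    """
    # Rule 1: too many links sends the comment to moderation.
    if len(LINK_RE.findall(content)) >= MAX_LINKS:
        return True
    # Rule 2: authors without a previously approved comment are moderated.
    return not author_has_approved_comment
```

Under this model, a comment with 2 links from an author who already has an approved comment goes through, while a third link sends it to the moderation queue.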

Analysis period and basic stats

This analysis covers the comment spam on Dev4Press for all of 2015. In this period, the Dev4Press blog received a total of 628,925 spam comments. No spam appeared on the website; it was all caught either by WordPress (comments with too many links) or by the Antispam Bee plugin. The analysis focuses on the structure of the comments (comment length, number of links) and on the user accounts posting spam.

For this period, there were a total of 2,153,579 links in all the spam comments combined. That is 3.42 links per spam comment. A total of 538,913 spam comments were posted by registered user accounts. For this, spammers used 213 different registered accounts, with about 60 of them in use in an average month. These accounts used email addresses from 54 different domains (many of these are no longer active).
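The two ratios behind these numbers can be double-checked with a few lines of Python; the constants are the yearly totals reported in this article, and the variable names are only illustrative.

```python
# Yearly totals for 2015, as reported in this article.
TOTAL_SPAM = 628_925
TOTAL_LINKS = 2_153_579
SPAM_FROM_REGISTERED = 538_913

links_per_comment = TOTAL_LINKS / TOTAL_SPAM
registered_share = SPAM_FROM_REGISTERED / TOTAL_SPAM

print(f"Links per spam comment: {links_per_comment:.2f}")  # 3.42
print(f"Posted by registered accounts: {registered_share:.1%}")  # 85.7%
```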

Spam and links

Let’s start with the number of links in each spam comment. This is very important because the number of links is the easiest way to spot a spam comment. The data also shows the number of comments that had links wrapped in the BBCode URL tag ([url]). This number rose sharply in the last 3 months of 2015, but I am not sure why.

Month       Spam comments       Links  Comments with BBCode links
January            49,625     137,241                         299
February           40,200     106,583                         966
March              43,425     118,266                       1,104
April              46,025     117,415                       1,219
May                43,025     111,780                         788
June               54,900     160,839                         529
July               62,925     212,957                         552
August             73,025     240,235                         230
September          58,550     211,163                         805
October            60,750     229,713                       9,591
November           59,025     296,125                      19,389
December           37,450     211,262                      15,433
Total             628,925   2,153,579                      50,905
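Dividing links by spam comments for each month shows where the yearly average comes from. The sketch below recomputes the monthly averages from the table; the `links_per_comment` helper is a hypothetical name.

```python
# Monthly totals for 2015 as (spam comments, links), taken from the table above.
MONTHLY = {
    "January": (49_625, 137_241),
    "February": (40_200, 106_583),
    "March": (43_425, 118_266),
    "April": (46_025, 117_415),
    "May": (43_025, 111_780),
    "June": (54_900, 160_839),
    "July": (62_925, 212_957),
    "August": (73_025, 240_235),
    "September": (58_550, 211_163),
    "October": (60_750, 229_713),
    "November": (59_025, 296_125),
    "December": (37_450, 211_262),
}


def links_per_comment(data: dict) -> dict:
    """Average number of links per spam comment, for each month."""
    return {month: links / spam for month, (spam, links) in data.items()}


averages = links_per_comment(MONTHLY)
for month, avg in averages.items():
    print(f"{month:<10} {avg:.2f}")
```

Only November and December come out above 5 links per comment; every other month stays below 4.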

And here is the chart.

Chart: Spam comments and number of links

In the last two months of 2015, the number of links per comment kept rising, and the average was over 5 links per comment (about 5.0 in November and 5.6 in December), compared to 3.42 for the whole year. I am not sure what the reason for this is, but the most likely explanation is that the last two months include the biggest holidays of the year, which pushed spammers to work harder.

Some comments had more than 200 links, but most comments had only 2 links, and some had no links! Here is the chart:

Chart: Spam comments with 2 links and with 0 links

The interesting thing about the comments with no links is that they most likely represent bugs in the spammers' software: the comment text looked like something cut from a larger text, with either the beginning or the end obviously missing. And the fact that most comments had only 2 links suggests that comments are tailored to pass undetected through the WordPress link-count check, which on this website holds comments with 3 or more links.
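Counting links is simple in principle, but spam can wrap links in more than one markup style, such as the BBCode [url] tags counted in the table above. A minimal Python sketch that counts both HTML anchors and BBCode links might look like this; the regular expressions and the `count_links` helper are assumptions, and real spam uses more formats (bare URLs, for example) that this does not catch.

```python
import re

# Assumed patterns: HTML anchors with href, and BBCode [url] / [url=...] opening tags.
HTML_LINK_RE = re.compile(r"<a\s[^>]*href", re.IGNORECASE)
BBCODE_LINK_RE = re.compile(r"\[url(?:=[^\]]*)?\]", re.IGNORECASE)


def count_links(content: str) -> dict:
    """Count HTML anchor links and BBCode [url] links in comment text."""
    html = len(HTML_LINK_RE.findall(content))
    bbcode = len(BBCODE_LINK_RE.findall(content))
    return {"html": html, "bbcode": bbcode, "total": html + bbcode}


sample = 'Nice post! <a href="http://example.com">x</a> [url=http://example.org]y[/url]'
print(count_links(sample))  # {'html': 1, 'bbcode': 1, 'total': 2}
```

Only opening tags are matched, so a closing [/url] or </a> is not counted twice.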

Spam comments size

The size of comments is directly linked to the number of links, so it is understandable that about 71% of all comments had less than 1KB of content, while 29% had more than 1KB. In the most extreme cases, comments had more than 20KB of content. And, yes, there were comments whose content was a single word (or a few words). Such comments rely on the single link in the URL field and have no links in the content. Approximately 2% of all spam comments were under 20 characters long.
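The size groups used here can be expressed as a small classification function. This is a sketch under assumptions: the article does not say how size was measured, so the code uses UTF-8 bytes for the KB thresholds (1KB = 1024 bytes) and characters for the 20-character threshold, and the `size_bucket` name is hypothetical.

```python
from collections import Counter


def size_bucket(content: str) -> str:
    """Place comment content into one of the size groups used in this analysis.

    Assumptions: the short-comment check counts characters, while the KB
    thresholds count UTF-8 bytes, with 1KB = 1024 bytes.
    """
    if len(content) < 20:
        return "under 20 characters"
    size = len(content.encode("utf-8"))
    if size < 1024:
        return "under 1KB"
    if size <= 20 * 1024:
        return "1KB to 20KB"
    return "over 20KB"


# Hypothetical sample comments, only to show how the buckets are tallied.
samples = ["Great post!", "a" * 500, "b" * 5_000, "c" * 30_000]
print(Counter(size_bucket(c) for c in samples))
```

In this sketch the shortest comments get their own bucket, while the article's under-1KB share includes them.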

Spam users

The most important metric in this analysis shows how the spam got delivered. I was surprised to learn that over 85% of all spam comments were posted using registered user accounts. The Dev4Press website allows free account registration, and spammers use that to create accounts for delivering spam. The logic is that spam filters might let registered users post comments without checking them. This only proves that spam filters need to check all comments, regardless of the user account used.

Month       Spam comments  Registered users  Visitors  Accounts
January            49,625            43,286     6,339        55
February           40,200            34,983     5,217        49
March              43,425            37,444     5,981        54
April              46,025            39,100     6,925        60
May                43,025            37,007     6,018        63
June               54,900            47,725     7,175        61
July               62,925            53,360     9,565        61
August             73,025            64,124     8,901        68
September          58,550            50,485     8,065        72
October            60,750            52,141     8,609        68
November           59,025            48,760    10,265        67
December           37,450            30,498     6,952        45
Total             628,925           538,913    90,012
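To see whether the registered-account share holds across the year and not just in the totals, the sketch below recomputes it for each month from the table; the dictionary layout is simply one convenient choice.

```python
# (spam from registered users, spam from visitors) per month in 2015, from the table above.
MONTHLY = {
    "January": (43_286, 6_339),
    "February": (34_983, 5_217),
    "March": (37_444, 5_981),
    "April": (39_100, 6_925),
    "May": (37_007, 6_018),
    "June": (47_725, 7_175),
    "July": (53_360, 9_565),
    "August": (64_124, 8_901),
    "September": (50_485, 8_065),
    "October": (52_141, 8_609),
    "November": (48_760, 10_265),
    "December": (30_498, 6_952),
}

# Share of each month's spam that came from registered accounts.
shares = {
    month: users / (users + visitors)
    for month, (users, visitors) in MONTHLY.items()
}

for month, share in shares.items():
    print(f"{month:<10} {share:.1%}")
```

Every month stays above 80%, with December lowest at about 81.4%.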

And here is the chart:

Chart: Spam comments by users and visitors

Accounts used for spamming overlapped from month to month, but most accounts were active for only 2-3 months. Most domains used for the email addresses are no longer active, and most of them were related to ‘adult toys’ and all sorts of variations on that. But there were also regular Gmail, Yahoo, and other popular free mail services. A total of 7,233 IP addresses were used to deliver spam to Dev4Press during 2015.
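Statements like "most accounts were active for 2-3 months" come from grouping spam comments by account. A minimal sketch, assuming each spam comment is available as a hypothetical (account, month, ip) record, could count active months per account and distinct IP addresses overall:

```python
from collections import defaultdict


def activity_summary(records):
    """Summarise spam records given as (account, month, ip) tuples.

    The record format is an assumption for illustration. Returns the number
    of distinct months each account was active in, and the number of
    distinct IP addresses seen across all records.
    """
    months_by_account = defaultdict(set)
    ips = set()
    for account, month, ip in records:
        months_by_account[account].add(month)
        ips.add(ip)
    active = {account: len(months) for account, months in months_by_account.items()}
    return active, len(ips)


# Made-up example records, not data from this analysis.
records = [
    ("spammer1", 1, "192.0.2.1"),
    ("spammer1", 2, "192.0.2.1"),
    ("spammer1", 3, "192.0.2.7"),
    ("spammer2", 3, "198.51.100.4"),
]
active, ip_count = activity_summary(records)
print(active, ip_count)
```

Using sets means repeated comments from the same account in the same month, or from the same IP, are only counted once.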

Conclusion

This analysis helped me better understand how spam gets delivered, what types of accounts are used, what the comments look like, and what kind of content they include. Different websites will get different results, influenced by their comment settings and account registration policy. However, I have compared data from other websites that allow free account registration, and they show the same tendency: most spam comments get delivered by registered accounts.

The next posts in this series will deal with spam prevention methods, starting with honeypots, followed by reCAPTCHA and other spam-fighting methods.


About the author

Milan Petrovic

CEO and lead developer of Dev4Press Web Development company, working with WordPress since 2008, first as a freelancer, later founding his own development company. Author of more than 250 plugins and more than 20 themes.
