Before we get to methods for fighting comment spam, it is important to understand how spammers and spambots work, what methods they use, and what types of comments they post.

This analysis is based on the comment spam received on www.dev4press.com. The Dev4Press website is a network of over 20 websites, but comments are allowed only on the main website, which includes the blog. Before we go on, you might want to check out the previous article: Comment spam: how does it work?

Comments Settings

The WordPress comment settings are as follows: anyone can comment (with or without a user account), a comment author must have a previously approved comment (so, basically, all comments from new authors are moderated), and comments are held for moderation if they contain 3 or more links. There are no keywords in the moderation list or the blacklist.
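The settings above boil down to a simple decision rule. Here is a minimal Python sketch of that rule, for illustration only: it is not WordPress code, the function names and the `MAX_LINKS` constant are hypothetical, and the link counting is a simplification that only looks at bare http(s) URLs.

```python
import re

# Assumed threshold, mirroring the "3 or more links" setting described above.
MAX_LINKS = 3

# Simplified link pattern: counts bare http(s) URLs only.
# WordPress's own counting logic may differ; this is an illustration.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def count_links(text: str) -> int:
    """Count the bare URLs found in the comment text."""
    return len(URL_PATTERN.findall(text))


def needs_moderation(text: str, author_previously_approved: bool) -> bool:
    """Return True if the comment should be held for manual review."""
    # New authors are always moderated first.
    if not author_previously_approved:
        return True
    # Known authors are still held when the comment has too many links.
    return count_links(text) >= MAX_LINKS
```

With these rules, a comment with 2 links from an already approved author would go straight through, which is worth keeping in mind for the link counts later in this post.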

Analysis period and basic stats

This analysis deals with the comment spam on Dev4Press for the whole of 2015. In this period, the Dev4Press blog received a total of 628,925 spam comments. No spam appeared on the website; it was all caught by WordPress (comments with too many links) or by the Antispam Bee plugin. The analysis focuses on the structure of the comments (length of the comment, number of links) and on the user accounts posting spam.

For this period, there was a total of 2,153,579 links in all the spam comments combined. That is 3.41 links per spam comment on average. 538,913 spam comments were posted by registered user accounts. For this, spammers used 213 different registered accounts, with about 60 accounts in use each month on average. These accounts used emails belonging to 54 different domains (many of these are no longer active).
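To put the account numbers in perspective, here is a short Python calculation based only on the totals quoted above. The variable names are mine, and the per-account figure is a plain average, so it says nothing about how unevenly the accounts were actually used.

```python
# Totals for 2015, as quoted in the text above.
total_spam = 628_925
spam_by_registered = 538_913
registered_accounts = 213

# Spam posted without a user account is the remainder.
spam_by_visitors = total_spam - spam_by_registered

# Share of all spam that came from registered accounts.
registered_share = spam_by_registered / total_spam

# Plain average of spam comments per registered account over the year.
comments_per_account = spam_by_registered / registered_accounts

print(f"Spam from visitors:      {spam_by_visitors:,}")
print(f"Registered share:        {registered_share:.1%}")
print(f"Comments per account:    {comments_per_account:,.0f}")
```

In other words, each spam account posted roughly two and a half thousand comments during the year if the load was spread evenly.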

Spam and links

Let’s start with the number of links in each spam comment. This is very important because the number of links is the easiest way to spot a spam comment. The data also shows the number of comments that had links wrapped in the BBCode URL tag. This number rose sharply in the last 3 months of 2015, but I am not sure why.
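For readers who want to run a similar count on their own comments, here is a rough Python sketch of how links in HTML anchors and in BBCode URL tags could be told apart. This is not the code used for this analysis; the patterns and the function name are assumptions, and real spam will contain variations that these simple patterns miss.

```python
import re

# HTML anchors, e.g. <a href="http://example.com">text</a>
HTML_LINK = re.compile(r"<a\s[^>]*href\s*=", re.IGNORECASE)

# BBCode URL tags, e.g. [url]http://example.com[/url]
# or [url=http://example.com]text[/url]. Closing tags are not matched.
BBCODE_LINK = re.compile(r"\[url(?:=[^\]]*)?\]", re.IGNORECASE)


def link_styles(text: str) -> dict:
    """Count links by markup style in a single comment."""
    return {
        "html": len(HTML_LINK.findall(text)),
        "bbcode": len(BBCODE_LINK.findall(text)),
    }


sample = (
    'Great site! <a href="http://a.example">cheap stuff</a> '
    "[url=http://b.example]more[/url] [URL]http://c.example[/URL]"
)
stats = link_styles(sample)
print(stats)
```

A comment would then count as "with BBCodes" whenever the `bbcode` value is greater than zero.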

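| Month | Spam | Links | With BBCodes |
|---|---|---|---|
| January | 49,625 | 137,241 | 299 |
| February | 40,200 | 106,583 | 966 |
| March | 43,425 | 118,266 | 1,104 |
| April | 46,025 | 117,415 | 1,219 |
| May | 43,025 | 111,780 | 788 |
| June | 54,900 | 160,839 | 529 |
| July | 62,925 | 212,957 | 552 |
| August | 73,025 | 240,235 | 230 |
| September | 58,550 | 211,163 | 805 |
| October | 60,750 | 229,713 | 9,591 |
| November | 59,025 | 296,125 | 19,389 |
| December | 37,450 | 211,262 | 15,433 |
| Total | 628,925 | 2,153,579 | 50,905 |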

And here is the chart.

Spam comments and number of links

In the last 2 months of 2015, the number of links per comment rose, and the average was over 5 links in each comment. The overall average for the whole year is 3.41. I am not sure what the reason for this is, but the most likely explanation is that the last two months include the biggest holidays of the year, which pushed spammers to work more.
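The averages can be checked directly from the monthly table. The Python sketch below uses only the numbers from that table; the variable names are mine. It computes the average in two ways: the mean of the twelve monthly ratios comes out at about 3.41, the figure quoted above, while dividing the yearly totals gives a marginally higher value of about 3.42, because months with more spam weigh more in that version.

```python
# Monthly figures copied from the table above: (month, spam comments, links).
MONTHLY = [
    ("January", 49_625, 137_241),
    ("February", 40_200, 106_583),
    ("March", 43_425, 118_266),
    ("April", 46_025, 117_415),
    ("May", 43_025, 111_780),
    ("June", 54_900, 160_839),
    ("July", 62_925, 212_957),
    ("August", 73_025, 240_235),
    ("September", 58_550, 211_163),
    ("October", 60_750, 229_713),
    ("November", 59_025, 296_125),
    ("December", 37_450, 211_262),
]

# Links per spam comment for each month.
links_per_comment = {month: links / spam for month, spam, links in MONTHLY}

# Unweighted mean of the twelve monthly ratios.
mean_of_monthly = sum(links_per_comment.values()) / len(links_per_comment)

# Weighted version: yearly total of links divided by yearly total of comments.
overall = sum(links for _, _, links in MONTHLY) / sum(spam for _, spam, _ in MONTHLY)

for month, ratio in links_per_comment.items():
    print(f"{month:<10} {ratio:.2f}")
print(f"Mean of monthly ratios: {mean_of_monthly:.2f}")
print(f"Totals-based average:   {overall:.2f}")
```

November and December are the only months where the ratio goes above 5; every other month stays under 4.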

Some comments had more than 200 links, but most comments had only 2 links, and some had no links! Here is the chart:

Spam comments with 2 links and with 0 links

The interesting thing about the comments with no links is that they most likely result from bugs in the spammers' software: the comment text looked like something cut from a larger text, and it was obvious that either the beginning or the end was missing. And the fact that most comments had only 2 links suggests that comments are tailored to pass undetected through the default WordPress filters.

Spam comments size

The size of a comment is directly linked to the number of links in it, so it is understandable that over 70% of all comments had under 1KB of comment content, while 29% of comments were over 1KB in size. In the most extreme cases, comments had more than 20KB of content. And, yes, there were comments made of only one word or a few words. Such comments rely on the single link in the URL field and have no links in the content. Approximately 2% of all spam comments were under 20 characters in length.
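Here is a small Python sketch of how comments could be sorted into the size groups mentioned above. The bucket names, the sample comments, and the choice to measure size in UTF-8 bytes are my assumptions for this example. Note that the buckets here are mutually exclusive, so the very short comments are counted separately from the other comments under 1KB.

```python
from collections import Counter


def size_bucket(content: str) -> str:
    """Assign a comment to a size group based on its content."""
    if len(content) < 20:
        return "under 20 chars"
    # Size in bytes, assuming the content is stored as UTF-8.
    if len(content.encode("utf-8")) < 1024:
        return "under 1KB"
    return "1KB or more"


# Hypothetical sample comments, not taken from the real data.
comments = [
    "Nice post!",
    "I really like your blog, check my site http://a.example for more.",
    "buy now http://b.example " * 100,
]

distribution = Counter(size_bucket(c) for c in comments)
for bucket, count in distribution.items():
    print(f"{bucket}: {count}")
```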

Spam users

The most important metric in this analysis shows how the spam got delivered. I was surprised to learn that over 85% of all spam comments were posted using registered user accounts. The Dev4Press website allows free account registration, so spammers use that to create accounts that are then used to deliver spam. The logic is that spam filters might let registered users post comments without any checks. This only proves that spam filters need to check all comments, regardless of the user account used.

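| Month | Spam | Users | Visitors | Accounts |
|---|---|---|---|---|
| January | 49,625 | 43,286 | 6,339 | 55 |
| February | 40,200 | 34,983 | 5,217 | 49 |
| March | 43,425 | 37,444 | 5,981 | 54 |
| April | 46,025 | 39,100 | 6,925 | 60 |
| May | 43,025 | 37,007 | 6,018 | 63 |
| June | 54,900 | 47,725 | 7,175 | 61 |
| July | 62,925 | 53,360 | 9,565 | 61 |
| August | 73,025 | 64,124 | 8,901 | 68 |
| September | 58,550 | 50,485 | 8,065 | 72 |
| October | 60,750 | 52,141 | 8,609 | 68 |
| November | 59,025 | 48,760 | 10,265 | 67 |
| December | 37,450 | 30,498 | 6,952 | 45 |
| Total | 628,925 | 538,913 | 90,012 | |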
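The share of spam posted by registered accounts can also be worked out month by month. This Python sketch uses only the numbers from the table above, with variable names of my own choosing. The share stays between roughly 81% (December) and 88% (August), so the result is not driven by one unusual month.

```python
# Monthly figures copied from the table above:
# (month, all spam, spam by registered users, spam by visitors).
MONTHLY = [
    ("January", 49_625, 43_286, 6_339),
    ("February", 40_200, 34_983, 5_217),
    ("March", 43_425, 37_444, 5_981),
    ("April", 46_025, 39_100, 6_925),
    ("May", 43_025, 37_007, 6_018),
    ("June", 54_900, 47_725, 7_175),
    ("July", 62_925, 53_360, 9_565),
    ("August", 73_025, 64_124, 8_901),
    ("September", 58_550, 50_485, 8_065),
    ("October", 60_750, 52_141, 8_609),
    ("November", 59_025, 48_760, 10_265),
    ("December", 37_450, 30_498, 6_952),
]

# Share of each month's spam that came from registered accounts.
user_share = {month: users / spam for month, spam, users, _ in MONTHLY}

lowest = min(user_share, key=user_share.get)
highest = max(user_share, key=user_share.get)

for month, share in user_share.items():
    print(f"{month:<10} {share:.1%}")
print(f"Lowest: {lowest}, highest: {highest}")
```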

And here is the chart:

Spam comments by users and visitors

The accounts used for spamming overlapped from month to month, but most accounts were active for only 2-3 months. Most domains used for the emails are no longer active, and most of them were related to ‘adult toys’ in all sorts of variations. But there were also addresses from regular Gmail, Yahoo, and other popular free mail services. A total of 7,233 IP addresses were used to deliver spam to Dev4Press during 2015.

Conclusion

This whole analysis was very useful to me: it helped me better understand how spam gets delivered, what types of accounts are used, what the comments look like, and what kind of content they include. Different websites will get different results, depending on their comment settings and their policy on account registration. However, I have compared this with data from other websites that allow free account registration, and they show the same tendency: most spam comments get delivered by registered accounts.

The next posts in this series will deal with methods for spam prevention, starting with the use of honeypots, followed by reCAPTCHA and other spam-fighting methods.

About the author

MillaN
Dev4Press owner and lead developer

Programmer since the age of 12 and now WordPress developer with more than 8 years of WordPress experience, author of more than 100 plugins and more than 20 themes.
