A week ago, my website suddenly became unresponsive. Pages took forever to load, posting from the dashboard was a nightmare, and worst of all, I was inundated with "500" database connection errors. I place a lot of emphasis on keeping my site running fast and smoothly, and delays of this sort were disastrous for me. I was at my wit's end.
I have a shared hosting plan with Bluehost and also use the free Cloudflare service, so there weren't a lot of resources at my disposal either.
Errors of the sort shown in the screenshot below would often disappear if I refreshed the page. I tried "waiting it out", figuring that Bluehost was having issues at their end, but it continued for several days. At times my blog would become completely unresponsive, sit there trying to connect, and then show an error page.
Later, my server logs revealed that I was getting around 1,500 error responses on my site every day! That meant legitimate visitors were being forced to wait for long periods and ultimately not getting the content they came for.
The WordPress forums didn't help. They all dealt with issues like the username/password combination being wrong, or replacing "localhost" with the server's name in wp-config.php. But none of these explained why the errors were random: a genuine configuration problem wouldn't vanish on a page refresh. Besides, I've been maintaining this blog for several years now and nothing of this sort had ever happened before. I hadn't changed anything.
So I called Bluehost to complain. After the woman on the other end consulted with a tech person, she told me that my site was being overwhelmed by visitors: the shared plan I was on allowed only 15 database connections per second. She told me to be happy that I had outgrown my plan! I told her this couldn't be right, because Google Analytics wasn't showing any such dramatic increase in visitors. She then visited the stats page, and there we found the problem.
My site was swamped with spam and bots. Halfway through the month, a single unknown bot had eaten up 1.5 GB of bandwidth and hit my site 29,000 times! In addition, Bluehost has a CPU throttling facility: if a site uses too much CPU power to the detriment of the others, it gets cut back. That screen showed my CPU being throttled for around 60,000 seconds a day, which works out to about 70% of the time! Well, at least now that I knew what was wrong I could try to do something about it.
I had become complacent about traffic and bot management on WordPress because I use Cloudflare, and they're supposed to block out most undesirables. Unfortunately, I had found that even the "medium" security setting was blocking many legitimate users of my site, who complained, so my filter was set to "low". Clearly this wasn't doing a good enough job. So I temporarily put it on "high" while I sorted out the issue at my end.
To no avail. The bots kept coming, and now that I knew what to look for, I was shocked at how aggressive they were. My site was completely inaccessible, and I had to switch on "I'm Under Attack!" mode in the Cloudflare control panel, which shows every visitor a 5-second interstitial and turns them away if they look like bots. I had to do this just to access my own site while I tried to fix things.
“Bad Behavior”, “Better WP Security” and “ecSTATic”
I noticed that my "wp-login" page was being hit the most. I had to beef up my security, and two plugins came to the rescue. "Bad Behavior" is one that modifies your .htaccess file to block all kinds of baddies at your doorstep. It also lets you enter your Project Honey Pot key so that any visitor with a known threat level of 25 or above is stopped. I had to sign up with Project Honey Pot to get a key, and I also enabled "Strict checking" in the plugin. I'm not sure exactly what that option does, but it blocks more malicious visitors.
More important than Bad Behavior, though, was a plugin called "Better WP Security" that really hardens your site. Among other things, it moves the admin and login pages to different URLs, so bots hammering the default locations get a 403 instead of wasting your bandwidth, even if they never succeed at breaking in. This alone saved my site from thousands of hits. But the plugin does a LOT more besides, from removing the default "admin" user, to letting you ban bots and users from your site, to scanning it for security vulnerabilities.
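If you'd rather not rely on a plugin, a rough manual equivalent of hiding the login page is to lock down wp-login.php in .htaccess so only your own IP address can reach it. This is a sketch of the general technique, not what Better WP Security itself does, and the IP below is just a placeholder:

```
# Deny everyone except your own IP (203.0.113.5 is a placeholder; use yours)
<Files wp-login.php>
Order Deny,Allow
Deny from all
Allow from 203.0.113.5
</Files>
```

The obvious downside is that you can no longer log in from a coffee shop or a friend's computer, which is why moving the login URL is often the more practical option.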
The final plugin is called "ecSTATic", a very powerful plugin for banning bots from your site. It has an option called "WTF", or "Way too fast", which you configure to ban any bot that makes too many requests within a certain period of time. You can also block all unknown bots, which is fairly safe I guess, since the "known" list is pretty extensive. And you can configure detailed rules for denying access based on user agent or IP address.
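The "Way too fast" idea is simple enough to sketch. Here's a minimal, hypothetical illustration of the technique in Python (the class name and thresholds are my own inventions, not the plugin's): keep each IP's recent request timestamps in a sliding window and ban the IP once it exceeds a limit.

```python
import time
from collections import defaultdict, deque

class WayTooFast:
    """Ban any client making more than max_hits requests within window seconds."""

    def __init__(self, max_hits=10, window=5.0):
        self.max_hits = max_hits
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self.banned = set()

    def allow(self, ip, now=None):
        """Record a request from ip; return False if it is (or becomes) banned."""
        if now is None:
            now = time.time()
        if ip in self.banned:
            return False
        q = self.hits[ip]
        q.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) > self.max_hits:
            self.banned.add(ip)
            return False
        return True
```

In a real deployment you'd persist bans and expire them eventually; this just shows the sliding-window check at the heart of the "WTF" rule.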
Banning bots like this is necessary when they don't follow robots.txt, as described in the next section.
I had ignored my robots.txt for quite a while, since I believed that Cloudflare would filter out misbehaving bots. It didn't. Even at the "high" security levels, Cloudflare doesn't do a good job of protecting your site from badly behaved bots that eat up your bandwidth and hammer your database. You're on your own there.
So the first thing to do is check your server logs. Every site has a few irritating spiders that hit it too often. My banes during this difficult time were "80legs" (whose crawler identifies itself as "008") and "AhrefsBot". So I disallowed them in my robots.txt file like so:
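The entries looked something like this, assuming the bots' published user-agent tokens ("008" for 80legs and "AhrefsBot" for Ahrefs):

```
User-agent: 008
Disallow: /

User-agent: AhrefsBot
Disallow: /
```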
Unfortunately, only 80legs respected robots.txt; AhrefsBot simply ignored it. That's what bad bots do, and they deserve to get their ass banned, either with the plugins I showed earlier or via .htaccess. This practice alone has relieved my site of a ton of traffic.
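For the bots that ignore robots.txt, an .htaccess rule along these lines denies them outright. This is a sketch using Apache's SetEnvIfNoCase directive; adjust the pattern to match the offending user agent:

```
# Flag requests whose User-Agent contains "AhrefsBot", then deny them
SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```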
Blocking URL Parameters and Archive Pages
Going over the list of pages crawled by Google and others, I found that a whole ton of useless URLs were being crawled. As a WordPress blog, all my posts have multiple "replytocom" URL parameters (one for each comment, I think). And these were being crawled by Google: around 7,000 pages' worth, when I have just 464 unique posts. The solution is to block these URL parameters from being crawled using robots.txt. For me, the relevant entry was:
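A wildcard disallow takes care of every replytocom variant in one rule; my entry was along these lines:

```
User-agent: *
Disallow: /*?replytocom
```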
This put a stop to a lot of crawls that were eating up my bandwidth and unnecessarily hitting my database. There's also an option in Google's Webmaster Tools that lets you specify which URL parameters you don't want crawled. The robots.txt entry is more elegant if you have control over the file; otherwise, the Webmaster Tools setting will do just fine, though it naturally only controls Google's crawlers.
End Result of the above steps
My efforts paid off. Over the next few days, my "500" database errors dropped to almost nothing and my page loading speeds returned to normal. I kept a close watch on my stats and visitors, as well as the Bluehost throttling panel, for the next week to catch any relapses and ban any more misbehaving bots. But so far, it's been pretty smooth sailing.
I saved my six-year-old WordPress blog for free and improved my security to boot!