Some time ago we were optimizing a Magento shop for speed, and discovered that numerous robots were crawling the site daily, causing continuous extra load on the server. It made us think: did those robots actually need the entire Magento site to be generated dynamically? Or could we implement something to speed things up tremendously?
The problem: 3 robot-requests per second
The main issue was that the Magento shop had become quite popular with search engines. Not only common search engines like GoogleBot (we all want that one), Bing and Baidu (for our Chinese fans), but also lesser-known search engines like MJ12 and Ezooms. The shop's robots.txt file had a nice rule, Crawl-Delay: 3, but not all robots seemed to respect that delay.
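For reference, such a crawl delay is declared in robots.txt roughly like this (Crawl-Delay is a de-facto standard directive that crawlers may or may not honour):

    User-agent: *
    Crawl-delay: 3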
At times, the server had to process 3 robot requests per second - pure calls to the Magento application, so excluding CSS, JavaScript and image requests. The Magento site had some performance issues anyway, which caused a page load to take over 3 seconds, but even so, 3 requests per second is quite a lot. Especially when those requests are not coming from real visitors - if they were, we would have called it a great success.
Lightening the load
To lighten the load, of course we started optimizing the Magento pages and got the page load down from 3 seconds to 1 second. But the server was still stressed, simply because various robots fired off their requests so quickly that the Apache processes themselves were causing harm. We had some constraints as well: No IP-blocking of those robots. No migration from Apache to Nginx. No downtime. Simply optimize the performance now.
Zooming in on each Magento page, analysis showed that most of the Magento blocks causing the highest load were dynamic blocks generated from details in the customer's session (recently viewed products; shopping cart; checkout pages), while some other dynamic blocks refreshed their content on every page.
Caching things specifically for search engines
The first thing that hit me was that search engines don't actually use sessions, so by caching those blocks specifically for search engines, the load would lighten and the stress on the server would drop. Come to think of it, most search engines did not crawl the site more than once a day, so blocks with random content could be cached for a day as well.
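To illustrate the idea (a rough sketch, not the actual extension code), a Magento block could refuse to cache for regular visitors but cache its output for a day when a search engine comes along. The class name, the user-agent pattern and the one-day lifetime below are assumptions for the example:

    class Example_Block_RecentlyViewed extends Mage_Core_Block_Template
    {
        // Treat these user-agents as search engines (example list only)
        protected $_botPattern = '/googlebot|bingbot|baiduspider|mj12bot|ezooms/i';

        public function getCacheLifetime()
        {
            $agent = Mage::app()->getRequest()->getHeader('User-Agent');
            if ($agent && preg_match($this->_botPattern, $agent)) {
                // Robots get a cached copy that is refreshed once a day
                return 86400;
            }
            // Returning null disables block caching, so regular visitors
            // keep getting the dynamic, session-based output
            return null;
        }

        public function getCacheKey()
        {
            // One shared cache entry per store for all robots
            return 'RECENTLY_VIEWED_BOT_' . Mage::app()->getStore()->getId();
        }
    }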
We started to experiment a bit and found out that this worked as expected: in peak hours, various robots were still indexing the site by requesting Magento pages like crazy. During those same hours, regular visitors experienced a slower site, which was unwanted. But by serving cached content to those robots (and letting them request their stuff at the same crazy pace), the server had more capacity left to serve dynamic content to regular visitors - so the site remained speedy at all times.
Yireo SearchEnginePageCache extension
So here it is: our new Magento extension SearchEnginePageCache bundles this behaviour for you to reuse. It simply caches the entire page (yes, every page within Magento) without any full-page-cache hole punching, and serves this cached content only to user agents that match a search-engine pattern. Which user agents count as search engines is determined by a built-in list within the extension, but the list can also be extended through the Magento System Configuration.
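The extension's internals are not reproduced here, but the general mechanism can be sketched as a pair of event observers: one that serves the stored HTML to robots early in the request, and one that saves the rendered page just before the response goes out. The class name, method names, event hooks, cache tag and pattern list below are illustrative assumptions, not the extension's actual API:

    class Example_Model_Observer
    {
        // Example user-agent patterns only; the extension ships its own,
        // extendable list (see the Magento System Configuration)
        const BOT_PATTERN = '/googlebot|bingbot|baiduspider|mj12bot|ezooms/i';

        protected function _isSearchEngine()
        {
            $agent = Mage::app()->getRequest()->getHeader('User-Agent');
            return $agent && preg_match(self::BOT_PATTERN, $agent);
        }

        protected function _getCacheKey()
        {
            return 'BOT_PAGE_' . md5(Mage::app()->getRequest()->getRequestUri());
        }

        // Hooked early in the request (for instance controller_action_predispatch):
        // serve the stored HTML to robots and skip the expensive rendering
        public function serveFromCache(Varien_Event_Observer $observer)
        {
            if (!$this->_isSearchEngine()) {
                return; // regular visitors always get the dynamic page
            }
            if ($html = Mage::app()->getCache()->load($this->_getCacheKey())) {
                Mage::app()->getResponse()->setBody($html)->sendResponse();
                exit;
            }
        }

        // Hooked just before the response is sent (for instance http_response_send_before):
        // store the rendered page for a day, so the next robot hit is a cache hit
        public function saveToCache(Varien_Event_Observer $observer)
        {
            if (!$this->_isSearchEngine()) {
                return;
            }
            Mage::app()->getCache()->save(
                Mage::app()->getResponse()->getBody(),
                $this->_getCacheKey(),
                array('SEARCHENGINE_PAGE'),
                86400
            );
        }
    }

Because no hole punching is involved, robots simply get a snapshot of the page as it looked when it was last rendered, which is perfectly fine for indexing purposes.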
Check out the extension page for more details on this cool new extension that makes your site fly for search engines.
About the author
Jisse Reitsma is the founder of Yireo, extension developer, developer trainer and 3x Magento Master. He is passionate about technology and open source. And he loves talking as well.