How to improve WordPress/Apache performance on a relatively heavily loaded server using Varnish and more
Scenario
One server (8 CPU cores, 32GB of memory, RAID5 disks) running WordPress (PHP 5.3.x and MySQL 5.1.x). The WordPress site(s) would often grind to a halt as the server struggled to respond to requests quickly enough.
Problems
Lack of available I/O capacity, identified by:
- Numerous processes often in a ‘waiting for I/O’ state (as shown by ‘vmstat’)
- MySQL becoming backed up with queries and frequently overwhelmed (running out of connections; large number of queries waiting in the process list). The problem was especially apparent when a new post was made to the site, as this flushes the MySQL query cache, any on-disk caches from WordPress and any related caches (e.g. the cached copy of the front page would need regenerating).
- Performance graphs from ‘munin’, which plots I/O operations along with other virtual memory subsystem events.
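As a rough illustration of the first check above, here is a minimal Python sketch that parses captured ‘vmstat’ output and flags samples that look I/O bound. The function names and the thresholds (more than 2 blocked processes, or more than 20% CPU time in I/O wait) are assumptions made for this example, not values taken from the server described here.

```python
def parse_vmstat(text):
    """Turn raw `vmstat` output into a list of {column: int} dicts."""
    columns = None
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column header row starts with "r" and includes "wa" (I/O wait).
        if fields[0] == "r" and "wa" in fields:
            columns = fields
            continue
        # Data rows are purely numeric and line up with the header.
        if columns and len(fields) == len(columns) and all(f.isdigit() for f in fields):
            samples.append(dict(zip(columns, map(int, fields))))
    return samples


def io_bound_samples(samples, max_blocked=2, max_wa=20):
    """Return samples where blocked processes ('b') or I/O wait ('wa')
    exceed the given limits. Both limits are illustrative assumptions."""
    return [s for s in samples if s["b"] > max_blocked or s["wa"] > max_wa]
```

Feeding it the output of something like `vmstat 5 12` gives a quick count of how many intervals were spent waiting on disk.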
Solutions
- Profile WordPress code to identify expensive queries and add additional caching where possible, using e.g. the debug-objects plugin to provide timings and help identify the origin of some queries.
- Caching of pages using e.g. WPSuperCache or W3 Total Cache was already in place but not sufficient
- Use memcached, rather than disk, as an object cache for WordPress
- Use the BufferedLogs directive within Apache to try and bulk up I/O operations
- Install Varnish as a front-end cache, using a malloc memory backend to reduce disk I/O (no need to read file(s) from disk)
Of the above, Varnish has had the biggest impact on disk I/O, resulting in a significant decrease. It also led to a significant decrease in the number of Apache processes in use (each of which is relatively memory hungry). This is possible because a large part of the site is static, unchanging content (images/stylesheets/javascript).
Varnish configuration
- Use a malloc backend, 10GiB in size, by passing “-s malloc,10G” as a startup parameter (see /etc/default/varnish on a Debian-based system). While Varnish was still using the default file backend, there was no improvement in I/O and the performance problems persisted.
- Ignore cookies for all javascript, images and stylesheets to improve the cache hit rate
- Ignore query parameters on all images, javascript and stylesheets – again to improve caching
- Force a future expiry date on static assets to encourage browser side caching of files
- Ignore all cookies for HTML/PHP content, unless they contain “wordpress_logged_in” – which is a suitable indication that the end user is authenticated.
- Add a backend health probe, so Varnish can display a suitable error page to end users rather than showing an Apache/WordPress error message or timing out.
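The cookie, query-string and expiry rules in the list above can be modelled in a few lines of Python. This is only a sketch of the decision logic, not the VCL used on the server: the function names, the list of static file extensions and the one-year expiry are all assumptions made for illustration.

```python
import re
import time
from email.utils import formatdate

# Assumed set of static asset extensions; adjust to match the site.
STATIC_RE = re.compile(r"\.(?:css|js|png|gif|jpe?g|ico)(?:\?.*)?$", re.IGNORECASE)


def normalize_request(url, cookie):
    """Return (url, cookie) as the cache should see the request."""
    if STATIC_RE.search(url):
        # Static assets: drop the query string and every cookie.
        return url.split("?", 1)[0], None
    if cookie and "wordpress_logged_in" in cookie:
        # Authenticated user: keep the cookies so the page is not shared.
        return url, cookie
    # Anonymous HTML/PHP request: ignore cookies so it can be cached.
    return url, None


def far_future_headers(now=None, days=365):
    """Response headers that encourage browser-side caching of static files.
    The 365-day lifetime is an illustrative assumption."""
    now = time.time() if now is None else now
    max_age = days * 86400
    return {
        "Cache-Control": "max-age=%d" % max_age,
        "Expires": formatdate(now + max_age, usegmt=True),
    }
```

For example, a request for /wp-content/themes/x/style.css?ver=3.1 with a tracking cookie becomes a plain, cookie-less request for /wp-content/themes/x/style.css, so every visitor shares one cached copy.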
Pretty Pictures
Here are some graphs taken from the server, which give some idea of the impact Varnish has had.
Firstly, we have the number of accesses registered by Apache, before and after Varnish was introduced. Because Varnish sits in front of Apache, the workload of Apache drops once it is introduced.
This next graph shows the hit rate logged by Varnish, i.e. the number of hits per second it deals with. As Varnish is made ‘live’ its hit rate increases while Apache’s decreases.
Before the 7th, Varnish was only accessible for testing; on the 7th the site’s DNS entries were changed to route traffic through Varnish. Further configuration changes were made to improve the cache hit rate.
Finally, we have the I/O graph. Note how initially Varnish doesn’t help with the I/O load on the server (if anything it makes it worse between the 7th and 10th).
On the 10th Varnish was reconfigured to use the malloc backend, at which point the I/O load drops and appears to remain more consistent.