has_many :codes

Full page caching in Rails part 2 - memcached and middleware

In a previous post, I showed how to implement full page caching in Rails with Nginx and Redis. Since then I have changed my approach for DynaBlogger: I now use a memcached cluster for the cache instead of Redis, and a Rack middleware instead of Nginx to serve cached pages. Nginx may be a little faster since it can serve content directly from the cache, bypassing Rails entirely, but the new approach has a big advantage: it avoids a hashing mismatch. Rails and Nginx use different algorithms to distribute cache keys across the nodes of a Redis or memcached cluster (as opposed to a single Redis/memcached instance), which makes it difficult to write cache entries with Rails and read them with Nginx. The Nginx approach still works just fine with a standalone Redis/memcached instance, but it's not as scalable.
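To make the hashing problem concrete, here is a minimal sketch. It does not reproduce the actual algorithms used by the Rails memcached client or by Nginx; the two `pick_with_*` functions and the server names are hypothetical stand-ins for "two clients that hash keys differently".

```ruby
require "digest"
require "zlib"

# Placeholder node list; both hypothetical clients see the same servers.
SERVERS = ["cache-1:11211", "cache-2:11211", "cache-3:11211"].freeze

# Hypothetical client A: picks a server from the CRC32 of the key.
def pick_with_crc32(key)
  SERVERS[Zlib.crc32(key) % SERVERS.size]
end

# Hypothetical client B: picks a server from the first 8 hex digits of the key's MD5.
def pick_with_md5(key)
  SERVERS[Digest::MD5.hexdigest(key)[0, 8].to_i(16) % SERVERS.size]
end

keys = (1..100).map { |i| "example.com/posts/#{i}" }
mismatches = keys.count { |key| pick_with_crc32(key) != pick_with_md5(key) }
puts "#{mismatches} of #{keys.size} keys map to different servers"
```

Every key on which the two clients disagree is an entry that one side writes to a node the other side never looks at, which shows up as a permanent cache miss.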

By using a middleware instead of Nginx, both reads from and writes to the cache cluster are performed by the same Rails app, so there are no issues with incompatible hashing algorithms. Performance-wise, I still see thousands of requests per second even with the Rack middleware, since most of the Rails stack is bypassed anyway. I have also switched from Redis back to memcached: after further testing I found that a memcached cluster is quite a bit easier to use than a Redis cluster (which would require Redis Cluster Proxy, adding yet another dependency), since Rails supports multiple memcached servers nicely out of the box. I do lose persistence this way, but it's a trade-off I am willing to accept for simplicity.
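For reference, pointing Rails at several memcached servers can look like the fragment below. This is only a sketch: the hostnames are placeholders, and it assumes the dalli gem is in your Gemfile, since that is what the `:mem_cache_store` adapter uses.

```ruby
# config/environments/production.rb
Rails.application.configure do
  # Placeholder hostnames; list every node of your memcached cluster here.
  config.cache_store = :mem_cache_store,
                       "cache-1.example.com:11211",
                       "cache-2.example.com:11211"
end
```

With more than one server listed, the client decides which node each key lives on, so the app reads and writes through the same key distribution.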

Part of the setup is similar to what I described in the other post. I still use the actionpack-page_caching gem to have Rails write rendered pages to cache, so if you go with the new approach you will still need to add this gem to your Gemfile. The initializer (config/initializers/page_cache.rb) that monkey patches this gem in order to read from/write to memcached instead of the filesystem is the same too:

require "action_controller/caching/pages"

Rails.configuration.to_prepare do
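  ActionController::Caching::Pages::PageCache.class_eval do
    private

    # Remove the cached page from the cache store instead of deleting a file.
    def delete(path)
      return unless path
      Rails.cache.delete(path)
    end

    # Store the rendered page as a raw string; the gzip argument is unused here.
    def write(content, path, gzip)
      return unless path
      Rails.cache.write(path, content, raw: true)
    end

    # Build the cache key from the domain prefix and the request path; the extension is not used.
    def cache_path(path, extension = nil)
      "#{domain}#{path}"
    end
  end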
end

I am using the domain here to define a prefix for the path used in the cache keys, so in the PublicController (which is the controller I use to serve pages for the user sites) I have the following:

class PublicController < ApplicationController
  self.page_cache_directory = -> { request.hostname }
...

In my case each user site can have custom domains as well as a dynablogger.net subdomain, but in your app you can use something other than the domain as the prefix if you prefer; it doesn't really make much difference. In the same controller, I have a call to caches_page, which ensures that the specified actions result in a cache write for the relevant path, provided that the action renders successfully:

caches_page :index, :document, :author, :tag, :archive

Of course you'll need to specify the actions that you want to cache according to your own app. The above answers the question of how to write pages to memcached; to read from cache we'll need to create a middleware in lib/middleware/page_cache.rb or wherever you prefer:

class PageCache
  def initialize(app)
    @app = app
  end
  
  def call(env)
    request_method = env["REQUEST_METHOD"]
    # Strip any port so the key matches request.hostname, which is used on the write side.
    domain = env["HTTP_HOST"].to_s.downcase.sub(/:\d+\z/, "")
    request_path = env["PATH_INFO"].to_s.downcase
    is_cable = request_path == "/cable"
    is_asset = request_path.match?(/\.(jpg|jpeg|webp|gif|css|js|xml|txt|rss)\z/)

    if !is_cable && !is_asset && (request_method == "GET" || request_method == "HEAD")
      cache_key = "#{domain}#{request_path}"

      if (body = Rails.cache.read(cache_key))
        headers = { 
          "Content-Type" => "text/html",
          "X-Content-Type-Options" => "nosniff",
          "X-Download-Options" => "noopen",
          "X-Frame-Options" => "SAMEORIGIN",
          "X-Permitted-Cross-Domain-Policies" => "none",
          "X-XSS-Protection" => "1; mode=block",
          "Access-Control-Allow-Origin" => "*",
          "Access-Control-Allow-Methods" => "GET, HEAD",
          "Strict-Transport-Security" => "max-age=15724800; includeSubDomains",
          "Content-Security-Policy" => "default-src * 'unsafe-inline' 'unsafe-eval' data: blob:;"
        }

        # Prefer the address reported by Cloudflare, then the first X-Forwarded-For entry, then the socket address.
        cloudflare_ip = env["HTTP_CF_CONNECTING_IP"]
        forwarded_ip = env["HTTP_X_FORWARDED_FOR"]&.split(",")&.first&.strip
        remote_addr = env["REMOTE_ADDR"]

        client_ip = cloudflare_ip.presence || forwarded_ip.presence || remote_addr.presence

        user_agent = env["HTTP_USER_AGENT"]

        if request_method == "GET"
          Rails.logger.info "GET from cache: path=#{request_path} host=#{domain} client_ip=#{client_ip} user_agent='#{user_agent}' time='#{Time.now.utc}'"
          [200, headers, [body]]
        else
          # A HEAD response carries the same headers but no body.
          Rails.logger.info "HEAD from cache: path=#{request_path} host=#{domain} client_ip=#{client_ip} user_agent='#{user_agent}' time='#{Time.now.utc}'"
          [200, headers, []]
        end
      else
        # Cache miss: let Rails handle the request.
        @app.call(env)
      end
    else
      # Not a cacheable request: pass it straight through.
      @app.call(env)
    end
  end
end

I have removed a few things that are too specific to my app, but this is the basic gist of it. Let's see what's happening in this middleware. First, we figure out the request method, the domain, the path of the request, and whether the request is for ActionCable or an asset. We only want to read from the cache if it's a GET or HEAD request and not a WebSocket or asset request; otherwise we fall back to the Rails app, which processes the request as usual.
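The check described above can be isolated into a small predicate, which makes it easy to try against a few sample requests. This is a sketch in plain Ruby, not part of the middleware itself; the method name `cacheable_request?` and the constant `ASSET_PATTERN` are made up for illustration.

```ruby
# Same extensions as the middleware treats as assets.
ASSET_PATTERN = /\.(jpg|jpeg|webp|gif|css|js|xml|txt|rss)\z/

# Returns true when the request is a candidate for a cache lookup:
# a GET or HEAD that is neither the ActionCable endpoint nor an asset.
def cacheable_request?(env)
  path = env["PATH_INFO"].to_s.downcase
  return false if path == "/cable"
  return false if path.match?(ASSET_PATTERN)

  %w[GET HEAD].include?(env["REQUEST_METHOD"])
end

puts cacheable_request?({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/about" })     # true
puts cacheable_request?({ "REQUEST_METHOD" => "POST", "PATH_INFO" => "/about" })    # false
puts cacheable_request?({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/theme.css" }) # false
```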

Once we have determined that the request is of a cacheable type, we attempt to fetch the content from the cache using the expected cache key. If the content is in the cache, we return it as the body of the response and add an entry to the Rails log. As you can see, we also set some basic headers to improve security; these are similar to the headers Rails would set if it continued processing the request. In this case I am configuring the Content-Security-Policy header to allow anything, for two reasons: it doesn't pose any security risk for my user sites (there's no authentication or anything else that could be exploited with XSS etc.), and I want to let users embed content from other sites such as YouTube or Twitter, which requires adding some JavaScript to the page. In your case you may have to configure CSP differently. Finally, if the content is not present in the cache, we let Rails continue processing the request as it would without this middleware.
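If you want to see the hit and miss paths without booting Rails, the flow can be reproduced with a stripped-down version of the middleware and an in-memory stand-in for Rails.cache. Everything here (`MemoryStore`, `PageCacheSketch`, the lambda playing the role of the Rails app) is hypothetical and only mirrors the logic described above, without the logging, asset checks and security headers.

```ruby
# Stand-in for Rails.cache with the two calls this sketch needs.
class MemoryStore
  def initialize
    @data = {}
  end

  def read(key)
    @data[key]
  end

  def write(key, value)
    @data[key] = value
  end
end

# Reduced middleware: serve from the store on a hit, otherwise call the app.
class PageCacheSketch
  def initialize(app, store)
    @app = app
    @store = store
  end

  def call(env)
    method = env["REQUEST_METHOD"]
    return @app.call(env) unless %w[GET HEAD].include?(method)

    domain = env["HTTP_HOST"].to_s.downcase
    path = env["PATH_INFO"].to_s.downcase
    body = @store.read("#{domain}#{path}")
    return @app.call(env) unless body

    headers = { "Content-Type" => "text/html" }
    method == "GET" ? [200, headers, [body]] : [200, headers, []]
  end
end

# This lambda plays the role of the Rails app behind the middleware.
rails_app = ->(_env) { [200, { "Content-Type" => "text/html" }, ["rendered by the app"]] }

store = MemoryStore.new
store.write("example.com/about", "<h1>About</h1>")
middleware = PageCacheSketch.new(rails_app, store)

hit = middleware.call({ "REQUEST_METHOD" => "GET", "HTTP_HOST" => "Example.com", "PATH_INFO" => "/about" })
miss = middleware.call({ "REQUEST_METHOD" => "GET", "HTTP_HOST" => "example.com", "PATH_INFO" => "/contact" })

puts hit[2].inspect  # ["<h1>About</h1>"]
puts miss[2].inspect # ["rendered by the app"]
```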

The final step is to enable this middleware. In production.rb (and any other environment that should use the cache), add the following:

require_relative "../../lib/middleware/page_cache"

Rails.application.configure do
  ...
  config.middleware.insert 0, PageCache
  ...
end

As you can see, we insert this middleware at the top of the chain so that it is the first to be executed. This way cached content is returned as quickly as possible, bypassing most of the Rails stack.
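Why the position matters can be shown with a toy middleware chain in plain Ruby. The names below (`TracingMiddleware`, `build_stack`, and the "logger" and "session" layers) are invented for this sketch and do not correspond to the real Rails middleware list; the point is only that a layer that answers the request itself prevents everything beneath it from running.

```ruby
# Records its name, then either answers directly or calls the next layer.
class TracingMiddleware
  def initialize(app, name, trace, short_circuit: false)
    @app = app
    @name = name
    @trace = trace
    @short_circuit = short_circuit
  end

  def call(env)
    @trace << @name
    return [200, {}, ["from #{@name}"]] if @short_circuit

    @app.call(env)
  end
end

# Builds page_cache -> logger -> session -> app, outermost layer first.
def build_stack(trace, cache_hit:)
  app = ->(_env) { trace << "app"; [200, {}, ["from app"]] }
  session = TracingMiddleware.new(app, "session", trace)
  logger = TracingMiddleware.new(session, "logger", trace)
  TracingMiddleware.new(logger, "page_cache", trace, short_circuit: cache_hit)
end

hit_trace = []
build_stack(hit_trace, cache_hit: true).call({})
puts hit_trace.inspect  # ["page_cache"]

miss_trace = []
build_stack(miss_trace, cache_hit: false).call({})
puts miss_trace.inspect # ["page_cache", "logger", "session", "app"]
```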

That's it! It's a simpler approach compared to having Nginx as an additional dependency and using a Redis cluster, and the result is basically the same, still with very good performance. Like I mentioned in the other post, full page caching is a good option for pages that require neither authentication nor any per-user customisation; in all other cases it's best to stick to other types of caching, such as fragment caching. But when it can be used, full page caching can improve performance dramatically at the expense of very little added complexity, as shown in these posts. One more thing you will need to do, which I haven't explained here because it's specific to your app, is cache invalidation. You need to make sure that the cache is invalidated for a page whenever the relevant data is updated, otherwise stale content will be served to the client. For DynaBlogger, this means that the cache is invalidated whenever some content or theme assets change.
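As a rough illustration of what invalidation amounts to with this key scheme, here is a sketch in plain Ruby. It is not DynaBlogger's code: `MemoryStore` stands in for Rails.cache, and `cache_keys_for_post` is a hypothetical mapping from a post to the pages that display it. In a real app the equivalent deletes would go through Rails.cache (for example via the gem's `expire_page` helper), and the list of affected pages depends entirely on your routes.

```ruby
# Stand-in for Rails.cache with the calls this sketch needs.
class MemoryStore
  def initialize
    @data = {}
  end

  def read(key)
    @data[key]
  end

  def write(key, value)
    @data[key] = value
  end

  def delete(key)
    @data.delete(key)
  end
end

# Hypothetical: the cached pages that show a given post.
def cache_keys_for_post(domain, slug)
  ["#{domain}/", "#{domain}/#{slug}", "#{domain}/archive"]
end

# Drop every cached page affected by a change to the post.
def invalidate_post(store, domain, slug)
  cache_keys_for_post(domain, slug).each { |key| store.delete(key) }
end

store = MemoryStore.new
store.write("example.com/", "<h1>Home</h1>")
store.write("example.com/hello-world", "<h1>Hello</h1>")
store.write("example.com/about", "<h1>About</h1>")

invalidate_post(store, "example.com", "hello-world")

puts store.read("example.com/hello-world").inspect # nil
puts store.read("example.com/about").inspect       # "<h1>About</h1>"
```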

Let me know in the comments if you run into any issues implementing this in your app.

© Vito Botta