Railsconf: HTTP's Best-Kept Secret: Caching – Ryan Tomayko (Heroku)

May 8, 2009 Pivotal Labs

About Ryan

  • http://tomayko.com
  • Sinatra maintainer.
  • Rack core team.
  • Creator and maintainer of Rack::Cache.

HTTP Caching?

  • NOT Rails Caching
  • HTTP caching headers in requests: Cache-Control: If-Modified-Since: If-None-Match:
  • and responses: Cache-Control: Last-Modified: ETag: Vary:
  • This stuff is defined in RFC2616; we won’t be going into it that deeply.

Types of Cache

Client cache

  • Built into browsers and other types of client.
  • 1:1 relationship between cache and client. The cache only serves one client (private cache).
  • Bandwidth saved per cache hit: can’t beat it.

Shared Proxy Cache

  • Set up for an organization
  • 1:many relationship between cache and clients. Serves more than one client (shared cache).
  • Is closer to the client than the server, therefore saves a lot of bandwidth.

Gateway Cache

  • a.k.a. Reverse Proxy Cache
  • Situated inside of the origin site
  • 1:everyone relationship between cache and clients.
  • Reduces bandwidth the least.

Why cache?

  • The answer to this has changed over time.
  • In Nov 1990 there was 1 guy on the web – Tim Berners-Lee.
  • In Feb 1996 the web population was 20M. State-of-the-art connectivity was a 28.8kbps modem. At that speed, loading the current http://yahoo.com (~350k) would take 2:48. Bandwidth was the largest issue. RFC1945 HTTP 1.0 included the Expires: and Last-Modified: headers.
  • In March 1999 RFC2616 HTTP 1.1 was released. It addressed the caching problems of 1996.
  • Today: we cache so we can scale. Keep your back-ends free from as much work as possible. Push as much work up the stack as possible.

HTTP 1.1 defines 2 caching models

Expiration

  • Back-end sets Cache-Control: public, max-age=60
  • Gets cached in gateway cache and browser cache.
  • Public says it is good for many clients.
  • Cached for 60s.
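
The expiration check a cache performs can be sketched in a few lines of Ruby. This is a minimal illustration, not code from the talk or from any library; the helper names max_age and fresh? are made up for the example, and a real cache also has to account for headers such as Age and Expires.

```ruby
# Hypothetical helper: extract max-age (in seconds) from a Cache-Control value.
# Returns nil when no max-age directive is present.
def max_age(cache_control)
  match = cache_control.to_s.match(/(?:\A|,)\s*max-age=(\d+)/i)
  match && match[1].to_i
end

# A cached response is fresh while its age is below max-age.
# Without max-age this sketch treats the response as not fresh.
def fresh?(cache_control, age_in_seconds)
  limit = max_age(cache_control)
  !limit.nil? && age_in_seconds < limit
end

fresh?('public, max-age=60', 30)  # => true
fresh?('public, max-age=60', 61)  # => false
```

While fresh? is true, the cache can answer without contacting the back-end at all.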

Rails example

def show
  expires_in 60.seconds, :public => true
  # stuff
  render ...
end

Sinatra example

headers['Cache-Control'] = 'public, max-age=60'

Validation (Conditional GET)

  • Back-end adds ETag or Last-Modified, e.g. ETag: abcdef012345
  • Last-Modified is redundant, basically there for HTTP 1.0 clients.
  • On 2nd request, gateway cache realizes it has this page in cache, then sends a GET /foo, Host: foo.com, If-None-Match: abcdef012345 to the back-end.
  • If back-end returns a 304 Not Modified, gateway cache returns cached version.
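
The back-end side of a conditional GET can be sketched as a Rack-style callable: something that takes the env hash and returns [status, headers, body]. This is an illustrative sketch only; BODY and APP are hypothetical names, and the ETag is derived from the body for simplicity, whereas a real app would derive it from something cheaper (such as a model timestamp) so that it can skip rendering.

```ruby
require 'digest/md5'

# Hypothetical page body; a real back-end would render a template here.
BODY = '<h1>foo</h1>'.freeze

# Minimal Rack-style app. The Rack gem is not needed to run this sketch.
APP = lambda do |env|
  etag = %("#{Digest::MD5.hexdigest(BODY)}")
  if env['HTTP_IF_NONE_MATCH'] == etag
    # The cache's copy is still valid, so no body is sent.
    [304, { 'ETag' => etag }, []]
  else
    [200, { 'ETag' => etag, 'Content-Type' => 'text/html' }, [BODY]]
  end
end

first  = APP.call({})
second = APP.call({ 'HTTP_IF_NONE_MATCH' => first[1]['ETag'] })
first[0]   # => 200
second[0]  # => 304
```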

Rails example:

def show
  @foo = Foo.find(params[:id])
  fresh_when :etag => @foo,
             :last_modified => @foo.updated_at.utc
end

Alternative idiom:

def show
  @foo = Foo.find(params[:id])
  modified = @foo.updated_at.utc
  if stale?(:etag => @foo, :last_modified => modified)
    respond_to ...
  end
end

Sinatra example:

get '/foo' do
  @foo = Foo.find(params[:id])
  etag @foo.etag
  erb :foo
end

Combine Expiration & Validation

  • Back-end sets Cache-Control: public, max-age=60 and ETag: abcdef012345
  • For the first 60 seconds, Cache-Control takes precedence and the cache answers on its own.
  • After 60 seconds, the cache queries the back-end using the ETag.
  • Back-end can then send back a 304 Not Modified with a new Cache-Control: public, max-age=60
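
The combined flow can be simulated with a toy single-entry cache. This is a sketch under simplifying assumptions: TinyCache and the backend lambda are hypothetical, time is passed in as plain seconds, and the back-end returns a bare [status, max_age, etag, body] array rather than real HTTP messages.

```ruby
# TinyCache is a hypothetical, single-entry stand-in for a gateway cache.
class TinyCache
  attr_reader :log

  def initialize(backend)
    @backend = backend  # callable: (etag or nil) -> [status, max_age, etag, body]
    @entry = nil
    @log = []
  end

  def get(now)
    # Expiration: while the entry is fresh, the back-end is never contacted.
    if @entry && now - @entry[:stored_at] < @entry[:max_age]
      @log << :fresh_hit
      return @entry[:body]
    end

    # Validation: ask the back-end, passing the stored ETag if there is one.
    status, max_age, etag, body = @backend.call(@entry && @entry[:etag])
    if status == 304
      @log << :revalidated
      @entry = @entry.merge(:stored_at => now, :max_age => max_age)
    else
      @log << :miss
      @entry = { :body => body, :etag => etag, :max_age => max_age, :stored_at => now }
    end
    @entry[:body]
  end
end

# Hypothetical back-end: always max-age 60, fixed ETag.
backend = lambda do |if_none_match|
  etag = 'abcdef012345'
  if_none_match == etag ? [304, 60, etag, nil] : [200, 60, etag, 'hello']
end

cache = TinyCache.new(backend)
[0, 30, 90, 100].each { |now| cache.get(now) }
cache.log  # => [:miss, :fresh_hit, :revalidated, :fresh_hit]
```

The 304 at the 90 second mark resets the freshness window, which is why the request at 100 seconds is served without touching the back-end again.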

Misc

  • Never Generate the Same Response Twice

Recommend using Rack::Cache

gem install rack-cache

config.middleware.use Rack::Cache,
  :verbose          => true,
  :metastore        => "file:/var/cache/rack/meta",
  :entitystore      => "file:/var/cache/rack/body",
  :allow_reload     => false,
  :allow_revalidate => false

The client, as well as the server, can control what happens at the cache using Cache-Control. A browser refresh sends Cache-Control: no-cache, which means the gateway cache MUST revalidate the ETag with the back-end before sending a response. This is bad: anyone hitting refresh repeatedly can pound your back-end. :allow_reload => false disables this.
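
The effect of such a switch can be illustrated with a small decision function. This is a hypothetical sketch of the idea, not Rack::Cache's actual implementation; cache_action and its return values are invented for the example.

```ruby
# Hypothetical decision helper: should a request carrying Cache-Control
# be answered from the cache or passed to the back-end?
def cache_action(request_cache_control, allow_reload)
  directives = request_cache_control.to_s.split(',').map(&:strip)
  wants_reload = directives.include?('no-cache')

  if wants_reload && allow_reload
    :go_to_backend     # the client's refresh is honored
  else
    :serve_from_cache  # the client's no-cache is ignored
  end
end

cache_action('no-cache', true)   # => :go_to_backend
cache_action('no-cache', false)  # => :serve_from_cache
```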

  • High-Performance Caches: Squid, Varnish (Heroku uses this)
  • Interesting discussion about ESI at the end.
  • Rails by default uses id of model, classname and last_updated to create an MD5 hash for etag.
  • Need to start with a seed that covers your release version, otherwise etag will not change. Rails now has a mechanism to handle this.
  • 2.3 branch has a new “touch” mechanism too.
  • Browser behavior differs and varies quite significantly when using SSL.
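
The point about seeding the ETag with a release version can be sketched as follows. This is an illustration of the idea only, not the code Rails uses; RELEASE, etag_for, and the way the parts are joined are all assumptions made for the example.

```ruby
require 'digest/md5'

# Hypothetical release identifier, e.g. a deploy tag or revision.
RELEASE = 'v42'

# Build an ETag from a seed plus the model's class name, id, and timestamp.
def etag_for(class_name, id, updated_at, seed = RELEASE)
  Digest::MD5.hexdigest([seed, class_name, id, updated_at.to_i].join('/'))
end

updated = Time.at(0).utc
old_release = etag_for('Foo', 1, updated, 'v41')
new_release = etag_for('Foo', 1, updated, 'v42')
old_release == new_release  # => false
```

Because the seed is part of the hash, deploying a new release changes every ETag even when the underlying records are untouched, so caches stop serving pages rendered by old templates.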
