Flickr "10+ Deploys Per Day" @ Velocity 2009

June 25, 2009 Pivotal Labs

My favorite talk at Velocity was by Paul Hammond and John Allspaw from Flickr, who are doing real lowercase-a agile.

UPDATE: The video of the talk is highly recommended.

“If there’s one thing you do, it should be automated infrastructure.” This was a refrain throughout the conference. As Theo Schlossnagle put it, it doesn’t matter if it’s Chef, Puppet, Bcfg2, or CFEngine: pick whatever works for you and just do it.

Some of their techniques:

  • One-step build. They literally go to a web page and click a button and watch the build take the full site from soup to nuts.
  • Deploys: Who. What. When. You want to make all the meta-details of a deploy easily visible to anyone. Deploy logs are readily accessible.
  • Always ship trunk. In a webapp this is possible. It vastly simplifies things: everyone knows where to look for what’s going out, and what’s live.
    • Flickr does their branching in the code with feature flags (take note, git people), and these become natural ops levers (e.g. uh-oh, turn that feature off, or turn it down).
  • “Dark launches”. Facebook does this too: launch the guts of a new feature with the UI turned off so you can see how the technology works in real production.
    • Which results in anticlimactic launches, the best kind.
  • They roll forward (not back) to turn off features that aren’t working.
  • “Gather shitloads of metrics”. System metrics, app metrics, everything. John has a great writeup on their Ganglia setup in his book.
    • Developers watch the metrics just as obsessively as ops. Visualization is a powerful tool used day-to-day by the whole group.
    • Developers have a way of putting in their own metrics via a little framework.
  • They’re all on IRC. This kept coming up at the conference: teams are on some chat tool like IRC, Skype chat, or Campfire.
    • Important events are piped into IRC, so you’ll be right in the middle of a conversation and an alert will pop in.
  • Logs are piped into a search engine so they can easily find things in the log history.
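
To make the "branching in the code", dark launch, and roll-forward ideas above concrete, here is a minimal sketch in Python. It is not Flickr's implementation, which the talk notes here don't describe in detail; the FeatureFlags class, the flag names, and render_photo_page are all hypothetical names made up for illustration.

```python
import hashlib


class FeatureFlags:
    """In-code feature switches: each flag maps to a rollout percentage (0-100)."""

    def __init__(self, rollout):
        self._rollout = dict(rollout)

    def set_rollout(self, name, percent):
        """Ops lever: turn a feature off (0), turn it down, or fully on (100)."""
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[name] = percent

    def is_enabled(self, name, user_id):
        """Return True if the flag is on for this user. Unknown flags are off."""
        percent = self._rollout.get(name, 0)
        # Hash flag name + user id so each user lands in a stable bucket 0..99.
        digest = hashlib.sha256(f"{name}:{user_id}".encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < percent


def render_photo_page(flags, user_id, backend_calls):
    """Hypothetical request handler showing a dark launch."""
    parts = ["photo"]
    # Dark launch: exercise the new backend in production for flagged users.
    if flags.is_enabled("related_photos_backend", user_id):
        backend_calls.append(user_id)  # stands in for the real backend query
        # The UI sits behind a separate flag, so it can stay off while the
        # backend takes real load.
        if flags.is_enabled("related_photos_ui", user_id):
            parts.append("related")
    return parts
```

With the backend flag at 100 and the UI flag at 0, every request exercises the new code path but nothing new is shown to users. Turning the UI on later is a flag change, and "rolling forward" to disable a misbehaving feature is a call like flags.set_rollout("related_photos_backend", 0) shipped from trunk rather than a rollback.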

Culture, philosophy:

  • There’s an ongoing conversation between dev and ops. They’re learning to solve the Flickr problem together. Each side’s way of thinking informs the other (a major conference theme as well).
  • Failure will happen. Develop your ability to respond. Like ER doctors, you practice on failures, and that makes you more competent at handling what comes along next.
  • In addition to the ops people on call, there’s always a developer who has a pager.
