My favorite talk at Velocity was by Paul Hammond and John Allspaw from Flickr, who are doing real lowercase-a agile:
UPDATE: Here’s the video, highly recommended:
“If there’s one thing you do, it should be automated infrastructure”. This was a refrain through the conference – as Theo Schlossnagle put it, it doesn’t matter if it’s chef, puppet, bcfg2, cfengine – whatever works for you, just do it.
Some of their techniques:
- One-step build. They literally go to a web page and click a button and watch the build take the full site from soup to nuts.
- Deploys: Who. What. When. You want to make all the meta-details of a deploy easily visible to anyone. Deploy logs are readily accessible.
- Always ship trunk. In a webapp this is possible. It vastly simplifies – everyone knows where to look for what’s going out, and what’s live.
- Flickr does its branching in the code (take note, git people), and these branches become natural ops levers (i.e. uh-oh, turn that feature off or turn it down).
- “Dark launches”. Facebook does this too: launch the guts of a new feature with the UI turned off so you can see how the technology works in real production.
- This results in anticlimactic launches, the best kind.
- They roll forward (not back) to turn features off that aren’t working.
- “Gather shitloads of metrics”. System metrics, app metrics, everything. John has a great writeup on their Ganglia setup in his book.
- Developers watch the metrics just as obsessively as ops. Visualization is a powerful tool used day-to-day by the whole group.
- Developers have a way of putting in their own metrics via a little framework.
- They’re all on IRC. This kept coming up at the conference – teams are on some chat tool like IRC, Skype chat, or Campfire.
- Important events are piped into IRC, so you’ll be right in the middle of a conversation and an alert will pop in.
- Logs are piped into a search engine so they can find things in the log history, easily.
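To make the "branching in code" and "dark launch" ideas concrete, here is a minimal sketch in Python. It is not Flickr's actual framework, which the talk notes above don't describe in detail. The flag names, the `FLAGS` table, the `COUNTERS`/`TIMINGS` stores, and the search functions are all hypothetical, chosen only for illustration. The idea is that the new backend runs for a fraction of real requests while the UI flag stays at zero, and the code records its own metrics as it goes.

```python
import random
import time
from collections import defaultdict

# Hypothetical in-memory flag table. A real site would load this from config
# that ops can change at runtime, so each flag doubles as an ops lever.
FLAGS = {
    "new_search_backend": 0.25,  # dark launch: exercise backend on ~25% of requests
    "new_search_ui": 0.0,        # UI stays off, so users see no change
}

# Hypothetical metrics sink: developers drop in their own counters and timings.
COUNTERS = defaultdict(int)
TIMINGS = defaultdict(list)


def is_enabled(flag, rng=random.random):
    """Return True for roughly FLAGS[flag] of calls (0.0 = off, 1.0 = fully on)."""
    return rng() < FLAGS.get(flag, 0.0)


def old_search(query):
    # Stand-in for the code path that is live today.
    return [f"old:{query}"]


def new_search(query):
    # Stand-in for the new feature's guts.
    return [f"new:{query}"]


def handle_search(query, rng=random.random):
    """Serve results from the old path; exercise the new path in the dark."""
    results = old_search(query)
    COUNTERS["search.requests"] += 1

    if is_enabled("new_search_backend", rng):
        start = time.perf_counter()
        try:
            dark_results = new_search(query)
            COUNTERS["search.dark.ok"] += 1
        except Exception:
            # A failure in the dark path must never reach the user.
            dark_results = None
            COUNTERS["search.dark.error"] += 1
        TIMINGS["search.dark.seconds"].append(time.perf_counter() - start)

        # Users only see the new results once the UI flag is turned up.
        if dark_results is not None and is_enabled("new_search_ui", rng):
            results = dark_results

    return results
```

Under these assumptions, "launching" is just raising `new_search_ui` from 0.0 toward 1.0, and turning a misbehaving feature off is setting a flag back to 0.0 and shipping forward, with no rollback of the code itself.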
Culture, philosophy:
- There’s an ongoing conversation between dev and ops. They’re learning to solve the Flickr problem together. Each side’s way of thinking informs the other (a major conference theme as well).
- Failure will happen. Develop your ability to respond. Like ER doctors, you practice on failures, and that makes you more competent at handling whatever comes along next.
- In addition to the ops people on call, there’s always a developer who has a pager.