Viacom is one of the world’s largest media conglomerates and is sitting right at the forefront of one of the hardest problems in data today: how to deal with dynamic video content at an escalating scale.
To put in perspective how quickly this challenge is escalating: in 2010 the world hit the zettabyte mark for all content on the internet, and in 2012 alone we added 2.8 zettabytes more, most of which is unstructured data including video and photos.
Consisting of VMN (formerly MTV Networks), Paramount and BET, Viacom is a veritable media giant supporting a variety of very popular websites including The Daily Show, Tosh.0, South Park Studios, and GameTrailers.com, among others. Because Viacom is a media company, these sites are updated often throughout the day with new text, photos and video clips.
Recently, we had the opportunity to learn from Viacom’s Senior Architect Michael Venezia about how they approach the complex updating of their content-rich websites using Redis. Below, Michael shares the background on their PHP-based architecture and scale requirements, describes how they ended up choosing Redis, and explains 8 use cases where Viacom is using Redis to help manage the dynamic nature and scale of their web properties.
The Background on Viacom’s Website Architecture
For Viacom, publishing content across many popular sites pushes the software team to focus on scale and content relationships, so that content reaches the right places quickly and serves viewers.
Even though a single site like The Daily Show, Nickelodeon, Spike, or VH1 might accrue a few million page views per day, the sites can also experience 20-30x surges in traffic when a piece of content goes socially viral. In the past, it was acceptable to take up to half a day to publish new content, but the realtime nature of media these days now drives the team to update content in seconds or minutes. Naturally, this makes dynamic scale and speed an important foundation to every part of the architecture.
In addition to dynamic scale, the websites are designed to offer visitors a lot of content tailored to their viewing profile or geography. For example, a page might relate a single episode of video to local promotions, additional episodes of the video series, or even related videos. They have built a software engine behind the web pages that relies on detailed meta-data to automatically build pages that encourage users to stay as long as possible on the site by presenting additional content that follows the user’s real-time interest. Since this is so dynamic, the data model for the content is quite extensive, almost graph-like in nature with extensive joins.
There is also an effort to reduce the number of copies of large-file content like videos. For example, the popular South Park episode “Cartman Gets an Anal Probe” is a single record in the data store. However, this episode may also appear on German sites. While the video is the same, English and German users search for the content through separate records: copies of the meta-data translate the search results and point to the same video. So while an American may search on the actual title, a German viewer could search for the translated title, “Cartman und die Analsonde”, on the German website.
These meta-data records overlay other records or objects and can be used to alter the content based on the context of use. These overlay records also allow several rule-sets to be evaluated, including rules that prevent restricted content from showing up in particular geographies or on particular devices.
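The overlay idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Viacom's implementation: the record layout, the `blocked_regions` field, the context names, and the `resolve` helper are all hypothetical.

```python
# Hypothetical base record: one shared video, stored once.
BASE_EPISODE = {
    "id": "episode:101",
    "title": "Cartman Gets an Anal Probe",
    "video_id": "video:101",
    "blocked_regions": frozenset(),
}

# Hypothetical overlay records, keyed by context of use.
# Each overlay lists only the fields it changes.
OVERLAYS = {
    "de": {"title": "Cartman und die Analsonde"},
    "geo-restricted": {"blocked_regions": frozenset({"ZZ"})},
}


def resolve(base, overlays, context, region):
    """Lay the overlay for `context` over `base`.

    Returns the merged record, or None when the merged record
    is restricted in `region`.
    """
    record = {**base, **overlays.get(context, {})}
    if region in record["blocked_regions"]:
        return None
    return record


german = resolve(BASE_EPISODE, OVERLAYS, "de", "DE")
english = resolve(BASE_EPISODE, OVERLAYS, "en", "US")
```

Both `german` and `english` point at the same `video_id`; only the searchable title differs, which is the point of keeping the overlay separate from the video record.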
The Viacom Approach
Many organizations serving this type of content expect to go the route of using an ORM and traditional data store. Interestingly, Viacom uses a different approach.
Essentially, for speed, they cannot afford to talk to a database directly. Firstly, because they are dealing with mostly streaming content, they prefer to use Akamai to distribute content geographically. Next, the pages are complex and could literally have thousands of objects to fetch. This number of fetches is unreasonable for performance, so the objects are provided as JSON by a data service. Of course, caching these JSON objects becomes an important aspect of the website’s speed. And the cache needs to be actively updated when content and content relationships change.
To solve this, Viacom relies on sets of object primitives and super objects. Continuing the South Park example, a primitive “episode” object exists that contains all the information pertinent to that episode, and a “super object” episode fleshes out how the episode links to associated shows and seasons and finds the actual video objects. The concept of super objects is really useful for the page developers, making it easy for them to automatically build out pages with low latency. These super objects are effectively the primitive objects mapped out and saved into cache as a separate item.
To help explain how these objects work internally, Michael calls the primitive objects ‘atoms’ and the large ones ‘molecules’ because it is easier for many people to remember molecules are made up of atoms.
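As a rough sketch of the atom and molecule idea, the Python below expands an episode atom into a molecule by following its references, then stores the result as JSON under its own cache key. The key names, fields, sample values, and the `build_molecule` and `cache_molecule` helpers are assumptions made for illustration; a plain dict stands in for the cache.

```python
import json

# Hypothetical atoms: small primitive objects that refer to each other by key.
ATOMS = {
    "episode:101": {
        "type": "episode",
        "title": "Cartman Gets an Anal Probe",
        "season": "season:1",
        "video": "video:101",
    },
    "season:1": {"type": "season", "number": 1, "show": "show:southpark"},
    "show:southpark": {"type": "show", "name": "South Park"},
    "video:101": {"type": "video", "format": "mp4"},
}

# Fields whose values are keys of other atoms (an assumption of this sketch).
REFERENCE_FIELDS = ("season", "show", "video")


def build_molecule(atom_id, atoms):
    """Return a copy of the atom with every reference replaced by the
    fully expanded object it points to (assumes no reference cycles)."""
    molecule = {}
    for field, value in atoms[atom_id].items():
        if field in REFERENCE_FIELDS:
            molecule[field] = build_molecule(value, atoms)
        else:
            molecule[field] = value
    return molecule


def cache_molecule(atom_id, atoms, cache):
    """Store the expanded molecule as JSON under a separate cache key."""
    key = "molecule:" + atom_id
    cache[key] = json.dumps(build_molecule(atom_id, atoms))
    return key


cache = {}
key = cache_molecule("episode:101", ATOMS, cache)
page_data = json.loads(cache[key])
```

A page developer would then read one cached item, here `page_data`, instead of fetching the episode, season, show, and video separately.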
How did Viacom start working with Redis?
Each time Viacom uploads an episode it creates the primary object and attaches it to a super-object. With each update, they need to re-evaluate the primary object for any change and regenerate all composite objects. As well, the system needed to invalidate URL requests within Akamai. The combination of the existing architecture and the new need for a more agile management approach led Michael to Redis.
As Michael looked for solutions, he needed something to work with PHP because Viacom’s sites mainly ran on PHP. Memcached was previously used for object storage, but it didn’t really work well with hashmaps, and he needed a better solution for evaluating the invalidation step that started with understanding the content dependencies. Basically, Michael wanted to easily follow a chain of dependencies forward and backward in the invalidation step. So, they looked at using Redis with Predis to help solve this problem.
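To illustrate what following a chain of dependencies forward and backward might look like, here is a minimal Python sketch. It keeps two maps of sets, one per direction, which is roughly how such a graph could be laid out in Redis sets; the class, the key names, and the traversal are assumptions for illustration, not Viacom's code.

```python
from collections import defaultdict, deque


class DependencyGraph:
    """Dependency edges stored in both directions, as sets."""

    def __init__(self):
        self.depends_on = defaultdict(set)  # forward: object -> what it uses
        self.used_by = defaultdict(set)     # backward: object -> what uses it

    def add(self, obj, dependency):
        self.depends_on[obj].add(dependency)
        self.used_by[dependency].add(obj)

    def affected_by(self, changed):
        """Everything that directly or indirectly uses `changed`
        (breadth-first walk over the backward edges)."""
        seen = set()
        queue = deque([changed])
        while queue:
            current = queue.popleft()
            for parent in self.used_by.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen


graph = DependencyGraph()
graph.add("molecule:episode:101", "episode:101")
graph.add("molecule:episode:101", "video:101")
graph.add("url:/episodes/101", "molecule:episode:101")

# When the video atom changes, these are the items to regenerate or purge.
stale = graph.affected_by("video:101")
```

In this sketch a change to `video:101` marks both the cached molecule and the hypothetical URL entry as stale, which corresponds to regenerating composite objects and invalidating URLs at the CDN.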
Other Ways Viacom Uses Redis
Of course, if someone is going to use Redis for object dependency graphing, it might make sense to use it for object storage too, right? This was the next logical use case for the architecture team. However, they had to get system admins to buy in. With Redis, the ops team bought in mainly because of the persistence and replication features. Several development cycles later, Redis was being used for storing the entire set of objects and their relationships.
The next two use cases for Redis were buffering activity tracking and view counts. Activities are now flushed from Redis to durable storage in MySQL every few minutes, and view counts are stored and served from Redis. Redis is also used to calculate the popularity of content based on a point system that includes both number of views and recency; in other words, the more recently something was viewed, the more it influences the popularity ranking. You can imagine that doing these types of calculations every 10 or 15 minutes on a large body of content will break a more traditional RDBMS like MySQL. Viacom’s approach for replacing MySQL with Redis was simple: a Lua batch job now runs on one instance of Redis that stores view information and tabulates the scores. The information is copied to another Redis instance to support production queries. Still, a copy of the scores is stored in MySQL for future analysis. The approach with Redis has proven to take 1/60th of the time it took in MySQL alone.
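The article does not give the actual point system, so the Python below is only one plausible way to combine view count and recency: each view contributes one point that halves after every half-life. The half-life value, the function names, and the sample data are assumptions, and the real job is a Lua script inside Redis rather than Python.

```python
HALF_LIFE_S = 6 * 3600  # hypothetical: a view loses half its weight every 6 hours


def popularity(view_times, now, half_life_s=HALF_LIFE_S):
    """Sum of per-view points, where each view decays exponentially with age."""
    return sum(
        0.5 ** ((now - viewed_at) / half_life_s)
        for viewed_at in view_times
        if viewed_at <= now
    )


def rank(views_by_item, now):
    """Return (item, score) pairs, most popular first."""
    scores = {
        item: popularity(times, now) for item, times in views_by_item.items()
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


now = 1_000_000
views = {
    "clip:old": [now - 2 * HALF_LIFE_S] * 3,  # three views, two half-lives ago
    "clip:new": [now],                        # one view, just now
}
ranking = rank(views, now)
```

With these sample values the single fresh view outranks three older ones, showing how recency can outweigh raw view count under this kind of scoring.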
The sixth way Viacom uses Redis is to store information on asynchronous jobs. Information is inserted into a list, and workers use the BLPOP command on the queue to grab a task from the top of the queue. As well, zsets are used to aggregate data from various social networks like Twitter and Tumblr. Lastly, Redis syncs several content management systems with the Brightcove video player.
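The list-based job queue can be sketched as follows. So that the example runs without a Redis server, a small in-memory class stands in for the connection and models only RPUSH and a non-blocking version of BLPOP; that class, the queue name, the job format, and the handler are all assumptions for illustration.

```python
import json
from collections import deque


class InMemoryLists:
    """Stand-in for a Redis connection. A real BLPOP blocks until an item
    arrives or the timeout expires; this version returns None at once."""

    def __init__(self):
        self._lists = {}

    def rpush(self, key, value):
        # Append to the tail of the list, like RPUSH.
        self._lists.setdefault(key, deque()).append(value)
        return len(self._lists[key])

    def blpop(self, key, timeout=0):
        # Pop from the head of the list, like BLPOP.
        items = self._lists.get(key)
        if not items:
            return None
        return key, items.popleft()


def enqueue(conn, queue, job):
    """Producer side: serialize the job and push it onto the list."""
    conn.rpush(queue, json.dumps(job))


def work_once(conn, queue, handlers):
    """Worker side: take one job from the head of the list and run it."""
    popped = conn.blpop(queue, timeout=5)
    if popped is None:
        return None  # nothing to do
    _key, payload = popped
    job = json.loads(payload)
    return handlers[job["type"]](job)


conn = InMemoryLists()
handlers = {"purge_url": lambda job: "purged " + job["url"]}
enqueue(conn, "jobs", {"type": "purge_url", "url": "/episodes/101"})
enqueue(conn, "jobs", {"type": "purge_url", "url": "/episodes/102"})
first = work_once(conn, "jobs", handlers)
second = work_once(conn, "jobs", handlers)
third = work_once(conn, "jobs", handlers)
```

Pushing at the tail and popping at the head gives first-in, first-out order, so jobs run in the order they were queued.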
Across all of these use cases, almost every Redis data type and feature is used: sets, lists, zsets, hashmaps, scripts, counters, and more. With these 8 use cases, Redis has become a key part of scaling Viacom’s architecture to support one of the world’s largest and most dynamic user bases.
About the Author
Stacey Schneider