Hulu is an online video service that offers a selection of hit TV shows, clips, movies and more on the free, ad-supported Hulu.com service and the subscription service Hulu Plus. One of the top video streaming sites in the U.S., today the service has over four million subscribers and approximately 30 million unique viewers per month.
Inability to Scale MySQL and Memcached
In 2012, Hulu’s subscriber base passed the two million mark and the back-end systems that tracked viewer history started to break down. When a video is played, the system records information from the player to keep track of both the video and the viewing position within it. When the video application is closed, the stored information allows the user to resume the video where they left off. The system also provides recommendations for what videos to watch next based on user history.
Originally designed as a Python application, Hulu’s viewed history tracking system relied on Memcached for reads on top of a sharded MySQL database for writes. When the Hulu engineering team started to see that MySQL couldn’t handle the volume of writes, the only way to scale was to add more shards. Reads were done in Memcached to preserve I/O on the database, but Memcached could not be replicated. So, user history was served out of one shard in one datacenter.
With the occurrence of peak time failures and an understanding of root causes, the core Hulu engineering team began to design a solution with four overarching requirements:
The Path to 3 Million Subscribers
After looking at a variety of NoSQL alternatives like MongoDB, Riak and LevelDB, Hulu selected Redis. Describing the process, Andres Rangel, Senior Software Engineer, stated, “We chose Redis for several key reasons – it was simple to set up, had great documentation, offered replication and allowed us to use data structures. Data structures are extremely powerful and allow us to architect solutions to many use cases very efficiently. For example, depending on the operation, we have the need to query either a specific video a user watched, or all of them. With Redis, this was easy using hashes.”
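The hash-based access pattern Rangel describes can be sketched roughly as follows. This is a minimal illustration, not Hulu's code: the key layout (`history:<user_id>`), the function names and the `FakeRedis` class are all assumptions made for this example. `FakeRedis` is an in-memory stand-in that mirrors only the `hset`, `hget` and `hgetall` commands, so the sketch runs without a server.

```python
from collections import defaultdict


class FakeRedis:
    """In-memory stand-in exposing only the hash commands used below (hypothetical)."""

    def __init__(self):
        self._hashes = defaultdict(dict)

    def hset(self, name, key, value):
        self._hashes[name][key] = value

    def hget(self, name, key):
        return self._hashes[name].get(key)

    def hgetall(self, name):
        return dict(self._hashes[name])


def history_key(user_id):
    # Assumed key layout: one hash per user, one field per video.
    return f"history:{user_id}"


def record_position(client, user_id, video_id, position_seconds):
    # Store (or overwrite) the resume position for a single video.
    client.hset(history_key(user_id), video_id, position_seconds)


def get_position(client, user_id, video_id):
    # Query one specific video the user watched.
    return client.hget(history_key(user_id), video_id)


def get_all_positions(client, user_id):
    # Query every video the user watched in a single call.
    return client.hgetall(history_key(user_id))


client = FakeRedis()
record_position(client, 42, "video-a", 615)
record_position(client, 42, "video-b", 90)
```

With a real Redis client the same three commands would apply, although values would come back as strings or bytes rather than integers.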
To meet all requirements, there were some minor areas that needed additional development. First, the Hulu team took a look at how the data was sharded. They were able to easily shard on user_id. “We scale Redis by sharding the data and the intelligence about shards is in the application logic,” noted Rangel. Second, Redis didn’t have the Sentinel implementation of monitoring and automatic failover at that time. Since Redis has an open API, the Hulu team was able to create their own Sentinel mechanism to support high availability.
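Sharding on user_id in application logic might look something like the sketch below. The source does not say how Hulu maps a user to a shard, so the CRC32-modulo scheme and the function name `shard_for_user` are assumptions; the shard count of 64 is the figure Rangel gives for the pre-sharded system.

```python
import zlib

NUM_SHARDS = 64  # pre-sharded instance count quoted in the article


def shard_for_user(user_id, num_shards=NUM_SHARDS):
    """Map a user_id to a shard index (assumed scheme: CRC32 modulo shard count)."""
    # zlib.crc32 is stable across processes and machines, unlike Python's
    # built-in hash() for strings, so every application server agrees.
    digest = zlib.crc32(str(user_id).encode("utf-8"))
    return digest % num_shards


# Every request for the same user resolves to the same shard.
shard_index = shard_for_user(12345)
```

Because the mapping lives in the application, each server can compute the shard locally without consulting a lookup service. The trade-off of a plain modulo scheme is that changing the shard count remaps most users, which is one reason a system might fix the count up front.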
Open Ended Scaling for Reads and Writes
“Since Redis supports replication, it became possible to reorganize the data map so writes and reads could be easily separated, load-balanced and scaled across datacenters,” said Rangel. Reads are routinely balanced across Redis shards. Each shard is replicated to a set of slaves in each datacenter. A user only exists on a single shard, which ensures that newly-added users distribute evenly across the shards. The architecture is highly repeatable and provides Hulu with a linear scalability path.
800% Performance Improvement for Queries
With queries running on dedicated, load-balanced slaves in regional datacenters – instead of all out of the west coast – speed and performance improvements were expected for this new architecture. According to Rangel, “For performance considerations, we decided to pre-shard the system into 64 instances. We replicate the master shard to a slave in the same datacenter and to a slave in the second datacenter. This way, applications in the other datacenter read locally from the Redis slave and achieve greater performance. The result was that 75% of the latency in reads from the east coast was reduced from 120 ms to less than 15 ms and 90% went from 300 ms to around 25 ms.”
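The read and write routing described here could be sketched as follows, assuming each shard has one master plus one replica per datacenter. The `Shard` type, the function names, the datacenter labels and the host addresses are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    master: str     # address that accepts all writes for this shard
    replicas: dict  # datacenter name -> address of the replica located there


def write_target(shard):
    # Writes always go to the shard's single master.
    return shard.master


def read_target(shard, local_datacenter):
    # Reads prefer the replica in the caller's own datacenter; if none is
    # configured there, fall back to the master (an assumed policy).
    return shard.replicas.get(local_datacenter, shard.master)


# Hypothetical addresses for one of the 64 shards.
shard = Shard(
    master="redis-07-master.west.example:6379",
    replicas={
        "west": "redis-07-replica.west.example:6379",
        "east": "redis-07-replica.east.example:6379",
    },
)

east_read = read_target(shard, "east")
west_read = read_target(shard, "west")
```

An application in the east coast datacenter therefore reads from its local replica instead of crossing the country, which is the effect the latency figures above reflect.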
Greater Performance with Data Durability
To build data durability into their system, Hulu decided to use Apache Cassandra as the persistent data store where all writes are made. As data is ingested, it is written from Cassandra to Redis. As Rangel describes the solution, “The first time a request comes for a user, the system will create a job to load all the videos for this user into Redis. Once this is done, the system will update a flag. The next time a request comes in for this user, the flag is set. Then, the system returns whatever it has from Redis without hitting Cassandra. This way, access to the database is greatly reduced and we aren’t required to have every record in Redis. When Redis queries are faster than Cassandra by a huge margin, we achieve the low latency reads for active users by having their data in Redis. This means we can leave Cassandra for batch reports where the latency is not important.”
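The load-on-first-request flow Rangel outlines can be illustrated with the sketch below. It is a simplification under stated assumptions: plain dictionaries stand in for Cassandra and Redis, the load runs synchronously rather than as a background job, and the class name, key names and the `durable_reads` counter are invented for the example.

```python
class HistoryService:
    """Serve viewing history from a cache, loading each user once from durable storage."""

    def __init__(self, durable_store, cache):
        self.durable_store = durable_store  # stand-in for Cassandra: {user_id: {video_id: position}}
        self.cache = cache                  # stand-in for Redis: {key: value}
        self.durable_reads = 0              # counts reads that reach the durable store

    def _load_user(self, user_id):
        # First request for this user: copy all of their videos into the cache,
        # then set the flag so later requests skip the durable store.
        self.durable_reads += 1
        videos = self.durable_store.get(user_id, {})
        self.cache[f"history:{user_id}"] = dict(videos)
        self.cache[f"loaded:{user_id}"] = True

    def get_history(self, user_id):
        if not self.cache.get(f"loaded:{user_id}"):
            self._load_user(user_id)
        # Flag is set: return whatever the cache holds.
        return dict(self.cache[f"history:{user_id}"])

    def record(self, user_id, video_id, position):
        # All writes land in the durable store first, then propagate to the
        # cache if this user's data has already been loaded there.
        self.durable_store.setdefault(user_id, {})[video_id] = position
        if self.cache.get(f"loaded:{user_id}"):
            self.cache[f"history:{user_id}"][video_id] = position


service = HistoryService(durable_store={7: {"video-a": 30}}, cache={})
first = service.get_history(7)   # triggers the one-time load
second = service.get_history(7)  # served from the cache only
```

Only users who actually make requests end up in the cache, which matches the point that not every record needs to live in Redis.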
As Hulu pursues a superior experience for users, content owners and advertisers in the future, they are confident in the long-term scalability of their back-end systems. The features in Redis will continue to provide a high-performance data tier as Hulu’s user base grows.