Using Redis at Pinterest for Billions of Relationships

July 25, 2013 Adam Bloom

Pinterest has been one of those amazing Silicon Valley stories: it grew over 1047% for PC-based use in 2012 and 1698% for mobile use, and it had 53.3 million unique visitors in March of this year. People have followed billions of things on Pinterest, which creates a complex engineering problem, since almost every screen of the user interface performs a query to see whether a board or user is already followed. This kind of workload is right in the wheelhouse for Redis.

Over the past couple of years, Pinterest has blossomed to become one of the leading citizens in the world of media, social or otherwise. If they haven’t gotten your attention yet, here are a few more factoids:

They drive more referral traffic than Google+, YouTube, and LinkedIn combined.
They are considered the third most popular social network after Facebook and Twitter.
Shoppers referred by Pinterest are 10 percent more likely to follow through with a purchase than visitors from other social networking sites.

As you can imagine, this is a high-scale site whose unique user experience places strong demands on its IT infrastructure.

Optimizing the User Experience without a Cache

Recently, Abhi Khune, Engineering Manager at Pinterest, published a great article about the demands of the user experience and the use of Redis. Even savvy web application builders wouldn’t necessarily notice these features without analyzing the site in detail, but they are there. First, there is the previously mentioned check on each screen to see whether a board or user is already followed. In addition, the UI shows accurate counts and paginated lists of a user’s followers and follows, as well as a board’s followers, in many places. Performing queries like these on every mouse click requires a high-performance architecture.

The Pinterest software engineers and architects already had MySQL and Memcache in place; however, the caching solution had reached its limits, and the cache needed to be expanded to better serve users. In fact, the engineering team found that the cache was only useful if a user’s sub-graph was in it. Anyone actively using the system needed to be in cache, which effectively meant caching the entire graph. In addition, for one of the most common queries, “does user A follow user B,” the answer is often no, but a “no” was treated as a cache miss and triggered a lookup on the data store. Expanding the cache meant they needed a new approach.
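
To make the negative-lookup problem concrete, here is a minimal sketch in plain Python. It assumes a look-aside cache that holds only known follow edges; the class and function names are hypothetical and are not taken from Pinterest's code. It shows why every "no" answer falls through to the data store, and why holding the whole graph in memory avoids that.

```python
class FollowStore:
    """Stand-in for the backing database (hypothetical, not Pinterest's schema)."""

    def __init__(self, edges):
        self.edges = set(edges)
        self.lookups = 0  # number of times the database was queried

    def follows(self, a, b):
        self.lookups += 1
        return (a, b) in self.edges


def follows_with_edge_cache(cache, store, a, b):
    """Look-aside cache that holds only known follow edges.

    A missing key cannot distinguish "not cached yet" from "A does not
    follow B", so every negative answer falls through to the database.
    """
    if (a, b) in cache:
        return True
    result = store.follows(a, b)
    if result:
        cache.add((a, b))
    return result


def follows_with_full_graph(graph, a, b):
    """With the whole graph in memory, absence is a definitive "no"."""
    return b in graph.get(a, set())


store = FollowStore({("alice", "bob")})
cache = set()
for _ in range(3):
    follows_with_edge_cache(cache, store, "alice", "carol")  # answer is "no"
print(store.lookups)  # 3: each "no" went to the database

graph = {"alice": {"bob"}}
print(follows_with_full_graph(graph, "alice", "carol"))  # False, no database call
```

The second function never touches the database because a missing entry in a complete graph is itself the answer.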

Ultimately, the team decided to store the entire graph in Redis to serve lots of lists. Here, Redis begins to show how it is truly different, acting almost like an in-memory operational data store.

Storing Lots of Pinterest Lists in Redis

Next time you log in to Pinterest, remember that Redis is running in the background and storing several types of lists for you as a user:

A list of users who you follow
A list of boards (and their related users) that you follow
A list of your followers
A list of people who follow your boards
A list of boards you follow
A list of boards you unfollowed after following a user
The followers and unfollowers of each board
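
One way to picture a few of these lists is the toy in-memory model below. It is only a sketch: it uses Python sets and dictionaries, the names are illustrative, and it does not claim to match the structures Pinterest actually keeps in Redis. Redis set commands such as SADD, SISMEMBER, and SCARD cover similar add, membership, and count operations.

```python
from collections import defaultdict


class FollowGraph:
    """Toy in-memory follower graph; names and layout are illustrative only."""

    def __init__(self):
        self.following = defaultdict(set)  # user -> users they follow
        self.followers = defaultdict(set)  # user -> users following them

    def follow(self, a, b):
        # Keep both directions in sync so either list can be read directly.
        self.following[a].add(b)
        self.followers[b].add(a)

    def unfollow(self, a, b):
        self.following[a].discard(b)
        self.followers[b].discard(a)

    def is_following(self, a, b):
        return b in self.following.get(a, ())

    def follower_count(self, user):
        return len(self.followers.get(user, ()))

    def followers_page(self, user, page, page_size):
        # Sort for a stable order, then slice out one page (pages start at 0).
        ordered = sorted(self.followers.get(user, ()))
        start = page * page_size
        return ordered[start:start + page_size]


g = FollowGraph()
for fan in ["ann", "bo", "cy"]:
    g.follow(fan, "dee")
g.unfollow("bo", "dee")

print(g.is_following("ann", "dee"))   # True
print(g.follower_count("dee"))        # 2
print(g.followers_page("dee", 0, 1))  # ['ann']
```

Storing both directions doubles the writes but lets the follow check, the follower count, and the paginated list each be answered from a single structure.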

Redis stores the above lists for each of its 70 million users; it basically stores the entire follower graph, sharded by user ID. As you can see from the types of data in the list of lists above, analytical summary information is stored and viewed more like data in a transactional system. Each of Pinterest’s current 70 million users is limited to 100,000 likes. Rough math shows that if the average person liked 25 boards, there would be 1.75 billion relationships between users and boards. If the average were 100 likes, there would be 7 billion relationships. And this is a core feature: the relationships keep growing every day the system is used.
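
The rough math above can be checked in a couple of lines. The user count and the two averages come from the paragraph; the function name is just for illustration.

```python
USERS = 70_000_000  # user count quoted above


def total_relationships(avg_likes_per_user):
    """Total user-to-board relationships for a given average per user."""
    return USERS * avg_likes_per_user


for avg in (25, 100):
    billions = total_relationships(avg) / 1e9
    print(f"{avg} likes per user -> {billions:.2f} billion relationships")
```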

Redis Architecture and Operations at Pinterest

According to one of its founders, Pinterest started out writing the application in Python and a modified Django, and continued this way up to 18 million users and 410 terabytes of user data. While several data stores are used across all of its data, the engineers at Pinterest store the lists above by splitting the user ID space into 8192 virtual shards. Each shard runs on a Redis DB, multiple Redis DBs run on an instance, and multiple Redis instances run on a machine. Because each Redis instance is single-threaded, they place multiple instances on a machine to fully utilize its CPU cores.
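
The shard layout can be sketched as a simple lookup. Only the figure of 8192 virtual shards comes from the article. The modulo mapping, the number of DBs per instance, and the number of instances per machine are assumptions chosen for illustration, and `locate` is a hypothetical helper.

```python
NUM_SHARDS = 8192          # from the article
DBS_PER_INSTANCE = 16      # assumption for illustration
INSTANCES_PER_MACHINE = 8  # assumption for illustration


def locate(user_id):
    """Map a user ID to (shard, machine, instance, db) under the assumed layout."""
    shard = user_id % NUM_SHARDS  # assumed mapping: simple modulo
    instance_index = shard // DBS_PER_INSTANCE
    db = shard % DBS_PER_INSTANCE
    machine = instance_index // INSTANCES_PER_MACHINE
    instance = instance_index % INSTANCES_PER_MACHINE
    return shard, machine, instance, db


print(locate(8192 + 37))  # (37, 0, 2, 5)
print(locate(8191))       # (8191, 63, 7, 15)
```

Keeping the number of virtual shards fixed means a user always maps to the same shard; only the shard-to-machine placement has to change when capacity is added.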

While the entire data set is held in memory, Redis logs write operations to disk on Amazon EBS every second. Scaling is accomplished in two ways: 1) when a machine reaches 50% utilization, half of the Redis instances running on it are moved to a new machine by promoting a slave to master, or 2) nodes and shards are expanded. The overall Redis cluster runs in a master-slave configuration where the slaves are hot backups. When a master fails, its slave takes its place and a new slave is added, with ZooKeeper controlling the process. They also run BGSAVE hourly, a Redis operation that saves the DB in the background, with the result going to a more permanent store on Amazon S3. Pinterest uses this data for MapReduce and analytics jobs.
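
The failover step described above can be pictured with a toy model. This is a sketch under stated assumptions: `ShardGroup`, `fail_over`, and the host names are hypothetical, and the coordination that ZooKeeper performs in the real system is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ShardGroup:
    """One master and its hot-backup slave (hypothetical model)."""
    master: str
    slave: str


def fail_over(group, spare_hosts):
    """Promote the slave to master and attach a fresh slave from spare_hosts.

    Coordination (ZooKeeper in the article) is not modeled here.
    """
    new_slave = spare_hosts.pop(0)
    return ShardGroup(master=group.slave, slave=new_slave)


group = ShardGroup(master="redis-a", slave="redis-b")
spares = ["redis-c"]
after = fail_over(group, spares)
print(after)  # ShardGroup(master='redis-b', slave='redis-c')
```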

As this summary shows, caches and databases have limits, and Abhi’s full article provides much more detail on the reasons for, and approaches to, using Redis to scale Pinterest. In the future, we plan to post about how Pinterest uses RabbitMQ!

Learn more about Redis:

There are Redis clients for ActionScript, C, C#, C++, Clojure, Common Lisp, D, Dart, Emacs Lisp, Erlang, Fancy, Go, Haskell, haXe, IO, Java, Lua, Node.js, Objective C, Perl, PHP, Pure Data, Python, Ruby, Scala, Scheme, Smalltalk, and Tcl.
How Redis is used at Viacom
How Redis is used at Twitter
How Redis is used at Superfeedr
