This Big Data Thing Is…Well…Big!
Early this year I made a big career move: after almost 7 years working at Yahoo!, I joined Greenplum as our VP of Product Management. The new job has been exhilarating: new industries to understand, Big Data challenges to solve, and the fast-moving pace of a “start-up-like” company. I’ve always enjoyed learning new things; it was what I liked best about working in the central data team at Yahoo!. Greenplum is a place where I have continued to learn.
That said, there are a few patterns that I experienced at Yahoo! that I continue to see as I meet with Greenplum customers and prospects who want to tackle the world of Big Data. The first is an ever-growing need for analytic agility. Organizations are constantly challenged by the time and effort required to extract insights from existing data assets and then convert these into actions (in the form of well-informed business decisions or data-driven applications). A portion of this challenge is solved by having the right platform; one of my favorite parts of being at Greenplum is sharing with a customer how Greenplum’s Unified Analytics Platform supports an agile analytics environment.
But the right platform is only part of the solution. The other question I consistently get from prospects and customers is: “What is the right organizational model for me to be successful with Big Data?” And while it’s fun for me to talk about our platform, I think this second aspect of the Big Data challenge may be the toughest to solve, and the most critical to success.
Strategic Data Solutions at Yahoo!
In mid-2005 I joined a newly formed group at Yahoo! called Strategic Data Solutions (SDS). I actually sought out a role in the group after reading an article in InformationWeek about the appointment of Usama Fayyad as Yahoo!’s Chief Data Officer. My persistence paid off and I was lucky enough to get hired, joining a group of other data-loving professionals (including my current Greenplum colleague Annika Jimenez). Yahoo! was among the earlier companies to realize that its data was in fact a strategic asset, and to make a big bet (in the form of what grew to a 500+ person organization) on extracting the most value from this data. It turns out that the bet Yahoo! made back in 2005 is similar to what I see many non-Internet companies starting to do today. In industries that people may consider to be old-school when it comes to Big Data (insurance, manufacturing, utilities), we now see executives starting to anoint their own Chief Data Officers and roll out strategic data initiatives.
So, when these customers ultimately ask, “What is the right organizational model for me to be successful with Big Data?”, I look back to the way that the Strategic Data Solutions organization was set up, and I see a lot of things that we did right. Of course there is no single cookie-cutter approach to organizational design that works in all situations, but I believe that the core philosophy that drove how SDS was set up can give other companies a strong framework for thinking about their strategic data initiatives. I’ve listed below the core components of the SDS organization; you can view these both as “product lines” and as organizations within the group. My assertion: when building out your strategic data organization and its capabilities, think in terms of the big functional areas below:
- Data Platform: at the core of any strategic data initiative is a strong data platform that meets the core data provisioning needs of the organization’s data consumers. Be careful not to confuse a data platform initiative with a more traditional “data warehouse” initiative. While one of the functions of the data platform may be to host or integrate with a data warehouse, the data platform also needs to support data sets that may not typically be in a data warehouse (documents, machine-generated logs, etc.) and workloads that aren’t well suited to a traditional data warehouse (sandbox-based analytics, feed provisioning to production systems, non-SQL data analysis, etc.). At Yahoo! we made a big investment in building out a core data platform (originally a home-grown file-based system, and ultimately a combination of Hadoop and relational databases) to serve the broad range of data consumers we needed to support.
- Business Intelligence: one of the mistakes we made early on in SDS was to abdicate responsibility for the delivery of the core business intelligence needs of the various Yahoo! business units. It was a convenient decision to make initially: the area had demanding consumers and difficult-to-prove ROI, and was frankly not as “sexy” as the other, more advanced work that we wanted to do. Over time, however, we realized that supporting the business intelligence needs of our business stakeholders needed to be one of our core offerings. There were benefits in terms of data re-use, stakeholder relationships, and other economies of scale that made this the right thing for SDS to do. By successfully supporting the BI needs of our business partners we were able to “earn the right” to engage with them on the more advanced analytics and data services we had to offer. The key to success here was to (appropriately) view our BI investment as a cost center. We avoided getting caught up in the losing battle of trying to show the ROI of our BI efforts by instead focusing our ROI-based initiatives in areas where we could, in fact, show true returns (see below).
- Data Science Services: within SDS we worked hard to enable customers (the various Yahoo! product lines and business units) to derive “actionable insights” from the data asset we created with our Data Platform. Often the data science skills required for anything other than traditional reporting and BI weren’t resident in the various lines of business. (In fact, we continue to see this challenge today, and are working to help solve the Data Scientist skills shortage through things like our innovative partnership with Kaggle.) So SDS built out a consultancy-oriented group to help our customers move to the next level of analysis. The ultimate goal of our engagements with the business was twofold: first, the Data Science team was devoted to solving data-driven problems that resulted in a measurable ROI (increasing ad clickthrough rates, reducing churn, improving customer acquisition); second, we wanted to train our internal business customers on how to use the Data Platform and associated tools to do subsequent Data Science projects on their own.
- Data Driven Applications: the ultimate goal of a lot of our Data Science initiatives at Yahoo! was to spur the creation of data-driven applications that could measurably impact the bottom or top line. As the name implies, these applications leveraged the results of some underlying data science efforts (scoring algorithms, recommendation models, pricing optimizations) to drive actions taken in Yahoo!’s customer-facing and internal-facing applications. The team was structured to work on a commissioned project basis: business units would request support to build specific applications and back up their requests with detailed business cases. The Data Driven Applications team would then prioritize the long list of incoming requests and methodically tackle the highest-value projects. This model turned out to be a win-win for SDS and our internal customers: the business units received value-enhancing data-driven applications, and SDS was able to effectively show how the investment in data as a strategic asset was driving true ROI for Yahoo!.
- Data Distribution: a final and important aspect of the strategic data organization is an understanding that, in addition to supporting analytical needs (either via BI support or data science projects), there is also the need to support data distribution. For example, at Yahoo! the core data platform was used to generate segment membership information for billions of users (browser cookies) each day. These profiles needed to be distributed to the operational systems that consumed them (the ad targeting platforms), so it was important to have the appropriate infrastructure and APIs to allow the consumers of these large data sets to access and move them. Additionally, there were consistent demands to provision subsets of the data in the core data platform to other consumers both inside and outside of Yahoo!. The Data Distribution challenge is one that many of our Greenplum customers today are starting to struggle with as well, and it’s important to think about it when scoping out a big data strategy.
Dive Right In. The Water’s Warm!
Now I can’t guarantee that the structure described above is perfect for every organization; there are likely variations of this perspective that have worked for other successful data groups. However, I do think the emerging themes are consistent, and that if you consider the above elements while diving into the Strategic Data Organization waters, you’ll be more likely to achieve success.
At the end of the day, there is a bit of a leap of faith required to make a strategic bet on big data. But the data shows that it’s worth it. A recent article in the Harvard Business Review revealed: “In particular, companies in the top third of their industry in the use of data-driven decision making were, on average, 5% more productive and 6% more profitable than their competitors.”
About the Author
Josh Klahr