When It Comes To Big Data, Cloud And Agility Go Hand-in-Hand

May 16, 2016 Jeff Kelly

Agile development is as close to doctrine as we get at Pivotal, and it's core to how we build software with our clients. Yet agile concepts are mostly used by developers. It is now time for data professionals to embrace agile development, and with the help of cloud analytics, we can.

Agile software development focuses on developing software in small, iterative batches, which helps you ship software faster and respond more quickly to user feedback. Agile software development is still fairly new for most enterprises, but the concept has gained widespread acceptance as superior to traditional software development methods. Even Gartner is bullish on agile software development practices like DevOps.

Why Aren’t Data-Centric Efforts More Agile?

Unfortunately, many of the same enterprises that have adopted agile practices for building applications don't always apply agile methodology to data management, at least not until recently. The traditional way of building software (spending an eternity building the "perfect" software before deploying it, only to find that user requirements have changed in the interim or that a bug is buried deep inside the code with no easy way to find it) mirrors the way most enterprises still treat data.

Traditionally, data warehouse and analytics pros spend weeks or months gathering requirements from the business. Then they spend even more time rationalizing those requirements into a workable framework. Yet more time goes into building the perfect data warehouse and data model to provide a single version of the truth. The whole process can take months or years! And many projects are abandoned as failures well before completion, but not before hundreds of thousands of dollars have been wasted.

For those data warehouse projects that deploy to end users, many end up being of little use. The questions they were designed to answer are no longer the questions the business needs answers to. Think about your own business. Are the insights you wanted or needed 18 months ago the same as they are today? Probably not. Making matters worse, the monolithic nature of a traditional RDBMS doesn’t allow for horizontal scale, and the brittle nature of data models means it’s no easy task to change and adapt to new business requirements.

In my opinion, it's time for data professionals to embrace agile development practices, focusing on answering today's business questions and iterating over time. According to a new survey by EMA Research, it appears many of you agree with me. The survey found that a majority (59%) of enterprises have recently turned to the cloud for analytics workloads and find them essential or important. Some of the reasons are financial, as you might guess: trading large, upfront CapEx for smaller, ongoing OpEx, for example. But what jumped out at me was that nearly two-thirds of the survey respondents said they are deploying analytics workloads in the cloud to support agile development methodology. Specifically, 64% of respondents to the EMA survey said supporting agile development was critical or extremely critical to the success of their analytics projects and, in part, prompted their move to the cloud.

Agile and Data, Hand in Hand…in the Cloud

In the software development world, cloud and agile go hand-in-hand. The same is true when it comes to data. The cloud supports agile analytics development in a number of ways.

  1. On-Demand Environment: With public cloud infrastructure and services from providers like Amazon Web Services, and more specialized cloud-based analytics service providers like U.K.-based Aridhia, data practitioners can spin up new analytics clusters in minutes with just a few keystrokes, significantly reducing time to insight. There's no need to procure hardware, configure servers or deploy software. This makes it easy to launch new analytics sandboxes and respond to quickly evolving business requirements. After all, doesn't every developer need their own sandbox?
  2. The Right Tool for the Right Job: Public cloud providers support various analytics engines and technologies, such as Hadoop and MPP data warehouses like Greenplum Database, so practitioners can more easily pick the right tool for the right job. We live in a world of polyglot persistence, after all. The cloud also de-risks analytics projects, since there are no expensive, up-front licensing fees for a product you might use for only a limited time. This encourages data professionals and even business users to experiment more with analytics.
  3. Performance, Performance, Performance: Performance is critical to supporting iterative analytics. Queries must return answers at the speed of thought. That requires some heavy-duty infrastructure, especially when large numbers of concurrent users are hitting the system with complex queries. Public cloud-based databases have massive computing power behind them, meaning they can stand up to even the most demanding analytical workloads without users or admins having to lift a finger.
  4. Collaboration and Access: Analytics and data science are collaborative disciplines by nature, and cloud-based analytics promotes that collaboration. With cloud-based analytics, data practitioners and other end users can access analytics tools and technologies over the internet from whatever device they happen to be using, wherever they happen to be, and collaborate on projects. All that's needed is a solid internet connection.

If you can get access to sandboxes as you need them, have alternative choices for certain persistence scenarios, and not have to worry so much about a big failure, then the cloud becomes a more attractive environment for supporting iterative, agile analytics. That's one reason the aforementioned Aridhia chose to make its clinical healthcare platform, which leverages Pivotal Greenplum for advanced, large-scale analytics, available as a cloud-based service. Data scientists and clinical researchers access Aridhia's platform, called AnalytiXagility, from the cloud and can quickly spin up new analytics environments to support the development of treatments for conditions like diabetes and heart disease. The platform is highly collaborative, giving data scientists a single environment in which to tackle thorny analytics problems, and making it available via the cloud lets users work together and iterate on one another's work.

If you're a data scientist, business analyst or even a more casual business user, it makes sense to consider cloud-based services to support analytics and an agile data mindset. This doesn't mean you should move all your analytical workloads to the cloud wholesale. Instead, think strategically about when it makes sense to use cloud-based services over traditional, on-premises data warehouses and related tools. The cloud can help you be more agile and iterative in your approach to analytics, which will ultimately lead to quicker, better answers to your business challenges.

To learn more about agile analytics in the cloud, check out this webinar I recorded with EMA's Lyndsay Wise, who conducted the study referenced above, and Pivotal's vice president of products, Ian Andrews. In it, we dive deeper into this topic and explore the implications for data scientists and developers.

About the Author

Jeff Kelly

Jeff Kelly is a Principal Product Marketing Manager at Pivotal Software. He spends his time learning and writing about how leading enterprises are tapping the cloud, data and modern application development to transform how the world builds software. Prior to joining Pivotal, Jeff was the lead industry analyst covering Big Data analytics at Wikibon, an open source research and advisory firm. Before that, Jeff covered data warehousing, business analytics and other IT topics as a reporter and editor at TechTarget. He received his B.A. in American studies from Providence College and his M.A. in journalism from Northeastern University.
