Melbourne-based doubleIQ builds, operates and maintains innovative real-time information management systems that solve a range of business problems quickly and cost-effectively for some of the largest companies in Australia. doubleIQ has provided information management systems to clients in banking, insurance, telecommunications, retail and utilities. The company has also helped customers develop in-house data warehousing applications to integrate, aggregate and distribute data.
Extending Data Warehousing and Analytics to the Cloud
doubleIQ’s internal data warehousing infrastructure handles market data on behalf of its customers. The company runs business intelligence and competitive analytics on that data and then delivers the resulting market intelligence to its clients. doubleIQ has also grown its business to include its own hosted data warehouse and cloud analytics services, giving clients a secure and scalable environment for fast, efficient access to and analysis of big data via cloud-based services.
“Increasingly, our clients are interested in a cloud-based service,” explains Dennis Claridge, Business Director, doubleIQ. “Our cloud service allows us to dramatically streamline and automate the delivery of business intelligence.”
Pivotal Greenplum Improves Performance While Reducing Costs
doubleIQ deployed Pivotal Greenplum as the foundation for its hosted data warehouse and real-time analytics solution for big data and cloud-based services. The company developed a database architecture, using Pivotal Greenplum, to enable a cloud-based data warehouse for its clients. It provides this as either a public or private service.
“We were already using a PostgreSQL open source database,” Claridge notes. “One of the attractions of Pivotal Greenplum was that it is a similar build to the PostgreSQL model and we already had a good understanding of that system. We were also attracted by its scalability and cost. It definitely offered better performance at a lower price than the other databases we considered.”
doubleIQ deployed Pivotal Greenplum across a cluster of five Intel-based Linux servers, comprising a master node and four segment nodes. Pivotal Greenplum uses a shared-nothing, massively parallel processing architecture designed to support business intelligence and analytical processing. The database is structured so that data is automatically partitioned across the segment nodes, and each segment node owns and manages its own portion of the overall data. The servers process every query in parallel, use all disk connections simultaneously and send data between segments as dictated by query plans.
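The shared-nothing idea described above can be illustrated with a minimal Python sketch. It is not Greenplum code and does not reproduce Greenplum's actual hashing or query planner. The row layout, the `customer_id` distribution key and every function name here are assumptions made for illustration; only the count of four segments comes from the deployment described in this case study.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # the case study describes four segment nodes


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution key to a segment using a stable hash.

    CRC32 is an illustrative choice, not the hash Greenplum uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments: int = NUM_SEGMENTS):
    """Partition rows so that each segment owns a disjoint slice of the data."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row["customer_id"], num_segments)].append(row)
    return segments


def local_totals(segment_rows):
    """Work done by one segment: aggregate only the rows it owns."""
    totals = defaultdict(float)
    for row in segment_rows:
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)


def parallel_totals(rows, num_segments: int = NUM_SEGMENTS):
    """Play the master's role: fan the query out, then merge partial results.

    Threads stand in for separate servers here; they show the structure of
    the computation, not real multi-machine parallelism.
    """
    segments = distribute(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(local_totals, segments))
    merged = {}
    for partial in partials:
        # Keys never collide: all rows for one customer hash to one segment.
        merged.update(partial)
    return merged


if __name__ == "__main__":
    sample = [
        {"customer_id": "c1", "amount": 10.0},
        {"customer_id": "c2", "amount": 4.0},
        {"customer_id": "c1", "amount": 5.5},
    ]
    print(parallel_totals(sample))
```

Because the aggregation groups by the same key used for distribution, each segment can finish its share without exchanging rows with the others. A query that groups by a different column would need data sent between segments, which is the movement "dictated by query plans" mentioned above.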
“Using Pivotal Greenplum, we’ve built our own processing data warehouse application that is capable of everything from loading the client data to presenting it to end users via a web interface,” Claridge says. “This allows us better control over the entire process and enables us to provide cost and speed advantages to our clients.”
Fast Data Queries
“One of the key things we wanted to see after the deployment was how fast we were able to generate a query and deliver the data back to the end user, regardless of the volume of data involved,” says Claridge. “So far the speeds have been very good. I’d say it’s at least two to three times faster than any comparable alternative system.”
Rapid ETL Processing
At doubleIQ, Pivotal Greenplum contains 3 terabytes of performance data with approximately 14 terabytes of disk space attached. The database handles all the heavy lifting of client data, particularly transaction processing. By using Pivotal Greenplum as the basis for its data warehousing infrastructure, doubleIQ found that ETL (extract, transform and load) tasks complete much faster than on other systems.
“We’ve run very similar ETL processes for one of our clients using a different database and I know they take around three days to complete,” Claridge says. “On our own infrastructure, essentially the same process takes three hours.”
doubleIQ has also been able to use its data warehouse infrastructure to handle big data more efficiently. Previously, to deal with the billions of rows of data that come from its clients, the company had to break the data up into smaller packages before processing it through the database. With Pivotal Greenplum, whole data sets can be processed according to the analytical functions being performed, rather than being divided up by volume. This has made the processing much more logical and has reduced the time and effort taken to run analytics.
“Breaking down data sets takes as much as 50 percent more time and effort,” Claridge points out, “and because it takes so much time and manpower to get an effective process working and then maintain it, it really erodes our productivity. Pivotal Greenplum removes the necessity for this work. We don’t have to run such a complex set of processes, which means the system is much easier to maintain.”
With demand growing at a rate of 3 billion transactions per year, it was vital that the database meet doubleIQ’s scalability requirements. Offering the new hosted data warehousing service means the company has effectively doubled in size, yet Pivotal Greenplum has enabled doubleIQ to keep staff levels unchanged.
“From a capacity planning perspective, the Pivotal Greenplum environment is beautifully predictable,” Claridge concludes. “With a traditional database it’s harder to predict what scaling the environment will do. With Pivotal Greenplum it’s very linear; if we double the size we know that our processing times will typically halve. Pivotal Greenplum offers very simple scaling abilities. It worked very well for us.”
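The capacity planning rule of thumb in that quote, double the size and processing times typically halve, can be written as a small estimate. The sketch below assumes ideal linear scaling with a fixed data volume, which real workloads will only approximate. The function names are hypothetical, and the example inputs (a three-hour run on four segment nodes) are borrowed from figures elsewhere in this case study purely for illustration.

```python
import math


def estimated_hours(baseline_hours: float, baseline_segments: int,
                    new_segments: int) -> float:
    """Estimate run time if it scales inversely with the number of segments.

    Assumes ideal linear scaling and an unchanged data volume.
    """
    if baseline_segments <= 0 or new_segments <= 0:
        raise ValueError("segment counts must be positive")
    return baseline_hours * baseline_segments / new_segments


def segments_needed(baseline_hours: float, baseline_segments: int,
                    target_hours: float) -> int:
    """Smallest segment count that meets a target run time under the same model."""
    if baseline_segments <= 0 or target_hours <= 0:
        raise ValueError("segment count and target time must be positive")
    return math.ceil(baseline_hours * baseline_segments / target_hours)


if __name__ == "__main__":
    # Illustrative inputs: a 3-hour job on 4 segment nodes.
    print(estimated_hours(3.0, 4, 8))    # doubling the segments
    print(segments_needed(3.0, 4, 1.0))  # segments for a 1-hour target
```

Under this model, the same calculation works in both directions: it predicts the run time for a planned cluster size, or the cluster size needed to hit a target run time.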