This week, we have the opportunity to sit down and do a Q&A session with Christian Tzolov, one of Pivotal’s technical architects and big data experts. He spends a lot of time working on open source projects related to Pivotal Big Data Suite. In addition to being an Apache® Committer and PMC Member, he has spent over a decade working with various Spring projects and has led several enterprises on large-scale artificial intelligence, data science, and Apache Hadoop® projects.
Christian will be speaking at SpringOne 2GX later this month on “Federated Queries with HAWQ - SQL on Hadoop and Beyond” and at ApacheCon BigData, presenting “Leveraging Ambari to build comprehensive management UIs for your Hadoop applications” and “Federated SQL on Hadoop and Beyond: leveraging Apache Geode to build a poor man’s SAP HANA”. He has also developed a number of open source projects and built demonstrations for a wide variety of technologies in the open source space, including the installation of Hue 3 on Pivotal HD 3.0. Several more are listed below.
As a quick promo, we are looking for more engineers with Christian’s talent and expertise!
Would you tell us about how you grew up and got into software?
Sure. I was born in Bulgaria, which is interesting by itself. It was under the control of the Communist Party until 1989, and it began transitioning to a democracy at that point. My brother and I grew up in the town of Vratsa. In my younger years, there were two big things that helped me get interested in computers. One, there was a BBC TV show called Blake’s 7, and it featured a portable supercomputer called Orac. This was the first time I was exposed to computers. Two, I was in a special math school at high school age, and it was the first place I saw a computer. It was a clone of the Apple II made by Pravetz. It was also my first computer.
So, I ended up graduating in 1990 from the Mathematics Gymnasium in Vratsa, at the same time Bulgaria’s political system was being revolutionized! It was an interesting time to graduate. After that, I went on to Technical University Sofia and graduated with a Master of Science in Electronic and Automation Engineering. Then, I started working in embedded software and distributed systems.
Interestingly, my brother took a very different path. He travels the world as a circus actor!
Would you tell us about your work background and how you ended up at Pivotal?
In the 90s, I spent a lot of time with CORBA. It was a big thing and was the main reason I moved to the Netherlands. In 2000, I did research at the University of Twente related to CORBA and distributed systems. Then, I moved to the VU University Amsterdam and joined an international project called NEW TIES. We built a distributed platform where social scientists could run simulations with artificially intelligent agents, and these bots could learn, grow, eat, die, pollute, reproduce, etc. Social scientists were using the simulation to make observations. The project was inspired by a Stanislaw Lem short story called “Non Serviam”, and it was the craziest thing I have ever worked on. I led an international team to build the distributed platform from the ground up. That is where I was first exposed to big data (although back then we just called it a lot of interesting data).
After that, I moved back to the commercial side of things and joined TomTom, where I worked on various location-based software systems (LBS). There, I led a team responsible for the Map Share platform, a collaborative service which allowed customers to make map corrections on their navigation device and share those corrections with the Map Share community of 21 million people. It was a social, local, IoT map application! While at TomTom, I also worked on the Point of Interest (POI) capabilities: architecting and leading the integration of hundreds of data sources, tagging them with geolocation, and bringing it all together by making them searchable. This is when I introduced Apache Hadoop® to TomTom, and it was the first time I really got to use it in mission-critical applications. We also worked closely with what became Apache Crunch®, a project for MapReduce pipelines. As Crunch was introduced, incubated, and became a top-level project, I became an Apache committer.
While at TomTom, I also realized that I loved the data side of application development, and I decided to take on a big government project using Spring and Apache Hadoop for risk analysis and planning. Then, I heard about Pivotal. I realized they were an umbrella for Spring, cloud runtimes, and big data. This got my attention as they were fusing the main threads of the topics I had already been gravitating toward at a much higher level. So I applied for a job, and here I am.
In your words, what is your role within Pivotal and what challenges do you help our customers solve?
I’ve been an R&D guy and a customer in the past, but I have not been in a more consultative role like I am here, so this is new territory for me. I am part of a field engineering team of specialists who bring a deep level of expertise to customers, and my specialty is big data and Apache Hadoop®. I also work on open source projects that complement my Pivotal work. You could say I try to connect the dots between the Pivotal Big Data Suite and what is out there in open source land, particularly with Apache projects.
For customers, there is a spectrum of things I help with. At times, I help with minor problems or operational tasks. Other times, I help with architecture—I love getting involved with higher-level, more abstract, and challenging problems that use various technologies. Most of the time, I work with field engineers and their customers when they need deeper big data expertise.
How does the open source work you are doing align with Pivotal products?
This is one of the reasons I love working here. As engineers, we get to work a lot with open source technology. Let’s see. There are probably three good examples and a bunch of smaller ones.
The first set of open source contributions has to do with the underlying management platform. Everyone is looking for one management platform in the Apache Hadoop® ecosystem, and, like most, we see that as Apache Ambari®, which is built to provision, manage, and monitor Hadoop clusters. Most people probably also know that Pivotal helped create the Open Data Platform (ODP). With the ODP announcement earlier this year, Pivotal began moving even more towards Ambari for management of Hadoop, Pivotal HAWQ, and everything else. Dan Baskette, in Pivotal’s Technical Marketing organization, came to me one Monday and asked how hard it would be to use Ambari Views for basic HAWQ monitoring. These are the challenges I love to solve. Before the week was over, we had a demo of HAWQ monitoring inside Ambari using an embedded web page viewer for Ambari that I built as a quick, first integration step. So, we can pretty much access all Pivotal stuff this way.
Second, I am doing more side projects to help bring all of our data services (Pivotal HD, Pivotal Greenplum, Pivotal HAWQ, Redis, RabbitMQ, and Pivotal GemFire/Geode) under Ambari management along with other pieces like Apache Solr, Elasticsearch for YARN, and Apache Zeppelin. With the ODP, we expect that all the adaptors for our products will also work with Hortonworks and anyone else who is certifying on ODP standards.
Third, I’ve most recently been working with the Apache Zeppelin™ project that is incubating. If you don’t know it, the project provides a web-based notebook for interactive data analytics, much like IPython Notebook. So, I developed Zeppelin interpreters for HAWQ, Greenplum, PostgreSQL, and GemFire/Geode. This allows Zeppelin to be used for use cases outside of Apache Spark™. These scenarios remind me of my previous research work, and data scientists will see the power of this quite quickly. In any event, I am looking at use cases for Cloud Foundry and Spring XD too, to extend the lifecycle and concept of Zeppelin for things like streams and continuous queries.
From more of an infrastructure standpoint, I’ve also built a few things, like a project that leverages a Docker image for building Hadoop with Apache Bigtop™. If you don’t know Bigtop, vendors use it to build distros; it is the cornerstone to sync and integrate the ecosystem. It handles build, packaging, deployment, and interoperability of Hadoop-related projects with testing of packages, platforms, runtimes, upgrades, etc. One layer up from Bigtop, I also built PHD3-Vagrant, and it leverages Vagrant and Ambari Blueprints to build Pivotal Hadoop or Hortonworks clusters on demand. You can find other stuff I’ve done on GitHub, and there are a number of videos I’ve put out there on Spring Flo, Spring Flo XD, Geode for Ambari, and Zeppelin.
Since you have worked with various Apache Hadoop and Pivotal products, what is your view on ODP?
Well, I know ODP isn’t fully baked yet, but it is definitely cooking and will be soon. Beyond a standard build defined by the ODP, I think Apache Ambari and Apache Bigtop are definitely going to play big roles. Enterprises really need to be able to build, manage, and monitor their clusters and the whole ecosystem of big data products they’ve deployed in a robust way. Customers will also have more freedom to change and avoid lock-in, so they can really reduce risk this way. Having been a Hadoop customer and architect, I find it very attractive, and it makes complete sense.
For software companies, any app that reads or writes to Hadoop can certify once instead of against a bunch of different pieces. That is a no-brainer. The concept of an open, community discussion about a data platform also attracts software engineers like me. As our data platform at Pivotal continues to evolve and become more interconnected with our own pieces and the ecosystem, it is going to allow for some really amazing data science and application development. To do this interconnection, Pivotal would need its own coordination. Now, that coordination can be created by a much bigger group. This is the way the world works today: engineers like me must stay connected to the open source world and all the cool, innovative things going on there.
So, I am excited to help connect the dots between the pieces with our expanding team of engineers. Right now, it’s a great opportunity with a lot of growth ahead, and it’s a lot of fun.
Last question. What do you like to do in your personal time when you aren’t working?
Well, I enjoy sports, particularly swimming. I am very into martial arts as well. While I’ve been in the Netherlands for 15 years, I still spend a lot of time keeping up with Bulgarian politics and trying to do what I can to help the young democratic movements there. I also have a lovely wife and two kids: my son is a little over one year old and my daughter is four years old. So, that keeps me plenty busy as well.
Editor’s Note: ©2015 Pivotal Software, Inc. All rights reserved. Pivotal, Pivotal HD, Greenplum Database, GemFire and HAWQ are trademarks and/or registered trademarks of Pivotal Software, Inc. in the United States and/or other countries. Apache, Apache Hadoop, Hadoop, Apache Ambari, Apache Solr, Apache Zeppelin, Apache Geode, Apache Bigtop, and Apache Crunch are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
About the Author: Adam Bloom