Pivotal People–Sarah Aerni on How To Become a Data Scientist

April 23, 2015 Sarah Aerni

Data science continues to grow as a field. In this post, Pivotal Principal Data Scientist Sarah Aerni answers 7 questions. She describes her colleagues on the Pivotal Data Labs team, the kind of people Pivotal recruits for data science, and what it's like to work on some of the world's most compelling data science projects.

Sarah has previously worked as a researcher, consultant, and entrepreneur. She graduated from UCSD with a degree in biology and a specialization in bioinformatics, then completed her master's and PhD in Biomedical Informatics at Stanford.

Why should someone pursue expertise in data science?
Three thoughts come to mind.

One. I work on one of the largest data science teams, with broad industry experience and expertise, and I can tell you firsthand that this is an area of tremendous growth. Pivotal is recruiting and has a significant need for more data scientists, because data science is integrated into everything we do at Pivotal: mobile, IoT, cloud, agile, and big data. We get to work on some of the most amazing data science projects for some of the most amazing companies on the planet. We make a difference.

Two. From another perspective, many people talk about the demand for data science skills in general. Perhaps you've run across articles in trade magazines such as Network World reporting that data scientists get 100 recruiter emails a day. Executive recruiters who research the area also say that some data scientists earn more than doctors or lawyers. Or maybe you saw the McKinsey Global Institute study estimating that, by 2018, the U.S. will have a shortage of 140,000 to 190,000 people with deep analytical expertise, a 50-60% gap between supply and demand. There is a need, and it is valuable. Data is only growing, and we need passionate people to mine it for the golden nuggets.

Three. If you are coming from another field or are still in college, keep in mind that data science is not a brand-new field. It is actually a rebranding of a number of different fields that are now enabled by massive advances in technology, and these advances let you do some pretty extraordinary things. When you step back and look at the big picture, it is really quite amazing to use numbers and math with the right data to predict an outcome, or even change it. Years ago, I think, much of this was speculative. We could imagine building models to predict whether a hospital patient's condition might deteriorate, so that the patient could be transferred to the ICU before it happened, monitored more closely, and spared a catastrophic outcome. Well, today you can do just that, as covered in this Pivotal case study, and it's really just math and data! The work can have a big impact. It's innovative and creative thinking in a technical realm. You can also move across industries fairly easily: financial services, healthcare, security, technology, oil and gas, bioinformatics, and others.

Why is data science so popular and why do people get excited about it?
Alongside the economics of big data, there have been great recent strides in machine learning, text analytics, natural language processing, and related areas of data science, as well as in open source tools. Together, these advances open up a new realm of finding insights in massive amounts of data. The field crosses every industry, and its ability to provide value is clearly proven. Some people enter the field to save or improve lives, some get excited about sales and marketing innovations, and others want to focus on cost savings or process optimization. For example, we've worked on projects covering the potency of vaccines, TV viewer behavior, teen crisis hotlines, financial compliance, churn prediction, and much more. One of the more interesting projects had to do with car data. We combined surveys, service visits, and sensor-based manufacturing data. With this combined data set, we looked at what happens over a car's lifetime, searching for relationships between manufacturing issues and consumer happiness in order to improve engineering.

Why is Pivotal a great place for data scientists to work?
Pivotal has world-class technology solutions for big data and data science: Hadoop, Pivotal Greenplum Database, support for Apache Spark™, Pivotal HAWQ (SQL on Hadoop), and MADlib. In addition, Pivotal GemFire was just open sourced as Project Geode. We also have amazing data science customers and unparalleled opportunities to work on really innovative projects. I would also say our team is awesome: Pivotal Data Science Labs is full of smart data geeks who love sharing ideas and collaborating. Lastly, we are a leader in the space. In fact, our business unit leader spoke at the White House about 18 months ago. It's not like we're just kids in a candy store. It's like we're kids in a chocolate factory (that won't go wonky) with only the best tools and the best ingredients, namely the customer data and the creative people, to make the most amazing candy in the world!

What do people need to be good data scientists, particularly at Pivotal?
The number one thing is probably curiosity, and number two is tenacity. To be a good data scientist, I think you need an underlying desire to understand data as deeply as possible and to push through any challenges in understanding it. You might see something weird, and you just have to know why. You have to interrogate your own models to figure out why they worked and, just as importantly, why they didn't. As for tenacity, you can't just say "oh well" and move along. You have to dig in and figure out why your model is not performing well, try multiple angles, and see if you can discover a better way of doing it. You also need to keep learning new tools in a quickly changing landscape.

Third, I would say you need to communicate and work well with people. For example, it's important to collaborate with team members, incorporate feedback from domain experts, and present findings to executives. Our models often surface big insights, and it's natural for others to want to make sense of them even when they don't understand the math underneath.

Of course, in all this, data scientists should naturally be passionate about technology and about using it to crunch data and numbers. That tenacity comes into play here too. Sometimes you have to come up with clever ways to represent things and really get down to the bits and bytes. You have to be creative and not just say, "Oh well, there isn't a tool that does it." After all, we are all still scientists, searching for answers that haven't been found before to questions that may not have been asked before. If there is a boxed solution to a problem, then it doesn't need a scientist.

What skills and experience have you seen port over well to data science? In terms of alternate skill sets, what types of backgrounds would Pivotal consider hiring?
We would look at people who do research and analysis with a lot of data, for example in biomedical informatics, electrical engineering, or operations research. People who know basic programming and use it with a lot of data can also be quite a good fit. In financial services, you find quantitative analysts, people who specialize in mathematical and statistical methods. These folks are very good as long as they don't stay focused on answering predefined questions in a knowledge-driven way. Instead, we need people who let the data take them places rather than starting out with a fixed question. The same goes for actuarial science. People with solid engineering skills and a strong interest in data can also migrate fairly easily, and the same is true of programmers. If you are a computer science engineer who is excited by A/B testing, how predictive models work, or how data could change your product, you could probably move into the field.

One question people ask is whether a PhD is necessary. It is not.
