The Case of Intermittent Test Suites

July 14, 2019 Rizwan Reza

I've never worked on a system-level test suite that wasn't flaky. It feels like an inevitability at this point. I start a greenfield project promising myself that this time it won't happen, and after a few months I am where I was on every other decent-sized project: staring at a flaky test. It left me frustrated and short on confidence in my test suite, and it felt downright ugly when I tried to patch things up by adding sleep statements.

Later I learned that an integration test suite is bound to be flaky. This is the cost of all the abstractions, distributed nature of today's software, and network calls. Something somewhere doesn't respond according to contract and the lack of resiliency everywhere in the system leads to flaky test suites. It's even more frustrating when errors are unhelpful.

At Pivotal, the PAS Release Engineering team is responsible for shipping PAS at a sustainable pace with not only the latest patches and security fixes, but large sweeping features that result in refactorings and major additions to the heavily distributed codebase contributed to by over 50 teams.

As a result, our integration test suites exhibit flakiness. I consider a test flaky if I can rerun the same failing test with the same inputs and get it to go green. We experience it more than any other team because of the sheer number of pipelines we have, a direct result of the number of products and versions we support. As of today, we support four backport version lines and one forward version across 4-5 different products[1]. Each version and product combination is run through 7 different scenarios, leading to 7 integration test suite runs every single time we want a change to be tested.[2]
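To see why 7 suite runs per change hurts so much, here is a rough back-of-the-envelope sketch in Go. The 7 comes from the scenarios above; the 5% per-run flake rate is purely an illustrative assumption, not a number measured on our pipelines, and the function name is made up for this example. It also assumes the runs flake independently of each other, which real pipelines may not.

```go
package main

import (
	"fmt"
	"math"
)

// ChanceOfRedPipeline returns the probability that at least one of `runs`
// independent suite runs hits an intermittent failure, given that each
// run flakes with probability perRunFlakeRate.
func ChanceOfRedPipeline(perRunFlakeRate float64, runs int) float64 {
	// P(at least one flake) = 1 - P(no run flakes) = 1 - (1 - p)^runs
	return 1 - math.Pow(1-perRunFlakeRate, float64(runs))
}

func main() {
	const scenarios = 7      // from the post: 7 scenarios per version/product combination
	const assumedRate = 0.05 // ASSUMPTION: illustrative per-run flake rate, not measured

	p := ChanceOfRedPipeline(assumedRate, scenarios)
	fmt.Printf("chance a change goes red from flakes alone: %.1f%%\n", p*100)
}
```

Under those assumptions, a flake rate that sounds tolerable for one suite means roughly three changes in ten go red for reasons that have nothing to do with the change itself.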

Safe to say that our team has seen its fair share of flaky tests in our pipelines. And there is a heavy cost to a red pipeline given the nature of our team. Our team enables our products to go out to customers and if our pipeline is red due to an intermittent failure, it means the assembly line shuts down until human intervention. No bueno!

The Crossroads

I always thought putting the oxygen mask on yourself before the child sitting next to you felt selfish, but my dad taught me why you do it: you don't want to die while putting that mask on someone else. To be able to help others, you first have to help yourself.

Okay, this is a bit of a stretch for a post about intermittent tests. I digress, but I will get to it.

Cloud Foundry test suites are written in Go using Ginkgo, which conveniently provides a flag to attempt running a failed test in isolation before it reports a test case as a failure. While it gives you that functionality, the docs warn you not to use this to cover bad tests.
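The flag in question is spelled `--flake-attempts=N` in Ginkgo v2 (`-flakeAttempts=N` in v1). To make the behavior concrete, here is a minimal sketch of the retry semantics using only the Go standard library. It is not Ginkgo's implementation: `RunWithFlakeAttempts`, `Result`, and `failNTimes` are hypothetical names invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Result records how a single spec behaved across its attempts.
type Result struct {
	Passed   bool // true if any attempt succeeded
	Attempts int  // number of attempts actually run
	Flaky    bool // true if the spec failed at least once before passing
}

// RunWithFlakeAttempts is a hypothetical helper, not part of Ginkgo.
// It runs spec up to maxAttempts times and stops at the first success.
func RunWithFlakeAttempts(spec func() error, maxAttempts int) Result {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	res := Result{}
	for res.Attempts < maxAttempts {
		res.Attempts++
		if err := spec(); err == nil {
			res.Passed = true
			res.Flaky = res.Attempts > 1
			return res
		}
	}
	return res // every attempt failed
}

// failNTimes returns a spec that fails its first n calls, then passes.
// It stands in for an intermittent failure in the system under test.
func failNTimes(n int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errors.New("simulated intermittent failure")
		}
		return nil
	}
}

func main() {
	// Fails once, then passes: reported green, but marked flaky.
	fmt.Printf("%+v\n", RunWithFlakeAttempts(failNTimes(1), 2))
	// Fails on both attempts: reported red.
	fmt.Printf("%+v\n", RunWithFlakeAttempts(failNTimes(2), 2))
}
```

The `Flaky` field is the part the docs' warning is really about: a retry turns the pipeline green, but the first failure still happened and is worth recording somewhere.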

Now, should a team that's a bottleneck to ship features out to customers use that flag in our test suites or not?

During my tenure of over two years on the team, we've come back to this question over and over as new team members have joined and tried to work out the answer.

Why You Should NOT Use the Flake Attempts Flag

Intermittent test suites result from non-resilient software. If there were error handling for every possible scenario in our code, we probably wouldn't have intermittent failures. Every time a failure occurs in the system, it's an opportunity for improvement. It's an opportunity to report to the maintaining team, follow through on a GitHub issue, and see that failure get fixed in the test suite, or better yet, in the underlying code. Best of all, it's an opportunity to roll up your sleeves, dive deep into the codebase, and code up a pull request. This is what a good citizen would do.

On our team, every time we had this discussion, we decided not to add the flag to our suites so that failures would bubble up to the maintainers and the corresponding component team and get fixed.

Spoiler Alert: We Actually Added That Flag

After our latest personnel rotations on the team[3], we were bound to have this discussion yet another time. I saw it coming, but one thing was different. This time we decided to pull the trigger and actually add the flag. We had a deliberate discussion about it. Here are some of the reasons:

The PAS RelEng team does not maintain the test suite. Our system's goal is to detect faults, unhandled exceptions, and failures in the system under test. Intermittent test failures pollute that goal.
We have added a reporting system using Honeycomb that has the ability to detect failures. These failures are shared with the maintainers automatically.
Reducing the amount of flakiness in our system helps us deliver products to the customers reliably.
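The second reason is what makes the flag defensible: a retried spec goes green in the pipeline, but it still gets reported. A rough sketch of that filtering step, with everything in it assumed for illustration: `SpecOutcome`, `NeedsReport`, `ToReport`, and the spec names are hypothetical, and the actual hand-off to Honeycomb is deliberately left out and replaced with JSON lines on stdout.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SpecOutcome is a hypothetical record of one spec after retries.
type SpecOutcome struct {
	Name     string `json:"name"`
	Passed   bool   `json:"passed"`
	Attempts int    `json:"attempts"`
}

// NeedsReport is true for specs that failed outright or only passed on a retry.
func (o SpecOutcome) NeedsReport() bool {
	return !o.Passed || o.Attempts > 1
}

// ToReport returns the outcomes maintainers should still hear about,
// even when the pipeline as a whole ended up green.
func ToReport(outcomes []SpecOutcome) []SpecOutcome {
	var out []SpecOutcome
	for _, o := range outcomes {
		if o.NeedsReport() {
			out = append(out, o)
		}
	}
	return out
}

func main() {
	// Hypothetical spec names, for illustration only.
	outcomes := []SpecOutcome{
		{Name: "pushes an app", Passed: true, Attempts: 1},
		{Name: "binds a service", Passed: true, Attempts: 2},
		{Name: "scales an app", Passed: false, Attempts: 2},
	}
	// Emit one JSON line per reportable outcome; a real setup would send
	// these to whatever reporting system the team uses.
	for _, o := range ToReport(outcomes) {
		line, err := json.Marshal(o)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(line))
	}
}
```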

Am I happy and satisfied with this? No, but I think it allows us to concentrate on the more important problem at hand for us as Release Engineers: efficiently shipping products out to customers. Handling flakiness in PAS as a platform product is a responsibility our team aims to improve upon for the best possible day zero experience.

And to put it bluntly, this is an act of putting on our own oxygen mask before helping the child.

[1] We dropped releasing PASW-2012 when 2.5 was released.

[2] Of course, there is more to it than that. We run each change through canary pipelines, and only fan out once we are happy with a set of fixes together.

[3] At Pivotal, engineers rotate frequently across different teams. A single tenure on one team usually lasts 6-9 months.

About the Author

Rizwan Reza is a Principal Software Engineer at Pivotal.
