When I stepped into my current role, I inherited a testing practice that was doing its best to keep up with a rapidly accelerating development pace. There was one exceptionally talented tester managing a large and growing TestRail suite, while the engineering team was increasingly using AI-assisted development to ship features faster.
The result was predictable, even if it wasn’t immediately obvious how to respond. The number of test cases kept ballooning. Regression testing before a release was estimated at four full days of manual effort. Every new feature added more tests. Every release increased the perceived cost of confidence.
When I arrived as a quality and release manager, I likely added as much confusion as clarity. I was still learning the system, still trying to understand how everything fit together, and at the same time being asked to help scale quality and reduce release risk. I didn’t yet know which test cases truly mattered, which were redundant, and which were artifacts of past assumptions that no longer held.
The Growing Problem: More Tests, Less Understanding
Over time, it became clear that the problem wasn’t just the volume of test cases. It was the loss of shared understanding.
Regression estimates kept growing, even as we talked about improving coverage. But we couldn’t even agree on what “coverage” meant. We didn’t have a stable denominator. We had hundreds of test cases, but no clear sense of how they mapped to the system as it actually existed.
I also didn’t have the luxury of running every test manually to learn what should stay and what could go. The system was too large, releases were too frequent, and my own context was still forming. Before we could meaningfully reduce regression time or talk about automation strategy, we needed to see what we had.
Treating Test Cases as Data, Not Just Artifacts
The shift came when I stopped treating TestRail as the only way to understand our test knowledge.
I exported the entire test suite as a CSV and pulled it into Cursor. Instead of reviewing test cases one by one in the tool, I worked with them as a dataset. With AI assistance, I was able to identify near-duplicates, surface overlapping coverage, and filter out cases that were tied to short-lived or non-perennial behavior.
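To make that concrete, here is a rough sketch of the kind of near-duplicate check involved. In reality the AI conversation in Cursor did the heavy lifting; the file name, the “Title” column, and the similarity threshold below are placeholders for illustration, not our actual export.

```python
# Illustrative sketch: flag near-duplicate test case titles in a TestRail CSV export.
# Assumptions: the export has a "Title" column; 0.85 is an arbitrary threshold.
import csv
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Cheap textual similarity between two test case titles."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

with open("testrail_export.csv", newline="", encoding="utf-8") as f:
    cases = list(csv.DictReader(f))

# A simple pairwise comparison is fine at the scale of a few hundred cases.
candidates = []
for i, first in enumerate(cases):
    for second in cases[i + 1:]:
        score = similarity(first["Title"], second["Title"])
        if score >= 0.85:
            candidates.append((first["Title"], second["Title"], round(score, 2)))

for a, b, score in sorted(candidates, key=lambda c: -c[2]):
    print(f"{score}  {a}  <->  {b}")
```

Even a crude pass like this surfaces clusters of titles worth a human look, which is all it needs to do.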
As I worked through the data, I began grouping test cases into categories based on an emerging understanding of the system itself. That structure wasn’t predefined. It surfaced gradually as patterns appeared. The shape of the system is still emerging, and the categories will continue to evolve, but they gave us something we didn’t have before: a way to reason about the whole.
From there, I transformed the cleaned and categorized test cases into JSON files. The goal wasn’t immediate execution. It was to create a kind of test case API, a structured, machine-readable representation of what we believe should continue to work.
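To give a sense of what I mean by a test case API, here is an illustrative sketch of the target shape. The field names, category, and file layout are placeholders, not our actual schema.

```python
# Illustrative only: the rough shape of one categorized test case as JSON.
# Field names, categories, and directory layout are placeholders, not a real schema.
import json
from pathlib import Path

test_case = {
    "id": "TC-0142",
    "category": "billing",            # emerged from grouping, not predefined
    "title": "Invoice totals include applied credits",
    "preconditions": ["An account exists with at least one unused credit"],
    "steps": [
        "Create an invoice for the account",
        "Apply the available credit",
    ],
    "expected": "The invoice total reflects the credit before payment is taken",
    "perennial": True,                # behavior we expect to keep working
    "source": "testrail_export.csv",  # traceability back to the original suite
}

out_dir = Path("test-cases/billing")
out_dir.mkdir(parents=True, exist_ok=True)
(out_dir / f"{test_case['id']}.json").write_text(json.dumps(test_case, indent=2))
```

The point isn’t this particular structure. It’s that the knowledge becomes data any tool, or person, can read without opening TestRail.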
I then committed those JSON files to a code repository. That step turned out to be more important than I initially expected. Having the test cases under version control made them visible, reviewable, and changeable in the same way as production code. They became a living representation of our current understanding of the system, rather than static artifacts trapped inside a tool.
Putting the test cases in a repo also opened the door to treating them as assets that could evolve alongside the product. They could be refactored, discussed, and eventually consumed by automation tooling as part of the delivery pipeline, rather than existing only for manual execution.
Momentum, Deep Work, and an Unexpected Acceleration
I should also say that this work didn’t unfold the way I initially imagined.
What I thought would be an eight-week project, tackled in fits and starts alongside other work, ended up becoming a concentrated two-day effort over a holiday. Once I had the space to focus deeply, without competing urgent priorities, the shape of the work came into view much more quickly than I expected.
That experience was instructive in itself. It wasn’t that the problem was trivial. It was that the work required sustained attention more than raw effort. Given uninterrupted time, the task of extracting, cleaning, categorizing, and restructuring the test cases moved from feeling overwhelming to feeling possible.
This shift was also influenced by ideas I’d been turning over after listening to Ben Fellows on Joe Colantonio’s podcast. In particular, the way Ben talked about test cases as something closer to code than documentation helped unlock a different approach for me. Instead of trying to manage the test suite purely through a tool, I began treating it as a body of logic that could be packaged, transformed, and reasoned about more flexibly.
Important Caveats and Work Still Ahead
This wasn’t the end of the work. In many ways, it was the beginning.
There is still significant effort ahead in standardizing preconditions, steps, and expected results so that these test cases can truly function as reliable, reusable assets. That standardization is what will eventually allow tools like Playwright’s MCP server to do meaningful work on our behalf: not just generate scripts, but reason about behavior, coverage, and gaps.
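As an illustration of where that standardization could lead, here is a minimal sketch of a schema check that could run against the repository. The schema, the test-cases directory, and the jsonschema dependency are assumptions for the example, not a finished contract.

```python
# Minimal sketch of how standardized test cases could be validated, e.g. in CI.
# The schema below is illustrative; the real contract is still being worked out.
import json
from pathlib import Path

from jsonschema import validate  # assumes the jsonschema package is installed

TEST_CASE_SCHEMA = {
    "type": "object",
    "required": ["id", "category", "title", "preconditions", "steps", "expected"],
    "properties": {
        "id": {"type": "string"},
        "category": {"type": "string"},
        "title": {"type": "string"},
        "preconditions": {"type": "array", "items": {"type": "string"}},
        "steps": {"type": "array", "items": {"type": "string"}, "minItems": 1},
        "expected": {"type": "string"},
        "perennial": {"type": "boolean"},
    },
}

# Validate every test case file; jsonschema raises on the first violation.
for path in Path("test-cases").rglob("*.json"):
    validate(instance=json.loads(path.read_text()), schema=TEST_CASE_SCHEMA)
    print(f"ok: {path}")
```

A check like that keeps the JSON honest as it evolves, which is what any downstream tooling will depend on.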
I’m also very aware that testing goes far beyond test cases. Exploratory testing, observation, questioning, and learning remain essential. Test cases are not the whole of testing.
But they are important.
Good test cases cost money because they’re valuable. They encode long-term truths about the system. They give us clear signals that things that used to work still work. And when they’re structured well, they provide stable ground from which thoughtful exploration can happen, rather than replacing it.
What Changed the Conversation
This work didn’t magically eliminate regression effort overnight. But it changed how we talked about it.
Instead of asking, “How long will regression take?” we could ask:
– Which behaviors are truly perennial? Which deserve stable automated checks, and at which architectural level (API, service, integration, UI, observability)?
– Which tests should remain exploratory by nature?
– Where are we paying maintenance costs without proportional value?
Those questions helped shift the focus from quantity to intent, and from tooling alone to architectural judgment.
A Closing Reflection
This case study isn’t about a clever technical trick or a single tool. It’s about posture.
When systems grow faster than human understanding, adding more artifacts doesn’t automatically increase confidence. Sometimes the work is to slow down just enough to extract the knowledge we’ve accumulated and reshape it into a form we can reason about again.
Test cases, treated carefully, can become an executable source of truth. Not the only one, but an important one. And creating that clarity is often the work that makes everything else possible.
—
Beau Brown
Testing in the real world: messy, human, worth it.
