One of the key misconceptions in today's testing landscape is that testing in production requires only three skill sets: process expertise, knowledge of automated testing tools, and incident management capabilities. These are important, but conspicuous by its absence—not only in this list but in the minds of hiring managers—is something one would think would be the most fundamental:
The ability to design a practical test for production environments.
It would be comical if it weren't so alarming that almost no organization prioritizes test design expertise when implementing testing in production strategies. Whether a test is executed manually in a controlled environment or automated in a live production system, you must first know how to design one that delivers actionable insights without disrupting user experience, right?
There's one obvious problem with excluding expertise in test design from your testing in production approach:
Isn't it essential to know that the person implementing tests in your production environment understands what constitutes a meaningful test?
After all, automating poorly designed tests in production just gives you unreliable data more quickly and creates unnecessary risk. And being "agile" while testing in production incompetently doesn't strike me as a real improvement, except perhaps to the scrum master.
The importance and discipline of test design, especially for testing in production scenarios, are long overdue for revival. Consider this my contribution to that effort.
What Is Essential for Effective Testing in Production?
The first step toward mastering testing in production is to make a few essential distinctions, ones most QA people will already know intuitively but that are rarely presented systematically in production testing contexts.
Let's explore these fundamental concepts that will transform your testing in production approach.
Explicit vs Implicit Functionality in Production Environments
Let's begin with the critical distinction between explicit and implicit functionality when testing in production.
The former is what most of us think of as "functionality." It refers to features and capabilities that are formally specified by Product and whose specification guides their implementation in Engineering. When testing in production, these explicit functions are typically the focus of monitoring and observability tools.
Because of this, testing explicit functionality in production may seem straightforward. Even with well-defined features, it is not, as we will see later; but at least the specificity of explicit functionality makes it easier to build a scaffolding of production tests around it (or the illusion of that scaffolding).
Implicit functionality in a production environment is a very different animal.
It consists of behaviors and responses to user or environmental inputs that were not formally defined or anticipated. Failure to adequately design tests for this implicit functionality is, by far, the largest source of critical bugs discovered after the product has been deployed (the other major source is inadequate testing in fringe hardware/software/device environments).
In other words:
Testing implicit functionality in production requires considerable ingenuity and imagination. It is the truly creative part of production testing strategies.
Becoming good at testing implicit functionality in production environments requires a certain fiendish cleverness, and sadly, that cannot be taught through standard QA processes.
No amount of agile methodology or automated testing frameworks will teach you how to test implicit functionality in production effectively, but proper test design can encourage it.
So, how do you define a test design strategy for implicit functionality in your production environment? Fortunately, it can be done. But before we dive directly into that question, let's explore a few further relevant distinctions critical for effective testing in production.
Positive vs Negative Testing
Key Takeaway: When testing in production, both positive and negative testing require special considerations that go beyond traditional QA approaches.
Most people working in QA understand the basic difference between these two testing types:
- Positive testing in production: Validating that features work as designed in live environments
- Negative testing in production: Strategically testing how your system responds to unexpected inputs without disrupting real users
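To make the distinction concrete, here's a minimal sketch of a positive/negative pair in Python. The URL, endpoint, and payload are hypothetical placeholders, and in production you would run these with synthetic test accounts rather than real customer data:

```python
import requests

BASE_URL = "https://api.example.com"  # hypothetical endpoint, for illustration only

def test_create_order_positive():
    # Positive test: a well-formed request should succeed as specified.
    resp = requests.post(f"{BASE_URL}/orders", json={"sku": "ABC-123", "qty": 1})
    assert resp.status_code == 201

def test_create_order_negative():
    # Negative test: a malformed request should be rejected cleanly,
    # not crash the service or corrupt state for real users.
    resp = requests.post(f"{BASE_URL}/orders", json={"sku": "ABC-123", "qty": -5})
    assert resp.status_code == 400
```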
Why Production Testing Is Different
Though these distinctions are conceptually clear, testing in production introduces unique challenges:
- Higher stakes: Edge cases affect real customers and business operations
- Real-world complexity: User behaviors rarely follow predictable patterns
- Business continuity: Testing must not disrupt normal operations
Beyond the Specification
Specifications rarely provide everything needed for effective testing in production:
Being overly dependent on the spec—whether from Product or Engineering—limits your thinking. This is especially problematic when testing in production where real-world usage patterns often deviate from specifications.
This illustrates what I call "the empirical fallacy": waiting for something to explicitly tell you what to do before you can understand it.
Successful test design for production environments requires:
- Proper functionality parameterization
- Logical thinking about what's possible
- Consideration of both user and environmental interactions
Real-World Example: Unexpected Behaviors
Back in the software Cretaceous (i.e., the 80s), I tested document conversion software and discovered surprising capabilities:
Classic example: In WordPerfect, you could insert a line spacing command mid-paragraph affecting only subsequent lines, creating paragraphs with two different spacing values.
Modern equivalents when testing in production:
- Unexpected user permission combinations in cloud applications
- Unanticipated API request sequences in microservice architectures
- Race conditions in high-concurrency environments
Now let's explore the three key parameters that will sharpen your production testing approach:
1. Feature Scope Testing
Definition: Identifying how users might deploy features in contexts or ways never imagined during design.
Why It Matters for Testing
Not all constraints are foreseen during development. In production environments, this oversight can have serious consequences.
Beyond Simple Validation
When testing in production, validation has to extend beyond the feature's intended boundaries:
Remember: Validating a feature in production isn't just confirming it works as designed; it's also verifying it cannot be used in unintended ways that might impact system stability or security.
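For instance, a feature-scope check might verify that a working feature refuses to be weaponized against the system itself. A minimal sketch, again with a hypothetical endpoint:

```python
import requests

BASE_URL = "https://api.example.com"  # hypothetical; adapt to your own system

def test_export_rejects_unbounded_request():
    # The export feature works as designed for normal ranges, but it should
    # also refuse a request large enough to act as a self-inflicted
    # denial of service.
    resp = requests.get(f"{BASE_URL}/reports/export", params={"rows": 10_000_000})
    assert resp.status_code in (400, 413), "unbounded exports should be refused"
```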
2. Workflow Interruption Testing in Live Systems
Warning: This approach assumes you're already testing defined workflows in production. If not, address those fundamentals first!
Beyond the "Happy Path"
When testing in production, explore these critical workflow disruptions:
✅ Interrupted workflows: Process started but never completed
✅ Canceled operations: User explicitly cancels mid-process
✅ Restarted processes: User attempts the same action multiple times
✅ Backtracking: User returns to earlier steps with different inputs
Implementation Tips for Production Testing
Use techniques like feature flags to safely control the exposure of these tests in production environments.
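As a sketch of what that gating might look like, here the disruptive scenario only runs for accounts enrolled in a test flag. The flag client and the helper functions are hypothetical placeholders for your own feature management SDK and fixtures:

```python
from my_flags import flag_client  # hypothetical client; substitute your SDK

def run_interrupted_checkout_test(user):
    # Only exercise the disruptive scenario for synthetic test accounts
    # enrolled in the flag, so real users never see it.
    if not flag_client.is_enabled("workflow-interruption-tests", user=user):
        return
    session = start_checkout(user)  # hypothetical helper: begin the workflow...
    session.abandon()               # ...then deliberately never complete it
    # hypothetical helper: verify no partial state was left behind
    assert no_orphaned_records(session), "abandoned checkout left partial state"
```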
Real-World Applications
Pro tip: These scenarios are rarely captured in specifications but represent real user behavior in production.
3. Sequentiality in Production Testing
The Challenge: In production systems with multiple concurrent processes, the sequence of operations can lead to unexpected failures that are difficult to reproduce in test environments.
Two Critical Sequentiality Patterns
Antecedent Contributing Conditions
- Definition: Events that must happen before a failing process
- Characteristics: May only manifest after specific sequences that take days to naturally occur
- Example in production: User permissions modified → cache expires → specific API called
Subsequent Contributing Conditions
- Definition: Failures that only become apparent through later interactions
- Manifestation: The failure remains masked until a later operation
- Example in production: Data corruption occurs silently until a report is generated
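In test form, the trick is to force the antecedent sequence deliberately instead of waiting days for it to occur naturally, and to follow a write with the later operation that would unmask it. A sketch, with all fixtures and helpers hypothetical:

```python
def test_permission_change_then_cache_expiry(user, cache, api):
    # Antecedent contributing conditions, forced in order:
    revoke_permission(user, "reports:read")        # 1. permissions modified
    cache.invalidate(f"perms:{user.id}")           # 2. cache expiry, forced
    resp = api.get("/reports/summary", user=user)  # 3. the specific API call
    # A stale-cache bug would return 200 here despite the revocation.
    assert resp.status_code == 403

def test_corruption_surfaces_in_later_report(api):
    # Subsequent contributing condition: the write appears to succeed;
    # the failure only becomes visible when the report consumes the data.
    api.post("/ledger/entries", json={"amount": "0.10", "currency": "EUR"})
    report = api.get("/ledger/monthly-report")
    assert report.json()["totals"]["balanced"], "corruption masked until report time"
```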
Why Is This So Challenging?
Defining the scope of sequentiality testing in production requires:
- Intimate knowledge of system interactions
- State transition modeling for complex processes
- Understanding of both intended and unintended state combinations
"Like those dual line spacing values in a single paragraph, these unexpected states can wreak havoc in production environments."
Essential Tools for Sequentiality Testing in Production
- Chaos engineering to deliberately introduce controlled failures
- Observability tools to track state changes across systems
- Distributed tracing to follow request paths through microservices
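For the tracing piece, a minimal OpenTelemetry sketch (assuming the SDK is already configured to export somewhere; the helpers are the same hypothetical ones as above) wraps each step of the suspect sequence in a span so the antecedent chain shows up in your tracing backend:

```python
from opentelemetry import trace

tracer = trace.get_tracer("sequentiality-tests")

def traced_sequence(user, cache, api):
    # Each antecedent step becomes a span, so when the final call fails
    # you can see exactly which sequence preceded it.
    with tracer.start_as_current_span("permissions-modified"):
        revoke_permission(user, "reports:read")  # hypothetical helper
    with tracer.start_as_current_span("cache-invalidated"):
        cache.invalidate(f"perms:{user.id}")
    with tracer.start_as_current_span("reports-api-call"):
        return api.get("/reports/summary", user=user)
```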
Bottom line: Sequentiality issues represent one of the richest sources of serious bugs in modern production systems. Prioritize this aspect in your testing strategy.
Feature Management Platforms in Production Testing
Feature management platforms have transformed testing in production from a risky endeavor into a controlled, systematic process. Yet many organizations fail to leverage these tools effectively within their test design strategy.
The Evolution of Risk Management in Production
Traditional approaches to testing in production involved binary decisions—either a feature was live for everyone or for no one. Feature management platforms fundamentally change this paradigm:
- Granular control: Test features with specific user segments rather than all-or-nothing deployments
- Instant remediation: Disable problematic features without code deployments or rollbacks
- Progressive exposure: Gradually increase user exposure based on real-time performance data
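Under the hood, progressive exposure usually comes down to deterministic bucketing: hash each user into a stable bucket so the same user always gets the same answer while you dial the percentage up. A self-contained sketch of the idea (your platform does this for you; the flag key and percentage here are illustrative):

```python
import hashlib

def in_rollout(flag_key: str, user_id: str, percentage: float) -> bool:
    # Hash user+flag into a stable bucket in [0, 100) so exposure is
    # deterministic per user and can be dialed up without reshuffling.
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000 / 100
    return bucket < percentage

# Expose the new checkout to 5% of users; raise the number as metrics allow.
if in_rollout("new-checkout", user_id="u-42", percentage=5.0):
    pass  # serve the new code path
```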
Beyond Simple Feature Flags
While basic feature flagging has existed for years, modern feature management platforms provide critical capabilities that transform production testing. When integrated properly into your test design strategy, they enable:
- Targeted risk containment - Limit exposure of new features to specific user segments
- Real-user validation - Test with actual users rather than synthetic test data
- Immediate remediation - Address issues without emergency deploys or rollbacks
Integration with Observability Tools
The true power of feature management platforms emerges when they're integrated with observability and monitoring systems:
Traditional Approach:
- Detect issues after full deployment
- Manual verification of test results
- Post-mortem analysis after incidents
Feature Management Integration:
- Correlate feature activation with system metrics
- Automated performance monitoring per feature flag
- Real-time insight into feature impacts
Real-World Impact: Transforming Production Testing
Organizations implementing feature management platforms report significant benefits:
- Reduced incident severity: "We can test features in production well before a marketing launch. And if a feature causes problems on the day of the launch, we can just turn it off with a kill switch—no rollbacks." — Chris Guidry, VP of Engineering, O'Reilly Media.
- Accelerated release cycles: Teams at IBM, TrueCar, and other leading companies leverage feature management for testing in production, enjoying what they describe as "safe, unceremonious releases."
- Enhanced test coverage: Feature flags expose edge cases that are impossible to simulate in controlled environments.
Practical Implementation in Test Design
To effectively incorporate feature management platforms into your test design strategy:
- Design verification points that align with feature flag transitions
- Create fallback scenarios for each feature-flagged component
- Establish performance thresholds that trigger automatic feature disablement (see the sketch after this list)
- Define segment-specific test cases to validate behavior across user populations
- Document flag dependencies to prevent cascading failures
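The third point deserves a sketch. With a hypothetical metrics client and flag client (substitute your own monitoring stack and feature management SDK), an automatic kill switch can be as simple as:

```python
ERROR_RATE_THRESHOLD = 0.02  # 2%: an illustrative number, not a recommendation

def enforce_kill_switch(metrics_client, flag_client):
    # Compare the error rate of the flagged code path against the agreed
    # threshold and disable the flag automatically, rather than waiting
    # for a human to notice the pager.
    rate = metrics_client.error_rate(tag="flag:new-checkout", window="5m")
    if rate > ERROR_RATE_THRESHOLD:
        flag_client.disable("new-checkout")
        notify_on_call(f"new-checkout auto-disabled at {rate:.1%} error rate")  # hypothetical helper
```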
Feature management platforms don't replace thoughtful test design—they amplify it. The quality of your production testing still depends fundamentally on how well you've designed your tests.
Common Pitfalls to Avoid
Even with robust feature management, several challenges remain:
- Flag debt - Abandoned or forgotten flags creating technical debt
- Insufficient monitoring - Failing to correlate flag state with system performance
- Overconfidence - Reducing pre-production testing prematurely
- Flag interdependencies - Creating complex, difficult-to-troubleshoot relationships
The Bottom Line
Feature management platforms provide powerful capabilities for testing in production, but they must be thoughtfully integrated into your overall test design strategy. The organizations gaining the most value don't just deploy these tools—they fundamentally rethink how test design works in a feature-flagged environment.
When properly implemented as part of a comprehensive test design approach, feature management transforms testing in production from a necessary evil into a competitive advantage.
Tools for Testing in Production: Implementation Challenges
While feature flags and management platforms form the foundation of testing in production, the broader tooling ecosystem presents its own set of implementation challenges. Selecting and configuring the right tools requires careful consideration of team capabilities, system architecture, and organizational needs.
Monitoring and Observability Tools
Effective testing in production depends on comprehensive visibility into system behavior:
- Application Performance Monitoring (APM)
Benefits: Provides detailed performance insights across the application stack.
Challenges: Often requires significant instrumentation and can generate excessive data that overwhelms smaller teams.
- Log Management Systems
Benefits: Essential for debugging and forensic analysis during production testing.
Challenges: Can generate overwhelming volumes of data without proper filtering and indexing strategies.
- Distributed Tracing
Benefits: Offers end-to-end visibility across microservice architectures.
Challenges: Implementation complexity increases dramatically in heterogeneous environments with multiple technology stacks.
Alert Management Systems
Proper alerting is critical when testing new features in production:
- Alert Aggregation Platforms
Benefits: Consolidate notifications from multiple monitoring systems.
Challenges: Great for incident management but can be disruptive if not carefully configured to avoid alert fatigue.
- Incident Response Systems
Benefits: Streamline communication during production incidents.
Challenges: Require well-defined runbooks and integration points to be effective; can create overhead for simple deployments.
Implementation Considerations
When implementing tools for testing in production, teams should evaluate:
- Scalability requirements - Will the tool handle your traffic volumes and data growth?
- Integration capabilities - How easily does it connect with your existing toolchain?
- Resource consumption - What overhead does the tool itself introduce?
- Team expertise - Do you have the skills needed to maximize the tool's value?
- Signal-to-noise ratio - Can you extract meaningful insights without drowning in data?
Common Implementation Pitfalls
- Tool sprawl - Accumulating too many overlapping solutions
- Incomplete instrumentation - Missing critical monitoring points
- Alert fatigue - Generating excessive notifications that teams eventually ignore
- Insufficient context - Failing to correlate metrics with user impact
- Data silos - Creating isolated monitoring systems that don't share information
Finding the Right Balance
The most successful testing in production implementations achieve balance between:
- Comprehensive monitoring vs. manageable complexity
- Detailed insights vs. information overload
- Automated responses vs. human decision points
When implementing testing tools in production environments, start small with focused objectives, then gradually expand coverage as your team develops expertise in both the tools and in interpreting the resulting data.
Load, Complexity, Latency
Consider the system macro factors of load, complexity, and latency in your test design. These may affect the execution or completion of a request, process, or event.
The relevance of system load should be obvious: requests or transactions processed during periods of high system load may fail at any stage in the transaction process.
This should be a default part of your test planning. As we all know, load can also be generated as a result of the request itself—i.e., a data query that triggers the processing and transmission of vast amounts of data.
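A deliberately simple load probe can make the point, though it is no substitute for a real load-testing tool, and you should aim it at a canary slice or coordinate with whoever owns the pager. The endpoint here is hypothetical:

```python
import concurrent.futures
import requests

URL = "https://api.example.com/search?q=*"  # hypothetical query that fans out widely

def fire(_):
    return requests.get(URL, timeout=10).status_code

# 500 requests across 50 workers: enough concurrency to surface
# stage-level failures without writing a full load model.
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    codes = list(pool.map(fire, range(500)))

failures = [c for c in codes if c >= 500]
print(f"{len(failures)} of {len(codes)} requests failed under load")
```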
Complexity refers here primarily to the complexity of requests asserted against a system. This complexity may consist of the number of conditions specified (and their exclusions and exceptions), the number of databases (virtual or otherwise) implicated in the requests, or the processing topology of the system itself.
In the context of this discussion, I use latency to indicate the introduction of time lapses in the request process, which is not its usual meaning. I am referring here to user latency, not response latency from the system itself.
In other words, how does the feature or capability behave if the user comes to a specific step in the process and then stays paused there, doing nothing? Does the system time out (it probably should)? Should it prompt the user? Should it remain in that state until the end of time?
The answers to these questions may be provided in the specification, but the product under test may not behave that way, which is why we test to begin with.
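A user-latency test can be as blunt as starting a flow and walking away. In practice you would shorten the configured idle timeout for the test rather than actually sleeping half an hour; the endpoints and the expected status code are assumptions:

```python
import time
import requests

BASE_URL = "https://api.example.com"  # hypothetical

def test_session_expires_when_user_stalls():
    session = requests.Session()
    session.post(f"{BASE_URL}/checkout/start", json={"cart": "c-1"})
    time.sleep(60 * 31)  # exceed an assumed 30-minute idle timeout
    resp = session.post(f"{BASE_URL}/checkout/confirm")
    # The spec may promise a timeout; verify the product actually honors it.
    assert resp.status_code == 401, "stalled session should have expired"
```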
User Roles
Of course, end users can interact with software systems in various roles. The same user can interact with the system in different roles, depending on their actions.
However, processes and services can also have different roles and associated privileges. In either case, be sure your test design and planning, whether for features or capabilities, considers and exercises all possible role states.
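A parameterized test is the cheapest way to guarantee every role state gets exercised. A pytest sketch, where the roles, the endpoint, and the api_as fixture are all placeholders for your own system:

```python
import pytest

ROLES = ["viewer", "editor", "admin", "service-account"]  # adapt to your system

@pytest.mark.parametrize("role", ROLES)
def test_report_deletion_by_role(role, api_as):
    # api_as is a hypothetical fixture returning a client authenticated
    # in the given role; the point is to exercise every role state,
    # not just the one the spec happened to mention.
    resp = api_as(role).delete("/reports/42")
    expected = 204 if role == "admin" else 403
    assert resp.status_code == expected
```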
What's Next?
Looking for more test design tips? And want to boost your SaaS growth and leadership skills?
Subscribe to our newsletter for the latest insights from CTOs and aspiring tech leaders.
We'll help you scale smarter and lead stronger with guides, resources, and strategies from top experts!